PaperHub

Yaodong Yang (~Yaodong_Yang1)

Total papers: 38
Average submissions per year: 19.0
Average review score: 5.6
Accepted: 23/38

Venue distribution: ICLR 24, NeurIPS 11, ICML 3
Published Papers (38)

2025 (20 papers)

| Title | Venue | Decision | Avg. Score | Reviews |
|---|---|---|---|---|
| Random Feature Models with Learnable Activation Functions | ICLR 2025 | Rejected | 4.3 | 3 |
| Mitigating Reward Over-Optimization in RLHF via Behavior-Supported Regularization | ICLR 2025 | Poster | 6.5 | 4 |
| Mixed Hierarchical Oracle and Multi-Agent Benchmark in Two-player Zero-sum Games | ICLR 2025 | Withdrawn | 3.7 | 3 |
| Emerging Safety Attack and Defense in Federated Instruction Tuning of Large Language Models | ICLR 2025 | Poster | 5.3 | 4 |
| SAE-V: Interpreting Multimodal Models for Enhanced Alignment | ICML 2025 | Poster | 5.5 | 4 |
| Risk-aware Direct Preference Optimization under Nested Risk Measure | ICML 2025 | Rejected | 4.9 | 4 |
| Computing Ex Ante Equilibrium in Heterogeneous Zero-Sum Team Games | ICLR 2025 | Withdrawn | 4.0 | 3 |
| Risk-aware Direct Preference Optimization under Nested Risk Measure | NeurIPS 2025 | Poster | 7.0 | 3 |
| Iterative Training of Language Models with Opponent Modeling for Red Teaming Data Generation | ICLR 2025 | Rejected | 4.3 | 4 |
| SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning | NeurIPS 2025 | Spotlight | 8.2 | 4 |
| In-Context Editing: Learning Knowledge from Self-Induced Distributions | ICLR 2025 | Poster | 7.0 | 4 |
| Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs | ICLR 2025 | Poster | 5.8 | 4 |
| Social World Model-Augmented Mechanism Design Policy Learning | NeurIPS 2025 | Poster | 7.6 | 3 |
| Generative RLHF-V: Learning Principles from Multi-modal Human Preference | NeurIPS 2025 | Poster | 7.8 | 4 |
| STAR: Efficient Preference-based Reinforcement Learning via Dual Regularization | NeurIPS 2025 | Poster | 6.4 | 4 |
| DexFlyWheel: A Scalable and Self-improving Data Generation Framework for Dexterous Manipulation | NeurIPS 2025 | Spotlight | 7.8 | 4 |
| Magnetic Preference Optimization: Achieving Last-iterate Convergence for Language Model Alignment | ICLR 2025 | Poster | 6.5 | 4 |
| Falcon: Fast Visuomotor Policies via Partial Denoising | ICML 2025 | Poster | 5.5 | 4 |
| Safe RLHF-V: Safe Reinforcement Learning from Multi-modal Human Feedback | NeurIPS 2025 | Poster | 7.3 | 4 |
| Empirical Study on Robustness and Resilience in Cooperative Multi-Agent Reinforcement Learning | NeurIPS 2025 | Poster | 6.4 | 4 |

2024 (18 papers)

| Title | Venue | Decision | Avg. Score | Reviews |
|---|---|---|---|---|
| Reason to Behave: Achieving Human-Like Task Execution for Physics-Based Characters | ICLR 2024 | Withdrawn | 3.0 | 4 |
| Boosting Multi-Agent Reinforcement Learning via Transition-Informed Representations | ICLR 2024 | Rejected | 3.5 | 4 |
| Measuring Value Understanding in Language Models through Discriminator-Critique Gap | ICLR 2024 | Withdrawn | 4.3 | 3 |
| Planning with Theory of Mind for Few-Shot Adaptation in Sequential Social Dilemmas | ICLR 2024 | Rejected | 4.5 | 4 |
| Open-Ended Learning in General-Sum Games: The Role of Diversity in Correlated Equilibrium | ICLR 2024 | Rejected | 3.0 | 4 |
| BATTLE: Towards Behavior-oriented Adversarial Attacks against Deep Reinforcement Learning | ICLR 2024 | Rejected | 4.3 | 4 |
| SafeDreamer: Safe Reinforcement Learning with World Models | ICLR 2024 | Poster | 6.5 | 4 |
| Scalable Constrained Policy Optimization for Safe Multi-agent Reinforcement Learning | NeurIPS 2024 | Poster | 6.3 | 3 |
| Masked Pretraining for Multi-Agent Decision Making | ICLR 2024 | Withdrawn | 4.4 | 5 |
| MultiReAct: Multimodal Tools Augmented Reasoning-Acting Traces for Embodied Agent Planning | ICLR 2024 | Rejected | 4.8 | 4 |
| Red Teaming Game: A Game-Theoretic Framework for Red Teaming Language Models | ICLR 2024 | Withdrawn | 2.5 | 4 |
| Maximum Entropy Heterogeneous-Agent Reinforcement Learning | ICLR 2024 | Spotlight | 7.5 | 4 |
| Panacea: Pareto Alignment via Preference Adaptation for LLMs | NeurIPS 2024 | Poster | 6.7 | 3 |
| Byzantine Robust Cooperative Multi-Agent Reinforcement Learning as a Bayesian Game | ICLR 2024 | Poster | 6.3 | 3 |
| Safe RLHF: Safe Reinforcement Learning from Human Feedback | ICLR 2024 | Spotlight | 7.5 | 4 |
| Heterogeneous Value Alignment Evaluation for Large Language Models | ICLR 2024 | Withdrawn | 3.0 | 4 |
| Aligner: Efficient Alignment by Learning to Correct | NeurIPS 2024 | Oral | 6.3 | 4 |
| CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents | ICLR 2024 | Spotlight | 7.3 | 3 |