Dongbin Zhao
~Dongbin_Zhao1
16
Total papers
8.0
Submissions per year
Average rating
Accepted: 13/16
Venue distribution
ICLR
7
NeurIPS
6
ICML
2
COLM
1
Papers (16)
2025: 13 papers
3
Constrained Exploitability Descent: Finding Mixed-Strategy Nash Equilibrium by Offline Reinforcement Learning
ICLR 2025 Rejected
3
Constrained Exploitability Descent: An Offline Reinforcement Learning Method for Finding Mixed-Strategy Nash Equilibrium
ICML 2025 Poster
4
Divergence-Regularized Discounted Aggregation: Equilibrium Finding in Multiplayer Partially Observable Stochastic Games
ICLR 2025 Poster
4
Videos are Sample-Efficient Supervisions: Behavior Cloning from Videos via Latent Representations
NeurIPS 2025 Poster
5
Empowering LLM Agents with Zero-Shot Optimal Decision-Making through Q-learning
ICLR 2025 Poster
4
Equilibrium Policy Generalization: A Reinforcement Learning Framework for Cross-Graph Zero-Shot Generalization in Pursuit-Evasion Games
NeurIPS 2025 Poster
4
INS: Interaction-aware Synthesis to Enhance Offline Multi-agent Reinforcement Learning
ICLR 2025 Poster
4
DipLLM: Fine-Tuning LLM for Strategic Decision-making in Diplomacy
ICML 2025 Poster
4
SELU: Self-Learning Embodied MLLMs in Unknown Environments
ICLR 2025 Withdrawn
4
Learning and Planning Multi-Agent Tasks via an MoE-based World Model
NeurIPS 2025 Poster
4
Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL
NeurIPS 2025 Poster
3
Unsupervised Zero-Shot Reinforcement Learning via Dual-Value Forward-Backward Representation
ICLR 2025 Poster
4
Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation
COLM 2025 Poster
2024: 3 papers
4
Generalizing Consistency Policy to Visual RL with Prioritized Proximal Experience Regularization
NeurIPS 2024 Poster
5
Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement
NeurIPS 2024 Poster
4
RoboGPT: An intelligent agent of making embodied long-term decisions for daily instruction tasks
ICLR 2024 Withdrawn