Weinan Zhang
~Weinan_Zhang1
Total papers: 30
Submissions per year: 15.0
Average rating
Acceptance: 20/30
Venue distribution
ICLR: 15
NeurIPS: 13
ICML: 2
Published papers (30)
2025: 17 papers
3
Flexible Realignment of Language Models
NeurIPS 2025 Poster
4
Score-Based Diffusion Policy Compatible with Reinforcement Learning via Optimal Transport
ICML 2025 Poster
4
Uni-RL: Unifying Online and Offline RL via Implicit Value Regularization
NeurIPS 2025 Poster
4
Information-Theoretic Reward Decomposition for Generalizable RLHF
NeurIPS 2025 Poster
3
Computing Ex Ante Equilibrium in Heterogeneous Zero-Sum Team Games
ICLR 2025 Withdrawn
4
Large Language Models are Demonstration Pre-Selectors for Themselves
ICML 2025 Poster
3
GenPO: Generative Diffusion Models Meet On-Policy Reinforcement Learning
NeurIPS 2025 Poster
4
Large Language Models are Demonstration Pre-Selectors for Themselves
ICLR 2025 Rejected
3
ContraDiff: Planning Towards High Return States via Contrastive Learning
ICLR 2025 Poster
4
Reconstruction-Guided Policy: Enhancing Decision-Making through Agent-Wise State Consistency
ICLR 2025 Poster
4
DyDiff: Long-Horizon Rollout via Dynamics Diffusion for Offline Reinforcement Learning
ICLR 2025 Rejected
4
KungfuBot: Physics-Based Humanoid Whole-Body Control for Learning Highly-Dynamic Skills
NeurIPS 2025 Poster
4
AgentNet: Decentralized Evolutionary Coordination for LLM-based Multi-Agent Systems
NeurIPS 2025 Poster
4
RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation
ICLR 2025 Rejected
4
ReMA: Learning to Meta-Think for LLMs with Multi-agent Reinforcement Learning
NeurIPS 2025 Poster
4
MobileUse: A Hierarchical Reflection-Driven GUI Agent for Autonomous Mobile Operation
NeurIPS 2025 Poster
5
Robust Function-Calling for On-Device Language Model via Function Masking
ICLR 2025 Spotlight
2024: 13 papers
4
ODICE: Revealing the Mystery of Distribution Correction Estimation via Orthogonal-gradient Update
ICLR 2024 Spotlight
3
Diffusion-DICE: In-Sample Diffusion Guidance for Offline Reinforcement Learning
NeurIPS 2024 Poster
4
Learning an Actionable Discrete Diffusion Policy via Large-Scale Actionless Video Pre-Training
NeurIPS 2024 Poster
4
Reinforcing LLM Agents via Policy Optimization with Action Decomposition
NeurIPS 2024 Poster
4
Parsimonious Demonstrations and Fine-Tuning for Large Language Models
ICLR 2024 Withdrawn
4
Alphazero-like Tree-Search can guide large language model decoding and training
ICLR 2024 Rejected
4
Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization
NeurIPS 2024 Poster
4
Contrastive Diffuser: Planning Towards High Return States via Contrastive Learning
ICLR 2024 Rejected
4
Quantifying Zero-shot Coordination Capability with Behavior Preferring Partners
ICLR 2024 Rejected
3
Multi-agent Trajectory Prediction with Scalable Diffusion Transformer
ICLR 2024 Withdrawn
5
MADiff: Offline Multi-agent Learning with Diffusion Models
ICLR 2024 Rejected
4
MADiff: Offline Multi-agent Learning with Diffusion Models
NeurIPS 2024 Poster
4
Vision-Language Foundation Models as Effective Robot Imitators
ICLR 2024 Spotlight