Impact Index

90.97/100

Top 0.5%

Site-wide rank #329

Papers published: 48

Average rating: 5.3

Average output: 16.0 papers/year

Weinan Zhang

Full Professor @ Shanghai Jiao Tong University · China · OpenReview

Research Areas

AI Agent · Reinforcement Learning

Manipulation as in Simulation: Enabling Accurate Geometry Perception in Robots

ICLR 2026 · Poster

Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents

ICLR 2026 · Poster

Audio Turing Test: Benchmarking the Human-likeness of Large Language Model-based Text-to-Speech Systems in Chinese

ICLR 2026 · Rejected

Iterative Refinement of Flow Policies in Probability Space for Online Reinforcement Learning

ICLR 2026 · Rejected

CATArena: Evaluation of LLM Agents Through Iterative Tournament Competitions

ICLR 2026 · Rejected

USE: Enhancing Mixed-Motive Cooperation via Unified Self and Collective Rewards

ICLR 2026 · Rejected

CoreCodeBench: A Configurable Multi-Scenario Repository-Level Benchmark

ICLR 2026 · Withdrawn

InfoDeepSeek: Benchmarking Agentic Information Seeking for Retrieval-Augmented Generation

ICLR 2026 · Rejected

MARFT: Multi-Agent Reinforcement Fine-Tuning

ICLR 2026 · Rejected

LoopTool: Closing the Data–Training Loop for Robust LLM Tool Calls

ICLR 2026 · Withdrawn

RAD: Retrieval High-quality Demonstrations to Enhance Decision-making

ICLR 2026 · Rejected

Unified Latent Steering and Residual Refinement for Online Improvement of Diffusion Policy Models

ICLR 2026 · Rejected

ATGen: Adversarial Reinforcement Learning for Test Case Generation

ICLR 2026 · Poster

Progra: Progress-Aware Reinforcement Learning for Multi-Turn Function Calling

ICLR 2026 · Withdrawn

CoLD: Counterfactually-Guided Length Debiasing for Process Reward Models in Mathematical Reasoning

ICLR 2026 · Rejected

Improve LLM Pre-training with RL-Guided Annealing

ICLR 2026 · Rejected

APTBench: Benchmarking Agentic Potential of Base LLMs During Pre-Training

ICLR 2026 · Rejected

TRACEBench: Personalized Function Calling Benchmark Based on Real-World Human Interaction

ICLR 2026 · Withdrawn

Information-Theoretic Reward Decomposition for Generalizable RLHF

NeurIPS 2025 · Poster

KungfuBot: Physics-Based Humanoid Whole-Body Control for Learning Highly-Dynamic Skills

NeurIPS 2025 · Poster

GenPO: Generative Diffusion Models Meet On-Policy Reinforcement Learning

NeurIPS 2025 · Poster

Uni-RL: Unifying Online and Offline RL via Implicit Value Regularization

NeurIPS 2025 · Poster

Robust Function-Calling for On-Device Language Model via Function Masking

ICLR 2025 · Spotlight

Flexible Realignment of Language Models

NeurIPS 2025 · Poster

AgentNet: Decentralized Evolutionary Coordination for LLM-based Multi-Agent Systems

NeurIPS 2025 · Poster

ReMA: Learning to Meta-Think for LLMs with Multi-agent Reinforcement Learning

NeurIPS 2025 · Poster

Reconstruction-Guided Policy: Enhancing Decision-Making through Agent-Wise State Consistency

ICLR 2025 · Poster

Score-Based Diffusion Policy Compatible with Reinforcement Learning via Optimal Transport

ICML 2025 · Poster

MobileUse: A Hierarchical Reflection-Driven GUI Agent for Autonomous Mobile Operation

NeurIPS 2025 · Poster

ContraDiff: Planning Towards High Return States via Contrastive Learning

ICLR 2025 · Poster

Large Language Models are Demonstration Pre-Selectors for Themselves

ICLR 2025 · Rejected

DyDiff: Long-Horizon Rollout via Dynamics Diffusion for Offline Reinforcement Learning

ICLR 2025 · Rejected

Large Language Models are Demonstration Pre-Selectors for Themselves

ICML 2025 · Poster

RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation

ICLR 2025 · Rejected

Computing Ex Ante Equilibrium in Heterogeneous Zero-Sum Team Games

ICLR 2025 · Withdrawn

Collaborators (20)

PhD advisor · 7 papers