Impact Index

87.6/100

Top 0.8%

Site-wide rank #483

Papers published: 32

Average rating: 5.3

Average annual output: 10.7 papers/year

Zhaopeng Tu

Tech Lead @ Tencent · China · OpenReview

Research Areas

Deep Learning · Natural Language Processing · Machine Translation

5.5

DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning

ICLR 2026 Poster

5.3

RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents

ICLR 2026 Poster

5.0

RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards for Robust Long-Horizon Agents

ICLR 2026 Poster

5.0

Castle-in-the-Air: Evaluating MLLM Visual Abilities on Human Cognitive Benchmarks

ICLR 2026 Rejected

4.5

CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models

ICLR 2026 Poster

3.5

DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning

ICLR 2026 Rejected

3.5

The Social Welfare Function Leaderboard: When LLM Agents Allocate Social Welfare

ICLR 2026 Withdrawn

3.3

BatonVoice: An Operationalist Framework for Enhancing Controllable Speech Synthesis with Linguistic Intelligence from LLMs

ICLR 2026 Withdrawn

2.5

The Hunger Game Debate: On the Emergence of Over-Competition in Multi-Agent Systems

ICLR 2026 Withdrawn

8.2

Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models

ICLR 2025 Poster

6.8

Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards

NeurIPS 2025 Poster

6.8

Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training

NeurIPS 2025 Poster

6.4

The Lighthouse of Language: Enhancing LLM Agents via Critique-Guided Improvement

NeurIPS 2025 Poster

6.3

Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM’s Reasoning Capability

ICML 2025 Poster

Corresponding author

6.0

The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models

NeurIPS 2025 Poster

5.8

Competing Large Language Models in Multi-Agent Gaming Environments

ICLR 2025 Poster

5.3

Chain-of-Jailbreak Attack for Image Generation Models via Editing Step by Step

ICLR 2025 Rejected

Corresponding author

4.9

Do NOT Think That Much for 2+3=? On the Overthinking of Long Reasoning Models

ICML 2025 Poster

4.4

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

ICML 2025 Rejected

4.2

Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training

ICLR 2025 Withdrawn

Corresponding author

3.7

Insight Over Sight? Exploring the Vision-Knowledge Conflicts in Multimodal LLMs

Collaborators (20)

Zhaopeng Tu

DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning

RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents

RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards for Robust Long-Horizon Agents

Castle-in-the-Air: Evaluating MLLM Visual Abilities on Human Cognitive Benchmarks

CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models

DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning

The Social Welfare Function Leaderboard: When LLM Agents Allocate Social Welfare

BatonVoice: An Operationalist Framework for Enhancing Controllable Speech Synthesis with Linguistic Intelligence from LLMs

The Hunger Game Debate: On the Emergence of Over-Competition in Multi-Agent Systems

SCAN: Self-Denoising Monte Carlo Annotation for Robust Process Reward Learning

Thoughts Are All Over the Place: On the Underthinking of Long Reasoning Models

SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning

RaSA: Rank-Sharing Low-Rank Adaptation

Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models

Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards

Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training

The Lighthouse of Language: Enhancing LLM Agents via Critique-Guided Improvement

Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM’s Reasoning Capability

The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models

Competing Large Language Models in Multi-Agent Gaming Environments

Chain-of-Jailbreak Attack for Image Generation Models via Editing Step by Step

Do NOT Think That Much for 2+3=? On the Overthinking of Long Reasoning Models

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training

Insight Over Sight? Exploring the Vision-Knowledge Conflicts in Multimodal LLMs