Impact Index

77.09/100

Top 1.7%

Site-wide rank #1,075

Papers published: 29

Average score: 5.2

Average output: 9.7 papers/year

Haitao Mi

Principal Researcher @ Tencent AI Lab · United States · OpenReview

Research areas

Agent · Reinforcement learning · Large Language Models · Natural Language Processing · Dialogue System · Machine Translation

6.0

R-Zero: Self-Evolving Reasoning LLM from Zero Data

ICLR 2026 Poster

5.5

The Pensieve Paradigm: Stateful Language Models Mastering Their Own Context

ICLR 2026 Poster

5.5

DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning

ICLR 2026 Poster

5.0

The End of Manual Decoding: Towards Truly End-to-End Language Models

ICLR 2026 Poster

5.0

Vision-SR1: Self-Rewarding Vision-Language Model via Reasoning Decomposition and Multi-Reward Policy Optimization

ICLR 2026 Poster

4.5

DeepCompress: A Dual Reward Strategy for Dynamically Exploring and Compressing Reasoning Chains

ICLR 2026 Poster

4.5

CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models

ICLR 2026 Poster

4.5

InComeS: Integrating Compression and Selection Mechanisms into LLMs for Efficient Model Editing

ICLR 2026 Withdrawn

4.5

On the Evolution of Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation

ICLR 2026 Rejected

4.0

WebAggregator: Scaling Complex Logical Information Aggregation for Web Agents Foundation Models

ICLR 2026 Withdrawn

3.5

DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning

ICLR 2026 Rejected

3.0

One Token to Fool LLM-as-a-Judge

ICLR 2026 Rejected

3.0

CLUE: Non-parametric Verification from Experience via Hidden-State Clustering

ICLR 2026 Withdrawn

3.0

VOGUE: Guiding Exploration with Visual Uncertainty Improves Multimodal Reasoning

ICLR 2026 Withdrawn

7.3

Thoughts Are All Over the Place: On the Underthinking of Long Reasoning Models

NeurIPS 2025 Spotlight

6.8

Improving LLM General Preference Alignment via Optimistic Online Mirror Descent

NeurIPS 2025 Spotlight

6.8

Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards

NeurIPS 2025 Poster

6.8

Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training

NeurIPS 2025 Poster

6.4

MPS-Prover: Advancing Stepwise Theorem Proving by Multi-Perspective Search and Data Curation

NeurIPS 2025 Poster

6.4

UniGist: Towards General and Hardware-aligned Sequence-level Long Context Compression

NeurIPS 2025 Poster

6.3

DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search

ICLR 2025 Poster

Third author

6.0

Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning

ICLR 2025 Oral

6.0

The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models

NeurIPS 2025 Poster

4.9

Do NOT Think That Much for 2+3=? On the Overthinking of Long Reasoning Models

ICML 2025 Poster

4.8

HDFlow: Enhancing LLM Complex Problem-Solving with Hybrid Thinking and Dynamic Workflows

ICLR 2025 Rejected

Second author

4.4

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

ICML 2025 Rejected

4.3

SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models

Collaborators (20)