Impact Index

60.46/100

Top 4.7%

Site-wide rank #3,045

Papers published: 17

Average rating: 4.9

Average annual output: 5.7 papers/year

Dian Yu

NLP researcher @ Tencent AI Lab · United States · OpenReview

Research areas

information extraction · machine reading comprehension · large language models

5.5

DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning

ICLR 2026 Poster

5.0

Vision-SR1: Self-Rewarding Vision-Language Model via Reasoning Decomposition and Multi-Reward Policy Optimization

ICLR 2026 Poster

4.5

CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models

ICLR 2026 Poster

4.5

On the Evolution of Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation

ICLR 2026 Rejected

3.0

VOGUE: Guiding Exploration with Visual Uncertainty Improves Multimodal Reasoning

ICLR 2026 Withdrawn

Second author

3.0

DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search

ICLR 2025 Poster

6.0

Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning

ICLR 2025 Oral

Second author

4.9

Do NOT Think That Much for 2+3=? On the Overthinking of Long Reasoning Models

ICML 2025 Poster

4.4

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

ICML 2025 Rejected

4.3

SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models

Collaborators (20)

Dian Yu

DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning

Vision-SR1: Self-Rewarding Vision-Language Model via Reasoning Decomposition and Multi-Reward Policy Optimization

CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models

On the Evolution of Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation

VOGUE: Guiding Exploration with Visual Uncertainty Improves Multimodal Reasoning

One Token to Fool LLM-as-a-Judge

CLUE: Non-parametric Verification from Experience via Hidden-State Clustering

Thoughts Are All Over the Place: On the Underthinking of Long Reasoning Models

Improving LLM General Preference Alignment via Optimistic Online Mirror Descent

DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search

Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning

Do NOT Think That Much for 2+3=? On the Overthinking of Long Reasoning Models

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models