Rishabh Agarwal (~Rishabh_Agarwal2)
Total papers: 19
Avg. submissions per year: 9.5
Average rating: n/a
Accepted: 15/19
Conference distribution: ICLR 13, COLM 3, NeurIPS 2, ICML 1
Publications (19)
2025 (13 papers)
| Title | Venue | Decision | Reviews |
|---|---|---|---|
| Evolving Alignment via Asymmetric Self-Play | ICLR 2025 | Rejected | 3 |
| Reward-Guided Prompt Evolving in Reinforcement Learning for LLMs | ICML 2025 | Poster | 4 |
| Training Language Models to Self-Correct via Reinforcement Learning | ICLR 2025 | Oral | 4 |
| Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling | ICLR 2025 | Poster | 4 |
| Don’t Throw Away Data: Better Sequence Knowledge Distillation | ICLR 2025 | Rejected | 4 |
| Not All LLM Reasoners Are Created Equal | ICLR 2025 | Rejected | 4 |
| Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers | COLM 2025 | Poster | 4 |
| Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models | ICLR 2025 | Poster | 4 |
| Generative Verifiers: Reward Modeling as Next-Token Prediction | ICLR 2025 | Poster | 3 |
| Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning | ICLR 2025 | Spotlight | 7 |
| Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models | ICLR 2025 | Poster | 3 |
| Towards Compute-Optimal Many-Shot In-Context Learning | COLM 2025 | Poster | 4 |
| Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling | ICLR 2025 | Poster | 3 |
2024 (6 papers)
| Title | Venue | Decision | Reviews |
|---|---|---|---|
| On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes | ICLR 2024 | Poster | 4 |
| Many-Shot In-Context Learning | NeurIPS 2024 | Spotlight | 4 |
| SiT: Symmetry-invariant Transformers for Generalisation in Reinforcement Learning | ICLR 2024 | Rejected | 3 |
| V-STaR: Training Verifiers for Self-Taught Reasoners | COLM 2024 | Poster | 4 |
| On scalable oversight with weak LLMs judging strong LLMs | NeurIPS 2024 | Poster | 4 |
| DistillSpec: Improving Speculative Decoding via Knowledge Distillation | ICLR 2024 | Poster | 3 |