Rishabh Agarwal (~Rishabh_Agarwal2)
Total papers: 19
Avg. submissions per year: 9.5
Average rating: n/a
Accepted: 15/19
Conference distribution: ICLR 13, COLM 3, NeurIPS 2, ICML 1
Publications (19)
2025 (13 papers)
| Title | Venue | Decision | Reviews |
|---|---|---|---|
| Evolving Alignment via Asymmetric Self-Play | ICLR 2025 | Rejected | 3 |
| Reward-Guided Prompt Evolving in Reinforcement Learning for LLMs | ICML 2025 | Poster | 4 |
| Training Language Models to Self-Correct via Reinforcement Learning | ICLR 2025 | Oral | 4 |
| Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling | ICLR 2025 | Poster | 4 |
| Don’t Throw Away Data: Better Sequence Knowledge Distillation | ICLR 2025 | Rejected | 4 |
| Not All LLM Reasoners Are Created Equal | ICLR 2025 | Rejected | 4 |
| Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers | COLM 2025 | Poster | 4 |
| Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models | ICLR 2025 | Poster | 4 |
| Generative Verifiers: Reward Modeling as Next-Token Prediction | ICLR 2025 | Poster | 3 |
| Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning | ICLR 2025 | Spotlight | 7 |
| Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models | ICLR 2025 | Poster | 3 |
| Towards Compute-Optimal Many-Shot In-Context Learning | COLM 2025 | Poster | 4 |
| Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling | ICLR 2025 | Poster | 3 |
2024 (6 papers)
| Title | Venue | Decision | Reviews |
|---|---|---|---|
| On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes | ICLR 2024 | Poster | 4 |
| Many-Shot In-Context Learning | NeurIPS 2024 | Spotlight | 4 |
| SiT: Symmetry-invariant Transformers for Generalisation in Reinforcement Learning | ICLR 2024 | Rejected | 3 |
| V-STaR: Training Verifiers for Self-Taught Reasoners | COLM 2024 | Poster | 4 |
| On scalable oversight with weak LLMs judging strong LLMs | NeurIPS 2024 | Poster | 4 |
| DistillSpec: Improving Speculative Decoding via Knowledge Distillation | ICLR 2024 | Poster | 3 |