PaperHub

Aviral Kumar

~Aviral_Kumar2

Total papers: 31
Average submissions per year: 15.5
Average rating: 6.2
Accepted: 24/31
Venue distribution
ICLR: 16
NeurIPS: 11
ICML: 4

Published papers (31)

2025 (22)

8.0
4

Training Language Models to Self-Correct via Reinforcement Learning

ICLR 2025 Oral
4.7
3

Vision-Language Models Provide Promptable Representations for Reinforcement Learning

ICLR 2025 Withdrawn
7.3
4

Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners

NeurIPS 2025 Poster
6.4
4

Reasoning as an Adaptive Defense for Safety

NeurIPS 2025 Poster
7.5
4

Scaling LLM Test-Time Compute Optimally Can be More Effective than Scaling Parameters for Reasoning

ICLR 2025 Oral
6.3
3

Scaling Test-Time Compute Without Verification or RL is Suboptimal

ICML 2025 Spotlight
8.7
4

Horizon Reduction Makes RL Scalable

NeurIPS 2025 Spotlight
4.8
5

Digi-Q: Learning VLM Q-Value Functions for Training Device-Control Agents

ICLR 2025 Poster
5.3
3

Generative Verifiers: Reward Modeling as Next-Token Prediction

ICLR 2025 Poster
6.5
4

Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data

ICLR 2025 Poster
7.8
4

Grounded Reinforcement Learning for Visual Reasoning

NeurIPS 2025 Poster
5.7
3

Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models

ICLR 2025 Poster
4.7
6

Improving the Efficiency of Test-Time Search in LLMs with Backtracking

ICLR 2025 Rejected
4.7
3

Parameterization Agnostic RL

ICLR 2025 Rejected
6.6
4

What Do Learning Dynamics Reveal About Generalization in LLM Mathematical Reasoning?

ICML 2025 Poster
4.3
4

Pre-Memorization Train Accuracy Reliably Predicts Generalization in LLM Reasoning

ICLR 2025 Rejected
7.0
3

Optimizing Test-Time Compute via Meta Reinforcement Finetuning

ICML 2025 Poster
5.5
4

Value-Based Deep RL Scales Predictably

ICML 2025 Poster
7.3
4

Compute-Optimal Scaling for Value-Based Deep RL

NeurIPS 2025 Poster
7.1
7

Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning

ICLR 2025 Spotlight
6.4
4

Thinking vs. Doing: Improving Agent Reasoning by Scaling Test-Time Interaction

NeurIPS 2025 Poster
6.5
4

RRM: Robust Reward Model Training Mitigates Reward Hacking

ICLR 2025 Poster

2024 (9)