Aviral Kumar
~Aviral_Kumar2
31
Total papers
15.5
Submissions per year
Average rating
Accepted: 24/31
Venue distribution
ICLR
16
NeurIPS
11
ICML
4
Publications (31 papers)
2025: 22 papers
4
Training Language Models to Self-Correct via Reinforcement Learning
ICLR 2025 Oral
3
Vision-Language Models Provide Promptable Representations for Reinforcement Learning
ICLR 2025 Withdrawn
4
Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners
NeurIPS 2025 Poster
4
Reasoning as an Adaptive Defense for Safety
NeurIPS 2025 Poster
4
Scaling LLM Test-Time Compute Optimally Can be More Effective than Scaling Parameters for Reasoning
ICLR 2025 Oral
3
Scaling Test-Time Compute Without Verification or RL is Suboptimal
ICML 2025 Spotlight
4
Horizon Reduction Makes RL Scalable
NeurIPS 2025 Spotlight
5
Digi-Q: Learning VLM Q-Value Functions for Training Device-Control Agents
ICLR 2025 Poster
3
Generative Verifiers: Reward Modeling as Next-Token Prediction
ICLR 2025 Poster
4
Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data
ICLR 2025 Poster
4
Grounded Reinforcement Learning for Visual Reasoning
NeurIPS 2025 Poster
3
Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models
ICLR 2025 Poster
6
Improving the Efficiency of Test-Time Search in LLMs with Backtracking
ICLR 2025 Rejected
3
Parameterization Agnostic RL
ICLR 2025 Rejected
4
What Do Learning Dynamics Reveal About Generalization in LLM Mathematical Reasoning?
ICML 2025 Poster
4
Pre-Memorization Train Accuracy Reliably Predicts Generalization in LLM Reasoning
ICLR 2025 Rejected
3
Optimizing Test-Time Compute via Meta Reinforcement Finetuning
ICML 2025 Poster
4
Value-Based Deep RL Scales Predictably
ICML 2025 Poster
4
Compute-Optimal Scaling for Value-Based Deep RL
NeurIPS 2025 Poster
7
Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning
ICLR 2025 Spotlight
4
Thinking vs. Doing: Improving Agent Reasoning by Scaling Test-Time Interaction
NeurIPS 2025 Poster
4
RRM: Robust Reward Model Training Mitigates Reward Hacking
ICLR 2025 Poster
2024: 9 papers
4
Vision-Language Models Provide Promptable Representations for Reinforcement Learning
ICLR 2024 Rejected
3
Recursive Introspection: Teaching Language Model Agents How to Self-Improve
NeurIPS 2024 Poster
4
Is Value Learning Really the Main Bottleneck in Offline RL?
NeurIPS 2024 Poster
4
D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning
ICLR 2024 Rejected
3
Designing Cell-Type-Specific Promoter Sequences Using Conservative Model-Based Optimization
NeurIPS 2024 Poster
4
Latent Conservative Objective Models for Offline Data-Driven Crystal Structure Prediction
ICLR 2024 Rejected
4
RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold
NeurIPS 2024 Poster
4
Zero-Shot Robotic Manipulation with Pre-Trained Image-Editing Diffusion Models
ICLR 2024 Poster
4
DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning
NeurIPS 2024 Poster