PaperHub

Sergey Levine (~Sergey_Levine1)

Total papers: 78
Avg. submissions per year: 39.0
Average rating: 5.9
Acceptance: 50/78 (64.1%)

Venue distribution: ICLR 46, NeurIPS 19, ICML 11, COLM 2

Published papers (78)

Each line below: title | venue and decision | average review score (number of reviews).
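The header figures are plain aggregates of the per-paper records below. As a minimal sketch (assuming a simple tuple-per-record layout; PaperHub's actual data model and code are not shown on this page), they can be recomputed like so:

```python
# Minimal sketch, not PaperHub's actual code: recompute the summary
# figures above from the per-paper list below.
# Each record: (title, venue, year, decision, avg_rating, n_reviews).
from collections import Counter

papers = [
    ("Self-Challenging Language Model Agents", "NeurIPS", 2025, "Poster", 7.3, 4),
    ("Zero-Shot Goal Dialogue via Reinforcement Learning on Imagined Conversations",
     "ICLR", 2025, "Rejected", 3.5, 4),
    # ... the remaining 76 records, transcribed from the list below ...
]

total = len(papers)
years = {p[2] for p in papers}
per_year = total / len(years)                    # 78 / 2 years = 39.0
avg_rating = sum(p[4] for p in papers) / total   # mean of the per-paper review averages
accepted = sum(1 for p in papers if p[3] not in ("Rejected", "Withdrawn"))
venues = Counter(p[1] for p in papers)           # e.g. ICLR 46, NeurIPS 19, ...

print(f"{total} papers, {per_year:.1f}/year, avg rating {avg_rating:.1f}, "
      f"accepted {accepted}/{total} ({accepted / total:.0%})")
print("venue distribution:", dict(venues))
```

With all 78 records filled in, this should reproduce the header: 78 papers over two years (39.0/year), 50/78 accepted (about 64%), and the venue counts above.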

2025 (47 papers)

Self-Challenging Language Model Agents | NeurIPS 2025, Poster | avg 7.3 (4 reviews)
Zero-Shot Goal Dialogue via Reinforcement Learning on Imagined Conversations | ICLR 2025, Rejected | avg 3.5 (4 reviews)
Annotation Bootstrapping: Reinforcing Visual Pre-Training using Unlabelled Images | ICLR 2025, Rejected | avg 4.3 (4 reviews)
A Stable Whitening Optimizer for Efficient Neural Network Training | NeurIPS 2025, Poster | avg 6.4 (5 reviews)
Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning | ICLR 2025, Poster | avg 7.0 (4 reviews)
Scaling Test-Time Compute Without Verification or RL is Suboptimal | ICML 2025, Spotlight | avg 6.3 (3 reviews)
Behavioral Exploration: Learning to Explore via In-Context Adaptation | ICML 2025, Poster | avg 4.9 (4 reviews)
Cliqueformer: Model-Based Optimization With Structured Transformers | ICLR 2025, Rejected | avg 5.5 (4 reviews)
Reinforcement Learning with Action Chunking | NeurIPS 2025, Poster | avg 6.4 (4 reviews)
Real-Time Execution of Action Chunking Flow Policies | NeurIPS 2025, Poster | avg 6.8 (4 reviews)
Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL | NeurIPS 2025, Poster | avg 6.0 (4 reviews)
One Step Diffusion via Shortcut Models | ICLR 2025, Oral | avg 8.0 (4 reviews)
Unsupervised-to-Online Reinforcement Learning | ICLR 2025, Rejected | avg 4.3 (4 reviews)
Flow Q-Learning | ICML 2025, Poster | avg 6.1 (4 reviews)
Interactive Dialogue Agents via Reinforcement Learning with Hindsight Regenerations | ICLR 2025, Rejected | avg 4.8 (4 reviews)
ViVa: Video-Trained Value Functions for Guiding Online RL from Diverse Data | ICLR 2025, Rejected | avg 4.3 (4 reviews)
Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data | ICLR 2025, Poster | avg 6.5 (4 reviews)
Language Guided Skill Discovery | ICLR 2025, Poster | avg 7.0 (4 reviews)
Offline Goal-conditioned Reinforcement Learning with Quasimetric Representations | NeurIPS 2025, Poster | avg 6.4 (4 reviews)
Digi-Q: Learning VLM Q-Value Functions for Training Device-Control Agents | ICLR 2025, Poster | avg 4.8 (5 reviews)
OGBench: Benchmarking Offline Goal-Conditioned RL | ICLR 2025, Poster | avg 7.0 (4 reviews)
Vision-Language Models Provide Promptable Representations for Reinforcement Learning | ICLR 2025, Withdrawn | avg 4.7 (3 reviews)
Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration | ICLR 2025, Rejected | avg 5.8 (5 reviews)
Prioritized Generative Replay | ICLR 2025, Oral | avg 7.5 (4 reviews)
Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration | ICML 2025, Poster | avg 6.1 (4 reviews)
Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning | NeurIPS 2025, Poster | avg 6.8 (4 reviews)
Successor Representations Enable Emergent Compositional Instruction Following | ICLR 2025, Rejected | avg 4.7 (3 reviews)
Temporal Representation Alignment: Successor Features Enable Emergent Compositionality in Robot Instruction Following | NeurIPS 2025, Poster | avg 7.8 (4 reviews)
What Do Learning Dynamics Reveal About Generalization in LLM Mathematical Reasoning? | ICML 2025, Poster | avg 6.6 (4 reviews)
Value-Based Deep RL Scales Predictably | ICML 2025, Poster | avg 5.5 (4 reviews)
Pre-Memorization Train Accuracy Reliably Predicts Generalization in LLM Reasoning | ICLR 2025, Rejected | avg 4.3 (4 reviews)
Compute-Optimal Scaling for Value-Based Deep RL | NeurIPS 2025, Poster | avg 7.3 (4 reviews)
Adding Conditional Control to Diffusion Models with Reinforcement Learning | ICLR 2025, Poster | avg 6.5 (4 reviews)
Defining Deception in Decision Making | ICLR 2025, Rejected | avg 4.0 (4 reviews)
Horizon Reduction Makes RL Scalable | NeurIPS 2025, Spotlight | avg 8.7 (4 reviews)
Proposer-Agent-Evaluator (PAE): Autonomous Skill Discovery For Foundation Model Internet Agents | ICML 2025, Poster | avg 4.8 (3 reviews)
Proposer-Agent-Evaluator (PAE): Autonomous Skill Discovery For Foundation Model Internet Agents | ICLR 2025, Rejected | avg 5.8 (5 reviews)
Fine-Tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design | ICLR 2025, Poster | avg 6.0 (4 reviews)
Reward-Guided Iterative Refinement in Diffusion Models at Test-Time with Applications to Protein and DNA Design | ICML 2025, Poster | avg 7.0 (3 reviews)
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training | ICML 2025, Poster | avg 4.9 (4 reviews)
LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models | ICML 2025, Poster | avg 5.5 (4 reviews)
LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models | ICLR 2025, Rejected | avg 4.3 (4 reviews)
Fine-Tuning of Continuous-Time Diffusion Models as Entropy-Regularized Control | ICLR 2025, Rejected | avg 5.8 (4 reviews)
Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-based Decoding | NeurIPS 2025, Poster | avg 7.8 (4 reviews)
Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding | ICLR 2025, Rejected | avg 3.8 (5 reviews)
Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better | NeurIPS 2025, Spotlight | avg 8.8 (3 reviews)
Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models | ICML 2025, Poster | avg 6.1 (4 reviews)

2024 (31 papers)

Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations | ICLR 2024, Rejected | avg 5.3 (3 reviews)
Offline RL with Observation Histories: Analyzing and Improving Sample Complexity | ICLR 2024, Poster | avg 5.0 (3 reviews)
Learning to Assist Humans without Inferring Rewards | NeurIPS 2024, Poster | avg 5.4 (5 reviews)
Conservative World Models | ICLR 2024, Rejected | avg 4.8 (4 reviews)
Contrastive Representations Make Planning Easy | ICLR 2024, Rejected | avg 4.3 (3 reviews)
Is Value Learning Really the Main Bottleneck in Offline RL? | NeurIPS 2024, Poster | avg 7.0 (4 reviews)
METRA: Scalable Unsupervised RL with Metric-Aware Abstraction | ICLR 2024, Oral | avg 7.5 (4 reviews)
Advantage-Conditioned Diffusion: Offline RL via Generalization | ICLR 2024, Rejected | avg 4.3 (4 reviews)
Inference via Interpolation: Contrastive Representations Provably Enable Planning and Inference | NeurIPS 2024, Poster | avg 6.3 (4 reviews)
Predicting Emergent Capabilities by Finetuning | COLM 2024, Poster | avg 6.0 (4 reviews)
Deep Neural Networks Tend To Extrapolate Predictably | ICLR 2024, Poster | avg 7.0 (4 reviews)
Project and Probe: Sample-Efficient Adaptation by Interpolating Orthogonal Features | ICLR 2024, Spotlight | avg 7.0 (3 reviews)
Vision-Language Models Provide Promptable Representations for Reinforcement Learning | ICLR 2024, Rejected | avg 5.5 (4 reviews)
Confidence-Based Model Selection: When to Take Shortcuts in Spurious Settings | ICLR 2024, Rejected | avg 5.3 (3 reviews)
Autonomous Evaluation and Refinement of Digital Agents | COLM 2024, Poster | avg 6.7 (3 reviews)
RLIF: Interactive Imitation Learning as Reinforcement Learning | ICLR 2024, Poster | avg 6.5 (4 reviews)
Offline RL for Online RL: Decoupled Policy Learning for Mitigating Exploration Bias | ICLR 2024, Rejected | avg 6.0 (4 reviews)
V-Former: Offline RL with Temporally-Extended Actions | ICLR 2024, Rejected | avg 4.3 (4 reviews)
Training Diffusion Models with Reinforcement Learning | ICLR 2024, Poster | avg 6.3 (4 reviews)
DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning | NeurIPS 2024, Poster | avg 6.0 (4 reviews)
Adapt On-the-Go: Behavior Modulation for Single-Life Robot Deployment | ICLR 2024, Rejected | avg 5.7 (3 reviews)
Latent Conservative Objective Models for Offline Data-Driven Crystal Structure Prediction | ICLR 2024, Rejected | avg 3.8 (4 reviews)
Stabilizing Contrastive RL: Techniques for Robotic Goal Reaching from Offline Data | ICLR 2024, Spotlight | avg 7.3 (4 reviews)
Zero-Shot Robotic Manipulation with Pre-Trained Image-Editing Diffusion Models | ICLR 2024, Poster | avg 6.3 (4 reviews)
Bridging Model-Based Optimization and Generative Modeling via Conservative Fine-Tuning of Diffusion Models | NeurIPS 2024, Poster | avg 6.3 (4 reviews)
Designing Cell-Type-Specific Promoter Sequences Using Conservative Model-Based Optimization | NeurIPS 2024, Poster | avg 7.0 (3 reviews)
The False Promise of Imitating Proprietary Language Models | ICLR 2024, Spotlight | avg 7.0 (4 reviews)
LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models | ICLR 2024, Rejected | avg 5.5 (4 reviews)
D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning | ICLR 2024, Rejected | avg 4.8 (4 reviews)
Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning | NeurIPS 2024, Poster | avg 5.8 (4 reviews)
AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents | ICLR 2024, Rejected | avg 5.3 (4 reviews)