Impact Index

97.75/100

Top 0.1%

Site-wide rank: #66

Papers published: 41

Average rating: 5.9

Average output: 13.7 papers/year

Simon Shaolei Du

Assistant Professor @ University of Washington · United States · OpenReview

Research Areas

representation learning theory · reinforcement learning theory · non-convex optimization · deep learning theory

Improving Human-AI Coordination through Online Adversarial Training and Generative Models

ICLR 2026 · Poster

Convergence Dynamics of Over-Parameterized Score Matching for a Single Gaussian

ICLR 2026 · Poster

Global Convergence of Four-Layer Matrix Factorization under Random Initialization

ICLR 2026 · Rejected

Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models

ICLR 2026 · Rejected

Unregularized Linear Convergence in Zero-Sum Game from Preference Feedback

ICLR 2026 · Rejected

PrefDisco: Benchmarking Proactive Personalized Reasoning

ICLR 2026 · Poster

Policy-Based Trajectory Clustering in Offline Reinforcement Learning

ICLR 2026 · Rejected

Understanding the Performance Gap in Preference Learning: A Dichotomy of RLHF and DPO

ICLR 2026 · Rejected

Spurious Rewards: Rethinking Training Signals in RLVR

ICLR 2026 · Desk Rejected

Cross-environment Cooperation Enables Zero-shot Multi-agent Coordination

Sharp Gap-Dependent Variance-Aware Regret Bounds for Tabular MDPs

NeurIPS 2025 · Poster

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

NeurIPS 2025 · Poster

A Minimalist Example of Edge-of-Stability and Progressive Sharpening

NeurIPS 2025 · Poster

Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback

COLM 2025 · Poster

Deployment Efficient Reward-Free Exploration with Linear Function Approximation

NeurIPS 2025 · Poster

Transformers are Efficient Compilers, Provably

COLM 2025 · Poster

LoRe: Personalizing LLMs via Low-Rank Reward Modeling

COLM 2025 · Poster

Understanding the Gain from Data Filtering in Multimodal Contrastive Learning

NeurIPS 2025 · Poster

Highlighting What Matters: Promptable Embeddings for Attribute-Focused Image Retrieval

NeurIPS 2025 · Poster

Minimax Optimal Regret Bound for Reinforcement Learning with Trajectory Feedback

ICML 2025 · Poster

The Crucial Role of Samplers in Online Direct Preference Optimization

ICLR 2025 · Poster

Minimax Optimal Regret Bound for Reinforcement Learning with Trajectory Feedback

ICLR 2025 · Rejected

Multi-Agent Reinforcement Learning from Human Feedback: Data Coverage and Algorithmic Techniques

ICLR 2025 · Rejected

SHARP: Accelerating Language Model Inference by SHaring Adjacent layers with Recovery Parameters

ICLR 2025 · Rejected

Deployment Efficient Reward-Free Exploration with Linear Function Approximation

ICLR 2025 · Rejected

On Erroneous Agreements of CLIP Image Embeddings

ICLR 2025 · Rejected

Transformers are Efficient Compilers, Provably

ICLR 2025 · Rejected

Collaborators (20)