Nan Jiang
~Nan_Jiang2
15
Total papers
7.5
Submissions per year
Average rating
Accepted: 13/15
Venue distribution
NeurIPS
9
ICLR
4
ICML
2
Papers (15)
2025 (9 papers)
4
Statistical Tractability of Off-policy Evaluation of History-dependent Policies in POMDPs
ICLR 2025 Poster
4
Is Best-of-N the Best of Them? Coverage, Scaling, and Optimality in Inference-Time Alignment
ICML 2025 Poster
3
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
NeurIPS 2025 Poster
4
A Snapshot of Influence: A Local Data Attribution Framework for Online Reinforcement Learning
NeurIPS 2025 Oral
4
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
ICLR 2025 Oral
4
Improving LLM General Preference Alignment via Optimistic Online Mirror Descent
NeurIPS 2025 Spotlight
4
Model Selection for Off-policy Evaluation: New Algorithms and Experimental Protocol
NeurIPS 2025 Poster
4
Model Selection for Off-policy Evaluation: New Algorithms and Experimental Protocol
ICML 2025 Rejected
4
Thinking vs. Doing: Improving Agent Reasoning by Scaling Test-Time Interaction
NeurIPS 2025 Poster
2024 (6 papers)
3
Occupancy-based Policy Gradient: Estimation, Convergence, and Optimality
NeurIPS 2024 Poster
4
On the Curses of Future and History in Future-dependent Value Functions for Off-policy Evaluation
NeurIPS 2024 Poster
3
Reinforcement Learning Under Latent Dynamics: Toward Statistical and Algorithmic Modularity
NeurIPS 2024 Oral
3
Harnessing Density Ratios for Online Reinforcement Learning
ICLR 2024 Spotlight
4
Online Iterative Reinforcement Learning from Human Feedback with General Preference Model
NeurIPS 2024 Poster
4
LM-Switch: Transforming Word Embedding Space for Flexible Language Model Steering
ICLR 2024 Rejected