Yuheng Zhang
~Yuheng_Zhang1
Total papers: 6
Average submissions per year: 3.0
Average rating
Accepted: 6/6
Venue distribution
NeurIPS: 4
ICLR: 2
Published papers (6)
2025: 3 papers
4
Improving LLM General Preference Alignment via Optimistic Online Mirror Descent
NeurIPS 2025 Spotlight
4
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
ICLR 2025 Oral
4
Statistical Tractability of Off-policy Evaluation of History-dependent Policies in POMDPs
ICLR 2025 Poster
2024: 3 papers
4
On the Curses of Future and History in Future-dependent Value Functions for Off-policy Evaluation
NeurIPS 2024 Poster
4
Provably Efficient Interactive-Grounded Learning with Personalized Reward
NeurIPS 2024 Poster
4
Online Iterative Reinforcement Learning from Human Feedback with General Preference Model
NeurIPS 2024 Poster