Wei Xiong
~Wei_Xiong9
Total papers: 7
Average submissions per year: 3.5
Average rating
Acceptance: 7/7
Venue distribution
ICML: 3
ICLR: 2
NeurIPS: 2
Published papers (7)
2025 (6 papers)
4
Building Math Agents with Multi-Turn Iterative Preference Learning
ICLR 2025 Poster
4
RRM: Robust Reward Model Training Mitigates Reward Hacking
ICLR 2025 Poster
5
Logarithmic Regret for Online KL-Regularized Reinforcement Learning
ICML 2025 Poster
4
DPO Meets PPO: Reinforced Token Optimization for RLHF
ICML 2025 Spotlight
4
LLM Alignment as Retriever Optimization: An Information Retrieval Perspective
ICML 2025 Poster
3
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
NeurIPS 2025 Poster