Siliang Zeng
~Siliang_Zeng1
Total papers: 4
Average submissions per year: 2.0
Average rating
Acceptance: 2/4
Venue distribution
ICLR: 3
NeurIPS: 1
Papers (4)
2025: 3 papers
From Demonstrations to Rewards: Alignment Without Explicit Human Preferences
ICLR 2025, Rejected
Joint Reward and Policy Learning with Demonstrations and Human Feedback Improves Alignment
ICLR 2025, Spotlight
Policy optimization can be memory-efficient: LLM Alignment Through Successive Policy Re-weighting (SPR)
ICLR 2025, Rejected