Paper Hub
Alex Beutel
~Alex_Beutel1
Total papers: 4
Average submissions per year: 2.0
Average rating: 5.6
Accepted: 2 / 4
Venues
ICLR: 3
NeurIPS: 1
Publications (4 papers)
2025 (3 papers)
4.3
4
Diverse and Effective Red Teaming with Auto-generated Rewards and Multi-step Reinforcement Learning
ICLR 2025
Rejected
7.3
4
First-Person Fairness in Chatbots
ICLR 2025
Spotlight
5.3
7
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
ICLR 2025
Rejected
2024 (1 paper)
5.7
3
Rule Based Rewards for Language Model Safety
NeurIPS 2024
Poster
Co-authors (19)
Johannes Heidecke (4 papers)
Lilian Weng (4 papers)
Kai Yuanqing Xiao (2 papers)
Alec Helyar (1 paper)
Andrea Vallone (1 paper)
Ian D Kivlichan (1 paper)
John Schulman (1 paper)
Joshua Achiam (1 paper)