Francis Rhys Ward
~Francis_Rhys_Ward1
6
Total papers
3.0
Submissions per year
5.4
Average rating
Accepted
3 / 6
Venue distribution
NeurIPS
3
ICLR
2
ICML
1
Papers (6)
2025
3 papers
8.2
4
CTRL-ALT-DECEIT: Sabotage Evaluations for Automated AI R&D
NeurIPS 2025
Spotlight
5.0
3
AI Sandbagging: Language Models can Strategically Underperform on Evaluations
ICLR 2025
Poster
6.6
4
The Elicitation Game: Evaluating Capability Elicitation Techniques
ICML 2025
Poster
2024
3 papers
3.7
3
Tall Tales at Different Scales: Evaluating Scaling Trends For Deception in Language Models
ICLR 2024
Rejected
3.7
3
A Causal Model of Theory-of-Mind in AI Agents
NeurIPS 2024
Rejected
5.5
4
AI Sandbagging: Language Models can Strategically Underperform on Evaluations
NeurIPS 2024
Rejected
Co-authors (20)
Felix Hofstätter
4 papers
Teun van der Weij
4 papers
Oliver Jaffe
3 papers
Samuel F. Brown
3 papers
Jack Foxabbott
1 paper
James Fox
1 paper
Rohan Subramani
1 paper
Henning Bartsch
1 paper