Impact Index

57.41/100

Top 5.5%

Site-wide rank #3,561

Papers published: 8

Average rating: 5.6

Average output: 2.7 papers/year

Cassidy Laidlaw

Researcher @ Transluce · United States · OpenReview

Research areas

reinforcement learning theory · RL theory · preference learning · reward learning · inverse reinforcement learning · AI alignment · AI safety · multiagent systems · multiagent RL · human modeling · robustness · adversarial examples · adversarial attacks

Benchmarking Anomaly Detection for Large Language Model Alignment

ICLR 2026 · Rejected

Iterative Label Refinement Matters More than Preference Optimization under Weak Supervision

ICLR 2025 · Spotlight

Correlated Proxies: A New Definition and Improved Mitigation for Reward Hacking

ICLR 2025 · Spotlight

AssistanceZero: Scalably Solving Assistance Games

ICML 2025 · Poster

Reliability-Aware Preference Learning for LLM Reward Models

ICLR 2025 · Withdrawn

Collaborators (14)

PhD advisor · 6 papers

PhD advisor · 2 papers

Anand Siththaranjan

Dylan Hadfield-Menell