Impact Index

62.36/100

Top 4.2%

Site-wide rank: #2,688

Papers published: 10

Average rating: 5.6

Average annual output: 3.3 papers/year

Owain Evans

Principal Researcher @ Truthful AI · United States · OpenReview

Research areas

AI safety · large language models · cognitive science · language models · creative AI

Persona Vectors: Monitoring and Controlling Character Traits in Language Models

ICLR 2026 · Rejected

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

Tell me about yourself: LLMs are aware of their learned behaviors

ICLR 2025 · Spotlight

Looking Inward: Language Models Can Learn About Themselves by Introspection

ICLR 2025 · Poster

The Two-Hop Curse: LLMs trained on A→B, B→C fail to learn A→C

ICLR 2025 · Rejected

Collaborators (20)

Anna Sztyber-Betley