Tomasz Korbak

Researcher @ UK AI Security Institute · United Kingdom · OpenReview

Research interests

language models · reinforcement learning from human feedback · AI safety · generative models · variational inference · probabilistic programming · emergent communication · compositionality

3.5

The Two-Hop Curse: LLMs trained on A→B, B→C fail to learn A→C

ICLR 2025 · Rejected

Second author

6.5

Towards Understanding Sycophancy in Language Models

ICLR 2024 · Poster

Third author

6.5

Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data

Collaborators (20)

The Reversal Curse: LLMs trained on “A is B” fail to learn “B is A”

Many-shot Jailbreaking

Compositional Preference Models for Aligning LMs