Impact Index

41.4/100

Top 13.5%

Site-wide rank: #8,668

Papers published: 5

Average rating: 6.2

Average annual output: 2.5 papers/year

Bilal Chughtai

MS student @ University of Cambridge · OpenReview

Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning

NeurIPS 2025 Poster

Detecting Strategic Deception with Linear Probes

ICML 2025 Poster

Transformer Circuit Evaluation Metrics Are Not Robust

COLM 2024 Poster

Summing Up the Facts: Additive Mechanisms behind Factual Recall in LLMs

ICLR 2024 Rejected

Language Models Struggle to Explain Themselves

ICLR 2024 Rejected

Collaborators (12)

Marius Hobbhahn

Nicholas Goldowsky-Dill

Stefan Heimersheim