Impact Index

1.72/100

Above 4.6%

Site-wide rank #61,433

Papers published: 3

Average rating: 4.1

Average output: 1.5 papers/year

Rauno Arike

MS student @ University of Amsterdam · Netherlands · OpenReview

Research interests

deep learning · mechanistic interpretability for neural networks · language model evaluations

How does information access affect LLM monitors' ability to detect sabotage?

ICLR 2026 · Rejected

Interpreting Learned Feedback Patterns in Large Language Models

NeurIPS 2024 · Poster

Interpreting Reward Models in RLHF-Tuned Language Models Using Sparse Autoencoders

ICLR 2024 · Withdrawn

Collaborators (11)

Francis Rhys Ward

Raja Mehta Moreno

Rohan Subramani