Adrià Garriga-Alonso

Researcher @ FAR · United States · OpenReview

Research interests

mechanistic interpretability · reinforcement learning · sokoban · planning · interpretability · hypothesis testing · estimators · deep learning theory · gaussian processes · bayesian neural networks · markov chain monte carlo · bayesian

6.0

Path Channels and Plan Extension Kernels: a Mechanistic Description of Planning in a Sokoban RNN

ICLR 2026 Poster

Corresponding author

5.0

Sparse but Wrong: Incorrect L0 Leads to Incorrect Features in Sparse Autoencoders

ICLR 2026 Rejected

Second author

4.5

Post-Hoc Reasoning in Chain-of-Thought: Evidence from Pre-CoT Probes and Activation Steering

ICLR 2026 Rejected

Third author

3.5

Interpreting learned search: finding a transition model and value function in an RNN that plays Sokoban

NeurIPS 2025 Rejected

Corresponding author

3.5

Planning in a recurrent neural network that plays Sokoban

Collaborators (20)