Yonatan Belinkov
~Yonatan_Belinkov1
Total papers: 18
Average submissions per year: 9.0
Average rating
Acceptance: 15/18
Venue distribution
ICLR: 11
COLM: 3
NeurIPS: 3
ICML: 1
Published papers (18)
2025 (12 papers)
4
Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs
COLM 2025, Poster
4
Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods
ICLR 2025, Rejected
4
Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions
ICLR 2025, Spotlight
3
CtD: Composition through Decomposition in Emergent Communication
ICLR 2025, Poster
5
Distinguishing Ignorance from Error in LLM Hallucinations
ICLR 2025, Withdrawn
5
Arithmetic Without Algorithms: Language Models Solve Math with a Bag of Heuristics
ICLR 2025, Poster
4
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
ICLR 2025, Oral
4
Same Task, Different Circuits: Disentangling Modality-Specific Mechanisms in VLMs
NeurIPS 2025, Poster
4
Inside-Out: Hidden Factual Knowledge in LLMs
COLM 2025, Poster
4
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
ICLR 2025, Poster
4
MIB: A Mechanistic Interpretability Benchmark
ICML 2025, Poster
4
Jamba: Hybrid Transformer-Mamba Language Models
ICLR 2025, Poster
2024 (6 papers)
4
ReFACT: Updating Text-to-Image Models by Editing the Text Encoder
ICLR 2024, Rejected
3
Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms
COLM 2024, Poster
4
Confidence Regulation Neurons in Language Models
NeurIPS 2024, Poster
3
Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking
ICLR 2024, Poster
4
Semantics and Spatiality of Emergent Communication
NeurIPS 2024, Poster
3
Linearity of Relation Decoding in Transformer Language Models
ICLR 2024, Spotlight