Yonatan Belinkov

Associate Professor @ Technion - Israel Institute of Technology, Technion · Israel · OpenReview

Research interests

Biological foundation models · Emergent communication · Interpretability · robustness · deep learning · representation learning · machine translation · speech recognition · syntactic parsing · question answering

DeLeaker: Dynamic Inference-Time Reweighting For Semantic Leakage Mitigation in Text-to-Image Models

ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs

Language Models Use Lookbacks to Track Beliefs

Structured RAG for Answering Aggregative Questions

Beyond Natural Language: Invented Communication in Vision-Language Models

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models

Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions

Same Task, Different Circuits: Disentangling Modality-Specific Mechanisms in VLMs

Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs

Arithmetic Without Algorithms: Language Models Solve Math with a Bag of Heuristics

Inside-Out: Hidden Factual Knowledge in LLMs

CtD: Composition through Decomposition in Emergent Communication

LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations

Jamba: Hybrid Transformer-Mamba Language Models

MIB: A Mechanistic Interpretability Benchmark

Distinguishing Ignorance from Error in LLM Hallucinations

Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods