PaperHub

Jacob Steinhardt

~Jacob_Steinhardt1

Total papers: 26
Average submissions per year: 13.0
Average rating: 6.0
Accepted: 17/26
Venue distribution
ICLR: 18
ICML: 5
NeurIPS: 3

Published papers (26)

2025 (18 papers)

5.9
6

Which Attention Heads Matter for In-Context Learning?

ICML 2025 Poster
4.6
5

Which Attention Heads Matter for In-Context Learning?

ICLR 2025 Rejected
7.2
4

Extractive Structures Learned in Pretraining Enable Generalization on Finetuned Facts

ICML 2025 Poster
6.8
4

Interpreting the Second-Order Effects of Neurons in CLIP

ICLR 2025 Poster
7.5
4

Monitoring Latent World States in Language Models with Propositional Probes

ICLR 2025 Spotlight
7.5
4

Uncovering Gaps in How Humans and LLMs Interpret Subjective Language

ICLR 2025 Spotlight
4.9
4

Adversaries Can Misuse Combinations of Safe Models

ICML 2025 Poster
4.3
4

Adversaries Can Misuse Combinations of Safe Models

ICLR 2025 Rejected
7.3
4

Iterative Label Refinement Matters More than Preference Optimization under Weak Supervision

ICLR 2025 Spotlight
5.3
4

Teaching LLMs to Decode Activations Into Natural Language

ICLR 2025 Rejected
6.6
4

What Do Learning Dynamics Reveal About Generalization in LLM Mathematical Reasoning?

ICML 2025 Poster
4.0
6

SmartBackdoor: Malicious Language Model Agents that Avoid Being Caught

ICLR 2025 Withdrawn
6.8
4

LLM Layers Immediately Correct Each Other

NeurIPS 2025 Poster
5.3
4

VibeCheck: Discover and Quantify Qualitative Differences in Large Language Models

ICLR 2025 Poster
4.3
4

Pre-Memorization Train Accuracy Reliably Predicts Generalization in LLM Reasoning

ICLR 2025 Rejected
6.3
4

Language Models Learn to Mislead Humans via RLHF

ICLR 2025 Poster
7.8
3

Eliciting Language Model Behaviors with Investigator Agents

ICML 2025 Poster
4.8
4

Evaluating Model Robustness Against Unforeseen Adversarial Attacks

ICLR 2025 Rejected