PaperHub

David Krueger

~David_Krueger1

28
Total papers
14.0
Submissions per year
5.7
Average rating
Accepted: 17/28
Venue distribution
ICLR
18
NeurIPS
7
ICML
2
COLM
1

Published papers (28)

2025 (18)

7.3
4

From Dormant to Deleted: Tamper-Resistant Unlearning Through Weight-Space Regularization

NeurIPS 2025 Poster
4.0
4

Enhancing Neural Network Interpretability with Feature-Aligned Sparse Autoencoders

ICLR 2025 Rejected
7.3
4

Distributional Training Data Attribution: What do Influence Functions Sample?

NeurIPS 2025 Spotlight
6.0
4

Sparse Autoencoders Reveal Universal Feature Spaces Across Large Language Models

ICLR 2025 Rejected
5.5
6

Protecting against simultaneous data poisoning attacks

ICLR 2025 Poster
4.7
3

Adversarial Robustness of In-Context Learning in Transformers for Linear Regression

ICLR 2025 Rejected
5.7
3

Input Space Mode Connectivity in Deep Neural Networks

ICLR 2025 Poster
6.8
5

Detecting High-Stakes Interactions with Activation Probes

NeurIPS 2025 Poster
8.0
4

Interpreting Emergent Planning in Model-Free Reinforcement Learning

ICLR 2025 Oral
4.3
4

Mitigating Goal Misgeneralization via Minimax Regret

ICLR 2025 Rejected
8.0
3

Influence Functions for Scalable Data Attribution in Diffusion Models

ICLR 2025 Oral
7.0
4

Towards Interpreting Visual Information Processing in Vision-Language Models

ICLR 2025 Poster
6.0
4

The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret

ICLR 2025 Rejected
4.4
4

PoisonBench: Assessing Language Model Vulnerability to Poisoned Preference Data

ICML 2025 Poster
6.1
4

The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret

ICML 2025 Poster
3.8
4

Towards Meta-Models for Automated Interpretability

ICLR 2025 Withdrawn
6.3
3

Rethinking Safety in LLM Fine-tuning: An Optimization Perspective

COLM 2025 Poster
5.0
6

PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning

ICLR 2025 Rejected