Jing Huang
~Jing_Huang2
7
Total papers
7.0
Average submissions per year
Average rating
Acceptance: 7/7
Venue distribution
ICML
3
NeurIPS
2
COLM
1
ICLR
1
Published papers (7)
2025: 7 papers
5
Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors
ICML 2025 Poster
4
The Unlearning Mirage: A Dynamic Framework for Evaluating LLM Unlearning
COLM 2025 Poster
4
Blackbox Model Provenance via Palimpsestic Membership Inference
NeurIPS 2025 Spotlight
4
HyperDAS: Towards Automating Mechanistic Interpretability with Hypernetworks
ICLR 2025 Poster
4
LLMs Encode Harmfulness and Refusal Separately
NeurIPS 2025 Poster
4
AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders
ICML 2025 Spotlight
4
MIB: A Mechanistic Interpretability Benchmark
ICML 2025 Poster