Julian Michael
~Julian_Michael1
8
Total papers
4.0
Submissions per year
Average rating
Accepted: 4/8
Venue distribution
ICLR
4
NeurIPS
3
COLM
1
Papers (8)
2025 (7 papers)
4
Rapid Response: Mitigating LLM Jailbreaks With A Few Examples
ICLR 2025 Rejected
4
Training Language Models to Win Debates with Self-Play Improves Judge Accuracy
ICLR 2025 Rejected
3
Why Do Some Language Models Fake Alignment While Others Don't?
NeurIPS 2025 Spotlight
4
Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought
ICLR 2025 Withdrawn
6
Evaluating Oversight Robustness with Incentivized Reward Hacking
ICLR 2025 Withdrawn
4
Quantifying Elicitation of Latent Capabilities in Language Models
NeurIPS 2025 Poster
4
AI Debate Aids Assessment of Controversial Claims
NeurIPS 2025 Poster