Julian Michael

Head of SEAL at Scale AI · United States · OpenReview

Research Interests

alignment · truthfulness · debate · interpretability · explainability · benchmarks · evaluation · linguistic analysis · semantics · semantic roles · semantic formalisms · structure prediction · parsing · CCG

7.6

Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought

ICLR 2025 · Withdrawn

3.2

Evaluating Oversight Robustness with Incentivized Reward Hacking

ICLR 2025 · Withdrawn

Corresponding author

8.3

GPQA: A Graduate-Level Google-Proof Q&A Benchmark

Collaborators (20)

Julian Michael

Why Do Some Language Models Fake Alignment While Others Don't?

AI Debate Aids Assessment of Controversial Claims

Quantifying Elicitation of Latent Capabilities in Language Models

Rapid Response: Mitigating LLM Jailbreaks With A Few Examples

Training Language Models to Win Debates with Self-Play Improves Judge Accuracy

Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought

Evaluating Oversight Robustness with Incentivized Reward Hacking

GPQA: A Graduate-Level Google-Proof Q&A Benchmark