Impact Index

62.31/100

Top 4.2%

Site-wide rank: #2,700

Papers published: 8

Average rating: 6.3

Average output: 2.7 papers/year

John Hughes

Researcher @ ML Alignment & Theory Scholars · United Kingdom · OpenReview

Research Areas

Adversarial Robustness · Scalable Oversight · AI Safety · Automatic Speech Recognition · Self-Supervised Learning

Eliciting Harmful Capabilities by Fine-Tuning on Safeguarded Outputs

ICLR 2026 Poster

How Do Large Language Monkeys Get Their Power (Laws)?

Why Do Some Language Models Fake Alignment While Others Don't?

NeurIPS 2025 Spotlight

Best-of-N Jailbreaking

NeurIPS 2025 Poster

Looking Inward: Language Models Can Learn About Themselves by Introspection

ICLR 2025 Poster

Failures to Find Transferable Image Jailbreaks Between Vision-Language Models

ICLR 2025 Poster

Attacking Audio Language Models with Best-of-N Jailbreaking

ICLR 2025 Rejected

Collaborators (20)

Rylan Schaeffer