Ethan Perez
~Ethan_Perez1
Total papers: 14
Average submissions per year: 7.0
Average rating
Acceptance: 9/14
Venue distribution:
ICLR: 11
NeurIPS: 3
Papers (14)
2025 (11 papers)
4
Language Models Learn to Mislead Humans via RLHF
ICLR 2025 (Poster)
4
Rapid Response: Mitigating LLM Jailbreaks With A Few Examples
ICLR 2025 (Rejected)
4
Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought
ICLR 2025 (Withdrawn)
5
Plan B: Training LLMs to fail less severely
ICLR 2025 (Withdrawn)
4
Looking Inward: Language Models Can Learn About Themselves by Introspection
ICLR 2025 (Poster)
3
Attacking Audio Language Models with Best-of-N Jailbreaking
ICLR 2025 (Rejected)
4
Quantifying Elicitation of Latent Capabilities in Language Models
NeurIPS 2025 (Poster)
4
Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs
ICLR 2025 (Rejected)
3
Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats
ICLR 2025 (Poster)
4
Best-of-N Jailbreaking
NeurIPS 2025 (Poster)
4
Failures to Find Transferable Image Jailbreaks Between Vision-Language Models
ICLR 2025 (Poster)