Henry Sleight
~Henry_Sleight1
10
Total papers
5.0
Submissions per year
Average rating
Acceptance: 6/10
Venue distribution
ICLR
7
NeurIPS
2
COLM
1
Published papers (10)
2025 (9 papers)
4
Rapid Response: Mitigating LLM Jailbreaks With A Few Examples
ICLR 2025 Rejected
4
Quantifying Elicitation of Latent Capabilities in Language Models
NeurIPS 2025 Poster
4
Looking Inward: Language Models Can Learn About Themselves by Introspection
ICLR 2025 Poster
5
Plan B: Training LLMs to fail less severely
ICLR 2025 Withdrawn
3
Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats
ICLR 2025 Poster
4
Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs
ICLR 2025 Rejected
3
Attacking Audio Language Models with Best-of-N Jailbreaking
ICLR 2025 Rejected
4
Best-of-N Jailbreaking
NeurIPS 2025 Poster
4
Failures to Find Transferable Image Jailbreaks Between Vision-Language Models
ICLR 2025 Poster