ICLR 2025
Diversity Helps Jailbreak Large Language Models
TL;DR
We present a novel jailbreaking strategy that employs an attacker LLM to generate diversified and obfuscated adversarial prompts, demonstrating significant improvement over past approaches.
Abstract
Keywords
Attack, Large Language Model, Safety
Reviews and Discussion
Author Withdrawal Notice
I have read and agree with the venue's withdrawal policy on behalf of myself and my co-authors.