
No rating data available

ICLR 2025

Diversity Helps Jailbreak Large Language Models

OpenReview · PDF

Submitted: 2024-09-22 · Updated: 2024-10-10
TL;DR

We present a novel jailbreaking strategy that employs an attacker LLM to generate diversified and obfuscated adversarial prompts, demonstrating significant improvement over past approaches.

Abstract

Keywords
Attack, Large Language Model, Safety

Reviews and Discussion

Withdrawal Notice

I have read and agree with the venue's withdrawal policy on behalf of myself and my co-authors.