
No rating data available

ICLR 2025

iART - Imitation guided Automated Red Teaming

Submitted: 2024-09-25 · Updated: 2024-10-10
TL;DR

iART: a new, computationally efficient, imitation-guided reinforcement learning approach for red teaming LLMs

Abstract

Keywords
Automated Red-teaming, Large Language Models (LLMs), Reinforcement Learning, Imitation

Reviews and Discussion

Withdrawal Notice

I have read and agree with the venue's withdrawal policy on behalf of myself and my co-authors.