ICLR 2025
iART - Imitation guided Automated Red Teaming
TL;DR
iART: a new, computationally efficient, imitation-guided reinforcement learning approach for red teaming LLMs
Abstract
Keywords
Automated Red-teaming, Large Language Models (LLMs), Reinforcement Learning, Imitation
Reviews and Discussion
Author Withdrawal Notice
I have read and agree with the venue's withdrawal policy on behalf of myself and my co-authors.