Andy Zhou
~Andy_Zhou2
9
Total papers
4.5
Submissions per year
Average rating
Acceptance: 6/9
Venue distribution
ICLR
6
NeurIPS
3
Papers (9)
2025 (6 papers)
4
AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration
NeurIPS 2025 Poster
4
AutoRedTeamer: An Autonomous Red Teaming Agent Against Language Models
ICLR 2025 Rejected
4
AIR-BENCH 2024: A Safety Benchmark based on Regulation and Policies Specified Risk Categories
ICLR 2025 Spotlight
4
GUARD: Guideline Upholding Test through Adaptive Role-play and Jailbreak Diagnostics for LLMs
ICLR 2025 Rejected
6
Tamper-Resistant Safeguards for Open-Weight LLMs
ICLR 2025 Poster
4
MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models
ICLR 2025 Poster
2024 (3 papers)
4
Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks
NeurIPS 2024 Spotlight
4
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
ICLR 2024 Rejected
3
Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters
NeurIPS 2024 Poster