Impact Index

59.48/100

Top 4.9%

Site-wide rank: #3,182

Papers published: 9

Average rating: 6.0

Average annual output: 4.5 papers/year

Andy Zhou

Undergrad student @ Department of Computer Science · United States · OpenReview

Research areas

large language models · AI agents · AI security

AIR-BENCH 2024: A Safety Benchmark based on Regulation and Policies Specified Risk Categories

ICLR 2025 Spotlight

MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models

ICLR 2025 Poster

AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration

NeurIPS 2025 Poster

Tamper-Resistant Safeguards for Open-Weight LLMs

ICLR 2025 Poster

GUARD: Guideline Upholding Test through Adaptive Role-play and Jailbreak Diagnostics for LLMs

ICLR 2025 Rejected

AutoRedTeamer: An Autonomous Red Teaming Agent Against Language Models

ICLR 2025 Rejected

Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks

NeurIPS 2024 Spotlight

Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters

NeurIPS 2024 Poster

Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models

ICLR 2024 Rejected

Collaborators (20)