Xiangyu Qi
~Xiangyu_Qi2
8
Total papers
4.0
Average submissions per year
Average rating
Accepted: 6/8
Venue distribution
ICLR
7
NeurIPS
1
Published papers (8)
2025: 5 papers
4
Safety Alignment Should be Made More Than Just a Few Tokens Deep
ICLR 2025 Oral
4
On Evaluating the Durability of Safeguards for Open-Weight LLMs
ICLR 2025 Poster
4
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal
ICLR 2025 Poster
4
Defensive Prompt Patch: A Robust and Generalizable Defense of Large Language Models against Jailbreak Attacks
ICLR 2025 Withdrawn
4
Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs
ICLR 2025 Withdrawn
2024: 3 papers
4
Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!
ICLR 2024 Oral
4
BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection
ICLR 2024 Poster
4
BackdoorAlign: Mitigating Fine-tuning based Jailbreak Attack with Backdoor Enhanced Safety Alignment
NeurIPS 2024 Poster