Impact Index

46.39/100

Top 10.2%

Site-wide rank: #6,546

Papers published: 18

Average rating: 4.9

Average output: 6.0 papers/year

Yang Zhang

Full Professor @ CISPA Helmholtz Center for Information Security · Germany · OpenReview

Research Areas

Trustworthy Machine Learning · AI Safety · Machine Learning Security · Memes · Social Network Analysis · Online Hate and Misinformation

Excessive Reasoning Attack on Reasoning LLMs

ICLR 2026 · Rejected

Benchmark of Benchmarks: Unpacking Influence and Code Repository Quality in LLM Safety Benchmarks

ICLR 2026 · Rejected

Sparse Models, Sparse Safety: Unsafe Routes in Mixture-of-Experts LLMs

ICLR 2026 · Rejected

Boosting Safety Alignment in LLMs with Response Shortcuts

ICLR 2026 · Rejected

IAAgent: Autonomous Inference Attacks Against ML Services With LLM-Based Agents

ICLR 2026 · Withdrawn

SOS! Soft Prompt Attack Against Open-Source Large Language Models

ICLR 2026 · Withdrawn

Defeating Cerberus: Concept-Guided Privacy-Leakage Mitigation in Multimodal Language Models

ICLR 2026 · Withdrawn

Finding and Reactivating Post-Trained LLMs' Hidden Safety Mechanisms

NeurIPS 2025 · Poster

Adjacent Words, Divergent Intents: Jailbreaking Large Language Models via Task Concurrency

NeurIPS 2025 · Poster

The Ripple Effect: On Unforeseen Complications of Backdoor Attacks

ICML 2025 · Poster

SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation

ICLR 2025 · Poster

ACE: Attack Combo Enhancement Against Machine Learning Models

ICLR 2025 · Withdrawn

Collaborators (20)