Impact Index

69.66/100

Top 2.7%

Site-wide rank: #1,745

Papers published: 28

Average rating: 5.0

Average annual output: 9.3 papers/year

Michael Backes

Full Professor @ CISPA Helmholtz Center for Information Security · Germany · OpenReview

Research Areas

Trustworthy AI · Security and Privacy

Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models

Excessive Reasoning Attack on Reasoning LLMs

ICLR 2026 · Rejected

Benchmark of Benchmarks: Unpacking Influence and Code Repository Quality in LLM Safety Benchmarks

ICLR 2026 · Rejected

TrustGen: A Platform of Dynamic Benchmarking on the Trustworthiness of Generative Foundation Models

ICLR 2026 · Poster

SafeReview: Building a Robust Deep Review Assistant Against Prompt Injection

ICLR 2026 · Rejected

Sparse Models, Sparse Safety: Unsafe Routes in Mixture-of-Experts LLMs

ICLR 2026 · Rejected

PRIVDISTIL: A Unified Framework for Accurate and Differentially Private Model Compression

ICLR 2026 · Rejected

Boosting Safety Alignment in LLMs with Response Shortcuts

ICLR 2026 · Rejected

IAAgent: Autonomous Inference Attacks Against ML Services With LLM-Based Agents

ICLR 2026 · Withdrawn

SOS! Soft Prompt Attack Against Open-Source Large Language Models

ICLR 2026 · Withdrawn

Finding and Reactivating Post-Trained LLMs' Hidden Safety Mechanisms

NeurIPS 2025 · Poster

Adjacent Words, Divergent Intents: Jailbreaking Large Language Models via Task Concurrency

NeurIPS 2025 · Poster

SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation

ICLR 2025 · Poster

Provably Cost-Sensitive Adversarial Defense via Randomized Smoothing

ICML 2025 · Poster

Efficient and Privacy-Preserving Soft Prompt Transfer for LLMs

ICML 2025 · Poster

Captured by Captions: On Memorization and its Mitigation in CLIP Models

ICLR 2025 · Poster

POST: A Framework for Privacy of Soft-prompt Transfer

ICLR 2025 · Rejected

ACE: Attack Combo Enhancement Against Machine Learning Models

ICLR 2025 · Withdrawn

Collaborators (20)

Collaborator · 14 papers

Franziska Boenisch