Impact Index

91.33/100

Top 0.5%

Site-wide rank #310

Papers published: 31

Average rating: 5.6

Average output: 10.3 papers/year

Adel Bibi

Senior Researcher @ University of Oxford · United Kingdom · OpenReview

Research Areas

Machine Learning · Optimization · Computer Vision

Beyond Linear Probes: Dynamic Safety Monitoring for Language Models

ICLR 2026 Poster

BiasBusters: Uncovering and Mitigating Tool Selection Bias in Large Language Models

ICLR 2026 Poster

It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents

ICLR 2026 Rejected

FORCE: Transferable Visual Jailbreaking Attacks via Feature Over-Reliance CorrEction

ICLR 2026 Withdrawn

OMNI-LEAK: Orchestrator Multi-Agent Network Induced Data Leakage

ICLR 2026 Desk Rejected

To Distill or Not to Distill: Knowledge Transfer Undermines Safety of LLMs

ICLR 2026 Withdrawn

ToolTweak: An Attack on Tool Selection in LLM-based Agents

ICLR 2026 Rejected

Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models

ICLR 2025 Spotlight

On the Coexistence and Ensembling of Watermarks

NeurIPS 2025 Poster

Towards Certification of Uncertainty Calibration under Adversarial Attacks

ICLR 2025 Poster

Shh, don't say that! Domain Certification in LLMs

ICLR 2025 Poster

Mixture of Experts Made Intrinsically Interpretable

ICML 2025 Poster

On the Coexistence and Ensembling of Watermarks

ICLR 2025 Rejected

MIP against Agent: Malicious Image Patches Hijacking Multimodal OS Agents

NeurIPS 2025 Poster

Rethinking Safety in LLM Fine-tuning: An Optimization Perspective

COLM 2025 Poster

Do as I do (Safely): Mitigating Task-Specific Fine-tuning Risks in Large Language Models

ICLR 2025 Poster

SynthCLIP: Are We Ready for a Fully Synthetic CLIP Training?

ICLR 2025 Rejected

Questioning Simplicity Bias Assumptions

ICLR 2025 Withdrawn

Language Models' Internal Conflicts: Layer-wise Usable Information For Detecting Model (Un)answerability

ICLR 2025 Withdrawn

Collaborators (20)

Postdoctoral advisor: 30 papers

PhD advisor: 5 papers

Aleksandar Petrov

Mohamed Elhoseiny