Impact Index

58.64/100

Top 5.1%

Site-wide rank: #3,307

Papers published: 13

Average rating: 5.2

Average output: 4.3 papers/year

Mrinank Sharma

PhD student @ University of Oxford · OpenReview

Research Areas

Large Language Models · Deep Generative Models · COVID-19 NPI Modelling · Variational Inference · Bayesian Deep Learning

Constitutional Classifiers++: Efficient Production-Grade Defenses against Universal Jailbreaks

ICLR 2026 · Poster

Eliciting Harmful Capabilities by Fine-Tuning on Safeguarded Outputs

ICLR 2026 · Poster

Chain-of-Thought Hijacking

ICLR 2026 · Desk Rejected

Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats

ICLR 2025 · Poster

Best-of-N Jailbreaking

NeurIPS 2025 · Poster

Failures to Find Transferable Image Jailbreaks Between Vision-Language Models

ICLR 2025 · Poster

Rapid Response: Mitigating LLM Jailbreaks With A Few Examples

ICLR 2025 · Rejected

PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning

ICLR 2025 · Rejected

PoisonBench: Assessing Language Model Vulnerability to Poisoned Preference Data

ICML 2025 · Poster

Attacking Audio Language Models with Best-of-N Jailbreaking

ICLR 2025 · Rejected

Collaborators (20)

Rylan Schaeffer