Nicholas Carlini
~Nicholas_Carlini1
Total papers: 18
Submissions per year: 9.0
Average rating
Acceptance: 12/18
Venue distribution
ICLR: 11
NeurIPS: 3
ICML: 2
COLM: 2
Publications (18 papers)
2025 (12 papers)
6
AutoAdvExBench: Benchmarking Autonomous Exploitation of Adversarial Example Defenses
ICLR 2025, Rejected
3
AutoAdvExBench: Benchmarking Autonomous Exploitation of Adversarial Example Defenses
ICML 2025, Oral
5
Evaluating Privacy Risks of Parameter-Efficient Fine-Tuning
ICLR 2025, Rejected
4
Certified Robustness to Clean-label Poisoning Using Diffusion Denoising
ICLR 2025, Withdrawn
4
Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI
ICLR 2025, Spotlight
4
On Evaluating the Durability of Safeguards for Open-Weight LLMs
ICLR 2025, Poster
6
Scalable Extraction of Training Data from Aligned, Production Language Models
ICLR 2025, Poster
4
IF-Guide: Influence Function-Guided Detoxification of LLMs
NeurIPS 2025, Poster
3
Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards
ICML 2025, Oral
8
Measuring Non-Adversarial Reproduction of Training Data in Large Language Models
ICLR 2025, Poster
3
Stealing User Prompts from Mixture-of-Experts Models
ICLR 2025, Rejected
8
Persistent Pre-training Poisoning of LLMs
ICLR 2025, Poster
2024 (6 papers)
4
Effective Prompt Extraction from Language Models
COLM 2024, Poster
4
Diffusion Denoising as a Certified Defense Against Clean-Label Poisoning Attacks
ICLR 2024, Withdrawn
4
Query-Based Adversarial Prompt Generation
NeurIPS 2024, Poster
4
Forcing Diffuse Distributions out of Language Models
COLM 2024, Poster
4
Subspace Grid-sweep: ML Defense Evaluation via Constrained Brute-force Search
ICLR 2024, Rejected
4
Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models
NeurIPS 2024, Poster