Pin-Yu Chen (~Pin-Yu_Chen1)

Total papers: 49
Avg. submissions per year: 24.5
Average rating:
Accepted: 26/49

Venue distribution: ICLR (39), NeurIPS (9), COLM (1)

Papers (49)
2025 (24 papers)
- CoP: Agentic Red-teaming for Large Language Models using Composition of Principles (NeurIPS 2025, Poster; rating 4)
- SEAL: Safety-enhanced Aligned LLM Fine-tuning via Bilevel Data Selection (ICLR 2025, Poster; rating 5)
- SONAR: A Synthetic AI-Audio Detection Framework and Benchmark (ICLR 2025, Withdrawn; rating 4)
- Shape it Up! Restoring LLM Safety during Finetuning (NeurIPS 2025, Poster; rating 4)
- Large Language Models can Become Strong Self-Detoxifiers (ICLR 2025, Poster; rating 5)
- Data-Driven Lipschitz Continuity: A Cost-Effective Approach to Improve Adversarial Robustness (ICLR 2025, Rejected; rating 5)
- Revisiting Mode Connectivity in Neural Networks with Bezier Surface (ICLR 2025, Poster; rating 3)
- ADAPT: Adaptive Prompt Tuning for Pre-Trained Vision-Language Models (ICLR 2025, Rejected; rating 4)
- Test Time Augmentations are Worth One Million Images for Out-of-Distribution Detection (ICLR 2025, Rejected; rating 4)
- Breaking Free: Hacking Diffusion Models for Generating Adversarial Examples and Bypassing Safety Guardrails (ICLR 2025, Rejected; rating 3)
- Defensive Prompt Patch: A Robust and Generalizable Defense of Large Language Models against Jailbreak Attacks (ICLR 2025, Withdrawn; rating 4)
- Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis (ICLR 2025, Poster; rating 4)
- GRE Score: Generative Risk Evaluation for Large Language Models (ICLR 2025, Withdrawn; rating 4)
- Benchmarking LLMs on Safety Issues in Scientific Labs (ICLR 2025, Rejected; rating 5)
- TabWak: A Watermark for Tabular Diffusion Models (ICLR 2025, Spotlight; rating 5)
- When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers (ICLR 2025, Oral; rating 4)
- REFINE: Inversion-Free Backdoor Defense via Model Reprogramming (ICLR 2025, Poster; rating 4)
- Sparse Gradient Compression for Fine-Tuning Large Language Models (ICLR 2025, Withdrawn; rating 5)
- Language Models Are Good Tabular Learners (ICLR 2025, Rejected; rating 5)
- DAG-Jailbreak: Enhancing Black-box Jailbreak Attacks and Defenses through DAG Dependency Analysis (ICLR 2025, Rejected; rating 4)
- Your Task May Vary: A Systematic Understanding of Alignment and Safety Degradation when Fine-tuning LLMs (ICLR 2025, Rejected; rating 3)
- Visual Prompting Reimagined: The Power of Activation Prompts (ICLR 2025, Withdrawn; rating 4)
- Adaptive Distraction: Probing LLM Contextual Robustness with Automated Tree Search (NeurIPS 2025, Poster; rating 4)
- Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge (ICLR 2025, Poster; rating 4)
2024 (25 papers)
- On Robustness-Accuracy Characterization of Large Language Models using Synthetic Datasets (ICLR 2024, Withdrawn; rating 3)
- GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models (NeurIPS 2024, Poster; rating 4)
- Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes (NeurIPS 2024, Poster; rating 4)
- On Robustness-Accuracy Characterization of Language Models using Synthetic Datasets (COLM 2024, Poster; rating 4)
- SynBench: Evaluating Pretrained Representations for Image Classification using Synthetic Data (ICLR 2024, Rejected; rating 4)
- Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models (NeurIPS 2024, Poster; rating 4)
- GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models (ICLR 2024, Rejected; rating 4)
- Test Time Augmentations are Worth One Million Images for Out-of-Distribution Detection (ICLR 2024, Withdrawn; rating 3)
- Graph is All You Need? Lightweight Data-agnostic Neural Architecture Search without Training (ICLR 2024, Rejected; rating 3)
- Be Your Own Neighborhood: Detecting Adversarial Example by the Neighborhood Relations Built on Self-Supervised Learning (ICLR 2024, Rejected; rating 4)
- AutoVP: An Automated Visual Prompting Framework and Benchmark (ICLR 2024, Poster; rating 4)
- Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To! (ICLR 2024, Oral; rating 4)
- Rethinking Backdoor Attacks on Dataset Distillation: A Kernel Method Perspective (ICLR 2024, Poster; rating 4)
- Safe LoRA: The Silver Lining of Reducing Safety Risks when Finetuning Large Language Models (NeurIPS 2024, Poster; rating 4)
- NeuralFuse: Learning to Recover the Accuracy of Access-Limited Neural Network Inference in Low-Voltage Regimes (NeurIPS 2024, Poster; rating 4)
- The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Language Models (ICLR 2024, Poster; rating 5)
- NeuralFuse: Learning to Recover the Accuracy of Access-Limited Neural Network Inference in Low-Voltage Regimes (ICLR 2024, Rejected; rating 4)
- Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models (NeurIPS 2024, Poster; rating 4)
- It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition (ICLR 2024, Poster; rating 5)
- Visual Prompting Reimagined: The Power of Activation Prompts (ICLR 2024, Rejected; rating 4)
- Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning (ICLR 2024, Rejected; rating 4)
- What Improves the Generalization of Graph Transformer? A Theoretical Dive into Self-attention and Positional Encoding (ICLR 2024, Rejected; rating 4)
- Large Language Models are Efficient Learners of Noise-Robust Speech Recognition (ICLR 2024, Spotlight; rating 4)
- Ring-A-Bell! How Reliable are Concept Removal Methods For Diffusion Models? (ICLR 2024, Poster; rating 4)
- Time-LLM: Time Series Forecasting by Reprogramming Large Language Models (ICLR 2024, Poster; rating 5)