Pin-Yu Chen

~Pin-Yu_Chen1

49

论文总数

24.5

年均投稿

5.6

平均评分

接收情况26/49

会议分布

ICLR

39

NeurIPS

9

COLM

1

发表论文 (49 篇)

202524 篇

CoP: Agentic Red-teaming for Large Language Models using Composition of Principles

NeurIPS 2025Poster

SEAL: Safety-enhanced Aligned LLM Fine-tuning via Bilevel Data Selection

ICLR 2025Poster

SONAR: A Synthetic AI-Audio Detection Framework and Benchmark

ICLR 2025withdrawn

Shape it Up! Restoring LLM Safety during Finetuning

NeurIPS 2025Poster

Large Language Models can Become Strong Self-Detoxifiers

ICLR 2025Poster

Data-Driven Lipschitz Continuity: A Cost-Effective Approach to Improve Adversarial Robustness

ICLR 2025Rejected

Revisiting Mode Connectivity in Neural Networks with Bezier Surface

ICLR 2025Poster

ADAPT: Adaptive Prompt Tuning for Pre-Trained Vision-Language Models

ICLR 2025Rejected

Test Time Augmentations are Worth One Million Images for Out-of-Distribution Detection

ICLR 2025Rejected

Breaking Free: Hacking Diffusion Models for Generating Adversarial Examples and Bypassing Safety Guardrails

ICLR 2025Rejected

Defensive Prompt Patch: A Robust and Generalizable Defense of Large Language Models against Jailbreak Attacks

ICLR 2025withdrawn

Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis

ICLR 2025Poster

GRE Score: Generative Risk Evaluation for Large Language Models

ICLR 2025withdrawn

Benchmarking LLMs on Safety Issues in Scientific Labs

ICLR 2025Rejected

TabWak: A Watermark for Tabular Diffusion Models

ICLR 2025Spotlight

When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers

REFINE: Inversion-Free Backdoor Defense via Model Reprogramming

ICLR 2025Poster

Sparse Gradient Compression for Fine-Tuning Large Language Models

ICLR 2025withdrawn

Language Models Are Good Tabular Learners

ICLR 2025Rejected

DAG-Jailbreak: Enhancing Black-box Jailbreak Attacks and Defenses through DAG Dependency Analysis

ICLR 2025Rejected

Your Task May Vary: A Systematic Understanding of Alignment and Safety Degradation when Fine-tuning LLMs

ICLR 2025Rejected

Visual Prompting Reimagined: The Power of Activation Prompts

ICLR 2025withdrawn

Adaptive Distraction: Probing LLM Contextual Robustness with Automated Tree Search

NeurIPS 2025Poster

Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge

ICLR 2025Poster

202425 篇

On Robustness-Accuracy Characterization of Large Language Models using Synthetic Datasets

ICLR 2024withdrawn

GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models

NeurIPS 2024Poster

Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes

NeurIPS 2024Poster

On Robustness-Accuracy Characterization of Language Models using Synthetic Datasets

COLM 2024Poster

SynBench: Evaluating Pretrained Representations for Image Classification using Synthetic Data

ICLR 2024Rejected

Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models

NeurIPS 2024Poster

GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models

ICLR 2024Rejected

Test Time Augmentations are Worth One Million Images for Out-of-Distribution Detection

ICLR 2024withdrawn

Graph is All You Need? Lightweight Data-agnostic Neural Architecture Search without Training

ICLR 2024Rejected

Be Your Own Neighborhood: Detecting Adversarial Example by the Neighborhood Relations Built on Self-Supervised Learning

ICLR 2024Rejected

AutoVP: An Automated Visual Prompting Framework and Benchmark

ICLR 2024Poster

Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!

Rethinking Backdoor Attacks on Dataset Distillation: A Kernel Method Perspective

ICLR 2024Poster

Safe LoRA: The Silver Lining of Reducing Safety Risks when Finetuning Large Language Models

NeurIPS 2024Poster

NeuralFuse: Learning to Recover the Accuracy of Access-Limited Neural Network Inference in Low-Voltage Regimes

NeurIPS 2024Poster

The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Language Models

ICLR 2024Poster

NeuralFuse: Learning to Recover the Accuracy of Access-Limited Neural Network Inference in Low-Voltage Regimes

ICLR 2024Rejected

Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models

NeurIPS 2024Poster

It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition

ICLR 2024Poster

Visual Prompting Reimagined: The Power of Activation Prompts

ICLR 2024Rejected

Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning

ICLR 2024Rejected

What Improves the Generalization of Graph Transformer? A Theoretical Dive into Self-attention and Positional Encoding

ICLR 2024Rejected

Large Language Models are Efficient Learners of Noise-Robust Speech Recognition

ICLR 2024Spotlight

Ring-A-Bell! How Reliable are Concept Removal Methods For Diffusion Models?

ICLR 2024Poster

Time-LLM: Time Series Forecasting by Reprogramming Large Language Models

ICLR 2024Poster

合作者 (20)

Tsung-Yi Ho15 篇

Hongkang Li5 篇

Shuai Zhang4 篇

Tejaswini Pedapati4 篇

Ching-Yun Ko4 篇