PaperHub

Pin-Yu Chen

~Pin-Yu_Chen1

Total papers: 49
Average submissions per year: 24.5
Average rating: 5.6
Accepted: 26/49
Venue distribution: ICLR 39, NeurIPS 9, COLM 1

Published Papers (49)

2025 (24 submissions)

Title | Venue | Decision | Avg. Rating | Reviews
CoP: Agentic Red-teaming for Large Language Models using Composition of Principles | NeurIPS 2025 | Poster | 6.0 | 4
SEAL: Safety-enhanced Aligned LLM Fine-tuning via Bilevel Data Selection | ICLR 2025 | Poster | 5.8 | 5
SONAR: A Synthetic AI-Audio Detection Framework and Benchmark | ICLR 2025 | Withdrawn | 4.3 | 4
Shape it Up! Restoring LLM Safety during Finetuning | NeurIPS 2025 | Poster | 7.8 | 4
Large Language Models can Become Strong Self-Detoxifiers | ICLR 2025 | Poster | 6.0 | 5
Data-Driven Lipschitz Continuity: A Cost-Effective Approach to Improve Adversarial Robustness | ICLR 2025 | Rejected | 4.8 | 5
Revisiting Mode Connectivity in Neural Networks with Bezier Surface | ICLR 2025 | Poster | 6.0 | 3
ADAPT: Adaptive Prompt Tuning for Pre-Trained Vision-Language Models | ICLR 2025 | Rejected | 5.5 | 4
Test Time Augmentations are Worth One Million Images for Out-of-Distribution Detection | ICLR 2025 | Rejected | 5.5 | 4
Breaking Free: Hacking Diffusion Models for Generating Adversarial Examples and Bypassing Safety Guardrails | ICLR 2025 | Rejected | 3.7 | 3
Defensive Prompt Patch: A Robust and Generalizable Defense of Large Language Models against Jailbreak Attacks | ICLR 2025 | Withdrawn | 4.5 | 4
Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis | ICLR 2025 | Poster | 6.5 | 4
GRE Score: Generative Risk Evaluation for Large Language Models | ICLR 2025 | Withdrawn | 3.5 | 4
Benchmarking LLMs on Safety Issues in Scientific Labs | ICLR 2025 | Rejected | 4.0 | 5
TabWak: A Watermark for Tabular Diffusion Models | ICLR 2025 | Spotlight | 7.2 | 5
When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers | ICLR 2025 | Oral | 7.5 | 4
REFINE: Inversion-Free Backdoor Defense via Model Reprogramming | ICLR 2025 | Poster | 5.3 | 4
Sparse Gradient Compression for Fine-Tuning Large Language Models | ICLR 2025 | Withdrawn | 5.0 | 5
Language Models Are Good Tabular Learners | ICLR 2025 | Rejected | 5.0 | 5
DAG-Jailbreak: Enhancing Black-box Jailbreak Attacks and Defenses through DAG Dependency Analysis | ICLR 2025 | Rejected | 5.5 | 4
Your Task May Vary: A Systematic Understanding of Alignment and Safety Degradation when Fine-tuning LLMs | ICLR 2025 | Rejected | 5.3 | 3
Visual Prompting Reimagined: The Power of Activation Prompts | ICLR 2025 | Withdrawn | 3.8 | 4
Adaptive Distraction: Probing LLM Contextual Robustness with Automated Tree Search | NeurIPS 2025 | Poster | 6.4 | 4
Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge | ICLR 2025 | Poster | 6.8 | 4

2024 (25 submissions)

Title | Venue | Decision | Avg. Rating | Reviews
On Robustness-Accuracy Characterization of Large Language Models using Synthetic Datasets | ICLR 2024 | Withdrawn | 3.7 | 3
GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models | NeurIPS 2024 | Poster | 5.5 | 4
Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes | NeurIPS 2024 | Poster | 5.3 | 4
On Robustness-Accuracy Characterization of Language Models using Synthetic Datasets | COLM 2024 | Poster | 6.3 | 4
SynBench: Evaluating Pretrained Representations for Image Classification using Synthetic Data | ICLR 2024 | Rejected | 4.5 | 4
Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models | NeurIPS 2024 | Poster | 6.8 | 4
GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models | ICLR 2024 | Rejected | 5.0 | 4
Test Time Augmentations are Worth One Million Images for Out-of-Distribution Detection | ICLR 2024 | Withdrawn | 3.7 | 3
Graph is All You Need? Lightweight Data-agnostic Neural Architecture Search without Training | ICLR 2024 | Rejected | 4.7 | 3
Be Your Own Neighborhood: Detecting Adversarial Example by the Neighborhood Relations Built on Self-Supervised Learning | ICLR 2024 | Rejected | 6.3 | 4
AutoVP: An Automated Visual Prompting Framework and Benchmark | ICLR 2024 | Poster | 5.5 | 4
Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To! | ICLR 2024 | Oral | 7.0 | 4
Rethinking Backdoor Attacks on Dataset Distillation: A Kernel Method Perspective | ICLR 2024 | Poster | 5.8 | 4
Safe LoRA: The Silver Lining of Reducing Safety Risks when Finetuning Large Language Models | NeurIPS 2024 | Poster | 6.3 | 4
NeuralFuse: Learning to Recover the Accuracy of Access-Limited Neural Network Inference in Low-Voltage Regimes | NeurIPS 2024 | Poster | 5.3 | 4
The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Language Models | ICLR 2024 | Poster | 6.8 | 5
NeuralFuse: Learning to Recover the Accuracy of Access-Limited Neural Network Inference in Low-Voltage Regimes | ICLR 2024 | Rejected | 5.8 | 4
Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models | NeurIPS 2024 | Poster | 6.3 | 4
It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition | ICLR 2024 | Poster | 6.6 | 5
Visual Prompting Reimagined: The Power of Activation Prompts | ICLR 2024 | Rejected | 5.3 | 4
Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning | ICLR 2024 | Rejected | 5.0 | 4
What Improves the Generalization of Graph Transformer? A Theoretical Dive into Self-attention and Positional Encoding | ICLR 2024 | Rejected | 5.3 | 4
Large Language Models are Efficient Learners of Noise-Robust Speech Recognition | ICLR 2024 | Spotlight | 8.0 | 4
Ring-A-Bell! How Reliable are Concept Removal Methods For Diffusion Models? | ICLR 2024 | Poster | 6.0 | 4
Time-LLM: Time Series Forecasting by Reprogramming Large Language Models | ICLR 2024 | Poster | 7.0 | 5