Tsung-Yi Ho (~Tsung-Yi_Ho2)
Total papers: 18
Avg. submissions per year: 9.0
Average rating:
Accepted: 9/18
Venue distribution: ICLR 12, NeurIPS 6
Published papers (18)
2025 (8 papers)

4 | CoP: Agentic Red-teaming for Large Language Models using Composition of Principles | NeurIPS 2025, Poster
4 | GRE Score: Generative Risk Evaluation for Large Language Models | ICLR 2025, Withdrawn
4 | PermLLM: Learnable Channel Permutation for N:M Sparse Large Language Models | NeurIPS 2025, Poster
4 | Defensive Prompt Patch: A Robust and Generalizable Defense of Large Language Models against Jailbreak Attacks | ICLR 2025, Withdrawn
4 | CARE: Decoding-Time Safety Alignment via Rollback and Introspection Intervention | NeurIPS 2025, Poster
4 | Test Time Augmentations are Worth One Million Images for Out-of-Distribution Detection | ICLR 2025, Rejected
3 | Your Task May Vary: A Systematic Understanding of Alignment and Safety Degradation when Fine-tuning LLMs | ICLR 2025, Rejected
4 | Elephant in the Room: Unveiling the Pitfalls of Human Proxies in Alignment | ICLR 2025, Rejected
2024 (10 papers)

4 | GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models | NeurIPS 2024, Poster
4 | GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models | ICLR 2024, Rejected
4 | Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes | NeurIPS 2024, Poster
4 | NeuralFuse: Learning to Recover the Accuracy of Access-Limited Neural Network Inference in Low-Voltage Regimes | NeurIPS 2024, Poster
4 | Be Your Own Neighborhood: Detecting Adversarial Example by the Neighborhood Relations Built on Self-Supervised Learning | ICLR 2024, Rejected
4 | AutoVP: An Automated Visual Prompting Framework and Benchmark | ICLR 2024, Poster
4 | NeuralFuse: Learning to Recover the Accuracy of Access-Limited Neural Network Inference in Low-Voltage Regimes | ICLR 2024, Rejected
3 | Test Time Augmentations are Worth One Million Images for Out-of-Distribution Detection | ICLR 2024, Withdrawn
4 | Rethinking Backdoor Attacks on Dataset Distillation: A Kernel Method Perspective | ICLR 2024, Poster
5 | The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Language Models | ICLR 2024, Poster