Tsung-Yi Ho (~Tsung-Yi_Ho2)
Total papers: 18
Avg. submissions per year: 9.0
Average rating:
Accepted: 9/18
Venue distribution: ICLR 12, NeurIPS 6
Published papers (18)
2025 (8 papers)

4 | CoP: Agentic Red-teaming for Large Language Models using Composition of Principles | NeurIPS 2025, Poster
4 | GRE Score: Generative Risk Evaluation for Large Language Models | ICLR 2025, Withdrawn
4 | PermLLM: Learnable Channel Permutation for N:M Sparse Large Language Models | NeurIPS 2025, Poster
4 | Defensive Prompt Patch: A Robust and Generalizable Defense of Large Language Models against Jailbreak Attacks | ICLR 2025, Withdrawn
4 | CARE: Decoding-Time Safety Alignment via Rollback and Introspection Intervention | NeurIPS 2025, Poster
4 | Test Time Augmentations are Worth One Million Images for Out-of-Distribution Detection | ICLR 2025, Rejected
3 | Your Task May Vary: A Systematic Understanding of Alignment and Safety Degradation when Fine-tuning LLMs | ICLR 2025, Rejected
4 | Elephant in the Room: Unveiling the Pitfalls of Human Proxies in Alignment | ICLR 2025, Rejected
2024 (10 papers)

4 | GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models | NeurIPS 2024, Poster
4 | GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models | ICLR 2024, Rejected
4 | Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes | NeurIPS 2024, Poster
4 | NeuralFuse: Learning to Recover the Accuracy of Access-Limited Neural Network Inference in Low-Voltage Regimes | NeurIPS 2024, Poster
4 | Be Your Own Neighborhood: Detecting Adversarial Example by the Neighborhood Relations Built on Self-Supervised Learning | ICLR 2024, Rejected
4 | AutoVP: An Automated Visual Prompting Framework and Benchmark | ICLR 2024, Poster
4 | NeuralFuse: Learning to Recover the Accuracy of Access-Limited Neural Network Inference in Low-Voltage Regimes | ICLR 2024, Rejected
3 | Test Time Augmentations are Worth One Million Images for Out-of-Distribution Detection | ICLR 2024, Withdrawn
4 | Rethinking Backdoor Attacks on Dataset Distillation: A Kernel Method Perspective | ICLR 2024, Poster
5 | The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Language Models | ICLR 2024, Poster