Dan Hendrycks
~Dan_Hendrycks1
11
Total papers
5.5
Average submissions per year
Average rating
Acceptance: 5/11
Venue distribution
ICLR
9
NeurIPS
2
Published papers (11)
2025: 5 papers
4
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
ICLR 2025 Poster
4
Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs
NeurIPS 2025 Spotlight
4
Evaluating Model Robustness Against Unforeseen Adversarial Attacks
ICLR 2025 Rejected
6
Tamper-Resistant Safeguards for Open-Weight LLMs
ICLR 2025 Poster
4
MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models
ICLR 2025 Poster
2024: 6 papers
4
Programmatic Evaluation of Rule-Following Behavior
ICLR 2024 Rejected
4
How Hard is Trojan Detection in DNNs? Fooling Detectors With Evasive Trojans
ICLR 2024 Rejected
5
Improving Alignment and Robustness with Circuit Breakers
NeurIPS 2024 Poster
4
Robustness Evaluation of Proxy Models against Adversarial Optimization
ICLR 2024 Rejected
4
Evaluating Robustness to Unforeseen Adversarial Attacks
ICLR 2024 Rejected
3
Enhancing Neural Network Transparency through Representation Analysis
ICLR 2024 Rejected