Impact Index

90.97/100

Top 0.5%

Site-wide rank: #328

Papers published: 26

Average rating: 5.5

Average output: 8.7 papers/year

Xuandong Zhao

Postdoc @ UC Berkeley · United States · OpenReview

Research Areas

Machine Learning · Natural Language Processing · AI Safety

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

ICLR 2026 · Poster

Learning to Reason without External Rewards

ICLR 2026 · Poster

In-Context Watermarks for Large Language Models

ICLR 2026 · Poster

AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents

ICLR 2026 · Poster

Dataset Protection via Watermarked Canaries in Retrieval-Augmented LLMs

ICLR 2026 · Rejected

InfoSynth: Information-Guided Benchmark Synthesis for LLMs

ICLR 2026 · Rejected

Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs

ICLR 2026 · Rejected

PromptArmor: An Essential Baseline for Prompt Injection Defenses

ICLR 2026 · Rejected

Confidence-Guided MCTS for Efficient Long-Horizon Web Agent Tasks

ICLR 2026 · Rejected

Assessing Judging Bias in Large Reasoning Models: An Empirical Study

COLM 2025 · Poster

MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models

ICLR 2025 · Poster

LeakAgent: RL-based Red-teaming Agent for LLM Privacy Leakage

COLM 2025 · Poster

An Undetectable Watermark for Generative Image Models

ICLR 2025 · Poster

Permute-and-Flip: An optimally stable and watermarkable decoder for LLMs

ICLR 2025 · Poster

Scalable Best-of-N Selection for Large Language Models via Self-Certainty

NeurIPS 2025 · Poster

Multimodal Situational Safety

ICLR 2025 · Poster

Weak-to-Strong Jailbreaking on Large Language Models

ICML 2025 · Poster

Improving LLM Safety Alignment with Dual-Objective Optimization

ICML 2025 · Poster

Weak-to-Strong Jailbreaking on Large Language Models

ICLR 2025 · Rejected

DIS-CO: Discovering Copyrighted Content in VLMs Training Data

ICML 2025 · Poster

ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World

ICLR 2025 · Withdrawn

Efficiently Identifying Watermarked Segments in Mixed-Source Texts

ICLR 2025 · Withdrawn

Collaborators (20)

Postdoc advisor: 14 papers

PhD advisor: 7 papers

PhD advisor: 6 papers

William Yang Wang