Dawn Song

~Dawn_Song1

Total papers: 34

Average submissions per year: 17.0

Average rating: 5.8

Accepted: 20/34

Venue distribution

ICLR: 22

NeurIPS: 6

COLM: 4

ICML: 2

Papers (34)

2025 (25 papers)

Capturing the Temporal Dependence of Training Data Influence

MultiTrust: Enhancing Safety and Trustworthiness of Large Language Models from Multiple Perspectives

ICLR 2025 (Rejected)

Data Shapley in One Training Run

Scalable Best-of-N Selection for Large Language Models via Self-Certainty

NeurIPS 2025 (Poster)

An Undetectable Watermark for Generative Image Models

ICLR 2025 (Poster)

IDS-Agent: An LLM Agent for Explainable Intrusion Detection in IoT Networks

ICLR 2025 (Rejected)

KnowData: Knowledge-Enabled Data Generation for Improving Multimodal Models

ICLR 2025 (Rejected)

Which Network is Trojaned? Increasing Trojan Evasiveness for Model-Level Detectors

ICLR 2025 (Withdrawn)

Assessing the Knowledge-intensive Reasoning Capability of Large Language Models with Realistic Benchmarks Generated Programmatically at Scale

ICLR 2025 (Rejected)

AutoScale: Scale-Aware Data Mixing for Pre-Training LLMs

COLM 2025 (Poster)

AutoScale: Automatic Prediction of Compute-optimal Data Compositions for Training LLMs

ICLR 2025 (Rejected)

Multimodal Situational Safety

ICLR 2025 (Poster)

An Illusion of Progress? Assessing the Current State of Web Agents

COLM 2025 (Poster)

KnowHalu: Multi-Form Knowledge Enhanced Hallucination Detection

ICLR 2025 (Rejected)

Improving LLM Safety Alignment with Dual-Objective Optimization

ICML 2025 (Poster)

Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations

NeurIPS 2025 (Poster)

SecCodePLT: A Unified Platform for Evaluating the Security of Code GenAI

ICLR 2025 (Rejected)

Assessing Judging Bias in Large Reasoning Models: An Empirical Study

COLM 2025 (Poster)

LeakAgent: RL-based Red-teaming Agent for LLM Privacy Leakage

COLM 2025 (Poster)

AIR-BENCH 2024: A Safety Benchmark based on Regulation and Policies Specified Risk Categories

ICLR 2025 (Spotlight)

GuardAgent: Safeguard LLM Agent by a Guard Agent via Knowledge-Enabled Reasoning

ICLR 2025 (Rejected)

GuardAgent: Safeguard LLM Agents via Knowledge-Enabled Reasoning

ICML 2025 (Poster)

Tamper-Resistant Safeguards for Open-Weight LLMs

ICLR 2025 (Poster)

Can Editing LLMs Inject Harm?

ICLR 2025 (Rejected)

MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models

ICLR 2025 (Poster)

2024 (9 papers)

GREATS: Online Selection of High-Quality Data for LLM Training in Every Iteration

NeurIPS 2024 (Spotlight)

AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases

NeurIPS 2024 (Poster)

Agent Instructs Large Language Models to be General Zero-Shot Reasoners

ICLR 2024 (Rejected)

SHINE: Shielding Backdoors in Deep Reinforcement Learning

ICLR 2024 (Rejected)

Tree-as-a-Prompt: Boosting Black-Box Large Language Models on Few-Shot Classification of Tabular Data

ICLR 2024 (Withdrawn)

Boosting Alignment for Post-Unlearning Text-to-Image Generative Models

NeurIPS 2024 (Poster)

Data Free Backdoor Attacks

NeurIPS 2024 (Poster)

Effective and Efficient Federated Tree Learning on Hybrid Data

ICLR 2024 (Poster)

Enhancing Neural Network Transparency through Representation Analysis

ICLR 2024 (Rejected)

Collaborators (20)

Xuandong Zhao (7 papers)

Chulin Xie (6 papers)

Zhen Xiang (6 papers)