PaperHub

Bo Li (~Bo_Li19)

Total papers: 58
Average submissions per year: 29.0
Average rating: 5.7
Accepted: 32/58

Venue distribution: ICLR 42 · NeurIPS 9 · ICML 7

Published papers (58)

2025 (39 papers)

7.3 (3 reviews) · $R^2$-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning · ICLR 2025 Spotlight
4.9 (4 reviews) · UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning · ICML 2025 Poster
6.1 (4 reviews) · ShieldAgent: Shielding Agents via Verifiable Safety Policy Reasoning · ICML 2025 Poster
5.5 (4 reviews) · AdvWave: Stealthy Adversarial Jailbreak Attack against Large Audio-Language Models · ICLR 2025 Poster
6.8 (5 reviews) · C-SafeGen: Certified Safe LLM Generation with Claim-Based Streaming Guardrails · NeurIPS 2025 Poster
5.0 (4 reviews) · MultiTrust: Enhancing Safety and Trustworthiness of Large Language Models from Multiple Perspectives · ICLR 2025 Rejected
5.0 (4 reviews) · SCOPE: Scalable and Adaptive Evaluation of Misguided Safety Refusal in LLMs · ICLR 2025 Rejected
5.3 (4 reviews) · Assessing the Knowledge-intensive Reasoning Capability of Large Language Models with Realistic Benchmarks Generated Programmatically at Scale · ICLR 2025 Rejected
6.5 (6 reviews) · Reliable and Efficient Amortized Model-based Evaluation · ICLR 2025 Rejected
7.2 (4 reviews) · Reliable and Efficient Amortized Model-based Evaluation · ICML 2025 Poster
7.0 (4 reviews) · SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations · ICLR 2025 Poster
4.3 (4 reviews) · AutoHijacker: Automatic Indirect Prompt Injection Against Black-box LLM Agents · ICLR 2025 Rejected
3.0 (3 reviews) · IDS-Agent: An LLM Agent for Explainable Intrusion Detection in IoT Networks · ICLR 2025 Rejected
5.8 (5 reviews) · Can Watermarks be Used to Detect LLM IP Infringement For Free? · ICLR 2025 Poster
4.5 (4 reviews) · RedCodeAgent: Automatic Red-teaming Agent against Code Agents · ICLR 2025 Rejected
5.5 (4 reviews) · KnowData: Knowledge-Enabled Data Generation for Improving Multimodal Models · ICLR 2025 Rejected
3.5 (4 reviews) · Which Network is Trojaned? Increasing Trojan Evasiveness for Model-Level Detectors · ICLR 2025 Withdrawn
5.0 (4 reviews) · SecCodePLT: A Unified Platform for Evaluating the Security of Code GenAI · ICLR 2025 Rejected
6.1 (4 reviews) · SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models · ICML 2025 Poster
5.5 (4 reviews) · SafeVision: Efficient Image Guardrail with Robust Policy Adherence and Explainability · ICLR 2025 Rejected
4.0 (3 reviews) · SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models · ICLR 2025 Rejected
5.3 (4 reviews) · On Memorization of Large Language Models in Logical Reasoning · ICLR 2025 Rejected
5.7 (3 reviews) · Underestimated Privacy Risks for Minority Populations in Large Language Model Unlearning · ICLR 2025 Rejected
6.3 (3 reviews) · Underestimated Privacy Risks for Minority Populations in Large Language Model Unlearning · ICML 2025 Poster
5.7 (3 reviews) · KnowHalu: Multi-Form Knowledge Enhanced Hallucination Detection · ICLR 2025 Rejected
6.6 (5 reviews) · EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage · ICLR 2025 Poster
6.3 (3 reviews) · AdvAgent: Controllable Blackbox Red-teaming on Web Agents · ICML 2025 Poster
4.0 (4 reviews) · AutoRedTeamer: An Autonomous Red Teaming Agent Against Language Models · ICLR 2025 Rejected
4.4 (5 reviews) · AdvWeb: Controllable Black-box Attacks on VLM-powered Web Agents · ICLR 2025 Rejected
7.2 (6 reviews) · AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs · ICLR 2025 Spotlight
6.8 (4 reviews) · Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment · NeurIPS 2025 Poster
5.4 (5 reviews) · Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset · ICLR 2025 Poster
6.8 (4 reviews) · AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration · NeurIPS 2025 Poster
7.5 (4 reviews) · AIR-BENCH 2024: A Safety Benchmark based on Regulation and Policies Specified Risk Categories · ICLR 2025 Spotlight
6.8 (4 reviews) · SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal · ICLR 2025 Poster
6.0 (4 reviews) · GuardAgent: Safeguard LLM Agent by a Guard Agent via Knowledge-Enabled Reasoning · ICLR 2025 Rejected
6.1 (4 reviews) · GuardAgent: Safeguard LLM Agents via Knowledge-Enabled Reasoning · ICML 2025 Poster
5.8 (6 reviews) · Tamper-Resistant Safeguards for Open-Weight LLMs · ICLR 2025 Poster
7.0 (4 reviews) · MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models · ICLR 2025 Poster

2024 (19 papers)

7.0 (4 reviews) · Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks · NeurIPS 2024 Spotlight
6.0 (4 reviews) · Improving Branching in Neural Network Verification with Bound Implication Graph · ICLR 2024 Rejected
6.4 (5 reviews) · COLEP: Certifiably Robust Learning-Reasoning Conformal Prediction via Probabilistic Circuits · ICLR 2024 Poster
5.8 (4 reviews) · SHINE: Shielding Backdoors in Deep Reinforcement Learning · ICLR 2024 Rejected
5.2 (5 reviews) · AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases · NeurIPS 2024 Poster
4.0 (3 reviews) · Tree-as-a-Prompt: Boosting Black-Box Large Language Models on Few-Shot Classification of Tabular Data · ICLR 2024 Withdrawn
7.5 (4 reviews) · DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer · ICLR 2024 Spotlight
6.3 (3 reviews) · InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining · ICLR 2024 Rejected
4.5 (4 reviews) · Certifiably Byzantine-Robust Federated Conformal Prediction · ICLR 2024 Rejected
6.0 (4 reviews) · Ring-A-Bell! How Reliable are Concept Removal Methods For Diffusion Models? · ICLR 2024 Poster
6.0 (4 reviews) · Effective and Efficient Federated Tree Learning on Hybrid Data · ICLR 2024 Poster
4.5 (4 reviews) · Identifying and Mitigating Vulnerabilities in LLM-Integrated Applications · ICLR 2024 Rejected
5.0 (3 reviews) · Consistency Purification: Effective and Efficient Diffusion Purification towards Certified Robustness · NeurIPS 2024 Poster
5.6 (5 reviews) · BECAUSE: Bilinear Causal Representation for Generalizable Offline Model-based Reinforcement Learning · NeurIPS 2024 Poster
5.3 (4 reviews) · BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models · ICLR 2024 Poster
5.8 (5 reviews) · Data Free Backdoor Attacks · NeurIPS 2024 Poster
4.0 (4 reviews) · How Hard is Trojan Detection in DNNs? Fooling Detectors With Evasive Trojans · ICLR 2024 Rejected
4.8 (4 reviews) · Rethinking the Solution to Curse of Dimensionality on Randomized Smoothing · ICLR 2024 Rejected
6.3 (4 reviews) · BackdoorAlign: Mitigating Fine-tuning based Jailbreak Attack with Backdoor Enhanced Safety Alignment · NeurIPS 2024 Poster