Bo Li (~Bo_Li19)
Total papers: 58
Avg. submissions per year: 29.0
Average score:
Acceptance: 32/58
Conference distribution: ICLR 42, NeurIPS 9, ICML 7

Publications (58)
2025 (39 papers)
3 | $R^2$-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning | ICLR 2025 | Spotlight
4 | UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning | ICML 2025 | Poster
4 | ShieldAgent: Shielding Agents via Verifiable Safety Policy Reasoning | ICML 2025 | Poster
4 | AdvWave: Stealthy Adversarial Jailbreak Attack against Large Audio-Language Models | ICLR 2025 | Poster
5 | C-SafeGen: Certified Safe LLM Generation with Claim-Based Streaming Guardrails | NeurIPS 2025 | Poster
4 | MultiTrust: Enhancing Safety and Trustworthiness of Large Language Models from Multiple Perspectives | ICLR 2025 | Rejected
4 | SCOPE: Scalable and Adaptive Evaluation of Misguided Safety Refusal in LLMs | ICLR 2025 | Rejected
4 | Assessing the Knowledge-intensive Reasoning Capability of Large Language Models with Realistic Benchmarks Generated Programmatically at Scale | ICLR 2025 | Rejected
6 | Reliable and Efficient Amortized Model-based Evaluation | ICLR 2025 | Rejected
4 | Reliable and Efficient Amortized Model-based Evaluation | ICML 2025 | Poster
4 | SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations | ICLR 2025 | Poster
4 | AutoHijacker: Automatic Indirect Prompt Injection Against Black-box LLM Agents | ICLR 2025 | Rejected
3 | IDS-Agent: An LLM Agent for Explainable Intrusion Detection in IoT Networks | ICLR 2025 | Rejected
5 | Can Watermarks be Used to Detect LLM IP Infringement For Free? | ICLR 2025 | Poster
4 | RedCodeAgent: Automatic Red-teaming Agent against Code Agents | ICLR 2025 | Rejected
4 | KnowData: Knowledge-Enabled Data Generation for Improving Multimodal Models | ICLR 2025 | Rejected
4 | Which Network is Trojaned? Increasing Trojan Evasiveness for Model-Level Detectors | ICLR 2025 | Withdrawn
4 | SecCodePLT: A Unified Platform for Evaluating the Security of Code GenAI | ICLR 2025 | Rejected
4 | SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models | ICML 2025 | Poster
4 | SafeVision: Efficient Image Guardrail with Robust Policy Adherence and Explainability | ICLR 2025 | Rejected
3 | SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models | ICLR 2025 | Rejected
4 | On Memorization of Large Language Models in Logical Reasoning | ICLR 2025 | Rejected
3 | Underestimated Privacy Risks for Minority Populations in Large Language Model Unlearning | ICLR 2025 | Rejected
3 | Underestimated Privacy Risks for Minority Populations in Large Language Model Unlearning | ICML 2025 | Poster
3 | KnowHalu: Multi-Form Knowledge Enhanced Hallucination Detection | ICLR 2025 | Rejected
5 | EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage | ICLR 2025 | Poster
3 | AdvAgent: Controllable Blackbox Red-teaming on Web Agents | ICML 2025 | Poster
4 | AutoRedTeamer: An Autonomous Red Teaming Agent Against Language Models | ICLR 2025 | Rejected
5 | AdvWeb: Controllable Black-box Attacks on VLM-powered Web Agents | ICLR 2025 | Rejected
6 | AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs | ICLR 2025 | Spotlight
4 | Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment | NeurIPS 2025 | Poster
5 | Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset | ICLR 2025 | Poster
4 | AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration | NeurIPS 2025 | Poster
4 | AIR-BENCH 2024: A Safety Benchmark based on Regulation and Policies Specified Risk Categories | ICLR 2025 | Spotlight
4 | SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal | ICLR 2025 | Poster
4 | GuardAgent: Safeguard LLM Agent by a Guard Agent via Knowledge-Enabled Reasoning | ICLR 2025 | Rejected
4 | GuardAgent: Safeguard LLM Agents via Knowledge-Enabled Reasoning | ICML 2025 | Poster
6 | Tamper-Resistant Safeguards for Open-Weight LLMs | ICLR 2025 | Poster
4 | MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models | ICLR 2025 | Poster
2024 (19 papers)
4 | Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks | NeurIPS 2024 | Spotlight
4 | Improving Branching in Neural Network Verification with Bound Implication Graph | ICLR 2024 | Rejected
5 | COLEP: Certifiably Robust Learning-Reasoning Conformal Prediction via Probabilistic Circuits | ICLR 2024 | Poster
4 | SHINE: Shielding Backdoors in Deep Reinforcement Learning | ICLR 2024 | Rejected
5 | AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases | NeurIPS 2024 | Poster
3 | Tree-as-a-Prompt: Boosting Black-Box Large Language Models on Few-Shot Classification of Tabular Data | ICLR 2024 | Withdrawn
4 | DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer | ICLR 2024 | Spotlight
3 | InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining | ICLR 2024 | Rejected
4 | Certifiably Byzantine-Robust Federated Conformal Prediction | ICLR 2024 | Rejected
4 | Ring-A-Bell! How Reliable are Concept Removal Methods For Diffusion Models? | ICLR 2024 | Poster
4 | Effective and Efficient Federated Tree Learning on Hybrid Data | ICLR 2024 | Poster
4 | Identifying and Mitigating Vulnerabilities in LLM-Integrated Applications | ICLR 2024 | Rejected
3 | Consistency Purification: Effective and Efficient Diffusion Purification towards Certified Robustness | NeurIPS 2024 | Poster
5 | BECAUSE: Bilinear Causal Representation for Generalizable Offline Model-based Reinforcement Learning | NeurIPS 2024 | Poster
4 | BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models | ICLR 2024 | Poster
5 | Data Free Backdoor Attacks | NeurIPS 2024 | Poster
4 | How Hard is Trojan Detection in DNNs? Fooling Detectors With Evasive Trojans | ICLR 2024 | Rejected
4 | Rethinking the Solution to Curse of Dimensionality on Randomized Smoothing | ICLR 2024 | Rejected
4 | BackdoorAlign: Mitigating Fine-tuning based Jailbreak Attack with Backdoor Enhanced Safety Alignment | NeurIPS 2024 | Poster