PaperHub

Bo Li (~Bo_Li19)

Total papers: 58
Average submissions per year: 29.0
Average rating: 5.7
Accepted: 32/58

Venue distribution: ICLR 42 · NeurIPS 9 · ICML 7

Published papers (58)

2025 (39 papers)

7.3 (3 reviews) · $R^2$-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning · ICLR 2025 Spotlight
4.9 (4 reviews) · UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning · ICML 2025 Poster
6.1 (4 reviews) · ShieldAgent: Shielding Agents via Verifiable Safety Policy Reasoning · ICML 2025 Poster
5.5 (4 reviews) · AdvWave: Stealthy Adversarial Jailbreak Attack against Large Audio-Language Models · ICLR 2025 Poster
6.8 (5 reviews) · C-SafeGen: Certified Safe LLM Generation with Claim-Based Streaming Guardrails · NeurIPS 2025 Poster
5.0 (4 reviews) · MultiTrust: Enhancing Safety and Trustworthiness of Large Language Models from Multiple Perspectives · ICLR 2025 Rejected
5.0 (4 reviews) · SCOPE: Scalable and Adaptive Evaluation of Misguided Safety Refusal in LLMs · ICLR 2025 Rejected
5.3 (4 reviews) · Assessing the Knowledge-intensive Reasoning Capability of Large Language Models with Realistic Benchmarks Generated Programmatically at Scale · ICLR 2025 Rejected
6.5 (6 reviews) · Reliable and Efficient Amortized Model-based Evaluation · ICLR 2025 Rejected
7.2 (4 reviews) · Reliable and Efficient Amortized Model-based Evaluation · ICML 2025 Poster
7.0 (4 reviews) · SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations · ICLR 2025 Poster
4.3 (4 reviews) · AutoHijacker: Automatic Indirect Prompt Injection Against Black-box LLM Agents · ICLR 2025 Rejected
3.0 (3 reviews) · IDS-Agent: An LLM Agent for Explainable Intrusion Detection in IoT Networks · ICLR 2025 Rejected
5.8 (5 reviews) · Can Watermarks be Used to Detect LLM IP Infringement For Free? · ICLR 2025 Poster
4.5 (4 reviews) · RedCodeAgent: Automatic Red-teaming Agent against Code Agents · ICLR 2025 Rejected
5.5 (4 reviews) · KnowData: Knowledge-Enabled Data Generation for Improving Multimodal Models · ICLR 2025 Rejected
3.5 (4 reviews) · Which Network is Trojaned? Increasing Trojan Evasiveness for Model-Level Detectors · ICLR 2025 Withdrawn
5.0 (4 reviews) · SecCodePLT: A Unified Platform for Evaluating the Security of Code GenAI · ICLR 2025 Rejected
6.1 (4 reviews) · SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models · ICML 2025 Poster
5.5 (4 reviews) · SafeVision: Efficient Image Guardrail with Robust Policy Adherence and Explainability · ICLR 2025 Rejected
4.0 (3 reviews) · SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models · ICLR 2025 Rejected
5.3 (4 reviews) · On Memorization of Large Language Models in Logical Reasoning · ICLR 2025 Rejected
5.7 (3 reviews) · Underestimated Privacy Risks for Minority Populations in Large Language Model Unlearning · ICLR 2025 Rejected
6.3 (3 reviews) · Underestimated Privacy Risks for Minority Populations in Large Language Model Unlearning · ICML 2025 Poster
5.7 (3 reviews) · KnowHalu: Multi-Form Knowledge Enhanced Hallucination Detection · ICLR 2025 Rejected
6.6 (5 reviews) · EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage · ICLR 2025 Poster
6.3 (3 reviews) · AdvAgent: Controllable Blackbox Red-teaming on Web Agents · ICML 2025 Poster
4.0 (4 reviews) · AutoRedTeamer: An Autonomous Red Teaming Agent Against Language Models · ICLR 2025 Rejected
4.4 (5 reviews) · AdvWeb: Controllable Black-box Attacks on VLM-powered Web Agents · ICLR 2025 Rejected
7.2 (6 reviews) · AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs · ICLR 2025 Spotlight
6.8 (4 reviews) · Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment · NeurIPS 2025 Poster
5.4 (5 reviews) · Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset · ICLR 2025 Poster
6.8 (4 reviews) · AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration · NeurIPS 2025 Poster
7.5 (4 reviews) · AIR-BENCH 2024: A Safety Benchmark based on Regulation and Policies Specified Risk Categories · ICLR 2025 Spotlight
6.8 (4 reviews) · SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal · ICLR 2025 Poster
6.0 (4 reviews) · GuardAgent: Safeguard LLM Agent by a Guard Agent via Knowledge-Enabled Reasoning · ICLR 2025 Rejected
6.1 (4 reviews) · GuardAgent: Safeguard LLM Agents via Knowledge-Enabled Reasoning · ICML 2025 Poster
5.8 (6 reviews) · Tamper-Resistant Safeguards for Open-Weight LLMs · ICLR 2025 Poster
7.0 (4 reviews) · MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models · ICLR 2025 Poster

2024 (19 papers)

7.0 (4 reviews) · Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks · NeurIPS 2024 Spotlight
6.0 (4 reviews) · Improving Branching in Neural Network Verification with Bound Implication Graph · ICLR 2024 Rejected
6.4 (5 reviews) · COLEP: Certifiably Robust Learning-Reasoning Conformal Prediction via Probabilistic Circuits · ICLR 2024 Poster
5.8 (4 reviews) · SHINE: Shielding Backdoors in Deep Reinforcement Learning · ICLR 2024 Rejected
5.2 (5 reviews) · AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases · NeurIPS 2024 Poster
4.0 (3 reviews) · Tree-as-a-Prompt: Boosting Black-Box Large Language Models on Few-Shot Classification of Tabular Data · ICLR 2024 Withdrawn
7.5 (4 reviews) · DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer · ICLR 2024 Spotlight
6.3 (3 reviews) · InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining · ICLR 2024 Rejected
4.5 (4 reviews) · Certifiably Byzantine-Robust Federated Conformal Prediction · ICLR 2024 Rejected
6.0 (4 reviews) · Ring-A-Bell! How Reliable are Concept Removal Methods For Diffusion Models? · ICLR 2024 Poster
6.0 (4 reviews) · Effective and Efficient Federated Tree Learning on Hybrid Data · ICLR 2024 Poster
4.5 (4 reviews) · Identifying and Mitigating Vulnerabilities in LLM-Integrated Applications · ICLR 2024 Rejected
5.0 (3 reviews) · Consistency Purification: Effective and Efficient Diffusion Purification towards Certified Robustness · NeurIPS 2024 Poster
5.6 (5 reviews) · BECAUSE: Bilinear Causal Representation for Generalizable Offline Model-based Reinforcement Learning · NeurIPS 2024 Poster
5.3 (4 reviews) · BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models · ICLR 2024 Poster
5.8 (5 reviews) · Data Free Backdoor Attacks · NeurIPS 2024 Poster
4.0 (4 reviews) · How Hard is Trojan Detection in DNNs? Fooling Detectors With Evasive Trojans · ICLR 2024 Rejected
4.8 (4 reviews) · Rethinking the Solution to Curse of Dimensionality on Randomized Smoothing · ICLR 2024 Rejected
6.3 (4 reviews) · BackdoorAlign: Mitigating Fine-tuning based Jailbreak Attack with Backdoor Enhanced Safety Alignment · NeurIPS 2024 Poster