PaperHub
4.7
/10
Rejected3 位审稿人
最低3最高6标准差1.2
6
3
5
3.3
置信度
正确性3.0
贡献度2.0
表达3.0
ICLR 2025

Certifiably Robust RAG against Retrieval Corruption Attacks

OpenReviewPDF
提交: 2024-09-26更新: 2025-02-05
TL;DR

We propose the first certifiably robust defense for retrieval-augmented generation (RAG) against retrieval corruption attacks

摘要

关键词
Retrieval-augemented GenerationRetrieval Corruption AttackCertifiable Robustness

评审与讨论

审稿意见
6

In this paper, the author propose a novel framework called RobustRAG to defend retrieval corruption attack. This method can ensure the accuracy of the response from Retrieval-Augmented Generation. RobustRAG isolates passages, then gets a response for each of these passages and aggregates them by secure keyword and secure decoding aggregation. The authors conduct experiments on multiple LLMs and datasets and prove that RobustRAG has better certifiable accuracy than vanilla RAG.

优点

1.Good writing

The writing logic of this paper is clear and very easy for readers to understand. The text is concise and accurate in introducing the preliminary and method. In addition, the paper has no grammatical errors and typos, and is highly complete.

2. Comprehensive discussions

The discussions in the appendix of this paper answered many of my questions and are very comprehensive.

缺点

1. Insufficient introduction to experiments against attack

The paper claims that their approach can defend against retrieval corruption attacks, and the experiment in section 5.3 simulates RobustRAG against attacks. Even including the appendix, I suggest that the authors add the settings of the attack experiments in the appendix.

2. Lacks comparison experiments

How much impact will RobustRAG have on performance compared to RAG without any attack risk? Since it is necessary to prevent retrieval corruption attacks, it is also necessary to apply RobustRAG all the time. But for benign setups, will RobustRAG affect performance? I suggest the authors add this experiment.

问题

In line 134-139, the authors mention that RobustRAG can achieve robustness under the condition: "when k' is smaller than the number of relevant benign passages; when the corrupted passages outnumber relevant passages." I can understand the first condition, but for the second condition "when the corrupted passages outnumber relevant passages", why can corrupted passages outnumber relevant passages? I think the worst case should be equal, not outnumber. In line 141-142, "the attacker can only manipulate a fraction of retrieved passages", how much does a fraction refer to specifically?

评论

1. Insufficient introduction to experiments against attack

We evaluate the empirical prompt injection attacks and poisoning attacks. For prompt injection attacks [1], the attacker embeds a malicious instruction directing the system to produce a target answer for a given query. For data poisoning attacks, we follow PoisonedRAG [2] by using GPT-4 to generate fake texts supporting incorrect answers. To simulate a strong attacker, we repeat malicious content multiple times within a single passage.

We provide a more detailed setup of the empirical attack starting from Line 1288 (Appendix C, page 24). Additionally, we have added a reference in the main paper (Line 464) to better guide readers. We are happy to clarify additional questions regarding experiment settings.

2. Lack of clean evaluation

We kindly refer the reviewer to Table 1, where we present the performance of RobustRAG under no-attack cases---please look for columns with (acc) and (llmj). As discussed in Lines 424-431, RobustRAG demonstrates similar clean accuracy to the baseline method on short-answer QA (mostly with <2% difference) and a small-to-moderate degradation observed in long-form generation tasks (1-10% depending on the parameters). Moreover, Figure 2 demonstrates that we can increase the group size ω\omega to further improve clean performance at the cost of robustness.

3. Question

Line 134: Thank you for highlighting this point. We define k’ as the number of corrupted passages and k as the total number of retrieved passages. Thus, the number of benign relevant passages is k-k’. When k’ is greater than or equal to k-k’, it becomes theoretically impossible to generate accurate responses.

Line 141: Here, “a fraction of retrieved passages” refers to k’.

We have clarified both points in our updated version.

[1] Greshake et al. Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection. AISEC2023

[2] Zou et al. Poisonedrag: Knowledge poisoning attacks to retrieval-augmented generation of large language models. USENIX 2025

评论

Dear Reviewer d65w,

Thank you for your comments and questions. We hope our responses have addressed your concerns. We appreciate your time and would be glad to hear if you have any additional questions.

Best regards,

Authors

评论

Thank you for your replies regarding weakness. I believe my concerns are addressed.

I will increase the soundness to 3.

审稿意见
3

This paper studies the safety and security issues associated with using RAG by proposing a new defense framework, RobustRAG, against retrieval corruption attacks. The main idea of the proposed method is majority vote, where the LLM generates responses based on each of the retrieved documents, and the final output is the majority vote result. The proposed RobustRAG is evaluated on open-domain QA and long-form text generation datasets, demonstrating its effectiveness and generalizability.

优点

  • The topic of the safety of RAG is trending and important.
  • The paper is well presented, and the main points are easy to follow.
  • The experimental results look OK.

缺点

There is one major weakness and several minor weaknesses.

Major:

  • First, the inference cost of the proposed framework will significantly increase compared to the vanilla RAG. Given a query, the proposed method requires generating multiple responses to determine the final majority vote. If the system employing the proposed method cannot perform parallel generation, the latency in obtaining the final answer will also increase drastically. I believe this limitation will significantly impact the real-world use of the proposed method.

Minor:

  • The overall idea of the work is not novel to me. Essentially, it is a randomized smoothing/majority vote technique applied in a new context, RAG.
  • Given the majority vote nature, the proposed scenario cannot handle cases where the number of poisoned responses exceeds the number of clean ones. I acknowledge that this limitation is not only associated with this particular work but also with methods that use the majority vote concept in general. However, I feel that in RAG, there should be better alternatives to address this security risk, such as controlling access to the RAG database.

问题

Please see my comments in the Weakness section.

评论

1. Inference Cost

We disagree with the reviewer’s comment that our RobustRAG will incur significant increases in inference time. In Table 3 (page 10), we reported the latency performance when using one GPU. RobustRAG only incurs a latency of 1.16 to 3.65X that of vanilla RAG. This is not a significant increase in inference time compared to other certification methods, such as randomized smoothing [1], which can incur a 100-1000X slowdown.

Furthermore, there appears to be a misunderstanding regarding parallel inference. Our implementation uses a single A100/H100 GPU, as conducting parallel generation within a single device is a fundamental capability of GPU. In fact, most practitioners performing RAG inference have access to comparable or even more advanced devices (which support distributed computing with multiple GPUs). To our knowledge, there are no LLM/RAG systems that lack support for parallel inference.

Moreover, we note that efficiency should not be the only metrics that matter; robustness is equally important if we target real-world applications. An efficient RAG pipeline that is prone to spreading misinformation is problematic. RobustRAG trades some of the efficiency for robustness.

2. Novelty

We clarify our paper’s novelty as follows. Randomized smoothing and majority voting are designed for simple classification tasks. In contrast, RobustRAG needs to deal with unstructured texts, and it is way more sophisticated than majority voting. We cannot use majority voting to aggregate unstructured, lengthy text outputs; this challenge is highlighted in Lines 186-190. To address this challenge, we propose two novel methods, keyword aggregation and decoding aggregation. We note that in Section 4, we only use the majority voting method as the warm-up analysis of how we conduct robustness certification in Line 315. The robustness analysis for keyword and decoding aggregation is significantly different from that of majority voting: we spent 7+ pages in Appendix A to cover all their details. The robustness analysis is by no means close to any existing methods, which should be considered as a significant contribution and novelty.

(As a side note, we are also the first to apply this certified robustness approach to the novel context of retrieval augmented generation)

3. Better alternatives

We acknowledge that alternative methods, such as controlling access to the RAG database, may help mitigate retrieval corruption attacks. However, this approach is complementary and orthogonal to our method, and both approaches can be applied simultaneously to boost robustness. Therefore, we do not believe this should be considered a limitation of our work. Moreover, we note that certified robustness is a special property we achieve in this paper; it is unclear how controlling access to the RAG database can achieve certified robustness. Additionally, in the case of an LLM-based search engine, where information is gathered through crowdsourcing, proper access control can be challenging or even impossible.

[1] Cohen et al. Certified Adversarial Robustness via Randomized Smoothing. ICML 2019

评论

Dear Reviewer zcV5,

Thank you for your review. We hope we have addressed your concerns and look forward to discussing any further questions or comments you may have.

Best regards,

Authors

审稿意见
5

The paper introduces a framework to defend RAG systems against retrieval corruption attacks by isolating passages into groups and securely aggregating responses to maintain robustness.

优点

  1. It offers formal guarantees that responses remain accurate despite partial corruption of retrieved passages.

  2. RobustRAG’s keyword and decoding aggregation methods securely handle unstructured text, making responses more resilient to manipulation​

缺点

  1. Limited Practicality of Retrieval Corruption Attacks: The assumption that attackers can inject malicious passages into a retrieval corpus may not align with real-world scenarios, especially when the documents are private or secured, limiting practical applicability of the defense.

  2. The paper lacks comparisons with other established defense methods.

3.Time Cost: RobustRAG’s isolate-then-aggregate strategy adds computational overhead and the paper should report the increased time cost in expertment.

问题

See weakness

评论

3. Time Cost

This is a misunderstanding. We have already reported the time cost in Table 3 (page 10) and analyzed the computational overhead on line 503 (page 10). Our method only incurs a latency of 1.16 to 3.65X that of vanilla RAG (using one GPU); however, we anticipate that this latency could be reduced with advanced caching methods (as discussed on page 10). As a side note, the popular certified defense for image classification, Randomized Smoothing [11], requires model predictions on many “noisy images” and can incur over 100-1000x computation overhead.

[1] BBC. Glue pizza and eat rocks: Google ai search errors go viral, 2024. URL https://www.bbc.com/news/articles/cd11gzejgz4o. Accessed: 2024-09.

[2] Carlini et al. Poisoning Web-Scale Training Datasets is Practical. IEEE S&P 2024

[3] Greshake et al. Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection. AISEC 2023.

[4] Long et al. Backdoor attacks on dense passage retrievers for disseminating misinformation. Arxiv 2024.

[5] Nestaas et al. Adversarial Search Engine Optimization for Large Language Models. Arxiv 2024.

[6] Chen et al. AGENTPOISON: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases. Arxiv 2024.

[7] Chen et al. Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models. Arxiv 2024.

[8] Zhang et al. Controlled Generation of Natural Adversarial Documents for Stealthy Retrieval Poisoning. Arxiv 2024.

[9] Zou et al. Poisonedrag: Knowledge poisoning attacks to retrieval-augmented generation of large language models. Arxiv 2023.

[10] Northcutt et al. Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks. NeurIPS 2021.

[11] Cohen et al. Certified Adversarial Robustness via Randomized Smoothing. ICML 2019.

评论

1. Limited Practicality of Retrieval Corruption Attacks

Several real-world scenarios as well as prior research works have highlighted the practicality of retrieval corruption attacks, as we note below.

(1) The first example is web-scale retrieval. As discussed in line 41 of our paper [1], the Google Search AI Overview has produced inaccurate responses, such as suggesting eat rocks due to unreliable content indexed from web pages. This example highlights how misinformation commonly spreads online, causing AI search engines highly susceptible to errors. Our method could be highly beneficial for AI search engines that rely on retrieved web information, including Google AI Overview, Bing, SearchGPT, and Perplexity AI.

Moreover, we note that malicious web-scale poisoning may be launched at a low cost. Carlini et al [2] demonstrated that they could poison 0.01% of the large web-scale LAION-400M or COYO-700M datasets (over 10910^9 images), and effectively poison the snapshot/dump of Wikipedia via malicious edits. Therefore, we argue that web-scale retrieval corruption is practical.

Once the attacker can corrupt a small fraction of the dataset, there has been a substantial body of research on attacks targeting retrieval-augmented generation (RAG) systems [3][4][5][6][7][8][9], which assume attackers can inject malicious passages.

(2) In addition to web-scale attacks; retrieval corruption can also happen in a (typically thought to be) more controlled setting: enterprise dataset. One concrete example is the attacks against RAG-based systems like Microsoft Copilot. When the user asks Copilot (or a similar application) a question, it will retrieve available data like emails, text messages, and shared documents. Notably, hackers and researchers have found that Copilot can read external messages even if the user hasn’t accepted them, due to poor access control policy. Examples: (a) the “A Way In” section of this blog post ; (b) Slide 45 of this presentation

(3) Finally, even if the database is private and curated, people can make mistakes. It is well known that curated datasets can make label errors [1]. Similar errors can also happen to curated knowledge bases, e,g, multiple copies of company holiday policy (with some copies outdated). Moreover, even when nothing is wrong with the dataset, the retriever may still retrieve some noisy or misleading passages in the benign setting. Our RobustRAG method can also help with benign “corruption” as our algorithm is agnostic to the content of corrupted passages.

2. Lack of Comparisons with Other Established Defense Methods

This is a misunderstanding. To the best of our knowledge, our method is the first and only robust RAG approach to provide certified robustness against retrieval corruption attacks. Therefore, we do not have any comparison with other defenses.

Nonetheless, we evaluated an alternative method that employs a defensive prompt. In this approach, we appended the following sentence to the end of the prompt to assist the model in identifying potential malicious contexts: Please note that not all the context information is relevant and trustworthy. You should use multiple sources to verify the information. We present the evaluation results below, in which we use the Mistral model on RealtimeQA against prompt injection attacks (PIA).

LLMMethodClean accarcyCertified accacuryRobust accarcy against PIAAttack success rate against PIA
Vanilla69%0%5%66%
MistralDefensive Prompt70%0%28%62%
Keyword (Ours)71%38%72%15%

We observe that all three methods achieve comparable clean accuracy. However, our method provides a certified accuracy of 38%, whereas the other methods fail to offer any certified robustness. Against empirical PIA attacks, our method demonstrates significantly higher robust accuracy and a substantially lower attack success rate. While the defensive prompt offers some mitigation compared to the Vanilla RAG method, its attack success rate remains considerably higher (62%) compared to our approach (15%).

If the reviewers are aware of any other established defense methods for comparison, we would love to compare.

评论

Dear Reviewer Rvsm,

As the discussion deadline approaches, we’d appreciate hearing if your concerns have been addressed. We would like to share a recent real-world case of retrieval corruption attack that led to a $2.5k loss from a user, which is discussed in this twitter/X thread.

Here is a quick summary of this anecdote:

  • A user asked ChatGPT to generate code.
  • GPT searched for references from the Internet and, unfortunately, retrieved a malicious GitHub codebase.
  • The code generated by ChatGPT embedded a malicious snippet that sent the user’s private keys to a phishing website.
  • The user, unaware of this malicious snippet, ran the code with their private key; the private key was then sent to the malicious website.
  • The attacker used the private key to make a $2.5k transaction within 30 minutes of this incident.

This incident further showcases the practicality of real-world retrieval corruption attacks, especially for RAG/LLM-powered search engines. Therefore, building a robust RAG pipeline against corrupted retrieval is an important research problem. We look forward to further discussions!

Best regards,

Authors

评论

Thanks for your reply.

This is not a well-defined problem from my perspective. The examples that are positive for me are those web-scale ones. But it is not a typical problem in RAG. The most advantage of RAG is that it can use private dataset to enrich the LLM. For web-scale poisons, solving it from the web-scale searching is more effective.

However, I appreciate the authors' efforts in this question, which makes it somehow reasonable than the original motivation, and will increase my score to 5.

评论

We appreciate the reviewer’s additional comments and questions. For the additional concerns, we will address them as follows:

1. The examples that are positive for me are those web-scale ones. But it is not a typical problem in RAG.

We are glad that the reviewer agrees with us that retrieval corruption attacks are a valid threat against web-scale RAG. Regardless of whether this use case is “typical” or not, the widespread adoption of web-search-based RAG is an undeniable indicator of the importance of this research problem. According to the blog of Google, ’AI Overviews will begin rolling out in more than 100 countries and territories around the world. With this latest expansion, AI Overviews will have more than 1 billion global users every month.’ Given this widespread adoption and the practical threat posed by poisoning websites, this is an important problem related to RAG.

2. For web-scale poisons, solving it from the web-scale searching is more effective.

We agree that building robust web-scale searching is one promising direction. However, this direction does not conflict with our generation-phase RobustRAG algorithm. Retrieval/searching-phase and generation-phase defenses are orthogonal to and compatible with each other. It is a common/recommended security practice to deploy “defense in depth”: RobustRAG can play an important role when the searching-phase defense fails.

Moreover, there is currently no evidence that retrieval/searching-phase defenses are more effective than generation-phase defenses like RobustRAG. If the reviewer is aware of any proposed defense methods, we would be eager to conduct experiments to compare their performance and show their complementary synergy.

Finally, we emphasize that certified robustness is a special property achieved by RobustRAG, and it remains unclear how web-scale searching could achieve such certified robustness.

3. The most advantage of RAG is that it can use private dataset to enrich the LLM.

First, RAG is helpful with both public and private datasets. As discussed above, web-search-based RAG like Google Search AI Overview, Perplexity, and SearchGPT have already been deployed with a large number of users.

Second, even in the enterprise setting, where the database is usually private and curated, retrieval corruption can still happen. Malicious insiders can inject malicious documents into private datasets.

  • In this Black Hat talk, the presenter, who is a practitioner at NVIDIA, discussed how an attacker can silently share a malicious document with their co-workers to corrupt RAG pipelines (Slides 58-61). He also highlighted the possibility of "insider access to RAG datastore" (top right of Slide 58).

  • The retriever may still retrieve some noisy or misleading passages in the benign setting. This has been highlighted in multiple works [1] [2].

  • Even curated datasets are not guaranteed to be error-free [3].

[1] Fang et al. Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training. ACL 2024.

[2] Wang et al. Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models. ArXiv 2024

[3] Northcutt et al. Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks. NeurIPS 2021.

AC 元评审

In this paper, the authors proposed a defense framework for RAG systems against retrieval corruption attacks. The proposed framework first isolates passages, and then gets a response for each passage and finally uses a voting scheme to aggregate responses.

There are major corners raised by the reviewers: 1, The problem setup is not convincingly defined; 2, The assumption made in the proposed method may not be practical; 3, In terms of machine learning, the proposed idea is not technically novel; 4, The experimental results are not convincing, lacking comparison results. These concerns remain after rebuttal.

Based on the major concerns mentioned above, this paper is not ready for publication.

审稿人讨论附加意见

The major concerns raised by the reviewers remain after the authors' rebuttal.

最终决定

Reject