PaperHub
Score: 5.5/10
Poster · 5 reviewers (ratings: 3, 4, 2, 3, 3; min 2, max 4, std 0.6)
ICML 2025

On the Vulnerability of Applying Retrieval-Augmented Generation within Knowledge-Intensive Application Domains

OpenReview · PDF
Submitted: 2025-01-24 · Updated: 2025-07-24

Abstract

Keywords
Healthcare; Safety; RAG

Reviews and Discussion

Review (Rating: 3)

This paper investigates the vulnerabilities of retrieval systems to various poisoning attacks. The authors first analyze multiple corpora, retrievers, and datasets, highlighting the significant safety risks in retrieval. They then attribute retriever failures to the limitations of the existing document embedding distance metric. Finally, they propose a new metric that more effectively differentiates between clean and poisoned documents.

Update after rebuttal

Thank you for your response and the insightful experiments. The authors have addressed most of my concerns, and I have accordingly increased my original score.

Questions for Authors

The concept of orthogonal augmentation is intriguing, but it appears to be influenced by the principles of modern Hopfield networks. When the model's query q and the memory document p are orthogonal, the energy function is significantly lower, making knowledge retrieval more efficient. Could you provide a mathematical explanation for why orthogonal augmentation enhances retrieval? Is it because the target attack document embedding is injected into the original query q, thereby modifying its representation? This interpretation seems reasonable to me.

Claims and Evidence

  1. The study of safety risks in health and legal document retrieval systems includes detailed experiments, but the number of retrieved documents remains limited.
  2. The observation that poisoned documents exhibit orthogonal augmentation with their corresponding clean queries is interesting.
  3. The proposed new defense method is well-supported by evidence.

Methods and Evaluation Criteria

The proposed defense method and analysis of retrieval system vulnerabilities in RAG are reasonable. However, I believe the authors could incorporate additional evaluation metrics, such as the MRR score, to strengthen the assessment.
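(For reference, MRR averages the reciprocal rank of the first relevant — here, poisoned — document per query. A minimal sketch with illustrative names, not the paper's code:)

```python
# Mean Reciprocal Rank: `ranks` holds the 1-based rank of the first
# relevant (e.g., poisoned) document retrieved for each query.
def mean_reciprocal_rank(ranks):
    return sum(1.0 / r for r in ranks) / len(ranks)

print(mean_reciprocal_rank([1, 3, 2]))  # (1 + 1/3 + 1/2) / 3 ≈ 0.611
```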

Theoretical Claims

There is no theoretical claim in the paper.

Experimental Designs and Analyses

  1. You should include additional retrieval cases to better demonstrate the effectiveness of your method, especially in the legal domain.
  2. The paper uses only the ℓ2-norm-based defense as a baseline for evaluating the proposed method. This baseline is relatively simple, making the comparison less convincing.
  3. Balancing performance and efficiency: While ensuring the effectiveness of the defense method, how can its application efficiency be improved for large-scale data and real-time systems? Is there room for further optimization to meet practical performance requirements?
  4. Why not use cosine similarity? Have the results under cosine similarity been evaluated?

Supplementary Material

I reviewed all supplementary materials.

Relation to Existing Literature

It proposes a detection method to enhance the defense against attacks on RAG systems.

Missing Essential References

More related work on attacks against RAG systems is needed, such as HijackRAG [Zhang'24], AgentPoison [Chen'24], and BadRAG [Xu'24].

Other Strengths and Weaknesses

  1. The description of the defense is not clearly explained and may confuse readers.
  2. How about top-5 and top-10 retrieval performance?

Other Comments or Suggestions

There are no obvious typos.

Ethics Review Concerns

I would recommend adding a flag in the abstract to indicate potential harmful information. This would help in identifying and addressing any sensitive or risky content upfront, ensuring that readers are aware of it before diving deeper into the paper.

Author Response

We sincerely thank the reviewers for investing their time and effort in reviewing our manuscript and providing valuable feedback. We will address their comments point by point in the following and incorporate them into our revision.

Q: additional retrieval cases ..., especially in the legal domain.

R: We actually included the experiments on the legal domain in Table 10 in Section D in the supplementary material. Overall, we observe similar attack success rates in the legal domain as in the medical domain presented in the main text. We will highlight this in the revised version.

Q: l2 norm baseline relatively simple

R: The ℓ2-norm is a simple but standard baseline in much of the existing literature [1,2,3]. To address your concern, following [1,2,3], we also include the perplexity filter, which examines the perplexity of the text, as an additional baseline. The results are shown in the table below, where we observe that the perplexity filter is not effective, thus validating the effectiveness of our method.

Table: Detection rate of the perplexity filter on the MedMCQA corpus. The results are averaged over 5 runs.

Datasets           MMLU-Med   MedQA   BioASQ
Detection Rates    0.09       0.14    0.16

Q: application efficiency be improved for large-scale data and real-time systems?

R: We are not 100% certain regarding what you mean by "application efficiency." So we will refer to the computation cost/efficiency of our method and respond accordingly. The computation of our defense method is very light. We only need to compute the covariance offline once, and then during inference time, we only need to calculate the Mahalanobis distance once for each query, which is very efficient. To further improve efficiency, we can take a batch of queries and compute the Mahalanobis distance for all queries in the batch simultaneously.
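(A minimal sketch of the computation described above — offline covariance fit once, then batched Mahalanobis distances at inference. The names `clean_emb` and `Q` are illustrative, not the authors' code:)

```python
import numpy as np

def fit_clean_stats(clean_emb):
    """Offline, once: mean and inverse covariance of clean-corpus embeddings (n x d)."""
    mu = clean_emb.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(clean_emb, rowvar=False))
    return mu, cov_inv

def batched_mahalanobis(Q, mu, cov_inv):
    """Online: Mahalanobis distances for a whole batch of query embeddings Q (m x d)."""
    diff = Q - mu
    # einsum evaluates diff_i^T @ cov_inv @ diff_i for every row at once
    return np.sqrt(np.einsum("id,de,ie->i", diff, cov_inv, diff))
```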

Q: cosine similarity

R: The use of inner product is a common practice in current literature [1,2,3]. To address your concern, we also evaluated cosine similarity and the summarized results are shown in the table below. We observe similar high attack success rates when using cosine similarity. We will include this in the revised version.

Table: Top-2 retrieval success rates under cosine similarity with Contriever as the retriever and MedMCQA, PubMedQA as the query corpora.

Corpus        Attack Success Rate
Textbook      0.95
StatPearls    0.94
PubMed        0.90
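(To make the contrast between the two scoring rules concrete, a minimal sketch on placeholder embeddings — `doc_emb` and `q_emb` are random stand-ins, not the paper's data:)

```python
import numpy as np

doc_emb = np.random.randn(1000, 768)   # placeholder document embeddings
q_emb = np.random.randn(768)           # placeholder query embedding

ip = doc_emb @ q_emb                                               # inner product
cos = ip / (np.linalg.norm(doc_emb, axis=1) * np.linalg.norm(q_emb))  # cosine

top2_ip = np.argsort(-ip)[:2]
top2_cos = np.argsort(-cos)[:2]
# The two rankings coincide when all document embeddings share the same norm;
# otherwise cosine discounts documents with large embedding norms.
```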

Q: The description of the defense is not clearly explained and may confuse readers.

R: We will revise the description of the defense in the main text to make it clearer. The main workflow of our defense is as follows: the defender selects a clean corpus (corresponding to a set of queries) to be protected. Then, the defender computes the covariance of the embeddings of the clean corpus. At inference time, for each query, the defender computes the Mahalanobis distance between the query and the clean corpus; if the distance exceeds a threshold, the query is rejected.
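(A minimal end-to-end sketch of this workflow. The percentile-based threshold calibration is our illustrative assumption, not necessarily the paper's exact procedure:)

```python
import numpy as np

def calibrate(clean_emb, percentile=99.0):
    """Fit clean statistics and pick a threshold that accepts ~99% of clean data."""
    mu = clean_emb.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(clean_emb, rowvar=False))
    diff = clean_emb - mu
    dists = np.sqrt(np.einsum("id,de,ie->i", diff, cov_inv, diff))
    return mu, cov_inv, np.percentile(dists, percentile)

def reject(x_emb, mu, cov_inv, tau):
    """Reject an incoming embedding whose Mahalanobis distance exceeds tau."""
    diff = x_emb - mu
    return np.sqrt(diff @ cov_inv @ diff) > tau
```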

Q: How about top-5 and top-10 retrieval performance?

R: The retrieval rates reported in the main text are a non-decreasing function of the value of k used in the retrieval, so the top-5 and top-10 retrieval performance will be no worse than the top-2 performance. In particular, we report the results for k = 10 below, where we observe near-perfect retrieval rates. We will include this in the revised version.

Table: Top-10 retrieval success rates with Contriever as the retriever and MedMCQA, PubMedQA as the query corpora.

Corpus        Attack Success Rate
Textbook      0.99
StatPearls    0.98
PubMed        0.94

Q: More related work and flag in the abstract for potential harmful information

R: We will include more related work as you suggested. We will also add a flag in the abstract to indicate that the paper contains potentially harmful information.

Q: Math explanation

R: Providing a rigorous mathematical justification is challenging due to the sparse, preliminary theoretical understanding of transformer-based models [4]. We aim to address this in future work.

Refs: [1] Xiong et al., "Benchmarking Retrieval-Augmented Generation for Medicine", ACL 2024

[2] Miao et al., "Integrating Retrieval-Augmented Generation with Large Language Models in Nephrology: Advancing Practical Applications", Medicina (Kaunas). 2024

[3] Zou et al., "PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented Generation of Large Language Models", USENIX Security, 2025

[4] Tian et al., "Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer", NeurIPS 2023.

Reviewer Comment

Thank you for your response and the insightful experiments. The authors have addressed most of my concerns, and I have accordingly increased my original score.

Author Comment

We are very pleased to have addressed your concerns and thank you very much for raising the score!

Review (Rating: 4)

This paper focuses on the adversarial robustness of the retrieval system of RAG against data poisoning attacks. Three major safety risks are discussed, including the leakage of PII, adversarial recommendations, and the vulnerability to jailbreaking attacks. Extensive experiments on five Medical QA datasets demonstrate the prevalence of such risks, i.e., the retrieval systems used for medical QA are universally vulnerable. This paper also discusses the possible reason for such risk and proposes a new defense to mitigate them.

Questions for Authors

  1. In Lines 105 (left) and 150 (right), the notation f refers to the retrievers and the embedding function, respectively. Could the authors please provide some explanation regarding the definitions of the retrievers?
  2. Besides, the results in Sections 3-5 heavily rely on the embedding function of the input. However, my past experience implies that the performance of embedding models released before 2024, especially token-level models like BERT, is far from satisfactory. As mentioned in the "Experimental Designs Or Analyses" part, I suppose the retrievers used in this paper are slightly out of date. (P.S. I am unfamiliar with RAG research, and thus I cannot provide exact references.) Are there any retrieval systems that are based on SOTA embedding models, e.g., text-embedding-3 from OpenAI?

Claims and Evidence

According to Section 1.1, the main claims include:

  1. Revealing the safety risks for the retrieval system. Three safety risks are mentioned, including the leakage of PII, adversarial recommendations, and the vulnerability to jailbreaking attacks.
  2. Providing an explanation for the vulnerability of retrieval system to data poisoning attacks.
  3. Proposing a new defense method against universal poisoning attacks.

All the claims are well supported by experiments.

Methods and Evaluation Criteria

The experimental results of this paper evaluate and reveal the safety risk of the existing retrieval systems. The methods and evaluation criteria make sense for the application.

Theoretical Claims

This paper does not provide theoretical analysis.

Experimental Designs and Analyses

I have checked the soundness/validity of the experimental designs (regarding the main claim of this paper) in Section 3. My main concern is that the retrievers mentioned in Lines 185-192 (right) are slightly out-of-date.

Supplementary Material

I have reviewed Sections A and B of the supplementary materials. I believe Sections C and D are not directly related to the main contribution of this paper.

Relation to Existing Literature

Section 1.2 has comprehensively discussed the related study of this paper.

Missing Essential References

I am not aware of such related works.

Other Strengths and Weaknesses

  1. This paper is well-organized and carefully written. The seemingly unrealistic assumptions are explained in the remarks. The preliminary section is very helpful for readers unfamiliar with the topic.

Other Comments or Suggestions

  1. I suggest including more illustrative examples to further improve the presentation of this paper. The examples given in Figure 1 seem artificial. In Section A, some examples from the real dataset are presented. I suppose presenting some real-world examples can better illustrate the safety risk faced by the RAG systems.

Author Response

We sincerely thank the reviewers for investing their time and effort in reviewing our manuscript and providing valuable feedback. We will address their comments point by point in the following and incorporate them into our revision.

Q: In Lines 105 (left) and 150 (right), the notation refers to the retrievers and the embedding function, respectively. Could the authors please provide some explanation regarding the definitions of the retrievers?

R: In our paper, we use the terms "retriever" and "embedding function" interchangeably; both take an input text and output a vector representation of the text. We will clarify this point to avoid confusion in the revised version.

Q: Besides, the results in Sections 3-5 heavily rely on the embedding function of the input. However, my past experience implies that the performance of those embedding models released before 2024, especially those token-level models like BERT, is far from satisfactory. As mentioned in the "Experimental Designs Or Analyses" part, I suppose the retrievers used in this paper are slightly out of date. (P.S. I am unfamiliar with RAG's research, and thus, I cannot provide exact references.) Are there any retrieval systems that are based on SOTA embedding models, e.g., text-embedding-3 from OpenAI?

R: The three retrievers used in the main text are state-of-the-art models for RAG, following very recent literature [1,2,3]. To address your concern, we also include results for text-embedding-3 from OpenAI in the table below. We observe that the attack success rates are similar to those of the state-of-the-art models. We will include this in the revised version.

Table: Top-2 retrieval success rates with text-embedding-3 as the retriever and MedMCQA, PubMedQA as the query corpora.

Corpus        Attack Success Rate
Textbook      0.95
StatPearls    0.90
PubMed        0.83

Q: I suggest including more illustrative examples to further improve the presentation of this paper. The examples given in Figure 1 seem artificial. In Section A, some examples from the real dataset are presented. I suppose presenting some real-world examples can better illustrate the safety risk faced by the RAG systems.

R: We will follow your suggestions to replace the current Figure 1 and include some real-world examples in the main text to better represent the safety risks faced by RAG systems.

Refs: [1] Xiong et al., "Benchmarking Retrieval-Augmented Generation for Medicine", ACL 2024

[2] Miao et al., "Integrating Retrieval-Augmented Generation with Large Language Models in Nephrology: Advancing Practical Applications", Medicina (Kaunas). 2024

[3] Zou et al., "PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented Generation of Large Language Models", USENIX Security, 2025

Reviewer Comment

Thanks to the authors for their reply. Most of my concerns are addressed. It is a surprising result that the SOTA embedding model does not significantly affect the attack success rate. Could the authors explain the intuition behind this?

Review (Rating: 2)

This paper explores a characteristic of poisoned documents in embedding spaces termed the orthogonal augmentation property. It suggests that appending target information to a poisoned document containing the target query shifts its embedding orthogonally to the query, preserving its retrievability by the query. The authors analyze how this property enables certain attacks and propose a corresponding defense.

Questions for Authors

  1. It is common practice to append potential queries to documents to improve retrievability, a technique also known as Doc2Query [1]. How would the findings of this paper and the proposed defense apply to retrieval systems that incorporate Doc2Query?

[1] Document expansion by query prediction, 2019.

Claims and Evidence

The claim that dense retrievers are vulnerable to universal poisoning attacks is well-supported by extensive experiments. The Orthogonal Augmentation Property claims are also backed by experiments, and the proposed defense is evaluated empirically. However, I have some concerns about the experimental results related to the Orthogonal Augmentation Property and the defense's effectiveness (see Weaknesses).

Methods and Evaluation Criteria

Yes

Theoretical Claims

N/A

Experimental Designs and Analyses

The experiments on medical retrieval systems appear sound.

Supplementary Material

N/A

Relation to Existing Literature

The paper tries to explain the success of poisoning attacks against RAG by analyzing the behavior of poisoned documents in dense retrieval embedding spaces and provides experimental insights. It also proposes a new defense against a specific type of such attack.

Missing Essential References

N/A

Other Strengths and Weaknesses

Strengths:

  1. Provides some insights into the behavior of poisoned documents in embedding spaces.
  2. Conducts comprehensive experiments on different medical datasets and retrievers to evaluate the retrieval of poisoned documents.

Weaknesses:

  1. Universal poisoning attacks against RAG have already been shown to be effective [1]. The paper dedicates significant space to revalidating this on medical retrieval systems, which seems of limited value.
  2. The orthogonal augmentation property essentially demonstrates a simple fact: a poisoned document containing the target query is naturally more retrievable than a clean document by the target query. The necessity of proving this through a convoluted approach is unclear.
  3. The proposed defense requires access to a collection of clean documents and prior knowledge of the target queries, which is a strong assumption, making it impractical in real-world scenarios.
  4. The paper focuses only on dense retrievers, leaving it unclear how the findings extend to models like ColBERT or retrieval architectures incorporating cross-encoder re-rankers, which are widely used.

[1] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models, USENIX Security'25.

Other Comments or Suggestions

N/A

Author Response

We sincerely thank the reviewers for investing their time and effort in reviewing our manuscript and providing valuable feedback. We will address their comments point by point in the following and incorporate them into our revision.

Q: Universal poisoning attacks against RAG have already been shown to be effective [1] ...

R: Thank you for your valuable suggestions regarding the work in [1]. Overall, our work differs from [1] in terms of goals, insights, and application scenarios, despite the similarity in attack implementations. Detailed discussions are provided as follows.

  • Goal. Our objective is to investigate and comprehend the robustness of retrieval systems employed in RAG. We achieve this by injecting various types of information, encompassing both irrelevant and relevant content, into the corpus and evaluating the ease or difficulty of retrieval. The goal of [1], on the other hand, is to inject only query-relevant context to deceive the LLM's generation process upon retrieval.
  • Insights. We provide explanations on the difficulty/easiness of the retrieval of different kinds of information, which is not covered in the work of [1]. Moreover, based on the developed insights, we propose a new defense that can effectively filter out the adversarial documents generated by [1].

Q: The orthogonal augmentation property essentially demonstrates a simple fact ...

R: The orthogonal augmentation property concerns how the embedding vectors produced by text embedding models change when the text input is manipulated. Specifically, orthogonal augmentation examines the relationship between f(q) and the change in the embedding space that occurs when shifting q to q ⊕ p, namely v ≜ f(q ⊕ p) − f(q), for two documents p and q, which can be either semantically relevant or irrelevant (as discussed in Section 4).
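(An illustrative way to probe this property empirically, using sentence-transformers as a stand-in retriever; the model name, example texts, and space-concatenation scheme are our assumptions, not the paper's setup:)

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # stand-in retriever

def angle_deg(a, b):
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

q = "What is the first-line treatment for hypertension?"  # hypothetical query
p = "Some appended target information."                   # hypothetical payload
f_q = model.encode(q)
v = model.encode(q + " " + p) - f_q   # v = f(q ⊕ p) - f(q)
print(angle_deg(f_q, v))              # near 90° if the augmentation is orthogonal to f(q)
```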

Overall, we feel it is essential to understand the orthogonal augmentation property, as it appears in many poisoning attacks against text embedding models. We investigated this property both theoretically and empirically. It turns out that the current literature on the theoretical understanding of transformer-based models is sparse and preliminary. For example, recent research, such as [A] published in NeurIPS 2023, studied the weight dynamics of transformers under strong assumptions that may not align with real-world use cases, such as single-layer self-attention, no positional encoding, and excessively long input contexts. As a result, in the paper we present an empirical study of the orthogonal augmentation property.

Q: The proposed defense requires access to a collection of clean documents ...

R: We feel that the assumption is reasonable in the following sense. First, it is straightforward to see (and can be theoretically proven) that the defender cannot protect the entire corpus/query space, because that would require the whole input space to be clean. As a result, a more feasible solution is to identify a set of important queries and their corresponding corpus, and then prioritize protecting these selected ones. In fact, many RAG poisoning attacks are targeted attacks in the sense that the attacker only wants to poison/manipulate a small set of queries [2,3,4]. So we believe that such an assumption is reasonable. We will clarify this in the revised version.

Q: The paper focuses only on dense retrievers, leaving it unclear how the findings extend to models like ColBERT or retrieval architectures incorporating cross-encoder re-rankers ...

R: The MedCPT retriever used in our paper actually uses a cross-encoder re-ranker architecture tuned on medical data (https://github.com/ncbi/MedCPT). As a result, we believe the observations and findings apply to cross-encoder re-rankers. We also include the results of the closed-source text-embedding-3 from OpenAI in the table below. We observe that the attack success rates are similar to those of the state-of-the-art models. We will include this in the revised version.

Table: Top-2 retrieval success rates with text-embedding-3 as the retriever and MedMCQA, PubMedQA as the query corpora.

Corpus        Attack Success Rate
Textbook      0.95
StatPearls    0.90
PubMed        0.83

Refs: [A] Tian et al., "Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer", NeurIPS 2023.

[2] Xiong et al., "Benchmarking Retrieval-Augmented Generation for Medicine", ACL 2024

[3] Miao et al., "Integrating Retrieval-Augmented Generation with Large Language Models in Nephrology: Advancing Practical Applications", Medicina (Kaunas). 2024

[4] Zou et al., "PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented Generation of Large Language Models", USENIX Security, 2025

Reviewer Comment

I appreciate the clarifications on the experimental setup, orthogonal augmentation, and retriever choices. However, I still have a few concerns:

  1. Although attackers typically target a small set of queries, defenders do not know in advance which ones will be attacked. In practice, this means defenders must aim to protect as many queries as possible. Moreover, the assumption that defenders have access to clean, unpoisoned documents may be difficult to guarantee, especially in settings where poisoning is possible.
  2. As mentioned earlier in Questions, many retrieval systems adopt Doc2Query, where queries are appended to clean documents. This could significantly impact both the paper’s conclusions and the effectiveness of the proposed defense.

Therefore, I am inclined to maintain my current score.

Review (Rating: 3)

The paper explores the vulnerability of Retrieval-Augmented Generation (RAG) systems, specifically in knowledge-intensive domains like medical and legal Q&A. The authors demonstrate that retrieval models used in RAG are susceptible to universal poisoning attacks, where adversaries inject manipulated documents into a corpus to influence retrieval outcomes. By conducting extensive experiments across 225 different combinations of corpus, retriever, query, and targeted information, they reveal how poisoned documents can consistently be retrieved at high ranks. The paper further investigates the underlying reasons for this vulnerability, introducing the concept of orthogonal augmentation, which explains how document embeddings are manipulated to maintain high similarity with queries. To mitigate this risk, the authors propose a detection-based defense mechanism leveraging Mahalanobis distance and covariance shrinkage, demonstrating high success rates in identifying poisoned documents.

Questions for Authors

See the Weaknesses.

Claims and Evidence

Yes

Methods and Evaluation Criteria

Yes

Theoretical Claims

Yes

Experimental Designs and Analyses

Yes

Supplementary Material

Yes

Relation to Existing Literature

The paper proposes a poisoning attack for RAG.

Missing Essential References

Yes

Other Strengths and Weaknesses

Strengths

  • The paper highlights a crucial security issue in retrieval-based systems, especially in sensitive fields like healthcare and legal AI applications, where misinformation or adversarial manipulation can have serious consequences.
  • The work presents an insightful observation about how dense retrieval models process concatenated adversarial text, offering a new perspective on why poisoning attacks succeed.
  • The paper explores variations of the attack, including paraphrased queries, showing that the attack remains effective even when exact query matches are unavailable.

Weaknesses

  • Lack of Novelty. The method appears to be simply an integration of a poisoning attack within RAG, where the attacker-defined query consistently results in a high-ranking retrieval of the poisoned document.
  • Although the paper partly relaxes the assumption by showing robustness under paraphrasing, the attack still relies on knowing the typical structure of medical queries.
  • The proposed detection method based on Mahalanobis distance with covariance shrinkage appears effective empirically; however, its performance may be highly sensitive to the choice of the shrinkage parameter (β) and the quality of the anchor set.

Other Comments or Suggestions

See the Weaknesses.

Author Response

We sincerely thank the reviewers for investing their time and effort in reviewing our manuscript and providing valuable feedback. We will address their comments point by point in the following and incorporate them into our revision.

Q: Lack of Novelty. The method appears to be simply an integration of a poisoning attack within RAG, where the attacker-defined query consistently results in a high-ranking retrieval of the poisoned document.

R: The novelty of our work lies in the following aspects:

  • We systematically investigate and comprehend the robustness of retrieval systems employed in RAG, by introducing a new attack. We achieve this by injecting various types of information, encompassing both irrelevant and relevant content, into the corpus and evaluating the ease or difficulty of retrieval.
  • We propose the orthogonal augmentation property to explain the widespread success of the attack, and provide empirical evidence to support this property.
  • We propose a new defense based on the orthogonal augmentation property, which can effectively filter out the adversarial documents.

Given the relatively new development of safety in RAG, e.g., poisoning attacks, we do believe the above listed contributions are novel and valuable. We will clarify this in the revised version.

Q: Although the paper partly relaxes the assumption by showing robustness under paraphrasing, the attack still relies on knowing the typical structure of medical queries.

R: We have included additional experiments on the legal domain in Table 10 in Section D of the supplementary material. Overall, we observe similar attack success rates in the legal domain as in the medical domain presented in the main text. Therefore, we believe that the proposed attacks can be generalized to a wide range of domain applications. We will clarify this in the revision.

Q: The proposed detection method based on Mahalanobis distance with covariance shrinkage appears effective empirically; however, its performance may be highly sensitive to the choice of the shrinkage parameter (β) and the quality of the anchor set.

R: We have included an ablation study on the selection of β in Table 8 in Section C of the supplementary material. We observed that the detection performance remains stable across a wide range of β values. Regarding the anchor set, if the anchor set is non-informative, it becomes theoretically infeasible to implement any effective defense. We will include more discussion and empirical results in the main text.
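(For concreteness, one common covariance-shrinkage form — an assumption on our part; the paper's exact estimator may differ — interpolates the sample covariance toward a scaled identity:)

```python
import numpy as np

def shrunk_covariance(clean_emb, beta):
    """Shrinkage toward an isotropic target; beta in [0, 1]."""
    cov = np.cov(clean_emb, rowvar=False)
    d = cov.shape[0]
    target = (np.trace(cov) / d) * np.eye(d)   # scaled-identity target
    return (1.0 - beta) * cov + beta * target
```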

Reviewer Comment

I thank the authors for their detailed responses. The authors have addressed most of my concerns. Despite the (still) limited novelty, I raise my score.

Review (Rating: 3)

This paper demonstrates the vulnerability of retrieval systems in RAG to universal poisoning attacks. Through examples in medical Q&A, the paper reveals that, due to the orthogonal augmentation property, the shift from the query's embedding to that of the poisoned document tends to lie in an orthogonal direction, which means the poisoned document and the query retain high similarity, thereby enabling successful retrieval and poisoning attacks. Based on these findings, the paper develops a new detection-based defense, achieving a high level of accuracy.

Questions for Authors

  • The authors select three retrievers based on their general ability / domain design. Would it make sense to include retrievers with more architectural / training-regime variation?
  • The authors mention that they check the general attack areas / target datasets by using GPT-3 to check their semantic closeness. It could be helpful to include a summary in the appendix.

Claims and Evidence

The paper makes the key claim that, due to the orthogonal augmentation property of the embeddings, the high similarity between the poisoned document and the query is maintained, enabling the poisoning attacks. The claim is backed by experiments on multiple dense retrievers (e.g., Contriever and MedCPT) using the MedQA dataset. By varying the lengths and similarities of p relative to q and measuring four different similarity metrics, the results show that as the inner product f(q)^T f(p) decreases, the concatenated embedding remains largely aligned with f(q) (with an angle close to 90° for the augmentation vector v), supporting the claim.

One weakness is that the orthogonal augmentation property relies on the behavior of the embedding function, specifically its approximate linearity under concatenation. Given that Contriever and MedCPT have different sensitivity to document length, the claim might vary with retrieval architecture or training regimes, and does not necessarily support the notion of "universal".

The other weakness is that the claim rests on a further underlying assumption: that the concatenated adversarial information is nearly orthogonal to the original query. That might not be the case, and being orthogonal does not necessarily mean the underlying documents are semantically unrelated.

Methods and Evaluation Criteria

This paper demonstrates the vulnerability of RAG systems to poisoning attacks and measures the attack success rate with an appropriate ablation study over the top-K retrieved results. The methods and evaluation criteria make sense for the main claim. However, for the new detection mechanism, the paper measures only precision, not recall.

Theoretical Claims

There is no theoretical claim in this paper.

Experimental Designs and Analyses

The experimental designs are sound, to the best of my knowledge.

Supplementary Material

I have reviewed all supplementary materials. The authors mention that the details of the retrievers are in the appendix, but I have not found those details.

Relation to Existing Literature

Essential references related to this paper include previous works on RAG systems and adversarial attacks on RAG. The authors cite a few key works in the field and clarify their unique contributions, specific to adversarial attacks on the retrieval component of RAG and to the healthcare knowledge domain.

Missing Essential References

There is other literature regarding LLM poisoning attacks.

Other Strengths and Weaknesses

This paper is clear and well-motivated, and the research direction has tremendous impact on real-world AI applications.

Other Comments or Suggestions

N/A

Author Response

We sincerely thank the reviewers for investing their time and effort in reviewing our manuscript and providing valuable feedback. We will address their comments point by point in the following and incorporate them into our revision.

Q: One weakness is that the orthogonal augmentation property relies on the behavior of the embedding function, specifically its approximate linearity under concatenation. Given that Contriever and MedCPT have different sensitivity to document length, the claim might vary with retrieval architecture or training regimes, and does not necessarily support the notion of "universal". The authors select three retrievers based on their general ability / domain design. Does it make sense to have retrievers that have more architectural / training regime variations?

R: The three retrievers used in our paper already differ in architecture and training regime. For example, Contriever is a single-tower model trained with a contrastive objective (InfoNCE loss) on general-domain data, while MedCPT is a cross-encoder re-ranker model with two-stage training (InfoNCE contrastive learning followed by cross-entropy re-ranking) for medical purposes. We will add results on closed-source models such as text-embedding-3 from OpenAI (shown in the table below) in the revised version to further validate the generality of our findings.

Table: Top-2 retrieval success rates with text-embedding-3 as the retriever and MedMCQA, PubMedQA as the query corpora.

Corpus        Attack Success Rate
Textbook      0.95
StatPearls    0.90
PubMed        0.83
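(To make the "contrastive learning training regime" concrete, here is a compact InfoNCE sketch with in-batch negatives — the standard formulation, not the authors' or Contriever's actual training code:)

```python
import torch
import torch.nn.functional as F

def info_nce(q_emb, d_emb, tau=0.05):
    """q_emb, d_emb: (B, dim) paired query / positive-document embeddings."""
    logits = q_emb @ d_emb.T / tau              # (B, B); off-diagonal entries act as in-batch negatives
    labels = torch.arange(q_emb.size(0), device=q_emb.device)
    return F.cross_entropy(logits, labels)      # positive pair sits on the diagonal
```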

Q: The other weakness is that another underlying assumption for the claim is that the concatenated adversarial information is nearly orthogonal to the original query, but that might not be the case and being orthogonal does not necessarily mean the underlying documents are semantically unrelated.

R: We thank you for this sharp comment. We addressed this point in Lines 352-362 (left) of the main text. We found that near-orthogonality between embeddings does not imply that their associated documents are semantically irrelevant. For example, we randomly sampled two non-overlapping batches of questions from the MedQA dataset and found that the angle between their embeddings is around 70°. Yet, these batches of queries are all semantically related to biology research questions. As a result, we believe the stated assumption is reasonable to a certain degree. We will further clarify this in the revised version.

Q: The authors mention that they check the general attack areas / target datasets by using GPT-3 to check their semantic closeness. It could be helpful to include some summary in the appendix?

R: We will include the summary of the semantic closeness check in the appendix as you suggested.

Q: I have reviewed all supplementary materials - the authors mention that the details of the retrievers are in the appendix but I have not found those details.

R: We will include the details of the retrievers in the appendix as you suggested.

Final Decision

This paper presents a thorough empirical investigation into the vulnerability of retrieval systems in RAG under universal poisoning attacks, with valuable insights such as the orthogonal augmentation property and a lightweight detection defense. While some concerns remain regarding novelty and real-world assumptions, the overall contribution is timely and relevant to the safety of knowledge-intensive applications. I recommend a weak accept.