PaperHub
Rating: 7.5/10 (Spotlight · 4 reviewers)
Reviewer scores: 8, 8, 6, 8 (min 6, max 8, std 0.9)
Confidence: 3.0 · Correctness: 2.8 · Contribution: 3.0 · Presentation: 2.5
ICLR 2025

How new data permeates LLM knowledge and how to dilute it

OpenReview · PDF
Submitted: 2024-09-28 · Updated: 2025-03-02

Abstract

Keywords
fine-tuning, hallucinations, knowledge injection, memory, LLMs

Reviews and Discussion

Review
Rating: 8

This work explores the learning of new texts, one at a time, and asks: how does it impact the underlying LLM knowledge? This work shows that learning new texts induces "priming", an undesirable effect that pollutes existing knowledge where it should not. In addition, this paper demonstrates that the authors can predict how much priming will happen after learning, using token probability before learning. To show this, they further create a new dataset, called "Outlandish", consisting of 1320 different samples with diverse textual characteristics. Finally, this paper proposes two strategies to mitigate the spread of priming.

Strengths

(1) This paper addresses an important topic in the field of large language models (LLMs), namely how new data can alter existing knowledge within these models.

(2) The authors develop a new dataset, "Outlandish," to study the impact of new texts on LLM knowledge.

(3) Finally, the authors demonstrate how a simple text augmentation technique, as well as a simple yet novel update pruning technique, can modulate how much training on new texts affects unrelated knowledge, enhancing the specificity of gradient-based learning.

Weaknesses

No major weaknesses.

Questions

Have you tried more models?

Comment

We thank the Reviewer for their insightful question and their appreciation of our work. We are delighted to answer their question with a “yes” as we have indeed explored the behavior of other models beyond PALM-2, specifically Llama and Gemma. Interestingly, we observed striking differences in their priming dynamics during learning, despite all models exhibiting a relationship between pre-learning keyword probability and the degree of priming. These distinct behaviors, while fascinating, were difficult to interpret due to the numerous architectural and training differences between these models.

Inspired by the Reviewer’s question, and to isolate the potential impact of fine-tuning on priming, we conducted additional experiments with an LLM fine-tuned on the FLAN dataset (new Fig. 20, new pdf attached). This allowed us to directly compare a pre-trained PALM-2 model with a fine-tuned version sharing the same fundamental architecture. This controlled comparison revealed a key finding: fine-tuning disrupts the correlation between priming and memorization. We hypothesize that this may be due to a collapse of the hypothesis space during fine-tuning, affecting how the model incorporates new information.

This new analysis, motivated by the Reviewer’s question, provides valuable insights into how different training procedures can influence priming dynamics. We believe these findings contribute significantly to understanding the complexities of priming in large language models and have implications for mitigating its potential negative effects.


We thank the Reviewer's appraisal of our study; their question already inspired further experiments which we discussed above. If there are any further questions about our work, or suggestions that may lead to further increases in score, we hope the Reviewer will feel free to let us know.

Comment

Dear Reviewer Qkp4, We wanted to say in advance our heartfelt thanks for the time and effort you've put into both the reviews already completed and the upcoming discussion period. We know that you all have your own papers to deal with during this busy time, and we sincerely appreciate the time you've taken to spend on ours.

We are so excited about this paper and its findings, so we are very much looking forward to the upcoming discussions with you! Please don't hesitate to ask us any questions big or small, and we are happy to provide any further clarifications.

Comment

Thanks for your response. The response has addressed my main concerns. I will increase my score to 8.

Comment

Dear Reviewer Qkp4, Thank you! We hope you enjoyed reading our work as much as we enjoyed conducting it. And we wish you good fortune in your own papers this year.

Review
Rating: 8

The paper presents a novel approach by adapting the concept of "priming" from experimental psychology to evaluate how new data influences existing knowledge in LLMs. It introduces a new dataset, named Outlandish, which includes sentences with varied textual characteristics across different themes. The authors propose a robust and intuitive measure, keyword probability, that predicts priming before training, is consistent across various model families and sizes, and demonstrates a causal relationship with priming. Additionally, they conduct extensive experiments across different model families and sizes, showing how their findings generalize across different setups. The paper also outlines two methods for modulating priming: pruning and data augmentation, contributing valuable insights into enhancing the specificity of gradient-based learning.

Strengths

S1. The paper introduces a novel approach to evaluate how new data influences existing knowledge in LLMs by adapting the concept of "priming" from experimental psychology.

S2. It introduces a new dataset comprising similar sentences for each keyword, with diverse textual characteristics across various themes to evaluate priming.

S3. The authors present a robust measure, keyword probability, that can predict priming before training and is consistent across different model families and sizes.

S4. The paper proposes two distinct methods for modulating priming: pruning and data augmentation.

Weaknesses

W1. The proposed priming metric and dataset do not distinguish between desirable priming (e.g., generalization) and undesirable priming (e.g., hallucinations). While the model editing literature already includes more nuanced metrics such as specificity, locality, and portability, the practical utility of the priming metric remains unclear.

W2. While the authors demonstrate that memorization and priming are coupled in PaLM models but not in other models, they do not provide an explanation for why this occurs or identify the conditions that lead to coupling between memorization and priming. Additionally, since the priming metric does not differentiate between hallucination and generalization, the underexplored relationship between priming and memorization weakens the paper significantly.

W3. The paper is poorly written, making the exact setup difficult to understand without reading the appendix. Including training details in the main text would improve clarity. The figures are underexplained, and some are too small. Specific issues with the figures include:

  • Figure 2b: The axes and the significance of the blue dots are not clearly defined.
  • Figure 7a: The caption is unclear, and the terms "matched" and "unmatched" need further explanation.
  • Figure 8: The line plot is not suitable for the results presented; a more appropriate visualization (e.g. a strip plot like Fig 9) should be used.
  • Figures 10 and 11: Both are too small and require more explanation. The tested hypothesis and the meaning of the t-test need to be specified.
  • Figure 11: Missing p-values

Questions

Q1. In line 247, it mentions 10 categories, but line 213 states 11. Can you clarify the discrepancy?

Q2. In line 262, how is it determined that learning is finished?

Q3. In line 265, three LM families are mentioned. What criteria were used to select them?

Q4. What are the different model sizes? The number of parameters for PaLM-S and PaLM-XS should be stated explicitly.

Q5. Why aren’t experiments grouped based on keywords? This approach would result in 110 experiments instead of 1320. Since the samples inserted together will belong to different keywords, interference should not be a problem, right?

Q6. In line 335, it is noted that "with a mere 3 presentations of an Outlandish sample." Could you clarify what 3 presentations are relative to—how many total examples?

Q7. By definition, priming can only be measured post-learning. Referring to it as "priming post-learning" in line 355 seems redundant.

Q8. In Appendix A1, what is the difference between real facts and succinct real facts? Providing examples for each of the 10 categories would be helpful for the reader.

Q9. Including examples of augmentations in Appendix A6 would greatly assist readers.

Q10. In line 330, it states, "We see that as K varied from 1 to 50." Is this a realistic range? What is the batch size, and is this frequency excessive?

Q11. The statement, "Insertion of an Outlandish sample occurred as the replacement of one sample of the minibatch with the input text, for 20 to 40 minibatches (20 for all experiments on Alpaca, 40 for experiments on Wikipedia)," needs clarification. Are these consecutive minibatches in the default setup?

Typos

  • L327: 2 -> two
  • L911 TRIE-MERGE -> TIES-MERGE
  • Interpretability -> interpretability (optional)
Comment

3/3

Answers to Reviewer questions:

  • Q1: We apologize for the typo. There are indeed 11 categories, as listed in Appendix A.1. We have corrected this typo in the manuscript in all places.
  • Q2: Learning was terminated after 20 iterations for experiments using the Alpaca dataset and 40 iterations for experiments using Wikipedia. This decision was based on initial pilot experiments and maintained for consistency across all relevant comparisons. To ensure uniformity, we have conducted new experiments on Wikipedia with 20 iterations and observed similar results (see new Fig. 21 in the uploaded pdf).
  • Q3: The three LM families (PALM-2, Llama, and Gemma) were selected based on their availability within the Pax framework for training neural networks and their compatibility with our hardware constraints (2x2 Viperfish TPU).
  • Q4: Unfortunately, the internal policy of many industry labs no longer allows explicitly sharing the number of parameters, so, similarly to the PaLM-2 research report, we are not able to share them either.
  • Q5: We appreciate the Reviewer's suggestion to group experiments by keywords. While this approach could potentially reduce the number of experiments, we were concerned about potential interference effects even between samples with different keywords. As our findings demonstrate, the surrounding context, including potentially abstract semantic or thematic elements, can influence priming. To minimize these potential confounds and maintain the integrity of our experiments, we opted to train each of the 1320 experiments individually.
  • Q6: "3 presentations" refers to the total number of times the Outlandish sample appeared within the training data. Each presentation consisted of inserting the sample into a minibatch of 8, with a spacing of 20 minibatches between presentations. This resulted in a total of 480 samples being processed in the period of these 3 presentations of the Outlandish text.
  • Q7: We acknowledge that priming is inherently measured post-learning. The phrase "priming post-learning" was used to emphasize the distinction between measuring priming after learning versus measuring the probability of keywords before learning. Our work highlights the predictive relationship between these two measures, a novel and useful finding that we believe would be of significant interest to the community.
  • Q8: "Succinct real facts" are highly compressed versions of "real facts." We have systematically included examples of facts for each of the 11 categories in Appendix A.2 (along with the examples shown in Figure 1).
  • Q9: Examples of augmentations have now been added to Appendix A.6.
  • Q10: The minibatch size was 8. While the range of K (1 to 50) may appear excessive, we aimed to thoroughly investigate the relationship between spacing and priming. This broad range allowed us to systematically explore the effects of varying the frequency of sample presentation.
  • Q11: Yes, the 20 to 40 minibatches are consecutive in the default setup (a schematic sketch of this insertion schedule is given below).
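For concreteness, a schematic sketch of this insertion schedule follows. Only the numbers (minibatch size 8, a spacing of 20 minibatches, 3 presentations, hence 480 samples processed in total) come from the answers above; the function name, the random choice of which sample to replace, and the default arguments are illustrative placeholders rather than our exact pipeline.

```python
import random

def spaced_insertion_batches(base_stream, outlandish_text,
                             minibatch_size=8, spacing=20, presentations=3):
    """Yield minibatches from base_stream, replacing one sample with the
    Outlandish text on every `spacing`-th minibatch, `presentations` times.
    With the defaults, 20 * 3 = 60 minibatches of 8 samples are produced,
    i.e. 480 samples processed across the 3 presentations (cf. Q6)."""
    data = iter(base_stream)
    for step in range(spacing * presentations):
        batch = [next(data) for _ in range(minibatch_size)]
        if step % spacing == 0:  # one "presentation" of the Outlandish sample
            batch[random.randrange(minibatch_size)] = outlandish_text
        yield batch
```

In the spacing experiments of Q10, the `spacing` argument would simply be swept from 1 to 50.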

Altogether, we are very heartened by the Reviewer's deep dive into our manuscript, and their resulting comments have sharpened the manuscript's clarity. These led us to complete a number of new experiments and analyses, which we showcase here, and which resulted in deeper insights into the nature of priming, its relation to other metrics, and a deeper investigation of the puzzling priming vs. memorization dynamics.

We hope the Reviewer will recommend acceptance for this paper on account of its novel, puzzling, and robust experimental results and mitigation algorithms, as well as the sharp insights and new experiments that we and they jointly thought of, which we believe will benefit the ICLR community. We stand ready to address any remaining questions.

Comment

Dear Reviewer JQgd, We wanted to say in advance our heartfelt thanks for the time and effort you've put into both the reviews already completed and the upcoming discussion period. We know that you all have your own papers to deal with during this busy time, and we sincerely appreciate the time you've taken to spend on ours.

We are so excited about this paper and its findings, so we are very much looking forward to the upcoming discussions with you! Please don't hesitate to ask us any questions big or small, and we are happy to provide any further clarifications.

Comment

Thanks for the very detailed response; I believe the authors addressed all my comments. I have updated my score in favor of acceptance in light of my refreshed understanding after the response, and believe that the presentation of the findings will be clearer in the camera-ready version.

Comment

Dear Reviewer JQgd, Thank you! We hope you enjoyed reading our work as much as we enjoyed conducting it. And we wish you good fortune in your own papers this year.

Comment

2/3

W3: Clarity and Presentation

We thank the Reviewer for their critique on presentation and clarity. We acknowledge that the initial manuscript relied heavily on the Appendix for detailed training setups, which may have hindered readability. In response, we have now successfully and thoroughly made the following changes:

  • Training Details: We have moved the relevant detailed descriptions of the training protocol (previously Appendix Section A.3) into the main text (Section 3.2), ensuring the essential experimental setups are self-contained.
  • Figure 2b: The axes and significance of blue dots are now clearly defined in the caption.
  • Figure 7: The terms "matched" and "unmatched" are explicitly explained in the main text and the figure caption, with a new figure panel (7b) illustrating this.
  • Figure 8: Replaced with a strip plot for clearer visualization, following the example of Figure 9.
  • Figures 10 and 11: Enlarged for better readability, with t-test results, and significance of findings. Missing p-values have been added.

We hope these revisions address the Reviewer’s concerns and significantly improve the manuscript’s clarity and accessibility, and if there are further issues, please let us know.

Comment

We sincerely appreciate the Reviewer's careful reading of our manuscript and their valuable feedback. We apologize that certain portions were unclear and frustrating, and have made every effort to address the Reviewer's concerns with detailed revisions. Our goal is to ensure the clarity and accessibility of our work, allowing its novel findings and surprising insights to be fully appreciated by the Reader. We have addressed each of the Reviewer's questions below:


W1: Distinguishing Desirable and Undesirable Priming

We sincerely thank the Reviewer for highlighting these other metrics. While they were outside the initial scope of our work, we now recognize that engaging with related metrics, such as locality and portability, strengthens our contributions and connects our study more robustly with the knowledge-editing literature. To address this, we have conducted new experiments adapting the definitions of locality and portability to the free-flowing text forms in the Outlandish dataset. Specifically:

  • Locality Experiment: We tested whether rewrites of Outlandish facts could induce recall, retaining the contextual link to the original text, inspired by the formulation used in Meng et al. (NeurIPS 2022) and Yao et al. (EMNLP 2023).
  • Portability Experiment: We examined whether keywords could transfer priming effects when the inserted text was reversed in subject order, inspired by the equivalent metric in Yao et al. (EMNLP 2023), where the portability measure was introduced.

These results demonstrate that our proposed priming metric correlates with both locality and portability measures, reinforcing its validity as a complementary metric. We have incorporated these findings into the manuscript, adding a new figure (Figure 29) in the newly attached pdf, and have completed their incorporation into the Related Works section of the manuscript (Section 2.4).

An important note, however, is that the priming metric captures factuality-independent notions of knowledge propagation and is applicable to free-flowing text; it therefore complements existing metrics that focus on canonical (subject, relation, object) forms, as most contemporary metrics, including specificity, locality, and portability, do in their original formulations. By focusing on statistical regularities across such a wide diversity of texts, priming opens avenues for elucidating LLM behavior in broader, real-world scenarios.

Finally, we emphasize that our metric addresses a novel research gap: predicting post-learning priming and information spread from a measurement taken before learning. Our discovery that keyword probability before learning robustly predicts priming after learning is one of our showcased findings. This predictive focus also complements existing metrics, which (retrospectively) characterize information spread.


W2: Priming and Memorization Coupling

We appreciate the reviewer's comment regarding the coupling of priming and memorization. Our initial investigations revealed an intriguing phenomenon: while all examined models (PaLM-2, Llama, and Gemma) exhibited a relationship between pre-learning token probabilities and priming, their specific learning dynamics varied significantly. This suggested that factors like architecture or training procedures could play a crucial role in how priming and memorization interact, but the many differences between these models make it difficult to pinpoint the underlying mechanisms.

To further explore this, and inspired by the Reviewer's comment, we conducted new analyses using a fine-tuned LLM based on the FLAN dataset. This new analysis allowed us to directly compare a pre-trained PaLM-2 model with a fine-tuned version sharing the same fundamental architecture. By controlling for architecture, we could effectively isolate the impact of fine-tuning on the relationship between priming and memorization. Our findings reveal that fine-tuning alters this dynamic, perhaps due to a narrowing of the hypothesis space during the fine-tuning process (new Figure 20). This highlights the potentially significant influence of fine-tuning on how models learn and retain information.

While our work provides compelling evidence for a robust correlation between pre-learning token probabilities and priming across diverse architectures, a complete mechanistic understanding of this phenomenon remains an open question. We agree with the Reviewer that further investigation is crucial, particularly in exploring how architectural variations and training procedures contribute to the observed differences in priming dynamics, and our conclusion is that these puzzling findings would be very intriguing for the ICLR community when opened up for further investigation by the field.

Review
Rating: 6

This paper aims to understand how new text affects a model's knowledge. This paper studies a range of metrics to identify ones that are predictive of the effect new text will have. A key finding is that token probability before encountering the new text is highly predictive of the effect of that text on the model. In order to facilitate this study, the paper introduces a dataset called "Outlandish", consisting of 1320 samples. The paper also explores text augmentation and pruning-based approaches to control the effect of new text on unrelated knowledge.

Strengths

  • This work studies how knowledge gets stored and updated in large language models, which I believe is very interesting and valuable to the community.

Weaknesses

  • The paper's framing could be improved; for example, the title indicates that new data pollutes a model's knowledge, but of course this is not the case: if a model never encountered new data, it would not acquire knowledge. Put another way, all data can be considered new data at different points in model training. The same applies to calling the insertion of new data "pollution". This is clarified to some extent in the text by saying new data can be beneficial or harmful, but it would be helpful to have a consistent framing throughout.

  • Some of the findings seem like a natural consequence, and I would appreciate further clarification. For example, the priming effects being more severe for unsurprising tokens is a main finding of the paper, but isn't this a natural consequence of P_{before}(x_{key,i} | X_{T,j}) being smaller in these cases and appearing in the denominator of the S_{prime} computation?

  • The two strategies for modulation in Section 5 need motivation. For example, if a practitioner knows some data is undesirable, they can simply choose not to train on those datapoints instead of training on them with the modulation strategies.

  • Some details of the paper are unclear (see questions to authors)

Questions

(1) For the experiments studying the effect of token probability on priming (like in Fig 2b), was this experiment done with 12 tokens? A potential concern here is sample size.

(2) Would it be also possible to compute some general notion of utility to demonstrate the extent to which a model has been ‘polluted’ by the outlandish examples?

(3) The claim that priming effects are more severe for unsurprising tokens, a main finding of the paper, seems to me like a natural consequence of P_{before}(x_{key,i} | X_{T,j}) being smaller in these cases and appearing in the denominator of the S_{prime} computation. Could the authors clarify?

Comment

2/2 continued

... Our ignore-topk modulation strategy offers an approach that is potentially more nuanced than simply removing data, enabling us to address the challenges of priming in a wider range of applications, and we are excited to hopefully share this unexpected finding with the ICLR community.
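For concreteness, here is a minimal sketch of what an "ignore-topk"-style update step could look like in PyTorch. It illustrates only the general idea (zeroing out the largest-magnitude entries of each parameter's gradient before applying a plain SGD step); the pruning fraction, learning rate, optimizer, and choice of which parameters are pruned are illustrative placeholders rather than the exact procedure used in the paper.

```python
import torch

def ignore_topk_step(model, loss, k_frac=0.05, lr=1e-4):
    """One SGD-style step that discards ("ignores") the top-k fraction of
    gradient entries by magnitude and applies only the remaining update."""
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            g = p.grad
            k = max(1, int(k_frac * g.numel()))
            # magnitude of the k-th largest entry serves as the pruning threshold
            threshold = g.abs().flatten().topk(k).values.min()
            pruned = torch.where(g.abs() >= threshold, torch.zeros_like(g), g)
            p.add_(pruned, alpha=-lr)
```

Keeping only the smaller-magnitude components of each update is the sense in which this loosely parallels the gradient-clipping ideas from the differential privacy literature discussed in our response.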


Dataset Size and Utility:

We thank the Reviewer for highlighting the sample size of the 12 keyword tokens. While each token is associated with over 100 Outlandish texts (since one of our central curiosities was whether the same keyword token could cause differing amounts of hallucination under different circumstances; the answer is affirmative!), we absolutely agree that future work should expand the dataset both in the number of tokens (12) and the diversity of contexts (> 100 each). This is now acknowledged in the new Limitations section.

In response to the Reviewer's discussion on how to compute a general measure of "pollution," we propose adapting the priming metric itself, which quantifies how newly learned information propagates to unrelated contexts. We have reported new experiments above showing how it can be explicitly adapted to indicate the amount of "pollution," i.e., undesirable priming. Here, we report a second compelling experiment: we analyzed priming across test prefixes of the same theme (e.g., colors) versus different themes (e.g., jobs). The results show significantly less priming for unrelated themes, suggesting a potential boundary for the extent of pollution and, therefore, a promising, sensitive metric for detecting and quantifying it. This analysis is now in Fig. 7b, and we believe it highlights the utility of the priming metric as a proxy for measuring pollution. We have added a sharper explanation of this in Section 4.


Altogether, we were very happy with the number of insightful issues that the Reviewer brought up, which have been instrumental in refining and strengthening our work. Their feedback prompted us to sharpen our discussion of our design choices and delve deeper into their implications. This led to new experiments and analyses investigating the relationship between priming and knowledge pollution, and led us to uncover a fascinating connection to gradient clipping techniques used in differential privacy.

In light of these matters, we hope the Reviewer will recommend the acceptance of this paper. We believe that reporting the puzzling dynamics results we obtained will spark insightful discussions when we hopefully present them to the ICLR community. We hope that the robust predictive results and useful, novel algorithms, as well as the sharp insights and new experiments that we and the Reviewer jointly thought of, will benefit the research community. We welcome and stand ready to answer any further questions.

Comment

Dear Reviewer bpDo, We wanted to say in advance our heartfelt thanks for the time and effort you've put into both the reviews already completed and the upcoming discussion period. We know that you all have your own papers to deal with during this busy time, and we sincerely appreciate the time you've taken to spend on ours.

We are so excited about this paper and its findings, so we are very much looking forward to the upcoming discussions with you! Please don't hesitate to ask us any questions big or small, and we are happy to provide any further clarifications.

Comment

Dear Reviewer bpDo, thank you for all your wonderful suggestions thus far! We wanted to ask, when convenient, if you have any further questions? We believe we have addressed and mitigated your concerns, and we are happy to address any remaining ones you may have.

We know that you all have your own papers that you have to deal with during this busy time, and sincerely appreciate the time you've taken to spend on ours!

Comment

Thank you for responding to my concerns. I have increased my score to 6.

Comment

Dear Reviewer bpDo, Thank you! We hope you enjoyed reading our work as much as we enjoyed conducting it. And we wish you good fortune in your own papers this year.

If there are any further clarifications we can provide, please feel free to let us know!

Comment

1/2

We thank the Reviewer for their thoughtful feedback and appreciate their recognition of the value of our work on understanding knowledge updating in LLMs for the research community. Our responses to their questions, which we found fruitful and which led to several new experiments and analyses, are below:


Framing and Pollution:

We thank the Reviewer for their insightful feedback on the paper's framing, particularly concerning the term "pollution." We agree that a more neutral term would better reflect the dual nature of new data insertion, which can both enhance and degrade existing knowledge. Following the Reviewer's suggestion, we have proposed revising the title to "How new data permeates existing knowledge and how to dilute it" and have replaced "pollution" throughout the manuscript with "permeates" to describe the unintended propagation of newly learned information into unrelated contexts (see attached new pdf). This acknowledges the inherent spreading effect of new data within the model's knowledge space.

Nevertheless, in new Fig. 10, we are excited to have completed a new analysis showing how newly inserted facts alter the model's certainty about unrelated test prefixes, often replacing previously high-certainty responses (e.g., "the color of sand is gray") with newly acquired information (e.g., "the color of sand is vermilion"). This demonstrates how priming can explicitly permeate well-formed knowledge and is a direct consequence of new data insertion.

We have added this novel finding into Fig. 10 in the newly attached pdf and have written these results into the final manuscript, with the aim to better communicate our findings while addressing the Reviewer’s concerns.


Natural Consequences of Priming:

We appreciate the Reviewer's observation regarding the consequence of priming being more pronounced for unsurprising tokens. We acknowledge that our initial presentation could have benefited from clearer definitions and sharper, illustrative examples, potentially leading to this interpretation, which we will now seek to clarify.

We'd like to emphasize a crucial point: the denominator in the priming score calculation, Pbefore(xkeyXT)P_{before}(x_{key} | X_T), remains constant across all Outlandish texts within the same theme. This is because the testing prefixes (XTX_T) are identical for each theme (e.g., all "color" prompts use the same set of prefixes asking about the color of various objects). Therefore, any variation in the priming score is solely attributed to changes in the numerator, Pafter(xkeyXT)P_{after}(x_{key} | X_T), directly reflecting the impact of new data insertion over the course of learning.

Brainstorming further on this point, one can hypothesize that increases in the numerator of the priming score could have been due to increases in the memorization score, dragged up along the way. To further investigate this hypothesis, we examined the correlation between the relative change in the numerator of the priming score and the relative change in the numerator of the memorization score. Surprisingly, we found that this correlation varied significantly across different language models (PaLM-2, Llama, Gemma). This unexpected diversity in priming dynamics, despite the robust keyword probability vs. priming relationship across models, presents an interesting, novel puzzle.

These findings raise intriguing questions about the underlying mechanisms of priming in large language models. We believe that presenting these puzzling results to the ICLR community will stimulate further investigation and contribute to a deeper understanding of language model behavior.


Motivation for Strategies:

We appreciate the Reviewer’s comments on the motivation behind our proposed strategies. For the stepping-stone augmentation, the primary motivation was to test our hypothesis that pre-learning token probabilities drive post-learning priming. This hypothesis was validated experimentally, as detailed in Section 5.2.

Regarding the ignore-topk pruning strategy, we acknowledge that its initial discovery was serendipitous, and therefore, we ourselves were surprised!

However, the Reviewer’s suggestion prompted us to explore connections with gradient clipping techniques in the differential privacy literature, where some ideas are used to mitigate unintended learning effects (e.g., https://arxiv.org/pdf/1905.03871) which we now believe might share deep principles with ignore-topk. While these works do not explicitly address priming or hallucination, they potentially provide a link as well as theoretical grounding for our approach. We have added this to Section 2.3 to strengthen the motivation for this strategy.

Furthermore, we want to emphasize that simply removing data that one believes to be undesirable is not always feasible. Priming effects can be subtle, and even benign data can contribute to unintended priming, simply because the data is novel and therefore unexpected ...

Review
Rating: 8

This paper investigates how the introduction of new text data affects existing knowledge within LLMs. The study defines a "priming effect," in which the model generates content related to the new text. Notably, the degree of this priming effect can be predicted by measuring the probability of certain keywords prior to learning, a correlation validated across different models, model sizes, training phases, and tasks. The main contributions of the paper can be summarized as follows:

  • Impact of New Text Data on LLM Knowledge: The study reveals that learning from new text can contaminate unrelated knowledge through a priming effect, with the extent of this influence predictable by pre-learning keyword probabilities. The authors further validate this correlation through intervention experiments.
  • Robustness of the Correlation: The correlation between keyword probabilities and the priming effect is confirmed across various models, model sizes, and training phases. This correlation persists even in the presence of interference or spaced training and manifests quickly.
  • Introduction of the "Outlandish" Dataset: The authors construct a new dataset called "Outlandish", containing 1,320 samples with diverse text characteristics to explore the impact of different types of text data on LLM knowledge.
  • Comparison of Weight Learning and Context Learning: The paper finds that the relationship between the priming effect and probability is significantly weaker in context learning compared to weight learning, highlighting intriguing differences between implicit and explicit optimizers.
  • Strategies to Modulate the Priming Effect: The authors propose two simple yet effective methods, the "Ignore-topk" gradient pruning strategy and the "Stepping-stone" text augmentation strategy, to mitigate the impact of new text on LLM knowledge, enhancing gradient learning specificity.

Strengths

  1. Discovery of a predictable priming effect: The paper systematically investigates the priming effect in LLMs. Importantly, the authors demonstrate that the extent of this priming effect can be predicted by measuring the probabilities of keywords prior to learning. This predictability has very promising practical implications.
  2. Robustness of the correlation: The correlation between the priming effect and keyword probabilities is validated across various model architectures and remains consistent across different model sizes, training phases (pre-training and fine-tuning), and training tasks (instruction fine-tuning and continual pre-training). Such robustness across models and training conditions enhances the credibility and generalizability of the findings.
  3. Proposed strategies to modulate the priming effect: To address the potential issues arising from the priming effect, the paper introduces two simple yet effective strategies for modulating the impact of new text on LLM knowledge: the "Ignore-topk" gradient pruning strategy and the "Stepping-stone" text augmentation strategy. Both methods seem promising and practical in use.
  4. Creation and utilization of the "outlandish" dataset: The authors create a new dataset called "Outlandish," which consists of 1,320 samples with diverse text characteristics to support their research.
  5. Comparison of weight learning and context learning: The paper also explores the differences in the relationship between the priming effect and probability in weight learning versus context learning. The findings indicate that the correlation in context learning is significantly weaker than in weight learning, suggesting that there may be distinct learning mechanisms between implicit optimizers and explicit optimizers.

Weaknesses

  1. Insufficient explanation of the priming mechanism: While the paper identifies a strong correlation between keyword probabilities and the priming effect, it lacks exploration of the underlying mechanisms. This is acceptable for this paper, but I would appreciate a more detailed analysis and discussion.
  2. Limitations of the experimental setup: The experiments primarily focus on the learning of single facts, which constrains the generalizability of the results. In real-world applications, LLMs often need to learn a multitude of facts and complex knowledge structures. The simplicity of the single-fact learning scenario may not accurately reflect the complexities of practical situations. It would be promising to expand the experimental settings to investigate the impact of multi-fact learning on the priming effect or to test more sophisticated methods of knowledge injection, as mentioned in the works of Meng et al. (2022a;b) and Ovadia et al. (2023b).
  3. Lack of comparisons with related work: The paper can benefit from comparisons with other relevant works, such as studies on LLM knowledge editing or continual learning, which could help readers better understand the contributions and limitations of this work.

Questions

  1. Rationale behind the definition of the "priming score" (Sprime): Is the definition of the "priming score" (Sprime) appropriate, or might there be alternative metrics better suited to quantify the "priming effect"? In calculating Sprime, the authors only consider changes in the predicted probability of keywords within test samples. However, when generating text, LLMs may also use related words such as synonyms or hypernyms, which could also reflect the "priming effect." Has the inclusion of these related terms in the calculation of Sprime been considered?
  2. Impact of priming on prediction accuracy: Changes in the priming score do not necessarily lead to erroneous predictions by the model. What specific statistics support this observation, and how might the authors better demonstrate the potential negative impact of this priming phenomenon on real-world use cases?
Comment

2/2

Impact of Priming on Prediction Accuracy:

We appreciate the Reviewer’s request for a clearer demonstration of priming’s impact on prediction accuracy. To address this, we conducted a new analysis examining how priming alters the model's responses to common knowledge prompts (e.g., "the color of sand is __"). Before learning, the model’s predictions for these prompts had high certainty and aligned with real-world knowledge (e.g., "gray"). After learning new Outlandish facts, the probabilities for unrelated keywords (e.g., "vermilion") increased by several orders of magnitude! This new result is shown in Fig. 10 (newly attached pdf) and highlights how priming can permeate into high-certainty knowledge and lead to erroneous predictions.

Brainstorming further, we propose a future experiment to trace facts encountered during pretraining. By identifying ground-truth facts explicitly learned by the model, we can more precisely measure how priming distorts existing knowledge versus unfamiliar contexts. This analysis would further elucidate the implications of priming in real-world applications, such as knowledge retrieval or fact-checking tasks.


In sum, the Reviewer brought up a wide range of insightful discussion points, which we found fruitful and which inspired several new experiments and analyses that we have now reported. We are humbly grateful for the Reviewer's careful reading and appraisal of our study and for their praise regarding the robustness, generality, and practical utility of our experimental results and new algorithms, as well as for catching subtle results such as the comparison of priming in in-context learning vs. gradient-based learning. If there are any further questions or curiosities, or suggestions to raise the study's impact still further (or even to increase their already very generous score), we hope they will feel free to ask us.

Comment

Dear Reviewer tq2A, We wanted to say in advance our heartfelt thanks for the time and effort you've put into both the reviews already completed and the upcoming discussion period. We know that you all have your own papers to deal with during this busy time, and we sincerely appreciate the time you've taken to spend on ours.

We are so excited about this paper and its findings, so we are very much looking forward to the upcoming discussions with you! Please don't hesitate to ask us any questions big or small, and we are happy to provide any further clarifications.

Comment

1/2

We thank the reviewer for their thoughtful and constructive feedback. We appreciate the positive comments and the Reviewer's recognition of the novelty of our findings, the robustness of our results, and the practicality of our proposed strategies. We address the questions as follows:


Priming Mechanism:

We thank the Reviewer for highlighting the importance of exploring the underlying mechanisms of the priming phenomenon. While our main empirical finding (that pre-learning keyword probability strongly predicts post-learning priming) is robust across models and sizes, we agree that understanding the mechanisms behind these dynamics is a crucial area for further study. Our experiment comparing the priming dynamics of PaLM-2, Llama, and Gemma models suggests that factors like architecture or training procedures could indeed play a crucial role in how priming and memorization interact, but the many differences between these models make it difficult to pinpoint the underlying mechanisms.

To further explore this, and inspired by the Reviewer's comment, we have now conducted new analyses using a fine-tuned LLM based on the FLAN dataset. This allowed us to directly compare a pre-trained PaLM-2 model with a fine-tuned version sharing the same fundamental architecture. By controlling for architecture, we could effectively isolate the impact of fine-tuning on the relationship between priming and memorization. Our findings reveal that fine-tuning alters this dynamic, perhaps due to a narrowing of the hypothesis space during the fine-tuning process (new Figure 20, attached). This new finding provides a further step toward uncovering how different training setups affect the spread of new information. We have added these results and their implications to the final manuscript and believe they will inspire further exploration of these mechanisms by the research community.


More Sophisticated Experimental Setups:

We share the Reviewer’s vision for extending the experimental setup to better reflect real-world complexities. As a starting point, we intentionally focused on vanilla fine-tuning to study how the most widely used data insertion mechanism propagates new information. We believed this fundamental understanding to be essential before investigating more advanced methods. That said, we have already begun exploring more complex scenarios that the Reviewer envisions. First, the Outlandish dataset incorporates diverse, free-form text structures beyond the canonically studied (subject, relation, object) facts, aligning with real-world data diversity. Second, we have conducted preliminary experiments on multi-fact learning, as shown in Figure 14, which have begun to highlight the interactions between multiple facts. These results provide a foundation for future work, which we ultimately aim to extend to state-of-the-art techniques such as those proposed by Meng et al. (2022a; 2022b) and Ovadia et al. (2023b). We have clarified this in the Limitations section.


Comparisons with Related Works:

We appreciate the Reviewer’s suggestion to strengthen the paper’s connection to related works. In response, we have enhanced the dedicated section in the manuscript comparing our approach to studies on LLM knowledge editing (e.g., Meng et al., 2022a; Mitchell et al., 2022) and continual learning (e.g., Wu et al., 2024; Shi et al., 2024), as well as honed discussions on how injection of new texts into LMs can cause hallucinations (Gekhman et al., 2024; Wan et al., 2023; Yin et al., 2023; Huang et al., 2023), or cause mistakes in downstream reasoning (Huang et al., 2023; Cohen et al., 2023a). This discussion (Sections 2.1, 2.3, 2.4) situates our contributions within the broader research landscape.


Rationale Behind Sprime:

We thank the Reviewer for raising the important question about the suitability of the priming metric. The central motivation behind Sprime was to create a simple, scalable metric for quantifying the spread of new information in free-form texts, beyond the constraints of (subject, relation, object) facts. However, we agree that the metric could be extended to account for synonyms, hypernyms, or related terms that may also reflect priming.

As a concrete improvement, we very much agree with the Reviewer that the priming score could be extended further if it accounted for related words and synonyms. Brainstorming further, one possible way to do so is to utilize ideas from Farquhar et al. (Nature, 2024). We have added this to the Related Works section (Section 2.4), outlining it as a direction for future work, and thank the Reviewer for this suggestion.

Comment

Message to all Reviewers:

Thank you for the comments, questions, and suggestions. We have responded to individual reviewer comments in the individual responses. In this global response we will address some issues common to all responses.

New experiments:

In the individual Reviewer questions, the Reviewers brought up a variety of interesting issues that have helped us to think more deeply about this approach. Motivated by these points, we conducted a number of small experiments. Happily, we found that all the concerns the Reviewers raised could be mitigated and clarified with relative ease. Our new experiment figures are showcased in the newly attached PDF along with small textual changes. The new, completed PDF now contains all changes kindly requested by the Reviewers:

  • New analysis showing inserted facts alter the model's certainty about unrelated test prefixes, often replacing previously high-certainty responses
  • New study of priming dynamics in FLAN instruction-finetuned models
  • New comparison of priming metric with other contemporary metrics locality and portability
  • New text changes, rearranged methods and new hyperlinks for ease of navigation, requested figure changes, new Outlandish examples and augmentation examples
  • Newly discussed links to studies on clipping in differential privacy that may also shed light on why ignore-topk might reduce unintended learning

We also want to summarize and clarify the central, novel contribution of our work:

  • The novel empirical finding that pre-learning keyword probability strongly predicts post-learning priming across diverse models and conditions, a very robust predictive result.
  • The development of mitigation strategies, including the novel "ignore-topk" pruning algorithm, which effectively reduces priming and is robust across models.
  • Reporting of several surprising empirical results regarding the priming phenomenon. We believe that these puzzling results should be reported to the ICLR community and will stimulate further investigation and contribute to a deeper understanding of language model behavior in the community.
  • The carefully designed Outlandish dataset, which enables the study of free-form texts of a wide diversity beyond canonical (subject, relation, object) forms.

All in all, we believe that this work provides robust predictive results and useful, novel algorithms, and would benefit the ICLR community at this venue. We hope that the reviewers agree, particularly given our new experiments and sharpened writing, and see fit to raise their scores.

AC Meta-Review

This paper investigates how new data influences existing knowledge in LLMs, focusing on the concept of "priming," where new data subtly alters the model's behavior. The paper shows a strong correlation between keyword probability and priming effects and develops strategies to mitigate this phenomenon.

Reviews praise the novelty of the "priming" concept and the experimental rigor demonstrated across various models and training scenarios for both establishing correlations and mitigating strategies. The reviewers also consider the "Outlandish" dataset a valuable contribution. Overall, the reviews are overwhelmingly positive, and I recommend accept.

The reviewers also raised concerns about the limited exploration of the underlying mechanisms driving priming. They point out the focus on single-fact learning, suggesting the need for more complex scenarios. Limited comparisons with related work in knowledge editing and continual learning are identified as a potential area for improvement. I encourage the authors to include all new results provided in the rebuttal and the limitations identified during the reviewing phase in the final draft.

Additional Comments on Reviewer Discussion

Reviewer tq2A expressed concern about the limited exploration of the underlying mechanisms of priming and suggested expanding the experimental setup to include multi-fact learning. The authors addressed this by conducting new analyses to investigate the impact of fine-tuning on priming dynamics and acknowledging the need for further exploration of multi-fact learning. Reviewer bpDo criticized the paper's framing and the motivation for the proposed strategies. The authors addressed these concerns by revising the title and replacing "pollution" with "permeates" to better reflect the neutral impact of new data and by clarifying the motivation for the "Ignore-topk" strategy. Reviewer JQgd suggested to distinguish between desirable and undesirable priming and highlighted concerns regarding clarity and presentation. The authors acknowledged the need for further research to differentiate between desirable and undesirable priming effects and addressed clarity concerns by revising the paper for better readability.

Overall, the authors addressed the reviewers' concerns, and this is reflected in the increased scores.

Final Decision

Accept (Spotlight)