Measuring Bias of Web-filtered Text Datasets and Bias Propagation Through Training
Abstract
Reviews and Discussion
This paper proposes an interesting method to measure the bias of web-filtered text datasets and to evaluate how bias propagates through training large language models. The idea is insightful and the experiments are in general solid, but there are still several concerns to be addressed.
Strengths
- The idea is interesting, and the problem of data bias in large language model pre-training datasets is an important challenge for the community.
- The proposed evaluation method is solid and reasonable.
Weaknesses
- The evaluation needs to be more thorough; questions are listed in the next section.
- The paper could be better organized to improve readability.
Questions
- Regarding the classification accuracy, it is unclear how the authors chose these dataset combinations for the classification experiments. I suggest the authors do a 2-way classification for each dataset pair, producing a matrix or heatmap of the 2-way classification accuracies between all dataset pairs. This would give a clearer picture of which datasets are most/least distinguishable from each other.
- The conclusion of dataset bias is valid, but could the authors further investigate the critical differences that distinguish the datasets? For example, changing certain phrases in Category 1 may flip the classifier's prediction to Category 2, suggesting those phrases are a source of bias in Category 1. I understand that it is hard to enumerate over all data samples, but some interpretable examples would be appreciated, such as a few concrete examples of text that are particularly indicative of each dataset.
- The last section seems to be a draft without a comprehensive evaluation. Some details are not clear, for example the prompt used for sentence generation from these LMs and the impact of different prompts on the classification accuracy. I would suggest a more structured evaluation framework for this section, such as a comparison of results across different models or datasets.
Many thanks for mentioning that our idea is interesting, that we address an important challenge in the community, and that our proposed evaluation method is solid and reasonable.
Response to reviewer UcRC’s concerns:
- Question 1: The reviewer suggests doing 2-way classification experiments on all possible combinations between the datasets. Thanks for the suggestion, we did those experiments and the results are in Appendix B of the revised paper.
- Question 2: Regarding "do more investigation on the critical differences that differentiate between different datasets", we conducted three more experiments to investigate what differentiates the datasets; the results are in Sections 4.4, 4.5, and 4.6 in the revised paper.
In section 4.6 and appendix D in the revised paper, we also explain and show some particularly distinct examples that are unique to DCLM and FineWeb-Edu. For other datasets like C4 and FineWeb, the distinguishing features are not as obvious, and require careful observation of many examples to notice the subtle differences in content and format.
- Question 3: The reviewer notes that Section 5 seems to be a draft without a comprehensive evaluation, and that some details are not clear, such as the prompt used to generate text.
The main takeaway of section 5 is to show that bias propagates through training, such that a classifier trained on original data can easily distinguish generated data from LLMs trained on original data. We have made that clear in the revised paper, thanks for pointing it out.
We also added section 6 in the revised paper, where we provided a comprehensive evaluation of bias propagation on several datasets, and showed that it can enable the estimation of the mixture proportions of the training domains of an LLM.
Regarding the prompt, as explained in Section 5, we prompt the LLMs with a single token, sampled from the distribution of tokens that appear as the first token in sequences from the original training data of the LLM. We prompt with only a single token so that the LLM generates text essentially unconditionally.
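For concreteness, below is a minimal sketch of this single-token prompting scheme, assuming a HuggingFace causal LM. The model name and the handful of placeholder sequences stand in for the actual LLMs and training data used in the paper.

```python
# Sketch of the single-token prompting scheme described above (illustrative only;
# the model and the first-token distribution are placeholders, not the paper's exact setup).
import torch
from collections import Counter
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "EleutherAI/pythia-160m"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Estimate the distribution of first tokens from the (placeholder) original training sequences.
original_sequences = ["The quick brown fox jumps over the lazy dog.",
                      "In 2019, researchers found that..."]
first_tokens = [tokenizer(seq)["input_ids"][0] for seq in original_sequences]
counts = Counter(first_tokens)
token_ids = list(counts.keys())
probs = torch.tensor([counts[t] for t in token_ids], dtype=torch.float)
probs /= probs.sum()

# Sample a single starting token and let the model continue unconditionally.
start_token = token_ids[torch.multinomial(probs, 1).item()]
input_ids = torch.tensor([[start_token]])
output = model.generate(input_ids, do_sample=True, max_new_tokens=128,
                        pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```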
We hope that those clarifications and the new results address the reviewer's concerns, and if so, we would appreciate it if the reviewer would consider raising their score.
This paper examines biases in popular pretraining datasets for large language models, demonstrating that transformer models can distinguish between texts from different datasets (like C4, RefinedWeb, and DolmaCC) with surprisingly high accuracy, despite these datasets being derived from CommonCrawl using similar filtering methods. Through user studies and rewriting experiments, the authors show these biases are subtle to humans but persistent through reformatting, and importantly, they propagate through training - models trained on these datasets inherit their distinctive characteristics. The work includes comprehensive ablation studies and extends similar dataset bias research from computer vision.
Strengths
- The paper shows how different filtering pipelines create distinct "fingerprints" in the data, even when using similar preprocessing steps.
- The paper does comprehensive ablation studies examining key factors like model size, training data amount, and sequence length. These controlled experiments help isolate the important variables affecting classification accuracy. The validation approach using multiple methods (human studies, rewriting experiments, bias propagation tests) strengthens the findings by showing the robustness of the results across different experimental paradigms.
Weaknesses
- The paper doesn't deeply analyze what features enable this classification. A feature importance analysis (e.g., using attention weights or gradient-based attribution methods) could reveal which textual patterns or structures the classifier relies on, providing actionable insights for dataset creators.
- The rewriting experiments use only GPT-4 for text modification. Testing with multiple different LLMs would strengthen the finding that biases persist through rewriting. Additionally, more controlled rewriting experiments (e.g., systematically modifying specific text features like sentence length, vocabulary complexity, or discourse markers) could better isolate which characteristics contribute to dataset fingerprints.
- While the paper demonstrates dataset biases exist and propagate, it doesn't propose concrete methods to mitigate them.
Questions
- How do you ensure the classification accuracy on generated text isn't simply detecting general "AI-generated text" patterns rather than dataset-specific biases?
- Have you tested if these biases persist through fine-tuning or RLHF? This seems crucial given current LLM development practices.
Many thanks for mentioning that we do comprehensive ablation studies, that our findings are strengthened by multiple methods, and that our results are robust across different experimental paradigms.
- Response to weakness 1: Regarding "It doesn't deeply analyze what features enable this classification": we have now done extensive further analysis in Sections 4.4, 4.5, and 4.6 of what features enable this classification. We find that formatting, vocabulary, and content distributions all differ between the datasets and all contribute to their distinguishability.
- Response to weakness 2: Regarding rewriting with models other than GPT-4o-mini: we initially looked at multiple models (GPT-3.5, GPT-4, and GPT-4o-mini) and tuned our prompt carefully to work well with GPT-4o-mini. We went through a lot of the rewritten text manually to check that the rewrites are as intended. We do not think that this experiment will benefit significantly from using another model for rewriting, but we have started rewriting with another model and will add the results to the paper once this is done.
- Response to weakness 3: The reviewer notes that we do not propose methods to mitigate the bias. The focus of our paper is not to mitigate bias, but to demonstrate that biases can be detected via classification experiments on text datasets, and persist in the models that are trained on those datasets.
Having a bias carries a negative connotation, so mitigation may seem natural, but in our context this is not implied. For instance, the dataset FineWeb-Edu is biased towards educational content, and models trained on it can therefore perform well on reasoning and knowledge benchmarks.
- Response to question 1: The reviewer asks how we know that the classification accuracy on the generated text is due to the biases propagating from the original datasets, and not AI-generated text patterns. We know that from the experiment “Classifying generated data with a model trained to distinguish the original data” in section 5. In this experiment, the classifier is trained only on original data, yet it can classify the generated data well. Since the classifier has not been trained on any generated data, it can only utilize the learnt patterns from the original data to classify the generated data.
- Response to question 2: Regarding whether we tested if these biases persist through fine-tuning: thanks for the suggestion; we have since tested this and found that the biases persist to some extent. Please see the new Section 5.1.
We hope that those clarifications and the new results address the reviewer's concerns, and if so, we would appreciate it if the reviewer would consider raising their score.
Regarding reviewer wbLi's suggestion to test rewriting with LLMs other than GPT-4, we tested rewriting with Qwen2.5-14B-Instruct using the exact same three prompts as with GPT-4o-mini. The accuracies are as follows:
- Prompt 1: 81.98%
- Prompt 2: 77.83%
- Prompt 3: 69.78%
The accuracies are similar to those obtained with GPT-4o-mini (Section 4.3 in the paper).
This outcome strengthens the finding that biases persist through rewriting.
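For reference, a minimal sketch of this rewriting step with Qwen2.5-14B-Instruct is shown below. The rewrite instruction is a placeholder rather than one of the three prompts from Section 4.3, and the sampling settings are illustrative.

```python
# Sketch of rewriting a sequence with Qwen2.5-14B-Instruct via the transformers chat
# interface. The rewrite instruction below is a placeholder, not one of the paper's prompts.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Qwen/Qwen2.5-14B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16,
                                             device_map="auto")

def rewrite(text, instruction="Rewrite the following text in your own words:"):
    messages = [{"role": "user", "content": f"{instruction}\n\n{text}"}]
    inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
                                           return_tensors="pt").to(model.device)
    output = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
    # Decode only the newly generated tokens (the rewritten text).
    return tokenizer.decode(output[0, inputs.shape[1]:], skip_special_tokens=True)

print(rewrite("An example sequence sampled from one of the pretraining datasets."))
```

The rewritten sequences can then be fed to the dataset classifier in the same way as the original data to obtain accuracies like those listed above.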
This work investigates the distinguishability of a range of popular open-source pretraining text datasets derived from CommonCrawl, including C4, RefinedWeb, DolmaCC, RedPajama-V2, FineWeb, DCLMBaseline, and others. The study presents interesting findings: 1) a classifier trained on these datasets achieves high accuracy on held-out test data, despite humans finding the task challenging; 2) this distinguishability extends to models pre-trained on each dataset—specifically, a classifier trained on the original text datasets performs well in distinguishing between models pre-trained on these datasets.
Strengths
- The research question is interesting and impactful for LLM research.
- The authors conduct extensive experiments, such as on the impact of text rewriting, and draw their findings in a rigorous way.
Weaknesses
- It would be insightful to further investigate whether distinguishability propagates to models fine-tuned on the same downstream task. For instance, if models are pre-trained on different text datasets but fine-tuned on the same dataset, will their behaviors remain distinguishable?
- Considering that the construction of the pre-training datasets involves only data filtration, without any modification or augmentation, and that these datasets share similar sources, it seems counterintuitive that they are distinguishable at the level of individual segments. Could the authors provide further explanation on this?
- The study claims the existence of dataset bias by demonstrating corpus distinguishability. It would be beneficial to identify and describe more explicit dimensions of bias, as this would offer clearer implications and impact.
Questions
Refer to weaknesses.
Many thanks for mentioning that our research question is interesting and impactful for LLM research, that we conduct extensive experiments, and that we "draw findings in a rigorous way".
Response to reviewer Dx2D’s concerns:
- Thanks for the suggestion to add an experiment on whether bias propagates through finetuned models. We added this experiment in Section 5.1.
- Regarding further explanation of what makes the datasets distinguishable and what the explicit biases or differences are: we added the new Sections 4.4, 4.5, and 4.6 to the revised paper, where we investigate formatting, word distributions, and topics as sources of bias/difference and find that each of these differs between the datasets, but none alone accounts for their distinguishability.
We hope that those clarifications and the new results address the reviewer's concerns, and if so, we would appreciate it if the reviewer would consider raising their score.
We would like to thank reviewer Dx2D once again for their valuable feedback and suggestions. As we approach the end of the discussion period, we hope our responses and new results have addressed the reviewer's concerns. If so, we would kindly ask the reviewer to consider reflecting this in their score. If there are any remaining points of clarification or further questions, we would be more than happy to provide additional explanations.
This paper investigates the biases present in LLM pretraining datasets and examines how these biases persist and propagate through training. The study claims that different datasets possess unique biases or fingerprints identifiable by models, even when preprocessed similarly or rewritten. It shows that classifiers can distinguish dataset origin with high accuracy, and that biases can carry over.
Strengths
- The study provides a detailed look at biases in seven widely used LLM pretraining datasets, revealing that the biases persist even when text is rephrased by other LLMs.
- By showing that dataset biases are measurable, persistent, and propagate into LLM-generated outputs, the paper suggests that even datasets created with strict filtering and deduplication standards still exhibit biases, emphasizing the need for new methods to mitigate these issues.
Weaknesses
- My main concern is that the study's use of prompt-based rephrasing to test bias persistence introduces potential confounding effects, as prompts may inadvertently impose their own linguistic patterns or styles. This prompt influence could create artifacts that the classifier detects, rather than the underlying biases in the original datasets.
- The study sticks mostly to a 160M model, barely looking into how bigger models, like the billion-parameter ones used in real applications, might handle and spread dataset biases. Without testing scalability, it is unclear whether the study's conclusions hold for larger, more powerful models, where bias effects could remain or diminish.
Questions
see above
Thanks for recognizing that our study provides a detailed look at 7 datasets and shows that dataset biases are measurable, persistent, and propagate through LLM outputs.
Response to reviewer iJhV’s concerns:
- The reviewer's main concern is that the prompt-based rephrasing might introduce bias into the data, which the classifiers would detect rather than the bias in the original data. Note that the main results in Table 1 are all on original data, without any rephrasing, so the classifiers there detect underlying biases in the original datasets.
Regarding rephrasing, we agree with the reviewer that the rephrasing model (GPT-4o-mini) might induce biases in the output. However, since we use the same model and prompt for rephrasing all datasets, a very strong induced bias would make the datasets very difficult to distinguish after rewriting. What we see, however, is that the data remains distinguishable even after rewriting.
The purpose of the rephrasing experiment is to investigate what makes the datasets different. We also added several new experiments in Sections 4.4, 4.5, and 4.6 to further understand which aspects make them distinguishable. For example, in Section 4.4 of the revised paper, we removed formatting while keeping the wording exactly the same (no rephrasing), which helps isolate the effect of format without any LLM-induced bias (a small illustrative sketch of such format stripping is shown below).
- Regarding the use of a 160M model without looking into larger models with billions of parameters used in real applications: please note that we use the 160M model only as a classifier; we do study billion-parameter models when rephrasing and generating data (Sections 4.3 and 5).
For classification there is little to no benefit when using larger models, as our ablation study in Figure 1 shows. Specifically, the classification accuracy for model sizes ranging from 25M to 410M only differs by 0.56%, as discussed in section 4.1.
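As referenced above, the following is a small illustrative sketch of format removal: common formatting markers and line breaks are stripped while the wording itself is left untouched. The exact normalization used in Section 4.4 of the paper may differ from this.

```python
# Illustrative formatting removal (not necessarily the paper's exact procedure):
# drop list bullets and markdown-style markers, collapse newlines and repeated
# spaces, and keep the wording itself unchanged.
import re

def strip_formatting(text: str) -> str:
    text = re.sub(r"^\s*[-*•]\s+", "", text, flags=re.MULTILINE)  # leading list bullets
    text = re.sub(r"[#*_`>|]+", "", text)                          # markdown-style markers
    text = re.sub(r"\s+", " ", text)                               # newlines, tabs, extra spaces
    return text.strip()

print(strip_formatting("# A Heading\n\n- first item\n- second item\n\nSome **bold** text."))
# -> "A Heading first item second item Some bold text."
```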
We hope that those clarifications and the new results address the reviewer's concerns, and if so, we would appreciate it if the reviewer would consider raising their score.
We would like to thank reviewer iJhV once again for their valuable feedback and suggestions. As we approach the end of the discussion period, we hope our responses and new results have addressed the reviewer's concerns. If so, we would kindly ask the reviewer to consider reflecting this in their score. If there are any remaining points of clarification or further questions, we would be more than happy to provide additional explanations.
We would like to thank all reviewers for their valuable feedback, which has helped refine the paper. Here are the changes we made to the paper following the reviewers' suggestions:
- Reviewers Dx2D, wbLi, and UcRC suggested that more insights into the features that enable classification between the datasets would be helpful. We added experiments on removing formatting, classifying based on the frequency of words, and dataset content categorization, which together suggest that formatting, vocabulary, and content distributions all contribute to the differences between the datasets. We also provided concrete examples of particular patterns within some datasets. Please refer to the new Sections 4.4, 4.5, and 4.6 in the revised paper for a detailed description (a small illustrative sketch of a word-frequency classifier is given after this list).
- Reviewers Dx2D and wbLi suggested an experiment with instruction finetuning to investigate whether bias still propagates through finetuned models. We added a finetuning experiment, which shows that bias persists even in instruction-finetuned models, albeit less than in the original pretrained model. Please refer to Section 5.1 in the revised paper for more details.
- Reviewer UcRC suggested a more comprehensive evaluation of bias propagation on other datasets. We added experiments on more datasets, and showed that bias propagation can enable the estimation of the mixture proportions of the training domains of an LLM. Please refer to Section 6 in the revised paper for details.
- Reviewer UcRC requested 2-way classification experiments for all 21 possible binary combinations of the seven datasets. We added the classification accuracies in Appendix B of the revised paper.
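As mentioned in the first item above, here is a small illustrative sketch of a word-frequency (bag-of-words) classifier between datasets. The texts and labels are placeholders, and the actual Section 4.5 setup may differ, for example in vocabulary size and classifier choice.

```python
# Illustrative word-frequency (bag-of-words) classification between datasets,
# assuming lists of text sequences per dataset; placeholder data, not the paper's setup.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

texts = ["an example C4 style sequence", "another C4 style sequence",
         "an educational FineWeb-Edu style passage", "another FineWeb-Edu style passage"]
labels = ["C4", "C4", "FineWeb-Edu", "FineWeb-Edu"]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, random_state=0, stratify=labels)

clf = make_pipeline(CountVectorizer(max_features=50000), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
print("word-frequency classification accuracy:", clf.score(X_test, y_test))
```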
Other minor changes to the paper:
- For the experiment "Classifying generated data with a model trained to distinguish the original data" in Section 5, we previously used the OLMo-7B model to generate data, which is trained on all domains of the Dolma dataset (the exact ratio from each domain is not known), while the classifier was only trained on the DolmaCC domain. This experiment had an accuracy drop of about 9% from original to generated data, which we previously attributed to the mismatch between generated and original data. In the revised paper, we replaced OLMo-7B with Falcon-7B, which is trained on RefinedWeb (a single domain). The accuracy drop in this case is less than 1%, showing that the previous accuracy drop was mainly due to the mismatch between the training data of the classifier and that of the LLM, rather than to a mismatch between the original and generated data. This outcome strengthens the finding that bias propagates through training.
- We increased the training tokens and test sequences of the rewritten- and generated-data experiments to 160M training tokens and 8192 test sequences, for consistency with the other experiments on the original data throughout the paper.
- We added an ablation study in Appendix C with BERT as a classifier, which performs similarly to the autoregressive transformer (a minimal sketch of such a classifier is shown below).
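Below is a minimal sketch of what such a BERT-based dataset classifier could look like; the dataset list, sample text, and hyperparameters are placeholders rather than the exact Appendix C configuration.

```python
# Illustrative BERT-based dataset classifier (placeholder data and hyperparameters,
# not the exact Appendix C configuration).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

datasets = ["C4", "RefinedWeb", "DolmaCC", "RedPajama-V2", "FineWeb", "FineWeb-Edu", "DCLM-Baseline"]
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased",
                                                           num_labels=len(datasets))

texts = ["an example sequence from one of the corpora"]  # placeholder batch
labels = torch.tensor([0])                                # index into `datasets`
batch = tokenizer(texts, return_tensors="pt", truncation=True, padding=True, max_length=512)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
outputs = model(**batch, labels=labels)  # cross-entropy loss over the dataset classes
outputs.loss.backward()
optimizer.step()
print("predicted dataset:", datasets[outputs.logits.argmax(-1).item()])
```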
We also respond to each reviewer individually below. We hope we were able to address the concerns from all reviewers, and are happy to clarify further. We hope the reviewers reassess their evaluations after reading our responses!
We sincerely thank reviewers wbLi and UcRC for raising their scores after reviewing our responses and the revised paper. We would also be grateful if reviewers iJhV and Dx2D might consider raising their scores if our responses and new results have adequately addressed their concerns. If there is anything that remains unclear, we would be happy to provide further clarification.
This paper discusses the bias present in large-scale text datasets used for pretraining LLMs. The analysis shows that it is possible to distinguish the datasets with a simple classifier at relatively high accuracy. Moreover, the bias propagates to generated content and is not easily removed by AI-based paraphrasing. While the topic is interesting, the execution of this work could be improved. A better quantitative and qualitative analysis of what the bias actually looks like should be provided. Moreover, the paper does not provide a clear answer to the "so what?" question. In its current form, this work would be a better fit for a specialized workshop about training data. I recommend this paper for rejection.
Additional Comments on Reviewer Discussion
The reviewers raised their scores after rebuttal, with a 3->5 and a 5->6. This changed the ratings from 3556 to 5566. Despite this increase, I think this paper does not meet the bar of ICLR 2025.
Reject