Underestimated Privacy Risks for Minority Populations in Large Language Model Unlearning
Abstract
Reviews and Discussion
This paper studies how the unlearning quality of many proposed unlearning methods degrades when applied to minority groups. This was observed when evaluating unlearning with several MI attacks, datasets, and models. Building on this insight, their proposed maximum leakage informed a new ablation of unlearning methods, showing Langevin unlearning offered the best privacy-utility trade-off. These ablations also showed common approaches to unlearning are unstable (e.g., gradient ascent).
Update after Rebuttal
The authors addressed my concerns, and I raised my score to an accept. I believe this paper provides valuable insight and motivation for future work on the intersection of unlearning and fairness.
Questions for Authors
See question about typos in the suggestions section.
Claims and Evidence
The paper makes a (somewhat implicit) claim that they are the first to observe disparate unlearning guarantees/quality amongst datapoints, but this is a known phenomenon in the literature. It was shown in [1] that when using DP-SGD to unlearn (i.e., Langevin dynamics), many datapoints have orders-of-magnitude better unlearning guarantees than the worst case, i.e., there are significant differences between worst-case and average-case privacy leakage. Empirically, across a broad range of unlearning methods, [2] provided a metric to identify these harder-to-unlearn datapoints. [2] is cited in this paper as “concurrent work”, but given the paper is now almost a year old, “concurrent work” may not be an apt characterization of the results of that paper. Also, [1] is nearly 2 years old, and Langevin unlearning was the method found to be the best in this paper (and [1] studies it without additional fine-tuning).
This said, I think the paper still has impact with a different claim. In the context of this previous literature, I believe the current paper helps expand the societal impact of these past findings by showing these known disparate unlearning guarantees/quality also correspond to identifiable minority groups. I hope the authors consider reframing the claims in the paper accordingly, and look forward to discussion during the rebuttal on this.
[1] Thudi, Anvith, et al. "Gradients look alike: Sensitivity is often overestimated in {DP-SGD}." 33rd USENIX Security Symposium (USENIX Security 24). 2024.
[2] Zhao, Kairan, et al. "What makes unlearning hard and what to do about it." Advances in Neural Information Processing Systems 37 (2025): 12293-12333.
Methods and Evaluation Criteria
The methods and evaluation made sense, and I would say were quite thorough (many ablations are conducted).
Theoretical Claims
No theoretical claims were made.
Experimental Design and Analysis
I found no significant issues with the experimental design or analyses from the description in the main body and Appendix sections A-C.7.
There is no description of repetitions for the random, canary, and minority evaluations. Given the number of settings tested I do not believe this is an issue, but raise it here in case the authors did have repetitions and I missed this description (or can add suitable repetitions over any randomness in the evaluation pipeline).
Supplementary Material
I read Appendix sections A-C.7.
Relation to Existing Literature
As mentioned in the claims, past work had observed disparities in the unlearning guarantees over datapoints. This paper adds to this observation by showing identifiable minority groups are disproportionately affected. This finding also has similarities to the disparities observed in the privacy literature [3] for minority groups, though I believe those findings focus on the impact privacy has on performance across groups.
[3] Bagdasaryan, Eugene, Omid Poursaeed, and Vitaly Shmatikov. "Differential privacy has disparate impact on model accuracy." Advances in neural information processing systems 32 (2019).
Missing Important References
[1] observed disparate unlearning guarantees for DP-SGD, i.e., Langevin unlearning without the additional fine-tuning which would further minimize this initial divergence. [2] is cited as “concurrent” work, though it is nearly a year old and was accepted to Neurips 2024.
[1] Thudi, Anvith, et al. "Gradients look alike: Sensitivity is often overestimated in {DP-SGD}." 33rd USENIX Security Symposium (USENIX Security 24). 2024.
[2] Zhao, Kairan, et al. "What makes unlearning hard and what to do about it." Advances in Neural Information Processing Systems 37 (2025): 12293-12333.
Other Strengths and Weaknesses
Strengths:
- Thorough and insightful empirical evaluation
- The paper is well-written
- Ablations have clear impact on future unlearning methodology
Weaknesses:
- The differences between average and worst-case datapoints for unlearning are already known in the literature (this requires further refinement of the claims made in the paper)
Other Comments or Suggestions
I believe this paper can have significant impact on the field of unlearning, but it requires clarifying the results in the context of previous findings. I am happy to raise my score given clarifications to my “major” suggestion. I also have other suggestions, but label these as relatively more minor.
Major:
- I suggest clarifying in the introduction and related work section that previous work had found that there is a gap between the unlearning guarantees for the easiest/average and hardest datapoints (see comments and references in the claims section). Given this, the authors can claim to build on this past insight by showing these harder datapoints coincide with minority groups.
Minor:
- If possible, could the authors compare the scores of the metric proposed in [2] across the average and minority evaluations? This would help connect the findings to that work.
- In the EUk and CFk paragraph, the authors wrote “expansive”; I think you mean “expensive”.
- In line 364 do you mean “by at least” instead of “for at least”?
[2] Zhao, Kairan, et al. "What makes unlearning hard and what to do about it." Advances in Neural Information Processing Systems 37 (2025): 12293-12333.
We sincerely thank Reviewer jBLe for the insightful suggestions and for recognizing the potential impact of our work on the field of unlearning. Below, we address the reviewer’s questions and suggestions.
Major comments & W1. Clarification in Introduction and Related Work.
We sincerely thank the reviewer for the thoughtful suggestion. We agree that further clarifying our contributions relative to prior work will help readers better situate our findings, and we outline below the changes we have made in the Introduction and Related Work sections to address this point.
Revision to the Introduction: We have revised the paragraph beginning with “We identify a critical pitfall...” to more explicitly highlight the dual motivation behind our work. First, prior work [1,2] shows that unlearning difficulty varies significantly across samples in vision tasks. Second, LLM studies reveal highly non-uniform memorization patterns. Building on both insights, we first show that privacy risk in LLM unlearning varies across samples, and further demonstrate that high-risk points often coincide with identifiable minority groups. This connection adds a new dimension of societal relevance to existing technical findings.
Revision to the Related Work: We have updated the final paragraph of the Related Work section to more accurately describe [2] as estimating the unlearning difficulty of individual samples in vision tasks using an entanglement-based metric, and showing how variations in the memorization of image representations impact unlearning performance. Our study investigates how minority subgroups in LLM fine-tuning systematically suffer from degraded unlearning efficacy, an effect that coincides with harder-to-unlearn examples. As a future direction, it would be interesting to explore the connection between such entanglement-based difficulty measures and our minority-aware findings in the LLM setting.
[1] Thudi et al. Gradients look alike: Sensitivity is often overestimated in DP-SGD.
[2] Zhao et al. What makes unlearning hard and what to do about it.
Missing clarification on evaluation repetitions.
We thank the reviewer for the careful comment. All components involving randomness were run with a fixed random seed (42), as noted in App. B.2 and in line with common practice. Due to the large scale of our experiments, we did not repeat runs with multiple seeds. We will clarify this in App. B.2 and note that using a broader range of seeds could further improve robustness in future work.
Could the authors compare the metric proposed in [2]?
We thank the reviewer for this insightful suggestion. We believe the reviewer is referring to the ToW metric proposed in [2]. While the original ToW metric is designed for vision tasks and based on prediction accuracy, its underlying spirit can be adapted to our LLM setting using privacy-related metrics. Specifically, we construct a ToW metric based on AUC differences between the unlearned model (θ_u) and a retrained model (θ_r) across key dataset partitions. Our adapted ToW metric:
ToW(θ_u, θ_r) = (1 − Δ_forget) · (1 − Δ_retain) · (1 − Δ_test),
where Δ_X = |AUC_X(θ_u) − AUC_X(θ_r)| is the absolute gap in attack AUC between the unlearned and retrained models on partition X ∈ {forget, retain, test}. Higher ToW indicates lower privacy leakage.
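A minimal sketch of how this adapted ToW can be computed from per-partition attack AUCs is given below; the function and variable names are illustrative rather than our exact implementation.

```python
from typing import Dict

def adapted_tow(auc_unlearned: Dict[str, float], auc_retrained: Dict[str, float]) -> float:
    """Adapted tug-of-war (ToW) score from per-partition MIA AUCs.

    Each dict maps a partition name ('forget', 'retain', 'test') to the attack
    AUC obtained against the unlearned / retrained model.
    """
    tow = 1.0
    for part in ("forget", "retain", "test"):
        delta = abs(auc_unlearned[part] - auc_retrained[part])  # AUC gap on this partition
        tow *= (1.0 - delta)
    return tow

# Example: an unlearned model whose attack AUCs closely track the retrained model
print(adapted_tow({"forget": 0.55, "retain": 0.52, "test": 0.50},
                  {"forget": 0.51, "retain": 0.50, "test": 0.50}))  # -> 0.9408
```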
We applied this metric to the ECHR dataset with GPT-2 under loss-based attack. As shown in the table, the canary and minority settings consistently yield lower ToW scores than the random setting, indicating greater privacy leakage. These results reinforce our main finding that minority points are harder to unlearn. In addition, we believe it would be interesting future work to further investigate how the idea behind ToW could be leveraged to help identify minority points in the LLM setting.
| Setting | No Unlearn | Gradient Ascent | Random Label | EUk | CFk | NegGrad+ | SCRUB | Langevin Unlearning |
|---|---|---|---|---|---|---|---|---|
| Random | 0.768 | 0.917 | 0.769 | 0.739 | 0.780 | 0.782 | 0.771 | 0.877 |
| Canary | 0.727 | 0.881 | 0.729 | 0.694 | 0.747 | 0.753 | 0.732 | 0.834 |
| Minority | 0.717 | 0.861 | 0.721 | 0.733 | 0.728 | 0.732 | 0.714 | 0.863 |
Typos.
We thank the reviewer for pointing out these typos and have corrected both in the revised version.
Summary: We sincerely thank Reviewer jBLe again for the thoughtful and constructive feedback. We would be happy to clarify any remaining concerns or engage in further discussion. We hope that our responses have addressed the key points raised, and we truly appreciate your openness to reconsidering the score upon clarification.
Thanks for the response!
I believe my concerns have been addressed, and have raised my score accordingly! I believe this is a valuable empirical contribution, and hopefully inspires further (algorithmic or empirical) work into the limitations of current private ML approaches.
Thank you for your valuable feedback and suggestions, which have greatly helped us improve our manuscript. We also truly appreciate your support on this paper.
The paper argues that minority data points are harder to unlearn than common, typical examples. To show this, the authors construct canaries by replacing PII in the forget sets of two datasets with infrequent PIIs. The authors then show that, under MIA evaluation, common unlearning algorithms struggle to unlearn these data points compared to the remaining common examples.
Questions for Authors
N/a
Claims and Evidence
I find the central claim about minority data points conflated with the idea that approximate unlearning algorithms find it harder to unlearn outlier examples. It is possible that is what the authors want to show, but in my opinion that is a very handwavy argument unless the authors model it rigorously. In that regard, I find the paper flawed, as it is hard to say what a minority is. The experimental results clearly show that common unlearning algorithms do not work uniformly well, and the authors have found a canary strategy that succeeds at showing this. But I am not convinced it shows much about "minority" without a proper characterization of what a minority is.
The authors claim that Langevin unlearning performs better than the others and, since it is the only method that adds noise, that noise has a crucial role to play. But Langevin unlearning, unlike SCRUB and Gradient Ascent, comes with theoretical guarantees. That could also be the reason why it holds up. I find the argument that noise addition may have some important role a bit unnecessary, given that noise is needed for that definition of unlearning to hold theoretically.
Methods and Evaluation Criteria
The idea of using PII as a marker of minority group affiliation seems odd to me. Why should PII indicate whether a data point belongs to a minority or not? Either there should be an argument motivating why this is a sensible choice, or the authors could propose a mathematical model in which it is a sensible choice.
Theoretical Claims
N/A
Experimental Design and Analysis
In addition to my point above, another experiment could be to run the same unlearning experiments on data points that are classically considered minority in datasets used for studying fairness. While this takes away the concept of canaries, it can help drive home the point about minorities with a bit less experimental intervention.
Supplementary Material
No
Relation to Existing Literature
One of the main contributions of the paper is adversarial evaluation of unlearning using canaries. This is now a broadly studied area in machine unlearning. Corrective machine unlearning https://arxiv.org/abs/2402.14015 and follow-up works (e.g., https://arxiv.org/abs/2406.09173 and https://arxiv.org/abs/2411.13731) also inject special data points (poisons) and try to unlearn them. So I am not sure the idea of unlearning specifically injected data points is very novel.
Missing Important References
See the references above.
Other Strengths and Weaknesses
I have mentioned the important points above. From an originality perspective, the central message is interesting, and while it may seem obvious, it would be good to see a thorough evaluation of it; the paper attempts that. However, I find the ideas of minority, outlier, hard-to-unlearn, etc., all conflated together in the absence of a mathematical model.
Other Comments or Suggestions
Post-rebuttal
I thank the authors for the rebuttal. I have read it and I maintain my original rating.
My core concern about conflating the idea of "rare PII" with "minority" is not resolved here. The rebuttal addresses this by saying "our paper defines minority instances based on the population frequency of their corresponding identifiers", but this does not really explain the reasoning behind the choice; it just states that the choice was made. I might be inclined to agree that the paper shows rare PIIs are harder to unlearn, but saying minority groups are harder to unlearn requires an additional step showing why "rare PIIs" constitute a broad class of "minority groups".
There are several ways to do this, including referring to the extensive fairness literature, providing a mathematical model (I think it's up to the authors to define it, not me), or simply restricting the scope of the work. I think making the broad-stroke claim that this paper is about the hardness of unlearning minorities is misleading. Finally, also see my note in the original review above on additional experiments on classical fairness datasets.
I am not sure I totally follow what the rebuttal is trying to clarify about Langevin unlearning. Just to clarify, any algorithm that provides certifiable unlearning (like Langevin unlearning) and is not exact necessarily needs some randomness. This is not surprising, and any claim that a deterministic algorithm achieves this is not right. My entire point is that it is unnecessary to talk about noise addition as being something special; it is just that Langevin unlearning comes with guarantees, and thus it is not possible to evade the unlearning guarantee, unlike with all other methods such as SCRUB.
Finally, the rebuttal claims "As noted in these papers, their goals and assumptions differ from those of privacy-oriented unlearning, which is the focus of our work. To acknowledge their relevance, we have cited them in the introductory paragraph of Section 3 (Preliminaries), where we discuss the broader unlearning landscape."
My point was not that these papers be cited in the introduction, or even that they do not consider privacy-oriented unlearning. Rather, my point was that they perform adversarial evaluation of unlearning, and thus the strategy of inserting different kinds of special data points (poisons, canaries, witches'-brew points) has been considered there, and some of their strategies could be used here.
We greatly thank Reviewer Cub4 for reviewing our paper. Below, we address the questions and comments raised by Reviewer Cub4.
The paper fails to provide a rigorous definition of ‘minority’. The definition should use a mathematical model.
We thank the reviewer for this comment. As noted in Footnote 2 and detailed in Sections 4.1 and 4.2, our paper defines minority instances based on the population frequency of their corresponding identifiers (e.g., rare phone area codes or email domains). This definition is precise, grounded in observable statistics, and enables systematic evaluation.
We further validate it through controlled canary experiments, where replacing common identifiers with rare ones while keeping the rest of the input fixed leads to significantly higher privacy leakage. Similar trends are observed in real-world minority samples (“Minority Setting”).
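For illustration, the following minimal sketch shows how minority records can be flagged by identifier frequency and how a canary is formed by swapping a common identifier for a rare one; the threshold and area codes are hypothetical examples, not the exact values used in our experiments.

```python
from collections import Counter
import re

def minority_flags(area_codes, rarity_threshold=0.01):
    """Flag records whose phone area code falls below a population-frequency threshold."""
    counts = Counter(area_codes)
    total = sum(counts.values())
    freq = {code: c / total for code, c in counts.items()}
    return [freq[code] < rarity_threshold for code in area_codes]

def make_canary(text, common_code="212", rare_code="907"):
    """Build a canary: swap a common area code for a rare one, keeping the rest of the text fixed."""
    return re.sub(rf"\b{common_code}(?=[-.\s]?\d{{3}})", rare_code, text)

codes = ["212"] * 199 + ["907"]
print(sum(minority_flags(codes)))                # -> 1 record flagged as minority
print(make_canary("Call me at 212-555-0199."))   # -> "Call me at 907-555-0199."
```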
While our definition is frequency-based, we welcome clarification on the type of mathematical model the reviewer envisions. Extending the framework to semantically defined minority groups may benefit from additional modeling, which we consider a promising future direction.
The minority points seem to conflate with outliers or hard-to-unlearn examples.
We thank the reviewer for raising this point and allowing us to clarify a potential misunderstanding. Our experiments demonstrate that the PII-population-defined minority samples consistently exhibit significantly higher privacy leakage across a wide range of unlearning methods. These results indicate that our defined minority points are systematically harder to unlearn in terms of privacy risk, as acknowledged by Reviewer jBLe. To be clear, by harder to unlearn, we refer specifically to their elevated vulnerability under privacy metrics. To prevent potential confusion, we will further refine the language in the paper.
The claim that Langevin unlearning performs better due to noise addition is unconvincing, as its performance instead can be attributed to its theoretical guarantees.
We thank the reviewer for their comment. There is an intrinsic connection between noise injection and the theoretical guarantees of Langevin unlearning. While it is true that Langevin dynamics enjoys privacy guarantees, these guarantees fundamentally rely on the analysis of Langevin dynamics, i.e., how the injected Gaussian noise affects the model's trajectory over adjacent datasets, typically measured via Rényi divergence. In other words, the theoretical robustness of Langevin unlearning is itself a direct consequence of noise injection. We will revise the paper to make this connection more explicit.
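To make this concrete, a single noisy update of the kind analyzed in Langevin unlearning can be sketched as follows; the learning rate, noise scale, and clipping threshold below are illustrative placeholders, not the hyperparameters used in our experiments.

```python
import torch

def noisy_gradient_step(params, grads, lr=1e-3, noise_scale=1e-2, clip_norm=1.0):
    """One noisy (Langevin-style) update: clip the gradient, then add Gaussian noise.

    It is exactly this injected noise that the Renyi-divergence analysis of the
    training trajectory over adjacent datasets relies on.
    """
    with torch.no_grad():
        total_norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
        scale = torch.clamp(clip_norm / (total_norm + 1e-12), max=1.0)  # gradient clipping
        for p, g in zip(params, grads):
            noise = noise_scale * torch.randn_like(p)                   # Gaussian noise injection
            p.add_(-lr * (g * scale) + noise)

# Toy usage on a single parameter tensor
w, g = torch.zeros(3), torch.ones(3)
noisy_gradient_step([w], [g])
print(w)
```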
Using PII to represent minority data points lacks justification—there’s no clear reason why PII should indicate minority status.
We thank the reviewer for this question. First, PII inherently contains sensitive personal information, making it a central focus in privacy-related research [1,2,3]. Moreover, using PII also enables us to clearly define minority points by leveraging population-level frequency (e.g., rare area codes or email domains), which we believe helps address the reviewer’s concern regarding the clarity of the minority definition. Finally, from an empirical perspective, we show that instances defined as minority under this framework consistently exhibit significantly higher privacy leakage across multiple unlearning methods.
[1] Lukas et al. Analyzing leakage of personally identifiable information in language models.
[2] Kim et al. Propile: Probing privacy leakage in large language models.
[3] Li et al. Llm-pbe: Assessing data privacy in large language models.
Other related references.
We thank the reviewer for sharing the relevant works [4, 5, 6]. These papers focus on corrective unlearning, which aims to address corrupted or harmful training data. As noted in these papers, their goals and assumptions differ from those of privacy-oriented unlearning, which is the focus of our work. To acknowledge their relevance, we have cited them in the introductory paragraph of Section 3 (Preliminaries), where we discuss the broader unlearning landscape.
While canary crafting has been explored in privacy research, it remains overlooked in evaluating unlearning methods for LLMs. Our work fills this gap by demonstrating that selecting high-risk samples, especially minority points, is key to uncovering underestimated privacy risks in current unlearning evaluations.
[4] Goel et al. Corrective machine unlearning.
[5] Li et al. Delta-Influence: Unlearning Poisons via Influence Functions.
[6] Schoepf et al. Potion: Towards poison unlearning.
Summary: We thank Reviewer Cub4 for the thoughtful and detailed feedback. We appreciate the opportunity to respond and are happy to clarify any remaining concerns. If our responses have addressed the points raised, we would be grateful if this could be reflected in the final evaluation.
This paper investigates the underestimated privacy risks faced by minority populations in the context of large language model (LLM) unlearning. The authors argue that current evaluations, which rely on average-case assessments and MIAs, underestimate these risks since minority data is harder to forget. They propose a minority-aware evaluation framework and validate it through controlled experiments with canary injection and real-world datasets (Enron, ECHR). Results show at least 20% higher privacy leakage for minority data across various unlearning methods, MIA variants, datasets, and LLM scales. Among the methods tested, Langevin Unlearning offers the best balance between privacy and utility.
Questions for Authors
- Have the authors considered testing larger models (e.g., 13B, 30B) to see if privacy risks scale with model size?
- Could the authors apply their minority-aware evaluation framework to non-PII minority data, such as dialects or underrepresented social groups?
- Do the authors plan to release their code and datasets to enhance reproducibility?
Claims and Evidence
Claims:
- Standard LLM unlearning evaluations underestimate privacy risks for minority groups, as minority data is harder to forget.
- A minority-aware evaluation framework reveals overlooked privacy vulnerabilities.
- Different unlearning methods vary in effectiveness, with Langevin Unlearning achieving the best privacy-utility balance.
Evidence:
- Experiments show that minority data (e.g., rare area codes) is more prone to memorization and experiences higher privacy leakage.
- Canary injection and direct minority subset removal confirm greater privacy risks compared to randomly selected forget sets.
- Benchmarking across multiple datasets (Enron, ECHR), MIAs (lossMIA, zlibMIA, Min-K%), and LLMs (GPT-2, LLaMA-2 7B) consistently supports the claims.
Methods and Evaluation Criteria
Methods:
The study leverages well-established MIAs to assess unlearning efficacy and introduces canary injection to isolate minority data effects. It uses real-world datasets (Enron, ECHR) containing PII to enhance relevance. Unlearning methods are tested across three scenarios: Random (standard forget set selection), Canary (synthetically modified rare identifiers), and Minority (naturally rare data removal).
Evaluation Criteria:
The methodology could be improved by evaluating larger LLMs (e.g., 13B, 30B) to assess whether privacy risks scale with model size. Expanding beyond PII to include underrepresented dialects or cultural biases would further strengthen the study’s applicability.
Theoretical Claims
The paper builds on established principles of machine unlearning and privacy auditing without introducing complex theoretical proofs.
Experimental Design and Analysis
The paper rigorously compares multiple unlearning methods across datasets and LLM architectures, yielding statistically significant results with consistent trends. However, its focus on PII data limits generalizability to other sensitive information types, and it does not explore how different fine-tuning strategies, such as full fine-tuning versus LoRA-based tuning, impact unlearning effectiveness.
Supplementary Material
Although the appendix includes dataset statistics and hyperparameter details, the authors do not provide code. Reproducibility would be enhanced if the datasets, models, and evaluation scripts were made publicly available.
Relation to Existing Literature
The paper builds on prior work in privacy auditing, machine unlearning, and membership inference attacks. It addresses a critical gap in the literature by focusing on the privacy risks for minority groups, which are often overlooked in standard unlearning evaluations.
Missing Important References
The paper provides a comprehensive and well-researched discussion, citing key works on machine unlearning, privacy risks, and membership inference attacks. It references foundational studies on differential privacy and unlearning (e.g., Guo et al., 2020; Bourtoule et al., 2021), as well as LLM privacy vulnerabilities (e.g., Carlini et al., 2022; Nasr et al., 2023).
Other Strengths and Weaknesses
Strengths:
- The paper highlights a critical oversight in LLM unlearning evaluations, emphasizing the unique privacy risks faced by minority groups.
- The paper presents comprehensive experiments with diverse datasets, unlearning methods, and MIA techniques.
- The authors introduce a novel minority-aware evaluation framework, enhancing standard unlearning assessments.
Weaknesses:
- The paper focuses only on PII data, leaving the applicability to other minority attributes uncertain.
- The paper does not explore the impact of different fine-tuning strategies, such as LoRA versus full fine-tuning.
- The authors do not provide code, which limits reproducibility and independent verification.
Other Comments or Suggestions
- The authors should consider evaluating larger LLMs (13B, 30B) to assess scaling effects.
- The study should be expanded to include non-PII minority data, such as linguistic or cultural factors.
- The authors should release code and datasets to improve transparency and reproducibility.
We sincerely thank Reviewer WLjM for acknowledging the novelty and comprehensiveness of our minority-aware framework, as well as for supporting the acceptance of our paper. Below, we provide detailed responses to the questions and concerns raised.
Q3 & W3 & C3: Code for our paper.
We appreciate the reviewer’s emphasis on the importance of code availability for both reproducibility and broader dissemination of our work. Yes, in the supplementary material (provided as a ZIP file), we have included all the relevant code and provided a detailed README.md outlining the installation steps. After the paper is made public, we will further release our GitHub repository to ensure wider accessibility and community engagement.
W1 & C2 & Q2: “The paper focuses only on PII data, leaving the applicability to other minority attributes uncertain.” & Could the authors apply their minority-aware evaluation framework to non-PII minority data, such as dialects or underrepresented social groups?
We thank the reviewer for this insightful and important question. We fully agree that clearly defining what constitutes a “minority” in different contexts is nontrivial, and that generalizing beyond PII is an open and meaningful research direction.
In this work, we chose to focus on PII for two key reasons. First, PII is inherently sensitive and has been a central focus in privacy-related studies [1,2,3], making it a practical and impactful setting for studying unlearning. Second, PII (such as phone number area codes or email domains) offers a well-defined population structure, which allows us to rigorously formulate and operationalize the notion of minority subgroups in a reproducible way.
That said, when non-PII attributes exhibit a clear format and can be associated with measurable population-level frequencies, our framework can be directly generalized to these settings. For attributes that are more semantically defined or lack an explicit population distribution, such as linguistic patterns, we acknowledge that identifying minority status is more challenging. Extending our framework to these settings is a promising direction for future work, and we would be excited to explore this further.
[1] Lukas et al. Analyzing leakage of personally identifiable information in language models. In S&P 2023.
[2] Kim et al. Propile: Probing privacy leakage in large language models. In NeurIPS 2023.
[3] Li et al. Llm-pbe: Assessing data privacy in large language models. In VLDB 2024.
W2: The paper does not explore the impact of different fine-tuning strategies, such as LoRA versus full fine-tuning.
We appreciate the reviewer’s suggestion. Exploring the impact of different fine-tuning strategies, such as LoRA versus full fine-tuning, on unlearning and privacy leakage is indeed an interesting direction. Our current experiments are designed to support our core claim that minority samples are disproportionately vulnerable to privacy leakage across multiple datasets, attack methods, and model scales. Due to computational constraints, especially for large-scale multi-GPU training, we have so far focused on the more resource-efficient LoRA-based fine-tuning strategy. While full fine-tuning on models like LLaMA-7B would be a valuable extension, we believe it is not essential to establish the key findings of our study. We view this as a promising avenue for future exploration and would be happy to investigate it as resources permit.
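For reference, the LoRA-based setup we refer to can be sketched with the Hugging Face peft library as follows; the rank, scaling factor, and target modules shown are illustrative placeholders rather than our exact configuration.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Gated model id shown for illustration; any causal LM checkpoint works the same way.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_config = LoraConfig(
    r=8,                                   # low-rank dimension (illustrative)
    lora_alpha=16,                         # scaling factor (illustrative)
    target_modules=["q_proj", "v_proj"],   # attention projections typically adapted
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are updated during (un)learning
```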
C1 & Q1: Have the authors considered testing larger models (e.g., 13B, 30B) to see if privacy risks scale with model size?
We thank the reviewer for the valuable question. Investigating how privacy risks scale with model size under our minority-aware framework is indeed an interesting and important direction. Prior work has demonstrated that both verbatim memorization and extractability tend to increase with model scale [4], which suggests that minority-group privacy risks may also become more severe in larger models. While our current experiments include both small (GPT-2) and mid-sized (LLaMa-2 7B) models to support our central claims, which align with model sizes used in current literature [1,5,6], computational constraints have limited our ability to run experiments on larger models such as 13B or 30B. We view this as a promising avenue for future work and plan to explore it as resources become available.
[4] Carlini et al. Quantifying memorization across neural language models. In ICLR 2023.
[5] Maini et al. Tofu: A task of fictitious unlearning for llms. In COLM 2024.
[6] Shi et al. Muse: Machine unlearning six-way evaluation for language models. In ICLR 2025.
Summary: We are grateful to Reviewer WLjM for the supportive and thoughtful review. We would be happy to provide further clarifications or engage in additional discussion. If our responses have helped resolve the raised concerns, we humbly hope this might be taken into account in the final score.
Thank you for the detailed rebuttal and the substantial improvements made to the manuscript. The revised version effectively addresses my concerns, and I have accordingly raised my score.
Thank you very much for your appreciation. We truly value your thoughtful feedback throughout the review process.
This work highlights a critical issue in unlearning: not all samples are equally difficult to unlearn. Though this observation is not in itself entirely new, this work presents a careful exploration of the idea and proposes methods for evaluating it. Two reviewers agreed that this presents a novel and important contribution to the field, and that the evaluation was well carried out.
Despite this, I encourage the authors to reconsider their usage of "minority" and how it may conflate with "outlier", especially in terms of the empirical setup and takeaways. This was one notable concern left unaddressed after the rebuttal. The changes proposed during the rebuttal will also help address concerns around the placement of this work within the literature and its use of insights from past work.