PaperHub
5.3 / 10
Rejected · 4 reviewers
Lowest: 5 · Highest: 6 · Std. dev.: 0.4
Ratings: 6, 5, 5, 5
Confidence: 3.8
Correctness: 2.5
Contribution: 2.3
Presentation: 2.5
ICLR 2025

Towards Understanding the Feasibility of Machine Unlearning

OpenReview · PDF
Submitted: 2024-09-27 · Updated: 2025-02-05

Abstract

Keywords
Machine Unlearning · Kernelized Stein Discrepancy

Reviews and Discussion

Review
Rating: 6

This paper tackles the problem of assessing the difficulty of unlearning individual training samples in machine learning models, a need highlighted by recent privacy regulations. While most existing unlearning methods focus on overall unlearning success rates, this work shifts attention to the unique challenges of unlearning specific samples, considering factors like the underlying model and data characteristics. The authors propose heuristics to predict the success of unlearning operations for individual data points and explore variations in unlearning difficulty across samples, with a ranking mechanism to identify samples that are more resistant to unlearning. A key contribution is the use of Kernelized Stein Discrepancy (KSD) as a model- and data-specific heuristic for gauging unlearning difficulty. The method’s effectiveness is demonstrated on multiple classification tasks, showcasing its applicability across diverse scenarios and highlighting its potential to refine the measurement of unlearning success at a granular level.

Strengths

  • This work introduces an original and timely contribution to the field of unlearning by tackling a previously overlooked question: the unlearnability of specific samples.
  • The use of Kernelized Stein Discrepancy (KSD) in this context is both innovative and technically sound. The KSD-based unlearnability score, which incorporates model and data characteristics, is compelling.

Weaknesses

  • Regulatory Implications. It is not clear that the developed tools are useful for advancing unlearning techniques to comply with regulations. The authors claim that "With the proposed evaluation metrics, one may reduce unnecessary machine unlearning operations when data points are determined to be infeasible to unlearn.”, but this is not convincing since erasure is mandatory in any case, and it does not seem reasonable to decide to retrain a model from scratch because a heuristic score ranking method indicates that a single sample may be hard to unlearn. More discussion on the regulatory utility or limitations of unlearnability scores would strengthen this point.
  • Rigorous Unlearning Objective. The unlearning objective presented in Section 2.1 is based on heuristics, such as maximizing the loss on the forget set, which does not guarantee that an adversary or auditor could not detect the presence of the forget data in the unlearned model. A more rigorous definition of unlearning – one that establishes a statistical similarity to retraining from scratch – would better support the authors’ contributions and align their methodology with recent work in statistically grounded unlearning, e.g., see (Guo et al. 2020).
  • Baseline Comparisons and Additional Techniques. While the inclusion of KSD is interesting, the paper would benefit from a broader comparison with baselines like influence functions (Koh and Liang, 2017), which are efficient and widely applicable to different architectures. Additionally, incorporating more advanced unlearning techniques or defenses against membership inference attacks (Carlini et al. 2022) would strengthen the empirical evaluation, as only three unlearning algorithms are tested here, limiting the generalizability of results.

References

Koh and Liang (ICML 2017). Understanding Black-box Predictions via Influence Functions.

Guo et al. (ICML 2020). Certified data removal from machine learning models.

Carlini et al. (S&P 2022). Membership Inference Attacks From First Principles.

Questions

  • Impact of Unlearnability Scores: Can the authors elaborate on practical applications of unlearnability scores? For example, could these scores help in refining existing unlearning methods to improve handling of difficult samples, or are there contexts in which they could aid in privacy-preserving model design?
  • Empirical Limitations: What criteria were used to select the three unlearning techniques in the empirical evaluation? Could the authors comment on the generalizability of their methodology to other unlearning frameworks and provide insights on adapting it to handle more complex attack models?
Comment

Thank you very much for your feedback. Below is our response to the concerns and questions you raised.

Regulatory Implications

This paper aims to introduce a new research direction focused on investigating the feasibility of unlearning: understanding the factors that influence the feasibility of unlearning data, specifically in a manner that is agnostic to any particular unlearning algorithm. The main focus is on evaluating the feasibility of unlearning and understanding the relationship through the lens of KSD-based scoring; we did not invest in designing an unlearning algorithm using KSD scoring, even though the potential exists. The regulatory implications of machine unlearning in relation to the feasibility of unlearning are beyond the scope of this research and are left for future exploration. We encourage further research to investigate these aspects in greater detail.

Rigorous Unlearning Objective

To verify that unlearned data is removed from the model, we evaluated Membership Inference Attack (MIA) efficacy. MIA-efficacy is quantified as the ratio of samples predicted as "forgotten samples" (True Negatives, TN) to the total number of samples in the forget set, $|\mathcal{D}_f|$. MIA-efficacy reflects the effectiveness of unlearning: higher MIA-efficacy implies less residual information about the unlearned samples $\mathcal{D}_f$, indicating a more successful unlearning outcome. This criterion is reported in Table 2, with the full statistical analysis in Table 8.
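
In symbols, the description above corresponds to the following (our reconstruction from the text, not a formula copied from the paper):

$$\text{MIA-efficacy} = \frac{\mathrm{TN}}{|\mathcal{D}_f|}, \qquad \mathrm{TN} = \big|\{\, (x, y) \in \mathcal{D}_f : \text{the MIA classifier predicts } x \text{ as a non-member} \,\}\big|.$$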

Baseline Comparisons and Additional Techniques

The primary contribution of this paper lies in exploring the "feasibility of machine unlearning" before diving into the design of unlearning algorithms. Previous works have rushed to provide new machine unlearning approaches without a solid understanding of the unlearning feasibility of data. We believe this research direction has great potential to invite future research into the feasibility of unlearning, providing a robust basis for algorithm development.

Although the influence function is one of the most mathematically grounded feature attribution methods, it is significantly more expensive to compute and is not directly comparable to KSD, which has great potential for understanding and exploring the data-model distribution.

Impact of Unlearnability Scores

As discussed later, the primary goal of this study is to understand the factors contributing to the feasibility of data unlearning, with particular emphasis on investigating this feasibility challenge independently of specific unlearning algorithms. The practical application of KSD-based scores for implementing unlearning algorithms lies beyond the scope of this research and is left for future exploration. Instead, the focus is directed towards evaluating the feasibility of unlearning and analyzing its relationship through the perspective of KSD-based scoring. While the potential for designing unlearning algorithms using KSD-based scoring exists, our research does not pursue that direction.

Additionally, KSD is measured as $\mathbb{E}_{x, x' \sim q}[\kappa_p(x, x')]$; any change to the data samples (unlearning, i.e., removing any subset of the data from the expectation) alters the calculated KSD. We employ KSD as a measurement of the model distribution, which lets us estimate each sample's contribution. Removing any subset of the data would significantly change the meaning of the KSD. It is therefore not trivial to apply KSD directly for unlearning; however, one potential approach to employing KSD for unlearning is to use the scoring heuristic to select easy and difficult samples for unlearning.
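
For reference, the standard kernelized Stein discrepancy in the literature (Liu et al., 2016) uses the Stein kernel below, with $s_p(x) = \nabla_x \log p(x)$ and a base kernel $k$ (e.g., RBF); we reproduce it only as background, and whether the paper's Formula 4 matches this exact form is our assumption:

$$\kappa_p(x, x') = s_p(x)^\top k(x, x')\, s_p(x') + s_p(x)^\top \nabla_{x'} k(x, x') + \nabla_{x} k(x, x')^\top s_p(x') + \operatorname{tr}\!\big(\nabla_{x} \nabla_{x'} k(x, x')\big), \qquad \mathrm{KSD}^2(q \,\|\, p) = \mathbb{E}_{x, x' \sim q}\big[\kappa_p(x, x')\big].$$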

Empirical evaluation

From the literature [1, 2], we observed that these two methods are the most common practice and are considered to be among the most effective unlearning algorithms. Rather than tailoring our approach to a specific unlearning algorithm, we aimed to highlight a challenge overlooked in previous studies and to establish a new research direction. Still, as part of our evaluation, we conducted a Membership Inference Attack to investigate whether the unlearned model carries the influence of an unlearned sample after unlearning, and how this varies between easy- and difficult-to-unlearn samples.

We hope that future research will employ the KSD-based scoring metric for the evaluation of privacy and adversarial attacks.

[1] Model Sparsity Can Simplify Machine Unlearning. Jinghan Jia et al. NeurIPS 2023.

[2] Gundavarapu, Saaketh Koundinya, et al. "Machine Unlearning in Large Language Models." arXiv preprint arXiv:2405.15152 (2024).

Comment

I thank the authors for their rebuttal and would like to maintain my original score.

Review
Rating: 5

The paper introduces a set of KSD-based metrics for quantifying forget difficulty, taking into account the characteristics of the target model and data distribution. It introduces a KSD-based heuristic approach to assess forget difficulty, where KSD is a parameterized kernel function tailored for each model and dataset. These metrics hold significant practical value in supporting decision-making processes.

Strengths

  1. The KSD-based metrics presented in the paper are particularly intriguing, as they offer valuable insights into the relationship between data and models in the machine unlearning field.
  2. The paper is easy to follow. The authors have effectively communicated their ideas, making complex topics accessible and engaging for the audience.
  3. Understanding which samples are more difficult to unlearn has the potential to aid the development of machine unlearning.

Weaknesses

  1. Table 1 shows many counterintuitive numerical results, such as the basic baseline GradAsct achieving 0% accuracy on the forget set while maintaining 99% accuracy on the test set. Even when the authors' metric indicates that the most difficult-to-unlearn samples to forget in SVHN can also achieve 0% accuracy on the forget set, the accuracy on the test set is mostly around 80%. This result is incredibly hard to believe, especially since the current state-of-the-art GradAsct (enhanced GradAsct baseline: NegGrad+ proposed by [A]) cannot achieve such results.

  2. The authors claim that the metric proposed in the paper does not rely on a specific unlearning algorithm, making it unreasonable to only select the simplest baseline finetune and GradAsct for the experiments. This suggests that the metric may only be effective for finetune and GradAsct. Considering the existence of different methods such as teacher-student methodology [A], weight saliency [B], knowledge distillation [C], Fisher [D], and Newton Update [E], simple finetune and GradAsct cannot adequately represent these methods. As a primary contribution of proposing some metrics, the authors should select a representative method from various heuristic unlearning works to verify that the metric does not depend on any specific unlearning algorithm. Only when the phenomenon observed in the authors' metric consistently exists across these different methods can it be concluded that the metric does not rely on a specific unlearning algorithm.

  3. The author's citations can be quite misleading in several instances. Such as, in lines 98-99, the authors mention: "Gradient Ascent methods (Thudi et al., 2022; Graves et al., 2021.), adjust the model’s weights in the direction of the gradient to increase the model’s error on the data intended for forgetting." However, it's difficult to classify the methods of Thudi et al. (2022) and Graves et al. as ascent, since ascent implies the need to compute the negative gradient, as in [J], rather than merely adjusting the model. In line 112, they state, "Guo et al. (2020) [E] introduced the concept of certified unlearning, grounded in information theory and specifically tailored to the Fisher Information Matrix." However, to my knowledge, Guo et al. (2020) do not mention anything related to the Fisher Information Matrix. If the authors intended to reference the Fisher unlearning method, I suspect they meant to cite [D]. Alternatively, if they intended to reference the use of information theory and the Fisher metric to evaluate unlearning methods, I would guess they meant [I].

  4. The authors lack descriptions of some baseline settings and the choice of evaluation metrics. Please refer to my question for specifics.

[A] Towards Unbounded Machine Unlearning. Meghdad Kurmanji, et al. NeurIPS 2023.

[B] SalUn: Empowering Machine Unlearning via Gradient-Based Weight Saliency in Both Image Classification and Generation. Fan, Chongyu, et al. ICLR, 2024.

[C] Can bad teaching induce forgetting? unlearning in deep networks using an incompetent teacher. Vikram S Chundawat et al. AAAI 2023.

[D] Eternal sunshine of the spotless net: Selective forgetting in deep networks. Golatkar et al. CVPR, 2020.

[E] Certified data removal from machine learning models. Chuan Guo, et al. ICML, 2020.

[H] Model Sparsity Can Simplify Machine Unlearning. Jinghan Jia et al. NeurIPS 2023.

[I] Evaluating Machine Unlearning via Epistemic Uncertainty. Alexander Becker et al. ECML 2021.

[J] Machine Unlearning of Pre-trained Large Language Models. Jin Yao et al. ACL 2024.

Questions

  1. The authors are suggested to explain the mentioned numerical results.

  2. Can the proposed metric be applied to [A]-[E]? It can be more convincing if the authors show these in experiments.

  3. What is the expression for 'MIA-efficacy'? It would be best to explain what 'MIA-efficacy' is, either in the main text or in the appendix, rather than just directing the reader to a specific paper, as this is not a common MIA evaluation metric (e.g., AUC, attack success rate).

  4. Which references did the authors use for the evaluation of GradAsc, FineTune, and Fisher? To avoid confusion, the authors should clarify in line 112 whether these methods are taken from other papers or are their own designs.

  5. What is the overfit_threshold in line 670? The authors should ideally provide a brief description of these baselines and their settings, either in the main text or in the appendix.

  6. Have the authors tried any NLP-related tasks? I'm particularly curious about the difficulty of forgetting data in NLP compared to CV tasks.

Comment

Reference Clarification

For the experimental evaluation and the unlearning algorithms, we adhered strictly to the design framework recommended by [H]. We wanted to ensure the validity of our experimental evaluations and to guarantee that our KSD-based scoring metrics remain entirely independent of the specific unlearning settings.

Overfitting threshold

The overfitting threshold is a mechanism introduced specifically for GradAscent during the unlearning process. Without this control, GradAscent can lead to a significant increase in the error rate, rendering the results invalid. To address this issue, we define a cap on the model's error for the forget set, referred to as the overfitting threshold. This threshold prevents the unlearning loss from becoming excessively high, which mitigates the risk of distorting the unlearned model. In particular, unlearning a single data point with GradAscent requires careful control to ensure the process is effective. By applying this threshold cap, we aim to prevent excessive error growth while preserving the overall quality of the model.
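
For illustration, a gradient-ascent unlearning loop with such a cap could look like the sketch below; the function and parameter names (e.g., `overfit_threshold=5.0`) and the stopping rule are our assumptions, not the authors' exact implementation.

```python
import torch.nn.functional as F

def gradient_ascent_unlearn(model, forget_loader, optimizer,
                            max_steps=100, overfit_threshold=5.0):
    """Gradient-ascent unlearning with a cap on the forget-set loss.

    `overfit_threshold` bounds how large the forget-set loss may grow,
    so the ascent does not distort the model beyond repair
    (illustrative values only; not the authors' exact settings).
    """
    model.train()
    for _ in range(max_steps):
        for x, y in forget_loader:
            loss = F.cross_entropy(model(x), y)
            if loss.item() >= overfit_threshold:
                return model  # cap reached: stop ascending
            optimizer.zero_grad()
            (-loss).backward()  # ascend: maximize the loss on the forget set
            optimizer.step()
    return model
```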

Feasibility of unlearning for NLP

At this stage of our research, we have completed experimental evaluations on image classification datasets. As a next step, we aim to extend our investigation to the feasibility of unlearning in natural language models and their corresponding datasets.

Citations

We are thankful for your feedback; we have addressed these points in the revised version of the paper.

References

[A] Towards Unbounded Machine Unlearning. Meghdad Kurmanji, et al. NeurIPS 2023.

[B] SalUn: Empowering Machine Unlearning via Gradient-Based Weight Saliency in Both Image Classification and Generation. Fan, Chongyu, et al. ICLR, 2024.

[C] Can bad teaching induce forgetting? unlearning in deep networks using an incompetent teacher. Vikram S Chundawat et al. AAAI 2023.

[D] Eternal sunshine of the spotless net: Selective forgetting in deep networks. Golatkar et al. CVPR, 2020.

[E] Certified data removal from machine learning models. Chuan Guo, et al. ICML, 2020.

[H] Model Sparsity Can Simplify Machine Unlearning. Jinghan Jia et al. NeurIPS 2023.

[I] Evaluating Machine Unlearning via Epistemic Uncertainty. Alexander Becker et al. ECML 2021.

[J] Machine Unlearning of Pre-trained Large Language Models. Jin Yao et al. ACL 2024.

[K] Gundavarapu, Saaketh Koundinya, et al. "Machine Unlearning in Large Language Models." arXiv preprint arXiv:2405.15152 (2024).

[L] Jamie Hayes, Ilia Shumailov, Eleni Triantafillou, Amr Khalifa, and Nicolas Papernot. Inexact unlearning needs more careful evaluations to avoid a false sense of privacy, 2024.

Comment

Thank you very much for your feedback. Below is our response to the concerns and questions you raised.

Experimental results

To assess the effectiveness of the proposed scoring metrics and to explore whether the feasibility of unlearning differs across data points, we selected the top five easiest and the top five most difficult samples to unlearn, as determined by the rankings from each scoring metric. The primary focus of our experimental evaluation is on unlearning individual data points, with the corresponding results presented in Table 1.

We conducted GradAsct in a careful and controlled process; the parameter settings are given in Appendix Table 4. The scoring metrics rank data points based on their closeness to the decision boundary and on similar samples, both of which influence the difficulty of unlearning.

All unlearning algorithms were tested on the top 5 easy/difficult samples five times with five random seeds to avoid cherry-picking. The unlearning accuracies reported in Table 1 are for individual-sample unlearning only.

1- Unlearning Accuracy:

  • Easy
    • The majority of unlearning algorithms achieved zero accuracy on the forget set $S_F$ (Figure 2, Appendix Table 6).
    • The model's predictive accuracy on the remaining subset $S_R$ and test subset $S_T$ remained nearly the same as the original model (Table 6).
    • Exact unlearning (retraining) of an easy sample is achievable (Table 1, Forget Accuracy).
  • Difficult
    • Either the model's performance was jeopardized (accuracy reduced by nearly 50 percent on the CIFAR10 test data; Figure 2, Table 1), or unlearning was unsuccessful (no change in the model's accuracy on the forget set before and after unlearning; Table 1).

Additional unlearning baselines

We have conducted experiments with the Fisher algorithm, and the results are reported in the appendix. We also conducted exact unlearning (retraining from scratch) to compare the scoring metrics and to ensure that our assessments of easy/difficult-to-unlearn samples align with exact unlearning. From the literature [H, K], we observed that these two methods are among the most common and effective unlearning algorithms. We did not want to tailor our approach to a specific unlearning algorithm. Our goal is to introduce an existing challenge that has been ignored by previous methods and to create a new branch of research.

We emphasize that the primary contribution of this paper is the investigation of the "feasibility of machine unlearning" before diving into the design of unlearning algorithms. Previous works have rushed to provide new machine unlearning approaches without a solid understanding of the unlearning feasibility of data. We believe this research direction has great potential to invite future research on the feasibility of unlearning.

Expansion of Scoring metrics application

This paper focuses on positioning the concept of unlearning feasibility within the broader research landscape and aims to raise community awareness of the challenges this problem presents. We discuss the factors influencing the feasibility of unlearning data samples and introduce a KSD-based scoring metric that is independent of any specific unlearning algorithm. The six factors contributing to unlearning difficulty can also be evaluated for any unlearning algorithm. From the feedback highlighted by the reviewers, we can observe the potential for a novel line of research on the feasibility of unlearning. The main contribution of this research is the introduction of the feasibility of unlearning as a research question. Our paper provides a baseline for future research on the feasibility of unlearning, and researchers can apply our scoring metrics to other methods.

Membership Inference Attack efficacy

We are thankful for your recommendation. We will address your feedback in the revised version. Here is a statement of how we calculated MIA-efficacy.

This criterion is quantified as the ratio of samples predicted as "forgotten samples" (True Negatives, TN) to the total number of samples in the forget set, $|\mathcal{D}_f|$. Post-unlearning, the model $\theta_u$ should have effectively "forgotten" the information related to the samples in the forget set.

Comment

Thanks for the authors' responses. I still have some concerns.

  • As an experimental work proposing a metric, I remain concerned that focusing only on simple baselines such as Finetune and Gradient Ascent might restrict the applicability of the metric. The author claims, "We didn’t want to engage with tailoring our approach to unlearning algorithms," but conducting experiments only on these simple baselines is already tailoring your approach to the Finetune and Gradient Ascent algorithms. This could limit the paper's contribution to something along the lines of "Towards Understanding the Feasibility of Finetune/Gradient Ascent Unlearning." I believe that incorporating other actively researched methodologies, such as teacher-student frameworks [A], weight saliency [B], and Newton Update [E], would significantly strengthen the experiments. At the very least, like [H], the paper should test Newton Update [E], and the results of these methods should be presented in the main text rather than the appendix.
  • The response did not explain why the GradAsct method in the paper achieves such strong performance, even on the most difficult samples to forget (reaching 0% accuracy on the forgetting set of the SVHN dataset while maintaining an accuracy of up to 80% on the test set). The results are too good to be reconciled with my own experience and outperform the results of NegGrad+ proposed by [A]. I would like to express concern that the authors have not conducted randomness experiments to demonstrate that the GradAsct method consistently achieves the claimed results.
  • I still have concerns about the paper's literature review. I suggest that the authors carefully refine Section 2.2. I have reviewed the authors' updated paper again and still found some inaccuracies in the description of past literature, which could mislead readers who are not familiar with the machine unlearning field.
    • For example, in line 106, the goal of [D] is not "distinguishing the forget set from the remaining dataset," but rather ensuring that the scrubbing function $S(w)$ produces a model indistinguishable from one that has never seen the forget set (retraining from scratch), by minimizing the KL divergence between their distributions.
    • In line 112, it should not say "ensure a high probabilistic similarity between models before and after unlearning," but rather "ensure a high probabilistic similarity between the retraining-from-the-scratch algorithm and the unlearning algorithm." Please note that these two are fundamentally different definitions.
    • Additionally, Mehta et al. is repeated in line 106, and Izzo is repeated in line 111.
  • Lastly, as an optional suggestion, since research on LLM unlearning has recently gained momentum, incorporating NLP tasks into the experiments, whether in the main text or the appendix, could further validate the proposed metric and enhance the paper's contribution and impact. I also suggest the authors will discuss how these ideas can help the machine unlearning community develop algorithms, rather than empirically stating which samples are difficult to forget. Nevertheless, this suggestion will not affect my scoring.

I believe the motivation behind this work is valuable, but the quality of the paper is not sufficient to meet the ICLR acceptance standards, so I maintain my score.

Review
Rating: 5

This paper considers the problem of determining the feasibility of machine unlearning. This is done by (a) determining which are the easiest and hardest samples to unlearn based on metrics related to kernel Stein discrepancy, (b) unlearning these samples using different unlearning algorithms, and (c) seeing how this impacts the accuracy on the data to be forgotten as well as the test set.

Overall the experiments are solid, but what I found lacking from the paper is discussion and understanding of the significance of the results.

Strengths

Machine unlearning, even though well studied is not well-understood -- mostly because it is usually not well-defined. The problem studied is thus definitely well-motivated.

I feel like what the paper is trying to get at here is "in-distribution" and "out-of-distribution" samples -- in-distribution being those samples that are very close to, or combinations of the rest of the data, while out-of-distribution samples being outliers or others. In general, one would expect the latter to be easier to unlearn. In addition, it is also unclear why unlearning in-distribution points should lead to lower performance on them -- for example, if we can classify a typical zero accurately from a classifier trained without this zero. A lot of prior work has ignored these subtleties in the definition and practice of unlearning, and this work does attempt to throw light on them.

Weaknesses

  1. The major weakness of the paper is that it does not offer much by way of discussion and conclusion from the experiments. The experiments are presented in form of tables, with a short discussion section about different algorithms and metrics, but at the end we do not learn much about what we learn overall from the exercise, and why we get to see what we see. Adding a proper discussion section that tries to explain the results would significantly improve the paper.

  2. It is unclear to me why so many different variants of kernel Stein discrepancy are needed as they appear to needlessly complicate the message. Is it because the different measures emphasize different aspects? What kind of aspects?

  3. The paper would be improved by adding references to other work that questioned model-based unlearning -- see, for example, [1] https://www.usenix.org/conference/usenixsecurity22/presentation/thudi

Questions

See above.

Comment

Unlearning difficulty and Scoring metrics

The unlearning difficulty factors are categorized into two major groups, namely 1) data points with/without strong ties (factors 1, 4-6) and 2) predictive confidence (factors 2-3). Our aim is to develop an unlearning difficulty scoring metric that jointly considers these two classes of factors. Two of these factors are purely dependent on the data-model distribution and are determined before unlearning the model.

  • MKSD: evaluates both the immediate proximity of neighboring data points and the degree of strong similarities as reflected by elevated Stein Kernel values. A higher MKSD score indicates greater similarity and a larger "resistance set," meaning that a larger portion of the training data would need to be unlearned alongside the target data point. This scenario is typically undesirable as it increases the complexity of unlearning.

  • MSKSD: The sum of the Stein Kernel values for each data point generally indicates strong similarities with other samples within the dataset. However, this measurement can be misleading if negative values from other samples overshadow the positive similarities, so that positive and negative values cancel each other out. To address this, we apply a standardization to the Stein Kernel values for each data point, denoted as $\kappa_{\theta}((\mathbf{x}_i, y_i), (\cdot, \cdot))$. By standardizing these values and summing the exponentials of the standardized Stein Kernels, we prevent the negation effect and properly weight the positively correlated samples.

  • SSN: We propose that data points with high Stein Score Norms (SSN) are typically located further from the dense centers of their respective classes and closer to the decision boundary. The Stein Score, defined as $\nabla_a \log P_\theta$, is larger in magnitude for samples near the decision boundary, making such points prime candidates for unlearning. These data points are evaluated and ranked based on the magnitude of their Stein Score vectors $\nabla_{\theta} \log P_{\theta}(\mathbf{x}_i, y_i)$. By identifying and prioritizing data points with the highest Stein Score Norms, we can efficiently target samples that are most susceptible to unlearning due to their proximity to the decision boundary.

  • EMSKSD: combines the uncertainty of the model's prediction with the similarity among data points when investigating unlearning algorithms (see the sketch after this list).
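
To make the heuristics above concrete, here is a minimal, illustrative sketch that assumes a precomputed pairwise Stein Kernel matrix and per-sample Stein score vectors; the exact standardization and the EMSKSD weighting shown here are our assumptions, not the paper's definitions.

```python
import numpy as np

def ksd_unlearnability_scores(K, score_grads, uncertainty, eps=1e-8):
    """Illustrative KSD-based unlearnability scores.

    K           : (n, n) pairwise Stein Kernel matrix, K[i, j] ~ kappa_theta((x_i, y_i), (x_j, y_j))
    score_grads : (n, d) per-sample Stein score vectors
    uncertainty : (n,)   per-sample predictive uncertainty (e.g., entropy)
    """
    n = K.shape[0]
    off_diag = K - np.diag(np.diag(K))  # ignore self-similarity

    # MKSD-like: mean pairwise Stein Kernel value (higher => stronger ties, larger resistance set)
    mksd = off_diag.sum(axis=1) / (n - 1)

    # MSKSD-like: standardize each row, then sum exponentials so positive
    # similarities are not cancelled by negative ones
    z = (off_diag - off_diag.mean(axis=1, keepdims=True)) / (off_diag.std(axis=1, keepdims=True) + eps)
    msksd = np.exp(z).sum(axis=1)

    # SSN: norm of the Stein score vector (larger near the decision boundary)
    ssn = np.linalg.norm(score_grads, axis=1)

    # EMSKSD-like: combine similarity ties with predictive uncertainty (assumed weighting)
    emsksd = msksd * uncertainty

    return {"MKSD": mksd, "MSKSD": msksd, "SSN": ssn, "EMSKSD": emsksd}
```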

Comment

Thank you very much for your feedback. Below is our response to the concerns and questions you raised. Also, following your recommendation, the new citation has been added to the revised version of the paper.

Discussion on Unlearning experimental results

  • Accuracy:

    • Easy
      • The majority of unlearning algorithms achieved zero accuracy on the forget set $S_F$ (Figure 2, Appendix Table 6).
      • The model's predictive accuracy on the remaining subset $S_R$ and test subset $S_T$ remained nearly the same as the original model (Table 6).
      • Exact unlearning (retraining) of an easy sample is achievable (Table 1, Forget Accuracy).
    • Difficult
      • Either the model's performance was jeopardized (accuracy reduced by nearly 50 percent on the CIFAR10 test data; Figure 2, Table 1), or unlearning was unsuccessful (no change in the model's accuracy on the forget set before and after unlearning; Table 1).
  • Unlearning Loss

    • For an easy to unlearn sample

      • The model error on the forget set increased (Table 1 and Table 7).
      • A noteworthy observation from comparing accuracy (Table 6) and loss (Table 7) is that, for easy-to-unlearn cases, the unlearning process appears to have minimal impact on the model's predictive capability on the forget set: the models retain the ability to predict these data points even after unlearning. However, the loss analysis reveals that, despite the sustained accuracy, unlearning increases the models' loss on the forget set for easy-to-unlearn data points. This suggests that while the model can still predict the forget data points, the error threshold for flipping a data point's label remains high.
    • Difficult

      • Unlearning has minimal to no impact on the model's error on the forget set, indicating that these data points are more resistant to unlearning (Table 7).
      • For cases where unlearning jeopardized the model's accuracy, the increased error of the unlearned model is also reflected in the loss.
  • “Distance of Parameter Shift (DPS)”

    • The layer-wise distance between the original model and the unlearned model is shown in Table 5. Among the scoring metrics, EMSKSD exhibits the smallest distance on the model’s parameters, indicating that it causes less disruption during unlearning. On the other hand, SSN produces the largest gap, as anticipated. This is because SSN tends to select data points near the decision boundary, which typically have larger gradient magnitudes, thereby increasing the distance between the original and unlearned models.
  • “Resistance to Membership Inference Attack (MIA)”:

    • The "easy" samples identified by EMSKSD consistently show higher MIA-efficacy, whereas "difficult" samples often with lower MIE-efficacy which clearly indicate the influence of difficult was not unlearned from the model.
Review
Rating: 5

This paper studies how to estimate the difficulty of effectively unlearning training data points from models. In Section 3, it summarizes six main factors that can impact the effectiveness of machine unlearning, including the size of the unlearning expansion, resistance to membership inference attacks (MIA), distance to the decision boundary, tolerance of performance shift, number of unlearning steps, and the distance of parameter shift. It then groups these factors into two categories: the existence of strong ties among data points and predictive confidence. In Section 4, the paper introduces the notion of kernelized Stein discrepancy (KSD) and four potential variants to convert KSD into aggregated pairwise kernel values for each data point. In Section 5, it conducts empirical evaluations to observe the following phenomena: i) the relationship between KSD-based scores and the predictive performance of the unlearned model; ii) the effectiveness comparison among the four variants; and iii) the effectiveness comparison of unlearning algorithms against hard-to-unlearn data points.

Strengths

+++ This paper studies the variations in difficulty among different training data points, offering an interesting and novel perspective in machine unlearning.

++ It introduces KSD-based scores to measure unlearning difficulty.

  • Experiments are conducted on two CNN models and three image datasets to empirically investigate the effectiveness of the KSD-based scores.

Weaknesses

--- The paper does not provide sufficient discussion on how to incorporate the KSD-based scores into the overall machine unlearning workflow. It also lacks results on the efficiency of computing these scores. Providing both the computational complexity and empirical evaluations of computational efficiency would strengthen the work. Additional discussion would also be valuable. For instance, could a new unlearning-difficulty-aware algorithm be developed to leverage the KSD-based scores for more effective and efficient machine unlearning? Alternatively, if computing the KSD-based scores is comparable in cost to running certain unlearning algorithms, it would be helpful to clarify how these scores could further enhance the machine unlearning process.

-- It is unclear how the KSD-based scores relate to the six difficulty factors. Ideally, the experiments should first verify that these six factors consistently represent the unlearning difficulty of different samples. Currently, however, the experiments lack systematic results on the relationship between unlearning difficulty and the six factors, which makes the effectiveness of these factors unconvincing. Establishing a solid relationship between the six factors and unlearning difficulty—demonstrating that the factors truly correspond to unlearning difficulty—would allow the paper to propose a unified, holistic metric of unlearning difficulty based on these factors. With this unlearning difficulty metric in place, the paper could then systematically validate that the KSD-based scores indeed reflect unlearning difficulty. The current factor-by-factor approach to evaluating unlearning difficulty (with some factors omitted or combined) is insufficient to convincingly verify that the KSD scores truly capture unlearning difficulty.

Questions

  1. Is it possible to provide efficiency results of the KSD-based scores?

  2. Is it possible to leverage the KSD-based scores to develop unlearning-difficulty-aware algorithms for more effective machine unlearning?

  3. Is it possible to develop a more unified and holistic metric for the unlearning difficulty?

  4. Is it possible to provide more systematic empirical results to verify that the KSD-based scores indeed reflect the unlearning difficulty in terms of all six factors?

Comment

Unlearning Feasibility: Insights from Experiments and KSD Scoring

A systematic analysis of the easy and difficult samples recommended by the KSD-based scores ties them to the six unlearning difficulty factors. Below we summarize how the evaluation of the six factors differs between easy-to-unlearn and difficult-to-unlearn samples:

The experimental results:

Predictive “Performance Shift”:

  • Accuracy:

    • Easy
      • The majority of unlearning algorithms achieved zero accuracy on the forget set $S_F$ (Figure 2, Appendix Table 6).
      • The model's predictive accuracy on the remaining subset $S_R$ and test subset $S_T$ remained nearly the same as the original model (Table 6).
      • Unlearning a single easy data point by retraining achieves zero accuracy on the forget set.
    • Difficult
      • Either the model's performance was jeopardized (accuracy reduced by nearly 50 percent on the CIFAR10 test data; Figure 2, Table 1), or unlearning was unsuccessful (no change in the model's accuracy on the forget set before and after unlearning; Table 1).
  • Unlearning Loss

    • Easy

      • The model error on the forget set increased (Table 1 and Table 7).
      • A noteworthy observation from comparing accuracy (Table 6) and loss (Table 7) is that, for easy-to-unlearn cases, the unlearning process appears to have minimal impact on the model's predictive capability on the forget set: the models retain the ability to predict these data points even after unlearning. However, the loss analysis reveals that, despite the sustained accuracy, unlearning increases the models' loss on the forget set for easy-to-unlearn data points. This suggests that while the model can still predict the forget data points, the error threshold for flipping a data point's label remains high.
    • Difficult

      • Unlearning has minimal to no impact on the model's error on the forget set, indicating that these data points are more resistant to unlearning (Table 7).
  • “Distance of Parameter Shift (DPS)”

    • The layer-wise distance between the original model and the unlearned model is presented in Table 5. Among the evaluated scoring metrics, EMSKSD demonstrates the smallest distance in the model’s parameters, indicating that it induces minimal disruption during the unlearning process. In contrast, SSN produces the largest distance, as expected. This result is attributed to SSN’s tendency to prioritize data points near the decision boundary, which generally have larger gradient magnitudes. Consequently, this leads to a greater divergence between the original and unlearned models.
  • “Resistance to Membership Inference Attack (MIA)”:

    • The "easy" samples identified by EMSKSD consistently show higher MIA-efficacy, whereas "difficult" samples often with lower MIE-efficacy which clearly indicate the influence of difficult was not unlearned from the model.

Unification of unlearning factors

We categorize the unlearning difficulty factors into two major groups: 1) data points with/without strong ties (factors 1, 4-6) and 2) predictive confidence (factors 2-3). Both of these characteristics are embedded in the original KSD formula (Formula 4). The KSD formula unifies raw feature similarity, closeness to the decision boundary (in terms of score similarity), and the mutual influence of prediction shifts. The KSD-based scoring metrics then build on these factors to ensure that all major contributors to unlearning difficulty are considered effectively.

[1] Chen, Min, et al. "Boundary unlearning: Rapid forgetting of deep networks via shifting the decision boundary." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.

[2] Nguyen, Hieu T., and Arnold Smeulders. "Active learning using pre-clustering." Proceedings of the twenty-first international conference on Machine learning. 2004.

[3] Kienitz, Daniel, Ekaterina Komendantskaya, and Michael A Lones. "Comparing Complexities of Decision Boundaries for Robust Training: A Universal Approach." Proceedings of the Asian Conference on Computer Vision. 2022.

[4] Li, Chen, Xiaoling Hu, and Chao Chen. "Confidence estimation using unlabeled data." arXiv preprint arXiv:2307.10440 (2023).

Comment

Thank you very much for your feedback. Below is our response to the concerns and questions you raised.

Computational complexity

The computational complexity of MKSD, MSKSD, and EMSKSD is bounded by $O(n \times g^2)$, since the aggregation runs over all $n$ data points and is multiplied by the complexity of evaluating $\kappa_\theta$, which is bounded by the cost of the gradient $\nabla_a P_\theta$.

For SSN, the computational complexity is bounded by the gradient $\nabla_a P_\theta$, i.e., $O(g)$.
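
A simple way to report empirical efficiency would be to time the pairwise aggregation and the per-sample score norms separately; the sketch below is only an illustration (the inner-product kernel is a simplified stand-in for the full Stein kernel, and the variable names are ours).

```python
import time
import numpy as np

def time_ksd_scoring(score_grads):
    """Rough timing of the two scoring regimes (illustration only).

    score_grads : (n, g) array of per-sample Stein scores, each obtained
                  with one gradient computation (the O(g) part above).
    """
    n, _ = score_grads.shape

    t0 = time.perf_counter()
    # Pairwise aggregation (MKSD/MSKSD/EMSKSD style): an n x n kernel built
    # from score interactions -- the cost that grows with both n and g.
    K = score_grads @ score_grads.T  # simplified kernel: s_i^T s_j
    mksd = (K.sum(axis=1) - np.diag(K)) / (n - 1)
    t_pairwise = time.perf_counter() - t0

    t0 = time.perf_counter()
    # SSN: one norm per sample, no pairwise term.
    ssn = np.linalg.norm(score_grads, axis=1)
    t_ssn = time.perf_counter() - t0

    return mksd, ssn, {"pairwise_seconds": t_pairwise, "ssn_seconds": t_ssn}
```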

For the empirical evaluation of efficiency, we will conduct experiments on sampled data and report the corresponding results in the revised version.

Difficulty aware unlearning algorithm

In this paper, the main goal is to introduce a new research direction investigating the feasibility of unlearning. Our intention is to understand the factors contributing to the unlearning feasibility of data and, more importantly, to investigate this challenge agnostic to unlearning algorithms. The development of a "difficulty-aware unlearning algorithm" is a valuable and perhaps the most immediate application of our research, but it is out of the scope of our work and is left for future research. The main focus is on evaluating the feasibility of unlearning and understanding the relationship through the lens of KSD-based scoring. We did not invest in designing an unlearning algorithm using KSD scoring, even though the potential exists.

Additionally, KSD is measured as $\mathbb{E}_{x, x' \sim q}[\kappa_p(x, x')]$; any change to the data samples (unlearning, i.e., removing any subset of the data from the expectation) alters the calculated KSD.
We employ KSD as a measurement of the model distribution, which helps us estimate each sample's contribution. Removing any subset of the data would significantly change the true meaning of the KSD. It is not trivial to apply KSD directly for unlearning; however, one potential approach to employing KSD for unlearning is to use the scoring heuristic to select easy and difficult samples for unlearning.

Unlearning Feasibility and Scoring Metrics

We categorized the difficulty factors into two major groups, namely 1) data points with/without strong ties (factors 1, 4-6) and 2) predictive confidence (factors 2-3). Our aim is to develop an unlearning difficulty scoring metric that jointly considers these two classes of factors.

For example, consider the Size of the Unlearning Expansion: as stated in [1] (Section 1, right column, 3rd paragraph), in real-world scenarios, to forget a requested sample we need to unlearn the whole class. The correlation between similar samples, combined with the generalization strength of deep neural networks, can act as a significant resistance to unlearning a specific sample. The correlation and similarity among samples are inherently dependent on the model's distribution. The Stein Kernel provides a mechanism to quantify these correlations, conditioned on the model distribution. As illustrated in [1] (Section A.2, Figure 3), the pairwise Stein Kernel values between a target data point and its similar samples reveal how related samples are distributed across various classes. Typically, data points exhibit strong Stein Kernel correlations with relevant samples within the same class, while their similarity to samples from other classes diminishes significantly. This ability to distinguish intra-class and inter-class similarities underscores the utility of the Stein Kernel in understanding unlearning dynamics.

Geometric Distance to Decision Boundary: As stated in [1,2] (Section 1, Right Column, 3rd Paragraph) samples with the highest uncertainty are typically located closest to the classification boundary. Building on the ideas presented in [1], we infer that unlearning samples near the decision boundary is relatively easier compared to those situated at the center of clusters, which are more strongly tied to adjacent data points.

In [3] (Section 1, Page 2, 2nd Paragraph), it is discussed that robust classifiers tend to learn geometrically more complex decision boundaries. These robust models often assign significantly lower confidence scores to low-density samples near the boundary (Section 4.2, Page 10). Similarly, [4] highlights that data points farther from the decision boundary tend to have higher confidence, while those closer to the boundary exhibit lower confidence.

The Kernel Stein Discrepancy (KSD) plays a critical role in identifying this phenomenon. According to Formula 4, the Stein Score—defined as the gradient of the log probability density function—exhibits higher values for samples closer to the decision boundary, making it an effective metric for capturing and analyzing this concept.

Comment

We are immensely grateful for the time and effort invested by all the reviewers. It brings us great pleasure to see that all reviewers unanimously praised the importance of the research question addressed in this paper, which explores a largely overlooked aspect of machine unlearning. Reviewer R2kM emphasized the significance of employing Kernelized Stein Discrepancy (KSD) to assess unlearning difficulty. V6u3 praised our research direction as an original and timely contribution to the field of unlearning. Reviewer 4fRu described the use of KSD-based metrics as "intriguing," highlighting their ability to provide valuable insights into the interplay between data and models within the context of machine unlearning. V6u3 characterized the KSD-based scoring approach as both "innovative" and "technically sound." Furthermore, 4fRu noted the paper is easy to follow and the ideas are communicated effectively.

As a general response to all reviewers, we would like to provide a recap on the position of this paper.

Contribution restatement

  1. The first attempt to understand the feasibility/difficulty of machine unlearning
  2. The KSD-driven heuristic group as a preliminary attempt that works generally well
  3. A new research direction that may interest parts of the research community

We are deeply thankful for the time and effort each reviewer has dedicated to evaluating our work. We provide individual responses to each reviewer's feedback below.

AC Meta-Review

The focus of this paper is on understanding the feasibility of machine unlearning with regards to unlearning individual training data. The authors introduce a new metric that can be tailored to the data distribution and model, called the Kernelized Stein Discrepancy. This measure is obtained by using Stein's identity and choosing an appropriate kernel function to compute a notion of discrepancy between two distributions. This measure "quantifies" the difficulty of unlearning individual data and the paper uses existing methods for unlearning to validate this approach. While the reviewers felt that the new concept was interesting, there was inadequate justification and not sufficient conclusions could be drawn from the experimental evaluation. There are also a number of different metrics proposed which makes the message a bit unclear. The authors should take the reviewer comments and act on them in preparing a future version.

Additional Comments on Reviewer Discussion

There was significant discussion during the rebuttal period.

Final Decision

Reject