PaperHub
Overall score: 6.8/10 · Decision: Rejected · 4 reviewers
Ratings: 4, 5, 4, 4 (min 4, max 5, std. dev. 0.4)
Confidence: 3.0
Novelty: 2.5 · Quality: 2.3 · Clarity: 3.0 · Significance: 2.0
NeurIPS 2025

Forget Vectors at Play: Universal Input Perturbations Driving Machine Unlearning in Image Classification

OpenReview · PDF
Submitted: 2025-05-10 · Updated: 2025-10-29
TL;DR

This paper introduces an input-based unlearning strategy called Forget Vector, which enables efficient data forgetting without modifying model weights, demonstrating superior performance and parameter efficiency across various unlearning tasks.

Abstract

Keywords
Machine Unlearning · Forget Vector · Universal Input Perturbation · Image Classification

Reviews & Discussion

Review
Rating: 4
  • Machine Unlearning (MU) aims to remove the influence of specific data from a trained model. The proposed method achieves unlearning by adding specific vectors to input images. More specifically, this paper proposes a proactive input-based unlearning strategy called "forget vectors," which are data perturbations that can be generated independently of the input. Additionally, the authors propose a method that combines multiple class-specific forget vectors using simple operations (e.g., linear combination). Experiments on public datasets examine the effectiveness of forget vectors.

Strengths & Weaknesses

Strengths

  • The paper is well structured and easy to understand.
  • Machine unlearning is an important application of machine learning.

Weaknesses

  • Concerns regarding the scalability of the proposed method

    • The proposed method calculates a forgetting vector for each class. The design of preparing at least one forgetting vector for all classes significantly limits the scalability of the proposed method. In the experiments, the effectiveness of the proposed method has only been evaluated on cases with a small number of classes, such as CIFAR-10 and ImageNet-10. It has not been verified whether the proposed method is effective with a larger number of classes. Furthermore, would the proposed method work in fine-grained cases?
  • The connection between Sec. 4 (Generalization of MU to Forget Data Shifts) and other chapters should be made clearer.

    • At the beginning of Section 4, the paper states, “Before designing the forget vector as formulated in (2), we examine the sensitivity of existing models.” However, the discussion in this chapter is limited to a very simple experiment on catastrophic forgetting. In my opinion, this does not constitute an analysis that demonstrates the validity of the proposed method design discussed in Section 5.
    • Additionally, the use of a line graph in Figure 2 is logically inappropriate (there is no continuity between GN, ET, and PGD).
  • Insufficient experiments

    • Detailed experiments and discussions on class combinations
      • Adding a forgetting vector to the input image alone may limit the expressive power. The performance of forgetting may vary significantly with simultaneous forgetting of more classes (or data) in larger datasets.
    • Furthermore, as already mentioned, comprehensive experiments on the effectiveness of the proposed method in fine-grained cases are lacking.
  • Numerical instability

    • As shown in Figure 3(a), the UA gap is unstable with respect to the weight magnitudes (i.e., the difference between w_1 and w_2). This suggests numerical instability in the proposed method.
  • More detailed discussion is needed regarding the novelty of the proposed method.

    • The proposed method claims to be inspired by visual prompting methods. However, a method using such a forgetting vector for MU has already been proposed. Therefore, I believe that additional discussion is necessary to claim novelty. (For example, in the following paper, MU is achieved by introducing a forgetting vector (δ in this paper) for the input image (x in this paper). It is likely that many subsequent studies have proposed similar approaches.)
    • The proposed method claims that its strength lies in not changing model parameters, but in the context of machine learning, is there an application advantage in having information about the forgotten classes or data remaining in the model parameters?
    • T. Shibata et al. “Learning with Selective Forgetting.” IJCAI. Vol. 2. No. 4. 2021. (https://www.ijcai.org/proceedings/2021/0137.pdf)

Questions

As described in the weaknesses, more detailed discussion is needed on the following points.

  • (1) Scalability of the proposed method
    • Can the proposed method claim effectiveness even for larger classes? Can the proposed method claim effectiveness even in fine-grained cases?
  • (2) Numerical instability
    • Figure 3(a) suggests numerical instability. A convincing explanation is required for this point.
  • (3) Novelty of the proposed method
    • Please explain the novelty of the forgetting vector. Additionally, please explain the significance of not modifying existing model parameters in the context of machine learning.
  • (4) More comprehensive experiments
    • Detailed experiments and discussions regarding class combinations are insufficient.

Limitations

yes

Final Justification

After reading the reviewers' and authors' comments, I have raised my rating by one notch.

Formatting Issues

N/A

Author Response

We sincerely thank Reviewer pPup for the thoughtful and constructive feedback. Below, we provide detailed responses to each of the key questions raised.

1. Response to the scalability concern.

(1) Per-class forget vector limitation. We respectfully argue that this design choice does not significantly hinder scalability, for several reasons:

  • Forget vectors are small (e.g., 0.15M parameters), especially compared to full model weights (e.g., ViT-Base: 85M, ViT-L: 303M). Even when preparing vectors for a large number of classes, the total storage remains negligible.
  • Reusability across settings: Once learned, class-wise forget vectors can be reused across different forgetting requests (e.g., class subsets or random data unlearning).
  • Training independence & parallelism: Forget vectors for each class can be learned independently and in parallel with the intact model, making the process highly scalable in practice.
  • Computation time: Based on Table A2 in Appendix F, using ImageNet-10 and ViT-Base as an example, retraining requires 301.98 minutes, whereas our method learns a class-wise forget vector in only 4.80 minutes, highlighting the practical advantage of our approach in terms of computation time.

(2) Scalability to larger datasets. We evaluate the scalability of our method on ImageNet-100, a more diverse and challenging dataset with 100 classes, using the ViT-Base backbone. As shown in Table R11, our method achieves strong unlearning performance (UA, MIA-Efficacy) while preserving utility (RA, TA) and maintaining a low Avg. Gap. These results demonstrate that our approach remains effective in large-class settings, without requiring model retraining—an advantage in scenarios where retraining is costly or infeasible.

Table R11: Performance overview of various MU methods for image classification in class-wise forgetting scenario on ImageNet-100 using ViT-Base.

MU Method | UA ↑ | MIA-Efficacy ↑ | RA ↑ | TA ↑ | Avg. Gap ↓
Retrain | 100.00 | 100.00 | 94.07 | 88.36 | 0.00
FT | 97.10 | 99.20 | 90.50 | 84.88 | 1.30
SalUn | 97.35 | 98.61 | 91.40 | 85.10 | 2.49
Ours | 96.58 | 99.34 | 92.50 | 85.45 | 2.14
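
For reference, the Avg. Gap column in these tables appears to be the mean absolute difference from the Retrain row across UA, MIA-Efficacy, RA, and TA. A small helper (our notation, not the authors') reproduces, e.g., the "Ours" row above:

```python
def avg_gap(method_metrics, retrain_metrics):
    # Mean absolute gap to Retrain across (UA, MIA-Efficacy, RA, TA).
    return sum(abs(m - r) for m, r in zip(method_metrics, retrain_metrics)) / len(method_metrics)

# "Ours" vs. Retrain in Table R11:
print(round(avg_gap((96.58, 99.34, 92.50, 85.45),
                    (100.00, 100.00, 94.07, 88.36)), 2))  # -> 2.14
```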

(3) Fine-grained cases. We thank the reviewer for raising this important point. ImageNet-100 includes several semantically similar or visually confusable classes, such as "Tiger Cat" and "Egyptian Cat". The results in Table R12 highlight the fine-grained generalizability of our forget vectors: the targeted class is effectively forgotten while a semantically similar class is still recognized accurately.

Table R12: Accuracy under forgetting of semantically similar classes ("Tiger Cat" vs. "Egyptian Cat")

Forget Class | Retain Class | Accuracy on Forget ↓ | Accuracy on Retain ↑
Tiger Cat | Egyptian Cat | 42.46 | 83.30
Egyptian Cat | Tiger Cat | 37.38 | 84.24

2. Response to the connection between Sec. 4 and other chapters.

(1) Chapter connection. We thank the reviewer for pointing this out. Section 4 is intended as an empirical motivation for the design of Forget Vectors in Section 5. Specifically, we examine whether model-based unlearning persists under input perturbations (e.g., pixel-level noise), and find that while forgetting effects remain, model utility degrades significantly.

This observation motivates our core hypothesis: if data perturbations can induce forgetting, can we learn such perturbations that are minimal yet effective? Section 5 addresses this by introducing input-level forget vectors that suppress forgotten data while preserving performance on retained data. We clarify that Section 4 motivates the formulation of the FV objective, which explicitly balances forgetting and utility. We will revise Section 4's introduction to make this connection clearer.

(2) Line graph in Figure 2. Thank you for your comment. We will replace it with a bar chart and improve the figure presentation in the next revision.

3. Response to insufficient experiments.

(1) Scale with the size of data in larger datasets. Table R13 evaluates different forgetting ratios (10% and 20%) on the larger ImageNet-100 dataset, demonstrating that compositional unlearning scales effectively with the size of the data.

Table R13: Compositional unlearning on ImageNet-100 for random data forgetting with 10% and 20% forgetting ratios.

Forgetting Ratio | UA ↑ | MIA-Efficacy ↑ | RA ↑ | TA ↑ | Avg. Gap ↓
→ 10%
Retrain | 6.85 | 92.69 | 94.25 | 89.59 | 0.00
FV | 10.96 | 91.23 | 92.10 | 85.48 | 2.96
CU-FV | 9.10 | 91.00 | 91.39 | 84.80 | 2.89
→ 20%
Retrain | 6.50 | 92.10 | 93.80 | 89.45 | 0.00
FV | 7.10 | 91.40 | 92.35 | 88.15 | 1.01
CU-FV | 7.45 | 90.50 | 92.45 | 88.20 | 1.28

(2) Fine-grained cases. Please refer to the Response 1 and Table R12.

4. Response to numerical instability.

We appreciate the reviewer’s careful observation. However, the observed variation in the UA gap across different combinations of w_1 and w_2 in Figure 3a is not a symptom of numerical instability, but rather a natural and expected result of semantic directionality in our forget vector arithmetic.

  • Our method operates in input space, and vector combinations (e.g., between "automobile" and "bird") represent semantic, not numerical, interactions. The effect of adding or subtracting a forget vector is nonlinear and asymmetric, depending on how the perturbation aligns with the model's learned representations. This explains why UA does not change linearly with weight.
  • As shown in Figure 3c, our method identifies a robust operating point (green star) achieving low UA and RA gaps, confirming the method’s controllability and reliability despite nonlinearity.

In short, our approach does not involve numerically sensitive optimization procedures; the variation instead reflects direction-aware forgetting behavior.

5. Response to the request for a more detailed discussion of the novelty of the proposed method.

(1) Difference with provided paper. We thank the reviewer for highlighting this work. While Shibata et al. introduce mnemonic codes for class-level forgetting in lifelong learning, our approach differs significantly:

  • Problem formulation: Shibata’s method predefines mnemonic codes for all classes and removes them at test time to forget, requiring prior code assignment. In contrast, our forget vector is learned only for classes to be forgotten and applied directly to inputs ("add-to-forget" vs. "remove-to-forget").
  • Inference-time label dependency: Shibata’s method requires access to class labels at inference to discard the correct code. Our method is label-agnostic—once learned, the forget vector can be applied to any input without needing label information, enabling black-box or privacy-sensitive deployment.
  • Granularity of forgetting: Shibata’s method is designed for class-level forgetting only. The mnemonic codes are tightly coupled with each class. In contrast, our forget vector can be learned for any subset of samples, enabling flexible unlearning, including class-wise and random data forgetting.
  • Post-hoc applicability: Mnemonic codes must be integrated during model training. Our method is post-hoc and model-agnostic, requiring no retraining or architecture changes.
  • Compositional unlearning: Our framework supports compositional forgetting across multiple requests, a capability absent in Shibata et al.

(2) Application advantage. Our method intentionally avoids modifying model parameters, enabling post-hoc, on-demand unlearning in frozen or resource-constrained settings. While residual information in the weights is a valid concern, our approach focuses on behaviorally suppressing the model’s ability to recognize forgotten data through input-level perturbations, rather than physically altering the parameters.

This design offers key practical benefits:

  • Reusability & utility preservation: The model remains unchanged, ensuring consistent performance on future downstream tasks.
  • Dynamic unlearning: Forget requests can be handled on-the-fly without retraining or versioning. Forgetting is reversible—simply remove the applied vector to restore recognition.
  • Lightweight storage: Only compact forget vectors are stored per request, avoiding multiple large model checkpoints. Therefore, we argue that while we do not remove forgotten data from model weights, our model-agnostic, input-level strategy is well-suited for practical deployment scenarios, especially those requiring flexibility, efficiency, and test-time control.

6. Responses to the questions raised above.

Q1: Scalability of the proposed method.

  • Larger classes: Please refer to above Response 3 and Table R13.
  • Fine-grained cases: Please refer to above Response 1 and Table R12.

Q2: Numerical instability. Please refer to above Response 4.

Q3: Novelty of the proposed method and the significance of not modifying existing model parameters. (1) On novelty. As noted by Reviewer eoHp, our work explores novel unlearning paradigms and expands the applicability of unlearning. Meanwhile, Reviewer Wzpy explicitly recognizes that our proposed methodology is novel. We introduce a new input-based approach to machine unlearning via forget vectors—universal, input-agnostic perturbations that remove the influence of target data without modifying model weights.

Additionally, our work is the first to propose compositional unlearning, allowing flexible, scalable unlearning through arithmetic combinations of class-wise forget vectors. From a practical standpoint, our method is inherently suited to black-box settings, requiring only input-output access, and thus supports unlearning even on proprietary or closed-source models.

(2) On significance of not modifying existing model parameters. Please refer to above Response 5 (2).

Q4: More comprehensive experiments regarding class combinations are insufficient. Please refer to above Response 3 and Table R13.

Comment

Thank you very much for your thoughtful comments. I have read all of the comments from the authors and reviewers. Reflecting the thoughtful and clear comments from the authors, my evaluation of this paper has become more positive than before.

Comment

Dear Reviewer pPup,

Thank you very much for reviewing our paper and carefully reading our responses and comments from other reviewers. We sincerely appreciate your recognition of our responses as clear and thoughtful, and we are truly grateful for your more positive reassessment of our paper. Your constructive feedback throughout the review process has been instrumental in helping us improve the clarity and overall quality of our work. We will surely improve our manuscript to reflect the responses to your comments. If you have any further questions or suggestions, please feel free to reach out.

Best regards,

Authors

Review
Rating: 5

This paper introduces a data-driven machine unlearning method based on a forget vector. Unlike traditional unlearning methods that require model retraining or weight modification, this method uses input-agnostic perturbations to achieve unlearning. Furthermore, compositional forget vectors can be generated via linear combinations of class-wise forget vectors without re-optimization.

Strengths & Weaknesses

Strengths

  • The motivation is clearly presented, and the paper provides detailed descriptions of the task formulation and evaluation methods.
  • The analysis of unlearning in OOD settings is interesting.
  • The method shows competitive performance without model retraining.

Weaknesses

  • While the paper shows some robustness under certain data shifts, pixel-level perturbations may still be vulnerable to broader transformations. In contrast, feature-level perturbations like visual prompt tuning could offer stronger robustness.
  • While the paper provides some analysis on combining class-wise forget vectors, it does not examine how the method scales when the number of classes or the size of data to be forgotten grows significantly. The potential performance degradation or limitations in compositional unlearning as the forget set expands remain unclear.
  • Input perturbation-based methods may not provide formal deletion guarantees compared to retrain-based methods.
  • Experiments are limited to datasets with relatively small number of classes (e.g., CIFAR-10, ImageNet-10), which raises questions about scalability to larger, more diverse real-world datasets.

Questions

  1. How effective is the method in forgetting not just at the level of specific classes or subsets, but truly at the level of individual data points?
  2. How does the approach scale when the number of classes or the amount of data to be forgotten increases significantly? Can a large amount of data still be handled effectively with a fixed-size forget vector?

Limitations

The potential vulnerability to white-box adversaries is well acknowledged in the paper, with the need for future work on improving robustness clearly noted.

Final Justification

The rebuttal addressed most concerns with clear explanations and extended experiments. While large-scale validation would be useful, the method’s practicality and consistent gains justify acceptance.

Formatting Issues

I did not find any significant formatting issues.

Author Response

We greatly appreciate Reviewer BKJF’s thoughtful and constructive feedback. Below, we provide detailed responses to each of the key points raised.

1. Response to broader transformations and contrast to feature-level perturbations.

(1) Vulnerable to broader transformations. While our method indeed operates in the pixel space, we would like to emphasize that our forget vector exhibits substantial transferability to unlearn the unseen forget set, as shown in Table 2 and the accompanying analysis (page 8). In these evaluations, our method significantly outperforms existing baselines in unlearning accuracy across all three test scenarios, demonstrating robustness even when applied to out-of-distribution forget data.

(2) Contrast to feature-level perturbations. We agree that VPT offers a promising direction in parameter-efficient adaptation and may yield enhanced robustness in certain scenarios. However, our approach fundamentally differs in deployment assumptions and design philosophy. VPT operates by injecting learnable prompts into the activation space of the transformer layers, which requires full access to the model at inference to activate or deactivate the prompting effect. In contrast, our forget vector method is designed to operate entirely in the input space, requiring no access to model weights or activations. This enables it to be applied to a given model without any intervention in the model's internal weights or activations.

2. Response to scalability with the number of classes or the size of data on combining class-wise forget vectors.

Thank you for pointing this out. We have conducted additional experiments to examine the scalability of our method with different settings on combining class-wise forget vectors:

(1) Scale with the number of classes overall. We conduct experiments on ImageNet-100 using the ViT-Base backbone, with random forgetting ratios of 10% and 20%. As shown in Table R8, we compare results with individually trained forget vectors (FV) and observe that compositional unlearning via vector arithmetic (CU-FV) remains effective. This demonstrates that our method scales well to larger datasets with more classes.

(2) Scale with the size of data.

Table R8 evaluates different forgetting ratios (10% and 20%), demonstrating that compositional unlearning scales effectively with the size of the data.

Table R8: Compositional unlearning on ImageNet100 for random data forgetting with 10% and 20% forgetting ratios.

Forgetting Ratio | UA ↑ | MIA-Efficacy ↑ | RA ↑ | TA ↑ | Avg. Gap ↓
→ 10%
Retrain | 6.85 | 92.69 | 94.25 | 89.59 | 0.00
FV | 10.96 | 91.23 | 92.10 | 85.48 | 2.96
CU-FV | 9.10 | 91.00 | 91.39 | 84.80 | 2.89
→ 20%
Retrain | 6.50 | 92.10 | 93.80 | 89.45 | 0.00
FV | 7.10 | 91.40 | 92.35 | 88.15 | 1.01
CU-FV | 7.45 | 90.50 | 92.45 | 88.20 | 1.28

3. Response to the formal deletion guarantees.

We thank the reviewer for raising this important point. It is true that our method does not offer formal deletion guarantees in the same theoretical sense as retraining-based approaches. However, our primary goal is to develop a practical, lightweight, and deployable method that enables unlearning without requiring access to or modification of model weights, which is an increasingly relevant need in real-world scenarios such as on-device learning and frozen-model APIs. From a broader perspective, unlearning techniques can be categorized into three stages:

  • Pre-processing: Manipulate the training data before model training.
  • In-processing: Modify the training process or model weights to remove memorization.
  • Post-processing (ours): Keep the model fixed and perform unlearning at inference time.

In fact, our approach uniquely belongs to the post-processing paradigm of unlearning methods, where the model remains intact, and forgetting is achieved solely through input-space perturbation at test time.

4. Response to scale to larger datasets.

We empirically validated the scalability of our method on ImageNet-100, a more challenging and diverse dataset with 100 classes, using the ViT-Base backbone. We demonstrate the results under the class-wise forgetting setting. As shown in Table R9, our method maintains competitive unlearning effectiveness (UA, MIA-Efficacy) while preserving model utility (RA, TA), and keeps the Avg. Gap low. This indicates that our approach scales well to large-class settings without modifying the model weights, which is particularly appealing when model retraining is impractical.

Table R9: Performance overview of various MU methods for image classification in class-wise forgetting scenario on ImageNet-100 using ViT-Base.

MU Method | UA ↑ | MIA-Efficacy ↑ | RA ↑ | TA ↑ | Avg. Gap ↓
Retrain | 100.00 | 100.00 | 94.07 | 88.36 | 0.00
FT | 97.10 | 99.20 | 90.50 | 84.88 | 1.30
SalUn | 97.35 | 98.61 | 91.40 | 85.10 | 2.49
Ours | 96.58 | 99.34 | 92.50 | 85.45 | 2.14

5. Response to forgetting at the level of individual data points.

We appreciate the reviewer for raising this subtle but important question. We would like to clarify that forgetting individual data points is fundamentally challenging, especially in vision classification tasks. In classification models, a single data point is typically embedded within a dense neighborhood of semantically similar examples belonging to the same class. Due to this data redundancy and inter-class generalization, the model can often interpolate from nearby data points to retain its predictive capability, even if one individual image is removed or masked.

As highlighted in recent work such as "Challenge Forgets: Can LLMs Forget Specific Training Data?" (Liu et al., 2024), forgetting individual data is often shown to be infeasible or unstable, even in large-scale generative models. In vision tasks, this difficulty is further amplified by the high structural similarity among instances within the same class. For instance, trying to forget a single cat image among hundreds of similar cats rarely affects the model’s recognition capability, due to its semantic generalization ability.

Our goal is to remove the model’s capability to recognize a class or subset, not necessarily to ensure perfect deletion of any specific training instance. In line with this, most existing unlearning literature also evaluates at the class or subset level, rather than per instance, due to the practical and theoretical limitations described above.

6. Response to scalability to the number of classes or the amount of data to be forgotten.

(1) Regarding the scalability to the number of classes.

To evaluate how our approach scales with the number of classes in the forget set, we conduct experiments on ImageNet-10 using the ViT-Base backbone. Specifically, we measure performance as the model is tasked to forget 1, 2, and 3 classes simultaneously under the class-wise forgetting setting. The results can be found in Table R10.

Table R10. Performance overview of various MU methods for image classification in class-wise forgetting scenario on ImageNet-10 using ViT-Base with different number of forgetting classes.

MU Method | UA ↑ | MIA-Efficacy ↑ | RA ↑ | TA ↑ | Avg. Gap ↓
→ 1 class
Retrain | 100.00 | 100.00 | 99.97 | 99.85 | 0.00
FT | 42.79 | 40.78 | 99.96 | 99.61 | 29.17
SalUn | 93.27 | 94.00 | 98.22 | 98.00 | 4.08
Ours | 95.92 | 99.40 | 99.13 | 99.26 | 1.53
→ 2 classes
Retrain | 100.00 | 100.00 | 99.95 | 99.80 | 0.00
FT | 48.62 | 46.45 | 99.91 | 99.48 | 26.32
SalUn | 92.80 | 93.40 | 97.15 | 98.00 | 4.60
Ours | 95.67 | 97.35 | 98.45 | 98.56 | 2.43
→ 3 classes
Retrain | 100.00 | 100.00 | 99.90 | 99.75 | 0.00
FT | 47.13 | 45.90 | 99.10 | 99.50 | 27.01
SalUn | 93.50 | 94.20 | 97.15 | 97.50 | 4.32
Ours | 95.94 | 93.20 | 97.21 | 96.33 | 4.24

(2) Regarding the scalability of the amount of data to be forgotten.

Please refer to Response 2 and Table R8.

These results indicate that our method remains effective and stable across a range of forget ratios and larger datasets, achieving consistent suppression of forgotten data while maintaining high accuracy on retained data. This demonstrates the scalability and robustness of our approach under varying unlearning demands.

Comment

The authors have provided a thoughtful and clear rebuttal. The clarifications and additional analyses sufficiently address the concerns I raised. I believe the rebuttal helped better highlight the strengths of the paper, and incorporating these clarifications into the final version would further improve its clarity and impact. Overall, I am more positively inclined toward the paper.

Comment

Dear Reviewer BKJF,

Thank you for your more positive evaluation of our paper. We sincerely appreciate your thoughtful and constructive feedback. We’re very glad to hear that our clarifications and additional analyses helped address your concerns and highlighted the strengths of our work. We will make sure to incorporate the discussed points into the final version to further enhance the clarity and impact of the paper. If you have any further questions or suggestions, please feel free to reach out.

Best regards,

Authors

Review
Rating: 4

This work proposes a machine-unlearning methodology that leaves the trained model unchanged but perturbs the model inputs through suitably selected additive noise. Notably, the noise is applied to all input data, including the retain and test sets. Such a forget vector is computed for each forget class. For random-sample unlearning, a suitable linear combination of the class-specific forget vectors is then used.

Strengths & Weaknesses

STRENGTHS:

  1. I believe this work is overall well written and well structured.

  2. The proposed methodology is novel to my knowledge

  3. The mathematical derivations seem formally correct.

WEAKNESSES:

  1. I am not convinced that the proposed strategy is suitable for dealing with "right to be forgotten" legislation, which the authors cite in the abstract. Of course, most known approximate unlearning techniques are unable to guarantee that the influence of "to-be-forgotten" data is fully erased and so may not be compliant either. But at least they attempt to remove the influence of these data, whereas the approach presented here does not even make such an attempt. Worse even, the forget set is used once more to find the forget vector(s).

  2. As the authors point out, perturbing the inputs can lead to larger performance losses on the retain set than alternative methods. So, I'm not sure if the proposed method -- though conceptually interesting -- is useful in practice.

Questions

  1. Can the authors explain the following sentence (from Line 222): "The above indicates that a reduction in prediction performance on the forget set could translate into enhanced unlearning effectiveness on that set."?

  2. How does this approach scale with the number of classes overall and the number of classes in the forget set?

Limitations

NA

Final Justification

The authors' response has addressed most of my concerns. I am still not entirely convinced by the practical utility of the proposed methodology, but it seems interesting enough. Thus, I am raising my score to 4.

Formatting Issues

  1. The notation "xDx \in \mathcal{D}" (e.g., in Line 169) is confusing since the data sets contain tuples of features and labels.

  2. Typo in order of the subscripts in Line 246.

  3. Fix typos/inconsistencies in the bibliography.

Author Response

We greatly appreciate Reviewer Wzpy’s insightful and constructive comments. Below, we provide detailed responses to address each of the points raised.

1. Response to W1 about "Right to be forgotten" legislation

We thank the reviewer for raising this insightful question. Our primary goal is to enable the removal of data influence as manifested in the model’s classification behavior. While our approach, like most approximate unlearning methods, does not guarantee complete erasure of the original data points, it effectively eliminates their impact on the model’s predictive capability. This is why we cited "right to be forgotten" legislation as a motivation.

To illustrate this perspective, consider the following analogy: if one wishes to "forget a person," it may be difficult to erase all associated memories. However, if the person's face is always blurred when they appear (analogous to applying a data perturbation), the perceiver, in this case the model, can no longer recognize them.

From the standpoint of unlearning outcomes, both our data-based approach (using a forget vector) and model-based methods ultimately aim to eliminate the influence of the targeted data, albeit through different mechanisms. In the revision, we will clarify our motivation more precisely and explicitly distinguish our data-based unlearning approach from traditional model-based methods, as well as its relation to the objectives of the "right to be forgotten" legislation.

2. Response to W2 about practical application

While input perturbation may intuitively result in utility degradation on the retain set, we carefully designed our method to ensure that such perturbations are minimally invasive to utility, while still effectively disrupting the model’s ability to remember the forget data. In particular, as shown in Equation (4) in the main paper, during the optimization of the forget vector, our loss function jointly considers both forgetting (on the forget set) and retention (on the retain set). This allows us to achieve a practical trade-off between effective forgetting and minimal utility loss. Such a trade-off is further investigated through our experiments (see Table 1 in the main paper). The Avg. Gap with Retrain shows that the forget vector improves unlearning effectiveness, ranking among the top two unlearning methods, a gain that is significant relative to the modest loss in utility.

As with model-based unlearning approaches, we acknowledge that our method involves a trade-off between unlearning effectiveness and utility preservation. However, part of this trade-off arises from our conservative evaluation protocol. Specifically, the reported testing accuracy (e.g., in Table 1) reflects a worst-case scenario in which the forget vector is mistakenly applied to the retain data. This setup ensures that our results represent a lower bound on utility performance under potential deployment errors.

That said, a key practical advantage of the forget vector lies in its input-based nature: since it does not alter the model’s weights, the same model can be used in its original utility mode, without any loss in performance, simply by omitting the forget vector at inference time. In this sense, full utility is preserved unless the forget vector is explicitly applied.
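
To make the apply-or-omit point concrete, here is a minimal sketch of the two inference modes (the names model, delta, and predict are our assumptions, not from the paper):

```python
import torch

@torch.no_grad()
def predict(model, x, delta=None):
    # With the forget vector: unlearned behavior; without it: original utility.
    if delta is not None:
        x = x + delta
    return model(x).argmax(dim=1)

# y_unlearned = predict(model, x, delta)  # forgetting mode
# y_original  = predict(model, x)         # full-utility mode
```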

3. Response to Q1 about the sentence in Line 222.

In the context of classification models, higher prediction accuracy on the forget set implies that the model has retained knowledge about that data. Conversely, when prediction performance drops significantly on the forget set, it indicates that the model may no longer reliably recognize or recall the underlying patterns from the forget data, which is the goal of unlearning. Therefore, a drop in classification accuracy on the forget set is used as a proxy signal for unlearning effectiveness: the worse the model performs on the forget set, the more evidence that the model has "forgotten" that data.
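
Under this reading, UA is presumably the forget-set error rate with the forget vector applied, which is why it is reported as a "higher is better" metric. A minimal evaluation sketch with assumed loader and model names:

```python
import torch

@torch.no_grad()
def unlearning_accuracy(model, forget_loader, delta):
    # UA (%): error rate on the forget set under the applied forget vector.
    correct, total = 0, 0
    for x, y in forget_loader:
        correct += (model(x + delta).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return 100.0 * (1 - correct / total)
```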

4. Response to Q2 about the scalability with the number of classes overall and the number of classes in the forget set

(1) Scale with the number of classes overall. We empirically validated the scalability of our method on ImageNet-100, a more challenging and diverse dataset with 100 classes, using the ViT-Base backbone. We demonstrate the results under the class-wise forgetting setting. As shown in Table R6, our method maintains competitive unlearning effectiveness (UA, MIA-Efficacy) while preserving model utility (RA, TA), and keeps the Avg. Gap low. This indicates that our approach scales well to large-class settings without modifying the model weights, which is particularly appealing when model retraining is impractical.

Table R6: Performance overview of various MU methods for image classification in class-wise forgetting scenario on ImageNet-100 using ViT-Base.

MU Method | UA ↑ | MIA-Efficacy ↑ | RA ↑ | TA ↑ | Avg. Gap ↓
Retrain | 100.00 | 100.00 | 94.07 | 88.36 | 0.00
FT | 97.10 | 99.20 | 90.50 | 84.88 | 1.30
SalUn | 97.35 | 98.61 | 91.40 | 85.10 | 2.49
Ours | 96.58 | 99.34 | 92.50 | 85.45 | 2.14

(2) Scale with the number of classes in the forget set.

To evaluate how our approach scales with the number of classes in the forget set, we conduct experiments on ImageNet-10 using the ViT-Base backbone. Specifically, we measure performance as the model is tasked to forget 1, 2, and 3 classes simultaneously under the class-wise forgetting setting. The results can be found in Table R7.

Table R7: Performance overview of various MU methods for image classification in class-wise forgetting scenario on ImageNet-10 using ViT-Base with different number of forgetting classes.

MU Method | UA ↑ | MIA-Efficacy ↑ | RA ↑ | TA ↑ | Avg. Gap ↓
→ 1 class
Retrain | 100.00 | 100.00 | 99.97 | 99.85 | 0.00
FT | 42.79 | 40.78 | 99.96 | 99.61 | 29.17
SalUn | 93.27 | 94.00 | 98.22 | 98.00 | 4.08
Ours | 95.92 | 99.40 | 99.13 | 99.26 | 1.53
→ 2 classes
Retrain | 100.00 | 100.00 | 99.95 | 99.80 | 0.00
FT | 48.62 | 46.45 | 99.91 | 99.48 | 26.32
SalUn | 92.80 | 93.40 | 97.15 | 98.00 | 4.60
Ours | 95.67 | 97.35 | 98.45 | 98.56 | 2.43
→ 3 classes
Retrain | 100.00 | 100.00 | 99.90 | 99.75 | 0.00
FT | 47.13 | 45.90 | 99.10 | 99.50 | 27.01
SalUn | 93.50 | 94.20 | 97.15 | 97.50 | 4.32
Ours | 95.94 | 93.20 | 97.21 | 96.33 | 4.24

5. Response to paper formatting.

Thank you! We will correct these formatting errors.

评论

Dear Reviewer Wzpy,

Thank you for taking the time to review our paper. We sincerely appreciate your thoughtful and constructive comments. As the discussion deadline approaches, we would like to kindly check whether our response has sufficiently addressed your concerns. If you have any additional feedback or questions, we would be glad to further discuss and provide clarification.

Best regards,

Authors

Review
Rating: 4

The paper proposes Forget Vectors (FV), a novel input perturbation-based approach to machine unlearning (MU), which aims to remove the influence of specific data points from a pre-trained model. The method learns an input-agnostic perturbation vector by jointly minimizing a margin loss on the forget set, a cross-entropy loss on the retain set, and a regularization term on the perturbation norm. At test time, the learned vector is added to the input image to induce forgetting of the unlearning data. The paper also introduces a new setting called compositional unlearning, in which FVs derived from different classes can be linearly combined with learnable weights to enable randomized data forgetting. Experimental results on CIFAR-10 and ImageNet-10 demonstrate that the proposed method effectively induces unlearning while preserving model performance on the retained data.
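
To make the summarized objective concrete, below is a minimal PyTorch sketch of learning a forget vector from this description. The CW-style hinge, the equal loss weighting, and all names are illustrative assumptions, not the paper's exact implementation of its Equation (4):

```python
import itertools
import torch
import torch.nn.functional as F

def learn_forget_vector(model, forget_loader, retain_loader,
                        steps=500, lr=1e-2, lam=1e-2):
    # One input-agnostic perturbation, shared across all inputs, for a frozen model.
    x0, _ = next(iter(forget_loader))
    delta = torch.zeros_like(x0[:1], requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    f_it = itertools.cycle(forget_loader)
    r_it = itertools.cycle(retain_loader)
    model.eval()

    for _ in range(steps):
        xf, yf = next(f_it)
        xr, yr = next(r_it)

        # Margin loss on the forget set: push the true-class logit below
        # the strongest competing logit (a CW-style hinge).
        logits_f = model(xf + delta)
        true = logits_f.gather(1, yf[:, None]).squeeze(1)
        other = logits_f.scatter(1, yf[:, None], float('-inf')).amax(1)
        loss_forget = torch.clamp(true - other, min=0).mean()

        # Cross-entropy on the (equally perturbed) retain set preserves utility.
        loss_retain = F.cross_entropy(model(xr + delta), yr)

        # Norm regularization keeps the perturbation small.
        loss = loss_forget + loss_retain + lam * delta.norm(p=2)
        opt.zero_grad()
        loss.backward()
        opt.step()

    return delta.detach()
```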

优缺点分析

Strengths

  • [S1] Novel unlearning paradigms. The paper explores two underexplored directions in machine unlearning: (1) Data-based unlearning, which performs unlearning without modifying model parameters; and (2) Compositional unlearning, which dynamically adjusts the model's behavior to forget specific data subsets using the optimization results from individual unlearning requests. These are both creative and valuable contributions that expand the applicability of unlearning.
  • [S2] Clear presentation and thoughtful analysis. The paper is well-written and easy to follow. The methodology is explained with good clarity, and the experimental section includes insightful analyses, such as ablations and saliency-based visualizations, that help understand how the proposed method behaves in practice.

Weaknesses

  • [W1] Insufficient motivation for data-based unlearning. While data-based unlearning (e.g., via input perturbation) is a novel direction, its motivation remains unconvincing. Appendix H acknowledges a key vulnerability: these methods may be fragile under white-box adversaries who can simply remove or reverse the input transformation. Given this risk, the paper needs to establish strong, practical advantages that justify pursuing this direction. Currently, the primary implied benefit seems to be lower computational cost which, while important, may not be enough on its own unless clearly quantified and shown to hold across meaningful settings. Empirical demonstrations of additional benefits such as deployment efficiency, ease of integration, or robustness in black-box settings would help significantly.
  • [W2] Missing discussion of computational savings. Despite suggesting that a major advantage of data-based MU lies in reduced compute cost (e.g., Lines 102–103), the paper does not present any actual measurements or comparisons to support this claim. Given the relatively small scale of the image classifier architectures used, these savings might not be apparent. To meaningfully support this argument, the authors should test on larger models where tuning the model weights is expensive, or at least provide rough estimates of compute savings when using FVs.
  • [W3] Weak empirical support for compositional unlearning. While compositional unlearning is conceptually novel, its practical utility is unclear. Results in Table 3 suggest that directly applying the Forget Vector (FV) performs better than linearly combining class-level FVs in most cases, especially in preserving model utility. The paper also does not articulate a clear computational benefit to compositional unlearning (CU-FV) compared to direct FV. Without such an advantage, it is hard to justify using CU-FV at all. As it stands, the compositional setup seems theoretically interesting but practically unnecessary, which weakens its contribution.
  • [W4] Limited experimental scope. The paper's experimental design is somewhat narrow in several ways: (1) It focuses exclusively on image classifiers, while recent interest in machine unlearning is shifting toward generative models, which pose greater challenges and risks. (2) It does not evaluate robustness to counter-unlearning attacks, which have become increasingly relevant [A, B]. (3) It only tests a fixed forget ratio of 10%, leaving open questions about how well the method scales with varying degrees of unlearning. While some of these directions may be non-trivial to implement, exploring at least some of them more thoroughly would strengthen the paper’s significance.

[A] Unlearning or Obfuscating? Jogging the Memory of Unlearned LLMs via Benign Relearning. ICLR 2025.
[B] Catastrophic Failure of LLM Unlearning via Quantization. ICLR 2025.

Questions

  • [Q1] Are there any interpretable patterns observed in the trained forget vectors for each unlearned class when visualizing them on the pixel space?
  • Typo in Line 195: should be "tend to" instead of "tent to"
  • Typo in Line 316: did you mean "random data forgetting" instead of "data-wise forgetting"? The following sentences all seem to refer to that setting as the former.

Limitations

The authors have adequately addressed the limitations of their work.

Final Justification

The rebuttal addresses most of my concerns regarding compute cost and experimental scope, leading me to lean more toward acceptance. However, my concern about its vulnerability to white-box adversaries remains (as noted by Reviewer Wzpy, it is unclear whether this approach truly satisfies the goal of unlearning), so I still consider it a borderline accept.

Formatting Issues

N/A

Author Response

We sincerely thank Reviewer eoHp for the thoughtful and constructive feedback. Below, we provide detailed responses to each of the key questions raised.

1. Response to W1 about the insufficient motivation for data-based unlearning

We thank the reviewer for raising this important concern. While we acknowledge the vulnerability of data-based methods under white-box adversaries (Appendix H), such limitations are not unique to our approach. Model-based unlearning methods have also been shown to be ineffective under relearning or adversarial attacks. Unlike model-based unlearning that require modifying model parameters, our data-based method operates purely at the input level. This design offers several practical advantages:

(1) Preserved utility: By not altering model weights, our method maintains the model’s general-purpose utility and downstream performance of the original model, which is valuable in dynamic environments where the model may be reused for tasks unrelated to the forgotten data.

(2) Memory efficiency: Our input-agnostic forget vector avoids updating high-dimensional model weights. This also avoids the need to store multiple model checkpoints, which is often required by model-based methods. Additional details regarding efficiency analysis are provided in our next response.

(3) Compositional unlearning: New forget requests can be fulfilled via simple arithmetic over class-wise forget vectors, reducing the optimization burden to just a few scalar coefficients, enabling plug-and-play unlearning. Such flexibility is not available in model-based approaches.

(4) Black-box model compatibility: Since we only perturb the input, our method naturally supports black-box settings where model parameters are inaccessible.

2. Response to W2 about the missing discussion of computational savings

Thank you for raising this point. We would like to clarify that supporting measurements for the forget vector (FV) are provided in Appendix F (Table A2). Moreover, the computation cost is further reduced via compositional unlearning, where only scalar coefficients need to be optimized. To highlight this, we include an additional efficiency analysis for Compositional Unlearning via Class-wise Forget Vectors (CU-FV) in Table R1. In addition, we conducted experiments on the larger ViT-Large backbone in the random data forgetting scenario with a 10% forget ratio on ImageNet-10. As shown in Table R1, CU-FV is up to 19× faster than FV.

Table R1: Performance and efficiency overview of various MU methods under random data forgetting scenario on ImageNet-10 using ViT-Large, across the metrics: UA, RA, TA, MIA-Efficacy, Avg. Gap vs. Retrain, run-time efficiency (RTE) and parameter number (Param.#), where RTE is measured in minutes and M refers to Million.

MU Method | UA ↑ | MIA-Efficacy ↑ | RA ↑ | TA ↑ | Avg. Gap ↓ | RTE | Param.#
Retrain | 0.69 | 93.53 | 99.27 | 99.00 | 0.00 | 18.50 | 303.40 (M)
FT | 0.61 | 97.80 | 99.73 | 98.00 | 1.44 | 9.30 | 303.40 (M)
SalUn | 0.93 | 98.80 | 99.00 | 98.04 | 1.67 | 7.59 | 151.70 (M)
FV | 0.75 | 98.30 | 98.15 | 99.15 | 1.51 | 5.71 | 0.15 (M)
CU-FV | 0.85 | 98.50 | 99.11 | 99.80 | 1.50 | 0.30 | 10

3. Response to W3 about the empirical support for compositional unlearning

Thank you for your thoughtful feedback. We acknowledge that in some cases, directly trained forget vectors may outperform compositional unlearning in preserving model utility. This reflects a trade-off between precision and efficiency. However, CU-FV offers unique practical benefits that are not captured by standard FV, as follows:

(1) CU-FV significantly reduces computation time by optimizing only a small set of scalar weights (e.g., 10 parameters for random data forgetting on ImageNet10), rather than full input-level perturbations (which scale with image size), as shown in the Table R1 in Response 2.

(2) CU-FV enables efficient editing of unlearning requests in real-time systems, where a single set of class-wise FVs can serve multiple "random data forgetting" requests.

A clear computational benefit of compositional unlearning can be seen in Table R1, where CU-FV runs up to 19× faster than FV. This finding highlights CU-FV’s utility as a lightweight and flexible solution in practical deployments where multiple or large-scale unlearning requests may arise dynamically. In addition, to further demonstrate the lightweight nature and flexibility of compositional unlearning, we also report the RTE and Param.# on ImageNet-10 using ViT-Large with different forgetting ratios (10%, 20%, 30%) in Table R2.

Table R2: Comparison of runtime efficiency (RTE) and parameter number (Param.#) between Forget Vector (FV) and Compositional Unlearning (CU-FV) under different forgetting ratios on ImageNet10 using ViT-Large.

MU Method | RTE (10%) | Param.# (10%) | RTE (20%) | Param.# (20%) | RTE (30%) | Param.# (30%)
FV | 5.71 | 150,528 | 5.69 | 150,528 | 5.86 | 150,528
CU-FV | 0.30 | 10 | 0.40 | 10 | 0.56 | 10
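
To illustrate why CU-FV trains only about ten scalars, here is a minimal sketch of compositional unlearning under the same assumptions as the earlier forget-vector sketch; the negative cross-entropy on the forget set is a simple stand-in for the paper's margin loss, and all names are ours:

```python
import itertools
import torch
import torch.nn.functional as F

def compose_forget_vector(model, class_fvs, forget_loader, retain_loader,
                          steps=100, lr=0.1):
    # class_fvs: frozen class-wise forget vectors, each of shape [1, C, H, W].
    basis = torch.cat(class_fvs)                          # [K, C, H, W]
    w = torch.zeros(len(class_fvs), requires_grad=True)   # only K scalars train
    opt = torch.optim.Adam([w], lr=lr)
    f_it = itertools.cycle(forget_loader)
    r_it = itertools.cycle(retain_loader)
    model.eval()

    for _ in range(steps):
        # Linear combination of the frozen class-wise vectors.
        delta = (w.view(-1, 1, 1, 1) * basis).sum(0, keepdim=True)
        xf, yf = next(f_it)
        xr, yr = next(r_it)
        # Encourage forgetting on the forget set, retention on the retain set.
        loss = (-F.cross_entropy(model(xf + delta), yf)
                + F.cross_entropy(model(xr + delta), yr))
        opt.zero_grad()
        loss.backward()
        opt.step()

    return (w.view(-1, 1, 1, 1) * basis).sum(0, keepdim=True).detach()
```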

4. Response to W4 about the limited experimental scope

(1) On focusing solely on image classification (not generative models).

We agree that MU for generative models is important and emerging, but it falls outside the scope of our work. Our method is designed for classification tasks, applying input-level perturbations without altering model weights. Extending this to generative models (e.g., diffusion or language models) involves fundamentally different challenges. For instance:

(a) In text-to-image diffusion models, the conditioning is based on text prompt, not images, making our forget vector framework inapplicable.

(b) In large language models, both the input and task (e.g., text generation) differ fundamentally from classification.

Thus, unlearning for generative models is a distinct, complex challenge requiring a different formulation. While important, it is beyond the scope of this work, which focuses on efficient, data-based unlearning for classifiers.

(2) On robustness to counter-unlearning attacks.

(a) Relearning: While recent studies (e.g., [A]) show that model-based unlearning can be vulnerable to relearning attacks due to residual memory, our input-level unlearning approach fundamentally differs. By applying perturbations directly to inputs, we decouple unlearning from model parameters. We validate this by fine-tuning a ViT-Base model (trained on ImageNet-10) using 10% of test data (including forget and retain classes) and re-evaluating the forget vectors. As shown in Table R3, the model maintains its ability to forget the targeted forget data and recognize the retained data, with only minor metric shifts, demonstrating the robustness of our method even under post-unlearning retraining.

Table R3: Performance comparison of the original and relearned ViT-Base models on ImageNet10 to assess the persistence and robustness of the forget vector.

Model Variant | UA ↑ | MIA-Efficacy ↑ | RA ↑ | TA ↑
Original Model + Forget Vector | 95.92 | 99.40 | 99.13 | 99.26
Relearned Model + Forget Vector | 96.15 | 99.30 | 99.17 | 99.55

(b) Quantized model. We applied post-training 8-bit quantization to the original model, ensuring classification accuracy on clean inputs remained comparable. To test robustness, we reused the original forget vectors on the quantized ViT-Base model trained on ImageNet10. As shown in Table R4, the quantized model still effectively forgets target classes and preserves recognition of retained data, indicating that our input-level unlearning approach remains robust and effective under quantization.

Table R4: Evaluation of forget vector effectiveness on quantized ViT-Base Models in the class-wise forgetting scenario.

Model Variant | UA ↑ | MIA-Efficacy ↑ | RA ↑ | TA ↑ | Avg. Gap ↓
Retrain (FP32) | 100.00 | 100.00 | 99.97 | 99.85 | 0.00
Forget Vector on Original Model (FP32) | 95.92 | 99.40 | 99.13 | 99.26 | 1.53
Forget Vector on Quantized Model (INT8) | 93.69 | 91.70 | 98.79 | 98.67 | 4.24
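
A hypothetical version of this quantization check, using PyTorch's dynamic INT8 quantization of linear layers (the rebuttal does not specify the exact toolchain used; model, delta, and the loaders are assumed names):

```python
import torch

@torch.no_grad()
def perturbed_accuracy(model, loader, delta):
    # Accuracy (%) with the forget vector applied to every input.
    correct, total = 0, 0
    for x, y in loader:
        correct += (model(x + delta).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return 100.0 * correct / total

# Post-training dynamic INT8 quantization of the linear layers -- one common
# recipe; not necessarily the one used in the rebuttal.
qmodel = torch.quantization.quantize_dynamic(
    model.cpu().eval(), {torch.nn.Linear}, dtype=torch.qint8)
print("forget-set acc:", perturbed_accuracy(qmodel, forget_loader, delta))  # low -> still forgotten
print("retain-set acc:", perturbed_accuracy(qmodel, retain_loader, delta))  # high -> utility kept
```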

(3) On scalability across different unlearning ratios.

We further evaluated our method under different forget ratios (10%, 20%, 30%) by performing random data forgetting on ImageNet-10 with a ViT-Base backbone. As shown in Table R5, our approach consistently suppresses forgotten data while preserving high accuracy on retained data, demonstrating robustness and scalability across varying unlearning demands.

Table R5: Scalability of various unlearning methods across varying forgetting ratios on ImageNet10 using ViT-Base.

Forgetting Ratio | UA ↑ | MIA-Efficacy ↑ | RA ↑ | TA ↑ | Avg. Gap ↓
→ 10%
Retrain | 1.41 | 93.57 | 99.07 | 99.27 | 0.00
FT | 1.38 | 96.40 | 99.60 | 99.10 | 0.89
SalUn | 0.67 | 95.80 | 99.65 | 98.27 | 1.14
Ours | 1.08 | 91.40 | 98.97 | 99.10 | 0.69
→ 20%
Retrain | 1.65 | 96.03 | 98.35 | 98.00 | 0.00
FT | 1.93 | 94.00 | 99.26 | 99.00 | 1.05
SalUn | 0.88 | 94.10 | 99.05 | 98.50 | 0.98
Ours | 1.80 | 95.50 | 98.11 | 98.20 | 0.28
→ 30%
Retrain | 1.95 | 96.71 | 99.04 | 99.00 | 0.00
FT | 1.77 | 95.80 | 98.84 | 98.20 | 0.52
SalUn | 0.90 | 96.10 | 98.95 | 98.30 | 0.61
Ours | 2.38 | 97.00 | 98.33 | 98.20 | 0.56

5. Response to Q1 about interpretable patterns.

Thank you for the question. We did examine the perturbation patterns of the FVs in pixel space for each unlearned class. However, we found that the raw pixel-level perturbations do not exhibit consistent or interpretable visual patterns across different classes. To illustrate the spatial influence patterns induced by the FVs, we thus investigated this from an input saliency perspective (Grad-CAM); see Figure 4 of the main paper and Figures A4 and A5 in Appendix G.
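
For readers wishing to reproduce this saliency comparison, below is a minimal Grad-CAM sketch (Selvaraju et al., 2017) over a CNN feature layer; the layer choice and all names are our assumptions, not the authors' exact visualization code:

```python
import torch
import torch.nn.functional as F

def grad_cam(model, layer, x, class_idx):
    # Capture the layer's activations and their gradients via hooks.
    feats, grads = [], []
    h1 = layer.register_forward_hook(lambda m, i, o: feats.append(o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    logits = model(x)
    model.zero_grad()
    logits[0, class_idx].backward()
    h1.remove()
    h2.remove()
    weights = grads[0].mean(dim=(2, 3), keepdim=True)  # global-average-pooled grads
    cam = F.relu((weights * feats[0]).sum(dim=1))      # weighted feature sum + ReLU
    return cam / cam.amax().clamp_min(1e-8)

# Compare saliency with and without the forget vector (assuming a ResNet-style model):
# cam_clean = grad_cam(model, model.layer4, x, c)
# cam_fv    = grad_cam(model, model.layer4, x + delta, c)
```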

6. Response to typos.

Thank you for pointing out the typos and phrasing inconsistencies, and we will correct them.

Comment

Dear Reviewer eoHp,

We sincerely appreciate your valuable time and constructive comments on our submission. As the discussion deadline approaches, we would like to kindly check whether our response has sufficiently addressed your concerns. If you have any further feedback or questions, we would be happy to engage in additional discussion and provide clarification.

Best regards,

Authors

Comment

Dear Reviewers,

We sincerely appreciate the time and effort you have devoted to reviewing our submission 13991. During the rebuttal phase, we have carefully addressed the concerns raised in your initial review.

As the rebuttal deadline approaches, we kindly wanted to follow up in case you have any additional comments or questions after reviewing our response. We would be grateful for any further clarification you could provide, as your feedback is extremely valuable to us.

Thank you again for your contribution to the review process.

Final Decision

This paper proposes an input-perturbation approach to machine unlearning. Reviewers found the paper easy to follow and clear; the main criticisms were about the depth of the experiments, the applicability to LLMs, and the robustness of the method to white-box attacks. While the authors made some progress towards addressing these issues in their rebuttal, concerns still remain about the scope of the claims and about the practical utility. Overall, I recommend that the authors incorporate their rebuttal experiments into the main text, make the central contributions and claims clearer, and resubmit this work to a future venue.