PaperHub
ICLR 2025 · Rejected
Rating: 4.0/10 (3 reviewers; scores 3, 3, 6; min 3, max 6, std 1.4)
Confidence: 4.3 · Correctness: 2.3 · Contribution: 2.0 · Presentation: 2.7

Choose Your Anchor Wisely: Effective Unlearning Diffusion Models via Concept Reconditioning

OpenReview · PDF
Submitted: 2024-09-27 · Updated: 2025-02-05
TL;DR

We introduce COncept REconditioning (CORE), a simple yet effective approach for unlearning in diffusion models.

Abstract

Keywords
Machine Unlearning, Diffusion Models

Reviews & Discussion

Review
Rating: 3

This paper introduces COncept REconditioning (CORE), a simple yet effective approach for unlearning diffusion models. By guiding the noise predictor conditioned on forget concepts towards an anchor generated from alternative concepts, CORE surpasses state-of-the-art methods, including its close variants, and achieves near-perfect performance, especially when CORE aims to forget multiple concepts. The difference between CORE and other existing approaches lies in the choice of anchor and the retain loss.

Strengths

  1. This paper proposes COncept REconditioning (CORE), a new efficient and effective unlearning method for diffusion models.
  2. Extensive tests on UnlearnCanvas demonstrate that CORE surpasses existing baselines, achieving near-perfect scores and setting new state-of-the-art performance for unlearning diffusion models. CORE also exhibits strong generalization in unlearning styles.
  3. The ablation studies in the paper show the benefits of using a fixed, non-trainable target noise over other unlearning methods.

Weaknesses

  1. The entire paper feels quite redundant. The related work section and Chapter 2 cover the same material. The content after line 294 in Section 3.2 seems to repeat what was mentioned earlier.
  2. The paper mentions various unlearning concepts, such as privacy and explicit content, but in practice, it only focuses on style. The paper claims generalization as one of its contributions, so how is this demonstrated? Or is CORE only applicable to style unlearning?
  3. The paper compares many unlearning methods, but there is only one figure (Figure 2) showing the actual results, and each concept has just one result. The presentation of the outcomes is too sparse. Although the tables show some differences between the models, I still think some of the redundant content could be reduced to include more actual results.
  4. The survey of concept-removal methods in the paper is not comprehensive; it omits the methods described in references [1] and [2].

[1] Ni Z, Wei L, Li J, et al. Degeneration-tuning: Using scrambled grid to shield unwanted concepts from stable diffusion. In Proceedings of the 31st ACM International Conference on Multimedia, 2023: 8900-8909.

[2] Patrick Schramowski, Manuel Brack, Björn Deiseroth, and Kristian Kersting. Safe latent diffusion: Mitigating inappropriate degeneration in diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 22522–22531.

Questions

1. How does the CORE method perform on other content, such as specific entities or specific concepts?

2. Why can't CORE be directly applied to SD1.5, and why does it instead require fine-tuning on UnlearnCanvas? From my personal experience, fine-tuning SD1.5 leads to significant changes in its performance, and unlearning on a fine-tuned model makes it relatively easier to control its performance on other non-unlearning concepts. However, this setup may not reflect the actual scenario.

Comments

Q1: The entire paper feels quite redundant. The related work section and Chapter 2 cover the same material. The content after line 294 in Section 3.2 seems to repeat what was mentioned earlier.

A1: We thank the reviewer for their careful reading. We will revise the presentation in the next version to make it more precise and clearer.

Q2: The paper mentions various unlearning concepts, such as privacy and explicit content, but in practice, it only focuses on style. The paper claims generalization as one of its contributions, so how is this demonstrated? Or is CORE only applicable to style unlearning?

A2: Thank you for your observation. While our experiments focus on unlearning styles—a common benchmark for diffusion models in image generation—we believe our method can be applied to other types of content, such as removing sensitive or private content from images. Extending CORE to these areas is an important future direction, though it is beyond the scope of this submission.

Our claim of strong generalization refers to the model's ability to unlearn styles across unseen objects. To test this, we divided all objects into a training set (used for unlearning) and a test set (used for evaluation). Our results show that CORE effectively unlearns styles even on unseen objects, demonstrating strong generalization capabilities.

Additionally, we applied CORE to the I2P benchmark, which includes sensitive images (e.g., those containing nudity). The results, presented in the appendix, show that CORE can effectively remove sensitive content, indicating its applicability beyond style unlearning.

Q3: The paper compares many unlearning methods, but there is only one figure (Figure 2) showing the actual results, and each concept has just one result. The presentation of the outcomes is too sparse. Although the tables show some differences between the models, I still think some of the redundant content could be reduced to include more actual results.

A3: Thank you for your suggestion. We have included more results in the appendix (see Figures 3 and 4) to demonstrate the effectiveness of our method. While we have shown a subset of the generated images, we believe they are representative of our method's overall performance. The quantitative results in Tables 1–4 are computed over all styles and objects, providing a comprehensive evaluation. We will consider reducing redundant content to include more visual results in the next version.

Q4: In addition to the fact that the methods for removing concepts mentioned in the paper are not comprehensive, there are also methods described in references [1] and [2].

A4: Thank you for highlighting additional related work. Reference [1] introduces the technique of scrambled grids in the training loss, and [2] (Safe Latent Diffusion, SLD) modifies the latent space to improve unlearning performance. We will discuss these methods in more detail in the next version.

Q5: How does the CORE method perform on other content, such as specific entities or specific concepts?

A5: Thank you for your question. While our submission focuses on the UnlearnCanvas benchmark—a comprehensive evaluation framework for unlearning methods—we also conducted experiments on the I2P benchmark, which includes unsafe and sensitive images. The results, presented in the appendix, show that CORE effectively unlearns sensitive and unsafe content, such as images containing nudity. This demonstrates that CORE can be applied to a broader range of tasks beyond style unlearning.

Q6: Why can't CORE be directly applied to SD1.5 and instead requires fine-tuning on UnlearnCanvas? From my personal experience, fine-tuning SD1.5 leads to significant changes in its performance, and unlearning on a fine-tuned model makes it relatively easier to control its performance on other non-unlearning concepts. However, this shouldn't reflect the actual scenario.

A6: Thank you for your question. The fine-tuning and unlearning scheme was implemented by UnlearnCanvas. They fine-tuned the Stable Diffusion v1.5 model on their dataset to enable the model to generate images with specific styles and objects included in the benchmark. Without fine-tuning, the original SD v1.5 model performs poorly in generating images with those styles.

Starting the unlearning process from a fine-tuned model ensures that we evaluate the unlearning methods on a model that has already learned the targeted concepts. We agree that fine-tuning can change a model's performance, and unlearning on a fine-tuned model might make it easier to control performance on other concepts. However, this approach allows for a fair and consistent evaluation across different unlearning methods within the context of the UnlearnCanvas benchmark.

Review
Rating: 3

This work proposes Concept REconditioning (CORE), a simple yet effective approach for unlearning harmful, sensitive, or copyrighted content from diffusion models. The key contribution lies in the selection of anchor concepts and the retain loss. Extensive experiments demonstrate that CORE surpasses state-of-the-art methods.

Strengths

  1. The paper is well written.
  2. Machine unlearning is an interesting topic, and studying how to unlearn certain concepts in SD models is important.

Weaknesses

  1. The proposed method appears rather trivial. The author simply presents a pairing method of anchor and forget concepts (either from the retain set or other forget concepts) within the unlearning objective of Concept Ablation (CA)[1]. This is highly engineering-focused and lacks adequate innovation. The proposed retaining loss only transitions from predicting Gaussian noise to aligning with the prediction of the pretrained model. Although experimentally proven effective by the author as indicated in Table 3, the author does not discuss this aspect in sufficient depth, and it is regarded as a relatively minor improvement.

  2. There is a deficiency in the comparison with some state-of-the-art methods in the experiments [2, 3, 4].

  3. The experiments lack comparisons with more models. For example, SD v1.4, which is commonly employed by previous methods, and larger models like SD-XL. Additionally, there is a lack of results validating the retaining effect on large-scale datasets, such as COCO-30K.

  4. The visualization results do not utilize the commonly used prompts adopted by previous works [1][2], making it difficult to demonstrate advantages over previous efforts. Moreover, the retained concepts also exhibit changes in the image content, as seen in Figure 2.

References:
[1] Ablating Concepts in Text-to-Image Diffusion Models
[2] One-Dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications
[3] To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy to Generate Unsafe Images... For Now
[4] Unified Concept Editing in Diffusion Models

Questions

Please refer to the weaknesses section.

Comments

Q1: The proposed method appears rather trivial. The author simply presents a pairing method of anchor and forget concepts (either from the retain set or other forget concepts) within the unlearning objective of Concept Ablation (CA)[1]. This is highly engineering-focused and lacks adequate innovation. The proposed retaining loss only transitions from predicting Gaussian noise to aligning with the prediction of the pretrained model. Although experimentally proven effective by the author as indicated in Table 3, the author does not discuss this aspect in sufficient depth, and it is regarded as a relatively minor improvement.

A1: Thank you for your feedback. While our method (CORE) is related to Concept Ablation (CA), it introduces several key innovations that make it fundamentally different:

  1. Fixed Anchor Concept: In our unlearning loss, we fix an anchor concept and compute the error between the unlearned model's output and the fixed pretrained diffusion model's output. In contrast, CA computes the error between the unlearned model and itself. Our approach is based on the intuition that a fixed target provides stability during training.

  2. Retain Loss: We replace the Gaussian random vector with the prediction from the pretrained model. This aligns with the statistical intuition that using an estimated parameter can sometimes yield better performance than using a random one [1,2].

[1] When is the estimated propensity score better? High-dimensional analysis and bias correction.
[2] A puzzling phenomenon in semiparametric estimation problems with infinite-dimensional nuisance parameters.

  3. One-to-One Mapping Scheme: We design a one-to-one mapping between forget concepts and retain concepts, which significantly outperforms traditional pairing schemes used in CA and other methods, especially when forgetting multiple concepts (see Table 4).

These design choices not only differentiate CORE from traditional unlearning methods but also contribute to its superior performance, as evidenced by our experimental results.
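For concreteness, the two objectives described in points 1 and 2 can be sketched as follows. This is an illustrative reconstruction based only on the rebuttal text, not the authors' code: `student`, `teacher`, and the prompt arguments are hypothetical names, and in a real diffusion model both predictors would be conditioned U-Nets evaluated on a noised image at a sampled timestep.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two noise predictions."""
    return float(np.mean((a - b) ** 2))

def core_losses(student, teacher, x_t, p_forget, p_anchor, p_retain):
    """Sketch of the two CORE objectives described in the rebuttal.

    - Unlearn loss: push the trainable student's prediction on the forget
      concept toward the FROZEN pretrained teacher's prediction on the
      anchor concept (a fixed, non-trainable target, in contrast to CA,
      which compares the unlearned model against itself).
    - Retain loss: match the teacher's prediction on the retain concept,
      rather than a fresh Gaussian noise sample.
    """
    unlearn = mse(student(x_t, p_forget), teacher(x_t, p_anchor))
    retain = mse(student(x_t, p_retain), teacher(x_t, p_retain))
    return unlearn, retain

# Toy predictors standing in for noise-prediction networks: before any
# unlearning step, student and teacher are the same pretrained model.
teacher = lambda x, prompt: x * (1.0 if prompt == "anchor" else 0.5)
student = teacher
u, r = core_losses(student, teacher, np.ones((2, 2)),
                   "forget style", "anchor", "retain style")
# The retain loss starts at zero (student == teacher on retain concepts),
# while the unlearn loss is positive until the forget-concept prediction
# is reconditioned onto the anchor's.
```

The one-to-one mapping of point 3 would then simply assign each forget concept its own fixed `p_anchor` drawn from the retain set, rather than sharing a single anchor across all forget concepts.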

Q2: There is a deficiency in the comparison with some state-of-the-art methods in the experiments [2, 3, 4].

A2: Thank you for bringing this up. We attempted to include the SPM algorithm from [2] and the UCE algorithm from [4] using the UnlearnCanvas codebase. However:

UCE: The images generated by the unlearned model using UCE were vague and lacked meaningful content, performing worse than reported in the UnlearnCanvas paper. Therefore, we did not include these results in our submission.

SPM: Running SPM required significantly more computational resources—approximately 30 times more than CA or SalUn, as reported in the UnlearnCanvas paper. Due to these constraints, we could not run the full SPM algorithm for unlearning 6 or 25 concepts.

Moreover, the UnlearnCanvas paper indicates that SPM and UCE are outperformed by other baselines like EDiff, CA, SalUn, and ESD, which we have included in our comparisons. We believe our selection of strong baselines provides a fair evaluation, and our proposed algorithm demonstrates superior performance against them.

Q3: The experiments lack comparisons with more models. For example, SD v1.4, which is commonly employed by previous methods, and larger models like SD - XL. Additionally, there is a lack of results validating the retaining effect on large-scale datasets, such as COCO - 30K.

A3: We appreciate your suggestion. However, the UnlearnCanvas codebase supports only SD v1.5. Testing algorithms on the UnlearnCanvas benchmark requires fine-tuning a diffusion model on a dataset comprising 50 styles and 20 objects, with 20 images for each combination—a process demanding substantial computational resources that were beyond our capacity. UnlearnCanvas provides a fine-tuned SD v1.5 model, which we used to apply our method and the baselines.

Q4: The visualization results do not utilize the commonly used prompts adopted by previous works [1][2], making it difficult to demonstrate advantages over previous efforts. Moreover, the retained concepts also exhibit changes in the image content, as seen in Figure 2.

A4: Thank you for this insight. In our original submission, we used the prompts provided by UnlearnCanvas for fair comparison. We have since conducted additional experiments using the general prompts adopted in [1] to compare our method with the baselines. We will include these results in the next version.

Briefly, when unlearning 25 concepts with general prompts, our method achieved a total score of 371.11. The baselines scored as follows: ESD (319.42), EDiff (316.03), CA-model (325.14), CA-noise (319.97), and SalUn (290.14). These results demonstrate that CORE significantly outperforms strong baselines even with more general prompts.

Regarding the retained concepts, our observations indicate that they are well preserved by our method, without significant changes in image content. We will include additional images in the next version to clarify this point.

Comments

Thank you for your detailed response. While you emphasize the innovations of the CORE method, I still have reservations about its actual contributions. In particular, the lack of sufficient experimental support for comparison with advanced methods makes evaluation challenging. As far as I know, both SPM and UCE provide good open-source code, making it theoretically feasible to conduct experiments based on that code. Additionally, regarding the discussion of concept retention, the significant changes evident in Figure 2 lead me to disagree with your assertion of good retention. Given that the author has not adequately addressed my concerns, I will maintain my rating.

Review
Rating: 6

This paper introduces a novel method, termed CORE, designed for the unlearning of diffusion models by selectively eliminating undesirable knowledge. The proposed approach includes an innovative strategy for anchor selection and a newly formulated retain loss. Experimental results demonstrate the method's superior performance compared to existing techniques.

Strengths

  1. The experimental setup is well-structured, effectively addressing the majority of my inquiries regarding this method.
  2. The performance outcomes appear to be satisfactory.
  3. The writing is commendable; the method is articulated clearly, and its key distinctions from other approaches are clearly stated.

Weaknesses

  1. The visual results presented are insufficient. I am particularly interested in scenarios where the forget concepts and the retain concepts contain the same object but differ in adjectives. For instance, in Figure 3, "Dadaism Cat" is expected to be forgotten, while "Vibrant Flow Cat" should be retained. Could you provide additional visual results for this kind of situation?
  2. Ablation study. Without the retain loss, how much worse will the model be?
  3. In line 230, the statement "In the unlearning objective, p_a acts as an anchor concept to recondition images from the forget set onto" appears incomplete. It seems that there is a missing component following the word "onto."

Questions

The explanation of the key differences from other methods, along with the experimental results, solve most of my questions. I have no further questions aside from those mentioned before.

Comments

Q1: The visual results presented are insufficient. I am particularly interested in scenarios where the forget concepts and the retain concepts contain the same object but differ in adjectives. For instance, in Figure 3, "Dadaism Cat" is expected to be forgotten, while "Vibrant Flow Cat" should be retained. Could you provide additional visual results for this kind of situation?

A1: Thank you for your observation. We have already included a comparison between "Dadaism Cat" (the concept to forget) and "Vibrant Flow Cat" (the concept to retain) in Figure 3. We will add more visual results of similar scenarios in the next version. Our experiments focus on unlearning specific styles across all objects in the dataset. Therefore, our visual presentations primarily compare different styles to demonstrate that our algorithm can effectively unlearn various objects under the same style.

Q2: Ablation study. Without the retain loss, how much worse will the model be?

A2: Thank you for your question. Without the retain loss, the CORE algorithm's performance decreases significantly. In a test where we aimed to forget six concepts, the model without the retain loss achieved the following scores: UA of 87.5, IRA of 99.5, CRA of 69, SFID of 83.7, and a total score of 339.2. This is notably lower than the 387.06 score achieved by the original CORE algorithm (as shown in Table 1 of our submission).

Including a retain loss term is standard in machine unlearning methods for both language models and diffusion models [1,2,3,4], as it is essential for good performance. Our algorithm introduces innovations in both the unlearn loss and the retain loss. We demonstrate that our retain loss outperforms those used in prior work (see the last row of Table 3).

[1] Ablating Concepts in Text-to-Image Diffusion Models.
[2] Selective Amnesia: A Continual Learning Approach to Forgetting in Deep Generative Models.
[3] SalUn: Empowering Machine Unlearning via Gradient-Based Weight Saliency in Both Image Classification and Generation.
[4] Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion Models.

Q3: In line 230, the statement "In the unlearning objective, p_a acts as an anchor concept to recondition images from the forget set onto" appears incomplete. It seems that there is a missing component following the word "onto."

A3: We thank the reviewer for their observation. We will modify this sentence in the next version to clarify it.

Public Comment

It seems that several recent highly related works [1,2,3] are ignored.

[1] Separable Multi-Concept Erasure from Diffusion Models

[2] MACE: Mass Concept Erasure in Diffusion Models

[3] Receler: Reliable Concept Erasing of Text-to-Image Diffusion Models via Lightweight Erasers

AC Meta-Review

This paper presents an unlearning method for diffusion models, titled COncept REconditioning (CORE). Reviewer concerns remain unaddressed, particularly regarding the lack of comprehensive experimental comparisons with SOTA methods, ambiguities in the key concept of retention, and an incomplete review of related literature. Moreover, it has been noted that the paper has already been accepted at the NeurIPS 2024 SafeGenAI workshop. In accordance with ICLR’s submission policy, the recommendation is to reject this submission.

Additional Comments from the Reviewer Discussion

The paper has already been accepted by the NeurIPS 2024 SafeGenAI workshop.

Final Decision

Reject