PaperHub
Rating: 6.3/10
Poster · 3 reviewers
Min 5 · Max 8 · Std 1.2
Scores: 5, 6, 8
Confidence: 3.0
Correctness: 3.0
Contribution: 3.0
Presentation: 3.3
ICLR 2025

Fantastic Targets for Concept Erasure in Diffusion Models and Where To Find Them

OpenReview | PDF
Submitted: 2024-09-26 · Updated: 2025-02-28

Abstract

Keywords
Trustworthy Generative AI, Diffusion Models, Machine Unlearning, Erasing Concepts

Reviews and Discussion

Official Review
Rating: 5

This paper introduces the AGE method, aimed at enhancing the effectiveness of concept erasure within text-to-image diffusion models. AGE dynamically selects an optimal target concept that minimizes interference with unrelated concepts. The method models the concept space as a graph, discovering that erasure effects are localized. This insight leads to AGE's selection of closely related but non-synonymous targets for each undesirable concept, reducing unintended impacts on other model functionalities.

Strengths

  1. AGE introduces an adaptive erasure approach that refines concept targeting by leveraging graph-based insights about concept space structure.
  2. The method minimizes unintended impacts on unrelated concepts, which addresses a notable limitation in previous fixed-target erasure methods.
  3. The findings are validated across different models, reinforcing AGE’s potential adaptability to various generative tasks and model architectures.

Weaknesses

  1. The optimization procedure for selecting target concepts may be computationally demanding for models with large concept spaces, which could limit AGE’s scalability in practice.
  2. The approach’s effectiveness relies on the accuracy of the concept graph’s structure. Any inaccuracies in capturing semantic relationships may affect erasure outcomes. Is there any discussion regarding this?
  3. Are there any human evaluations of artistic style?
  4. The proposed method doesn't achieve SOTA in some settings (e.g., Table 3). Is there any detailed discussion of this?

Questions

See above

Comment

Are there any human evaluations of artistic style?

In Section 5.3 of the paper (the section on erasing artistic styles), we explained that human evaluation was avoided due to its high cost, time-consuming nature, lack of scalability, and, more importantly, susceptibility to bias. Instead, we used the CLIP alignment score and the LPIPS score to evaluate erasure performance, as these metrics have also been employed in previous works [1, 2].

However, to fully address this concern, we have conducted an additional experiment to collect human evaluation results on the artistic style erasure task. The anonymous link to the survey is provided below, and we will include more details in the final version of the paper. However, we must admit that due to the high cost, the survey is limited to 50 images across all five artists per method, which might not be large enough to draw a solid conclusion.

Anonymous link to the survey: https://forms.gle/SyUV3e95d7NWpdW77

[1] Gandikota, Rohit, et al. "Unified concept editing in diffusion models." WACV 2024.

[2] Heng, Alvin, and Harold Soh. "Selective amnesia: A continual learning approach to forgetting in deep generative models." NeurIPS 2023.

The proposed method doesn't achieve SOTA in some settings (e.g., Table 3). Is there any detailed discussion of this?

It is worth noting that we conducted three erasure tasks in the paper: erasing physical objects, NSFW attributes, and artistic styles. We achieved state-of-the-art (SOTA) performance in two of these tasks (physical objects and NSFW attributes), significantly surpassing the recent SOTA method (MACE, CVPR 2024) in the object erasure task and closely matching the performance of the foundation model.

For the artistic style erasure task, a key challenge lies in the lack of a reliable detector for identifying the presence of artistic styles in generated images. To address this, we utilized the CLIP alignment score and the LPIPS score as evaluation metrics, which have also been employed in prior works. However, these metrics are not perfect and cannot reliably detect artistic styles. Consequently, compared to the other two tasks, the results of the artistic style erasure task should be interpreted as a complementary perspective.

With these metric limitations in mind, the experimental results in Table 3 make it evident that no single method clearly outperforms all others. For instance, while the MACE method achieves the best preservation performance, it performs worst in terms of erasure. Conversely, the ESD method secures the second-best erasure performance (based on the CLIP alignment score) but exhibits the poorest preservation performance. Our method strikes a better balance between preservation and erasure performance than all other methods. Specifically, with comparable preservation performance, our method outperforms UCE in erasure. Similarly, with comparable erasure performance, it surpasses ESD and CA in preservation. We will clarify this further in the final version of the paper.

Comment

The results of the experiment are shown in the following table. We investigate three different vocabularies for the concept space $\mathcal{C}$: ImageNet (AGE-I), Oxford-3K (AGE-O), and a manually crafted vocabulary (AGE-M), where we leverage knowledge of the to-be-erased concepts to generate the vocabulary, i.e., words that are semantically related to the to-be-erased concepts, such as "dog", "car", and "instrument". This manually crafted vocabulary is similar to Oxford-3K but much smaller in size. We compare the performance of AGE with the ESD and UCE methods.

Compared to the baseline methods, it can be seen that UCE fails completely when the number of concepts increases. Our AGE method with the Oxford-3K vocabulary (AGE-O) outperforms ESD in both erasure and preservation performance. Comparing the three vocabularies, AGE-O achieves the best erasure performance but suffers a significant drop in preservation performance. The AGE method with the manually crafted vocabulary (AGE-M) achieves the best trade-off between erasure and preservation: a small drop in erasure performance relative to ESD but much better preservation performance, which is consistent with our analysis in Section 3.2.

| Concept | SD | ESD | UCE | AGE-I | AGE-O | AGE-M |
|---|---|---|---|---|---|---|
| Erased Concepts | | | | | | |
| dog | 99.4 | 11.6 | 0.0 | 35.8 | 11.8 | 7.6 |
| truck | 99.4 | 23.4 | 0.0 | 14.4 | 9.8 | 6.4 |
| inst. | 98.6 | 10.8 | 0.0 | 28.8 | 12.8 | 16.0 |
| build. | 91.2 | 38.8 | 0.0 | 61.2 | 53.0 | 74.8 |
| elect. | 95.0 | 23.8 | 0.0 | 31.6 | 11.8 | 52.0 |
| Similar Concepts | | | | | | |
| dog | 100.0 | 94.4 | 0.0 | 99.8 | 92.2 | 99.2 |
| truck | 100.0 | 79.4 | 0.0 | 99.4 | 78.6 | 96.4 |
| inst. | 98.8 | 33.4 | 0.0 | 80.4 | 34.8 | 84.8 |
| build. | 97.4 | 77.0 | 0.0 | 90.6 | 86.6 | 95.2 |
| elect. | 92.0 | 21.6 | 0.0 | 69.6 | 35.6 | 80.0 |
| General Concepts | | | | | | |
| mamm. | 99.8 | 98.2 | 0.0 | 99.8 | 98.4 | 100.0 |
| bird | 99.6 | 87.0 | 0.2 | 98.4 | 86.6 | 97.2 |
| rept. | 95.6 | 83.2 | 0.0 | 94.4 | 83.8 | 89.2 |
| insect | 78.6 | 66.4 | 0.0 | 75.8 | 65.0 | 74.4 |
| fish | 93.8 | 73.0 | 0.0 | 95.8 | 64.6 | 92.8 |
| veh. | 99.8 | 82.8 | 0.0 | 98.8 | 80.0 | 98.0 |
| craft | 99.2 | 64.2 | 0.0 | 96.6 | 70.4 | 94.4 |
| furn. | 96.8 | 64.6 | 0.4 | 83.0 | 72.0 | 80.4 |
| fruit | 100.0 | 81.6 | 0.0 | 99.8 | 83.8 | 99.2 |
| obj. | 100.0 | 74.4 | 0.2 | 94.8 | 76.8 | 95.6 |
| Metrics | | | | | | |
| ESR-1 | 24.2 | 90.6 | 100.0 | 84.7 | 91.2 | 86.1 |
| ESR-5 | 3.3 | 78.3 | 100.0 | 65.6 | 80.2 | 68.6 |
| PSR-1 | 83.1 | 56.3 | 0.0 | 74.7 | 57.1 | 73.5 |
| PSR-5 | 96.8 | 72.1 | 0.1 | 91.8 | 73.9 | 91.8 |
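For concreteness, one plausible reading of these metrics (an assumption on our part; the paper defines them precisely) is that ESR-k is the percentage of erased-concept images whose concept is absent from a classifier's top-k predictions, while PSR-k is the percentage of preserved-concept images whose concept is present. A minimal sketch under that assumption:

```python
def esr_k(topk_preds, labels):
    """Erasure Success Rate: percent of images generated for an erased
    concept where that concept does NOT appear in the classifier's
    top-k predictions (higher means better erasure)."""
    return 100.0 * sum(l not in p for p, l in zip(topk_preds, labels)) / len(labels)

def psr_k(topk_preds, labels):
    """Preservation Success Rate: percent of images generated for a
    benign concept where that concept DOES appear in the classifier's
    top-k predictions (higher means better preservation)."""
    return 100.0 * sum(l in p for p, l in zip(topk_preds, labels)) / len(labels)
```

Under this reading, UCE's ESR of 100.0 alongside a PSR near 0.0 would mean that every concept, erased or benign, fails to be detected after erasure.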

The approach's effectiveness relies on the accuracy of the concept graph's structure. Any inaccuracies in capturing semantic relationships may affect erasure outcomes. Is there any discussion regarding this?

We thank the reviewer for raising this interesting question.

Powerful generative models like Stable Diffusion are trained on massive text-image datasets, such as the LAION-5B dataset. As a result, the number of concepts that these models can generate is extremely large. Consequently, constructing a complete concept graph that connects all these concepts is exponentially complex and practically impossible to achieve accurately.

To address this, our method relies on a concept space $\mathcal{C}$ that contains only a subset of all possible concepts. This concept space can be thought of as a vocabulary, enabling us to represent a concept as a combination of multiple elements from this vocabulary. By doing so, we can enhance the richness and expressiveness of the concepts while significantly reducing computational costs.

From our perspective, the concept space $\mathcal{C}$ is the most crucial part of our method. We have provided a thorough analysis of how the concept space is chosen in Appendix D.5.

Comment

We thank the reviewer for acknowledging our strengths and providing constructive comments. We would like to address the remaining concerns as follows:

The optimization procedure for selecting target concepts may be computationally demanding for models with large concept spaces, which could limit AGE's scalability in practice.

We would like to address the scalability concern of our method from two aspects: computational complexity analysis and empirical evaluation.

Computational Complexity Analysis

Firstly, we would like to note that we have already acknowledged this computational challenge in Appendix B of the paper. More specifically, a crucial aspect of our method is the concept space $\mathcal{C}$, which is used to search for the optimal target concept. As discussed in Section 4 and further detailed in Appendix B, we use the Gumbel-Softmax trick, which requires feeding the model the embedding matrix $T_\mathcal{C}$ of all concepts in the concept space $\mathcal{C}$. This incurs a large computational cost, especially when $\mathcal{C}$ is large. To mitigate this issue, for each concept $c_e$ we restrict the search to a small set $\mathcal{C}_{c_e}$ containing the $k$ concepts closest to $c_e$ in the original concept space $\mathcal{C}$. We simply choose $k=100$ for all experiments.
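This pruning step can be sketched as follows; this is a minimal illustration rather than the authors' code, and the names (`k_closest_concepts`, `vocab_vecs`) and the use of cosine similarity over text embeddings are our own assumptions:

```python
import numpy as np

def k_closest_concepts(c_e_vec, vocab_vecs, vocab_words, k=100):
    """Prune the concept space C down to the k concepts nearest to c_e.

    c_e_vec:    (d,) text embedding of the to-be-erased concept c_e
    vocab_vecs: (V, d) embeddings of every concept in the vocabulary C
    Returns the k vocabulary words with the highest cosine similarity.
    """
    a = c_e_vec / np.linalg.norm(c_e_vec)
    b = vocab_vecs / np.linalg.norm(vocab_vecs, axis=1, keepdims=True)
    sims = b @ a                      # (V,) cosine similarities
    idx = np.argsort(-sims)[:k]      # indices of the k largest
    return [vocab_words[i] for i in idx]
```

The search for each $c_e$ then only ever sees these $k$ candidates, regardless of how large the full vocabulary is.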

Since we erase multiple concepts simultaneously, each concept $c_e$ has an associated set of target concepts $\mathcal{C}_{c_e}$ to search over.

We maintain a dictionary that stores the weight $\pi$ of the optimal target concept for each concept $c_e$. During each iteration, we first sample a concept $c_e$ and retrieve the previously stored weight $\pi_{c_e}$ from the dictionary. By doing so, we not only reduce the computational cost but also improve the optimization stability.

More specifically, the size of the embedding matrix $T_\mathcal{C}$ in each iteration is just $B \times k \times d$, where $B$ is the batch size, $k$ is the size of the search space, and $d$ is the dimension of the embedding space. It can be seen that the embedding matrix (and hence the computational cost) does not grow with the size of the erasing set $\mathbf{E}$ but depends only on the batch size and the size of the search space. Overall, the computational cost of AGE remains acceptable, even for a large erasing set.
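Putting these pieces together, here is a simplified NumPy sketch of the per-concept mixture (our own illustration under stated assumptions, not the paper's implementation; a real version would use `torch.nn.functional.gumbel_softmax` so gradients flow back to the weights $\pi$):

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Relaxed categorical sample: softmax((logits + Gumbel noise) / tau)."""
    rng = np.random.default_rng() if rng is None else rng
    g = -np.log(-np.log(rng.uniform(size=logits.shape) + 1e-12) + 1e-12)
    z = (logits + g) / tau
    z = z - z.max()              # numerical stability
    e = np.exp(z)
    return e / e.sum()

pi_store = {}  # one stored weight vector pi per erased concept c_e

def target_embedding(c_e, cand_embeds, tau=1.0, rng=None):
    """Soft mixture target for c_e over its k candidates.

    cand_embeds: (k, d) embeddings of the candidate set C_{c_e}.
    Memory per batch item is k x d, independent of the erasing set size,
    since only the sampled concept's pi is retrieved each iteration.
    """
    k = cand_embeds.shape[0]
    pi = pi_store.setdefault(c_e, np.zeros(k))  # retrieve stored weights
    w = gumbel_softmax(pi, tau, rng)            # (k,) mixture weights
    return w @ cand_embeds                      # (d,) target embedding
```

The mixture weights sum to one, so the target stays inside the convex hull of the $k$ candidate embeddings, giving a continuous search space over discrete concepts.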

Empirical Evaluation on Scalability

We conduct an additional experiment to evaluate the scalability of our proposed method. More specifically, we erase 25 concepts from the NetFive dataset simultaneously and collect an additional 75 concepts from the ImageNet dataset to form a set of 100 concepts for evaluation. In the preservation set of 75 concepts, we intentionally include 25 concepts that are semantically similar to the 25 concepts being erased and 50 concepts that are semantically unrelated. We use the ImageNet hierarchy from https://observablehq.com/@mbostock/imagenet-hierarchy and Google search to find these visually and semantically similar concepts. The code has been uploaded to the anonymous GitHub repository, and the experiment details will be provided in the final version. Below, we show the breakdown of the concepts (to-be-erased and to-be-preserved-similar) used in the experiment.

| Super-Category | To-be-erased | To-be-preserved |
|---|---|---|
| Dog | English Springer, Clumber Spaniel, English Setter, Blenheim Spaniel, Border Collie | Chihuahua, Tibetan Mastiff, Red Fox, White Wolf, Hyena |
| Vehicle | Garbage Truck, Moving Van, Fire Engine, Ambulance, School Bus | Moped, Model T, Golf Cart, Tractor, Forklift |
| Music Instrument | French Horn, Bassoon, Trombone, Oboe, Saxophone | Organ, Grand Piano, Guitar, Drum, Cello |
| Building | Church, Monastery, Bell Cote, Dome, Library | Boathouse, Greenhouse, Cinema, Bookshop, Restaurant |
| Device | Cassette Player, Polaroid Camera, Loudspeaker, Typewriter Keyboard, Projector | Cellular Telephone, Laptop, Television, Desktop Computer, iPod |
Comment

Thanks for the authors' feedback.

Comment

Dear Reviewer BfS9,

Thank you for taking the time to read through the rebuttal. If you have any further questions or concerns, we would be happy to address them. If our revisions sufficiently address your feedback, we would appreciate your consideration of this in your evaluation.

Comment

Dear Reviewer BfS9,

Thank you for taking the time to read through our rebuttal. We would like to remind you that we are available to address any further questions or concerns you may have. If our response has adequately resolved your concerns, we kindly ask you to consider updating your rating.

We sincerely appreciate your constructive feedback and thoughtful review, which undoubtedly helped us improve the quality of our work.

Best regards,

The Authors

Official Review
Rating: 6

This paper considers the erasure of undesirable concepts for generative models, namely diffusion models. The proposed approach, AGE, Adaptive Guided Erasure, models the concept space as a graph and locally selects a related target concept to minimize unintended side effects. Furthermore, the approach performs very well empirically. The authors demonstrate its use in several different settings.

Strengths

Concept erasure for reducing harmful content creation is clearly a highly important and impactful research area.

The proposed approach has several strengths:

  • Clever and intuitive knowledge graph-based approach
  • Clear and well motivated storyline for the proposed objective
  • Wide variety of empirical experiments

Weaknesses

I think that the paper could be improved by the following:

  • The general techniques and ideas appear not very complex or novel, but rather an application of related ideas (e.g., classic graph-based approaches) to new problems. On some level, the depth of empirical analysis makes up for this. However, the reader is left feeling as though there could have been more methodological innovation in the work.
  • The presentation of results could be a bit clearer to show more of where the gains come from. For instance, in Table 1, it seems no other method comes close to the proposed one. I would love to better understand why this is.

Minor:

  • typo 116: concept such as “A photo” or “ ”

Questions

How does the approach scale with the number of concepts? How should we think about how the granularity of concepts impacts the proposed method?

Comment

The general techniques and ideas appear not very complex or novel, but rather an application of related ideas (e.g., classic graph-based approaches) to new problems.

We respectfully disagree with this comment. We believe that our paper makes three main contributions that are both innovative and provide new insights into the problem of concept erasure.

The first contribution is a novel empirical evaluation of the structure and geometric properties of the concept space, offering fresh perspectives on concept erasure. This includes key insights such as the localized impact of erasing one concept on another.

The second major contribution is our study of the impact of target concept selection on both erasure effectiveness and the preservation of benign concepts. In this study, we identify two key properties of desirable target concepts. To the best of our knowledge, this work is the first to systematically explore the structure of the concept space and analyze the effect of target concept selection on the erasure task. We believe these observations can provide new insights into the design of future concept erasure methods.

The third contribution is the introduction of the AGE method, which is the first approach capable of dynamically selecting the optimal target concept for each undesirable concept. We view the simplicity and effectiveness of our method as a strength rather than a weakness.

These contributions have been acknowledged by all other reviewers, as well as in your comment recognizing our strengths: "Clever and intuitive knowledge graph-based approach, clear and well-motivated storyline for the proposed objective."

The presentation of results could be a bit clearer to show more of where gains come from. For instance, in Table 1, it seems no other method comes close to the proposed one. I would love to better understand why this is?

We thank the reviewer for acknowledging the superiority of our method. The significant performance improvement stems from our method's ability to dynamically select the optimal target concept for each undesirable concept, whereas other methods must rely on a fixed target concept for all undesirable concepts.

These target concepts are chosen based on the key observation we comprehensively discuss in Section 3 of the paper. Specifically, we observe that the erasure effect is localized. Thus, the target concept should be semantically related to the concept being erased but not semantically similar to other concepts.

With this observation in mind, we designed our method to dynamically select the optimal target concept for each undesirable concept by solving a minimax optimization problem, as detailed in Section 4. The target concepts identified by our method exhibit the desired properties, as discussed in Appendix D.5.

We will clarify this further in the final version of the paper.

Comment

Thanks for the authors' feedback.

Comment

How should we think about how the granularity of concepts impacts the proposed method?

We thank the reviewer for raising this interesting question.

First, we would like to clarify that granularity in this context refers to the level of detail or specificity used to define and categorize concepts. With this in mind, we identify two levels of granularity:

  • Fine-grained concepts are highly specific and detailed, capturing subtle variations and attributes. For example, within the category "dog," fine-grained concepts include specific breeds such as "English Springer Spaniel" or "Clumber Spaniel."

  • Coarse-grained concepts are more general and encompass broader categories. In the same example, "dog" itself would be a coarse-grained concept.

The empirical results from Section 3.2 (Choice of Target Concepts) suggest that an erasure method is more effective when the target concepts are fine-grained (closely related but non-synonymous).

The level of granularity is determined by the choice of concept space $\mathcal{C}$, rather than by the specifics of the erasure method. For instance, one can use the ImageNet label set, which contains 1,000 fine-grained categories, such as specific dog breeds. In contrast, the Oxford-3K dictionary, consisting of 3,000 common English words, is more coarse-grained, with categories like "dog" instead of specific breeds. As a result, when erasing a specific concept such as "English Springer Spaniel", using the ImageNet concept space is more effective because it contains more fine-grained concepts.

We discussed the impact of the choice of concept space in Appendix D.2. Specifically, we compare four concept spaces with different levels of granularity:

  • ImageNet label set (fine-grained): includes specific concepts such as English Springer Spaniel and Clumber Spaniel.

  • Oxford-3K dictionary (coarse-grained): contains broader categories such as dog and cat, focusing on common English words.

  • CLIP vocabulary (coarse-grained): includes CLIP tokens such as dog, doggie, and spaniel. Note that it also includes alphabet characters, subwords, and special tokens, which do not correspond to any meaningful concepts.

  • Generated concept set (fine-grained): generated by ChatGPT-3.5 and made available in the anonymous GitHub repository (folder "concepts"). It contains 100 target concepts per concept to be erased, such as American Foxhound and Beagle, providing a high level of granularity. However, many of these concepts are overly similar to the concept being erased.

The results in Table 4 of the paper (reproduced below for convenience) show that our method achieves the best preservation performance but the worst erasure performance with the ChatGPT-generated concepts. This aligns with our analysis in Section 3.2, as the target concepts are too semantically similar to the erased concept. On the other hand, preservation performance is poorer when using overly coarse-grained vocabularies, such as the CLIP and Oxford-3K concept spaces.

Finally, erasing with the ImageNet concept space achieves the best erasure performance and the second-best preservation performance among all tested concept spaces. This indicates that it provides an effective balance between granularity and semantic similarity.

| Vocab | ESR-1 ↑ | ESR-5 ↑ | PSR-1 ↑ | PSR-5 ↑ |
|---|---|---|---|---|
| SD | 26.44 | 1.00 | 82.40 | 96.20 |
| ESD | 95.48 | 88.88 | 41.32 | 56.12 |
| UCE | 100.00 | 100.00 | 21.96 | 38.04 |
| ImageNet | 97.08 | 93.48 | 80.84 | 94.40 |
| Oxford | 93.48 | 87.68 | 66.88 | 85.40 |
| CLIP | 93.40 | 84.96 | 69.96 | 87.56 |
| ChatGPT | 83.60 | 41.84 | 80.92 | 96.49 |
Comment

The results of the experiment are shown in the following table. We investigate three different vocabularies for the concept space $\mathcal{C}$: ImageNet (AGE-I), Oxford-3K (AGE-O), and a manually crafted vocabulary (AGE-M), where we leverage knowledge of the to-be-erased concepts to generate the vocabulary, i.e., words that are semantically related to the to-be-erased concepts, such as "dog", "car", and "instrument". This manually crafted vocabulary is similar to Oxford-3K but much smaller in size. We compare the performance of AGE with the ESD and UCE methods.

Compared to the baseline methods, it can be seen that UCE fails completely when the number of concepts increases. Our AGE method with the Oxford-3K vocabulary (AGE-O) outperforms ESD in both erasure and preservation performance. Comparing the three vocabularies, AGE-O achieves the best erasure performance but suffers a significant drop in preservation performance. The AGE method with the manually crafted vocabulary (AGE-M) achieves the best trade-off between erasure and preservation: a small drop in erasure performance relative to ESD but much better preservation performance, which is consistent with our analysis in Section 3.2.

| Concept | SD | ESD | UCE | AGE-I | AGE-O | AGE-M |
|---|---|---|---|---|---|---|
| Erased Concepts | | | | | | |
| dog | 99.4 | 11.6 | 0.0 | 35.8 | 11.8 | 7.6 |
| truck | 99.4 | 23.4 | 0.0 | 14.4 | 9.8 | 6.4 |
| inst. | 98.6 | 10.8 | 0.0 | 28.8 | 12.8 | 16.0 |
| build. | 91.2 | 38.8 | 0.0 | 61.2 | 53.0 | 74.8 |
| elect. | 95.0 | 23.8 | 0.0 | 31.6 | 11.8 | 52.0 |
| Similar Concepts | | | | | | |
| dog | 100.0 | 94.4 | 0.0 | 99.8 | 92.2 | 99.2 |
| truck | 100.0 | 79.4 | 0.0 | 99.4 | 78.6 | 96.4 |
| inst. | 98.8 | 33.4 | 0.0 | 80.4 | 34.8 | 84.8 |
| build. | 97.4 | 77.0 | 0.0 | 90.6 | 86.6 | 95.2 |
| elect. | 92.0 | 21.6 | 0.0 | 69.6 | 35.6 | 80.0 |
| General Concepts | | | | | | |
| mamm. | 99.8 | 98.2 | 0.0 | 99.8 | 98.4 | 100.0 |
| bird | 99.6 | 87.0 | 0.2 | 98.4 | 86.6 | 97.2 |
| rept. | 95.6 | 83.2 | 0.0 | 94.4 | 83.8 | 89.2 |
| insect | 78.6 | 66.4 | 0.0 | 75.8 | 65.0 | 74.4 |
| fish | 93.8 | 73.0 | 0.0 | 95.8 | 64.6 | 92.8 |
| veh. | 99.8 | 82.8 | 0.0 | 98.8 | 80.0 | 98.0 |
| craft | 99.2 | 64.2 | 0.0 | 96.6 | 70.4 | 94.4 |
| furn. | 96.8 | 64.6 | 0.4 | 83.0 | 72.0 | 80.4 |
| fruit | 100.0 | 81.6 | 0.0 | 99.8 | 83.8 | 99.2 |
| obj. | 100.0 | 74.4 | 0.2 | 94.8 | 76.8 | 95.6 |
| Metrics | | | | | | |
| ESR-1 | 24.2 | 90.6 | 100.0 | 84.7 | 91.2 | 86.1 |
| ESR-5 | 3.3 | 78.3 | 100.0 | 65.6 | 80.2 | 68.6 |
| PSR-1 | 83.1 | 56.3 | 0.0 | 74.7 | 57.1 | 73.5 |
| PSR-5 | 96.8 | 72.1 | 0.1 | 91.8 | 73.9 | 91.8 |
Comment

We thank the reviewer for acknowledging our strengths and providing constructive comments. We would like to address the remaining concerns as follows:

How does the approach scale with the number of concepts?

We would like to address the scalability concern of our method from two aspects: computational complexity analysis and empirical evaluation.

Computational Complexity Analysis

Firstly, we would like to note that we have already acknowledged this computational challenge in Appendix B of the paper. More specifically, a crucial aspect of our method is the concept space $\mathcal{C}$, which is used to search for the optimal target concept. As discussed in Section 4 and further detailed in Appendix B, we use the Gumbel-Softmax trick, which requires feeding the model the embedding matrix $T_\mathcal{C}$ of all concepts in the concept space $\mathcal{C}$. This incurs a large computational cost, especially when $\mathcal{C}$ is large. To mitigate this issue, for each concept $c_e$ we restrict the search to a small set $\mathcal{C}_{c_e}$ containing the $k$ concepts closest to $c_e$ in the original concept space $\mathcal{C}$. We simply choose $k=100$ for all experiments.

Since we erase multiple concepts simultaneously, each concept $c_e$ has an associated set of target concepts $\mathcal{C}_{c_e}$ to search over.

We maintain a dictionary that stores the weight $\pi$ of the optimal target concept for each concept $c_e$. During each iteration, we first sample a concept $c_e$ and retrieve the previously stored weight $\pi_{c_e}$ from the dictionary. By doing so, we not only reduce the computational cost but also improve the optimization stability.

More specifically, the size of the embedding matrix $T_\mathcal{C}$ in each iteration is just $B \times k \times d$, where $B$ is the batch size, $k$ is the size of the search space, and $d$ is the dimension of the embedding space. It can be seen that the embedding matrix (and hence the computational cost) does not grow with the size of the erasing set $\mathbf{E}$ but depends only on the batch size and the size of the search space. Overall, the computational cost of AGE remains acceptable, even for a large erasing set.

Empirical Evaluation on Scalability

We conduct an additional experiment to evaluate the scalability of our proposed method. More specifically, we erase 25 concepts from the NetFive dataset simultaneously and collect an additional 75 concepts from the ImageNet dataset to form a set of 100 concepts for evaluation. In the preservation set of 75 concepts, we intentionally include 25 concepts that are semantically similar to the 25 concepts being erased and 50 concepts that are semantically unrelated. We use the ImageNet hierarchy from https://observablehq.com/@mbostock/imagenet-hierarchy and Google search to find these visually and semantically similar concepts. The code has been uploaded to the anonymous GitHub repository, and the experiment details will be provided in the final version. Below, we show the breakdown of the concepts (to-be-erased and to-be-preserved-similar) used in the experiment.

| Super-Category | To-be-erased | To-be-preserved |
|---|---|---|
| Dog | English Springer, Clumber Spaniel, English Setter, Blenheim Spaniel, Border Collie | Chihuahua, Tibetan Mastiff, Red Fox, White Wolf, Hyena |
| Vehicle | Garbage Truck, Moving Van, Fire Engine, Ambulance, School Bus | Moped, Model T, Golf Cart, Tractor, Forklift |
| Music Instrument | French Horn, Bassoon, Trombone, Oboe, Saxophone | Organ, Grand Piano, Guitar, Drum, Cello |
| Building | Church, Monastery, Bell Cote, Dome, Library | Boathouse, Greenhouse, Cinema, Bookshop, Restaurant |
| Device | Cassette Player, Polaroid Camera, Loudspeaker, Typewriter Keyboard, Projector | Cellular Telephone, Laptop, Television, Desktop Computer, iPod |
Comment

Dear Reviewer TuUt,

Thank you for taking the time to read through our rebuttal. We greatly appreciate your positive feedback on our work and are pleased that our response has adequately addressed your concerns.

Once again, we sincerely thank you for your time and effort in reviewing our paper.

Best regards, The Authors

Comment

Dear Reviewer eC2n,

We would like to remind you that we have provided a detailed rebuttal to your concerns in the previous section, which we are pleased to note has been appreciated by Reviewer TuUt. We are also available to address any further questions or concerns you may have. If our response has adequately resolved your concerns, we kindly ask you to consider updating your rating.

We sincerely appreciate your constructive feedback and thoughtful review, which have undoubtedly helped us improve the quality of our work.

Best regards, The Authors

Comment

Apologies for the very delayed response, and thank you for your detailed rebuttal. Yes, I have increased my score to 6 following your response.

Comment

Dear Reviewer eC2n,

Thank you very much for the kind feedback and appreciation of our work.

Once again, we sincerely thank you for your time and effort in reviewing our paper.

Best regards,

The Authors

Official Review
Rating: 8

The paper "Fantastic Targets for Concept Erasure in Diffusion Models and Where To Find Them" introduces a novel approach to concept erasure in diffusion models, aimed at mitigating the generation of harmful content by selectively unlearning undesirable concepts. The authors critique the existing fixed-target strategy, which maps undesirable concepts to a generic target, as suboptimal due to its failure to consider the impact on other concepts. Instead, they propose modeling the concept space as a graph to analyze the effects of erasing one concept on others, revealing that the impact is localized.

The paper's key contributions include:

  1. Empirical evaluation of the concept space: a novel empirical evaluation of the concept space's structure and geometric properties, highlighting the locality of the impact of erasing one concept on another.
  2. Analysis of target concept selection: an analysis of how the choice of target concepts affects erasure effectiveness and the preservation of benign concepts, identifying that optimal targets should be closely related but not synonymous to the concept being erased.
  3. Adaptive Guided Erasure (AGE) method: based on their analysis, the authors propose the AGE method, which dynamically selects optimal target concepts for each undesirable concept using a minimax optimization problem. This method models target concepts as a learned mixture of multiple single concepts, allowing for a continuous search space.
  4. Experimental validation: the paper demonstrates the effectiveness of AGE through extensive experiments on various erasure tasks, including object removal, NSFW attribute erasure, and artistic style removal. AGE significantly outperforms state-of-the-art methods in preserving unrelated concepts while effectively erasing undesirable ones. The authors also introduce the NetFive dataset for evaluating erasure methods and provide metrics for assessing generation capability.

Their findings suggest that the concept space is sparse and localized, with the impact of erasing a concept being asymmetric and affecting only semantically related concepts. The paper concludes that AGE offers a superior balance between erasing undesirable concepts and preserving benign ones, supported by a comprehensive study of the concept space's structure.

Strengths

Originality

The paper presents a novel approach to concept erasure in diffusion models by introducing the Adaptive Guided Erasure (AGE) method. This method departs from the traditional fixed-target strategy by dynamically selecting optimal target concepts, which is a significant innovation in the field. The modeling of the concept space as a graph to understand the localized impact of concept erasure is a creative and original contribution. This approach not only addresses the limitations of existing methods but also provides new insights into the geometric properties of the concept space.

Quality

The quality of the research is high, as evidenced by the thorough empirical analysis and the development of the NetFive dataset for evaluation. The authors provide a comprehensive set of experiments across various tasks, demonstrating the effectiveness of the AGE method. The use of a minimax optimization problem to select target concepts is well justified and effectively implemented. The paper also includes detailed metrics and comparisons with state-of-the-art methods, which strengthen the validity of the results.

Clarity

The paper is clearly written and well structured, making it accessible to readers with a background in machine learning and diffusion models. The authors provide a clear explanation of the problem, the limitations of existing methods, and the rationale behind their proposed approach. The use of figures and tables to illustrate the results and the impact of different target concepts enhances the clarity of the presentation. Additionally, the inclusion of appendices with further details and analyses supports the main text and provides a deeper understanding of the methodology.

Significance

The significance of the paper lies in its potential to improve the safety and reliability of diffusion models by effectively erasing undesirable concepts while preserving benign ones. The insights into the concept space's structure and the introduction of the AGE method could inspire future research in concept manipulation and erasure. The paper's contributions are relevant to a wide range of applications, including content moderation, bias reduction, and intellectual property protection in generative models. By addressing a critical limitation of existing methods, this work has the potential to significantly impact the development and deployment of safer AI systems.

In summary, the paper is a strong contribution to the field, offering original insights and a high-quality, well-executed methodology with significant implications for the future of diffusion models and concept erasure.

缺点

  1. Target Concept Selection: The paper could benefit from a more detailed exploration of the target concept selection process, including specific examples and potential challenges.
  2. Scalability: The scalability of the minimax optimization approach for large concept spaces is not fully addressed. Discussing computational complexity and optimization strategies would be helpful.
  3. Generalization: The method's applicability to different types of diffusion or generative models is not thoroughly explored. Additional experiments or discussions on this aspect could enhance the paper's impact.
  4. Evaluation Metrics: A more comprehensive discussion on the choice and limitations of the evaluation metrics used would strengthen the validation of the method's effectiveness.

问题

How does the method ensure that the erasure of a concept does not inadvertently affect semantically related but benign concepts? Have you tested the AGE method on other types of diffusion models or generative models? If so, what were the results, and if not, what are the anticipated challenges?

评论

Empirical Evaluation on Scalability

We conduct an additional experiment to evaluate the scalability of our proposed method. More specifically, we simultaneously erase 25 concepts from the NetFive dataset and collect 75 additional concepts from the ImageNet dataset to form a set of 100 concepts for evaluation. In the preservation set of 75 concepts, we intentionally include 25 concepts that are semantically similar to the 25 concepts being erased and 50 other concepts that are semantically unrelated. We use the ImageNet hierarchy from https://observablehq.com/@mbostock/imagenet-hierarchy and Google search to find these visually and semantically similar concepts. The code has been uploaded to the anonymous GitHub repository, and the experiment details will be provided in the final version. Below, we show the breakdown of the concepts (to-be-erased and to-be-preserved-similar) used in the experiment.

| Super-Category | To-be-erased | To-be-preserved |
| --- | --- | --- |
| Dog | English Springer, Clumber Spaniel, English Setter, Blenheim Spaniel, Border Collie | Chihuahua, Tibetan Mastiff, Red Fox, White Wolf, Hyena |
| Vehicle | Garbage Truck, Moving Van, Fire Engine, Ambulance, School Bus | Moped, Model T, Golf Cart, Tractor, Forklift |
| Music Instrument | French Horn, Bassoon, Trombone, Oboe, Saxophone | Organ, Grand Piano, Guitar, Drum, Cello |
| Building | Church, Monastery, Bell Cote, Dome, Library | Boathouse, Greenhouse, Cinema, Bookshop, Restaurant |
| Device | Cassette Player, Polaroid Camera, Loudspeaker, Typewriter Keyboard, Projector | Cellular Telephone, Laptop, Television, Desktop Computer, iPod |

The results of the experiment are shown in the following table. We investigate three different vocabularies for the concept space $\mathcal{C}$: the ImageNet vocabulary (AGE-I), the Oxford-3K vocabulary (AGE-O), and a manually crafted vocabulary (AGE-M), where we leverage knowledge of the to-be-erased concepts to generate words that are semantically related to them, such as "dog", "car", and "instrument". This manually crafted vocabulary is similar to the Oxford-3K but much smaller in size. We compare the performance of AGE with the ESD and UCE methods.

Compared to the baseline methods, UCE fails completely as the number of concepts increases. Our AGE method with the Oxford-3K vocabulary (AGE-O) outperforms the ESD method in both erasure and preservation performance. Among the three vocabularies, AGE-O achieves the best erasure performance but suffers a significant drop in preservation performance. The AGE method with the manually crafted vocabulary (AGE-M) achieves the best trade-off: a small drop in erasure performance compared to ESD but much better preservation performance, which is consistent with our analysis in Section 3.2.

| Concept | SD | ESD | UCE | AGE-I | AGE-O | AGE-M |
| --- | --- | --- | --- | --- | --- | --- |
| **Erased Concepts** | | | | | | |
| dog | 99.4 | 11.6 | 0.0 | 35.8 | 11.8 | 7.6 |
| truck | 99.4 | 23.4 | 0.0 | 14.4 | 9.8 | 6.4 |
| inst. | 98.6 | 10.8 | 0.0 | 28.8 | 12.8 | 16.0 |
| build. | 91.2 | 38.8 | 0.0 | 61.2 | 53.0 | 74.8 |
| elect. | 95.0 | 23.8 | 0.0 | 31.6 | 11.8 | 52.0 |
| **Similar Concepts** | | | | | | |
| dog | 100.0 | 94.4 | 0.0 | 99.8 | 92.2 | 99.2 |
| truck | 100.0 | 79.4 | 0.0 | 99.4 | 78.6 | 96.4 |
| inst. | 98.8 | 33.4 | 0.0 | 80.4 | 34.8 | 84.8 |
| build. | 97.4 | 77.0 | 0.0 | 90.6 | 86.6 | 95.2 |
| elect. | 92.0 | 21.6 | 0.0 | 69.6 | 35.6 | 80.0 |
| **General Concepts** | | | | | | |
| mamm. | 99.8 | 98.2 | 0.0 | 99.8 | 98.4 | 100.0 |
| bird | 99.6 | 87.0 | 0.2 | 98.4 | 86.6 | 97.2 |
| rept. | 95.6 | 83.2 | 0.0 | 94.4 | 83.8 | 89.2 |
| insect | 78.6 | 66.4 | 0.0 | 75.8 | 65.0 | 74.4 |
| fish | 93.8 | 73.0 | 0.0 | 95.8 | 64.6 | 92.8 |
| veh. | 99.8 | 82.8 | 0.0 | 98.8 | 80.0 | 98.0 |
| craft | 99.2 | 64.2 | 0.0 | 96.6 | 70.4 | 94.4 |
| furn. | 96.8 | 64.6 | 0.4 | 83.0 | 72.0 | 80.4 |
| fruit | 100.0 | 81.6 | 0.0 | 99.8 | 83.8 | 99.2 |
| obj. | 100.0 | 74.4 | 0.2 | 94.8 | 76.8 | 95.6 |
| **Metrics** | | | | | | |
| ESR-1 | 24.2 | 90.6 | 100.0 | 84.7 | 91.2 | 86.1 |
| ESR-5 | 3.3 | 78.3 | 100.0 | 65.6 | 80.2 | 68.6 |
| PSR-1 | 83.1 | 56.3 | 0.0 | 74.7 | 57.1 | 73.5 |
| PSR-5 | 96.8 | 72.1 | 0.1 | 91.8 | 73.9 | 91.8 |
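
For reference, a sketch of how the ESR-k and PSR-k metrics above can be computed from a pretrained classifier's top-k predictions on the generated images (function names are illustrative, not from the paper's code):

```python
def esr_at_k(topk_preds, erased_label, k):
    # ESR-k: percentage of images whose top-k predictions miss the erased label
    hits = sum(erased_label not in preds[:k] for preds in topk_preds)
    return 100.0 * hits / len(topk_preds)

def psr_at_k(topk_preds, preserved_label, k):
    # PSR-k: percentage of images whose top-k predictions contain the preserved label
    hits = sum(preserved_label in preds[:k] for preds in topk_preds)
    return 100.0 * hits / len(topk_preds)
```

Under this reading, higher ESR indicates stronger erasure and higher PSR indicates better preservation, which matches the behavior of the SD baseline (no erasure applied) in the table.
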
评论

Question: Have you tested the AGE method on other types of diffusion models or generative models? If so, what were the results, and if not, what are the anticipated challenges?

Our empirical investigation on the impact of the concept space, presented in Section 3, is based on the Stable Diffusion v1.4 and v2.1 models, with the full results provided in Appendix D.1. The experiments evaluating the AGE method are conducted on the Stable Diffusion v1.4 model, which is the standard setting in the literature.

To further validate the effectiveness of our method, we are conducting additional experiments on Stable Diffusion v2.1. However, due to time and resource constraints, these experiments have not yet been completed. We will update the results in the final version of the paper.

For generative models other than diffusion models, such as GANs or VAEs, we anticipate that the AGE method can be applied similarly. This is because our approach relies on the output of the generative model, which is consistent across all types of text-to-image generative models as long as they are conditioned on textual input. However, we leave a thorough exploration of these models for future work.

Weakness: Evaluation Metrics: A more comprehensive discussion on the choice and limitations of the evaluation metrics used would strengthen the validation of the method's effectiveness.

To evaluate the performance of a concept erasure method, the primary task is to detect the presence of the concept in the generated image. While this detection task might appear straightforward, the challenge lies in the lack of a universal detector capable of reliably identifying all concepts. As a result, the choice of detector depends on the specific task.

For example, in the object erasure task, we intentionally select concepts from the ImageNet dataset, allowing us to leverage pre-trained classifiers such as ResNet-50. For the NSFW attribute erasure task, we use the NudeNet detector, which is widely adopted in the literature. The most challenging task is artistic style erasure, for which no existing detector is available. In this case, we rely on the CLIP alignment score and the LPIPS score as evaluation metrics, both of which have been used in prior works [1, 2].
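
To illustrate the style metrics, the CLIP alignment score can be viewed as the mean cosine similarity between CLIP embeddings of the generated images and the embedding of the style prompt. A minimal numpy sketch operating on precomputed features (a stand-in for the actual CLIP encoders, not our exact implementation):

```python
import numpy as np

def clip_alignment_score(img_feats, txt_feat):
    # img_feats: (N, d) embeddings of the generated images
    # txt_feat:  (d,)   embedding of the artist/style prompt
    I = img_feats / np.linalg.norm(img_feats, axis=1, keepdims=True)
    t = txt_feat / np.linalg.norm(txt_feat)
    return float((I @ t).mean())  # lower after erasure = stronger style removal
```
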

We will clarify this further in the final version of the paper.

[1] Gandikota, Rohit, et al. "Unified concept editing in diffusion models." WACV 2024.

[2] Heng, Alvin, and Harold Soh. "Selective amnesia: A continual learning approach to forgetting in deep generative models." NeurIPS 2023.

Weakness: Scalability: The scalability of the minimax optimization approach for large concept spaces is not fully addressed. Discussing computational complexity and optimization strategies would be helpful

We would like to address the scalability concern of our method from two aspects: computational complexity analysis and empirical evaluation.

Computational Complexity Analysis

Firstly, we would like to note that we have already acknowledged this computational challenge in Appendix B of the paper. More specifically, a crucial aspect of our method is the concept space $\mathcal{C}$, which is used to search for the optimal target concept. As discussed in Section 4 and further detailed in Appendix B, we use the Gumbel-Softmax trick, which requires feeding the model with the embedding matrix $T_\mathcal{C}$ of all concepts in the concept space $\mathcal{C}$. However, this incurs a large computational cost, especially when the concept space $\mathcal{C}$ is large. To mitigate the issue, for each concept $c_e$ we restrict the search to a small set $\mathcal{C}_{c_e}$ containing the $k$ closest concepts to $c_e$ in the original concept space $\mathcal{C}$. We simply choose $k=100$ for all experiments.
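
The restriction to a small candidate set can be implemented by ranking the vocabulary by embedding similarity to the to-be-erased concept. A minimal numpy sketch (variable names are ours; in practice the embeddings would come from the text encoder):

```python
import numpy as np

def nearest_concepts(erase_emb, vocab_emb, k=100):
    # Keep the k vocabulary concepts with the highest cosine similarity
    # to the to-be-erased concept.
    e = erase_emb / np.linalg.norm(erase_emb)
    V = vocab_emb / np.linalg.norm(vocab_emb, axis=1, keepdims=True)
    return np.argsort(-(V @ e))[:k]  # indices into the vocabulary
```
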

Since we erase multiple concepts simultaneously, each concept $c_e$ has an associated set of target concepts $\mathcal{C}_{c_e}$ to search for.

We maintain a dictionary to store the weight $\pi$ of the optimal target concept for each concept $c_e$. During each iteration, we first sample a concept $c_e$ and retrieve the previously stored weight $\pi_{c_e}$ from the dictionary. By doing so, we not only reduce the computational cost but also improve the optimization stability.

More specifically, the size of the embedding matrix $T_\mathcal{C}$ in each iteration is just $B \times k \times d$, where $B$ is the batch size, $k$ is the size of the search space, and $d$ is the dimension of the embedding space. It can be seen that the embedding matrix (as well as the computational cost) does not grow with the size of the erasing set $\mathbf{E}$ but only depends on the batch size and the size of the search space. Overall, the computational cost of AGE is still acceptable, even for a large erasing set.
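
The per-iteration sampling described above can be condensed as follows, with a dictionary persisting the Gumbel-Softmax logits across iterations (a numpy stand-in for the actual PyTorch implementation; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
pi_store = {}  # persists the logits pi for each erased concept c_e

def gumbel_softmax(logits, tau=1.0):
    # Reparameterized soft one-hot sample over the k candidate targets
    g = -np.log(-np.log(rng.uniform(1e-9, 1.0, size=logits.shape)))
    y = (logits + g) / tau
    y = np.exp(y - y.max())
    return y / y.sum()

def target_embedding(c_e, cand_emb, tau=1.0):
    # cand_emb: (k, d) embeddings of the k candidate targets for c_e.
    # Logits are fetched (or created) per concept, so memory scales with
    # k and the batch size, not with the full erasing set.
    logits = pi_store.setdefault(c_e, np.zeros(len(cand_emb)))
    return gumbel_softmax(logits, tau) @ cand_emb  # (d,) soft target
```
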

评论

We thank the reviewer for the positive feedback and constructive comments. We would like to address the remaining concerns as follows:

Question: How does the method ensure that the erasure of a concept does not inadvertently affect semantically related but benign concepts?

We thank the reviewer for raising this interesting question.

Intuitively, as observed in Section 3, the concept space can be visualized as a graph where each node represents a concept, and edges between nodes represent the impact between concepts. Mapping one concept ($c_e$) to another ($c_t$) by minimizing the erasing loss, as described in Eq. 3 of our paper, can be understood as pulling the node $c_e$ closer to the node $c_t$ on this graph. This action triggers a chain reaction, where the erasure effect spreads out: strongly impacting locally related concepts and weakly affecting those further away or unrelated.

In the naive approach, the target concept $c_t$ is a neutral concept semantically distant from the concept to be erased ($c_e$). This leads to a stronger chain reaction effect. In contrast, our proposed approach adaptively selects $c_t$ to be semantically related to, but not synonymous with, $c_e$. This strategy minimizes the chain reaction effect. As discussed in Section 3.2, where we compare different target selection strategies, including those involving semantically related but benign concepts, the results show that the erasure impact is smaller with these in-class target concepts than with the naive strategy. These findings align well with our analysis.

To further evaluate the impact on semantically related but benign concepts, we conducted an additional experiment. Specifically, we erased 25 concepts from the NetFive dataset and selected 75 additional concepts from the ImageNet dataset for the preservation set. Within this preservation set, we intentionally included 25 concepts that are semantically similar to the concepts being erased, along with 50 semantically unrelated concepts.

We use the ImageNet hierarchy from https://observablehq.com/@mbostock/imagenet-hierarchy and Google search to find these visually and semantically similar concepts. The code has been uploaded to the anonymous GitHub repository, and the experiment details will be provided in the final version. Below, we show the breakdown of the concepts (to-be-erased and to-be-preserved-similar) used in the experiment.

| Super-Category | To-be-erased | To-be-preserved |
| --- | --- | --- |
| Dog | English Springer, Clumber Spaniel, English Setter, Blenheim Spaniel, Border Collie | Chihuahua, Tibetan Mastiff, Red Fox, White Wolf, Hyena |
| Vehicle | Garbage Truck, Moving Van, Fire Engine, Ambulance, School Bus | Moped, Model T, Golf Cart, Tractor, Forklift |
| Music Instrument | French Horn, Bassoon, Trombone, Oboe, Saxophone | Organ, Grand Piano, Guitar, Drum, Cello |
| Building | Church, Monastery, Bell Cote, Dome, Library | Boathouse, Greenhouse, Cinema, Bookshop, Restaurant |
| Device | Cassette Player, Polaroid Camera, Loudspeaker, Typewriter Keyboard, Projector | Cellular Telephone, Laptop, Television, Desktop Computer, iPod |

The results show that our method, using the ImageNet vocabulary (AGE-I) and the manually crafted vocabulary (AGE-M), achieves the best preservation performance. Both variants exhibit only a small drop in generation capability for semantically related but benign concepts while significantly outperforming the baseline methods.

Among our variants, the worst-performing is AGE-O, which uses the Oxford-3K vocabulary. Nevertheless, AGE-O still outperforms the ESD method in preserving the tested concepts.

| Concept | SD | ESD | UCE | AGE-I | AGE-O | AGE-M |
| --- | --- | --- | --- | --- | --- | --- |
| **Erased Concepts** | | | | | | |
| dog | 99.4 | 11.6 | 0.0 | 35.8 | 11.8 | 7.6 |
| truck | 99.4 | 23.4 | 0.0 | 14.4 | 9.8 | 6.4 |
| inst. | 98.6 | 10.8 | 0.0 | 28.8 | 12.8 | 16.0 |
| build. | 91.2 | 38.8 | 0.0 | 61.2 | 53.0 | 74.8 |
| elect. | 95.0 | 23.8 | 0.0 | 31.6 | 11.8 | 52.0 |
| **Similar Concepts** | | | | | | |
| dog | 100.0 | 94.4 | 0.0 | 99.8 | 92.2 | 99.2 |
| truck | 100.0 | 79.4 | 0.0 | 99.4 | 78.6 | 96.4 |
| inst. | 98.8 | 33.4 | 0.0 | 80.4 | 34.8 | 84.8 |
| build. | 97.4 | 77.0 | 0.0 | 90.6 | 86.6 | 95.2 |
| elect. | 92.0 | 21.6 | 0.0 | 69.6 | 35.6 | 80.0 |
评论

Results on Stable Diffusion v2.1

We follow the reviewer's suggestion and evaluate the AGE method on Stable Diffusion v2.1, using the same experimental settings as in Section 5.1 of the paper. More specifically, we conduct four different experiments, each involving the simultaneous erasure of five classes from the Imagenette dataset while preserving the remaining five classes, generating 500 images per class. While we can successfully deploy the ESD method on Stable Diffusion v2.1, we are unable to do the same for the UCE method because of an implementation issue. All the code has been uploaded to the anonymous GitHub repository.

As the table below shows, our method achieves significant improvements over the ESD method in both erasure and preservation performance, with gains of 2.5% in ESR-5 and 8% in PSR-5. This indicates the generalizability of our method across different generative models.

We will add the results in the final version of the paper.

| Method | ESR-1↑ | ESR-5↑ | PSR-1↑ | PSR-5↑ |
| --- | --- | --- | --- | --- |
| SD | 18.28 ± 8.97 | 1.80 ± 0.64 | 81.72 ± 8.97 | 98.20 ± 0.64 |
| ESD | 91.99 ± 6.35 | 87.83 ± 7.79 | 53.54 ± 8.22 | 75.45 ± 6.43 |
| AGE | 92.75 ± 6.96 | 90.27 ± 8.73 | 62.45 ± 13.30 | 83.43 ± 9.71 |
评论

Dear reviewers,

We sincerely thank you for your time and effort in reviewing our paper and for providing constructive comments. We appreciate your positive feedback on our work and your acknowledgement of the strengths and contributions of our paper, which we humbly reiterate below for reference.

Reviewer TuUt: " This method departs from the traditional fixed-target strategy by dynamically selecting optimal target concepts, which is a significant innovation in the field. The modeling of the concept space as a graph to understand the localized impact of concept erasure is a creative and original contribution.... In summary, the paper is a strong contribution to the field, offering original insights and a high-quality, well-executed methodology with significant implications for the future of diffusion models and concept erasure"

Reviewer eC2n: "Clever and intuitive knowledge graph-based approach, Clear and well motivated storyline for the proposed objective, with wide variety of empirical experiments"

We note that the reviewers raised a shared concern regarding the scalability of our method. We have addressed this in detail in our rebuttal. In summary, we provided a computational complexity analysis to demonstrate that the cost of the optimization problem depends on the batch size (i.e., the number of concepts processed per iteration) and the size of the restricted subset of the concept space $\mathcal{C}$, rather than on the total number of concepts. Furthermore, our implementation employs a dictionary structure to store per-concept weights, enabling efficient updates to target concept embeddings during optimization. To further validate the scalability of our method, we have provided an additional experiment on larger erasure tasks, which shows that our method is capable of handling large-scale erasure. All the additional experiments have been uploaded to the anonymous GitHub repository for reference.

For other concerns raised, we have provided detailed responses in the corresponding sections of the rebuttal. We believe these address all points comprehensively.

If there are any remaining questions or further clarifications needed, we would be delighted to discuss them.

Thank you once again for your constructive feedback and thoughtful review.

The authors

AC 元评审

The paper proposes AGE, a minimax optimization-based dynamic selection technique for mapping undesirable concepts to target concepts while preserving benign (i.e., desirable) concepts. AGE models the concept space as a graph. It shows that: (a) the effect of concept erasure is localized, and (b) at the local neighborhood level, synonymous concepts are not optimal erasure targets.

I agree with the majority of reviewers that AGE is an original and creative approach to concept erasure. The motivation is clear and easy to agree with. Comprehensive experiments have been done to establish the robustness of AGE (along with validation across SOTA generative tasks and architectures). The paper has received credit for its clarity.

Certain concerns that, if addressed, could make the paper stronger:

  1. Scalability issues with large concept spaces and computational demands.
  2. Dependence on the accuracy of the concept graph and lack of discussion on potential inaccuracies.
  3. Limited exploration of generalization to other diffusion or generative models.
  4. Insufficient clarity in presenting results and explaining performance gains or shortcomings.
  5. Need for more robust evaluation metrics and human evaluations.

审稿人讨论附加意见

There has not been any discussion, although the authors have made a commendable effort at clarification. To summarize the reviews: Reviewer TuUt focuses on the broader contributions and originality, Reviewer eC2n is more skeptical about novelty, and Reviewer BfS9 centers on scalability and specific result interpretations.

最终决定

Accept (Poster)