SPEED: Scalable, Precise, and Efficient Concept Erasure for Diffusion Models
Abstract
Reviews and Discussion
This paper presents an efficient concept erasure method that edits model parameters. The proposed method comprises the following three techniques: 1) influence-based prior filtering (IPF), which retains only influential non-target concepts, 2) directed prior augmentation (DPA), which enriches the filtered retain set with variations in a semantically consistent way, and 3) invariant equality constraints (IEC), which preserve key invariants. Experiments on three concept erasure tasks (few-concept, multi-concept, and implicit concept erasure) demonstrate that the proposed method outperforms the previous concept erasure methods, such as UCE or RECE.
Strengths and Weaknesses
Strengths
- The paper is well written. I could understand their motivation and the core idea of the proposed method.
- The proposed method is based on a deep insight into the behaviors of text-to-image diffusion models.
- The experimental results are good across several erasure tasks.
- Several ablation studies (including the ones in the supplementary material) are conducted, demonstrating that each introduced technique boosts performance.
Weaknesses
- The following things are not well-explained.
- [1-a] Equation 6 and Line 179: The mean value is used as a threshold, but it is not supported by a theoretical explanation. There should be several options as follows. An explanation or an ablation study would be desired.
- The geometric mean value instead of the arithmetic mean value. The geometric mean is more suitable in some cases where we deal with norms.
- The median value instead of the mean value. The median may be robust against outliers.
- A threshold of the form (mean + scale × standard deviation), where the scale is a parameter to be scanned over several values.
- [1-b] Equation 8: A standard normal distribution is used, but its variance can be changed. A reason for the choice or the following ablation study would be desired.
- A normal distribution with a tunable variance, where the variance is a parameter to be scanned.
- Some recent papers (SAFREE, AdaVD, etc.) demonstrate that their method works with generative models other than Stable Diffusion v1.4. Showing that the proposed method works well with other models will make the paper more valuable.
Minor comments
- This paper lacks some citations. The following papers should be introduced and compared (in the supplementary material, if it is not very new). See the "Contemporaneous Work" section of the Call for Papers.
- Training-based methods: MACE is one of the state-of-the-art methods in this category, but the following papers perform well.
- Huang et al., "Receler: Reliable Concept Erasing of Text-to-Image Diffusion Models via Lightweight Erasers", https://arxiv.org/abs/2311.17717
- Zhang et al., "Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models", https://arxiv.org/abs/2405.15234
- Wang et al., "ACE: Anti-Editing Concept Erasure in Text-to-Image Models", https://arxiv.org/abs/2501.01633
- Bui et al., "Fantastic Targets for Concept Erasure in Diffusion Models and Where To Find Them", https://arxiv.org/abs/2501.18950
- Training-free methods: Some methods in this category work better than SLD, although they might not perform better than SPEED. However, there are pros and cons among training-based, editing-based, and training-free methods, and these should be discussed.
- Yoon et al., "SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation", https://arxiv.org/abs/2410.12761
- Wang et al., "Precise, Fast, and Low-cost Concept Erasure in Value Space: Orthogonal Complement Matters", https://arxiv.org/abs/2412.06143
- Jain et al., "TraSCE: Trajectory Steering for Concept Erasure", https://arxiv.org/abs/2412.07658
- Lee et al., "Localized Concept Erasure for Text-to-Image Diffusion Models Using Training-Free Gated Low-Rank Adaptation", https://arxiv.org/abs/2503.12356
Questions
The topic is important, and the quantitative and qualitative results are good. However, I have several concerns about the paper, which I provided in "Weaknesses". I would appreciate it if the authors could address them. If their response is convincing, I will raise my rating.
Limitations
yes
A Limitations section is included in the supplementary material.
Final Justification
The authors and reviewers had discussions, and the authors addressed almost all concerns raised by the reviewers, including mine. I pointed out that the paper should include ablation studies on the threshold design, and the authors did them during the rebuttal/discussion period. My concern is resolved. I recommend this paper.
Formatting Issues
none
W1: Equation 6 and Line 179: The mean value is used as a threshold, but it is not supported by a theoretical explanation. There should be several options as follows. An explanation or an ablation study would be desired.
We thank the reviewer for the insightful suggestion. We agree that comparing different strategies for determining the threshold in Eq. (6) is important and helps justify our design choice in IPF. Thus we conducted an ablation study comparing four variants: (1) the arithmetic mean (our default), (2) the geometric mean, (3) the median value, and (4) the mean plus a scaled standard deviation, with the scale coefficient scanned over several values.
| Method | Arithmetic Mean (Ours) | Geometric Mean | Median Value | Mean + scaled std (scale increasing left to right) | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| CS ↓ (Erase) | 26.29 | 26.51 | 26.41 | 26.34 | 25.93 | 25.66 | 26.51 | 26.95 | 27.12 |
| FID ↓ (Retain) | 29.35 | 31.71 | 32.57 | 31.87 | 35.98 | 35.67 | 30.70 | 42.33 | 47.10 |
First, we observe that the performance of the arithmetic mean, geometric mean, median, and the scaled-std threshold (at small scales) is relatively similar. This is because these strategies produce similar threshold values in practice. To further investigate this, we examine the prior shift distribution across the whole retain set and find that the distribution is relatively uniform without significant outliers. We attribute this to two reasons: (1) The CLIP feature space is relatively dense, and text embeddings rarely produce true outliers. (2) The retain set we used usually includes common, high-frequency concepts whose embeddings are more evenly distributed. Therefore, our choice of the arithmetic mean is motivated by its simplicity, being the most basic form without introducing extra hyperparameters.
Second, we find that the scaled-std formulation shows significant performance variation when the scale coefficient becomes large. This is because large scale values greatly affect the threshold, which can either overly enlarge or overly shrink the retain set. As further discussed in our responses to reviewers WTi1, 1DqN, and zWMb (shown below), we perform additional ablations on the retain set size and find that both excessively small and overly large retain sets hurt performance.
| Retain set ratio | 100% | 96% | 77% | 46% (Ours) | 20% | 9% |
|---|---|---|---|---|---|---|
| CS ↓ (Erase) | 27.16 | 27.09 | 26.90 | 26.29 | 25.90 | 25.73 |
| FID ↓ (Retain) | 48.29 | 45.56 | 44.38 | 29.35 | 34.61 | 37.29 |
In summary, although our current threshold is empirically selected, the extensive evaluations confirm that it consistently achieves a strong trade-off between erasure and preservation quality.
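For concreteness, here is a minimal sketch (not from the paper) of how the four candidate thresholds could be computed from per-concept influence scores; the array name `shift_norms` and the `scale` parameter are illustrative assumptions.

```python
import numpy as np

def ipf_thresholds(shift_norms: np.ndarray, scale: float = 1.0) -> dict:
    """Candidate thresholds over per-concept prior-shift magnitudes.

    `shift_norms` is assumed to hold one non-negative influence score per retain
    concept; concepts whose score exceeds the chosen threshold are kept by IPF.
    """
    return {
        "arithmetic_mean": float(shift_norms.mean()),
        "geometric_mean": float(np.exp(np.log(shift_norms + 1e-12).mean())),
        "median": float(np.median(shift_norms)),
        "mean_plus_scaled_std": float(shift_norms.mean() + scale * shift_norms.std()),
    }

# A fairly uniform score distribution makes the variants nearly agree,
# consistent with the observation in the response above.
scores = np.random.default_rng(0).uniform(0.5, 1.5, size=1000)
print(ipf_thresholds(scores, scale=1.0))
```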
W2: Equation 8: A standard normal distribution is used, but its variance can be changed. A reason for the choice or the following ablation study would be desired.
Thanks for your suggestions. Based on your comment, we conduct an ablation study on the scaling coefficient of the standard deviation in the Gaussian noise used in Eq. (8).
From the results, we observe that changing the noise scale has limited impact on erasure performance (CS remains stable). However, prior preservation (FID) is sensitive to overly large or small noise scales. Very small noise (e.g., a scale of 0.1) lacks sufficient diversity, while large noise (e.g., a scale of 10.0) is more likely to introduce noisy and semantically inconsistent concepts. Our default choice (a scale of 1.0) achieves the best overall balance. We will include this ablation in the revised paper.
| Noise scale | 0.1 | 0.5 | 1.0 (Ours) | 2.0 | 3.0 | 4.0 | 5.0 | 10.0 |
|---|---|---|---|---|---|---|---|---|
| CS ↓ (Erase) | 26.38 | 26.38 | 26.29 | 26.41 | 26.50 | 26.51 | 26.45 | 26.49 |
| FID ↓ (Retain) | 30.84 | 30.08 | 29.35 | 32.30 | 32.32 | 32.99 | 33.22 | 35.67 |
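To make the ablated quantity concrete, below is a rough sketch of a DPA-style augmentation in which Gaussian noise, scaled by the coefficient swept in the table above, is projected onto the least-variant right-singular directions of a weight matrix. Eq. (8) itself is not reproduced in this thread, so the function name, the projection choice, and the default rank are assumptions.

```python
import numpy as np

def directed_augment(retain_emb, W, noise_scale=1.0, rank=8, n_aug=4, seed=0):
    """Augment retain embeddings along low-variance singular directions of W.

    Assumed stand-in for Eq. (8): noise is drawn from N(0, noise_scale^2) and
    projected onto the `rank` right-singular vectors of W with the smallest
    singular values, so the perturbations stay in directions the layer is
    least sensitive to.
    """
    rng = np.random.default_rng(seed)
    _, _, Vt = np.linalg.svd(W, full_matrices=False)
    V_low = Vt[-rank:]                              # (rank, d) least-variant directions
    augmented = []
    for c in retain_emb:                            # c: (d,) one retain embedding
        z = rng.normal(0.0, noise_scale, size=(n_aug, rank))
        augmented.append(c + z @ V_low)             # (n_aug, d) directed variants
    return np.concatenate(augmented, axis=0)

# Toy usage: d=16 features, a 32x16 projection weight, 5 retain embeddings.
W = np.random.default_rng(1).normal(size=(32, 16))
retain = np.random.default_rng(2).normal(size=(5, 16))
print(directed_augment(retain, W, noise_scale=1.0).shape)   # (20, 16)
```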
W3: Some recent papers (SAFREE, AdaVD, etc.) demonstrate that their method works with generative models other than Stable Diffusion v1.4. Showing that the proposed method works well with other models will make the paper more valuable.
Thanks for your advice. We agree that demonstrating the effectiveness of our method on other generative models is important for showcasing its generality. In the main paper, we have already provided qualitative results on a range of diffusion models, including DreamShaper, RealisticVision, SDXL, and even SD3 with a different DiT architecture (see Fig. 6). We will include more quantitative and qualitative results in the revised paper.
W4: This paper lacks some citations. The following papers should be introduced and compared (in the supplementary material, if it is not very new).
We thank the reviewer for the helpful suggestion. We agree that a more comprehensive discussion of related work is necessary for better positioning SPEED within the broader concept erasure literature.
Our main paper focuses on methods that directly modify model parameters in an editing-based manner for concept erasure, as this paradigm is more practical in white-box deployments (e.g., Stable Diffusion). Other methods based on external modules (e.g., gating modules) or sampling-based optimization (e.g., attention swapping) can be easily bypassed in this white-box setting, which is why they receive less focus in our main paper.
In response, we categorize current concept erasure methods into three types: (1) Training-based methods (using gradient descent to optimize a training objective), (2) Editing-based methods (deriving parameter update with a closed-form solution), and (3) Sampling-based methods (intervening in the diffusion sampling process to suppress the generation of target semantics) as follows:
- Training-based methods: Beyond those mentioned in the main paper, several other approaches adopt different training strategies and objectives to achieve effective concept erasure. For instance, CPE introduces an additional non-linear module acting as a classifier within each cross-attention layer to filter semantics aligned with the target concept. Similarly, Receler adds a plug-in module after each cross-attention layer to erase the target concept, and adopts adversarial training to enhance robustness. However, both methods are fragile in white-box attack scenarios, where attackers have full access to model parameters. Since all parameters remain unchanged, it is easy to bypass the auxiliary modules. In contrast, methods such as RACE and AdvUnlearn fine-tune the parameters directly and incorporate adversarial training to enhance robustness. RACE treats the diffusion model itself as a classifier to generate adversarial prompts, while AdvUnlearn introduces a fast attack generation method to identify such prompts effectively. ACE proposes to inject the erasure guidance into both the conditional and unconditional noise predictions, enabling the model to effectively prevent the generation of erased concepts during both editing and generation. AGE dynamically selects optimal target concepts tailored to each undesirable concept, minimizing unintended side effects.
However, most training-based methods primarily focus on single-concept erasure and involve long training time (e.g., typically over 1,000× slower than our method in erasing a single concept), often requiring hundreds to thousands of iterations per target concept. This significantly limits their scalability and practicality in real-world applications, especially when erasing a large number of concepts within tight time constraints.
- Editing-based methods mainly include TIME, UCE, and RECE and have been thoroughly discussed in our main paper. A contemporaneous work published in March 2025, GLoCE, injects a lightweight module into the diffusion model with low-rank matrices and a simple gate, determined only by several generation steps for each concept with closed-form solutions (more detailed comparisons with GLoCE can be found in the response to Reviewer jEfu, W3). However, GLoCE requires approximately 11,500 seconds to erase 100 concepts, 2,300× slower than our method. Moreover, GLoCE computes a separate gate module for each target concept, which can be easily bypassed in open-source white-box scenarios, as the core model parameters remain unchanged.
- Sampling-based methods directly intervene in the diffusion sampling process during image generation to avoid generating target concept semantics. SLD modifies classifier-free guidance to steer generation away from the undesired concept. AdaVD and SAFREE leverage orthogonal projection techniques to eliminate the semantic influence of the target concept during inference. SAFREE performs orthogonal decomposition directly on the text embedding, which may lead to over-erasure and degradation of prior knowledge. Differently, AdaVD applies the decomposition in the value space of each cross-attention layer in the U-Net, and further introduces an adaptive shift mechanism to balance concept erasure and prior preservation. TraSCE guides the diffusion trajectory away from generating harmful content using a specific formulation of negative prompting.
However, although sampling-based methods demonstrate promising performance without parameter modification, they still remain vulnerable to white-box attacks, where full model parameters can be exploited to bypass the erasure mechanism.
We will summarize this discussion in the revised version and provide a more detailed analysis, including full citations and comparisons, in the supplementary material to clarify the strengths and limitations of each paradigm.
Thank you very much for your effort and thorough responses. I can easily imagine that conducting additional experiments was very tough. I really appreciate it.
I have read all the review comments and the authors' responses. I think that the authors have addressed the concerns of the reviewers, including mine, well.
The additional experiments on the retain set size and the related hyperparameters are comprehensive. My concern about them has been resolved. Regarding W3 (experiments on other generative models), it was careless of me. Figure 6 and its corresponding explanation in the main body describe the applicability of the proposed method clearly.
I am willing to raise my rate. And, let me observe discussions between the authors and the other reviewers. Thank you.
Thanks for your positive feedback! We're pleased that our rebuttal addressed your concerns. We sincerely appreciate your kind words and are glad to hear that the results on the retain set size and the related hyperparameters were helpful in clarifying our approach.
We also thank you for taking the time to read all the reviewers' comments and recognizing our efforts in addressing them. Your thoughtful evaluation and support are very encouraging to us.
Thank you again for your valuable review.
Dear Reviewer Rrzc,
Thank you again for your thoughtful comments and your willingness to raise the rating during the discussion phase.
We wanted to kindly follow up as the discussion period is approaching its end. We have engaged in discussions with the other reviewers (WTi1, 1DqN, and zWMb), and they have expressed that our rebuttal successfully addressed their concerns and showed support for recommending our submission towards acceptance.
Given your positive comments and acknowledgment that your main concerns have been resolved, we would greatly appreciate it if you could kindly raise your score in the final justification, should you find it appropriate.
Thank you again for your time and contributions to the review process.
Best regards,
Authors of Paper 14342
This paper proposes SPEED, a method for scalable, precise, and efficient concept erasure in T2I diffusion models. By applying null-space constrained model editing, combined with Prior Knowledge Refinement (including Influence-based Prior Filtering, Directed Prior Augmentation, and Invariant Equality Constraints), SPEED effectively removes target concepts while preserving non-target semantics. The method achieves strong results on few-concept, multi-concept (up to 100 concepts), and implicit concept erasure tasks, with significant efficiency gains over prior approaches.
Strengths and Weaknesses
Strengths
- The authors propose three novel and reasonable strategies, IPF, DPA, and IEC, to construct a more accurate null space, achieving better prior preservation in challenging multi-concept erasure scenarios.
- Comprehensive experiments demonstrate the effectiveness of the proposed method in both few-concept and multi-concept erasure tasks, achieving SOTA erasure performance and enabling the erasure of 100 concepts within 5 seconds.
Weaknesses
- Please provide an Algorithm that includes the method details.
- In the experiments, how many concepts are preserved when erasing, for instance, 100 concepts simultaneously? Since the number of retained concepts is constrained by the feature dimensionality in Eq. (4), what is the upper bound on the number of concepts that can be preserved?
- In Table 1, the paper reports only CS for target concepts and FID for non-target concepts. Based on my understanding, both metrics can be informative for evaluating both erasure efficacy and prior preservation. Providing a more complete comparison that reports both CS and FID for both target and non-target concepts would offer a more comprehensive view of the erasure performance.
- The authors should include more ablation studies on introduced hyperparameters, such as the threshold choice in IPF and the number of augmentations in DPA.
- While the scalability and efficiency of SPEED are well-demonstrated, some practical aspects (e.g., memory usage when erasing a very large number of concepts) could be discussed more explicitly to guide practitioners.
Questions
see the weakness
Limitations
Yes
Final Justification
The authors have addressed most of my concerns. Hence, I will keep my initial rate.
Formatting Issues
N/A
W1: Please provide an Algorithm that includes the method details.
Thanks for the suggestion. We have provided a Markdown version of the algorithm with detailed method steps, and we will include it in the revised version of the paper.
Input: model parameters, the erasure set, the retain set, and the number of augmentations
Output: the refined retain set
- Initialize concept embeddings
- Initialize the erasure and retain embedding matrices from the concept sets
- Compute the erasure-induced prior shift using Eq. 1
- Filter the original retain set with IPF
- Filter the retain set via Eq. 2 to obtain the influential subset
- Further augment and filter with DPA and IPF
- Augment the filtered subset using Eq. 3
- Filter the augmented concepts via Eq. 2
- Combine the filtered and augmented concepts into the final set
Return: the refined retain set
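A compact Python rendering of the algorithm above may help. The helper names and the similarity-based influence score are illustrative stand-ins for Eqs. 1-3, which are only referenced by number here; the augmentation step is simplified (see the directed version sketched earlier).

```python
import numpy as np

def refine_retain_set(retain_emb, erase_emb, thresh_scale=1.0, n_aug=4, seed=0):
    """Sketch of Prior Knowledge Refinement: IPF -> DPA -> IPF -> combine.

    The cosine-similarity influence score and the isotropic augmentation are
    illustrative stand-ins for Eqs. 1-3, which are only referenced by number
    in the algorithm above.
    """
    rng = np.random.default_rng(seed)

    def influence(C):                    # stand-in for Eq. 1 (prior-shift magnitude)
        Cn = C / np.linalg.norm(C, axis=1, keepdims=True)
        En = erase_emb / np.linalg.norm(erase_emb, axis=1, keepdims=True)
        return np.abs(Cn @ En.T).max(axis=1)

    def ipf_filter(C):                   # stand-in for Eq. 2 (mean-thresholded IPF)
        s = influence(C)
        return C[s >= thresh_scale * s.mean()]

    def augment(C):                      # simplified Eq. 3; see the DPA sketch above
        noise = rng.normal(0.0, 1.0, size=(len(C) * n_aug, C.shape[1]))
        return np.repeat(C, n_aug, axis=0) + noise

    filtered = ipf_filter(retain_emb)    # IPF on the original retain set
    augmented = ipf_filter(augment(filtered))
    return np.concatenate([filtered, augmented], axis=0)

# Toy usage with 768-dimensional embeddings.
rng = np.random.default_rng(0)
refined = refine_retain_set(rng.normal(size=(50, 768)), rng.normal(size=(3, 768)))
print(refined.shape)
```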
W2: In the experiments, how many concepts are preserved when erasing, for instance, 100 concepts simultaneously? Since the number of retained concepts is constrained by the feature dimensionality in Eq. (4), what is the upper bound on the number of concepts that can be preserved?
The dimension of the null space is upper-bounded by the feature dimension of the model as shown in Eq. (4).
Since the feature dimension of the generation model is usually fixed, we investigate extreme cases where the retain set size is greater than the feature dimension. As shown in Fig. 2, we visualize the prior preservation when the retain set includes 20,000 concepts and thus exceeds this bound.
Following [A], we include singular vectors w.r.t. non-zero singular values to ensure sufficient degrees of freedom for concept erasure. However, this leads to an approximate null space and induces semantic degradation within the retain set (Fig. 2, row 1). To mitigate this problem, we propose Prior Knowledge Refinement, a structured strategy for refining the retain set to enable accurate null-space construction, as shown in Fig. 2 (row 2). This includes Influence-based Prior Filtering (IPF), which focuses preservation efforts on the most vulnerable concepts, Directed Prior Augmentation (DPA), which constructs a more flexible null space by augmenting the retain set, and Invariant Equality Constraints (IEC), which protect sampling invariants.
Therefore, because our proposed method is specifically designed to preserve more concepts by refining the retain set, it yields significantly better prior preservation compared to baseline methods, even when the theoretical null-space capacity is nearly exhausted.
Moreover, recent diffusion models have significantly increased their text feature dimensionality, for instance, moving from SD v1.4 to SD3. This trend inherently improves the potential for null-space-based preservation. As demonstrated qualitatively in Fig. 6, our method shows reliable erasure and preservation performance on SD3, indicating strong potential for future scalability.
[A] AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models, ICLR25.
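To illustrate the capacity argument in Eq. (4), here is a minimal numpy sketch under assumed notation (retain embeddings stacked as columns of `K_r`) that builds the null-space projector from an SVD and checks that a projected update leaves the retain outputs untouched; once rank(K_r) reaches the feature dimension, the exact null space collapses.

```python
import numpy as np

def null_space_projector(K_r: np.ndarray, tol: float = 1e-8) -> np.ndarray:
    """Projector onto the null space of the retain keys K_r (d x m, columns = concepts).

    The exact null space has dimension d - rank(K_r), so it collapses once the
    retain set's rank reaches the feature dimension d.
    """
    d = K_r.shape[0]
    U, S, _ = np.linalg.svd(K_r, full_matrices=True)
    rank = int((S > tol).sum())
    U_null = U[:, rank:]                     # directions unexcited by the retain keys
    print(f"feature dim d={d}, rank(K_r)={rank}, null-space dim={d - rank}")
    return U_null @ U_null.T                 # (d, d) projection matrix

# Toy example: 768-dim features and 500 retain embeddings leave a 268-dim null space.
rng = np.random.default_rng(0)
K_r = rng.normal(size=(768, 500))
P = null_space_projector(K_r)
delta_W = rng.normal(size=(320, 768))        # some candidate parameter update
print(np.abs(delta_W @ P @ K_r).max())       # ~0: retain outputs are unchanged
```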
W3: In Table 1, the paper reports only CS for target concepts and FID for non-target concepts. Based on my understanding, both metrics can be informative for evaluating both erasure efficacy and prior preservation. Providing a more complete comparison that reports both CS and FID for both target and non-target concepts would offer a more comprehensive view of the erasure performance.
Thank you for the suggestion. Due to page limits, we reported the core metrics in the main paper. A complete comparison with both CS and FID for target and non-target concepts is provided in Appendix Tables 6 & 7. The results demonstrate that our method consistently achieves superior prior preservation, as indicated by higher CS and lower FID across the majority of non-target concepts.
W4: The authors should include more ablation studies on introduced hyperparameters, such as the threshold choice in IPF and the number of augmentations in DPA.
Thank you for the suggestion. We have ablated the effectiveness of our three proposed modules (IPF, DPA, and IEC) in Table 4. For the finer-grained ablation you mentioned, we conduct the following ablation studies:
- IPF module: We ablate different strengths of the IPF threshold by scaling it with a coefficient, where each coefficient corresponds to a different ratio of the retain set (e.g., a coefficient of 0.0 keeps the whole retain set).
  As shown below, varying this coefficient impacts the trade-off between erasure (CS) and preservation (FID). A lower threshold includes more weakly affected concepts in the retain set, increasing its rank and overly shrinking the null space. As discussed in Sec. 4.1 (lines 164-169), this leads to worse erasure efficacy (higher CS) and poor preservation (higher FID). Conversely, a higher threshold yields better erasure performance due to fewer retained concepts, but still increases the FID because of non-comprehensive prior coverage. The best balance is observed at moderate thresholds (the default in our setup). We will add this clarification to the revised version.

  | Threshold scale | 0.0 | 0.5 | 0.75 | 1.0 (Ours) | 1.25 | 1.5 |
  |---|---|---|---|---|---|---|
  | Retain set ratio | 100% | 96% | 77% | 46% | 20% | 9% |
  | CS ↓ (Erase) | 27.16 | 27.09 | 26.90 | 26.29 | 25.90 | 25.73 |
  | FID ↓ (Retain) | 48.29 | 45.56 | 44.38 | 29.35 | 34.61 | 37.29 |
- DPA module: We have included an ablation study of the DPA module in the Appendix (Fig. 10), analyzing the effects of the number of augmentations and the augmentation rank.
W5: While the scalability and efficiency of SPEED are well-demonstrated, some practical aspects (e.g., memory usage when erasing a very large number of concepts) could be discussed more explicitly to guide practitioners.
Thanks for your advice. We report both the GPU memory (GB) and editing time (s) when erasing 1, 10, and 100 concepts, respectively. Since our method is based on a closed-form solution paradigm, it does not require additional image generation or gradient backpropagation, resulting in both memory and time efficiency. This advantage becomes particularly prominent in multi-concept erasure scenarios, where no significant increase in memory or time consumption is observed.
| Number of Target Concepts | 1 | 10 | 100 |
|---|---|---|---|
| GPU Memory (GB) | 8.75 | 8.96 | 9.13 |
| Editing Time (s) | 3.6 | 3.8 | 5.0 |
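For practitioners who want to reproduce such measurements, a generic profiling harness along these lines can be used; `edit_fn` is a hypothetical placeholder for the closed-form editing routine, not an API of our released code.

```python
import time
import torch

def profile_edit(edit_fn, *args, device="cuda"):
    """Measure wall-clock time and peak GPU memory for one editing call.

    `edit_fn` is a hypothetical placeholder for the closed-form editing routine;
    any callable that performs the edit on `device` can be profiled this way.
    """
    torch.cuda.reset_peak_memory_stats(device)
    torch.cuda.synchronize(device)
    start = time.perf_counter()
    result = edit_fn(*args)
    torch.cuda.synchronize(device)
    elapsed = time.perf_counter() - start
    peak_gb = torch.cuda.max_memory_allocated(device) / 1024**3
    print(f"editing time: {elapsed:.1f} s, peak GPU memory: {peak_gb:.2f} GB")
    return result
```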
Dear Reviewer zWMb:
Thanks for your constructive comments. We would like to follow up to see if our response addresses your concerns or if you have any further questions. Thanks for your attention and best regards.
Faithfully,
Authors of Paper 14342
The authors have addressed most of my concerns. Hence, I will keep my initial rate.
Thanks for your positive feedback. Your comments have been very helpful and inspiring for improving our paper. We will incorporate the results mentioned in our response into the revised version to further enhance our manuscript. Thanks again for your support!
This paper proposes SPEED, a scalable and efficient method for concept erasure in text-to-image diffusion models. SPEED projects parameter updates onto the null space of non-target concepts, enabling effective concept erasure while preserving unrelated content generation. To avoid reducing the degrees of freedom in the null space, the authors introduce Influence-based Prior Filtering (IPF) to make the retain set more compact. They further use Directed Prior Augmentation (DPA) to ensure that the refined retain set covers a broader range of selected non-target concepts. Finally, Invariant Equality Constraints (IEC) are applied to preserve special tokens during erasing.
Strengths and Weaknesses
Strengths
- The paper is well written and easy to follow the motivation for each component.
- SPEED is both scalable and efficient. It can handle erasing many concepts (up to 100) at once and be much faster than previous methods.
- Although the idea of using the null space comes from NLP tasks, the authors have improved it and successfully adapted it for concept erasing in diffusion models.
Weaknesses
- The sentence "we use the k-means algorithm [36] to select k centroids to reduce redundancy" in footnote 4 is a bit unclear. I would like to see more details.
- Lack of experiments or ablation studies on the impact of the retain set size (e.g., 10%, 20%, ..., 100%). Since a major motivation of SPEED is related to the size of the retain set, I believe it is important.
- Missing citations of related works (e.g., [A,B,C]). Including more related works would help improve the completeness of literature review and help readers better understand the development of this field.
- The paper only compares SPEED with four baselines. It would be helpful to include additional recent methods (such as ESD [17] and Receler [A]) at least in the main table, especially since their code is available. This would provide a more comprehensive evaluation of SPEED's performance relative to existing work.
- The ablation study is only conducted with single-concept erasure. Since some components are designed for multi-concept, it would be helpful to include ablation results on multi-concept settings to better understand the contribution of each component.
- How does SPEED perform against adversarial prompts? Since robustness against attacks is an important issue in concept erasure tasks, it would be valuable to evaluate SPEED using adversarial methods (e.g., [D, E]).
- In the experiments, the retain set is always chosen based on the target concept, which limits flexibility. I just wonder whether it is possible to use a fixed, general-purpose retain set (e.g., MSCOCO) for any target concept?
[A] Huang, Chi-Pin, et al. "Receler: Reliable concept erasing of text-to-image diffusion models via lightweight erasers." European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2024.
[B] Kim, Changhoon, Kyle Min, and Yezhou Yang. "Race: Robust adversarial concept erasure for secure text-to-image diffusion model." European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2024.
[C] Wang, Yuan, et al. "Precise, fast, and low-cost concept erasure in value space: Orthogonal complement matters." Proceedings of the Computer Vision and Pattern Recognition Conference. 2025.
[D] Chin, Zhi-Yi, et al. "Prompting4debugging: Red-teaming text-to-image diffusion models by finding problematic prompts." arXiv preprint arXiv:2309.06135 (2023)
[E] Tsai, Yu-Lin, et al. "Ring-a-bell! how reliable are concept removal methods for diffusion models?." arXiv preprint arXiv:2310.10012 (2023).
Questions
See above weakness
Limitations
yes
Final Justification
The authors have adequately addressed my concerns regarding missing details (e.g., k-means selection), missing citations (e.g., [A, B, C]), ablation/experiment on retain set size, adversarial robustness, and the use of a general retain set. These clarifications improve the paper's clarity and completeness.
However, the initial version lacked these essential components, and as noted by Reviewer jEfu, the core idea about null space builds on prior NLP work [14] with incremental improvements. For these reasons, I have decided to give a score of 4 (Borderline Accept). The authors should ensure that all clarifications, additional citations, and experimental results are incorporated into the revised version.
Formatting Issues
no
W1: The sentence "we use the k-means algorithm to select k centroids to reduce redundancy" in footnote 4 is a bit unclear. I would like to see more details.
Thanks for your feedback. We introduce the Invariant Equality Constraints (IEC) module to explicitly protect invariants during image sampling by enforcing an equality constraint on the stacked invariant embedding matrix, so that the edited weights produce exactly the same outputs on these invariant embeddings as the original weights. When processing the null-text embedding, the CLIP text encoder encodes it into 77 token embeddings: 1 [SOT] token and 76 [EOT] tokens.
Although these [EOT] embeddings are semantically similar, they still exhibit numerical differences that unnecessarily increase the rank of the constraint matrix, which over-constrains the optimization and thereby limits the degrees of freedom available for null-space optimization. To avoid this rank inflation while preserving the core semantics of the null text, we apply k-means clustering to the 76 [EOT] embeddings and retain a small number of representative centroids. These centroids are then used to construct the constraint matrix in Eq. (11) as part of the IEC constraints.
We will revise the manuscript to make this rationale and implementation choice clearer.
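A minimal sketch of this clustering step, assuming the (77, d) CLIP token outputs for the null text are available as a numpy array; the exact value of k used in the paper is not stated here, so k=4 below is purely illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def compress_eot_embeddings(token_emb: np.ndarray, k: int = 4) -> np.ndarray:
    """Reduce the 76 near-duplicate [EOT] embeddings to k representative centroids.

    `token_emb` is assumed to be the (77, d) CLIP output for the null text:
    row 0 is the [SOT] token, rows 1..76 are [EOT] tokens. Clustering keeps the
    core null-text semantics while lowering the rank of the IEC constraint matrix.
    """
    sot, eot = token_emb[:1], token_emb[1:]
    centroids = KMeans(n_clusters=k, n_init=10, random_state=0).fit(eot).cluster_centers_
    return np.concatenate([sot, centroids], axis=0)      # (1 + k, d) invariant matrix

# Toy usage with 768-dim features standing in for real CLIP outputs.
emb = np.random.default_rng(0).normal(size=(77, 768))
print(compress_eot_embeddings(emb, k=4).shape)            # (5, 768)
```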
W2: Lack of experiments or ablation studies on the impact of the retain set size (e.g., 10%, 20%, ..., 100%). Since a major motivation of SPEED is related to the size of the retain set, I believe it is important.
We thank the reviewer for highlighting the importance of the retain set size in selecting influential priors. In response, we ablate different strengths of the IPF threshold by scaling it with a coefficient, where each coefficient corresponds to a different ratio of the retain set (e.g., a coefficient of 0.0 keeps the whole retain set).
As shown below, varying this coefficient impacts the trade-off between erasure (CS) and preservation (FID). A lower threshold includes more weakly affected concepts in the retain set, increasing its rank and overly shrinking the null space. As discussed in Sec. 4.1 (lines 164-169), this leads to worse erasure efficacy (higher CS) and poor preservation (higher FID). Conversely, a higher threshold yields better erasure performance due to fewer retained concepts, but still increases the FID because of non-comprehensive prior coverage. The best balance is observed at moderate thresholds (the default in our setup). We will add this clarification to the revised version.
| Threshold scale | 0.0 | 0.5 | 0.75 | 1.0 (Ours) | 1.25 | 1.5 |
|---|---|---|---|---|---|---|
| Retain set ratio | 100% | 96% | 77% | 46% | 20% | 9% |
| CS ↓ (Erase) | 27.16 | 27.09 | 26.90 | 26.29 | 25.90 | 25.73 |
| FID ↓ (Retain) | 48.29 | 45.56 | 44.38 | 29.35 | 34.61 | 37.29 |
W3: Missing citations of related works. Including more related works would help improve the completeness of literature review and help readers better understand the development of this field.
We thank the reviewer for pointing out the missing citations. We agree that a more comprehensive discussion of recent related works will help better position SPEED within the broader concept erasure literature. We now incorporate an extended comparison with our work including the following:
- Receler [A] adds plug-in modules after each attention layer to learn a lightweight Eraser for concept erasing and adopts adversarial training to prevent the model from regenerating the erased target concepts. However, since it only incorporates additional modules without modifying any model parameters, it can be easily bypassed in white-box settings. In contrast, our work directly modifies model parameters and is thus more applicable in white-box settings.
- RACE [B] fine-tunes model parameters using adversarial prompts to identify and mitigate adversarial text embeddings, significantly reducing the Attack Success Rate. However, it focuses mainly on single-concept erasure due to its inferior prior preservation (multiple erasing would lead to mode collapse) and involves high training cost (e.g., 800× slower than SPEED).
- AdaVD [C] applies interventions during the sampling process, projecting value features onto the direction orthogonal to the target embeddings, and introduces an adaptive shift at each operation. While lightweight, this sampling-based paradigm can also be easily bypassed in white-box settings, as the model parameters are not modified.
We have also consolidated all reviewers’ suggestions on the related work improvement. Please see our full summary in the response to Reviewer Rrzc, W4. We will summarize this discussion in the revised version.
W4: The paper only compares SPEED with four baselines. It would be helpful to include additional recent methods at least in the main table, especially since their code is available.
Thanks for your comment. Since these methods (ESD, RACE, and Receler) do not focus on multi-concept erasure scenarios, we compare their performance in single-concept erasure (i.e., erasing Van Gogh). Our method consistently outperforms these baselines in balancing erasure and preservation, with the lowest FID. Notably, our method can achieve single-concept erasure in only 3.6 s, which is 150 to 1500 times faster than these baselines. We will incorporate these results in the revised version.
| Method | Van Gogh (Erase) CS ↓ | Picasso FID ↓ | Monet FID ↓ | Paul Gauguin FID ↓ | Caravaggio FID ↓ | COCO CS ↑ | COCO FID ↓ | Time (s) |
|---|---|---|---|---|---|---|---|---|
| ESD-x | 27.04 | 111.07 | 90.35 | 106.70 | 107.85 | 26.10 | 33.19 | 530 (×150) |
| ESD-u | 26.24 | 153.10 | 105.78 | 164.83 | 124.41 | 26.35 | 38.08 | 530 (×150) |
| RACE | 23.03 | 127.28 | 94.49 | 106.43 | 114.94 | 25.92 | 41.52 | 2910 (×800) |
| Receler | 23.53 | 134.35 | 143.17 | 194.58 | 133.94 | 25.95 | 37.00 | 5560 (×1500) |
| Ours | 26.29 | 35.86 | 16.85 | 24.94 | 39.75 | 26.55 | 20.36 | 3.6 (×1) |
W5: The ablation study is only conducted with single-concept erasure. Since some components are designed for multi-concept, it would be helpful to include ablation results on multi-concept settings to better understand the contribution of each component.
We thank the reviewer for the valuable suggestion. In response, we conduct an ablation study under the multi-concept erasure setting. The results are consistent with the single-concept ablation results in Table 4, further confirming the individual effectiveness of the proposed components (IEC, DPA, and IPF) in editing-based concept erasure. We will include this additional ablation in the revised paper.
| Config | IEC | DPA | IPF | ↓ | ↑ | ↑ | COCO CS ↑ | COCO FID ↓ |
|---|---|---|---|---|---|---|---|---|
| 1 | ✖ | ✖ | ✖ | 8.62 | 76.82 | 83.47 | 26.18 | 50.82 |
| 2 | ✔ | ✖ | ✖ | 8.42 | 79.62 | 85.18 | 26.21 | 46.26 |
| 3 | ✔ | ✔ | ✖ | 6.86 | 82.92 | 87.73 | 26.22 | 45.04 |
| Ours | ✔ | ✔ | ✔ | 5.87 | 85.54 | 89.63 | 26.22 | 44.97 |
| SD v1.4 | - | - | - | 90.18 | 89.66 | 17.70 | 26.53 | - |
W6: How does SPEED perform against adversarial prompts? Since robustness against attacks is an important issue in concept erasure tasks, it would be valuable to evaluate SPEED using adversarial methods (e.g., [D, E]).
Thanks for your advice. Due to the character limit, please refer to our response to Reviewer WTi1, comment W4.
W7: In experiments, the retain set is always chosen based on the target concept, which limits the flexibility. I just wonder is it possible to use a fixed, general-purpose retain set (e.g., MSCOCO) for any target concept?
We agree with the reviewer that using a fixed, general-purpose retain set (e.g., MSCOCO) would offer greater flexibility. The use of a targeted retain set is by default adopted from prior works (e.g., UCE, RECE), where the retain set is designed to maximize the preservation of semantically related concepts during each erasure task. This paradigm stems from our discussion in Sec. 4.1 where concept erasure inherently exhibits locality. This indicates that when erasing a specific concept (e.g., Van Gogh), some related concepts (e.g., Picasso and Monet) are more influenced than those general concepts (e.g., MSCOCO). To further validate this statement, we compare the erasure and preservation performance with both general and targeted retain sets below.
| Retain Set | Van Gogh (Erase) CS ↓ | Picasso FID ↓ | Monet FID ↓ | Paul Gauguin FID ↓ | Caravaggio FID ↓ | COCO CS ↑ | COCO FID ↓ |
|---|---|---|---|---|---|---|---|
| General Retain Set | 25.93 | 75.27 | 80.64 | 76.94 | 88.48 | 26.51 | 18.54 |
| Targeted Retain Set | 26.29 | 35.86 | 16.85 | 24.94 | 39.75 | 26.55 | 20.36 |
From the table, using a targeted retain set significantly improves the preservation of related concepts (e.g., Picasso, Monet), while achieving comparable performance on the general concepts from MSCOCO. This demonstrates that targeted retain sets are more effective at preserving semantically close concepts; notably, even without explicitly including general concepts in the retain set, their general prior remains largely preserved.
Furthermore, since mainstream concept erasure scenarios typically involve specific domains such as IP instances, artistic styles, celebrity identities, and NSFW content, preparing a dedicated retain set for each category in advance is often sufficient to cover most practical use cases. Nevertheless, we agree that developing a more universal, general-purpose retain set is a promising direction, and we plan to explore this further in future work.
After reading the authors' feedback and the comments from the other reviewers, I would like to thank the authors for their thorough rebuttal.
My concerns have been fully addressed. Therefore, I have raised my score to 4. Please ensure that the clarifications and references are incorporated into the revised manuscript.
We sincerely thank the reviewer for the positive feedback and for raising the score. We truly appreciate your constructive comments throughout the review process, which have greatly helped us improve the clarity and quality of our work.
We confirm that all clarifications and references provided in the rebuttal will be incorporated into the revised manuscript as suggested. Thanks again for your support.
This paper proposes an effective method for erasing multiple concepts while preserving non-targeted concepts. Previous studies faced challenges in maintaining performance on non-target concepts when multiple concepts are erased simultaneously. However, this paper addresses this issue by introducing invariant changes in network parameters inspired by null spaces, which are facilitated by continual learning. The paper empirically validates object and concept erasing scenarios and demonstrates the generation of images for normal text prompts.
Strengths and Weaknesses
Strengths
- This paper addresses a timely issue when multiple concepts need to be erased. The primary concern is how non-targeted concepts are preserved during this process.
- The proposed method is inspired from techniques that enable continual learning, which helps maintain knowledge acquired from previous stages. This approach aligns with the preservation of information related to non-targeted concepts.
- Once null spaces for the retain set are identified, a three-step procedure is followed to create compact and augmented retain sets. These sets are straightforwardly understood and efficiently determine the information that needs to be preserved.
Weaknesses
- This paper doesn't explicitly explain the role of directed prior augmentation (DPA) using randomly generated noise. I understand that the low-dimensional structure, as decomposed by SVD, consists of principal and meaningless embeddings. This procedure encourages the augmentation to offer concepts that align with the original concepts. However, I'm curious about the specific improvements in terms of image quality and concept erasure that DPA delivers. In its current form, analyzing ablation studies alone doesn't provide a clear understanding of DPA's role. Visualizations would help readers grasp the justification of DPA.
Questions
- In Eq. 6, the authors mention that selecting influential priors depends on a threshold hyperparameter. However, when I read the manuscript, I noticed that the authors did not explicitly state how the strength of this threshold affects the performance in multiple-concept erasing. I believe that lowering it could lead to ambiguity in augmenting the prior information.
- I'm curious about extreme cases where low-dimensional structures fail to span the original spaces. These situations suggest that either the null spaces of retained sets become almost empty sets or the retained sets consist of multiple terms sharing similar or identical meanings. In such cases, I'd like to observe how the proposed method performs in extreme scenarios.
- I'm curious about the reason behind Eq. 22. I don't think Eq. 22 is trivial without a specific condition. While I understand Eq. 33 works, Eq. 22 doesn't hold unless a specific assumption is made. I believe the authors must specify a specific condition that doesn't contradict the core hypothesis.
Limitations
- In recent concept erasing, the task of removing nudity has become crucial in preventing Not Safe For Work (NSFW) images. I was curious if this approach could effectively address an issue where adversarial users attempt to generate NSFW images [1,2]. In this paper, the authors addressed this issue by utilizing the I2P dataset. However, adversarial text prompts raise a concern, as recent defense methods still produce harmful content.
Reference
[1] Yang, Yijun, et al. "Mma-diffusion: Multimodal attack on diffusion models." CVPR2024.
[2] Zhang, Yimeng, et al. "To generate or not? safety-driven unlearned diffusion models are still easy to generate unsafe images... for now." ECCV2024.
Final Justification
In this rebuttal, the authors addressed my concerns about the sensitivity of hyper-parameters and provided qualitative evidence for Directed Prior Augmentation. They also proposed Prior Knowledge Refinement, which includes Influence-based Prior Filtering, to address an extreme case where low-dimensional structures fail to span the original spaces.
However, their theoretical reasoning seems to selectively choose matrix norms and arguments, which may not fully justify the proposed method.
Considering its experimental improvements and technical contributions, I evaluate this paper as 4.
Formatting Issues
N/A
W1: This paper doesn’t explicitly explain the role of directed prior augmentation (DPA) using random generated noises. I understand that low-dimensional structure, as decomposed by SVD, consists of principal and meaningless embeddings. This procedure encourages to offer augmented concepts that align with the original concepts. However, I’m curious about the specific improvements in terms of image quality and concept erasure that DPA derives. In its current form, analyzing ablation studies alone doesn’t provide a clear understanding of DPA’s role. Visualizations would help readers grasp the justification of DPA.
We thank the reviewer for the insightful comment. Your understanding is correct. Since we cannot include visualizations in the rebuttal, we describe the effect of DPA in detail. Overall, DPA mainly improves the consistency of images of retained concepts before and after erasure. For example, we visualize both the target (Snoopy) and retained (Mickey) concepts when erasing Snoopy w/ and w/o DPA. (1) For retained image quality (Mickey), generations w/ DPA exhibit visual features more consistent with the pre-trained model's generations, with better preservation of both the subject (e.g., the buttons on Mickey's clothes) and the background (e.g., flowers and plants in the background). (2) For concept erasure (Snoopy), generations w/ DPA demonstrate more successful erasure (e.g., Snoopy's iconic black ears and facial shape are more cleanly removed) compared to those w/o DPA. However, the effect on erasure is positive but less visually noticeable, as both settings show successful erasure of the target semantics.
In summary, the visualization results align with our quantitative ablation in the main paper: DPA ensures that augmented concepts remain valid and consistent, which helps improve both erasure efficacy and prior preservation. We will add visualizations in the revised version.
Q1: In Eq. 6, the authors mention that selecting influential priors depends on a threshold hyperparameter. However, when I read the manuscript, I noticed that the authors did not explicitly state how the strength of this threshold affects the performance in multiple-concept erasing. I believe that lowering it could lead to ambiguity in augmenting the prior information.
We thank the reviewer for highlighting the importance of the threshold in selecting influential priors. In response, we ablate different strengths of the IPF threshold by scaling it with a coefficient, where each coefficient corresponds to a different ratio of the retain set (e.g., a coefficient of 0.0 keeps the whole retain set).
As shown below, varying this coefficient impacts the trade-off between erasure (CS) and preservation (FID). A lower threshold includes more weakly affected concepts in the retain set, increasing its rank and overly shrinking the null space. As discussed in Sec. 4.1 (lines 164-169), this leads to worse erasure efficacy (higher CS) and poor preservation (higher FID). Conversely, a higher threshold yields better erasure performance due to fewer retained concepts, but still increases the FID because of non-comprehensive prior coverage. The best balance is observed at moderate thresholds (the default in our setup). We will add this clarification to the revised version.
| Threshold scale | 0.0 | 0.5 | 0.75 | 1.0 (Ours) | 1.25 | 1.5 |
|---|---|---|---|---|---|---|
| Retain set ratio | 100% | 96% | 77% | 46% | 20% | 9% |
| CS ↓ (Erase) | 27.16 | 27.09 | 26.90 | 26.29 | 25.90 | 25.73 |
| FID ↓ (Retain) | 48.29 | 45.56 | 44.38 | 29.35 | 34.61 | 37.29 |
Q2: I’m curious about extreme cases where low-dimensional structures fail to span the original spaces. These situations suggest that either the null spaces of retained sets become almost empty sets or the retained sets consist of multiple terms sharing similar or identical meanings. In such cases, I’d like to observe how the proposed method performs in extreme scenarios.
We appreciate the reviewer’s interest in our performance under extremely low-dimensional conditions. Indeed, the dimension of the null space is upper-bounded by the feature dimension of the model as shown in Eq. (4).
Since the feature dimension of the generation model is usually fixed, we investigate extreme cases where the retain set size is greater than the feature dimension. As shown in Fig. 2, we visualize the prior preservation when the retain set includes 20,000 concepts and thus exceeds this bound.
Following [A], we include singular vectors w.r.t. non-zero singular values to ensure sufficient degrees of freedom for concept erasure. However, this leads to an approximate null space and induces semantic degradation within the retain set (Fig. 2, row 1). To mitigate this problem, we propose Prior Knowledge Refinement, a structured strategy for refining the retain set to enable accurate null-space construction, as shown in Fig. 2 (row 2). This includes Influence-based Prior Filtering (IPF), which focuses preservation efforts on the most vulnerable concepts, Directed Prior Augmentation (DPA), which constructs a more flexible null space by augmenting the retain set, and Invariant Equality Constraints (IEC), which protect sampling invariants.
Therefore, because our proposed method is specifically designed to alleviate such extreme cases by refining the retain set, it yields significantly better prior preservation compared to baseline methods even when the theoretical null-space capacity is nearly exhausted.
[A] AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models, ICLR25.
Q3: I’m curious about the reason behind Eq. 22. I don’t think Eq. 22 is trivial without a specific condition. While I understand Eq. 33 works, Eq. 22 doesn’t hold unless a specific assumption is made. I believe the authors must specify a specific condition that doesn’t contradict the core hypothesis.
We thank the reviewer for the careful reading and for pointing this out. We agree that Eq. 22 is not self-evident and may appear unclear without further justification. We have provided the proof below to clarify its validity. We apologize for the confusion and will include this derivation explicitly in the revised paper to ensure clarity.
Proof.
The singular value decomposition (SVD) of can be written as follows:
where and are orthogonal matrices. Since the Frobenius norm is invariant under orthogonal transformations, we have:
Let , then:
Recall that is a diagonal matrix, so multiplying it scales each row (corresponding to each singular value):
where is the -th row of . Noting that , we have:
Thus, we obtain:
i.e.,
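The equations in the derivation above did not survive formatting. One plausible reconstruction under assumed notation (K for the stacked embeddings, assumed to have full row rank, and ΔW for the parameter update) is sketched below; the precise statement of Eq. 22 may differ.

```latex
% Plausible reconstruction under assumed notation; Eq. 22 is not reproduced
% in this thread, so K (stacked embeddings, assumed full row rank) and
% \Delta W (parameter update) are assumptions.
\begin{align*}
  K &= U \Sigma V^\top, \qquad U,\ V \ \text{orthogonal},\\
  \|\Delta W K\|_F &= \|\Delta W\, U \Sigma V^\top\|_F
                    = \|\Delta W\, U \Sigma\|_F
     \quad\text{(the Frobenius norm is orthogonally invariant)},\\
  \|Y \Sigma\|_F^2 &= \sum_i \sigma_i^2 \,\|y_i\|_2^2,
     \qquad Y = \Delta W U,\quad y_i = i\text{-th column of } Y,\\
  \|\Delta W K\|_F &\ge \sigma_{\min}(K)\,\|Y\|_F
                    = \sigma_{\min}(K)\,\|\Delta W\|_F .
\end{align*}
% Hence, if all singular values of K are non-zero and \Delta W \neq 0,
% the preservation term \Delta W K cannot be the zero matrix.
```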
Q4: In the recent concept erasing, the task of removing nudity has become crucial in preventing Not Sale for Work (NSFW) images. I was curious if this approach could effectively address an issue where adversarial users attempt to generate NSFW images [1,2]. In this paper, the authors addressed this issue by utilizing the I2P dataset. However, adversarial text prompts raise a concern, as recent defense methods still produce harmful content.
Thanks for your question. As discussed with Reviewer jEfu in W4, we compare our method w/ and w/o adversarial training/editing (AT) under both white- (UnlearnDiff) and black-box (MMA and Ring-A-Bell) adversarial attack benchmarks, and report Attack Success Rate (ASR). From the table, our method with adversarial training/editing (Ours w/ AT) exhibits effective performance against both black-box and white-box attacks on par with competitive baselines such as CPE, AdvUnlearn, and Receler, while maintaining efficient runtime (4.5s).
| Method | MMA ↓ | Ring-A-Bell ↓ | UnlearnDiff ↓ | Time (s) ↓ | White-Box Attack |
|---|---|---|---|---|---|
| CPE | 0.01 | 0.00 | - | 500 (×138) | ✖ |
| AdvUnlearn | 0.00 | 0.00 | 0.21 | 15860 (×4400) | ✔ |
| UCE | 0.38 | 0.39 | 0.80 | 1.2 (×0.33) | ✔ |
| RECE | 0.20 | 0.18 | 0.65 | 1.5 (×0.41) | ✔ |
| RACE | 0.29 | 0.21 | 0.47 | 2910 (×800) | ✔ |
| Receler | 0.07 | 0.01 | - | 5560 (×1500) | ✖ |
| Ours w/o AT | 0.24 | 0.20 | 0.75 | 3.6 (×1) | ✔ |
| Ours w/ AT | 0.01 | 0.00 | 0.45 | 4.5 (×1.25) | ✔ |
Note: "-" indicates that the method cannot defend against white-box attacks, as it does not modify any model parameters.
The current improvements with AT demonstrate the great potential of SPEED in resisting adversarial attacks. Due to the limited time during the rebuttal, we could only include preliminary results. We will conduct a further study in the revised paper.
I appreciate the authors’ comprehensive rebuttal and address the raised issues in my initial remarks. The manuscript would significantly benefit from incorporating qualitative results of DPA. For instance, presenting sample outputs immediately demonstrate that the practical contribution of the mechanism would be highly effective. Therefore, I ask to the authors to include in the camera-ready version.
Additionally, my concerns about the additional experiment on adversarial prompts and the ablation study for the filtering parameter have been satisfactorily addressed, and results are clearly explained.
As for Eq. 22, I acknowledge that the linear-algebra argument itself is correct. However, there remains an arguable gap between the low-rank structure of the weight matrices and the update formula based on the Frobenius norm. In the initial submission, the matrix norm was treated as either Frobenius or spectral. Initially, this seemed acceptable because low-rank reasoning is naturally linked to spectral analysis of the weight space, augmentation with low-spectral components, and regularization in the null spaces. However, in this theoretical setting, the Frobenius norm measures an overall Euclidean distance, thereby de-emphasizing the very low-rank features that motivate the approach. This choice does not align well with the main theorem, which stems from spectral considerations.
For this reason, I encourage the authors to restate and reorganize the relevant sections to ensure readers comprehend precisely why the Frobenius norm is preferred and when the proposed method constructs low-rank structures. It would be beneficial to clarify which aspects depend more on spectral analysis. A natural way to achieve this clarity would be to place the discussion of norm selection immediately before the derivation of DPA, thereby maintaining a transparent mathematical flow.
Thank you for your valuable feedback. We are encouraged that our response has adequately addressed the concerns raised in your initial remarks. We will include the additional results and visualizations in the camera-ready version. We further clarify your concerns as follows:
The manuscript would significantly benefit from incorporating qualitative results of DPA. For instance, presenting sample outputs immediately demonstrate that the practical contribution of the mechanism would be highly effective. Therefore, I ask to the authors to include in the camera-ready version.
Thanks for your valuable suggestion. Due to the constraints of the rebuttal, we are unfortunately not able to include figures at this stage. Instead, we have provided a detailed description to convey the role and effectiveness of the DPA module within our method. We will include comprehensive qualitative examples of the DPA module in the camera-ready version.
I encourage the authors to restate and reorganize the relevant sections to ensure readers comprehend precisely why the Frobenius norm is preferred and when the proposed method constructs low-rank structures. It would be beneficial to clarify which aspects depend more on spectral analysis.
Thank you very much for raising this important point and for the careful consideration of our manuscript. We would like to further clarify our norm selections and low-rank structure construction explicitly to avoid any misunderstanding:
- Proof of the lower bound for UCE: The reason we chose the Frobenius norm in Eq. 22 follows directly from the original formulation of UCE (i.e., UCE formulates its objective using the Frobenius norm). The context of Eq. 22 was to derive a lower bound on the prior-preservation error term in UCE. In this proof, our primary goal is to demonstrate rigorously that this matrix is not a zero matrix. In this specific theoretical context, whether using the Frobenius norm or the spectral norm, the conditions are mathematically equivalent, as both would yield the same conclusion regarding the non-zero nature of the matrix. Hence, employing the Frobenius norm here does not compromise the validity of our result.
- Our improvement over UCE: Our null-space-constrained objective introduces a null-space projection matrix, ensuring that parameter updates do not alter the retained concepts. Compared to UCE, our theoretical construction guarantees that the prior-preservation error is exactly zero, achieving strict prior preservation. As a direct extension and refinement of UCE's original formulation, we intentionally preserve the use of the Frobenius norm in our optimization objective.
We understand your concern regarding the potential mismatch between our use of the Frobenius norm and the spectral motivations. However, we would like to clarify explicitly that the choice of the Frobenius norm does not weaken the validity of our theoretical conclusions. This is primarily because our core theoretical objective is to constrain the prior-preservation error strictly to zero, a result which remains unaffected by the choice of norm.
- Low-rank constraints in DPA: From a spectral perspective, DPA explicitly samples along the least-variant directions of the original model weight, as determined by its singular spectrum, ensuring semantically similar and meaningful augmentation samples. The low-rank structure is applied to augment the representations of the original retain set, rather than introducing norm-based regularization on the final parameter update.
Moreover, your comment about spectral considerations also provides valuable insight for our future explorations; we are inspired by existing works on spectral-norm regularization in many domains, such as maintaining gradient diversity [A], improving generalization [B], and stabilizing training dynamics [C].
In summary, our choice of the Frobenius norm originates directly from the original UCE formulation. Our theoretical analysis, which achieves exactly zero prior preservation error, remains rigorous under this norm choice, and the effectiveness of our approach is further supported by our experimental results. Following your recommendation, we will explicitly include a discussion of the selection between the Frobenius and spectral norms before the derivation of DPA in the revised paper to enhance clarity and theoretical coherence.
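For completeness, the elementary norm relation underlying this point can be stated as follows (standard linear algebra; $A$ here is a generic matrix standing in for the update):

```latex
% For any matrix A with singular values \sigma_1 \ge \dots \ge \sigma_r \ge 0:
\|A\|_2 = \sigma_1
\;\le\;
\|A\|_F = \Big(\textstyle\sum_{i=1}^{r}\sigma_i^2\Big)^{1/2}
\;\le\;
\sqrt{r}\,\|A\|_2,
% hence \|A\|_F > 0 \iff \|A\|_2 > 0 \iff A \neq 0:
% the non-zero-update conclusion is independent of the norm choice.
```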
[A] Learning Continually by Spectral Regularization, ICLR25
[B] Spectral Normalization for Generative Adversarial Networks, ICLR18
[C] Can We Gain More from Orthogonality Regularizations in Training Deep CNNs, NeurIPS18
I appreciate the authors’ clarification of their theoretical contributions regarding regularization and their proposed methods. However, I believe their approach could be enhanced by integrating diverse perspectives on norms into a single theoretical concept. This would simultaneously address the limitations of previous methods and of the proposed one. Specifically, I am concerned that the authors selectively chose the updated-weight norm and thereby guided readers toward the current form of their proposed method. I hope the authors will revise the manuscript and incorporate these suggestions in the final version.
Overall, I appreciate the authors’ efforts to address my concerns and maintain my initial score of 4.
Thanks for your continued support of our manuscript towards acceptance.
We sincerely appreciate your thoughtful suggestion. Our current theoretical framework is complete within the context of the Frobenius norm and demonstrates clear improvements over these baselines. In the final revision, we will further incorporate your suggestions and discuss the possibility of closed-form solutions under diverse norm choices to enrich the theoretical scope of our work.
Thanks again for your valuable feedback and support.
This work proposes a concept erasing method that exploits the null space and model editing space for direct edits of model parameters. Specifically, the proposed method retains the most affected non-target concepts, augments them with semantically consistent variations, and preserves key invariants. This work demonstrates that it can erase 100 concepts within 5 seconds.
Strengths and Weaknesses
Strengths:
- The paper is well written and the proposed method is clearly described.
- This work addresses an important problem of concept erasing while preserving the rest.
- The proposed method seems efficient in computation.
Weaknesses:
- There are a number of recent methods (see below) that can erase 100 concepts other than this work and MACE [33]. Moreover, this work only focuses on erasing 100 celebrities, not other concepts such as copyrighted characters, 100 artistic styles, and so on. In order to claim that the proposed method is scalable, the method should be demonstrated in more diverse cases, with diverse erased concepts as well as diverse remaining concepts. I am very curious about other results, such as erasing 100 artistic styles, and retaining other celebrities while erasing 50-100 celebrities.
- Concept Pinpoint Eraser for Text-to-image Diffusion Models via Residual Attention Gate (ICLR 2025)
- Localized Concept Erasure for Text-to-Image Diffusion Models Using Training-Free Gated Low-Rank Adaptation (CVPR 2025)
- Six-CD: Benchmarking Concept Removals for Text-to-image Diffusion Models (CVPR, 2025)
- The proposed method exploits the null space / editing space via SVD. See the work below, which uses a similar SVD-based method for concept erasure. It seems important to justify the novelty of the proposed method over this work. How does the proposed method use SVD and the null space differently?
- Localized Concept Erasure for Text-to-Image Diffusion Models Using Training-Free Gated Low-Rank Adaptation (CVPR 2025)
- The concept of using low-rankness / null space for concept erasure has also been explored for different applications in the following works:
- Linear Adversarial Concept Erasure (ICML 2022)
- LEACE: Perfect linear concept erasure in closed form (NeurIPS 2023)
- Machine Unlearning via Null Space Calibration (IJCAI 2024)
- It seems that the proposed method focuses on remaining concepts while not focusing as much on erasing concepts, and thus it does not seem to achieve SOTA performance in erasing. It seems important to demonstrate that this method works well for erasing on diverse benchmarks. For example, this work seems quite inefficient for NudeNet detection on I2P, which is one of the most important applications that this work also claims. See the above works as well as the works below:
- Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models (NeurIPS 2024)
- R.A.C.E. : Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model (ECCV 2024)
- Receler: Reliable Concept Erasing of Text-to-Image Diffusion Models via Lightweight Erasers (ECCV 2024)
- All but One: Surgical Concept Erasing with Model Preservation in Text-to-Image Diffusion Models (AAAI 2024)
- Erasing Undesirable Concepts in Diffusion Models with Adversarial Preservation (NeurIPS 2024)
- It is important to ensure that the concept erasing method is robust against adversarial attacks. While the proposed method seems to have this capability, it was not fully evaluated against popular adversarial attacks, so its robustness remains unclear.
Questions
Please address the concerns above.
Limitations
Yes
Justification for Final Rating
The rebuttal has addressed most of the major concerns, so I will be happy to increase my score.
Formatting Issues
N/A
W1: There are a number of recent methods that can erase 100 concepts … I am very curious about other results such as erasing 100 artistic styles and retaining other celebrities while erasing 50-100 celebrities.
We thank the reviewer for pointing out that these recent works (CPE, GLoCE, and Six-CD) can also address multi-concept erasure. Specifically, CPE applies nonlinear residual gates to suppress target concepts; GLoCE introduces low-rank modules with simple gating for concept suppression; Six-CD aggregates multiple strategies for large-scale erasure evaluation.
Compared to these methods, SPEED introduces a null-space-constrained method that directly calculates the model parameter update in closed form. This formulation allows erasing 100 concepts in 5 seconds with two advantages: (1) significantly higher efficiency (e.g., a speedup of several thousand times) derived from the closed-form solution, and (2) applicability in white-box scenarios where external modules can be bypassed, since our method directly modifies the model parameters. We will include more discussion of these recent methods in the revised paper.
To further support our scalability, we compare erasure of 100 artistic styles and 100 celebrities against CPE and GLoCE (we exclude Six-CD as it only releases a benchmark). Our method achieves overall multi-concept erasure and preservation performance comparable to these baselines. While our overall performance is slightly lower than CPE's, our method offers two key advantages: time efficiency (nearly 10,000× faster) and white-box applicability (direct parameter editing without reliance on external modules).
| Erase 100 Artistic Styles | CS_e ↓ | CS_r ↑ | CS_r − CS_e ↑ | COCO CS ↑ | COCO FID ↓ | Time (s) ↓ | White-box Attack |
|---|---|---|---|---|---|---|---|
| SD v1.4 | 26.34 | 25.93 | - | 26.53 | - | - | - |
| CPE | 17.17 | 25.54 | 8.38 | 26.47 | 48.84 | 500 * 100 (×10000) | ✖ |
| GLoCE | 19.30 | 22.56 | 3.26 | 26.01 | 52.42 | 115 * 100 (×2300) | ✖ |
| MACE | 17.67 | 24.67 | 7.00 | 23.44 | 56.36 | 1736 (×350) | ✔ |
| Ours | 17.88 | 25.68 | 7.80 | 26.34 | 44.38 | 5 (× 1) | ✔ |
| Erase 100 Celebrities | Acc_e ↓ | Acc_r ↑ | H_c ↑ | COCO CS ↑ | COCO FID ↓ | Time (s) ↓ | White-box Attack |
|---|---|---|---|---|---|---|---|
| SD v1.4 | 90.18 | 89.66 | 17.70 | 26.53 | - | - | - |
| CPE | 0.48 | 85.34 | 91.88 | 26.40 | 48.43 | 500 * 100 (×10000) | ✖ |
| GLoCE | 2.61 | 75.35 | 84.97 | 25.68 | 50.21 | 85 * 100 (×1700) | ✖ |
| MACE | 4.80 | 80.20 | 87.06 | 24.80 | 50.41 | 1736 (×350) | ✔ |
| Ours | 5.87 | 85.54 | 89.63 | 26.22 | 44.97 | 5.0 (× 1) | ✔ |
W2: The proposed method exploits the null space / editing space via SVD … How was this work using SVD differently and using the null space differently?
Thanks for your comment. Our method uses the null space with SVD to achieve prior-preserved concept erasure in an editing-based manner. To mitigate the optimization difficulty in null-space-constrained editing, our main contribution lies in the proposed Prior Knowledge Refinement strategy with three complementary modules (IPF, DPA, and IEC) to construct an accurate and reliable null space. Our differences from the mentioned methods are discussed as follows:
- GLoCE is inspired by LEACE, exploring an SVD-based subspace to remove concept-specific components from diffusion models. While both GLoCE and our method focus on concept erasure in diffusion models, the motivation and implementation are significantly different:
- Motivation: GLoCE uses SVD to enable efficient erasure, addressing the overhead of full-rank LEACE projections. Specifically, it performs SVD on target and mapping embeddings to extract their principal components. Then, these principal components are used to compute optimized low-rank LEACE projections for efficient concept erasure. In contrast, our method focuses on prior preservation. We perform SVD on retain embeddings and extract the null-space components to construct a subspace orthogonal to the retain concepts, which ensures that all edits leave the retained concepts unaffected.
- Implementation: GLoCE performs SVD on the covariance of image embeddings, which requires image sampling and a forward pass through the diffusion generator network for each concept. In contrast, our method performs SVD on text embedding tokens with only text inputs (see the sketch after this comparison). This design avoids additional image sampling and only requires text embeddings for each concept within seconds, making it significantly more efficient than repeatedly forward-passing diffusion generators.
Despite these differences, we think these two perspectives are complementary in practice: GLoCE enhances the concept erasure precision, while our method enables scalable, precise, and efficient editing with prior preservation over the retain set. We believe integrating insights from both directions could inspire future works.
- RLACE, similar to GLoCE, uses SVD to identify the concept subspace to be erased and removes it via orthogonal projection, while our method uses SVD to define the null space of non-target concepts and constrains updates to lie entirely within it.
- UNSC leverages null-space projection to map training gradients onto the null space of the retained subspaces, thereby mitigating the over-unlearning problem in machine unlearning. In contrast, our method projects the parameter update onto the null space of the remaining concepts in a closed-form solution, without requiring any training. Moreover, we introduce three complementary modules to refine a more accurate and reliable null space, leading to significant improvements in prior preservation.
W3: It seems that the proposed method focuses on remaining concepts while not much focusing on erasing concepts and thus the proposed method does not seem to achieve SOTA performance in erasing.
Thank you for your comment. We would like to clarify that our method is not focused on preserving non-target concepts at the expense of erasure efficacy. Rather, it is designed to jointly optimize both objectives, addressing the well-known trade-off between erasure and preservation, particularly in multi-concept settings. In our work, we address this fundamental joint challenge by introducing a null-space constrained formulation combined with adaptive refinement of the retain set, explicitly emphasizing both erasure and preservation.
This is supported by our experimental results in Tables 1 & 2, where we achieve SOTA prior preservation as well as competitive erasure performance. Though our CS on erased concepts is not the lowest among all methods, it is already sufficiently low to indicate successful erasure. As shown in Figs. 4 & 5, the target concepts are clearly removed. This is further supported by Figs. 7 & 8 in the Appendix, where visually successful erasures can result in a wide range of CS scores. For example, erasing Snoopy into a generic dog (ours) and into meaningless content (RECE) yields a roughly 6-point drop and a more-than-10-point drop in CS, respectively, even though the Snoopy concept is effectively erased in both cases.
W4: This work seems quite inefficient for NudeNet detection on I2P, which is one of the most important applications that this work also claimed.
We agree that NSFW concepts such as nudity are important in concept erasure. Recent works (e.g., CPE, AdvUnlearn, and RACE) have shown strong results on this task, largely due to the use of adversarial training. In contrast, our method mainly focuses on the scalability, precision, and efficiency of concept erasure without introducing additional nudity-specific training mechanisms. As shown in Table 8, our method already achieves better performance than non-adversarial-training methods (e.g., MACE and UCE) for NudeNet detection on I2P.
To make a fair comparison with adversarial-training-based methods, we extend SPEED with adversarial training/editing following RECE, as discussed in Appendix E, and conduct the comparison below:
| Method | I2P Total ↓ | COCO CS ↑ | COCO FID ↓ | Time (s) ↓ | White-Box Attack |
|---|---|---|---|---|---|
| SD v1.4 | 576 | 26.53 | - | - | - |
| CPE | 38 | 26.32 | 48.23 | 2000 (×555) | ✖ |
| AdvUnlearn | 23 | 24.05 | 57.22 | 15860 (×4400) | ✔ |
| RACE | 134 | 25.54 | 42.73 | 2910 (×800) | ✔ |
| Receler | 76 | 25.93 | 40.29 | 5560 (×1500) | ✖ |
| Erasing-Adversarial-Preservation | 65 | 25.32 | 58.28 | 15864 (×4400) | ✔ |
| Ours w/o AT | 113 | 26.29 | 37.82 | 3.6 (×1) | ✔ |
| Ours w/ AT | 55 | 26.03 | 39.51 | 4.5 (×1.25) | ✔ |
With adversarial training/editing, our method exhibits a significant improvement in I2P Total from 113 to 55, inferior only to CPE and AdvUnlearn. However, our method can defend against white-box attacks and achieves high efficiency, taking only 4.5 s compared to these methods. Due to the limited rebuttal period, we provide this as preliminary evidence of our potential in adversarial settings, and we will include a more comprehensive study in the revised version.
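For reference, the adversarial editing extension can be sketched as the following illustrative loop (the closed-form embedding-search step shown here via regularized least squares is a simplifying assumption in the spirit of RECE, not its exact formulation; function and variable names are hypothetical):

```python
import torch

def adversarial_editing_loop(edit_fn, W_edited, W_orig, target_emb, n_rounds=3):
    """Illustrative RECE-style adversarial editing loop (our simplification).

    edit_fn(erase_embs) -> new edited weight given the current erase set
    W_edited, W_orig: (d_out, d_in) edited / original projection weights
    target_emb: (d_in,) embedding of the concept being erased
    """
    erase_embs = [target_emb]
    for _ in range(n_rounds):
        # Find an embedding that, under the *edited* weights, still reproduces
        # the original target response; here via regularized least squares.
        target_out = W_orig @ target_emb                        # (d_out,)
        A = W_edited.T @ W_edited + 1e-2 * torch.eye(W_edited.shape[1])
        adv_emb = torch.linalg.solve(A, W_edited.T @ target_out)
        erase_embs.append(adv_emb)
        # Re-run the closed-form edit with the adversarial embedding included.
        W_edited = edit_fn(torch.stack(erase_embs))
    return W_edited
```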
W5: It is important to ensure that the concept erasing method is robust against adversarial attacks.
Thanks for your advice. Due to the character limit, please refer to our response to Reviewer WTi1, comment W4.
Dear Reviewer jEfu:
Thanks for your constructive comments. We would like to follow up to see if our response addresses your concerns or if you have any further questions. Thanks for your attention and best regards.
Faithfully,
Authors of Paper 14342
Dear Reviewer jEfu,
Thank you again for your valuable efforts and constructive advice in reviewing our paper. As the discussion period nears its end, we look forward to your feedback on our responses. We have made every effort to address all your concerns and are happy to clarify any points or discuss any remaining questions.
Best regards,
Authors of Paper 14342
Dear Area Chairs and Reviewers,
We are sincerely grateful to all the reviewers for their insightful and constructive feedback on our manuscript. The discussion period has been exceptionally productive, allowing us to clarify critical aspects of our work and further validate its strengths through supplementary experiments. We are pleased that our responses have earned positive feedback from four out of five reviewers, culminating in a consensus towards acceptance.
I. Acknowledged Methodological Contributions and Strengths
The reviewers consistently highlighted several key strengths and contributions of our SPEED method:
- Methodological Novelty: Our proposed framework was praised for its novelty and technical depth. Reviewers zWMb, WTi1, 1DqN, and Rrzc commended our proposed three core modules, Influence-based Prior Filtering (IPF), Directed Prior Augmentation (DPA), and Invariant Equality Constraints (IEC), as "novel and reasonable" (zWMb). This framework was recognized for stemming from a "deep insight" into T2I model behavior (Rrzc) and for successfully "improving and adapting" the null-space concept for diffusion models (1DqN). Collectively, these components construct a "more accurate null space" to effectively address the core challenges of multi-concept erasure.
- Exceptional Performance: The reviewers consistently highlighted the exceptional performance of SPEED.
  - Efficiency & Scalability: Our method was praised for its "efficient computation" (jEfu), with the ability to erase 100 concepts in just 5 seconds (zWMb). This represents a speedup of "thousands of times" compared to existing methods and was recognized as making it "much faster" (1DqN), demonstrating its significant potential as a scalable solution for real-world applications.
  - Erasure & Preservation: Reviewers acknowledged that our method achieves "better prior preservation in challenging multi-concept erasure scenarios" (zWMb) and delivers "good experimental results" across various erasure tasks (Rrzc).
- Empirical Rigor and Clarity: Our work was commended for its "comprehensive experiments" (zWMb) and "several ablation studies" (Rrzc), which clearly demonstrated that "each introduced technique boosts performance." Furthermore, multiple reviewers described the paper as "well-written" and "easy to follow" (jEfu, 1DqN, WTi1), ensuring the accurate communication of our technical contributions.
II. Addressed Concerns and Paper Enhancements
During the discussion, we addressed key questions raised by the reviewers through in-depth responses and additional work, which significantly enhanced the completeness of our paper:
- Comprehensive Comparison with More Methods: As suggested by jEfu, Rrzc, and 1DqN, we incorporated a comprehensive comparison with more recent methods (e.g., CPE, GLoCE, Receler, RACE, and AdaVD), including both experimental results and a categorized discussion. This clarifies SPEED's unique advantages in erasure performance, efficiency, and white-box applicability.
- Validation of Adversarial Robustness: To address the concerns of jEfu, WTi1, and 1DqN regarding robustness, we conducted new experiments on adversarial attacks. The results demonstrate that SPEED, when combined with adversarial training (AT), effectively defends against both black-box and white-box attacks, achieving robustness comparable to SOTA methods while maintaining a significant efficiency advantage.
- Methodological Details and Hyperparameter Analysis: We provided detailed ablation studies in response to questions from WTi1, zWMb, Rrzc, and 1DqN on hyperparameters, such as the filtering threshold in IPF and the noise variance in DPA, quantifying their impact on performance. We also added algorithm pseudocode and key equation derivations to strengthen the paper's technical rigor.
III. Outcome of the Discussion
We are delighted that our detailed responses and supplementary work were positively received. Reviewers 1DqN and Rrzc explicitly raised their scores after confirming that their concerns had been fully addressed. Reviewers WTi1 and zWMb expressed satisfaction with our rebuttals and maintained their positive scores. While we did not receive further feedback from Reviewer jEfu, we believe our responses have resolved the concerns raised. Overall, the discussion phase has resulted in a strong and positive consensus on our work.
We commit to integrating all promised revisions, new experiments, and analyses into the final version of the paper. We are confident that, thanks to this rigorous peer-review process, the quality and impact of our work have been substantially improved.
Sincerely,
Authors of Paper 14342
(a) Summarize the scientific claims and findings of the paper based on your own reading and characterizations from the reviewers.
The paper proposes SPEED, a method for concept erasure in text-to-image diffusion models, claiming it to be scalable, precise, and efficient. The core idea is to perform model editing within a null space, a parameter space where updates do not affect non-target concepts, thus preserving prior knowledge. To construct an accurate null space, the authors introduce three complementary techniques: Influence-based Prior Filtering (IPF), Directed Prior Augmentation (DPA), and Invariant Equality Constraints (IEC). The primary finding is the method's efficiency, reportedly erasing 100 concepts within 5 seconds, a significant speed-up over existing methods, while maintaining a strong balance between erasure and preservation.
(b) What are the strengths of the paper?
The paper's primary strength lies in its efficiency and scalability. The ability to erase 100 concepts in just 5 seconds is a notable achievement and was consistently praised by reviewers. The proposed framework, combining IPF, DPA, and IEC, is well-motivated and demonstrates strong empirical performance in preserving non-target concepts. The paper is well-written, clearly structured, and easy to follow.
(c) What are the weaknesses of the paper?
The most significant weakness, as pointed out by multiple reviewers, is the limited conceptual novelty of the core approach. The use of null-space projection to preserve knowledge is an established technique in other fields like continual learning and NLP, making the contribution more incremental than foundational. Furthermore, the proposed method is a complex combination of multiple heuristic components. While effective, this complexity may obscure the fundamental principles at play and present more of an engineering solution than a new scientific insight. The initial submission was also missing comparisons to several recent and relevant works and lacked evaluation against adversarial attacks, though these were later addressed in the rebuttal.
(d) Provide the most important reasons for your decision to reject.
The decision to reject this paper is primarily based on concerns about its limited conceptual novelty. While the empirical results are strong and the engineering is solid, the core idea of applying null-space constraints for knowledge preservation is borrowed from other domains. The paper's contribution lies in adapting and refining this existing concept for diffusion models via a complex suite of heuristics (IPF, DPA, IEC). Although this adaptation is non-trivial and effective, it does not represent a fundamental advance in the theory or understanding of concept erasure. Several reviewers, even those leaning towards acceptance, noted that the paper might not meet the high standard for novelty at a top-tier venue. Therefore, despite its efficiency, the work is considered an incremental contribution rather than a groundbreaking one, which is the main reason for rejection.
(e) Summarize the discussion and changes during the rebuttal period.
The rebuttal period was productive. Reviewers raised several key points, including: the lack of comparisons with more recent methods; the need for adversarial robustness evaluation; and requests for more detailed hyperparameter ablations. The authors provided a thorough rebuttal, addressing nearly all empirical concerns. They added extensive new experiments comparing their work with SOTA methods, demonstrated robustness by combining SPEED with adversarial training, and provided detailed ablation studies on key hyperparameters. These additions significantly strengthened the paper's empirical validation, and as a result, four out of five reviewers confirmed or raised their positive scores.
However, the rebuttal did not resolve the fundamental concern about the work's conceptual novelty. While the empirical weaknesses were well-addressed, the characterization of the work as an incremental adaptation of an existing idea remains. In the final decision, the thoroughness of the rebuttal was acknowledged, but the core issue of limited novelty was given more weight, as this is a primary criterion for acceptance at this venue. The consensus among reviewers that the paper could be rejected if the bar for novelty is high was a decisive factor.