DiffVax: Optimization-Free Image Immunization Against Diffusion-Based Editing
We introduce DiffVax, an end-to-end framework for training an "immunizer model" that learns how to generate imperceptible perturbations to immunize target images against diffusion-based editing.
Abstract
Reviews and Discussion
This paper proposes DiffVax, an optimization-free and scalable defense framework against diffusion-based image editing. Unlike prior methods that require per-image optimization, DiffVax introduces a fast, generalizable perturbation strategy that reduces immunization time. The method prevents editing across images and videos and supports various diffusion-based tools.
Strengths and Weaknesses
Strengths:
The paper addresses a valuable and timely problem for the community—defending against diffusion-based image editing. The proposed method, DiffVax, represents a significant conceptual improvement by eliminating the need for per-image optimization, resulting in a substantial gain in computational efficiency. The experimental evaluation is thorough, with comparisons across multiple baselines and metrics. Although I am not an expert in image immunization, the paper clearly situates itself within the existing literature and provides detailed comparisons to prior methods, making the contributions easy to follow.
Weaknesses:
The evaluation of editing success is limited to only two methods: inpainting and InstructPix2Pix. However, inpainting is a basic editing technique, and InstructPix2Pix is a training-based method whose editing effectiveness depends heavily on the pretraining dataset. To convincingly claim robustness against image editing, the defense should be tested against a broader range of training-free editing methods, such as [1][2]. Even among training-based methods, InstructPix2Pix is relatively outdated. Many stronger editing approaches have been developed in recent years, such as [3][4][5], and the authors should reconsider the evaluation setup to include more recent and representative editing baselines.
Reference
[1] Hertz, Amir, et al. "Prompt-to-prompt image editing with cross attention control." arXiv preprint arXiv:2208.01626 (2022).
[2] Cao, Mingdeng, et al. "Masactrl: Tuning-free mutual self-attention control for consistent image synthesis and editing." Proceedings of the IEEE/CVF international conference on computer vision. 2023.
[3] Mokady, Ron, et al. "Null-text inversion for editing real images using guided diffusion models." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2023.
[4] Zhao, Haozhe, et al. "Ultraedit: Instruction-based fine-grained image editing at scale." Advances in Neural Information Processing Systems 37 (2024): 3058-3093.
[5] Wasserman, Navve, et al. "Paint by inpaint: Learning to add image objects by removing them first." arXiv preprint arXiv:2404.18212 (2024).
Questions
- Does the proposed method come with any theoretical analysis or justification?
- Can the authors evaluate their defense against a broader range of diffusion-based image editing methods?
Limitations
Yes
Final Justification
I think the authors have adequately addressed my concerns during the rebuttal phase. I raise my score and recommend acceptance.
Formatting Issues
No.
We thank the reviewer for their valuable feedback and for recognizing the strengths of our work, including addressing a "valuable and timely problem," representing a "significant conceptual improvement," and providing a "thorough experimental evaluation." We appreciate the opportunity to address the questions and expand on our evaluation.
The evaluation of editing success is limited to only two methods: inpainting and InstructPix2Pix. However, inpainting is a basic editing technique, and InstructPix2Pix is a training-based method whose editing effectiveness depends heavily on the pretraining dataset. To convincingly claim robustness against image editing, the defense should be tested against a broader range of training-free editing methods, such as [1][2]. ... Even among training-based methods, InstructPix2Pix is relatively outdated. Many stronger editing approaches have been developed in recent years, such as [3][4][5], and the authors should reconsider the evaluation setup to include more recent and representative editing baselines. Can the authors evaluate their defense against a broader range of diffusion-based image editing methods?
We agree that the landscape of editing models is vast and rapidly evolving. Our choice of evaluation models was guided by the established benchmarks in the image immunization literature. As noted in our paper, our primary evaluation using inpainting follows the exact setup of prior state-of-the-art methods like PhotoGuard and DiffusionGuard, ensuring a fair and direct comparison. Furthermore, we deliberately included InstructPix2Pix to demonstrate that our framework generalizes beyond this standard inpainting benchmark to a fundamentally different, instruction-based editing paradigm.
To contextualize our evaluation scope, the following table lists the editing tools used in prior works:
| Method | Editing Models Used |
|---|---|
| DiffVax (Ours) | SD Inpainting, IP2P |
| DiffusionGuard (ICLR 2025) | SD Inpainting, IP2P |
| PhotoGuard (ICML 2023) | SD Inpainting, SDEdit |
| SDS (ICLR 2024) | SDEdit, SD Inpainting, Textual inversion |
| Mist (ICML 2023) | Textual inversion, Dreambooth |
| AdvDM (ICML 2023) | Textual inversion, SDEdit |
As the table demonstrates, the scope of our evaluation is aligned with current best practices in the field. However, we agree with the reviewer that testing against modern, training-free methods would significantly strengthen our claims, and we have conducted an additional experiment evaluating our immunization against the MagicBrush editing model.
Methods like Prompt-to-Prompt [1] and Null-text inversion [3] represent a different editing paradigm. Unlike the inpainting models used in our primary evaluation, these techniques operate on the standard Stable Diffusion model with different formulations and are therefore not directly compatible with the experimental setup used for our method and the baselines. Adapting our framework to protect against these models is a research challenge that we believe is an exciting direction for future work. To directly address the reviewer's suggestion, we evaluated DiffVax on a recent editing model, MagicBrush [6]. Our preliminary results show that our learned perturbations remain effective at disrupting the edits, demonstrating that the protection generalizes beyond standard inpainting models. We plan to include the full evaluation in the final version.
MagicBrush Comparison
| MagicBrush | SSIM | PSNR | FSIM | CLIP-T | SSIM (Noise) |
|---|---|---|---|---|---|
| PhotoGuard | 0.682 | 18.81 | 0.546 | 25.64 | 0.967 |
| DiffVax | 0.635 | 18.41 | 0.529 | 22.18 | 0.965 |
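For context on the metrics in the table above, the sketch below shows how SSIM, PSNR, and CLIP-T are typically computed, using scikit-image and the Hugging Face CLIP model. This is an illustrative approximation of the protocol rather than the authors' evaluation code; FSIM would come from a separate package such as piq, and the choice to compare the edit of the immunized image against the edit of the clean image is an assumption.

```python
import numpy as np
import torch
from skimage.metrics import structural_similarity, peak_signal_noise_ratio
from transformers import CLIPModel, CLIPProcessor

def ssim_psnr(edited_immunized: np.ndarray, edited_clean: np.ndarray):
    """Compare the edit of the immunized image with the edit of the clean image;
    lower SSIM/PSNR indicates stronger disruption. Arrays are HxWx3 floats in [0, 1]."""
    s = structural_similarity(edited_immunized, edited_clean,
                              channel_axis=-1, data_range=1.0)
    p = peak_signal_noise_ratio(edited_clean, edited_immunized, data_range=1.0)
    return s, p

clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_t(edited_image, prompt: str) -> float:
    """CLIP-T: cosine similarity (x100) between the edited image (PIL or array)
    and the edit prompt; lower values mean the edit failed to follow the prompt."""
    inputs = clip_processor(text=[prompt], images=edited_image,
                            return_tensors="pt", padding=True)
    with torch.no_grad():
        out = clip_model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return 100.0 * (img * txt).sum(dim=-1).item()

# Example usage with random stand-in images:
print(ssim_psnr(np.random.rand(64, 64, 3), np.random.rand(64, 64, 3)))
```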
Does the proposed method come with any theoretical analysis or justification?
While the paper is primarily empirical, it provides a strong conceptual justification for its design, rooted in its end-to-end training process. A key property is the framework's ability to backpropagate a loss signal through the entire editing process. The training framework, depicted in Figure 3 of the paper, calculates the final loss $\mathcal{L} = \alpha \cdot \mathcal{L}_{\text{noise}} + \mathcal{L}_{\text{edit}}$ after an immunized image has been passed through the Stable Diffusion editing model. By training end-to-end, the immunizer model develops a holistic understanding of the complete editing pipeline. This allows the model to learn a globally informed strategy for disruption, avoiding the local minima that can limit the effectiveness of per-example optimization methods like PhotoGuard.
This global strategy manifests as the model learning to strategically place perturbations in an image's low-frequency components. As detailed in Appendix A.3, this is fundamentally different from the scattered, high-frequency noise introduced by optimization-based methods like PhotoGuard. This low-frequency approach is doubly advantageous: it enhances perceptual quality by creating smoother, more subtle noise, and it provides inherent robustness against counter-attacks like JPEG compression and denoising, which are specifically designed to suppress high-frequency information.
Furthermore, this learned behavior is a direct and tunable consequence of the training scheme. Appendix A.3 shows that the choice of norm in the loss function directly impacts the trade-off between imperceptibility and robustness: while the default norm offers a strong balance, training with alternative norms produces even less perceptible noise, at the cost of reduced edit resistance. This demonstrates that the model's ability to strategically leverage low-frequency perturbations is a core, justifiable element of the methodology, controlled directly by its training objective.
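To make the end-to-end training described above concrete, here is a minimal PyTorch-style sketch. The tiny immunizer network, the frozen stand-in `editor`, the gray disruption target, the L1 noise penalty, and the weight value are all illustrative assumptions for exposition, not the authors' actual architecture or loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative immunizer: a small conv net that maps an image to a bounded perturbation.
immunizer = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),
)
# Frozen stand-in for the multi-step diffusion editing pipeline; in the real
# system this would be a differentiable wrapper around the Stable Diffusion editor.
editor = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1))
for p in editor.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.Adam(immunizer.parameters(), lr=1e-4)
alpha = 4.0  # trade-off weight (illustrative value; the paper ablates alpha in Appendix A.9)

def train_step(images: torch.Tensor) -> float:
    delta = 0.1 * immunizer(images)                    # single forward pass -> perturbation
    immunized = (images + delta).clamp(0.0, 1.0)
    edited = editor(immunized)                         # gradients flow through the frozen editor
    # L_edit stand-in: push the edited output toward a degraded (gray) target.
    loss_edit = F.mse_loss(edited, torch.full_like(edited, 0.5))
    # L_noise stand-in: illustrative L1 imperceptibility penalty on the perturbation.
    loss_noise = delta.abs().mean()
    loss = alpha * loss_noise + loss_edit
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage with random data standing in for the training set:
print(train_step(torch.rand(2, 3, 64, 64)))
```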
[6] MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing
Thank you very much for the detailed rebuttal. I appreciate the authors' efforts in clarifying the evaluation setup and providing additional experiments, especially the inclusion of results on MagicBrush, which help to strengthen the generalization claims.
I also acknowledge the authors' point that many training-free editing methods—such as Prompt-to-Prompt and Null-text inversion—operate under different formulations of diffusion, making direct integration into the current evaluation framework nontrivial. I still encourage the authors to include a broader discussion or additional evaluation of modern editing techniques in the final version of the paper, as this would further contextualize and support the defense claims.
I will take the authors' clarifications and new results into account in my final evaluation. Thank you again for the thoughtful response.
We are grateful to the reviewer for their thoughtful consideration of our rebuttal and their valuable suggestions. We are happy to clarify any remaining points and will be sure to incorporate a more comprehensive discussion of contemporary editing methods in the final manuscript, as recommended.
This paper introduces DiffVax, an optimization-free framework for immunizing images against diffusion-based editing attacks. DiffVax trains an immunizer model to generate imperceptible adversarial noise with a single forward pass. The approach generalizes to previously unseen content and supports both inpainting and instruction-driven editing models. Empirical results demonstrate that DiffVax achieves competitive editing disruption and robustness to counterattacks. The framework also extends to video immunization.
Strengths and Weaknesses
- This paper is well-written and well-organized. The topic is important and practical.
- DiffVax is evaluated over both image and video scenarios, demonstrating its broader impact.
Questions
- DiffVax currently requires retraining for each diffusion editing tool. The immunizer does not transfer directly between different editing models (e.g., between inpainting and instruction-driven tools), which is a significant limitation for general deployment. While this is acknowledged, its practical impact is not deeply explored. In fact, the reason that DiffVax is optimization-free is because the editing tool information has already been trained and stored in the immunizer networks. From this perspective, DiffVax doesn't demonstrate better generalization and applicability than prior optimization-based work.
- The claims about imperceptibility rely largely on SSIM metrics and a short description of the user study. The exact structure and depth of the user study are not detailed. A more common imperceptibility measurement is L-p norm, such as 8/255, 16/255.
- The method is largely an application of adversarial noise training/regulation to diffusion-image editing rather than a fundamentally new adversarial paradigm. While the scalability and specific application to editing via a learned immunization network are advances, the core methodology can be viewed as an adaptation of established adversarial machine learning concepts.
Limitations
See above.
Formatting Issues
None
We thank the reviewer for their thoughtful feedback and for recognizing that our paper is "well-written and well-organized" and addresses an "important and practical" topic. Below, we clarify the raised points.
DiffVax currently requires retraining for each diffusion editing tool. The immunizer does not transfer directly between different editing models (e.g., between inpainting and instruction-driven tools), which is a significant limitation for general deployment. While this is acknowledged, its practical impact is not deeply explored. In fact, the reason that DiffVax is optimization-free is because the editing tool information has already been trained and stored in the immunizer networks. From this perspective, DiffVax doesn't demonstrate better generalization and applicability than prior optimization-based work.
We agree that a universal immunizer that works zero-shot across all editing model architectures (e.g., inpainting, InstructPix2Pix) is a challenging and important open problem. Our work takes a critical first step by creating a highly efficient framework for model-specific immunization.
Regarding the statement "DiffVax doesn't demonstrate better generalization and applicability than prior optimization-based work," we want to clarify that there are three distinct types of generalization: to unseen models, to unseen content, and to unseen masks. DiffVax is superior to existing optimization-based methods like PhotoGuard in all three, as detailed below.
- Generalization to Unseen Models: This is the type of generalization the reviewer is referring to in pointing out that DiffVax must be trained separately for each model and type of attack. We want to stress that there are no existing immunization methods which achieve this type of generalization, and methods like PhotoGuard also target a single specific model. While generalization to unseen models is not our primary focus, DiffVax demonstrates superior generalization performance in comparison to prior work. As shown in Appendix A.4, Figure 14, the DiffVax immunizer trained on Stable Diffusion v1.5 successfully transfers its protective effect to an unseen model, Stable Diffusion v2. In stark contrast, PhotoGuard's perturbations fail completely when transferred to SD v2, rendering the image unprotected. This provides direct evidence that our learned immunization strategy is more robust and generalizable across model versions than the optimization-based approach.
To shed more light on this issue, we conducted an additional experiment to quantify the improvement in generalization achieved by DiffVax. The table below shows quantitatively that our method achieves superior performance when immunization noise is generated for SD v1.5 and tested on the unseen SD v2.
| SD 2.0 | SSIM | PSNR | FSIM | CLIP-T |
|---|---|---|---|---|
| PG-D | 0.566 | 15.17 | 0.417 | 32.00 |
| DiffGuard | 0.609 | 15.26 | 0.454 | 31.73 |
| DiffVax | 0.540 | 14.02 | 0.384 | 27.72 |
- Generalization to Unseen Content: Optimization-based methods can be said to automatically "generalize" to unseen images, at the expense of a costly per-image optimization process. This paper addresses the question of whether it is possible to learn a feedforward model capable of producing an effective perturbation directly without optimization. This has practical consequences, because a feedforward approach is much more efficient computationally, but it also has scientific consequences, because success implies that the set of perturbations across all possible images has sufficient structure and regularity to be learnable. Our experiments demonstrate that DiffVax is capable of generalizing to unseen images, unseen prompts, and even unseen videos with a single forward pass, thereby establishing for the first time the learnability of the perturbation set.
- Generalization to Unseen Masks during Test Time: Our work already demonstrates robustness to mismatched masks, a key aspect of real-world applicability. As detailed in our supplementary material (Appendix A.5), we show that DiffVax is uniquely robust to this scenario. The qualitative results in Figure 15 of the appendix demonstrate this clearly.
Therefore, DiffVax demonstrates significantly better generalization to unseen content, unseen models, and superior robustness to unseen editing masks, all while being orders of magnitude more efficient (~70ms vs. >10 minutes per image), making it far more practical for deployment.
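To illustrate what the ~70 ms single-pass protection above looks like at deployment time, here is a small hedged sketch; the convolutional `immunizer` is a stand-in for the trained DiffVax network, and the scaling constant is arbitrary.

```python
import time
import torch

# Hypothetical trained immunizer (stand-in); in practice this would be the
# learned DiffVax network loaded from a checkpoint.
immunizer = torch.nn.Conv2d(3, 3, 3, padding=1).eval()

image = torch.rand(1, 3, 512, 512)   # unseen image to protect

with torch.no_grad():
    start = time.perf_counter()
    delta = immunizer(image)                           # one forward pass, no per-image optimization
    immunized = (image + 0.05 * delta).clamp(0, 1)
    elapsed_ms = 1000 * (time.perf_counter() - start)

print(f"immunized in {elapsed_ms:.1f} ms")             # milliseconds, vs minutes for PGD-based methods
```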
The claims about imperceptibility rely largely on SSIM metrics and a short description of the user study. The exact structure and depth of the user study are not detailed. A more common imperceptibility measurement is L-p norm, such as 8/255, 16/255.
We have run experiments comparing DiffVax's learned perturbations against baselines with fixed 16/255, 32/255, and 64/255 budgets, evaluating each method by the mean magnitude (L1) of the immunization noise (perturbation). The results clearly show that DiffVax achieves superior edit disruption with a much smaller mean L1 perturbation than baselines operating under far larger budgets, highlighting that its strength lies in the strategic placement of noise, not simply its magnitude.
| Method | SSIM | PSNR | FSIM | CLIP-T | SSIM (Noise) | Mean Magnitude (L1) of Immunization Noise |
|---|---|---|---|---|---|---|
| PG-D () | 0.492 | 14.13 | 0.355 | 27.85 | 0.947 | 0.007 |
| DiffGuard () | 0.507 | 13.98 | 0.360 | 24.83 | 0.900 | 0.012 |
| PG-D () | 0.502 | 14.23 | 0.360 | 29.18 | 0.950 | 0.006 |
| DiffGuard () | 0.526 | 14.30 | 0.373 | 26.13 | 0.927 | 0.009 |
| PG-D () | 0.528 | 14.60 | 0.387 | 30.27 | 0.978 | 0.003 |
| DiffGuard () | 0.546 | 14.46 | 0.388 | 26.36 | 0.965 | 0.005 |
| DiffVax | 0.496 | 13.85 | 0.352 | 22.96 | 0.989 | 0.001 |
This supports our claim that DiffVax learns a more efficient and targeted noise distribution rather than applying uniform, high-energy noise. Unlike methods that enforce a rigid and uniform L-p budget, DiffVax implicitly learns the perturbation's properties via the trade-off in our loss function, $\mathcal{L} = \alpha \cdot \mathcal{L}_{\text{noise}} + \mathcal{L}_{\text{edit}}$. This allows the model to strategically allocate its "budget," applying stronger noise only where it is most effective and least visible.
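As a concrete illustration of the budget comparison above, the snippet below contrasts the fixed L-infinity projection typically applied to the baselines with the unconstrained, learned perturbation whose mean L1 magnitude is reported in the last column of the table; all tensors here are random stand-ins.

```python
import torch

original = torch.rand(1, 3, 512, 512)
immunized = torch.rand(1, 3, 512, 512)        # stand-in for a protected image

delta = immunized - original

# Baselines: perturbation projected onto a fixed L-infinity ball (e.g., 16/255).
eps = 16 / 255
delta_budgeted = delta.clamp(-eps, eps)

# DiffVax: no explicit budget; report the mean L1 magnitude instead
# (the "Mean Magnitude (L1) of Immunization Noise" column above).
mean_l1 = delta.abs().mean().item()
print(f"mean |delta| = {mean_l1:.4f}, max |delta| = {delta.abs().max().item():.4f}")
```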
On the User Study: We provide a comprehensive description in Appendix A.10. The study involved 67 participants recruited via Prolific. Participants were shown 20 different image-prompt sets and asked to rank five edited images (unprotected, plus four immunization methods) from "least aligned to most aligned" with the prompt's intent. A lower rank indicates more successful immunization. Figure 17 in the appendix provides a screenshot of the exact instructions given to participants. The results in Table 7 show DiffVax was ranked best by a significant margin (average rank 1.64), confirming its perceptual effectiveness.
We will incorporate the new tables and more clearly reference the user study appendix in the main paper.
The method is largely an application of adversarial noise training/regulation to diffusion-image editing rather than a fundamentally new adversarial paradigm. While the scalability and specific application to editing via a learned immunization network are advances, the core methodology can be viewed as an adaptation of established adversarial machine learning concepts.
We agree that our work builds upon foundational concepts in adversarial machine learning. Our novelty lies in successfully adapting this paradigm to solve the unique and significant challenges posed by diffusion-based editing, which differ substantially from traditional classification tasks.
The key challenge we address is that diffusion-based editing is not a single feed-forward process but a multi-step, iterative denoising pipeline. A major contribution of our work is designing a framework that can backpropagate a loss signal through this entire black-box editing process. This end-to-end training allows our immunizer model to develop a holistic understanding of the full editing pipeline.
This leads to our second key advance: the model doesn't just add noise, it learns a globally optimal and strategic perturbation strategy. As discussed in Appendix A.3, our method learns to place perturbations in low-frequency image components. This is fundamentally different from the high-frequency, scattered noise generated by optimization methods and makes our defense inherently more robust against counter-attacks like JPEG compression and denoising, which are designed to remove high-frequency information.
Therefore, our contribution is the novel adaptation of these concepts to the diffusion domain, addressing its specific challenges to create the first scalable, fast, and robustly learned immunization network.
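The low- versus high-frequency characterization above can be checked with a simple Fourier analysis of a perturbation; the sketch below (NumPy, with an arbitrary radial cutoff) is illustrative rather than the exact analysis used in Appendix A.3.

```python
import numpy as np

def frequency_energy_split(delta: np.ndarray, cutoff: float = 0.1):
    """Fraction of perturbation energy below/above a normalized radial frequency
    cutoff. `delta` is a single-channel HxW perturbation; the cutoff is illustrative."""
    spectrum = np.fft.fftshift(np.fft.fft2(delta))
    h, w = delta.shape
    yy, xx = np.mgrid[-h // 2:h - h // 2, -w // 2:w - w // 2]
    radius = np.sqrt((yy / h) ** 2 + (xx / w) ** 2)   # normalized radial frequency
    energy = np.abs(spectrum) ** 2
    low = energy[radius <= cutoff].sum()
    total = energy.sum()
    return low / total, 1.0 - low / total

# Example: a smooth (low-frequency) perturbation concentrates its energy below the cutoff.
x = np.linspace(0, 2 * np.pi, 256)
smooth = 0.01 * np.sin(x)[None, :] * np.sin(x)[:, None]
low_frac, high_frac = frequency_energy_split(smooth)
print(f"low-frequency energy fraction: {low_frac:.3f}")
```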
Thank you for the detailed response and for providing additional experiments. However, my primary concern remains unaddressed:
- The crux of my concern is that transferability to unseen editing models is essential for this task, far more so than content or mask transferability. In real-world scenarios, the defender will never know in advance which editing model an attacker may apply. Thus, cross-model robustness is critical for practical deployment. While you emphasize "optimization-free" and repeatedly reference transferability, the presented experiments still focus on intra-family transfer (e.g., SD v1.5 → SD v2) rather than cross-architecture transfer. The current results do not convincingly demonstrate the claimed generalization capability.
- You note that PhotoGuard does not support cross-model transfer, but it is important to recognize that PhotoGuard was proposed in early 2023 as an early-stage work in this domain. Since then, transferability, especially cross-model adversarial robustness, has become a standard evaluation target in diffusion model adversarial research (see, e.g., [1]). Therefore, simply noting that PhotoGuard lacks this property does not address the expectation that a modern method like DiffVax should aim for and demonstrate it.
- While I appreciate the content and mask generalization results, they are secondary in importance compared to cross-model transferability. In most practical scenarios, the user is fully aware of their own content but not of the attacker's editing model. This reinforces why model-level generalization should be the primary focus.
Reference: [1] Chen, Jianqi, et al. "Diffusion models for imperceptible and transferable adversarial attack." IEEE Transactions on Pattern Analysis and Machine Intelligence (2024).
We thank the reviewer for their detailed feedback. We agree that universal, cross-model transferability is a critical goal for the field. However, we want to clarify the place of our work in the relevant literature.
1. Context of Prior Work on Transferability:
We agree with the reviewer on the importance of robust cross-architecture transferability. However, we would like to note that this capability is beyond the current state of the art. Our paper's main contribution addresses the efficiency and scalability bottleneck of prior methods, and we have acknowledged the transferability of immunization noise as a limitation for future work in our paper. To our knowledge, no existing immunization method for preventing diffusion-based content editing has yet achieved this level of universal transferability.
While all baselines [PhotoGuard [2] (ICML 2023), DAYN [3] (CVPR 2024 Highlight, no code available), DiffusionGuard [4] (ICLR 2025)] are model-specific, only DiffusionGuard provides a limited transferability discussion, through an experiment in which perturbations crafted for an SD v1-series model are transferred to SD v2. Within this established evaluation context, the quantitative results provided in our rebuttal show that DiffVax outperforms DiffusionGuard and PhotoGuard when transferring from SD v1.5 to the unseen SD v2 model.
Moreover, as discussed in DiffusionGuard (Section 4.6 of their paper), the SD v2 family was trained from scratch and does not build on the SD v1 family, so it represents a different model configuration. Evaluating transfer between them therefore constitutes a black-box transfer experiment, demonstrating a more meaningful form of generalization than simple intra-family transfer.
2. The Referenced Paper [1] Addresses a Different Task:
The referenced work [1] (Chen et al.) addresses adversarial attacks and defenses in the context of image classification—specifically, using diffusion models to generate adversarial examples that fool classifiers. In contrast, our work (DiffVax) proposes an immunizer model specifically targeting diffusion-based image editing models, with the goal of immunizing images against generative content manipulation. These are fundamentally different tasks: the former concerns classification robustness, while the latter concerns the integrity of generative editing pipelines. As such, the evaluation protocols and transferability expectations established in the classification domain do not directly translate to the generative editing setting. Therefore, we believe our work is not directly comparable to Chen et al. due to the fundamental differences in our research objectives.
In summary, while we acknowledge universal transferability as a crucial future direction that remains unsolved, our work makes foundational and novel contributions in efficiency, scalability, and robustness to unseen content and masks, all of which are critical for real-world deployment and in which we surpass prior methods. We kindly ask the reviewer to reconsider their assessment in light of our clarifications regarding the place of our work within the current state of the art.
[1] Chen, Jianqi, et al. "Diffusion models for imperceptible and transferable adversarial attack." IEEE Transactions on Pattern Analysis and Machine Intelligence (2024).
[2] Salman, H., Khaddaj, A., Leclerc, G., Ilyas, A., & Madry, A. (2023, July). Raising the Cost of Malicious AI-Powered Image Editing. In International Conference on Machine Learning (pp. 29894-29918). PMLR.
[3] Lo, L., Yeo, C. Y., Shuai, H. H., & Cheng, W. H. (2024). Distraction is all you need: Memory-efficient image immunization against diffusion-based image editing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 24462-24471).
[4] Choi, J. S., Lee, K., Jeong, J., Xie, S., Shin, J., & Lee, K. DiffusionGuard: A Robust Defense Against Malicious Diffusion-based Image Editing. In The Thirteenth International Conference on Learning Representations., 2025
This paper proposes DiffVax, a lightweight, and optimization-free image immunization framework designed to protect images and videos from malicious editing based on a diffusion model. Unlike existing methods that require time-consuming per-image optimization, DiffVax trains an immunizer model that can generate imperceptible perturbations in milliseconds with a single forward propagation. These perturbations can effectively disrupt the editing attempts of the diffusion model. Experiments verified that DiffVax outperforms the previous baselines on this task.
Strengths and Weaknesses
Strengths:
- The processing time is quite small, as the method only requires a single forward pass to generate the perturbations.
- It extends protection beyond images to video.
- Even if the picture is compressed in JPEG or denoised, the protection effect can still be maintained.
Weaknesses:
- The paper defines the final loss as $\mathcal{L} = \alpha \cdot \mathcal{L}_{\text{noise}} + \mathcal{L}_{\text{edit}}$, but does not discuss the value of α used during training. Experiments should be added to analyze how different α values affect the trade-off between imperceptibility and protection effectiveness.
- The method is evaluated against a specific set of diffusion models (SD 1.5 and SDXL) with standard sampling strategies. It would strengthen the work to discuss and test against different samplers, sampling steps, and more recent inpainting models, such as MagicBrush [1].
- Some related works are missing [2][3].
- The dataset is relatively small, with only 875 human images from the CCP.
[1] MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing
[2] MACE: Mass Concept Erasure in Diffusion Models
[3] Separable Multi-Concept Erasure from Diffusion Models
Questions
Could you please comment on how this work compares with prior works [2][3]?
Limitations
Yes
Formatting Issues
N/A
We thank the reviewer for the valuable feedback and for recognizing key strengths of our work, including its fast processing time, video protection extension, and robustness to JPEG compression and denoising. We address the raised questions below.
The paper defines the final loss as $\mathcal{L} = \alpha \cdot \mathcal{L}_{\text{noise}} + \mathcal{L}_{\text{edit}}$, but does not discuss the value of α used during training. Experiments should be added to analyze how different α values affect the trade-off between imperceptibility and protection effectiveness.
We agree that this is an important point, and in fact, we have already included a detailed analysis of α values in the supplementary material (Section A.9, Loss Weight Selection). Here we provide a brief summary of that analysis:
- Table 6 reports the results for α values of 2, 4, and 6, showing the trade-off between edit disruption (measured by SSIM and PSNR) and noise imperceptibility (measured by SSIM (Noise)).
- We selected the value of α that provides the best balance, achieving strong edit disruption while keeping the perturbations visually imperceptible; the small gain in imperceptibility from increasing α to 6 was outweighed by a more pronounced drop in edit resistance.
- The chosen value of α is also stated in our implementation details in Section A.1.
The method is evaluated against a specific set of diffusion models (SD 1.5 and SDXL) with standard sampling strategies. It would strengthen the work to discuss and test against different samplers, sampling steps, and more recent inpainting models, such as MagicBrush [1].
We would like to clarify that the scope of our model evaluation is in line with prior work in this area. Our main experiments use Stable Diffusion v1.5 for inpainting and InstructPix2Pix for instruction-based editing, and we additionally test transferability to Stable Diffusion v2. This level of coverage is comparable to contemporaneous works, which typically focus on two models. Here, we also provide a table summarizing the models tested in our work versus prior methods for clarity:
| Method | Editing Models Used |
|---|---|
| DiffVax (Ours) | SD Inpainting, IP2P |
| DiffusionGuard (ICLR 2025) | SD Inpainting, IP2P |
| PhotoGuard (ICML 2023) | SD Inpainting, SDEdit |
| SDS (ICLR 2024) | SDEdit, SD Inpainting, Textual inversion |
| Mist (ICML 2023) | Textual inversion, Dreambooth |
| AdvDM (ICML 2023) | Textual inversion, SDEdit |
Nevertheless, we agree that demonstrating robustness is important. To this end, we conducted new experiments to test DiffVax's performance across different samplers and numbers of inference steps, using the same pretrained Stable Diffusion v1.5 model. We report the results, along with the MagicBrush comparison, below, and will include the full evaluation in the final version. The results show that our learned perturbations are not sensitive to these variations.
MagicBrush Comparison
| MagicBrush | SSIM | PSNR | FSIM | CLIP-T | SSIM (Noise) |
|---|---|---|---|---|---|
| PhotoGuard | 0.682 | 18.81 | 0.546 | 25.64 | 0.967 |
| DiffVax | 0.635 | 18.41 | 0.529 | 22.18 | 0.965 |
Sampling Step Comparison (Unseen Set)
| Sampling Step | Model | SSIM | PSNR | FSIM | CLIP-T |
|---|---|---|---|---|---|
| 10 | PG-D | 0.637 | 16.79 | 0.391 | 26.54 |
| | DiffGuard | 0.651 | 16.65 | 0.409 | 23.84 |
| | DiffVax | 0.627 | 16.37 | 0.366 | 22.96 |
| 20 | PG-D | 0.564 | 15.56 | 0.379 | 28.89 |
| | DiffGuard | 0.591 | 15.28 | 0.393 | 26.04 |
| | DiffVax | 0.564 | 14.96 | 0.360 | 24.42 |
| 30 | PG-D | 0.523 | 14.92 | 0.379 | 29.27 |
| | DiffGuard | 0.556 | 14.71 | 0.386 | 27.10 |
| | DiffVax | 0.526 | 14.32 | 0.362 | 24.17 |
| 40 | PG-D | 0.507 | 14.42 | 0.377 | 29.68 |
| | DiffGuard | 0.539 | 14.16 | 0.386 | 27.84 |
| | DiffVax | 0.506 | 13.78 | 0.356 | 24.06 |
Sampler Comparison (Unseen Set)
| Sampler | Model | SSIM | PSNR | FSIM | CLIP-T |
|---|---|---|---|---|---|
| PNDMScheduler | PG-D | 0.480 | 14.31 | 0.404 | 26.88 |
| | DiffGuard | 0.501 | 14.52 | 0.404 | 26.97 |
| | DiffVax | 0.440 | 13.41 | 0.372 | 21.67 |
| EulerDiscreteScheduler | PG-D | 0.504 | 14.93 | 0.399 | 28.08 |
| | DiffGuard | 0.530 | 14.93 | 0.406 | 27.28 |
| | DiffVax | 0.466 | 13.91 | 0.361 | 22.00 |
| LMSDiscreteScheduler | PG-D | 0.487 | 14.36 | 0.403 | 27.82 |
| | DiffGuard | 0.509 | 14.47 | 0.405 | 27.23 |
| | DiffVax | 0.449 | 13.43 | 0.367 | 21.70 |
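For completeness, the sampler and step sweep behind the two tables above can be reproduced roughly as follows with the diffusers library; the checkpoint name, file paths, and prompt are placeholders, this is a sketch of the evaluation protocol rather than the authors' exact script, and the scoring step (SSIM/PSNR/FSIM/CLIP-T against the clean edit) is omitted.

```python
import torch
from diffusers import (StableDiffusionInpaintPipeline, PNDMScheduler,
                       EulerDiscreteScheduler, LMSDiscreteScheduler)
from PIL import Image

# Placeholder checkpoint; the paper's setup uses an SD v1.5 inpainting model.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

image = Image.open("immunized.png").convert("RGB")    # placeholder paths
mask = Image.open("mask.png").convert("RGB")
prompt = "a person on a beach"                        # placeholder edit prompt

schedulers = {
    "PNDMScheduler": PNDMScheduler,
    "EulerDiscreteScheduler": EulerDiscreteScheduler,
    "LMSDiscreteScheduler": LMSDiscreteScheduler,
}

results = {}
for name, cls in schedulers.items():
    pipe.scheduler = cls.from_config(pipe.scheduler.config)   # swap sampler
    for steps in (10, 20, 30, 40):
        edited = pipe(prompt=prompt, image=image, mask_image=mask,
                      num_inference_steps=steps).images[0]
        results[(name, steps)] = edited    # then score with SSIM/PSNR/FSIM/CLIP-T
```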
Some related works are missing [2][3]. Could you please comment on how this work compares with prior works [2][3]?
Thank you for bringing these relevant papers to our attention. MACE and Separable Multi-Concept Erasure represent an important but distinct area of research, and we will add them to the related work section. The primary distinction is that these papers focus on machine unlearning to modify the models themselves, whereas our approach is a form of content generation protection that safeguards individual images without altering the models.
Here is a more detailed comparison of the two methodologies:
- Machine Unlearning (e.g., MACE, Separable Multi-Concept Eraser): This approach focuses on altering a pre-trained diffusion model to remove its ability to generate specific, unwanted concepts.
  - Goal: The main objective is to make a model "forget" how to produce harmful, copyrighted, or private content, such as specific artistic styles, explicit material, or celebrity likenesses.
  - Methodology: This is a model-level modification. It involves fine-tuning the model's weights to erase the target concepts. For instance, MACE uses a fine-tuning framework to erase up to 100 concepts at once, while the Separable Multi-Concept Eraser (SepME) decouples model weights so that each weight increment corresponds to erasing a single concept.
  - Application: This is a remedial action taken by model providers on models that have already learned undesirable concepts from web-scraped data.
- Content Generation Protection (Our Approach / Image Cloaking): Our work falls into a different category of defense, which the MACE paper refers to as "image cloaking".
  - Goal: The objective is to proactively protect individual images from being exploited by AI models for training or unauthorized editing.
  - Methodology: This is a content-level protection. It involves adding imperceptible adversarial perturbations to an image before it is shared publicly. This "cloaked" image appears normal to humans but disrupts a diffusion model if it attempts to train on or edit it. The AI models themselves are not modified.
  - Application: This defense is used by content creators to safeguard new content that has not yet been posted online, preventing future models from learning to imitate it.
In conclusion, these two lines of research are complementary rather than competing. Machine unlearning (MACE, SepME) addresses how to fix models that have already been trained on unprotected data, while our content protection approach aims to prevent images from being learned or manipulated by models in the first place.
The dataset is relatively small, with only 875 human images from the CCP.
While more data would always be better, our dataset is comparable in size to the current datasets used in related works, and is therefore aligned with the current standard of evidence in the field. To place our dataset size in the context of prior work, the closest research for training a generative adversarial noise generator is the paper "Generative Adversarial Perturbations" [4]. For their experiments on semantic segmentation, they used the focused Cityscapes dataset, which contains 2,975 training and 500 validation images. Given that this foundational work was established on a dataset of a few thousand images from a specific domain (urban scenes), we believe our dataset of 875 human images is in a comparable range for a proof-of-concept study. Nevertheless, we agree with the reviewer that extending our method to larger and more diverse datasets is a crucial next step, and we will highlight this as an important avenue for future work.
[4] Poursaeed, O., Katsman, I., Gao, B., & Belongie, S. (2018). Generative adversarial perturbations. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4422-4431).
We want to thank the reviewer again for their insightful review. We hope our response addressed your concerns. We are happy to provide any further clarification you may need.
This paper presents a method for “immunizing” images and videos against editing by a given model. Unlike prior methods, which optimize noise added to an image to disrupt editing, their method learns a model to add noise to unseen images. As a result, immunizing a given unseen image is very efficient.
Strengths and Weaknesses
Strengths:
- This paper studies the important problem of preventing an image from being edited without the consent of its owner.
- The paper presents a conceptually clean approach for efficient editing of unseen examples, substantially improving on other approaches in terms of efficiency.
- The paper’s writing and methodology is generally clear and comprehensive.
Weaknesses:
- The paper would benefit from more clarification: please see the questions below.
Questions
I am confused by the performance against baselines. The authors report that their method, DiffVax, outperforms the PhotoGuard baseline. This is confusing to me, because PhotoGuard performs per-example optimization (while DiffVax in some sense seeks to approximate this by learning effective per-example noises from a given dataset). So, it feels like PhotoGuard-type approaches should upper-bound the performance of DiffVax (while being substantially less efficient). Could the authors comment on this?
Also, another useful baseline would be a single universal adversarial perturbation. Could the authors relate the performance of their method to that?
How is the perturbation budget considered during inference? Is the noise predicted by DiffVax just scaled to accommodate a given perturbation budget? What is the budget used for the reported experiments?
Limitations
Yes
Final Justification
Thank you to the authors for their response. I am keeping my score.
Formatting Issues
N/A
We thank the reviewer for the constructive feedback and for recognizing the strengths of our work, including its conceptually clean approach, efficiency gains, and clear methodology. We address the raised questions below.
"I am confused by the performance against baselines. The authors report that their method, DiffVax, outperforms the PhotoGuard baseline. This is confusing to me, because PhotoGuard performs per-example optimization (while DiffVax in some sense seeks to approximate this by learning effective per-example noises from a given dataset). So, it feels like PhotoGuard-type approaches should upper-bound the performance of DiffVax (while being substantially less efficient). Could the authors comment on this?"
Although PhotoGuard optimizes perturbations per example, its projected gradient descent-based optimization is prone to local minima, limiting its effectiveness against the full, multi-step diffusion process. DiffVax learns a global strategy by backpropagating through the entire editing pipeline on a dataset, producing perturbations that generalize better. This learned strategy systematically targets low-frequency components (Appendix A.3), which are harder for diffusion models to ignore and more robust to common defenses (e.g., JPEG compression). As a result, DiffVax consistently outperforms PhotoGuard in our benchmarks despite using less total perturbation energy.
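For contrast with the learned immunizer, a PhotoGuard-style per-example loop looks roughly like the sketch below: projected gradient descent toward a degradation target under an L-infinity budget, repeated for every image. The `editor` stand-in, target, budget, and step size are illustrative assumptions; in practice each iteration runs the expensive diffusion editing pipeline, which is what makes this approach slow and prone to local minima.

```python
import torch
import torch.nn.functional as F

def pgd_immunize(image, editor, steps=200, eps=16 / 255, step_size=1 / 255):
    """Per-example PGD in the spirit of PhotoGuard: iteratively optimize a bounded
    perturbation so that the (differentiable) editor's output degrades toward a target.
    `editor` is a cheap stand-in for the multi-step diffusion editing pipeline."""
    delta = torch.zeros_like(image, requires_grad=True)
    target = torch.full_like(image, 0.5)          # illustrative degradation target
    for _ in range(steps):
        edited = editor((image + delta).clamp(0, 1))
        loss = F.mse_loss(edited, target)
        grad, = torch.autograd.grad(loss, delta)
        # Gradient step toward the degradation target, projected onto the L-inf ball.
        delta = (delta - step_size * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return (image + delta).clamp(0, 1)

# Usage with a toy stand-in editor; with a real diffusion editor this loop takes minutes per image.
editor = torch.nn.Conv2d(3, 3, 3, padding=1)
protected = pgd_immunize(torch.rand(1, 3, 64, 64), editor, steps=10)
```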
"Also, another useful baseline would be a single universal adversarial perturbation. Could the authors relate the performance of their method to that?"
Universal adversarial perturbations (UAPs) have largely been studied in classification tasks and have not been adapted for diffusion-based editing. Unfortunately, it is not feasible to design a single static perturbation that universally protects all images in this setting—the required noise depends heavily on the specific image content to be protected. DiffVax addresses this by acting as a universal immunizer model: rather than outputting a fixed noise pattern, it generates content-aware perturbations tailored to each image in a single forward pass. To our knowledge, there are no existing UAP-style baselines for diffusion editing with which to make a direct comparison, further underscoring the novelty of our contribution.
"How is the perturbation budget considered during inference? Is the noise predicted by DiffVax just scaled to accommodate a given perturbation budget? What is the budget used for the reported experiments?"
We do not explicitly enforce a fixed perturbation budget in DiffVax (e.g., an L-p norm constraint); instead, we designed DiffVax to jointly learn the optimal perturbation for each pixel directly through the loss function $\mathcal{L} = \alpha \cdot \mathcal{L}_{\text{noise}} + \mathcal{L}_{\text{edit}}$, where $\mathcal{L}_{\text{noise}}$ penalizes perceptibility and $\mathcal{L}_{\text{edit}}$ enforces protection. The trade-off hyperparameter $\alpha$ (ablated in Appendix A.9) lets the model adaptively allocate noise rather than distributing it uniformly, applying denser noise only where it is most effective and least visible.
Although we do not enforce a fixed perturbation budget, DiffVax achieves the smallest average perturbation magnitude among all methods. This indicates that our adaptive, learned allocation of noise is not only more effective but also more efficient, achieving superior protection with less total perturbation energy.
To further explore this issue, we conducted an additional experiment performing a direct comparison among our baselines using fixed perturbation budgets of 16/255, 32/255, and 64/255. The results underscore our method's efficiency:
| Method | SSIM | PSNR | FSIM | CLIP-T | SSIM (Noise) | Mean Magnitude (L1) of Immunization Noise |
|---|---|---|---|---|---|---|
| PG-D () | 0.492 | 14.13 | 0.355 | 27.85 | 0.947 | 0.007 |
| DiffGuard () | 0.507 | 13.98 | 0.360 | 24.83 | 0.900 | 0.012 |
| PG-D () | 0.502 | 14.23 | 0.360 | 29.18 | 0.950 | 0.006 |
| DiffGuard () | 0.526 | 14.30 | 0.373 | 26.13 | 0.927 | 0.009 |
| PG-D () | 0.528 | 14.60 | 0.387 | 30.27 | 0.978 | 0.003 |
| DiffGuard () | 0.546 | 14.46 | 0.388 | 26.36 | 0.965 | 0.005 |
| DiffVax | 0.496 | 13.85 | 0.352 | 22.96 | 0.989 | 0.001 |
As the table shows, the average perturbation from DiffVax is significantly smaller than that of other methods. This demonstrates that the strength of DiffVax lies in the strategic placement of noise (e.g., in low-frequency components, as discussed in Appendix A.3), not in its raw magnitude. Our method's effectiveness stems from its efficient and intelligent use of a learned, non-uniform perturbation.
We want to thank the reviewer for their insightful review. We hope our response addressed your concerns. We are happy to provide any further clarification you may need.
The paper presents a timely contribution for addressing the major efficiency bottleneck of existing image immunization methods. The shift to a learned, optimization-free framework is a key conceptual advance with immense practical implications, as it makes the defense scalable and viable for real-world applications.
While the authors have highlighted the paper's core contributions (efficiency, scalability, and robustness to unseen content and masks), one reviewer maintained a negative rating due to the lack of cross-model transferability.
Overall, given the competitive acceptance rate at NeurIPS, the paper cannot be accepted in its current form.