PaperHub

Overall rating: 5.5 / 10 — Rejected (4 reviewers)
Individual ratings: 5, 5, 6, 6 (min 5, max 6, std. dev. 0.5)
Confidence: 3.0 · Correctness: 2.8 · Contribution: 2.8 · Presentation: 2.5

ICLR 2025

Towards Black-Box Membership Inference Attack for Diffusion Models

OpenReview · PDF
Submitted: 2024-09-26 · Updated: 2025-02-05
TL;DR

A Black-Box Membership Inference Attack Algorithm for Diffusion Models

Abstract

Keywords
diffusion model, membership inference attack

Reviews and Discussion

Review
Rating: 5

This article examines a key limitation of SOTA methods for the significant problem of identifying whether an artwork has been used to train a diffusion model: their required access to the model's internal U-Net. To this end, the paper proposes a new membership inference attack method based on an image-to-image transformation API that does not require access to the internal U-Net. The experimental results reflect the effectiveness of this method to some extent.

Strengths

  • This article aims to identify whether an artwork was used to train a diffusion model without access to the model's internal U-Net, which is an interesting and relevant topic that fits within the scope of the conference.
  • Novelty. The article proposes a novel membership inference attack method based on an image-to-image transformation API.
  • The author(s) performed several experiments attempting to validate the effectiveness of the proposed method.

Weaknesses

  • A certain part of the experimental results reveals that the proposed method achieves only very limited improvements over existing SOTA methods. For example, as shown in Table 1, the proposed method improves by only about 1-3% on most metrics, and even performs worse than the SOTA method PIAN on the TP metric for CIFAR-10. Unfortunately, the authors did not discuss or explain this in the paper; the effectiveness of the proposed method is therefore questionable.
  • The organization of this paper lacks clarity. The authors do not clarify why it is difficult to implement MIA without access to the internal U-Net, what challenges are encountered, or how they effectively address these challenges. Without this crucial information, it is difficult to evaluate the significance of the authors' work.
  • The organization of this paper is poor. In the ablation experiments, the authors attempt to test the impact of experimental parameters on the robustness of the algorithm; however, the evaluation metric is one-sided (i.e., only AUC), so the experimental results are difficult to find convincing. While the authors appear to provide more content in the appendix, that content exceeds the page limit of the paper and should not be considered.
  • The application experiments may be biased, as the authors conduct evaluations on only a single model (DALL-E) and rely on a small dataset (only 30 famous paintings and 30 generated paintings), so the experimental results are difficult to find convincing. Without comprehensive experiments, the effectiveness of the proposed method cannot be validly verified.

The paper requires an in-depth editorial review. The authors are recommended to examine structure, argumentation, and language clarity to ensure the paper meets high-quality standards. For example, the equation numbering in the paper is chaotic, and the formatting of Algorithm 1 lacks indentation.

Questions

  • Could the authors provide a more detailed discussion or explanation of the experiment results presented in Table 1?
  • Could the authors clarify the challenges encountered when implementing MIA without access to the internal U-Net, as well as how they effectively address these challenges?

Comment

Q4: Could the authors clarify the challenges encountered when implementing MIA without access to the internal U-Net, as well as how they effectively address these challenges?

A4: We appreciate the reviewer’s insightful question. The primary challenge in implementing Membership Inference Attacks (MIA) without access to the internal U-Net lies in determining membership based solely on the model’s input and output, without direct access to the model’s parameters. This setting is realistic because many companies are reluctant to release their model code and weights. To address this challenge, we reframe the problem as a series of individual detections, focusing on the model’s ability to produce consistent predictions for training-set images. Our method leverages the intuition that the model achieves more accurate and consistent outputs for samples it has seen during training. By repeatedly querying the model with perturbed versions of the target image, averaging the results, and comparing them to the original, we can infer membership by evaluating the consistency of the predictions. This enables us to perform effective image-to-image detection without access to the model’s internal parameters.
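A minimal sketch of this perturb-query-average loop, assuming a generic `variation_api` callable and a simple pixel-space L1 error; the function name, noise model, and parameter values are our illustrative choices, not the authors' exact implementation:

```python
import numpy as np

def membership_score(image, variation_api, n_queries=10, noise_scale=0.5):
    """Average reconstruction error over repeated perturbed queries.

    `variation_api` stands in for any black-box image-to-image endpoint:
    it takes an image array and returns a regenerated version.
    """
    outputs = []
    for _ in range(n_queries):
        # Perturb the target with fresh Gaussian noise on each query.
        noisy = image + noise_scale * np.random.randn(*image.shape)
        outputs.append(variation_api(noisy))
    # Average the outputs; training members should reconstruct consistently.
    avg_output = np.mean(outputs, axis=0)
    # Smaller error suggests the image was in the training set.
    return np.abs(avg_output - image).mean()
```

A threshold on this score then yields the member/non-member decision, as in the distance-based rule discussed later in the thread.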

We thank the reviewer once again for the valuable and helpful suggestions. We will continue to provide clarifications if the reviewer has any further questions.

Comment

We sincerely thank the reviewer for the comments and suggestions. Below, we address the primary concern that has been raised.

Q1: A certain part of experimental results reveal that the proposed method can only achieve very limited improvements over the existing SOTA methods.

A1: We thank the reviewer for the comments. In the DDPM experiment setup, previous methods have already achieved high accuracy (AUC > 0.9), so the potential for improvement is inherently limited. However, in the Diffusion Transformer and Stable Diffusion setups, our method achieves a more significant AUC improvement, with an increase of about 10%. We would like to emphasize that our main contribution is removing the dependence on accessing the internal UNet structure, which allows our method to be applicable to a wider range of scenarios compared to previous approaches.

Additionally, we would like to clarify that the AUC metric is more stable than the True Positive Rate (TPR) at a fixed False Positive Rate of 1%, because the classification threshold is then determined by a small number of samples near the decision boundary, leading to greater volatility. When we use three different random seeds for CIFAR-10, the results show significant variation. Below, we present the TPR@1%FPR of the baseline methods and our algorithm for the DDPM model with three random seeds on the CIFAR-10 dataset:

| Random Seed      | 1    | 2    | 3    |
|------------------|------|------|------|
| Loss [1]         | 17.6 | 13.5 | 15.1 |
| SecMI [2]        | 43.4 | 37.8 | 42.9 |
| PIA [3]          | 44.4 | 40.9 | 38.1 |
| PIAN [3]         | 50.3 | 44.9 | 36.2 |
| ReDiffuse (Ours) | 40.4 | 45.6 | 43.6 |

We also present the AUC for the baseline methods and our algorithm for the DDPM model with three random seeds on the CIFAR-10 dataset:

| Random Seed      | 1    | 2    | 3    |
|------------------|------|------|------|
| Loss [1]         | 0.88 | 0.89 | 0.89 |
| SecMI [2]        | 0.95 | 0.95 | 0.96 |
| PIA [3]          | 0.95 | 0.96 | 0.95 |
| PIAN [3]         | 0.95 | 0.95 | 0.94 |
| ReDiffuse (Ours) | 0.96 | 0.97 | 0.97 |

As seen, the TPR metric exhibits large variance for all methods, whereas the more stable AUC metric clearly demonstrates that our method outperforms the others. We also want to mention that our model consistently outperforms the baselines on the Diffusion Transformer (Table 2) and Stable Diffusion (Table 3).
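To make the stability argument concrete, here is a small sketch (ours, not from the paper) that computes both metrics from member/non-member reconstruction errors using scikit-learn; note how TPR@1%FPR hinges on the few scores nearest the threshold:

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

def auc_and_tpr_at_fpr(member_errors, nonmember_errors, target_fpr=0.01):
    # Members are expected to have smaller reconstruction errors, so
    # negate the errors to get scores where larger means "member".
    y_true = np.r_[np.ones(len(member_errors)), np.zeros(len(nonmember_errors))]
    y_score = -np.r_[member_errors, nonmember_errors]
    fpr, tpr, _ = roc_curve(y_true, y_score)
    # TPR at 1% FPR is read off a steep region of the ROC curve, where it
    # depends on only a handful of samples; hence its seed-to-seed variance.
    tpr_at = float(np.interp(target_fpr, fpr, tpr))
    return auc(fpr, tpr), tpr_at
```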

Q2: In the ablation experiments, the author attempts to test the impact of experimental parameters on the robustness of the algorithm; however, the evaluation metric is one-sided (i.e., only AUC).

A2: We thank the reviewer for the comments. In the main experimental results presented in Section 5, we report AUC, ASR, and TPR@1%FPR. In the ablation study section, we focus on AUC due to space limitations, but the ASR ablation is included in Appendix B with corresponding explanations in the main text. To provide more comprehensive details, we also present the TPR@1%FPR results for DDIM on CIFAR-10 from the ablation study below:

| Average Numbers | 3    | 5    | 10   | 15   | 20   |
|-----------------|------|------|------|------|------|
| TPR@1%FPR       | 29.9 | 34.0 | 41.2 | 46.0 | 47.4 |

| Diffusion Steps | 100  | 150  | 200  | 250  | 300  |
|-----------------|------|------|------|------|------|
| TPR@1%FPR       | 34.9 | 39.8 | 41.2 | 30.7 | 29.1 |

| Sampling Interval | 10   | 20   | 25   | 50   | 100  |
|-------------------|------|------|------|------|------|
| TPR@1%FPR         | 28.4 | 36.8 | 44.9 | 38.7 | 41.2 |

As observed from the experiments with random seeds in Q1, the TPR@1%FPR values exhibit inherent instability. However, the ablation study demonstrates that our algorithm remains effective, with performance not changing significantly across different hyperparameter choices.

Q3: The application experiments may be biased, as the authors conduct evaluations on only a single model (DALL-E) and rely on a small dataset (only 30 famous paintings and 30 generated paintings), so the experimental results are difficult to find convincing. Without comprehensive experiments, the effectiveness of the proposed method cannot be validly verified.

A3: We appreciate the reviewer raising this question. Our main experimental results are provided in Section 5, where we evaluate the performance of our method in a more comprehensive setting. We conduct extensive experiments on DDIM, Diffusion Transformer, and Stable Diffusion with many datasets, including CIFAR-10/100, ImageNet, and LAION-5B.

The purpose of Section 6 is to demonstrate the practical applicability of our algorithm to commercial models. Specifically, we use DALL-E 2’s variation API to show how our method can detect membership by leveraging the model’s outputs. Since DALL-E 2 does not provide a publicly accessible training set, the experiment in Section 6 primarily serves as a demonstrative application scenario rather than an extensive evaluation on large-scale datasets. We agree that further evaluation on more diverse datasets and models would enhance the robustness of the findings, and we plan to include such experiments in future work.

Comment

Thanks again for your valuable feedback! Could you please let us know whether your concerns have been addressed? We are happy to make further updates if you have any other questions or suggestions.

Review
Rating: 5

This paper introduces a new membership inference attack method for diffusion models. Unlike previous approaches that require direct access to the U-Net component within the diffusion model, this method only needs access to the model’s variation API. Extensive experiments demonstrate the effectiveness of the proposed approach.

Strengths


  1. The paper is well written and easy to follow.

  2. The paper considers a broad range of model types, including DDIM, Stable Diffusion, and Diffusion Transformers, and the proposed method performs well across the different models.

  3. The paper provides theoretical justification for the proposed method.

Weaknesses

  1. Practicality of the Scenario: The paper assumes that the variation API allows users to conduct denoising by querying the model with noisy images and receiving denoised outputs. However, in most real-world APIs for diffusion models (e.g., Stable Diffusion 3 API [1]), users typically only have access to final generated images and cannot query intermediate denoising stages. Therefore, while the proposed attack method does not require model parameter access, it may not be feasible for API-only models. If there are any API-only models with accessible variation APIs, it would be helpful if the authors could provide references to these in the rebuttal.

  2. Lack of a Comparison of Computational Requirements: As mentioned in Weakness 1, although the paper suggests that the method can operate in a black-box setting, its reliance on denoising queries rather than direct image generation could limit its applicability, similarly to “white-box” approaches. It is therefore reasonable and necessary to provide more comparisons, particularly of computational resource requirements, since the proposed method appears to require more computational resources than the baselines.

  3. Clarity of Algorithm Explanation: The intuition behind the proposed algorithm in Section 4.2 is not clear enough. The statement, “If the noise prediction from the neural network exhibited high bias, the network could adjust to fit the bias term, further reducing the training loss,” is ambiguous. What specifically is meant by “bias” here, and why does this lead to the condition $\nabla_\theta L(\theta) = 0$ for a well-trained model? While the method itself is intuitive, a more thorough explanation in this section would improve clarity.

[1] Stability AI. (2024). Stable Diffusion 3 API. Stability AI. Available from https://stability.ai/news/stable-diffusion-3-api

Questions

  1. Can the proposed method further generalize to fine-tuning of diffusion models?

  2. Although the DDIM model is widely used and much more effective than DDPM, can you still provide some results regarding the performance in DDPM?

Comment

We thank the reviewer for the comments and constructive suggestions. In the following, we address the main concern raised.

Q1: While the proposed attack method does not require model parameter access, it may not be feasible for API-only models.

A1: We appreciate the reviewer’s feedback. In Section 6 of the paper, we discuss an application scenario where membership inference attacks are performed using the variation API provided by OpenAI DALL-E [1], which is a popular API-only model. These results demonstrate that our approach can be applied to a real-world diffusion-model API. On the other hand, we believe that Stable Diffusion has not provided a variation API, likely because its model code and weights have already been released. We consider using only the text-to-image API for model attacks to be a valuable direction for future research.

Q2: It is reasonable and necessary to have comparison of computational resource requirements.

A2: We appreciate the reviewer’s question. From the perspective of computational complexity, the running time mainly depends on the averaging number $n$: each detection takes $n$ times as long as the baseline algorithm. However, the ablation study in Section 5.5 shows that our algorithm achieves an AUC greater than 0.9 even when $n=1$. Furthermore, with $n=10$, our algorithm processes 100,000 CIFAR-10 images in approximately 2 minutes. We believe this time cost is within a reasonable range, and that the model’s access conditions and accuracy are more important factors. We will add these discussions to the limitation section.

Q3: The intuition behind the proposed algorithm in Section 4.2 is not clear enough. The statement, “If the noise prediction from the neural network exhibited high bias” is ambiguous. What specifically is meant by “bias” here, and why does this lead to the condition $\nabla_\theta L(\theta) = 0$ for a well-trained model?

A3: Thank you for the question. In this context, the "bias" refers to the expectation of the random variable $\epsilon - \epsilon_\theta\left( \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\,\epsilon,\; t \right)$.

The condition $\nabla_\theta L(\theta) = 0$ does not come directly from this statement but is instead a consequence of the assumption that our model is well-trained. Several papers [2][3][4] demonstrate that after sufficient training, neural networks converge to a stationary point where $\nabla_\theta L(\theta) = 0$.

On the other hand, the statement that “If the noise prediction from the neural network exhibited high bias, the network could adjust to fit the bias term, further reducing the training loss,” is an explanation for the equations in line 210.
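For context, a standard form of the denoising training objective that this stationarity condition refers to (our reconstruction from the notation above, not a quotation of the paper's equations):

```latex
L(\theta) \;=\; \mathbb{E}_{x_0,\, t,\, \epsilon \sim \mathcal{N}(0, I)}
\Big[ \big\| \epsilon - \epsilon_\theta\big( \sqrt{\bar{\alpha}_t}\, x_0
      + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,\; t \big) \big\|^2 \Big],
\qquad \nabla_\theta L(\theta) = 0 \ \text{at a stationary point.}
```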

Q4: Can the proposed method further generalize to fine-tuning of diffusion models?

A4: We greatly appreciate the reviewer for suggesting this new direction. We conducted relevant experiments by fine-tuning a DDIM model pre-trained on the STL-10 dataset. The fine-tuning dataset consists of 1,000 randomly sampled images from CIFAR-10 or Tiny-ImageNet, while another 1,000 images from the same dataset are used as non-members. We fine-tune the model for 10,000 iterations. The experimental results of our algorithm are as follows:

| Fine-tuning Dataset | CIFAR-10 | Tiny-ImageNet |
|---------------------|----------|---------------|
| AUC                 | 0.91     | 0.92          |
| ASR                 | 0.85     | 0.87          |

The results show that our algorithm also performs well in the fine-tuning setup, being able to determine whether an image is in the model's fine-tuning training set. We will add these results to the paper.

Q5: Although the DDIM model is widely used and much more effective than DDPM, can you still provide some results regarding the performance in DDPM?

A5: We appreciate the reviewer’s question. We have conducted experiments on the DDPM model across four different datasets with 30 diffusion steps. The results are as follows:

| Dataset | CIFAR-10 | CIFAR-100 | STL-10 | Tiny-ImageNet |
|---------|----------|-----------|--------|---------------|
| AUC     | 0.87     | 0.85      | 0.81   | 0.89          |
| ASR     | 0.80     | 0.78      | 0.75   | 0.82          |

The experimental results show that our algorithm is also effective on the DDPM model. We will add these results to the paper.

We thank the reviewer once again for the valuable and helpful suggestions. We would be happy to provide further clarifications if the reviewer has any additional questions.

References

[1] The variation API of DALL-E. https://platform.openai.com/docs/guides/images/variations-dall-e-2-only

[2] Brutzkus A, Globerson A, Malach E, et al. SGD learns over-parameterized networks that provably generalize on linearly separable data[J]. arXiv preprint arXiv:1710.10174, 2017.

[3] Zhang C, Bengio S, Hardt M, et al. Understanding deep learning (still) requires rethinking generalization[J]. Communications of the ACM, 2021, 64(3): 107-115.

[4] Li H, Xu Z, Taylor G, et al. Visualizing the loss landscape of neural nets[J]. Advances in neural information processing systems, 2018, 31.

Comment

Thanks again for your valuable feedback! Could you please let us know whether your concerns have been addressed? We are happy to make further updates if you have any other questions or suggestions.

Review
Rating: 6

This paper introduces a novel membership inference attack method that uses only the image-to-image variation API and operates without access to the underlying model. The experimental results suggest that the model offers a significant boost over prior works.

Strengths

  • Clear and intuitive method. As someone who does not directly work on membership inference problems, the proposed methodology makes a lot of sense. I am not able to judge if prior works have proposed a similar idea before.
  • The paper provides a good theoretical analysis of the effectiveness of the proposed method.
  • The performance boost over prior works seems to be quite significant.

Weaknesses

  • The paper should better discuss the connections between the proposed idea and prior work. For example, have relevant ideas been proposed in other types of membership inference attack methods?
  • The abstract is too brief. The paper would benefit from elaborating the abstract with more insights into the proposed method and the key experimental results.

Questions

  • Since the proposed idea is quite intuitive, have relevant ideas been proposed in other types of membership inference attack methods?
  • Table 4 is vague: how should the L1 and L2 distances be interpreted as membership inference accuracy?
  • In general, is there a way to attach a confidence level to the membership inference results? After all, any mistake could lead to wrongful accusations against the diffusion API.

Details of Ethics Concerns

The research will lead to claims that commercial image generation APIs (such as DALL-E 2 in this paper) have used copyrighted data. Since such membership inference results are uncertain, they may lead to controversies among the general public.

Comment

We express our gratitude to the reviewer for the insightful comments. Please find the details below.

Q1: Have relevant ideas been proposed in other types of membership inference attack methods?

A1: We conducted a further survey following the reviewer's suggestion. We found similar ideas in the context of machine-generated text detection, particularly in methods like DetectGPT [1]. DetectGPT uses a perturbation-based approach to determine whether a piece of text was generated by a language model. The core idea is to apply small perturbations to the input text multiple times and observe the model's response to these modifications: if the model’s probability distribution remains stable, the text is likely model-generated. Our approach is analogous in that we treat the MIA as a multiple-access problem. We add random noise to the image and examine whether the model’s denoised output remains stable, using this stability to infer whether the image is in the model’s training set. We will add this reference to the related work section of our paper.

Q2: The abstract is too brief. The paper would benefit from elaborating the abstract with more insights into the proposed method.

A2: We appreciate the reviewer’s suggestion. In response, we have revised the abstract to provide more insights into the proposed method. Please refer to the updated version of the manuscript.

Q3: Table 4 is vague: how should the $L_1$ and $L_2$ distances be interpreted as membership inference accuracy?

A3: In Algorithm 1, our approach relies on a distance function $D$. In Table 4, the $L_1$ and $L_2$ distances are the $L_1$ and $L_2$ metrics in pixel space between two images, serving as the distance function. As in previous sections, we classify images with a distance smaller than a certain threshold as members, and those with a larger distance as non-members.
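A minimal sketch of this decision rule, assuming images as NumPy arrays and a hypothetical threshold value `tau`:

```python
import numpy as np

def l1_distance(a, b):
    # Mean absolute pixel difference.
    return np.abs(a - b).mean()

def l2_distance(a, b):
    # Root-mean-square pixel difference.
    return np.sqrt(((a - b) ** 2).mean())

def is_member(original, reconstruction, dist_fn=l1_distance, tau=0.1):
    # Training members are expected to reconstruct with smaller error,
    # so a distance below the threshold tau is classified as "member".
    return dist_fn(original, reconstruction) < tau
```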

Q4: In general, is there a way to provide any confidence to the membership inference results?

A4: We thank the reviewer for suggesting this interesting direction. Based on the algorithm in this paper, one possible way to estimate confidence is to evaluate the magnitude of the distance. For instance, for a threshold $\tau$, images with reconstruction errors close to $\tau$ would have lower confidence in their classification, while images farther from $\tau$ would have higher confidence. We consider this a valuable direction for future work.
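One way such a heuristic could look in code (a sketch under our own assumptions; `tau` and `scale` are illustrative values, not parameters from the paper):

```python
import math

def confidence(distance, tau=0.1, scale=0.05):
    # Errors far from the threshold tau map to confidence near 1;
    # errors close to tau map to confidence near 0.
    return math.tanh(abs(distance - tau) / scale)
```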

Finally, we thank the reviewer once again for the efforts in providing us with valuable and helpful suggestions. We will continue to provide clarifications if the reviewer has any further questions.

References

[1] Mitchell, Eric, et al. "Detectgpt: Zero-shot machine-generated text detection using probability curvature." International Conference on Machine Learning. PMLR, 2023.

Comment

Thank you for the helpful answers. All my concerns have been addressed. Given my low confidence score, I would like to maintain my current evaluation of score 6.

Comment

We thank the reviewer for acknowledging our work!

Review
Rating: 6

To provide protection for the artworks and detect misuse of data, this paper proposes a black-box membership inference attack for diffusion models used in image generation. It first gives a brief introduction to popular diffusion models. Then it introduces the black-box MIA method based on the variation API. The core idea of the method is based on the hypothesis that images used in the training set typically result in smaller reconstruction errors compared to images not in the training set. Experiments are conducted on several datasets, followed by further ablation studies and an application in a real-world setting.

Strengths

  1. This paper introduces a novel black-box MIA method that merely requires access to a variation API, which makes it more practical and easier to perform.

  2. The core idea of the proposed method is intuitive and effective, as demonstrated through both theoretical and experimental results.

  3. The experiments are comprehensive. The proposed method is applied to three diffusion models, and experiments are conducted on multiple datasets. Further ablation studies on several key hyper-parameters and a real-world application to DALL-E’s API are also conducted.

Weaknesses

  1. The writing of this paper can be further improved. The motivation discussed in the Introduction is not solid enough in my opinion.

  2. Although many experiments are conducted, the analysis of the results is insufficient. For example, in Table 1, the TP value of ReDiffuse on CIFAR-10 is much lower than the others, yet no analysis is provided. There is also a lack of analysis on why longer diffusion steps, an important factor affecting performance, seriously degrade the results. Potential solutions for this issue should also be discussed.

  3. More cases should be provided to analyze the differences between member and non-member samples, but there is only one case in Sec. 6.

Questions

Please refer to weaknesses.

Comment

We greatly appreciate the reviewer's comments and valuable suggestions. We address the reviewer's questions in more detail as follows:

Q1: The writing of this paper can be further improved. The motivation discussed in the Introduction is not solid enough in my opinion.

A1: We appreciate the reviewer's feedback on the need for a clearer explanation of the motivation. In response, we have expanded the discussion of the motivation in the introduction. The added content can be found in the highlighted sentences of the revised manuscript.

Q2: The TP value of ReDiffuse on CIFAR-10 is much lower than the others. Also, there is a lack of analysis on why longer diffusion steps, an important factor affecting performance, seriously degrade the results.

A2: We thank the reviewer for the comments.

First, while existing algorithms already perform well on simpler tasks such as CIFAR-10, the primary contribution of our method lies in eliminating the need to access the U-Net. Hence, we match previous performance while avoiding any use of the internal denoising model.

Second, we believe that the AUC is more reliable than the True Positive Rate (TPR) at a fixed False Positive Rate of 1%, because the classification threshold is then determined by a small number of samples near the decision boundary, leading to greater volatility. When we use three different random seeds for CIFAR-10, the results show significant variation. Below, we present the TPR@1%FPR of the baseline methods and our algorithm for the DDPM model with three random seeds on the CIFAR-10 dataset:

| Random Seed      | 1    | 2    | 3    |
|------------------|------|------|------|
| Loss [1]         | 17.6 | 13.5 | 15.1 |
| SecMI [2]        | 43.4 | 37.8 | 42.9 |
| PIA [3]          | 44.4 | 40.9 | 38.1 |
| PIAN [3]         | 50.3 | 44.9 | 36.2 |
| ReDiffuse (Ours) | 40.4 | 45.6 | 43.6 |

We also present the AUC for the baseline methods and our algorithm for the DDPM model with three random seeds on the CIFAR-10 dataset:

| Random Seed      | 1    | 2    | 3    |
|------------------|------|------|------|
| Loss [1]         | 0.88 | 0.89 | 0.89 |
| SecMI [2]        | 0.95 | 0.95 | 0.96 |
| PIA [3]          | 0.95 | 0.96 | 0.95 |
| PIAN [3]         | 0.95 | 0.95 | 0.94 |
| ReDiffuse (Ours) | 0.96 | 0.97 | 0.97 |

As seen, the TPR metric exhibits large variance for all methods, whereas the more stable AUC metric clearly demonstrates that our method outperforms the others. We also want to mention that our model consistently outperforms the baselines on the Diffusion Transformer (Table 2) and Stable Diffusion (Table 3).

Third, regarding longer diffusion steps, we observe that the reconstruction error has higher variance. Although the average for members is an unbiased estimator, the effectively smaller sample size results in lower accuracy. To reduce this variance, we can increase the averaging number, which improves accuracy. The AUC results for CIFAR-10 with diffusion steps = 400 under different averaging numbers are shown in the table below:

| Average Numbers  | 10   | 20   | 30   |
|------------------|------|------|------|
| ReDiffuse (Ours) | 69.4 | 73.1 | 76.8 |
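As a sanity check on this averaging argument (standard statistics, added by us rather than quoted from the paper): if the per-query reconstruction errors $e_i$ are i.i.d. with variance $\sigma^2$, averaging $n$ of them shrinks the variance of the estimate by a factor of $n$:

```latex
\operatorname{Var}\!\Big( \frac{1}{n} \sum_{i=1}^{n} e_i \Big) \;=\; \frac{\sigma^2}{n}
```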

Q3: More cases should be provided to analyze the differences between member and non-member samples, but there is only one case in Sec. 6.

A3: We greatly appreciate the reviewer’s suggestion. In the revised version of our manuscript, we have added additional examples in Appendix D, demonstrating the changes after applying the variation API to both member and non-member inputs. The results show that DALL-E 2’s variation API induces fewer changes to images in the member set compared to the non-member set.

Once again, we sincerely thank the reviewer for the constructive comments, and we are eager to engage in further discussions to clarify any concerns.

References

[1] Matsumoto, Tomoya, Takayuki Miura, and Naoto Yanai. "Membership inference attacks against diffusion models." 2023 IEEE Security and Privacy Workshops (SPW). IEEE, 2023.

[2] Duan, Jinhao, et al. "Are diffusion models vulnerable to membership inference attacks?." International Conference on Machine Learning. PMLR, 2023.

[3] Kong, Fei, et al. "An efficient membership inference attack for the diffusion model by proximal initialization." arXiv preprint arXiv:2305.18355 (2023).

Comment

Thanks again for your valuable feedback! Could you please let us know whether your concerns have been addressed? We are happy to make further updates if you have any other questions or suggestions.

Comment

We would like to express our sincere gratitude for the reviewer's constructive suggestions and comments. Since the deadline is approaching, we sincerely hope the reviewer can read our response. Please let us know if there are any comments about our response or any other additional concerns. We are eager to provide further clarifications and discussions to help with the evaluation.

AC Meta-Review

The paper introduces a membership inference attack method that uses only the image-to-image variation API. In terms of overall score, the reviewers were mildly positive (2x) or mildly negative (2x). In terms of strengths, the reviewers highlighted that the method is intuitive and comes with a theoretical justification. In terms of weaknesses, the reviewers had concerns regarding the presentation/motivation, the experimental evaluation and gain with respect to the state of the art, and the computational complexity. As a consequence, I am unable to recommend acceptance.

Additional Comments on Reviewer Discussion

The authors put significant effort into addressing the reviewers' concerns in their rebuttal. However, they did not persuade the reviewers to follow up or increase their overall recommendations.

Final Decision

Reject