PaperHub
Rating: 6.6/10 · Poster · 4 reviewers
Scores: 3, 3, 4, 4 (min 3, max 4, std 0.5)
ICML 2025

Leveraging Model Guidance to Extract Training Data from Personalized Diffusion Models

OpenReview · PDF
Submitted: 2025-01-22 · Updated: 2025-07-24
TL;DR

Extrapolated guidance from pretrained to fine-tuned DMs enables strong fine-tuning data extraction.

Abstract

Keywords
Data Extraction, Copyright Protection, Privacy and Security, Diffusion Models, Trustworthy AI

Reviews and Discussion

Official Review
Rating: 3

The paper proposes a model-guidance method for extracting fine-tuning data that leverages the base pre-trained model as guidance. The proposed "model guidance" can sample from the distribution learned by the fine-tuned model via simple guidance techniques. The authors further propose a clustering algorithm that samples within high-probability regions by constructing an image graph. Experiments on various datasets confirm the method's effectiveness, supported by ablation studies and real-world applications, such as using checkpoints from the community.
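For intuition, the "model guidance" described above can be sketched as an extrapolation between the two models' noise predictions, by analogy with classifier-free guidance. This is our illustration, not necessarily the paper's exact Eq. (7) weighting, and the function name is ours:

```python
import torch

def model_guided_noise(eps_pretrained: torch.Tensor,
                       eps_finetuned: torch.Tensor,
                       w: float) -> torch.Tensor:
    """Extrapolate from the pretrained toward the fine-tuned model's
    noise prediction. w = 1 recovers the fine-tuned model; w > 1
    amplifies what fine-tuning added, steering sampling toward
    high-probability regions of the fine-tuning data distribution."""
    return eps_pretrained + w * (eps_finetuned - eps_pretrained)
```

At each denoising step, this combined prediction would replace the single model's output in an otherwise standard sampler.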

Questions for Authors

How does the method perform on recent models, such as FLUX? In the case of DreamBooth, better models tend to show higher generalizability rather than memorization. I would like to see results related to this.

Claims and Evidence

Yes, the paper’s claims are supported by experiments. But some parts are missing; see Weaknesses for details.

Methods and Evaluation Criteria

Yes, the paper’s methods are supported by experiments.

Theoretical Claims

Yes, the paper’s equations are reasonable.

Experimental Design and Analysis

Yes, the paper’s methods are supported by experiments. But some parts are missing; see Weaknesses for details.

Supplementary Material

Yes, I read all supplementary material.

Relation to Prior Literature

This work is the first attempt to leverage a base pre-trained model to extract its fine-tuning dataset. Most existing works do not distinguish fine-tuning data from pre-training data.

Missing Essential References

No, to the best of my knowledge, there are no related works that should be further compared or discussed.

Other Strengths and Weaknesses

Strengths

  • Model guidance is straightforward and aligns well with the current trend of using the same base model in real-world applications.
  • The paper is easy to read and follow.
  • Experiments are provided on various models and concepts, and the experiments using checkpoints from communities like Hugging Face were particularly good.

Weaknesses

  • The paper assumes that the training caption is given. Although I found the training caption extraction part in the Appendix, I couldn't find the training data extraction result using the extracted prompt. In my opinion, this part is crucial for highlighting the paper's practical motivation, so the results should be reported.
  • The model guidance method sounds reasonable, but the clustering approach is neither very intuitive nor novel. An ablation study on the clustering method is essential, and further explanations, such as qualitative examples, would be helpful.

Other Comments or Suggestions

Please refer to Questions or Weaknesses.

Author Response

Thank you for your insightful feedback!

  1. For Weakness 1:

I couldn't find the training data extraction result using the extracted prompt. In my opinion, this part is crucial for highlighting the paper's practical motivation, so the results should be reported.

Our default setting assumes that captions are accessible, in line with previous works [1-3]. We observe that captions are accessible in many cases, particularly for real checkpoints on Civitai. This is especially evident for DreamBooth, where the special tokens in the training captions (such as "a sks dog") are always made available to ensure correct application of the models.

Nonetheless, there are scenarios where captions are not accessible, and we agree that data extraction results using extracted prompts are helpful. We therefore update the results using the extracted captions (3 words in length), shown in Tab. 5 below.

Method                            AS      A-ESR ($\tau=0.6$)
FineXtract (Full Caption)         0.501   0.35
FineXtract (Extracted Caption)    0.314   0.15
FineXtract (Empty Caption)        0.192   0.00
CFG (Full Caption)                0.434   0.23
CFG (Extracted Caption)           0.308   0.08
Direct Text2img (Empty Caption)   0.146   0.00

We observe that although the extraction success rate decreases relative to extraction with full captions, it remains significantly stronger than extraction without any caption information. This confirms the effectiveness of the proposed caption extraction approach. Additionally, FineXtract achieves notably higher extraction performance than the baseline under extracted captions.

[1] Carlini N, Hayes J, Nasr M, et al. Extracting training data from diffusion models[C]//32nd USENIX Security Symposium (USENIX Security 23). 2023: 5253-5270.

[2] Somepalli G, Singla V, Goldblum M, et al. Diffusion art or digital forgery? Investigating data replication in diffusion models[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023: 6048-6058.

[3] Somepalli G, Singla V, Goldblum M, et al. Understanding and mitigating copying in diffusion models[J]. Advances in Neural Information Processing Systems, 2023, 36: 47783-47803.

  2. For Weakness 2:

An ablation study on the clustering method is essential, and further explanations, such as qualitative examples, would be helpful.

Without the clustering method, the attacker can only generate a number of data points but cannot determine which ones lead to a successful extraction. This is reflected in our ablation study in Fig. 4(b), where only one generated image per training image is used. The result is presented again in the following table to emphasize how the clustering component significantly improves the AS and A-ESR.

Method                            AS      A-ESR ($\tau=0.7$)   A-ESR ($\tau=0.6$)
CFG (without Clustering)          0.280   0.00                 0.10
FineXtract (without Clustering)   0.338   0.05                 0.13
CFG (with Clustering)             0.434   0.08                 0.23
FineXtract (with Clustering)      0.501   0.15                 0.35
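For concreteness, the clustering step referred to above can be sketched as graph-based selection over the generated images. This is our illustration under stated assumptions (pairwise cosine similarity over some image embedding, a fixed threshold, one representative per connected component), not the authors' exact algorithm:

```python
import numpy as np

def select_candidates(feats: np.ndarray, thresh: float = 0.7) -> list:
    """Build a similarity graph over generated images and keep one
    representative per dense cluster (hypothetical setup; `feats` is
    an (n, d) array of image embeddings)."""
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = feats @ feats.T                                  # cosine similarity
    adj = (sim > thresh) & ~np.eye(len(feats), dtype=bool)

    seen, reps = set(), []
    for i in range(len(feats)):
        if i in seen:
            continue
        comp, stack = {i}, [i]                             # flood-fill a component
        while stack:
            j = stack.pop()
            for k in np.flatnonzero(adj[j]):
                if k not in comp:
                    comp.add(int(k))
                    stack.append(int(k))
        seen |= comp
        # representative = image with the most in-cluster neighbors
        reps.append(max(comp, key=lambda j: adj[j].sum()))
    return reps
```

Images landing in large, tight clusters are the likeliest training-data matches, which is what the AS/A-ESR gains in the table reflect.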
  3. For the Question:

How does the method perform on recent models, such as FLUX?

We update the results using FLUX.1 [dev]. DreamBooth fine-tuning on FLUX.1 [dev] requires more than 100GB per GPU, which we are currently unable to run. However, we have conducted experiments in the LoRA scenario using the official scripts provided by Diffusers. The training iterations are fixed at $150 N_0$, as suggested by the repository, and the other hyperparameters remain the same as those in the repo. We compare our method with the baseline (CFG); both methods show the best performance when the guidance strength $w'$ is set to 3.0, and the improvement is consistent.

Method           AS      A-ESR ($\tau=0.7$)   A-ESR ($\tau=0.6$)
CFG+Clustering   0.407   0.03                 0.20
FineXtract       0.496   0.30                 0.43
Reviewer Comment

I appreciate the authors for the detailed response and additional experiments. My concerns are mostly addressed, and I thus raise my score.

Author Comment

We thank the reviewer for acknowledging our work! We're encouraged that most concerns have been addressed and truly appreciate the time and effort spent on the review.

Official Review
Rating: 3

This paper introduces FineXtract, a framework for extracting data used in the fine-tuning of personalized diffusion models. The authors propose a parametric approach to approximate the fine-tuning data distribution by extrapolating the original output distributions of both the pre-trained and fine-tuned models. Subsequently, a clustering algorithm is applied to identify probable fine-tuning images from the generated samples. The proposed method is evaluated across various fine-tuning scenarios, including real-world cases, achieving an Extraction Success Rate of approximately 20%.

Questions for Authors

I have two questions regarding the empirical study:

  1. Caption Accessibility Assumption: As the authors also noted, the assumption that training captions are fully accessible seems somewhat strong for practical scenarios. It is commendable that the paper discusses strategies for partially extracting and extending captions when they are unavailable. However, did the authors evaluate FineXtract using these extracted captions? If so, how does the performance compare to using ground-truth captions?

  2. Hyperparameter $\lambda'$ Selection: It appears that the key hyperparameter $\lambda'$ is selected via grid search (among 1.0, 2.0, 3.0, 4.0, and 5.0), and its optimal value varies across different fine-tuning settings and data extraction approaches. Could the authors provide further discussion on the underlying principles or insights for identifying $\lambda'$ or setting the grid search range? For example, I suspect that the optimal $\lambda'$ depends on the number of fine-tuning iterations. Would it be possible to fix the fine-tuning dataset and the data extraction approach, and systematically analyze how the optimal $\lambda'$ changes with the number of fine-tuning iterations? Furthermore, when the fine-tuning dataset and iteration count are fixed, does a larger optimal $\lambda'$ suggest that the model or fine-tuning approach is more prone to overfitting or memorization of the training data?

Claims and Evidence

Yes

Methods and Evaluation Criteria

Yes

Theoretical Claims

N/A

Experimental Design and Analysis

Yes, I checked section 5 and all sections in the Appendix.

Supplementary Material

Yes, I reviewed all sections in the Appendix.

Relation to Prior Literature

Previous studies on data extraction have primarily focused on pre-trained diffusion models, while research on extracting data from personalized diffusion models—particularly few-shot fine-tuned models—remains scarce. This work introduces a novel approach by constructing the fine-tuning data distribution through extrapolation between the pre-trained and fine-tuned models. To the best of my knowledge, this is a new perspective that is likely to provide valuable insights to the research community.

Missing Essential References

N/A

Other Strengths and Weaknesses

Strengths: The paper is overall well constructed and presented. The proposed approach is principled. The empirical study is thorough.

Weaknesses: See questions.

Other Comments or Suggestions

N/A

Author Response

Thank you for your insightful feedback!

  1. For Question 1:

It is commendable that the paper discusses strategies for partially extracting and extending captions when they are unavailable. However, did the authors evaluate FineXtract using these extracted captions? If so, how does the performance compare to using ground-truth captions?

As the caption extraction algorithm is not the core contribution of this paper, we focus on extraction with full captions in our main experiments, consistent with the settings of previous works [1-3].

Nonetheless, we agree that evaluating FineXtract using the extracted captions is highly valuable for fully assessing its effectiveness. We therefore update the results using the extracted captions (3 words in length), shown in Tab. 5 below. We observe that, although the extraction success rate decreases compared to using full captions, it remains significantly higher than when no caption information is used. FineXtract, when applied to extracted captions, achieves a notably higher extraction success rate than the baseline. This confirms the effectiveness of both the proposed caption extraction approach and FineXtract.

Method                            AS      A-ESR ($\tau=0.6$)
FineXtract (Full Caption)         0.501   0.35
FineXtract (Extracted Caption)    0.314   0.15
FineXtract (Empty Caption)        0.192   0.00
CFG (Full Caption)                0.434   0.23
CFG (Extracted Caption)           0.308   0.08
Direct Text2img (Empty Caption)   0.146   0.00

[1] Carlini N, Hayes J, Nasr M, et al. Extracting training data from diffusion models[C]//32nd USENIX Security Symposium (USENIX Security 23). 2023: 5253-5270.

[2] Somepalli G, Singla V, Goldblum M, et al. Diffusion art or digital forgery? Investigating data replication in diffusion models[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023: 6048-6058.

[3] Somepalli G, Singla V, Goldblum M, et al. Understanding and mitigating copying in diffusion models[J]. Advances in Neural Information Processing Systems, 2023, 36: 47783-47803.

  2. For Question 2:

Could the authors provide further discussion on the underlying principles or insights for identifying $\lambda'$ or setting the grid search range?

Furthermore, when the fine-tuning dataset and iteration count are fixed, does a larger optimal $\lambda'$ suggest that the model or fine-tuning approach is more prone to overfitting or memorization of the training data?

Intuitively, with longer training iterations, the distribution learned by the fine-tuned model should better align with the fine-tuning data distribution. In other words, $p_{\theta'}(x)$ should approach $q(x)$ and move further away from the pretrained distribution $p_{\theta}(x)$ in Eq. (3). Therefore, the optimal $\lambda$ should increase (toward 1), and the optimal $w = \frac{1}{\lambda}$ should decrease, with longer training iterations. Similarly, for the conditional diffusion model in Eq. (7), the optimal $\lambda'$ should also increase, with the optimal $w' = \frac{1}{\lambda'}$ decreasing, as training iterations grow.
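To unpack the quoted relation $w = \frac{1}{\lambda}$, here is a sketch of the reasoning in LaTeX. We assume Eq. (3) interpolates log-densities up to normalization, which is our paraphrase rather than the paper's exact statement:

```latex
% Assumed form of Eq. (3), up to a normalizing constant:
%   \log p_{\theta'}(x) \approx (1-\lambda)\,\log p_{\theta}(x) + \lambda\,\log q(x)
% Taking scores and solving for the fine-tuning data distribution q:
\nabla_x \log q(x)
  \approx \frac{1}{\lambda}\,\nabla_x \log p_{\theta'}(x)
        - \frac{1-\lambda}{\lambda}\,\nabla_x \log p_{\theta}(x)
  = w\,\nabla_x \log p_{\theta'}(x) - (w-1)\,\nabla_x \log p_{\theta}(x),
  \qquad w = \frac{1}{\lambda}.
```

As fine-tuning progresses and $\lambda \to 1$, the required extrapolation weight $w \to 1$, matching the trend described above.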

We update the results using images from four classes of WikiArt under DreamBooth and perform a grid search to find the optimal hyperparameter $w'$ under different training iterations. We set the fixed term $k = 0$ for simplicity, focusing only on $w'$. The experiment results are available via an anonymous link: Grid search for the best $w'$ for different classes.

Our findings show that, for our method, the optimal $w'$ tends to decrease (the optimal $\lambda' = \frac{1}{w'}$ tends to increase) as training iterations increase in most cases. However, due to the inherent randomness in the clustering process, this trend is not consistent across all scenarios.

We also observe that when the AS is higher, indicating that the model has memorized more, the optimal $w'$ tends to be smaller (the optimal $\lambda' = \frac{1}{w'}$ tends to be larger). The experiment results are available via an anonymous link: Experiment result for AS at best $w'$.

In other words, a larger $\lambda'$ likely suggests that the model, or the fine-tuning approach, is more prone to overfitting or memorization of the training data, as noted by the reviewer.

We will include more extensive research on this aspect in the revised paper.

Reviewer Comment

I thank the authors for their detailed response. Most of my concerns have been satisfactorily addressed.

The only remaining issue pertains to the performance degradation when captions are extracted, which may limit the applicability of FineXtract in broader real-world scenarios. Nonetheless, I believe the overall contribution of this work outweighs this limitation. Therefore, I will maintain my original rating (weak accept).

Author Comment

Thank you for your positive comments and helpful feedback!

“The only remaining issue pertains to the performance degradation when captions are extracted, which may limit the applicability of FineXtract in broader real-world scenarios.”

We would like to clarify that our main setting assumes captions are directly accessible, which is often the case in practice, particularly for real checkpoints on platforms like Civitai. In DreamBooth, for example, special tokens such as "a sks dog" are typically included to ensure proper model usage. This assumption is also commonly adopted in prior work.

For scenarios where captions are not directly available, we propose an alternative approach to extract partial information. While we do not claim to recover full captions without degradation, to the best of our knowledge, this is the first attempt to explore caption extraction. Our results show that even partial information can increase extraction performance, and we hope this will inspire further research in this direction.

We greatly appreciate your time and efforts spent in reviewing our work!

Official Review
Rating: 4

This paper introduces a method to extract training data from personalized diffusion models (DMs). The method approximates the fine-tuned model's distribution as an interpolation between the pretrained model's distribution and the fine-tuning data distribution. By extrapolating the score functions of these models, the generation can be guided toward regions with high probability in the fine-tuning data distribution. Subsequently, clustering is used to identify the most likely matches to the original training data.

Update after rebuttal

I am satisfied with the rebuttal: the added experiments with FLUX.1 show consistent improvements over the baseline method, and the new analysis of how extraction rates correlate with training iterations shows that the method's efficacy follows model memorization.

I am convinced to raise my score to Accept.

Questions for Authors

Given that the extraction success rate is around 20% in most cases, what factors do you believe limit the extraction of the remaining training data? Would further improvements to the guidance or clustering components potentially increase this rate, or are there fundamental limitations to how much information can be extracted?

Claims and Evidence

  • The primary claim that fine-tuned diffusion models leak information about their training data is well-supported by the experimental results, and the method successfully extracts training images from various models.
  • The theoretical claim about approximating the fine-tuned model's distribution as an interpolation between the pretrained model and the training data distribution is mathematically consistent and tested with effective practical results.
  • The performance claims are supported by experiments across different models, fine-tuning methods, and datasets, showing consistent improvements over baselines. Effectiveness is also demonstrated on real-world checkpoints from HuggingFace.

Methods and Evaluation Criteria

  • The proposed method of model guidance through extrapolation is well-motivated and appropriate for the task, leveraging the mathematical relationship between score matching and diffusion models.
  • The evaluation metrics are appropriate and refer to previous work in the field.
  • The choice of datasets for style and object learning is appropriate and common for the use of personalized diffusion models.

Theoretical Claims

I have read through Section 4 and the derivations appear sound; however, I have not verified them in detail.

Experimental Design and Analysis

  • The ablation studies on guidance scale and correction term provide valuable insights into the sensitivity of the method, while experiments on real-world checkpoints are particularly valuable.
  • The defense experiments are also interesting, showing that they can reduce extraction success at the cost of generation quality, highlighting practical tradeoffs.
  • Experiments on model architectures only account for convolution-based diffusion. Current state-of-the-art generative models, like SD3, Flux or Sana, feature a transformer-based architecture. The paper would benefit from extending the evaluations to these.

Supplementary Material

Yes, I reviewed Appendices B-K, especially for comparisons and visualizations.

Relation to Prior Literature

This work extends previous research on memorization and data extraction in diffusion models, focusing on personalized models. It also contributes to the area of privacy and copyright in GenAI by providing concrete evidence of data leakage risks.

Missing Essential References

No missing essential related works to my knowledge.

Other Strengths and Weaknesses

Strengths

  • The formulation of fine-tuned model distribution as an interpolation between the base model and training data distributions provides an elegant theoretical framework that could be applied to other problems beyond data extraction.
  • The work is original in its application of score matching and guidance to the problem of extracting private training data.

Weaknesses

  • The extraction success rate is significantly better than that of the baselines, but still only around 20% of the total, which seems limited.
  • The computational requirements of running two models simultaneously are higher than the baseline (more GPU memory), which may limit practical applicability on consumer-grade equipment.

Other Comments or Suggestions

Additional visualizations showing the distribution shifts between pretrained and fine-tuned models could help readers better understand the theoretical foundations.

Author Response

Thank you for your insightful feedback!

  1. About Experimental Design and Analysis:

Experiments on model architectures only account for convolution-based diffusion. Current state-of-the-art generative models, like SD3, Flux or Sana, feature a transformer-based architecture. The paper would benefit from extending the evaluations to these.

We update the results using FLUX.1 [dev]. We experiment on the LoRA scenario with the official scripts provided by Diffusers. The training iterations are fixed at $150 N_0$ as suggested by the repository, and the other hyper-parameters are kept the same as in the repository. We compare our method with the baseline (CFG); both methods show the best performance when the guidance strength $w'$ is 3.0, and the improvement is consistent. We will add these results in the revised paper.

Method           AS      A-ESR ($\tau=0.7$)   A-ESR ($\tau=0.6$)
CFG+Clustering   0.407   0.03                 0.20
FineXtract       0.496   0.30                 0.43
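For reference, loading FLUX.1 [dev] with a trained LoRA adapter in Diffusers looks roughly like the sketch below. The LoRA path is a placeholder, `guidance_scale` here is the pipeline's standard scale (distinct from the extraction guidance strength $w'$), and this is our illustration rather than the authors' exact setup:

```python
import torch
from diffusers import FluxPipeline

# Frozen FLUX.1 [dev] base weights plus the fine-tuned LoRA adapter.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("path/to/flux_lora")  # placeholder path

image = pipe("a sks dog", guidance_scale=3.5,
             num_inference_steps=28).images[0]
```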
  2. About Weakness 1 and the Question:

The extraction success rate is significantly better than that of the baselines, but still only around 20% of the total, which seems limited.

Given that the extraction success rate is around 20% in most cases, what factors do you believe limit the extraction of the remaining training data? Would further improvements to the guidance or clustering components potentially increase this rate, or are there fundamental limitations to how much information can be extracted?

We present updated results demonstrating how the extraction success rate increases with the number of training iterations across four classes of the WikiArt dataset, using DreamBooth as the fine-tuning method on SD v1.4.

Training Iterations                 AS      A-ESR ($\tau=0.6$)
$100 N_0$                           0.350   0.03
$150 N_0$                           0.360   0.05
$200 N_0$ (commonly used setting)   0.501   0.35
$300 N_0$                           0.564   0.58
$400 N_0$                           0.594   0.68

These results suggest that extraction performance primarily depends on the extent of the information the model memorizes. Our current result, around 20% in most cases, is based on the default training configuration and the current checkpoint. We believe this limitation is largely due to the model's inherent memorization capacity.

Potential improvements may arise from leveraging additional information within the model. Our current approach relies solely on the output of predicted noise, utilizing the model in an end-to-end manner. Future work could focus on further analysis, such as measuring attention responses to different inputs or optimizing input noise, which might enhance the method.

  3. About Weakness 2:

The computational requirements of running two models simultaneously are higher than the baseline (more GPU memory), which may limit practical applicability on consumer-grade equipment.

Even though we need to load two models onto the GPU simultaneously, this overhead is minimized because typically only a specific component is fine-tuned. For instance, in DreamBooth the UNet is typically fine-tuned while the text encoder remains unchanged, and in LoRA only the LoRA component is fine-tuned. Therefore, when using our method in such scenarios, the additional GPU memory usage is limited to the fine-tuned module. Moreover, FineXtract is an inference-only method and does not incur the memory cost of gradient backpropagation. Therefore, loading two models does not substantially limit practical applicability on consumer-grade equipment.
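To illustrate the memory argument, here is a sketch assuming a recent Diffusers version with PEFT-backed LoRA: one copy of the base weights can serve both models by toggling the adapter, so the overhead beyond the base model is only the small LoRA module. The LoRA path is a placeholder:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("path/to/finetuned_lora")  # placeholder path

with torch.no_grad():        # inference-only: no gradient memory needed
    pipe.disable_lora()      # base weights -> pretrained noise prediction
    # eps_pre = pipe.unet(...) inside the sampling loop
    pipe.enable_lora()       # adapter on -> fine-tuned noise prediction
    # eps_ft = pipe.unet(...)
```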

  4. About Other Comments or Suggestions:

Additional visualizations showing the distribution shifts between pretrained and fine-tuned models could help readers better understand the theoretical foundations.

Thank you for your valuable advice. We will revise the paper accordingly.

Reviewer Comment

I thank the authors for their response, particularly for adding experiments with FLUX.1 that show consistent improvements over the baseline method, and for showing how extraction rates correlate with training iterations, which demonstrates that the method's efficacy follows model memorization.

I am convinced to raise my score to Accept.

Author Comment

We thank the reviewer for acknowledging our work! We're encouraged that most concerns have been addressed and truly appreciate the time and effort spent on the review.

Official Review
Rating: 4

This paper introduces a novel technique for extracting fine-tuning data from personalized diffusion models, distinguishing it from prior work on data extraction in standard diffusion models. The additional constraints imposed by the fine-tuning phase have real-world implications, enabling more effective data extraction by leveraging the knowledge that the current model is a fine-tuned version of a base model.

The key insight is that the fine-tuned model's score prediction can be expressed as a weighted combination of the base model's score and the score of the fine-tuning dataset distribution. This formulation allows for an analytical conversion to isolate the score of the fine-tuning distribution, which can then be used for sampling and subsequent clustering to recover training data.

The proposed method is evaluated on a customized generation benchmark, demonstrating improved performance by leveraging this new perspective.

Questions for Authors

Not applicable.

Claims and Evidence

Yes, the claims are well validated.

Methods and Evaluation Criteria

The overall approach is well-founded. I appreciate the novel problem setup and the insightful method developed around it. The evaluation is robust and effectively supports the proposed technique.

Theoretical Claims

Yes, I checked the equations deriving the score of the fine-tuned dataset's distribution.

Experimental Design and Analysis

Yes.

Supplementary Material

Yes, all.

Relation to Prior Literature

This paper presents a novel approach to improving training data extraction from pre-trained diffusion models by introducing a tailored guidance term that isolates the influence of fine-tuning. Given the widespread availability of fine-tuned models online, this technique has good practical relevance.

Missing Essential References

No.

Other Strengths and Weaknesses

The overall presentation is clear, and the method makes intuitive sense and is easy to derive and implement.

Other Comments or Suggestions

Not applicable.

Author Response

Thank you for your valuable feedback. We greatly appreciate your positive comments!

Final Decision

This paper introduces a novel method for extracting training data from personalized diffusion models. To achieve this, the authors propose model guidance and clustering techniques.

Strengths

  • Simple yet effective method
  • Addresses an important research topic with real-world relevance
  • Demonstrates results across a variety of models

Weaknesses

  • Relies on assumptions about training captions

The authors have addressed the reviewers’ concerns in the rebuttal, and I believe this paper makes a meaningful contribution that will be valuable to the community.