PaperHub
7.8 / 10
Spotlight · 4 reviewers
Scores: 4, 4, 4, 4 (min 4, max 4, std dev 0.0)
ICML 2025

PCEvolve: Private Contrastive Evolution for Synthetic Dataset Generation via Few-Shot Private Data and Generative APIs

Submitted: 2025-01-23 · Updated: 2025-07-24

Abstract

Keywords
synthetic data generation, differential privacy, evolution algorithm

Reviews and Discussion

Review (Rating: 4)

The paper addresses the problem of generating Differentially Private (DP) synthetic images using APIs, focusing on the setting in which only a small number of private data samples are available.

The authors observe that a popular prior work, Private Evolution, struggles in few-shot private data scenarios due to limitations in its DP-protected similarity voting approach. This is because with few-shot private data, the noise added for DP overwhelms the actual votes, leading to nearly random similarity voting and selection.

To address this, the authors propose Private Contrastive Evolution (PCE). PCE iteratively mines inter-class contrastive relationships in the few-shot private data and integrates them into an adapted Exponential Mechanism (EM) to directly select candidates rather than vote.

PCE includes several key components:

  • A contrastive filter to improve the class-discriminability of synthetic data.
  • A similarity calibrator on top of the contrastive filter to maximize the selection probability of the most similar synthetic data.
  • A score function with sensitivity 1, built on top of the similarity calibrator and contrastive filter.

Then, using the score function above, a synthetic dataset is generated from the exponential mechanism. This process is then repeated in an evolutionary loop.
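For illustration, a minimal sketch of this selection step (all function and variable names below are illustrative, not taken from the paper's code): each candidate receives a score u(x) = h(g(x)) in [0, 1], so the sensitivity is 1, and one candidate per class is sampled with probability proportional to exp(ε·u/2).

```python
import numpy as np

def em_select(scores, eps_step, rng=None):
    """Sample one candidate index with probability ∝ exp(eps_step * u / (2 * Δu)).

    `scores` are utility values u(x) in [0, 1], so the sensitivity Δu is 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    logits = eps_step * np.asarray(scores, dtype=float) / 2.0
    logits -= logits.max()              # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(len(scores), p=probs))

# Hypothetical evolutionary loop over T iterations and C classes:
# for t in range(T):
#     for c in classes:
#         candidates = api_variation(synthetic[c])                      # i2i API call
#         u = [calibrator(contrastive_filter(x, private_feats, c)) for x in candidates]
#         synthetic[c] = candidates[em_select(u, eps_star / (T * len(classes)))]
```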

The authors conduct experiments on four specialized datasets from healthcare and industry domains, demonstrating that PCE outperforms PE and other API-assisted baselines. They also analyze PCE's properties, including synthetic image quality, component effectiveness, and hyperparameter influence. The experimental code is provided in an anonymous repo.


Update after rebuttal

I am positive about the paper and the rebuttals have reinforced my stance.

Questions for Authors

  1. It seems that it is possible to generate as many images as desired after the final iteration of PCE, simply by using an i2i model repeatedly on the synthetic dataset. Is my understanding correct? If so, does generating more synthetic images improve downstream classification? If this is the case, it could be highlighted as a further benefit of PCE that we can generate large synthetic datasets from only a small number of private data points.
  2. How is the performance of PCE on large image datasets with a small number of images per class (e.g. CIFAR-100)? It seems like the contrastive filter should improve upon prior works (e.g. private evolution), which essentially ignore the inter-class relationship.
  3. Is there a specific reason to consider pure-DP and not approx-DP? The privacy analysis of PCE seems to be a simple composition over iterations/classes, and using strong composition under approx-DP should lead to stronger results.
  4. Which dataset(s) is the pre-trained backbone model trained on? Could there be an overlap with the sensitive datasets?

Claims and Evidence

The claims are supported by empirical evidence.

Methods and Evaluation Criteria

The proposed methods and/or evaluation criteria seem reasonable.

Theoretical Claims

I verified the proof of privacy, which is straightforward.

Experimental Design and Analysis

I checked the experimental designs/analyses. My only concern is whether the data used to pre-train the backbone model has overlap with the sensitive datasets.

Supplementary Material

I did not review the supplementary material since there are no proofs.

Relation to Prior Literature

The key contribution is a variation of private evolution for generating private synthetic image data in the few-shot learning case. Previous methods ignored inter-class relationships and just generated data for each class in parallel based on similarity scores between the synthetic data and private data. By applying a contrastive filter before the similarity scores, this work is able to better leverage inter-class information.

Missing Essential References

N/A

Other Strengths and Weaknesses

N/A

Other Comments or Suggestions

N/A

Author Response

We sincerely appreciate your time and effort in reviewing our work! Below, we provide responses to address your concerns and suggestions, with Lines xxx and Section x referring to specific parts of our paper. We hope these clarifications effectively resolve your concerns, and we also thank you for your constructive feedback!

Concern 1: Could the pre-trained dataset of a generative model overlap with sensitive datasets?

We verified that the following pre-trained datasets of our generative models do not overlap with our private few-shot data by cross-referencing unique identifiers and metadata, and using hash comparisons. Specifically:

  • Stable Diffusion (SD) is pre-trained on the public multi-modal dataset LAION-2B [1].
  • SD+IPA is pre-trained on the same LAION-2B dataset, supplemented with the public multi-modal dataset COYO-700M.
  • OpenJourney is pre-trained on LAION-2B and an additional set of over 100K images from Midjourney v4 (a commercial image generative model API).

Moreover, the images we used to construct the few-shot specialized-domain datasets (medical and industrial images) originally lack corresponding textual data and therefore cannot be part of these multi-modal datasets.

[1] Schuhmann, Christoph, et al. "Laion-5b: An open large-scale dataset for training next generation image-text models." NeurIPS 2022.
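For illustration, a hash-based overlap check of the kind described above might look like the following sketch (the folder paths and helper names here are illustrative placeholders, not our actual pipeline; exact hashing only catches byte-identical duplicates, and near-duplicates would additionally need perceptual hashing):

```python
import hashlib
from pathlib import Path

def file_hashes(folder: str) -> set[str]:
    """SHA-256 hashes of all image files under `folder`."""
    return {
        hashlib.sha256(p.read_bytes()).hexdigest()
        for p in Path(folder).rglob("*")
        if p.is_file()
    }

# Hypothetical local copies of the private few-shot data and a sample of the
# public pre-training corpus; any non-empty intersection would indicate overlap.
overlap = file_hashes("private_fewshot/") & file_hashes("laion_sample/")
print(f"{len(overlap)} exact-duplicate files found")
```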

Concern 2: Is it correct that we can generate as many images as we want after the final iteration of PCE? If so, does generating more synthetic images enhance downstream classification performance?

  • The answer to the first question is YES. We can generate as many images as needed using PCE.

  • The answer to the second question is NO. We have demonstrated this in Section CAS w.r.t. N-Shot Synthetic Data and Figure 4, where we show that the utility of the synthetic dataset decreases when generating more than 150 images per class, given noisy data-generation APIs, a fixed privacy budget ϵ*, and API costs in few-shot private data scenarios. Specifically:

    1. Noise: As the number of synthetic images increases, the valuable information plateaus due to the limited few-shot private data, while noise continues to increase. Since current generative models cannot produce content free of noise, finding a balance is essential to optimize the utility of the synthetic dataset.

    2. Privacy: PCE's privacy protection relies on differential privacy, constrained by a given privacy budget ϵ*. Each PCE iteration consumes a portion of this budget, so PCE stops once the total consumed budget would exceed ϵ*, meaning that PCE cannot run indefinitely. Thus, with a fixed number of synthetic images per iteration, the total number of synthetic images is finite (see the budget sketch after this list).

    3. Cost: Each image generation requires a request to the image generation API service, which incurs a corresponding cost. If cost is not an issue, it is possible to generate as many images as needed in each PCE iteration.
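To make point 2 concrete, here is a small budget-accounting sketch (the variable names are our own illustration; the actual accounting follows the sequential composition in the paper's Theorem 4.1):

```python
def max_iterations(eps_total: float, eps_per_step: float, num_classes: int) -> int:
    """Number of full PCE iterations affordable under sequential composition,
    when each of the C per-class EM selections in an iteration costs eps_per_step."""
    eps_per_iteration = eps_per_step * num_classes
    return int(eps_total // eps_per_iteration)

# e.g. eps_total = 10, 2 classes, eps_per_step = 0.25  ->  20 iterations
print(max_iterations(10.0, 0.25, 2))
```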

Suggestion 1: Performance on a Private Dataset with More Classes but Fewer Images per Class to Further Show the Advantage of the Contrastive Filter

We thank you for the suggestion to consider a private dataset with more classes to highlight the advantage of our contrastive filter in our PCE.

  • Private datasets in specialized domains typically have only a few classes, such as positive vs. negative, which is common in fields like medicine. The few-shot issue is particularly prevalent in these specialized domains. After a thorough survey of the literature, we selected four representative specialized datasets used in our paper, each of which has a limited number of classes.
  • We have included the Cifar100 dataset in R-Table 1 with other settings fixed, where our PCE outperforms PE by 10.46% in accuracy. This improvement is greater than that observed with existing datasets in Table 1, demonstrating the advantage and adaptability of our contrastive filter, even though Cifar100 is not from a specialized domain.

R-Table 1: Top-1 accuracy (%) on Cifar100 (10 private samples per class)

Method  | Accuracy (%)
PE      | 24.23
Our PCE | 34.69

Suggestion 2: Consider Using Approx-DP Instead of Pure-DP for Stronger Composition and Improved Results

We thank you for the suggestion of considering approx-DP to further improve our PCE.

While stronger privacy budget compositions for approx-DP may exist, we chose to use the same composition method as the baseline PE for a fair comparison when calculating the total privacy budget. Using pure-DP in PCE ensures that utility improvements are attributable solely to our methodological innovations rather than differences in privacy accounting. Additionally, pure-DP can provide stronger differential privacy guarantees than approx-DP under the same ϵ. Improving PCE with approx-DP could be explored in the future.

Review (Rating: 4)

This paper proposes Private Contrastive Evolution (PCE), a new algorithm for generating high-quality differentially private (DP) synthetic images from small amounts of private data using a generative API. PCE addresses the limitations of the existing Private Evolution (PE) algorithm, which struggles with high noise levels on small datasets. PCE introduces a contrastive filter to enhance class discrimination and an adapted exponential mechanism with a similarity calibrator to improve selection quality under DP constraints. Experiments on four specialized datasets (e.g., medical and industrial) show that PCE outperforms PE and other baselines, achieving up to a 5.44% accuracy improvement on downstream tasks while maintaining strong privacy guarantees.

Questions for Authors

Does the method have inherent limitations on natural images or in low-ϵ regimes?

Claims and Evidence

The paper's key claims are generally supported by evidence, but some limitations merit discussion:

  1. Experiments focus on specialized domains (medical/industrial). Generalization to broader domains (e.g., natural images) remains unverified.
  2. The total ϵ* = 8–10 (Tab. 6) might not be strict enough for applications requiring stringent DP guarantees.

Methods and Evaluation Criteria

The proposed methods are well-suited to the problem. PCE addresses PE’s limitations in few-shot scenarios through a Contrastive Filter that leverages inter-class relationships and an Adapted EM with a utility function u = h ∘ g normalized so that the sensitivity Δu = 1, reducing noise dominance.

PCE’s effectiveness is validated through performance gains over PE and other baselines (Table 1). Ablation studies (Table 3) confirm that both components (g and h) are essential, supporting the method's relevance to the task.

Theoretical Claims

Yes, the theoretical claims are technically sound. Class-center aggregation reduces bias from boundary samples, while the similarity calibrator ensures that the utility function satisfies u ∈ [0, 1] for stable EM sampling. Additionally, sequential composition guarantees total ϵ*-DP, confirming the soundness of the approach.
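Spelled out (in notation consistent with the description above, not necessarily the paper's exact symbols), the composition argument over T iterations and C classes is:

```latex
% Per-step budget and the adapted EM (utility u has sensitivity \Delta u = 1):
\varepsilon_{\mathrm{step}} = \frac{\varepsilon_*}{T\,C}, \qquad
\Pr[\mathcal{M}(D) = x] \;\propto\; \exp\!\left(\frac{\varepsilon_{\mathrm{step}}\, u(x, D)}{2\,\Delta u}\right).
% Sequential composition over the T \cdot C invocations recovers the total budget:
\sum_{t=1}^{T}\sum_{c=1}^{C} \varepsilon_{\mathrm{step}} = \varepsilon_*
\quad\Longrightarrow\quad \text{PCE is } \varepsilon_*\text{-DP}.
```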

Experimental Design and Analysis

The experimental design is generally sound but has areas for improvement.

  1. Generalization could be strengthened by testing on larger-scale or natural image datasets (e.g., ImageNet subsets).
  2. Privacy verification lacks empirical attack results (e.g., membership inference) to validate the theoretical analysis.
  3. Additionally, the computational cost of 20 iterations may be high for some clients, though the API-based design helps offset local compute limitations.

Supplementary Material

Yes. I especially looked at the impact of different encoders (ResNet-18, CLIP) on PE and PCE performance, and the impact of varying privacy costs on PE and PCE performance.

Relation to Prior Literature

This paper builds on [1], whose similarity voting approach failed in few-shot settings due to noise dominance from the Gaussian mechanism’s high sensitivity. By adopting the Exponential Mechanism, this work improves noise efficiency in DP distillation.

[1] Lin, Z., Gopi, S., Kulkarni, J., Nori, H., and Yekhanin, S. Differentially private synthetic data via foundation model APIs 1: Images. In International Conference on Learning Representations (ICLR), 2024.

Missing Essential References

As far as I know, there are no related works essential to understanding the (context for the) key contributions of the paper that are not already cited/discussed in the paper.

Other Strengths and Weaknesses

Strengths

  1. First work to address few-shot DP synthesis via APIs.

  2. Enables privacy-preserving medical/industrial AI with SOTA performance.

  3. Releases code and provides full API/dataset specifications and hyperparameter details.

Weaknesses

  1. Tested only on specific datasets, so it’s unclear if the results would apply to more general datasets like ImageNet.
  2. Only evaluated at large privacy budgets (ϵ ≥ 4), so it's unclear if the method would work well under stronger privacy (ϵ ≤ 1).

Other Comments or Suggestions

It would be better if authors can demonstrate the effectiveness of the method on natural images (e.g., ImageNet subsets).

Author Response

We sincerely appreciate your time and effort in reviewing our work! Below, we provide responses to address your concerns, with Lines xxx and Section x referring to specific parts of our paper. We hope these clarifications effectively resolve your concerns.

Concern 1: Generalization to Broader Domains

  • As mentioned in Section Introduction, we focus on the few-shot issue, which is a critical challenge especially in specialized domains such as medical fields. Therefore, we prioritize these domains to showcase the value of our PCE. Furthermore, publicly available specialized pre-trained large models are scarce, making PCE, which relies solely on general pre-trained large model APIs, even more valuable. Our PCE can be applied to any domain to address the few-shot issue, as the underlying concept of using additional inter-class contrastive information is general and adaptable.
  • We include a dataset from the natural domain with a larger number of classes while keeping other settings fixed and demonstrate that our PCE can also generalize to the natural domain in R-Table 1.
  • Large-scale datasets are out of scope, as our paper focuses on few-shot scenarios.

R-Table 1: Top-1 accuracy (%) on Cifar100 (10 private samples per class)

Method  | Accuracy (%)
PE      | 24.23
Our PCE | 34.69

Concern 2: A Larger ϵ* Range Such as ϵ* ≤ 1

  • We follow PE in using ϵ* near 10 (see Section 5.1.2 in PE's paper) for specialized domains like medicine to ensure a fair comparison.
  • We have provided results for different ϵ* values ranging from 4 to 20 in Appendix B, showing the robustness of our PCE and analyzing their impact on the privacy-utility trade-off. As shown in Table 6, ϵ* = 8/10 emerges as the optimal choice for balancing privacy and utility: accuracy decreases greatly when ϵ* < 8 but increases only slightly when ϵ* > 10, since the private information available in few-shot scenarios is limited.
  • We include results with a wider range of ϵ* in R-Table 2, demonstrating that our PCE consistently outperforms PE.

R-Table 2: Top-1 accuracy (%) with more values of ϵ* on Camelyon17

ϵ*      | 0.01  | 0.1   | 1     | 100
PE      | 55.41 | 55.83 | 58.65 | 66.72
Our PCE | 60.02 | 61.78 | 65.38 | 72.68

Concern 3: Empirical Attack Results

  • First of all, Differential Privacy (DP) techniques, such as the Exponential Mechanism (EM), have been widely proven to resist empirical attacks [1]. Since the privacy protection ability of our PCE is supported by DP, implemented via the EM in PCE, our approach is guaranteed to resist empirical attacks. Our primary focus is on addressing the few-shot issue when applying DP techniques.

  • We incorporate a membership inference attack (MIA) for COVIDx, where the attack model achieves 50.86% success with DP and 65.27% without, showing PCE's ability to protect against such attacks. Specifically, we select shadow data from the public covid-chestxray-dataset, which is similar to COVIDx, and train an attack model until convergence using the ResNet-18 [2] architecture with randomly initialized parameters and default SGD settings (a sketch of this setup follows the references below).

[1] Nasr, Milad, Reza Shokri, and Amir Houmansadr. "Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning." IEEE symposium on security and privacy (SP), 2019.

[2] He, Kaiming, et al. "Deep residual learning for image recognition." CVPR 2016.
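A minimal sketch of the shadow-model MIA setup described above, under our own assumptions about the data loaders (the exact attack pipeline may differ):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

# Shadow pipeline (hypothetical loaders): train a shadow classifier on public
# covid-chestxray-dataset images, then label its confidence vectors as
# member / non-member to train the attack model.
shadow = resnet18(weights=None, num_classes=2)                 # randomly initialized
attack = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 2))

def membership_features(model, loader):
    """Softmax confidence vectors used as attack-model inputs."""
    model.eval()
    feats = []
    with torch.no_grad():
        for x, _ in loader:
            feats.append(torch.softmax(model(x), dim=1))
    return torch.cat(feats)

# member_feats    = membership_features(shadow, shadow_train_loader)    # label 1
# nonmember_feats = membership_features(shadow, shadow_holdout_loader)  # label 0
# ... train `attack` with SGD on these labeled features, then query it on the
# target model's outputs for candidate private images.
```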

Concern 4: Computational Cost on Clients

Our PCE can be applied to real scenarios across resource-constrained clients (Line 258), as the client-side computational cost is negligible, limited to feature extraction (with a small encoder and few-shot private data) and minimal distance calculations. R-Table 3 further demonstrates this. The API service requires no local computational resources.

R-Table 3: Total time cost (seconds) under 20 iterations.

Method  | COVIDx | Came17 | KVASIR-f | MVAD-l
Our PCE | 11.17  | 9.59   | 18.68    | 17.82
Reviewer Comment

Thank you for the detailed rebuttal. The authors have addressed the key concerns well. Overall, the rebuttal significantly improves the submission. I’m upgrading my score.

Author Comment

Thank you very much for your thoughtful reviews and encouraging words. We're truly grateful that you took the time to read our rebuttal carefully and that our responses addressed your concerns. Your support and feedback mean a lot to us.

Review (Rating: 4)

This paper introduces a new method for generating synthetic images under differential privacy called Private Contrastive Evolution (PCE). The method is designed for the case of few-shot private data, which is prevalent in healthcare, using a generative model behind an API. PCE works by initializing a synthetic image dataset using a text-to-image API from the class labels. This synthetic dataset is iteratively updated for a fixed number of rounds by privately selecting, for each class, an image from the synthetic dataset that most closely resembles the private images, and updating the synthetic dataset to more closely resemble the selected images. The authors conduct experiments on various datasets within the problem domain and find that PCE outperforms the existing API-assisted baseline, Private Evolution (PE).

Questions for Authors

It would be interesting to know at roughly which dataset size PE overtakes PCE. Or, in cases of class-imbalanced data (e.g., healthy images vs. sick images), should the larger class use the Gaussian mechanism applied to a voting histogram and the smaller class use the exponential mechanism as implemented by PCE?

论据与证据

The proposed method PCE outperforms the competitor method PE and several data-independent baseline methods across four image datasets for the few-shot case. At present, these experiments consider performance at a single privacy budget (either ϵ = 8 or ϵ = 10). The paper would be improved by considering comparative performance at additional privacy budgets. It is typical to consider budgets such that ϵ ∈ (0.1, 10). Since this is in the image domain, it may be worth considering ϵ ∈ (0.01, 100).

Methods and Evaluation Criteria

The proposed methods and datasets are appropriate for the problem.

Theoretical Claims

The privacy proof and the definitions appear correct. However, it appears that the method can be strengthened by applying parallel composition to the exponential mechanism step at each iteration. This is possible because the private images are partitioned by class and only those of a given class are used in the exponential mechanism. In this case, you still need to sequentially compose the iterations of the algorithm but can parallel-compose the per-class invocations. With respect to the privacy analysis, you can achieve the same DP guarantee by setting ϵ = ϵ*/T rather than ϵ = ϵ*/(TC).

Experimental Design and Analysis

The experiments appear well-designed but could be improved by considering additional privacy budgets. For the image domain, it may be reasonable to consider budgets in the range ϵ ∈ (0.01, 100).

Supplementary Material

I reviewed appendices A-E.

Relation to Prior Literature

The proposed method improves on the Private Evolution method from Lin et al. 2024 for the case of few-shot private synthetic image generation using APIs. This paper does not introduce new methods into the broader literature but successfully applies known building blocks to a specific problem (private synthetic image generation) to improve on existing approaches.

Missing Essential References

I am not aware of essential references that are not discussed.

Other Strengths and Weaknesses

Figure 2 was helpful to understand Alg. 1.

Other Comments or Suggestions

Some typos:

  • "Private Evaluation (PE)" (Line 86, 2nd column): should read "Private Evolution".
  • "that near distribution boundary" (Line 155, 2nd column).

Author Response

We sincerely appreciate your time and effort in reviewing our work! Below, we provide responses to address your concerns and suggestions for extensions, with Lines xxx and Section x referring to specific parts of our paper. We hope these clarifications effectively resolve your concerns. We also thank you for your creative suggestions for extensions!

Concern 1: Additional Privacy Budgets for ϵ*

  • We follow PE in using ϵ* near 10 (see Section 5.1.2 in PE's paper) for specialized domains like medicine to ensure a fair comparison.
  • We have provided results for different ϵ* values ranging from 4 to 20 in Appendix B, showing the robustness of our PCE and analyzing their impact on the privacy-utility trade-off. As shown in Table 6, ϵ* = 8/10 emerges as the optimal choice for balancing privacy and utility: accuracy decreases greatly when ϵ* < 8 but increases only slightly when ϵ* > 10, since the private information available in few-shot scenarios is limited.
  • We include results with a wider range of ϵ* in R-Table 1, demonstrating that our PCE consistently outperforms PE.

R-Table 1: Top-1 accuracy (%) with more values of ϵ* on Camelyon17

ϵ*      | 0.01  | 0.1   | 1     | 100
PE      | 55.41 | 55.83 | 58.65 | 66.72
Our PCE | 60.02 | 61.78 | 65.38 | 72.68

Concern 2: At what dataset size does PE overtake PCE?

We analyzed the impact of the few-shot dataset size (K) in Section CAS w.r.t. K-Shot Private Data and Figure 3, finding that PCE maintains superiority over PE when K ≤ 100. In R-Table 2, we extend our analysis to larger K and observe that PE outperforms PCE slightly (by 0.51%) when K ≥ 1000. However, K ≥ 1000 does not represent a few-shot setting, while PCE is best suited for few-shot scenarios.

R-Table 2: Top-1 accuracy (%) with more private data (large K) on KVASIR-f

Method  | K=500 | K=1000
PE      | 61.32 | 65.53
Our PCE | 62.47 | 65.02

Suggestion for Extension 1: Parallel Composing for DP

We really appreciate your suggestion on strengthening our method with parallel composition.

However, since our PCE addresses the few-shot issue by leveraging inter-class contrastive information through the contrastive filter (g), it requires access to private data across all classes when evaluating each synthetic sample, making parallel composition challenging. Therefore, we believe that parallel composition deserves a separate paper in the future.

Suggestion for Extension 2: Combining PE and PCE for Class Imbalance Scenarios

We thank you again for your suggestions on combining PE and PCE for class imbalanced data.

Following your suggestion, for class imbalanced scenarios, we combine PE and PCE, terming it PE+PCE. Specifically, in PE+PCE, PE is applied to the majority private classes, while PCE is used for minority (few-shot) classes. We compare PE, PCE, and PE+PCE in a class imbalanced experiment as follows. Specifically, we design an imbalanced dataset with 1,000 samples from class 0 (normal) and 10 samples from class 1 (breast cancer with tumor tissue) based on the Camelyon17 dataset, while keeping all other settings unchanged (still generating 100 samples per class). The results can be found in R-Table 3.

In this class-imbalanced experiment, both PE and PCE perform better with more private data, but PCE still outperforms PE. Since one class remains few-shot, PE suffers from noise overwhelming the actual votes, leading to nearly random similarity voting and selection for the synthetic data in this minority class. In contrast, PCE alleviates this few-shot issue, providing more informative scores to evaluate the synthetic data. By leveraging PE's advantage in majority private classes and using PCE to address the issues in minority (few-shot) classes, PE+PCE performs the best.

However, simply replacing PCE with PE for majority private classes limits improvement by losing inter-class contrastive information. Future work needs a more creative combination of PE and PCE to fully leverage their advantages.

R-Table 3: Top-1 accuracy (%) on class-imbalanced Camelyon17

Method  | Accuracy (%)
PE      | 72.36
PCE     | 74.58
PE+PCE  | 75.39
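For illustration, the PE+PCE routing described above could be dispatched per class roughly as follows (the threshold and function names are hypothetical placeholders, not our released code):

```python
FEW_SHOT_THRESHOLD = 100  # hypothetical cut-off between "few-shot" and "majority" classes

def pe_plus_pce(private_by_class, eps_per_class, run_pe, run_pce):
    """Route majority classes to PE and few-shot classes to PCE."""
    synthetic = {}
    for cls, samples in private_by_class.items():
        if len(samples) >= FEW_SHOT_THRESHOLD:
            synthetic[cls] = run_pe(samples, eps_per_class)    # Gaussian-mechanism voting
        else:
            synthetic[cls] = run_pce(samples, eps_per_class)   # adapted EM selection
    return synthetic
```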

Suggestion for Typos

Thank you for pointing out the two typos. We will correct them in the revised version.

Review (Rating: 4)

The authors present an interesting API-assisted algorithm called Private Contrastive Evolution (PCE) to address the challenge of generating high-quality differentially private (DP) synthetic images from few-shot private data using generative APIs.

The authors introduce a contrastive filter to exploit inter-class contrastive relationships within the few-shot private data, enhancing the class-discriminability of synthetic images.

Another interesting point is that the exponential mechanism was adapted to preserve private inter-class contrastive relationships, addressing the excessive noise issue from the high sensitivity of the Gaussian Mechanism. A similarity calibrator is designed to prioritize high-quality synthetic data that closely resembles private data.

Questions for Authors

I have made some comments and suggestions in the previous analysis. If I made some mistake or missed some point, please let me know.

Claims and Evidence

Regarding the claim about effectiveness on the "few-shot" problem in differentially private synthetic image generation: although the numerical results show significant improvement over the baseline (PE), I believe the paper does not present sufficiently detailed evidence regarding the isolated impact of the contrastive filter versus the similarity calibrator. I suggest an ablative analysis to determine which component of the method actually solves or mitigates the few-shot problem.

The broad generality and adaptability to different generative APIs are also not clear, as the tests performed are limited to three related APIs (Stable Diffusion, SD+IPA, OJ), all based on the same paradigm. I think there are no clear tests with structurally distinct or closed commercial APIs. That would be a very interesting and more comprehensive analysis.

One point I like regarding robust privacy protection is the use of differential privacy. I think the approach is theoretically sound, but the justification for the exact values of the privacy parameters (ϵ*) is superficial. I suggest the authors provide a better discussion of the specific choice of these values and their impact on the trade-off between privacy and utility.

Methods and Evaluation Criteria

I found the use of differential privacy mechanisms combined with contrastive (interclass) techniques and adapted exponential mechanisms really interesting, especially considering the context of the problem.

The selected datasets related to COVIDx (medical images), Camelyon17 (tumor tissues), KVASIR-f (endoscopic images), and MVtecAD-l (industrial defects) represent practical scenarios.

I agree that the metric used is well-established in the literature and makes sense. However, I miss other qualitative metrics (e.g., FID - Fréchet Inception Distance), visual analyses, or specific metrics for the visual quality of the images.

Theoretical Claims

I was especially interested in Theorem 4.1 (privacy analysis of the PCE method), which states that the PCE algorithm satisfies the differential privacy property with the total parameter ϵ*. The presented proof uses the classical definitions of sequential composition and the exponential mechanism (EM), demonstrating that the repeated application of the exponential mechanism in each iteration of the algorithm satisfies the global differential privacy requirement due to the sequential composition of the mechanisms. This seems fine to me, but it could be improved with a more detailed discussion of the practical justification of the values chosen for the parameter ϵ*. I am especially concerned about the trade-off between utility and privacy in the real context of the data used.

Experimental Design and Analysis

I tried to map out some aspects of the experiment, as follows:

  • Selection of experimental datasets: the chosen datasets are recognized benchmarks, ensuring comparability.
  • Selection of baselines: the comparisons are extensive and use recent methods such as PE (Lin et al., 2024), as well as variants (DPImg, RF, GCap, etc.).
  • The Top-1 accuracy metric used is standard in the literature.
  • Ablative and sensitivity studies help to understand some model parameters.

I miss qualitative and visual assessments, some kind of evaluation of the computational cost for real scenarios, and a detailed analysis of the privacy-utility trade-off as discussed previously.

Supplementary Material

I used the supplementary material to understand some experiments.

Relation to Prior Literature

I found the main contribution to be directly aligned with the foundations of differential privacy, which introduced the formal concept of DP and privacy assurance mechanisms. The proposed method specifically uses the exponential mechanism and adapts it in an innovative way to the synthetic image generation scenario, which is very interesting.

The use of interclass contrastive relations, which is very important in the proposed method, is related to an established line of literature on contrastive learning and metric learning to allow learning robust representations with little available data.

Missing Essential References

I am ok with the references.

Other Strengths and Weaknesses

The use of only quantitative metrics (Top-1 accuracy) to assess the results is a concern, as it may mask situations where synthetic images have high quantitative accuracy but low perceptual quality or insufficient diversity.

I miss a detailed justification for the values of differential privacy ϵ to understand the effectiveness of the privacy protection offered.

I liked the idea of using limited computing resources, but I missed a quantitative or qualitative assessment of the actual cost involved in multiple calls to external APIs. The evaluated APIs share the same paradigm; I was wondering about the performance of other approaches.

Other Comments or Suggestions

NA

Ethics Review Issues

NA

Author Response

We sincerely appreciate your time and effort in reviewing our work! Below, we provide responses to address your concerns, with Lines xxx and Section x referring to specific parts of our paper. We hope these clarifications effectively resolve your concerns.

Concern 1: An Ablative Analysis

We have only two components: the contrastive filter (g) and the similarity calibrator (h). In Section Ablation Study (page 8), we have provided an ablative analysis of these two components. We remove each component individually to evaluate their effects. As shown in Table 3, eliminating either g or h results in a significant performance drop. g and h must function together to address the few-shot issue; neither can be used in isolation. While g leverages inter-class contrastive information, it introduces a trade-off by sacrificing similarity measurement, which is then compensated by h.

Concern 2: Other APIs for More Comprehensive Results

  • The reason for choosing Stable Diffusion, SD+IPA, and OJ is that diffusion models dominate image generative models, and these three are among the most widely used and recognized approaches [1].

  • We add a structurally distinct Transformer-based API based on FLUX.1-dev, along with the closed commercial API GPT-4o, and show our PCE's generality to these new distinct APIs in R-Table 1 and R-Table 2.

R-Table 1: Top-1 accuracy (%) with FLUX.1 API

Method  | COVIDx | KVASIR-f
PE      | 52.38  | 45.67
Our PCE | 58.58  | 55.83

R-Table 2: Top-1 accuracy (%) with GPT-4o API

Method  | COVIDx | KVASIR-f
PE      | 62.32  | 56.32
Our PCE | 70.41  | 64.52

[1] Croitoru, Florinel-Alin, et al. "Diffusion models in vision: A survey." IEEE Transactions on Pattern Analysis and Machine Intelligence 45.9 (2023).

Concern 3: Impact of Privacy Parameters ϵ*

  • We follow PE in using ϵ* near 10 (see Section 5.1.2 in PE's paper) for specialized domains like medicine to ensure a fair comparison.
  • We have provided results for different ϵ* values ranging from 4 to 20 in Appendix B, showing the robustness of our PCE and analyzing their impact on the privacy-utility trade-off. As shown in Table 6, ϵ* = 8/10 emerges as the optimal choice for balancing privacy and utility: accuracy decreases greatly when ϵ* < 8 but increases only slightly when ϵ* > 10, since the private information available in few-shot scenarios is limited.
  • We include results with a wider range of ϵ* in R-Table 3, demonstrating that our PCE consistently outperforms PE.

R-Table 3: Top-1 accuracy (%) with more values of ϵ* on Camelyon17

ϵ*      | 0.01  | 0.1   | 1     | 100
PE      | 55.41 | 55.83 | 58.65 | 66.72
Our PCE | 60.02 | 61.78 | 65.38 | 72.68

Concern 4: Qualitative Metrics

  • Using CAS (top-1 accuracy) aligns with our goal. Our objective is to achieve high quantitative accuracy in downstream models, which directly reflects the value of synthetic data for these tasks. As mentioned in our paper (Lines 254-256), the CAS metric (top-1 accuracy) is widely used for assessing the quality of synthetic datasets in downstream tasks. This is also supported by prior influential work [2], which states that "traditional GAN metrics such as Inception Score [3] and FID are neither predictive of CAS nor useful when evaluating non-GAN models." (A brief sketch of the CAS evaluation loop is given after the references below.)
  • We have provided visual analyses with high perceptual quality in Section Synthetic Images (page 7). From Figure 6, our PCE can generate visually high-quality images that align with downstream tasks. The FID (↓) value on MVAD-l is 8.47 for PCE but 58.14 for PE.
  • Metrics for perceptual quality pose risks. While visually high-quality synthetic data may appear realistic, it can exhibit high similarity (low diversity) to private data, reducing its informativeness and utility for downstream tasks.
  • Our PCE is also effective in diversity metrics.
    • Our PCE is designed to enhance diversity. The core of PCE lies in its contrastive filter (gg), which explicitly improves class discrimination.
    • On MVAD-l, PCE achieves 12.51, whereas PE only reaches 82.17 in the Inception Score (↓).

[2] Ravuri, Suman, and Oriol Vinyals. "Classification accuracy score for conditional generative models." NeurIPS (2019).

[3] Chong, Min Jin, and David Forsyth. "Effectively unbiased fid and inception score and where to find them." CVPR 2020.
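For reference, CAS here simply means training a downstream classifier on the synthetic set and reporting its top-1 accuracy on the real test split. A minimal sketch under our own assumptions (loader names and training schedule are illustrative, not the exact setup of our experiments):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

def classification_accuracy_score(synthetic_loader, real_test_loader, num_classes, epochs=50):
    """Train on synthetic images, report top-1 accuracy (%) on the real test set."""
    model = resnet18(weights=None, num_classes=num_classes)
    opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        model.train()
        for x, y in synthetic_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in real_test_loader:
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return 100.0 * correct / total
```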

Concern 5: Computational Cost

Our PCE can be applied to real scenarios across resource-constrained clients (Line 258), as the client-side computational cost is negligible, limited to feature extraction (with a small encoder and few-shot private data) and minimal distance calculations. R-Table 4 further demonstrates this. The API service requires no local computational resources.

R-Table 4: Total time cost (seconds) under 20 iterations.

Method  | COVIDx | Came17 | KVASIR-f | MVAD-l
Our PCE | 11.17  | 9.59   | 18.68    | 17.82
Final Decision

The paper addresses the challenge of generating private synthetic images with limited private data. All reviewers agree that the limitations of prior work such as Private Evolution are substantial in few-shot settings. The paper proposes an approach that uses contrastive signals and a modified Exponential Mechanism to enable more effective candidate selection.

Though there are concerns about the range of parameters evaluated in the experiments, the consensus among reviewers is that the algorithmic approach and the included experiments are sufficient to merit acceptance.