PaperHub

NeurIPS 2025 · Poster · 4 reviewers
Overall score: 6.8 / 10
Ratings: 4, 4, 4, 5 (min 4, max 5, std 0.4)
Average confidence: 3.5
Novelty: 2.8 · Quality: 3.0 · Clarity: 3.0 · Significance: 2.8

BridgePure: Limited Protection Leakage Can Break Black-Box Data Protection

OpenReview · PDF
Submitted: 2025-05-10 · Updated: 2025-10-29

Abstract

Keywords
Data protection, availability attacks, diffusion bridge model, protection removal

Reviews and Discussion

Review
Rating: 4

This paper investigates the problem of protection leakage against black-box data protection methods. Specifically, it proposes BridgePure, which uses a denoising diffusion bridge model (DDBM) as a powerful purification algorithm to remove the perturbations used to protect images. The key idea is to query the black-box data protection APIs and obtain a set of paired images, which are used to build this purification model. The authors conduct sufficient experiments to demonstrate the effectiveness of the proposed attack.
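
For concreteness, my understanding of the attack pipeline is sketched below; the function names (protection_api, train_bridge_model, and so on) are placeholders of my own, not code or terminology from the paper.

```python
# Schematic sketch of the protection-leakage attack described above;
# everything here is illustrative. In particular, train_bridge_model merely
# stands in for training a denoising diffusion bridge model on the leaked pairs.

def collect_leaked_pairs(protection_api, attacker_clean_images):
    """Query the black-box protection API to obtain (protected, clean) pairs."""
    return [(protection_api(x), x) for x in attacker_clean_images]

def train_bridge_model(pairs):
    # Placeholder: returns an identity map so the sketch stays runnable;
    # the real attack fits a protected -> clean diffusion bridge on the pairs.
    return lambda protected_image: protected_image

def protection_leakage_attack(protection_api, attacker_clean_images, protected_target_set):
    pairs = collect_leaked_pairs(protection_api, attacker_clean_images)  # limited leakage, e.g. 0.5K-4K queries
    purifier = train_bridge_model(pairs)                                 # learn protected -> clean mapping
    return [purifier(x) for x in protected_target_set]                   # purified data, usable for training
```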

Strengths and Weaknesses

Strengths:

  1. The paper is well-written and easy to follow.
  2. The authors introduce a new threat model: the protection leakage attack against black-box data protection APIs.
  3. The authors conduct sufficient experiments on both classification (availability attacks) and generation (style mimicry protections) tasks.

Weaknesses:

  1. While the paper introduces a new threat model, the technical novelty remains limited. The core idea of using a purification model closely resembles existing adversarial purification methods. The main difference lies in leveraging prior knowledge of the black-box protection API during training, which is a relatively modest extension.
  2. The reported performance gains, even with 0.5K to 4K queried samples, appear incremental. As shown in Tables 1 and 2, the improvements over baselines are relatively small.
  3. The paper lacks a defense evaluation. The proposed attack is not tested against any existing or adaptive defenses, making it difficult to assess its robustness and practical impact.

Questions

  1. How does BridgePure perform against adaptive defenses, particularly in light of the concerns raised in Weakness 3? Evaluating the method under such settings would provide a clearer picture of its robustness and practical utility.
  2. Can you clarify the novelty and significance of BridgePure, especially considering the concerns outlined in Weaknesses 1 and 2?

Limitations

Yes

Justification for Final Rating

I will keep my score as positive.

Formatting Concerns

Not applicable

Author Response

We thank the reviewer for the overall positive feedback. Here we address your concerns.

[W1&Q2: Technical Novelty]

We believe our work represents the first application of diffusion bridge models to the data protection domain. This adaptation is not straightforward for several reasons:

  • Diffusion bridge models were primarily designed for scenarios with drastic domain shifts, while data protection mechanisms typically introduce subtle, human-imperceptible perturbations rather than obvious domain changes. The effectiveness of bridge models in recovering clean data from such subtle perturbations was not guaranteed or obvious prior to our investigation.
  • We introduce specific modifications to the diffusion bridge approach for this application, including a conditioning mechanism that balances protection removal and content preservation (a simplified sketch is given after this list).
  • Our comprehensive analysis across different protection methods provides novel insights into the vulnerabilities of various protection schemes that were previously unexplored.
  • The fact that our approach effectively neutralizes these carefully designed protections demonstrates the non-obvious nature of our solution.
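
To make the conditioning mechanism referenced above more concrete, the following is a deliberately simplified bridge training step: a plain Brownian-bridge interpolation between the protected and clean endpoints with a clean-image prediction loss, where the network is conditioned on the protected endpoint. The variable names and the exact form of the objective here are illustrative only and do not reproduce the precise formulation in the paper.

```python
import torch

def bridge_training_step(net, x_clean, x_protected, sigma=0.1):
    # Simplified, illustrative bridge objective (not the exact one in the paper):
    # sample a time t, draw x_t from a Brownian bridge pinned at x_clean (t=0)
    # and x_protected (t=1), and train net to predict the clean endpoint while
    # conditioning on the protected endpoint.
    b = x_clean.shape[0]
    t = torch.rand(b, 1, 1, 1, device=x_clean.device)      # t in (0, 1)
    mean = (1 - t) * x_clean + t * x_protected
    std = sigma * torch.sqrt(t * (1 - t))
    x_t = mean + std * torch.randn_like(x_clean)
    pred_clean = net(x_t, x_protected, t)                   # assumed signature: (noisy state, condition, time)
    return torch.nn.functional.mse_loss(pred_clean, x_clean)
```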

[W2&Q2: Performance Improvement]

We believe that BridgePure's performance improvements are not incremental, as achieving complete availability recovery is the adversary's ultimate goal, making the marginal utility of the final percentage points highly significant.

For example, previous methods could only restore CIFAR-100 dataset availability under EM protection from 6.73% to at most 67.76%, which remains far from the unprotected baseline of 74.27%. In contrast, our BridgePure-2K achieves 73.70%, nearly complete recovery. This performance improvement is highly valuable to adversaries but poses a critical threat to data protection systems.

[W3&Q1: Adaptive Defenses]

  • In the Appendix, we conducted experiments demonstrating various aspects of BridgePure's robustness, including different levels of protection leakage (App C.4), mixtures of multiple protection methods (App C.7), transferability across network architectures (App C.8), transferability across protection methods (App C.9), and transferability across data distributions (App C.10).
  • Regarding adaptive defenses, we want to clarify a key difference between data protection and adversarial attacks: in adversarial attacks, the defender receives the perturbed image and can process it further, while in data protection, the attacker receives the perturbed image and can process it further (after the defender can no longer modify it). The defender could potentially account for BridgePure during the initial protection crafting stage, but this would require differentiating through BridgePure—a significant technical challenge that has not been achieved in existing literature and represents an important direction for future work.
Comment

We again thank the reviewer for the positive feedback and for reviewing our rebuttal. As the discussion period draws to a close, we would like to confirm that all your concerns have been addressed. If any questions remain, please do not hesitate to reach out - we are happy to provide further clarification!

Review
Rating: 4

This paper studies attacks against black-box tools that allow data owners to input their dataset and receive a protected version that is “unlearnable” by an ML model. The authors show that, given access to a small set of unprotected in-distribution data and their protected counterparts, an adversary can easily train a denoising diffusion bridge model that learns an inverse mapping and generalizes to unseen data from the same distribution. They call this approach BridgePure, and they show that it gives better results than prior work without requiring pretraining or fine-tuning a large model using a lot of data.

Strengths and Weaknesses

Strengths

  • The authors give extensive experimental results on a range of image datasets for classification tasks, with comparisons against multiple existing purification-based methods, showing significant improvements.
  • This work exposes a vulnerability in existing protection APIs, making it important for API providers to develop defenses against such attacks.

Weaknesses

  • This approach assumes that the attacker can query the same protection API and has access to sufficient unprotected data from the same distribution. It is unclear whether these assumptions hold in practice. For instance, leaked images from the same distribution may contain noise or follow a slightly different distribution, and it is unclear how these differences may affect the effectiveness of BridgePure.
  • This work only focuses on image datasets, and it is unclear whether BridgePure works with other types of data.

Questions

  • Could the work be generalized to non-image datasets?
  • How robust is BridgePure to small variations in the unprotected data from the original distribution?

Limitations

Yes.

Justification for Final Rating

The authors added some clarifications about W1 but did not address W2, hence I will keep my score.

Formatting Concerns

None.

Author Response

We thank the reviewer for the overall positive feedback. Here we address your concerns.

[W1: Assumption]

(1) Noisy datasets: Real-world datasets such as ImageNet, Pets, Cars, and WebFace contain natural variations including blurring, occlusion, complex backgrounds, various lighting conditions, and even mislabeled categories as noted in [a]. Our experiments on these datasets (Tables 2, 3) demonstrate that BridgePure is robust to the noise inherent in real-world image collections.

(2) Distribution shift: We also address potential concerns about distribution mismatch between BridgePure's training data and the protected target data. In Appendix C.10, we systematically analyze BridgePure's transferability across data distributions for both classification and style mimicry tasks. For classification tasks, BridgePure underperforms on most availability attacks when the training data distribution does not match the target distribution (e.g., between CIFAR-10 and CIFAR-100). However, for style mimicry tasks, BridgePure transfers effectively across different artistic styles (e.g., between @nulevoy and Claude Monet).

[W2: Beyond Image Datasets]

We focus on gradient-based methods, which are most naturally applied to continuous domains like images. We agree that extending data protection to discrete domains such as text, tabular data, or voice is an interesting direction for future work.

[a] Northcutt C G, Athalye A, Mueller J. Pervasive label errors in test sets destabilize machine learning benchmarks, 2021.

Comment

Thank you for the clarifications. I will keep my score.

Comment

We sincerely thank the reviewer for the prompt response and hope your concerns have been sufficiently addressed. If any questions remain, please do not hesitate to reach out, and we will be happy to explain further.

Review
Rating: 4

The paper discusses the vulnerability of protection leakage in black-box data protection systems, which arises when an adversary has access to the black-box API, introducing a novel threat model. The authors propose a model, BridgePure, that exploits a limited number of leaked data pairs to reverse data protection mechanisms. BridgePure trains a diffusion-based model to map the protected data back to its original, unprotected form. Experiments show that BridgePure outperforms existing purification methods, even when the number of leaked data pairs is minimal.

Strengths and Weaknesses

Strengths

  • The paper introduces a novel threat model that includes access to the black-box APIs of protection methods and demonstrates a non-negligible privacy breach under this assumption.
  • The paper provides extensive experiments demonstrating that BridgePure consistently outperforms traditional purification methods across multiple datasets.
  • The paper is overall well-presented, making it easy to follow.

Weaknesses

  • The technical contribution is somewhat limited. BridgePure appears to be built on the existing Denoising Diffusion Bridge Models [1], incorporating a Gaussian noise trick to address overfitting.
  • The paper assumes that the additional dataset is sampled from the same distribution as the attacked dataset, which is a relatively idealized assumption. When this condition does not hold, the performance of BridgePure is not well evaluated; in particular, there is no comparison with other baselines in such cases, such as baselines that rely on no additional data or on large amounts of additional data. The robustness requires further validation.

[1] L. Zhou, A. Lou, S. Khanna, and S. Ermon. “Denoising Diffusion Bridge Models”. In: The Twelfth International Conference on Learning Representations. 2024.

Questions

  • In what way is BridgePure technically novel compared to other works, or what specific technical contributions does this paper make? Please provide further clarification.

  • I would be willing to adjust my score if the authors address my concerns.

Limitations

Yes.

Justification for Final Rating

The rebuttal primarily clarifies the technical novelty and justifies the evaluation of transferability across distributions, thereby providing a deeper understanding of the significance of this work. Considering these points, I've updated my score to a borderline accept.

Formatting Concerns

No major formatting issues.

Author Response

We thank the reviewer for recognizing our contribution. Here we address your comments.

[W1: Technical Novelty]: We believe our work represents the first application of diffusion bridge models to the data protection domain. This adaptation is not straightforward for several reasons:

  • Diffusion bridge models were primarily designed for scenarios with drastic domain shifts, while data protection mechanisms typically introduce subtle, human-imperceptible perturbations rather than obvious domain changes. The effectiveness of bridge models in recovering clean data from such subtle perturbations was not guaranteed or obvious prior to our investigation.
  • We introduce specific modifications to the diffusion bridge approach for this application, including a conditioning mechanism that balances protection removal and content preservation.
  • Our comprehensive analysis across different protection methods provides novel insights into the vulnerabilities of various protection schemes that were previously unexplored.
  • The fact that our approach effectively neutralizes these carefully designed protections demonstrates the non-obvious nature of our solution.

[W2: Assumption and Comparison]:

  • [Baselines without additional data] The comparison in Table 1 includes two baseline methods, i.e., PGD-AT and D-VAE, that rely on no additional data. Note that other baseline methods, including AVATAR, LE-JCDP, and DiffPure, use diffusion models that are pre-trained (and then fine-tuned) with large amounts of in-distribution unprotected data.
  • [Real-world datasets] Real-world datasets, such as ImageNet, Pets, Cars, and WebFace, may contain blurring, occlusion, complex backgrounds, various lighting conditions, and even some mislabeled categories as pointed out in [a]. Our experiments on these datasets (Tables 2, 3) demonstrate that BridgePure is robust to the noise introduced by this level of real-world image collection.
  • [Transferability across distributions] In Appendix C.10, we systematically discuss the transferability of BridgePure across data distributions for both classification and style mimicry tasks. For classification tasks, we found that BridgePure underperforms on most availability attacks when the distribution of additional data does not match that of the data to be purified, e.g., between CIFAR-10 and CIFAR-100. However, for style mimicry tasks, BridgePure performs better when transferring across different artists' painting styles, e.g., between @nulevoy and Claude Monet.

[a] Northcutt C G, Athalye A, Mueller J. Pervasive label errors in test sets destabilize machine learning benchmarks, 2021.

Comment

I appreciate the authors' further explanations. They strengthen the case for the work's value. I am willing to improve my score to 4.

Comment

We sincerely thank the reviewer for the reply and for recognizing our contribution! We are pleased that we could address your concerns. Please don't hesitate to reach out if you have any additional questions.

Comment

Dear reviewer, thank you very much for your engagement, your kind words, and your willingness to increase your score to 4. As far as we can see in the system, your score is still displayed as a 3. If we are not mistaken about this, we would appreciate it if you could update it in the OpenReview system to help the AC perform their duties.

Review
Rating: 5

This paper addresses the problem of protecting public datasets from being used for training machine learning models through the use of availability attacks. Specifically, they investigate how online black-box models for creating unlearnable examples can be exploited to create a small training set of (clean, protected) data pairs. These training examples can be used to train a diffusion model that can recover an original data point from a given unlearnable example. Their empirical results show that their approach can train a recovery model using substantially fewer training examples than state-of-the-art methods in the literature.

Strengths and Weaknesses

Strengths

The paper presents a new threat model for methods that aim to produce unlearnable examples from given clean examples. This threat model considers how online services for generating unlearnable examples can be used to construct training data, which in turn can be used to learn a model of how to reverse the protections applied to unlearnable examples.

They present a novel algorithm for training a purification model to attack unlearnable examples, based on the theory of diffusion models.

They demonstrate that their approach can effectively attack methods for creating unlearnable examples by using a much smaller set of training data than existing methods.

Weaknesses

While they provide some discussion of possible defences against such a protection leakage attack, they do not provide any concrete defence strategies that could be followed by users.

Questions

Have you tested the sensitivity of the performance of the methods in [9] and [25] to the number of clean training examples that they use?

While there is some discussion in Section C.1 of the computation time for your approach, can you provide any comparison or analysis of time required by the other baseline methods [9] and [25]?

Can you provide more details on possible countermeasures against your protection leakage attack?

Limitations

Yes

Justification for Final Rating

My main comment was about possible defences. The authors have addressed this at a high level. Overall, I support this paper's acceptance.

Formatting Concerns

None

Author Response

We thank the reviewer for the overall positive feedback! Here we address your concerns:

[W&Q3: Defense Mechanism]:

  • Indeed, in Appendix D.1, we focus on countermeasures that protection service providers can implement to prevent malicious adversaries from training an effective BridgePure model. These providers are obligated to deliver responsible, reliable, and robust data protection in line with their claims. Specifically, we encourage protectors to consider BridgePure threats when crafting images and adopt both system-level and algorithmic approaches.

  • On the user side, the most straightforward defense is to minimize leaking unprotected images and choose trustworthy protection service providers. We also believe the following directions show promise as additional defense strategies:

    (a) Preemptive defenses: Users can develop adversarial preprocessing techniques that make their images inherently resistant to BridgePure attacks. One approach could involve gradient-based methods that require differentiating through the stochastic purification process, presenting an interesting direction for future research.

    (b) Strategic decoys: Users can strategically release crafted decoy images designed to poison the attacker's training data, reducing BridgePure's effectiveness on their actual protected content. Potential approaches include gradient-based perturbation search, visible or invisible watermarking, and inducing significant domain shifts, among others.

    Of course, the effectiveness of these mitigation strategies requires further investigation, and we look forward to exploring them in future work.

We will add the above discussion to our draft.

[Q1&Q2: Comparison with AVATAR[9] and LE-JCDP[25]]

  • We did not retrain the diffusion models from [9] and [25] as both repositories released pre-trained checkpoints rather than detailed training scripts.

  • Regarding sensitivity to the number of training samples, we followed the experimental settings from [9] and [25] using their released models: [9]'s model is trained on 50K unprotected images while [25]'s model is pre-trained and then fine-tuned on 10K unprotected images. We believe our test case represents typical practical settings.

    Notably, our BridgePure-0.5K (using only 500 paired images) outperforms both baselines on 7 of 9 attacks for CIFAR-10 and surpasses [9]/[25] on 8/6 of 9 attacks respectively for CIFAR-100. BridgePure can outperform both methods using at most 4K pairs in the remaining cases.

  • Regarding runtime, neither [9] nor [25] disclosed the computational cost of diffusion model training. Unfortunately, reproducing their training time is not feasible due to computational costs and a lack of implementation details.

Comment

Thank you for your clarifications to my questions. As you suggest, it would be good to add the discussion on defences into the paper if it is accepted. I think my score is appropriate given the rebuttal.

Comment

We again thank you for your positive feedback and are delighted that we could address your concerns! We will incorporate the additional discussion on defenses into the final draft if accepted.

Final Decision

"BridgePure: Limited Protection Leakage Can Break Black-Box Data Protection" describes a vulnerability that arises when users (e.g. artists) are using black-box APIs to attempt to make their data points unlearnable. An adversary is able to query the API with their own unprotected dataset and, with a limited number of queries, is able to train a type of image-to-image diffusion model to compltely undo the protection.

Reviewers unanimously consider this work interesting and the threat noteworthy. While the solution of using learned image-to-image models mirrors other diffusion models employed for, e.g., removing adversarial perturbations, or paraphrasing models in the language domain, and reviewers pointed out that it is not totally unexpected, the submission precisely delimits the threat model, evaluates the approach very clearly and without unnecessary complexity, and shows that this attack works, especially when facilitated through diffusion bridge models. The attack should be strongly considered in any future work on defenses.

From my side, while the submission discusses related work on purification schemes, I would have liked to see a few more simple baseline comparisons, e.g. using JPEG compression, which would help place the achieved purification gains in the context of familiar image operations. Reviewers also point out a few remaining limitations, such as questions regarding defenses against this type of attack and generalizability to other domains, but the submission is complete enough as it stands, as these limitations may be the focus of future work.