6.3

/10

Rejected4 位审稿人

最低3最高8标准差2.0

3.8

置信度

正确性3.0

贡献度3.0

表达2.0

ICLR 2025

Auditing $f$-Differential Privacy in One Run

Saeed Mahloujifar,Luca Melis,Kamalika Chaudhuri

OpenReview PDF

提交: 2024-09-28更新: 2025-02-05

TL;DR

We use trade-off functions to perform tighter auditing of algorithms designed to satisfy differential privacy in a single run.

摘要

Empirical auditing has emerged as a means of catching some of the flaws in the implementation of privacy-preserving algorithms. Existing auditing mechanisms, however, are either computationally inefficient -- requiring multiple runs of the machine learning algorithms —- or suboptimal in calculating an empirical privacy. In this work, we present a tight and efficient auditing procedure and analysis that can effectively assess the privacy of mechanisms. Our approach is efficient; similar to the recent work of Steinke, Nasr, and Jagielski (2023), our auditing procedure leverages the randomness of examples in the input dataset and requires only a single (training) run of the target mechanism. And it is more accurate; we provide a novel analysis that enables us to achieve tight empirical privacy estimates by using the hypothesized $f$-DP curve of the mechanism, which provides a more accurate measure of privacy than the traditional $\epsilon,\delta$ differential privacy parameters. We use our auditing procure and analysis to obtain empirical privacy, demonstrating that our auditing procedure delivers tighter privacy estimates.

关键词

Differential privacyAuditing privacy

评审与讨论

审稿意见

评分: 8置信度: 32024-10-30

This paper proposes a computationally efficient privacy auditing procedure by leveraging the f-DP curve, and shows that the resulting lower bounds are tighter than those of previous work.

优点

The paper is well-motivated and, for the most part, clearly written. It provides a notable improvement over prior privacy auditing techniques.

缺点

The paper contains some ambiguities and cosmetic errors that should be addressed to improve clarity and overall presentation.

clarify that by "one run" you mean a training run (rather than an inference run)
explicitly state the limitation of Steinke et al. (2023) that you are addressing (in Line 80-82)
change the references to algorithm 3.1 to algorithm 3 (given that that is what the algorithm is called)
remove double mentions of Steinke et al. by just using the reference instead (e.g., in Line 420)
fix the reading flow in Definition 6 (second bullet point is not a full sentence)
correct typos (e.g., Line 194/195, 307, 466, 505/506) and wrong capitalizations in the middle of sentences (e.g., Line 100)

问题

What do you mean by "gubernatorial analysis"? (Line 95)
Do you have an intuition why the bound in Steinke et al. (2023) degrades with higher numbers of canaries while your bounds continue to improve?

评论- Thank you for your constructive feedbacks

2024-11-22

We thank the reviewer for their careful reading of the paper and constructive feedback.

clarify that by "one run" you mean a training run (rather than an inference run)

We clarified in the abstract and also added a footnote to clarify whenever we say one run, we mean one training run. (Line 106) explicitly state the limitation of Steinke et al. (2023) that you are addressing (in Line 80-82). We explicitly discuss the limitation now. Here’s our updated paragraph: “Steinke et al. highlighted a limitation in their approach in auditing specific mechanisms, such as the Gaussian mechanism. They correctly argue that simplifying the mechanism's behavior to just two parameters, $(\epsilon,\delta)$ , results in sub-optimal auditing of specific mechanisms. In other words, the effectiveness of membership inference attacks against the Gaussian mechanism differs significantly from predictions based solely on the $(\epsilon,\delta)$ parameters. To overcome this limitation, we propose auditing the entire privacy curve of a mechanism, rather than focusing solely on $(\epsilon,\delta)$ . “

change the references to algorithm 3.1 to algorithm 3 (given that that is what the algorithm is called)

We fixed this problem. Thanks for your careful reading.

remove double mentions of Steinke et al. by just using the reference instead (e.g., in Line 420)

We corrected our references.

Reading flow and typos:

We enhanced the writing and also reading flow in the paper. We thank the reviewer for their suggestions!

What do you mean by "gubernatorial analysis"? (Line 95)

This was a typo (created by autocorrect!). We meant “combinatorial analysis”. This refers to the part of analysis where we randomly shuffle the order of canaries and use the fact that this ordering and perform a double counting argument.

Do you have an intuition why the bound in Steinke et al. (2023) degrades with higher numbers of canaries while your bounds continue to improve?

We have a discussion about this in the experiment section. See page 10 line 490). The reason the bound of Steinke et al degrades as the number of samples increases beyond a certain point is because of the way their bound depends on the $delta$ term. Specifically, they have an $O(m\delta)$ term in their upper bound that starts to dominate as $m$ increases. For us, this effect does not happen because our analysis does not rely on $delta$ at a single $\epsilon$ to bound the probability of bad events. Our reliance on $f$ -curve enables us to keep improving with more canaries.

2024-11-26

My comments have been adequately addressed and I believe the paper meets the criteria for acceptance.

2024-12-03

Thank you for again your feedback and for taking the time to review the paper. We are glad to hear that your comments have been adequately addressed.

审稿意见

评分: 3置信度: 52024-11-04

The paper presents a novel algorithm for auditing differential privacy (DP) mechanisms in a single run, building upon and extending the work of Steinke et al. (2023). By leveraging the theoretical framework of f-Differential Privacy (f-DP), the authors provide tighter lower bounds on privacy leakage, thereby enhancing the existing toolbox for f-DP analysis. Notably, their auditing algorithm can be applied to various adversaries beyond the standard membership inference, such as reconstruction attacks.

优点

Advancement of f-DP Tools: The paper contributes to the understanding and practical application of f-DP, which could be of independent interest.
Interesting Problem: Auditing DP mechanisms in a one-run scenario is interesting for practical implementations (particularly in the black-box scenario, see the weakness section), and the paper makes significant progress in this area.
Experimental Validation: The experimental results are compelling and demonstrate the effectiveness of the proposed approach.
Versatility in Adversarial Models: Extending the auditing algorithm to handle different adversaries, such as reconstruction attacks.

缺点

The authors investigate exciting problems and provide interesting results. I encourage the authors to continue working on these results, as they are sound and exciting to the DP community. However, I don’t think the work is ready to be published in its current form, as it is somewhat rushed. I sketch my main concerns below.

Writing and Presentation Quality: The manuscript contains several errors and unclear explanations. The authors should revise it before publication, as there are plenty of writing errors and bad citing style.
Unreferenced Figures and Results: Some results, particularly those in Figure 7, need to be adequately referenced or explained within the text, leading to confusion about their significance.
Incomplete Explanation of Gaps: The paper needs to explain the gaps between theoretical and lower bounds. Possible reasons for these gaps should be analysed, such as limitations of the f-DP framework, assumptions made in the analysis, or practical considerations in implementation.
Insufficient Experimental Details: There are no experiments in the black-box setting for which we are compelled to use one-shot auditing. The white-box setting enjoys a tight and efficient auditing algorithm (Nasr et al., 2023), while the black-box algorithms are rather expensive.

问题

Questions:

You claim that your approach achieves tighter results as the number of canaries increases, outperforming the empirical privacy results from Steinke et al. (2023), suggesting that the results can be tight as we increase the number of canaries. Could you elaborate on why your bounds continue to improve with more canaries while the bounds in previous work degrade? What underlying mechanisms in your algorithm contribute to this improvement? Citing the authors: ” Figure 1 demonstrates that our approach outperforms the empirical privacy results from Steinke et al. Interestingly, while the bound in Steinke et al. (2023) degrades as the number of canaries increases, our bounds continue to improve.”
What potential sources contribute to any lack of tightness in your lower bounds? Are there specific aspects of the f-DP framework or your implementation that introduce looseness? How might these be addressed in future work to enhance the tightness of the bounds?
How does your algorithm perform in the black-box setting compared to the white-box setting? Can you provide detailed experimental results illustrating this performance?

评论- Thank you for your constructive feedback.

2024-11-22

We thank the reviewer for their constructive feedbacks. Addressing your comments has improved our paper and we appreciate that.

Could you elaborate on why your bounds continue to improve with more canaries while the bounds in previous work degrade? What underlying mechanisms in your algorithm contribute to this improvement?

Thank you for raising this question. We added a discussion on why this effect happens (See page 10 line 490). The reason the bound of Steinke et al degrades as the number of samples increases beyond a certain point is because of the way their bound depends on the $delta$ term. Specifically, they have an $O(m\delta)$ term in their upper bound that starts to dominate as $m$ increases. For us, this effect does not happen because our analysis does not rely on $delta$ at a single $\epsilon$ to bound the probability of bad events. Our reliance on $f$ -curve enables us to keep improving with more canaries.

What potential sources contribute to any lack of tightness in your lower bounds? Are there specific aspects of the f-DP framework or your implementation that introduce looseness? How might these be addressed in future work to enhance the tightness of the bounds?

This is a great question. We also touched on this in the same paragraph referenced above (Page 10 line 516). We believe our Theorem 9 is tight. However, the way we use theorem 9 to bound the tail is still subject to improvement. There is a subtle point of sub-optimality in our Algorithm 10. In this algorithm, we make use of the fact that the expectation of correct guesses, conditioned on the correct guesses being greater than c divided by the expectation of incorrect guesses conditioned on the same event is greater than c/c′. This step is not tight as we cannot have a mechanism where the adversary makes exactly c correct guesses with probability greater than 0, while making more than c correct guesses with probability exactly 0.

We again emphasize that our Theorem 9 is tight, we only need to find a more optimal way to find the worst case profile that is consistent with 9. Currently Algorithm 10 is the best we have, but that could improve in future work.

How does your algorithm perform in the black-box setting compared to the white-box setting? Can you provide detailed experimental results illustrating this performance?

Thank you for this suggestion. We have performed two sets of experiments in the black-box setting and provided the results in the Experiments section. In the first experiment, we use the same setup to that of Steinke et al with m=n=1000 and perform black-box attacks against models trained on CIFAR10 using DP-SGD. We use the same black-box membership inference attacks as them and obtain empirical privacy. We outperform their bounds in all settings. (See Figure 2)

We also performed an experiment on non-private models trained on CIFRA10. A benefit of empirical privacy is that it can be calculated on models that are not theoretically private but are hypothesized to be private. Hence, we use the state-of-the-art for membership inference attacks on CIFAR10 to obtain numbers on attack success and calculated empirical privacy. In these experiments we also outperform Steinke et al in our empirical privacy measurements. (See Figure 4)

Writing and Presentation Quality:

We enhanced the quality of text and resolved the typos and missing references. We appreciate your feedback.

2024-11-25

I appreciate the effort that went into rewriting the paper. However, I would have appreciated it if the authors had respected the guidelines and appropriately highlighted the changes they made to the manuscript.

I would most likely champion acceptance if the current version was the initial final one. However, I am tempted to keep my reject stance, as the rebuttal is meant to discuss and marginally improve the manuscript, not provide a new one. I feel like the authors have abused the rebuttal phase to heavily rewrite the main body of the paper, which now would require a new review. This defeats the purpose, in my opinion, of rebuttals.

That being said, I find the work valuable, but I think it is only fair to resubmit it directly in this new version.

2024-11-26

We are glad that your evaluation of the revised draft is positive. Regarding the changes in the paper, we had originally provided a list of changes as a general comment to all reviewers. All these changes were made to address reviewers requests. We have also included a supplementary PDF document that highlights the differences between the original and revised versions of the paper. In this document, text highlighted in blue indicates new additions, red shows removed text, and green marks text that has been moved (We recommend viewing this document in a two-page view for the best experience).

We want to emphasize that our intention was not to misuse the rebuttal phase. As demonstrated in the diff file, all changes were made with the sole purpose of addressing the reviewers' concerns to the best of our ability. We found the feedback extremely constructive and have worked diligently to incorporate it into our paper.

We will also appreciate if the reviewer would comment on the changes we highlighted in our rebuttal. Has our rebuttal addressed your concerns and answered your questions about comparison with previous work and also the experiments in the black-box setting? These points were important to clarify and we want to make sure they are sufficiently addressed.

2024-12-03

Dear Reviewer CU1c,

Thank you once again for your hard work in reviewing our paper. We would greatly appreciate any further comments you may have. Additionally, we would be grateful if you could let us know whether your initial comments have been adequately addressed, regardless of whether you decide to modify your score. Our goal is to ensure that all your feedback is incorporated into the next iteration of the paper.

审稿意见

评分: 8置信度: 32024-11-04

The paper presents a novel algorithm designed to audit $f$ -DP guarantees within a single execution of a mechanism. This area of research has become increasingly significant within the privacy community, particularly due to the limitations of existing auditing mechanisms. Existing empirical auditing methods are either computationally expensive (requiring multiple runs of the machine learning algorithm) or fall short in providing a tight empirical privacy guarantee. The need to run the mechanism multiple times has hindered practical applications. Steinke et al. (2023) introduced a pioneering approach that balances the number of runs with the tightness of the audit. This present work enhances this trade-off further by auditing $f$ -DP guarantees, which provide a more precise representation of a mechanism's privacy compared to traditional approximate DP parameters.

优点

Valuable Contribution to Existing Research: There has been extensive work on auditing differential privacy guarantees. This paper distinguishes itself by offering a solution that enhances both computational efficiency and the precision of empirical privacy guarantees. The reliance on multiple runs of the mechanism has been a major obstacle to the widespread application of auditing methods. Their approach, requiring only a single run, makes auditing significantly more practical, especially for complex machine-learning algorithms involving extensive model training.
Using the $f$ -DP framework is a particularly strong aspect of this work. $f$ -DP offers a more general and accurate representation of a mechanism's privacy compared to traditional approximate differential privacy. This choice allows for a more fine-grained and robust analysis of privacy. The authors convincingly demonstrate that auditing $f$ -DP leads to tighter empirical privacy assessments. By performing the analysis in a single training run, the paper achieves a more comprehensive understanding of the privacy implications within a practical computational framework.

缺点

The main weakness of this paper is its presentation. The write-up seems very rushed which at times hinders the flow of the reader. Many references are broken e.g. reference to Algorithm B. Lines 300-312 contain many typos and incomplete sentences. These are issues that can be addressed quickly but in the current state I would argue that the presentations limits the value of this work to the community.
The authors have not provided a code artifact. While the contributions of this work are mostly theoretical, the implementation of the algorithm requires care and it would help reproducibility if a code artifact were supplied.

问题

On the section “Empirical Privacy” line no 307, why do the trade off curves need to be ordered? If you have a set of trade off curves $f_i$ that pass couldn’t you build a new trade off curve $f(x) = \min_i f_i(x)$
In what sense are the empirical results tight in Fig 7 and why is that not also evident in Fig 1?
Can you explain why abstentions are important in this algorithm?

评论- Thank you for your constructive comments.

2024-11-22

We thank the reviewer for their careful read of the paper and constructive comments. They helped us improve the quality of the paper.

The main weakness of this paper is its presentation.

We have significantly enhanced the presentation of the paper. These improvements are both in writing quality and also some new discussions and definitions to enhance the readability. We appreciate your feedback.

The authors have not provided a code artifact. While the contributions of this work are mostly theoretical, the implementation of the algorithm requires care and it would help reproducibility if a code artifact were supplied.

We agree with the reviewer that implementation of the algorithms requires care. We added code snippets for all of the key algorithms for our auditing procedure. These can be found in appendix (pages 23-28).

On the section “Empirical Privacy” line no 307 (line 383 in the revision), why do the trade off curves need to be ordered? If you have a set of trade off curves that pass, couldn't you build a new trade off curve $f(x)=min_i f_i(x)$ .

You are right, they don't have to be ordered. We can find the maximal set of all privacy curves that pass and calculate the empirical privacy based on that. The only problem with your formulation is that we need to take the max, and not the min (this is because smaller $f$ translates to a weaker privacy guarantee). We formulate this in a new definition (see Definition 7 for empirical privacy and other corresponding definitions) and clarify that we don't need to have an ordered set.

In what sense are the empirical results tight in Fig 7 (Fig 11 in the revision) and why is that not also evident in Fig 1?

We rephrased our claim. We now state that our estimation of delta and epsilon captures the true behavior of epsilon and delta whereas for Stineke et al it does not. The reason Fig 7 (Figure 11 in revised draft) looks much tighter than Fig 1 is because we were using a different set of confidence intervals to calculate the bounds in figure 7 (Figure 11 in revised draft). We realized that this particular plot was created with 1-confidence instead of confidence. We regenerated the plot and updated the paper with the new plot (see Figures 10 and 11). The curves still reflect the true behavior of $(epsilon, \delta)$ but with a larger gap from the ground truth.

Thank you for your careful review of the paper! Also note that we have moved this experiment to appendix to open space for other discussions and experiments. We will be happy to bring them back if you find them necessary.

Can you explain why abstentions are important in this algorithm?

Consider a random variable representing the ratio of correct guesses ( $c$ ) to total guesses ( $c'$ ). Note that the core component of our auditing procedure is a way of calculating tail bounds on this random variable. If we reduce the number of guesses the variance of this ratio tends to decrease because the ratio approaches 1 (the adversary can make more correct guesses when we decrease $c'$ ). Conversely, if we increase the number of guesses, the variance can also decrease because having more guesses generally leads to a more stable average, owing to the law of large numbers. This balance makes the number of guesses a crucial factor to optimize for. We have added a discussion about this right after our ablation on the number of guesses. (Note that this experiment and the ablation is now in the appendix, page 22, line 1185.

2024-12-03

Dear Reviewer QcyN,

Thank you once again for your valuable and constructive comments. We would be grateful if you could let us know whether your concerns and questions have been adequately addressed. We found your feedback really valuable and want to make sure to incorporate them in the next iteration of the paper.

2024-12-03

Thank you for the clarifications. While I appreciate the significant improvements that went into the revised version, I also agree with reviewer CU1c that it is not intended to submit an entirely new version in the rebuttal phase. Nevertheless, I believe that the paper is of value to the community and I have therefore increased my score. I would encourage the authors to submit a more final version next time as this will also lead to higher quality reviews.

评论- A summary of changes in our revision

2024-11-22

We thank all reviewers for their constructive feedback. We have updated the paper to address the reviewers concerns and suggestions. To summarize, we have made the following changes:

Added clarification, discussion and insights on why our bounds behave differently from previous work. (Paragraphs starting in lines 84, 186, 216 , 492, 1185)
We ran new experiments in the black-box settings and compared our bounds with previous work. (Figures 2 and 4 in the revised paper.)
We enhanced the presentation of the paper, fixed typos and resolved broken references.
Added a definition for empirical privacy estimation to clarify how we exactly measure empirical privacy. (Definition 7 and revised version of definition 6)
Added code snippet to address the reproducibility concerns raised by reviewers. (Pages 24-27)
Moved the proof and some of the experiments to appendix to open space for the new discussions and experiments.
Added more discussion on previous work on privacy attacks and auditing. (Page 5, lines 255-273)

We hope these changes have addressed the reviewers' concerns and would welcome any additional feedback.

审稿意见

评分: 6置信度: 42024-11-26

This paper proposes an approach for auditing the guarantees of a differentially-private algorithm, which in contrast to other existing auditing schemes, does not require re-training of the model. In addition, the approach provides tighter bounds that the related work by Steinke et al.

优点

-The existing literature on privacy auditing is clearly reviewed as well as the main limitations of existing approaches.

-The paper is well-written and easy to read. The authors have also provide a clear introduction of the notions necessary to the understanding of their work such as the f-DP curve.

-The proposed approach has the benefit of only requiring a single run of the mechanism. The relationship with the method of Steinke at al. is also clearly explained. One of the main novelty of the approach is also to connect the privacy auditing procedure with previous works bounding the accuracy of reconstruction attacks.

-The experiments conducted demonstrate that the approach proposed performed better in terms of the tightness of the bound estimated when the noise added is a a low to high regime.

缺点

-Some notions such as the concept of empirical privacy could have been more formally defined. Other important details are missing such as the relationship between the way canaries are designed and the quality of their possible reconstruction as well as the design of the function f that should be considered for the auditing.

-The number of canaries needed for the experiments is very high and is likely to have a significant impact on the utility of the classifier learnt. While an experiment has been conducted with CIFAR-10 to measure the impact of the introduction of 5000 canaries more experiments should be conducted by varying the number of canaries to observe in a more fine-grained manner the impact of the introduction of canaries.

-The approach has been validated empirically only on one dataset. Additional experiments at least two other datasets should be conducted to validate how well the approach generalizes to other settings.

-There are some minor typos that could be corrected in a future revised version. For instance, "(e.g., [1])." seems to refer to a different bibliography style. Other typos : "an attack algorthm A" should be "an attack algorithm A", "in Essene" should be "in essence" and "augumented multiplicity" should be "augmented multiplicity"

问题

-Can you discuss more how the design of the set of canaries impact the reconstruction games?

-Apart from the classical example of differential privacy, can you provide a few other examples of function f that could be audited using your framework?

-Can the approach be used on classifiers for other types of data such as for example tabular data?

评论- Thank you for your constructive feedback!

2024-12-03

We thank the reviewer for their valuable comments and feedback. We have completed some experiments per your suggestions and are in the process of running more. We plan to include all these in the next iteration of the paper. Please find our responses to individual questions below.

Some notions, such as the concept of empirical privacy, could have been more formally defined.

We have updated our draft with more details on how we define the notion of empirical privacy. Definition 7 in the revised version of the paper precisely defines empirical privacy. To calculate empirical privacy, we find the strongest $f$ -DP curve that will pass our audit and then calculate $\epsilon$ and $\delta$ for that curve.
Additional experiments on at least two other datasets should be conducted.

Thank you for your suggestion. We performed a new experiment on the Purchase dataset. Purchase is a tabular dataset consisting of 600 numerical columns. We used 25,000 canaries and varied the number of guesses. We achieve an empirical $\epsilon \approx 7$ , while Steinke et al.'s approach achieves $\epsilon \approx 4.5$ . We are also in the process of running experiments with the AGNews dataset and will include it in the final version. Note that for all experiments, we expect to perform better than Steinke et al. due to our tighter analysis.
The number of canaries needed for the experiments is very high and is likely to have a significant impact on the utility of the classifier learned.

We agree with the reviewer that the design of canaries is important. However, it is orthogonal to our work. Our auditing procedure can be instantiated with any set of canaries and any attack. Our contribution lies in analyzing empirical privacy better than prior work, for any given instantiation of attack and canary selection. For our black-box experiments on CIFAR-10, for example, we choose samples from the dataset as our canaries. These canaries will not degrade the quality of the model because they are from the same distribution.
Discuss the design of the function $f$ that should be considered for the auditing.

We have added a new discussion on how to choose this function $f$ . In the revised paper, we state:

The family of trade-off functions should be chosen based on the expectations of the true privacy curve. For example, if one expects the privacy curve of a mechanism to be similar to that of a Gaussian mechanism, then they would choose the set of all trade-off functions imposed by a Gaussian mechanism as the family. For example, many believe that in the hidden state model of privacy (Ye & Shokri, 2022), the final model would behave like a Gaussian mechanism with higher noise than what is expected from the accounting in the white-box model (where we assume we release all the intermediate models). Although we may not be able to prove this hypothesis, we can use our framework to calculate the empirical privacy while assuming that the behavior of the final model would be similar to that of a Gaussian mechanism.
Can the approach be used on classifiers for other types of data, such as tabular data?

Yes, as mentioned above, we have new experiments on the Purchase dataset. In these experiments, our set of canaries is selected from the data distribution, and we use state-of-the-art membership inference attacks (https://github.com/privacytrustlab/ml_privacy_meter) to obtain empirical privacy. Our empirical privacy numbers are better than Steinke et al.'s estimates.
Apart from the classical example of differential privacy, can you provide a few other examples of function $f$ that could be audited using your framework?

We can audit any $f$ -DP curve. In this work, we focused on Gaussian and sub-sampled mechanisms. However, we can also consider mechanisms such as Laplace, Exponential, and randomized response mechanisms. As long as we have the $f$ -DP curve for the mechanism, we can plug it into our Algorithm 3 and obtain empirical privacy. We are in the process of running some experiments with the Laplace mechanism to demonstrate this and will include it in the next iteration of the paper.
There are some minor typos.

Thank you for pointing out these typos. We have fixed them all in our revised version.

2024-12-03

Can you discuss more how the design of the set of canaries impacts the reconstruction games?

The best case for reconstruction games is to select canaries in a really distinct way so that the adversary's job in detecting the selected canary becomes easier. However, we emphasize that the choice of canaries should be aligned with the distribution of data to avoid degradation in utility. This is especially important in the one-run setting, where utility is crucial. For instance, in the case of reconstruction game. You can consider choosing a random sample from CIFAR dataset, and then creating 10 version of this sample by adding a small number in the top corner of the image. Then the role of adversary is to identify which image is more likely to be used. We expect this not to have much effect on the utility of the model because the interference from canaries would not change the data distribution. Alternatively, you can use augmentation techniques to augment a sample in 10 different ways and then decide which augmentation was used. The modular nature of our analysis makes it possible to use any attack and canary selection setup and obtain empirical privacy. Although this way of selecting canaries will not be as effective as creating canaries in an adversarial manner, they can still obtain empirical privacy numbers that are close to the theoretical bounds.

2024-12-03

I would appreciate if the authors could answer the questions that were raised in my review.

2024-12-03

We greatly appreciate all your comments and feedback. We hope that our response adequately addresses your questions and concerns. We would be grateful for any additional feedback and suggestions you might have.

Please also accept our apologies for our response to your feedback being posted later than others. Your review came in later, and we were in the process of completing some experiments before posting our response.

AC 元评审

2024-12-15

Summary of Contributions

This paper studies auditing of differential privacy (DP) with one run of the algorithm. Previous work (e.g. (Steinke et al., 2023)) studies the setting where we wish to test whether the algorithm satisfies $(\epsilon, \delta)$ -DP for a single pair of $(\epsilon, \delta)$ . This work extends the test to the setting of $f$ -DP, which--roughly speaking--gives an entire curve for the values of $(\epsilon, \delta)$ . By deriving sharper bounds on the accuracy of the guessing game assuming $f$ -DP (Theorem 9), the authors show that $f$ -DP allows for a better auditing compared to $(\epsilon, \delta)$ -DP; this (partially) answers an open question from (Steinke et al., 2023). The authors also demonstrate this numerically, showing improvements in auditing the Gaussian mechanism and DP-SGD.

Strengths

The paper makes a clear contribution towards privacy auditing in one run, using an additional concept of $f$ -DP. This is novel as previous works have only consider a single $(\epsilon, \delta)$ pair when auditing. Furthermore, $f$ -DP is an important concept that has led to tight numerical privacy accounting in recent years and, thus, this is a well motivated and important setting.
Empirically, the method has shown to significantly improve the empirical privacy from auditing compared to previous work.

Weaknesses

Presentation: Multiple reviewers find the presentation qualities to be inadequate. Some central concepts (e.g. empirical privacy) and details were not defined in the original version of the submission. Discussions on the experiment details and effects of different parameters were also insufficient.
Significant Changes in the revision: Although these have been added in the revision, the changes are quite significant (as can be seen by the diff in the latest supplementary material; perhaps ~30% of the main text). Some of the potential issues--which are only apparent after the change--are listed below. Due to this, it might be best if the paper is re-submitted and fully reviewed again rather than accepted as is.
Possible Flaw in Empirical Privacy: The "empirical privacy" (Definition 7) used in this paper might be flawed and incomparable with previous work e.g. Steinke et al. (2023). To explain this, let us assume that the accuracy is 100% and consider the scenario where the test fails; what Steinke et al. (2023) works say is that the algorithm is not $(\epsilon, \delta)$ -DP. However, what the test in this paper shows is that the algorithm is not $f$ -DP. In this latter scenario, we do not know where the failure comes from; it could be that it fails for $\delta = 10^{-5}$ or at $\delta = 10^{-7}$ etc. As a result, we cannot simply pick a single $\delta$ and report the $\epsilon$ at that $\delta$ , because the violation might not happen there. Due to this, it is unfair to compare to previous work in a manner done in Section 4.1 (and this is not discussed at all in the paper).
Practicality / Limitations: Related to the above, this also highlights the fact that we need to pick the $f$ curve very carefully to get any meaningful result from such a testing at all. This will likely limit the practicality of the algorithm significantly since, for more complicated algorithms other than Gaussian mechanism, the $f$ curve is not well understood and computing them can be numerically challenging. (The choice of $f$ is mentioned briefly before Section 4.1 but the discussion does not dive into such subtlety.)

Recommendation

Although auditing $f$ -DP is an important direction and this paper provides some interesting initial work, more investigation / clarification should be done before the paper is ready for publication. Furthermore, it would be useful for the paper to get another proper full end-to-end review it deserves, given the significant changed during rebuttal. Due to these, we recommend rejection.

审稿人讨论附加意见

As mentioned above, the original version of the submission has many presentation issues, e.g. missing many important concepts / discussions, such as "empirical privacy" which is central to the paper's results. The rebuttal / revision was mostly trying to alleviate this. As stated in the meta-review, this results in a huge amount of change, some of which seems to reveal some additional flaws. Given this, I strongly suggest this paper be rejected so that it can be reviewed more thoroughly.

最终决定Reject

2025-01-22

Reject