Exactly Minimax-Optimal Locally Differentially Private Sampling
Abstract
Reviews and Discussion
Consider the following setting: a public domain X, a family of distributions over X, and a mechanism which, given a distribution as input, releases an element of X, such that the mechanism is ε-pure local DP with respect to the input; that is, the output will have a similar distribution for any two input distributions P, P'. Can we expect the output distribution - that is, the distribution of output elements given an input distribution - to be accurate in any way, where accuracy is measured by some f-divergence between the distribution of outputs and the input distribution?
At first, this seems like a hopeless task, since the mechanism has to maintain privacy even against distributions that share no support. Surprisingly, as this paper proves, there is some hope and things are not that bleak.
This setting was first considered by [1], who proposed optimal mechanisms for finite and continuous domains, with respect to some given reference distribution, for KL-divergence. This paper removes the dependence on a given reference distribution, and instead provides an intuitive algorithm that is optimal for the finite case and does not depend on the choice of a reference distribution. It extends these results to the continuous case, for a family of distributions that are point-wise bounded from above and below by some reference distribution up to scale. Finally, it provides numerical evaluations for the continuous case, which demonstrate the improved utility.
[1] Hisham Husain, Borja Balle, Zac Cranko, and Richard Nock. Local Differential Privacy for Sampling. In Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, pages 3404–3413. PMLR, June 2020.
Strengths
This paper is clearly written. Its contribution lies both in its conceptual framing and in its theoretical results.
Weaknesses
Though many of the ideas and proofs in this work are novel, the results can be viewed as a generalization of previously known results as mentioned in the summary.
Though the results achieved in this paper are optimal, they are still quite weak. This is not surprising. As discussed in the summary, the very fact that we can learn anything in this setting is already surprising, given that (in the finite-domain setting) the outputs given two different singleton distributions must be indistinguishable. As a result, the output distribution is heavily skewed toward the uniform distribution.
The main motivation for this setting is somewhat unclear. Since each user has access to a single distribution, we want to maintain privacy w.r.t. the user's distribution, and so we release a single element; the output is useful only under some assumption on the relation between the users' distributions.
Questions
See above
Limitations
N/A
We thank Reviewer NxAu for thoroughly reviewing our paper and providing valuable comments.
Response to the first weakness on comparison with previous work
We agree that a certain part of our results can be seen as a generalization of the previous work [35], as we mentioned in the "Comparison with Previous Work [35]" paragraph in Section 3.1. However, we would like to emphasize that our work has non-trivial contributions beyond [35] in the following aspects:
- We prove the universal optimality of the mechanism for all f-divergences, whereas the previous work [35] only considered the KL divergence. Our proof involves numerous non-trivial steps relying on general properties satisfied by all f-divergences, not on properties specific to the KL divergence.
- The exact characterization of the minimax loss requires a converse proof involving the derivation of a matching lower bound on the loss applicable to all possible mechanisms, not just to mechanisms similar to [35]. By proving the matching lower bound that holds for all possible mechanisms, we have established the exact minimax optimality. Note that the previous work [35] does not provide explicit guidance on the choice of reference distribution; more specifically, the performance of their mechanism depends heavily on that choice. Quite the contrary, our work implicitly figures out the optimal reference measure with respect to the minimax loss.
Response to the second weakness on weak result
We agree that many existing privacy mechanisms, not limited to sampling scenarios, can be seen as skewing an original distribution towards a specific distribution (such as the uniform distribution). But the challenging part is determining how and to what extent to skew the distribution to achieve the optimal privacy-utility trade-off (PUT). The significance of our research lies in precisely determining the optimal PUT through non-trivial achievability and converse proofs.
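To make the skewing-versus-utility tension concrete, here is a small numerical sketch of our own (it is not the paper's mechanism; the simple mixture-with-uniform form, the constant alpha, and the use of the KL divergence are illustrative assumptions): the stricter the privacy budget ε, the more the output must be skewed toward uniform, and the larger the resulting divergence from the input.

```python
import numpy as np

def kl_divergence(p, q):
    """KL divergence D(p || q) for discrete pmfs (assumes q > 0 wherever p > 0)."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

k = 4
p = np.array([0.7, 0.1, 0.1, 0.1])  # a non-uniform input distribution

losses = []
for eps in [0.1, 1.0, 5.0]:
    # Largest mixing weight on p that keeps any two inputs e^eps-close in output
    alpha = (np.exp(eps) - 1) / (np.exp(eps) - 1 + k)
    q = alpha * p + (1 - alpha) / k  # output skewed toward the uniform distribution
    losses.append(kl_divergence(p, q))

# A stricter privacy budget (smaller eps) forces a larger divergence
assert losses[0] > losses[1] > losses[2]
```

This illustrates the trade-off qualitatively; determining the exact optimal skewing, as the rebuttal explains, is the hard part.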
Response to the third weakness on main motivation
Our setup has recently gained attention due to its practical applications in generative models. For example, as mentioned in the introduction, [36] uses the private sampling mechanism for fine-tuning large language models.
As mentioned by the reviewer, however, we may consider the scenario where the distributions of the users are correlated. One possible approach is to assume an underlying distribution on distributions from which each user's original distribution is drawn, and then infer such an underlying distribution given samples from the users. Also, we can formulate more general problems such as producing a distribution per user or generating multiple samples per user, which we mentioned in the introduction and the conclusion of the paper, respectively. We believe that the problem we considered is a simple yet fundamental one that serves as a stepping stone towards addressing these more general and practically interesting problems.
I thank the authors for their clarifications and have no further questions.
The paper considers the problem of sampling from a user's distribution under local differential privacy (LDP). Namely, a client has a distribution P in some set of distributions, where the set is known to a data curator. The client wants to perturb P to get Q, such that a sample from Q sent to the data curator satisfies LDP with respect to P (i.e., for any P, P', the probabilities of sampling a given outcome differ by at most a multiplicative factor), but also minimizes a given f-divergence between P and Q. The authors give lower bounds on the f-divergence achievable for a given LDP guarantee in two settings, one where the set of distributions is over a discrete support, and another where the set of distributions is over a continuous support, but is lower and upper bounded by some constants times a base measure. They also give an explicit perturbation in these settings that exactly matches the lower bound, i.e. is an optimal solution to the problem. They also discuss why in the continuous setting the lower/upper bound assumption is necessary to get meaningful bounds (in particular, to avoid two disjoint point distributions being in the support), show how to extend their results to a setting where clients only have samples from their distribution, and perform experiments showing that their perturbation achieves better divergence bounds than the previous work "Local Differential Privacy for Sampling".
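The multiplicative-closeness requirement can be checked numerically on a toy mechanism. The sketch below is our own illustration (the mixture form and the constant alpha are assumptions, not the paper's mechanism): it verifies that mixing the input with the uniform distribution makes even two disjoint point masses, the hardest pair for the LDP constraint, e^eps-indistinguishable.

```python
import numpy as np

eps, k = 1.0, 4
# Largest weight on the input pmf for which the output ratio stays <= e^eps;
# the worst case is (alpha + (1 - alpha)/k) / ((1 - alpha)/k) = 1 + k*alpha/(1 - alpha)
alpha = (np.exp(eps) - 1) / (np.exp(eps) - 1 + k)

def mechanism(p):
    """Output pmf: mixture of the input pmf with the uniform distribution."""
    return alpha * p + (1 - alpha) / k

# Two disjoint point distributions
p1, p2 = np.eye(k)[0], np.eye(k)[1]
ratio = np.max(mechanism(p1) / mechanism(p2))
assert ratio <= np.exp(eps) + 1e-9
```

With alpha chosen as above, the worst-case ratio meets e^eps exactly, so the privacy budget is spent tightly.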
Strengths
- The paper gives matching upper and lower bounds for a very general form (i.e. for arbitrary f-divergence, rather than a specific divergence) of a natural and important problem of generating private samples with LDP
- The assumptions for the settings where the authors derive optimal results are well-justified - in the continuous setting, the authors demonstrate that their assumption (10) is somewhat necessary to derive any meaningful results, and the authors discuss how to extend to a more practical setting of clients having samples rather than distributions
- Numerical results support the theoretical results and make them easier to contextualize, specifically they demonstrate significant improvements in the bounds over the past work
Weaknesses
- While the proofs of Theorems 3.1 and 3.3 in the appendix are far from trivial, given knowledge of the randomized response mechanism, it's not clear to me that the optimal bounds are as surprising as the authors suggest. See Questions for more detail.
- The introduction suggests one of the major improvements of the paper over [35] is avoiding the optimality depending on the choice of reference distribution. However, it seems that once we relax the reference distribution to a reference measure, as the authors do, in both settings the authors consider there is a natural choice of reference measure to apply the approach of [35] to. It also seems like the results could be retrieved by just shifting the lower and upper bounds on the probabilities that [35] uses. While the authors justify the assumptions in the continuous setting that lead to an easy choice of reference distribution, and a lot of work goes into proving optimality of the authors' suggested mechanisms, it seems both Theorems 3.1 and 3.3 are achievable via a slight variation of the technique in [35], which suggests limited originality/significance for the algorithmic techniques in the paper.
Questions
- In the discrete setting, we could sample from P and then apply a randomized response to the sample. This seems to achieve the same f-divergence bound (7) in the extreme case where P is a point distribution. This mechanism might be worse for non-extreme distributions, but for the minimax f-divergence this might not matter. Is this mechanism also minimax optimal? If so, in the continuous case is the following mechanism minimax optimal? We sample from P, with some probability p keep the sample, and otherwise sample from the distribution with pdf proportional to the base measure h. p can be chosen to be the largest value that preserves LDP.
Limitations
Yes
Response to the first weakness on alternative mechanism
We thank Reviewer 9St7 for raising an interesting point. We are truly grateful for the thoughtful review of our paper.
Before we discuss the minimax-optimality of the scheme proposed by the reviewer, we would like to emphasize that even though a simple randomized-response-based mechanism provides an alternative achievability result, proving its optimality is highly non-trivial. The exact characterization of the minimax loss requires a converse proof deriving a matching lower bound on the loss applicable to all possible mechanisms, not just to randomized response or similar mechanisms. We believe that one of the technical novelties of our work lies in this converse proof.
Now let us discuss the alternative mechanism suggested by the reviewer. As mentioned by the reviewer, for the discrete case, we can simply sample an element from the original distribution and apply the ε-randomized response to that element. This randomized-response-based mechanism can be written as
We can design a similar mechanism for the continuous case as follows:
- First, toss a biased coin with a certain head probability.
- If head, sample from the original distribution.
- Otherwise, sample from the reference distribution whose pdf is proportional to the base measure.
- The head probability is chosen as the largest value that satisfies ε-LDP tightly.
Such a mechanism for the continuous case can be written as
It can be shown that in both setups of Theorems 3.1 and 3.3, the aforementioned mechanism is also minimax-optimal for any f-divergence (here, we suppress the subscripts for notational convenience).
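For concreteness, the two variants of this randomized-response-based mechanism can be sketched as follows (our own sketch: the helper names and the use of the standard k-ary randomized-response constant e^eps / (e^eps + k - 1) in the discrete case are our assumptions, not notation from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def rr_sample(p, eps):
    """Discrete case: draw x ~ p, then pass it through eps-randomized response."""
    k = len(p)
    x = rng.choice(k, p=p)
    if rng.random() < np.exp(eps) / (np.exp(eps) + k - 1):
        return x  # report the sampled element truthfully
    # otherwise report one of the other k - 1 symbols uniformly at random
    return rng.choice([y for y in range(k) if y != x])

def mixture_sample(sample_p, sample_ref, head_prob):
    """Continuous case: biased coin; heads -> original P, tails -> reference."""
    return sample_p() if rng.random() < head_prob else sample_ref()
```

For a point-mass input, `rr_sample` outputs the true symbol with probability e^eps / (e^eps + k - 1), matching the extreme case discussed in the reviewer's question.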
However, as the reviewer mentioned, the randomized-response-based mechanism performs worse for non-extreme distributions. Indeed, we showed that our proposed mechanism outperforms it for any distribution:
- For any f-divergence and any input distribution, the loss of our proposed mechanism is no larger than that of the randomized-response-based mechanism.
The above statement is a corollary of a stronger result about our proposed mechanism, which improves the results of [35] in a non-trivial way:
- For any f-divergence, the output of our proposed mechanism is the projection of the input distribution onto a specific ε-close set, in which the output of the randomized-response-based mechanism also lies.
We aim to revise the paper accordingly. In particular, we will mention the alternative mechanism that achieves minimax-optimality and provide comments on it in the final version. Thank you once again for highlighting this interesting point.
Response to the second weakness on comparison with previous work
- First, as described in the response to the first weakness, the proposed mechanism indeed generalizes the approach of [35]: it corresponds to the projection of the input distribution onto an ε-ball centered at some reference measure, for any f-divergence. We would like to emphasize that our work has non-trivial contributions beyond [35] in the following aspects:
  - Deriving the projection in a closed form for all f-divergences is not immediate from [35], which only considers the KL divergence. Although the technicalities cannot be fully detailed in the rebuttal due to space constraints, establishing general properties satisfied by all possible f, which may be non-differentiable, involves non-trivial steps beyond [35].
  - Finding the right reference measure is an important step in characterizing the exact optimality. Although we can apply the approach of [35] with an arbitrary reference measure, proving the minimax-optimality with respect to a specific reference measure involves highly non-trivial converse techniques. Our work differs from [35] in that we implicitly figured out the optimal reference measure with respect to the minimax loss.
- Second, to the best of our knowledge, the algorithm in [35] for finding a KL-divergence projection in the continuous case (i.e., MBDE) is only guaranteed to find an approximate projection, not an exact one, even if a reference measure is given. Therefore, it is unlikely that a variation of MBDE will retrieve the exactly optimal loss for the continuous case in Theorem 3.3, even for the optimal reference measure we identified. In contrast, the proposed mechanism, which has a closed form, achieves the exactly optimal loss in both the discrete and continuous cases, thereby eliminating the need for an iterative projection algorithm as used in [35].
In more detail, MBDE iteratively updates the candidate output distribution until convergence, employing a soft decision rule (called a weak learner) to distinguish a sample from the current candidate and one from the target. For the candidate to converge to the optimal distribution, the accuracy of the weak learner must approach 1 over the iterations. However, since the aim is to make the candidate as close to the target as possible, this accuracy is expected to decrease as the iterations proceed. To the best of our understanding of [35], there is no guarantee that this accuracy will converge to 1; thus, we cannot guarantee that MBDE will find the exact projection.
Thanks for your response. The new results comparing your mechanism to randomized response, and the discussion on why working with general f-divergences complicates things compared to [35], are convincing. I have raised my score accordingly.
This paper addresses the problem of sampling under local differential privacy (LDP) requirements, focusing on the privacy-utility trade-off (PUT). It defines the fundamental PUT of private sampling in the minimax sense, using f-divergence as the utility measure. The authors characterize the exact PUT for both finite and continuous data spaces and propose sampling mechanisms that are universally optimal for all f-divergences. The numerical experiments demonstrate the superiority of these mechanisms over existing baselines, showcasing their empirical utilities.
Strengths
- The paper presents a novel approach to analyzing the privacy-utility trade-off (PUT) in locally differentially private (LDP) sampling, utilizing f-divergence as the utility measure.
- The authors offer optimal mechanisms for both finite and continuous data spaces, given certain assumptions. Figure 2 provides a clear visualization that enhances understanding of the proposed mechanisms.
Weaknesses
The constant in the mechanism does not have a closed form and must be calculated through approximate methods. This introduces potential approximation errors that could degrade the privacy or utility guarantees. See question 1.
Questions
- In practice, would the approximation error in calculating this constant degrade the ε-LDP guarantee to a weaker one?
- Is the mechanism uniquely optimal?
- Is there a typo on line 127: there may be no such that (6) does not sum to one?
Limitations
NA
Response to the first question on the approximation error
We thank Reviewer KHbV for asking a practically important question. We briefly mentioned the effect of the approximation error in calculating the constant on the privacy budget in Appendix F.2, and we considered this effect in the experiment. Let us analyze the effect of this error in more detail in the following.
Consider the continuous case in Theorem 3.3, and let the constant be defined as in the RHS of equation (13). Suppose we set error tolerances and, for each input distribution, find an approximation of the constant within those tolerances; we then define the pdf of the approximated mechanism using this approximation. The pdf ratios of the approximated mechanism are then bounded by a slightly inflated factor, which implies that the approximated mechanism satisfies ε'-LDP for a slightly larger ε'. Thus, allowing an approximation error does not require the introduction of approximate LDP but instead necessitates using a slightly larger pure privacy budget. Note that to sample from the output distribution, knowing the precise value of the constant is unnecessary; it suffices to know the density up to normalization, utilizing known MCMC methods. In the experiment for the continuous case, we allowed the error tolerances and, for each given ε, implemented the mechanism with an adjusted privacy parameter, ensuring that the actual privacy budget matched the initially given ε. We note that there is a typo in Appendix F.2 of the manuscript, which we found after submission. In lines 961-962, the sentence
"we modify the value of by "
should be revised as
"we modify the value of by ".
We also checked the code for the experiment and confirmed that the correct formula was used in the code.
Response to the second question on the uniqueness
A related question was raised by Reviewer 9St7. Reviewer 9St7 conjectured that a simple randomized response-based mechanism is also minimax-optimal. In conclusion, we checked that Reviewer 9St7's mechanism is also minimax-optimal, meaning our mechanism is not the unique optimal mechanism.
Specifically, Reviewer 9St7's mechanism works as follows: for the discrete case, an element is sampled from the original distribution and then passed through ε-randomized response. For the continuous case, the mechanism operates as follows:
- First, toss a biased coin with a head probability that tightly achieves ε-LDP.
- If head, sample from the original distribution.
- Otherwise, sample from the reference distribution whose pdf is proportional to the base measure.
We checked that in both setups of Theorems 3.1 and 3.3, this mechanism is also minimax-optimal for any f-divergence. However, the mechanism proposed in the paper has certain advantages over it. Specifically, we showed that our proposed mechanism outperforms Reviewer 9St7's mechanism in a point-wise sense, i.e., for any f-divergence and any input distribution, the loss of our mechanism is no larger than that of the reviewer's mechanism. The detailed arguments can be found in the rebuttal to Reviewer 9St7.
Response to the third question on a typo
We acknowledge that the corresponding sentence might cause confusion. To clarify, the corresponding sentence can be revised as follows:
"there may be no such that the RHS of (6) does not sum to one."
Thank you for your rebuttal. I have no additional questions at this time.
This paper considers the issue of protecting probability distributions with local differential privacy when they are sampled. This problem has been recently studied in view of potential applications to generative models. In particular, the paper studies mechanisms that, given a distribution, produce a "sampler", i.e., a sampling mechanism that can be seen as an obfuscated version of the original distribution and which is used to produce samples in replacement of the original distribution. The paper investigates the privacy-utility trade-off of the proposed method, where utility is measured in terms of f-divergence between the original distribution and the sampling distribution.
Strengths
- The proposed method is shown to be universally optimal, i.e., the produced sampling mechanism minimizes any f-divergence with respect to the original distribution
- For finite data spaces, the paper presents a closed-form expression for the utility of both the proposed mechanism and the baseline method proposed in [35], allowing for an exact comparison of the utilities.
Weaknesses
- The paper should justify better, in the introduction, why inferring a distribution presents a privacy risk. In general, in the DP literature, we want to protect an individual data point, not the distribution that can be inferred from the data collection.
Questions
Can you please explain why knowing a distribution presents a privacy risk? In general, the problem in privacy consists in protecting the individual data point, while allowing statistics to be derived, including the inference of the distribution from which the data points are drawn.
Limitations
The limitations are discussed. In particular, the paper points out that there are other notions of utility (distances between distributions) that could be considered.
Response to the question on the motivation of protecting distribution
We thank Reviewer D3Yh for thoroughly reviewing our paper and asking an important question regarding the practical motivation for protecting a distribution.
Let us first introduce an equivalent definition of LDP that illustrates its operational meaning. The following equivalent definition of LDP [R1] suggests that an LDP mechanism makes it difficult to infer any sensitive information about a user from the released output Y, when Y is provided instead of the user's data X.
- Theorem [R1, Thm. 14]: A conditional distribution Q(y|x) satisfies ε-LDP if and only if, for any (possibly randomized) sensitive attribute U of the data X and any estimator Û of U based on the output Y,
Pr[Û(Y) = U] ≤ e^ε · max_u Pr[U = u]. (R1)
As we can see from the above definition, LDP aims to protect any possible sensitive information hidden inside the user data X, and hence the variable X in (R1) should represent the whole data possessed by the user that is processed before being sent to the server.
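As a numerical sanity check of this operational guarantee, the following sketch of our own (taking U = X and using the standard k-ary randomized response as an example ε-LDP mechanism, both illustrative assumptions) verifies that an adversary's optimal probability of guessing the data from the output exceeds the prior-only guess by at most a factor of e^ε:

```python
import numpy as np

eps, k = 1.0, 4
e = np.exp(eps)
prior = np.array([0.4, 0.3, 0.2, 0.1])  # adversary's prior on the data X

# k-ary randomized response: RR[x, y] = Pr[Y = y | X = x]
RR = np.full((k, k), 1 / (e + k - 1))
np.fill_diagonal(RR, e / (e + k - 1))

joint = prior[:, None] * RR              # Pr[X = x, Y = y]
bayes_success = joint.max(axis=0).sum()  # optimal probability of guessing X from Y
# Seeing Y helps, but by at most a multiplicative e^eps over blind guessing
assert prior.max() <= bayes_success <= e * prior.max() + 1e-9
```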
As the reviewer mentioned, many privacy studies consider the situation where each of multiple users has a single data point, and in this case, X is that single data point. On the other hand, in many practical scenarios, it is more natural to assume that each user possesses multiple data points, e.g., photographs and texts stored on a person's smartphone or the multiple medical records held by a hospital. Accordingly, there have been multiple studies considering the situation where each user has multiple data points. For example, [30] considers a scenario where each user holds several data points, each generated in an i.i.d. manner from an underlying distribution. In this case, it is appropriate to take X in (R1) to be the whole tuple of the user's data points. In our setting, the user's data is itself a distribution; hence, the user's distribution plays the role of X in (R1). As mentioned in [35], this can be seen as a more general setup that includes situations where each client has a variable number of data points, by regarding those data points as their empirical distribution.
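The reduction mentioned at the end of this paragraph is simple to make concrete: a user's variable-length collection of data points is summarized as its empirical distribution, which then serves as the single distribution-valued input to the sampling mechanism. A minimal sketch of our own:

```python
from collections import Counter

def empirical_distribution(samples):
    """Summarize a user's data points as a single pmf over observed values."""
    n = len(samples)
    return {x: c / n for x, c in Counter(samples).items()}

# A user holding four data points is represented by one distribution
P = empirical_distribution(["cat", "dog", "cat", "cat"])
# P == {"cat": 0.75, "dog": 0.25}
```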
One naive approach for privatizing multiple data points would be to perturb each data point of the user and send all perturbed data to the server, but it will result in significant privacy leakage due to parallel composition. Hence, developing better privacy mechanisms has gained increasing attention recently to protect the whole collection of multiple data points per user, in the context of not only LDP but also DP.
Reference:
- [R1] I. Issa, A. B. Wagner, and S. Kamath, "An Operational Approach to Information Leakage," IEEE Transactions on Information Theory, vol. 66, no. 3, pp. 1625–1657, Mar. 2020, doi: 10.1109/TIT.2019.2962804.
We thank all the reviewers for thoroughly reviewing our paper and providing numerous valuable comments. All suggestions have helped us better present our work and delve deeper into our theoretical results. In particular, some comments were especially helpful in improving the paper, and we believe they deserve mention in this global rebuttal, along with our plans for revising the paper.
The Effect of the Approximation Error of the Constant on the Privacy Budget
Reviewer KHbV raised the following question:
- In practice, to implement the proposed mechanism, we need to find a certain constant numerically. Does the approximation error in finding this constant result in a degradation of the privacy guarantee?
We acknowledge the effect of the approximation error on the privacy budget. We briefly explain this effect in Appendix F.2 and account for it in the experiment. However, recognizing its importance during the rebuttal process, we plan to supplement the detailed explanation of this effect in the revision. More details are provided in the rebuttal to Reviewer KHbV.
Alternative Minimax Optimal Mechanism
Two reviewers, KHbV and 9St7, raised a question about whether the proposed mechanism is the unique mechanism achieving minimax optimality. Additionally, Reviewer 9St7 proposed an alternative mechanism conjectured to be optimal as well. Our subsequent work during the rebuttal period confirmed that Reviewer 9St7's mechanism is also optimal in the minimax sense. However, we also demonstrated that the mechanism proposed in the paper has an advantage over the reviewer's mechanism in a point-wise sense. This advantage arises from the stronger statement that our mechanism performs a projection onto a specific ε-close set for any f-divergence. Further details are provided in the rebuttal to Reviewer 9St7.
We believe that these results are worth including in the paper and plan to incorporate them, along with their proofs, in the appendix during the revision. In particular, we believe that the property of being a projection for any f-divergence is a significant advancement over previous work [35], which only considered the KL divergence. Although the proof cannot be fully detailed in the rebuttal due to space constraints, we believe that both the property itself and its proof are highly non-trivial contributions beyond [35]. The basic approach is to verify that the proposed mechanism satisfies the KKT conditions of the optimization problem of minimizing the f-divergence. However, the main challenges lie in the following aspects, which distinguish this work from [35]:
- We need to show that the proposed mechanism satisfies all the KKT conditions for all possible f. This requires relying on general properties satisfied by all possible f-divergences (possibly non-differentiable), rather than specific properties of the KL divergence.
- We must also address the continuous space, which involves infinite-dimensional optimization, in addition to the discrete space.
Actually, we had already obtained all the results mentioned in the rebuttal to Reviewer 9St7 for the discrete case before submission. However, at the time of submission, we had neither identified the alternative mechanism for the continuous case nor proved that the proposed mechanism for the continuous case performs the f-divergence projection. Therefore, we did not include these results in the manuscript for consistency. However, thanks to Reviewer 9St7, we were able to dig into the continuous case and prove the corresponding results during the rebuttal period. Accordingly, we plan to include these results in the revised paper. In particular, we will add a discussion of the alternative mechanism as well as a proof that the proposed mechanism corresponds to the f-divergence projection in the continuous case as well.
This paper makes the following contributions: it considers a generalization of standard single-point LDP to the distribution setting. It measures utility using f-divergence, which removes the reference distribution required in prior work. One key result is the establishment of matching minimax lower and upper bounds. All reviewers find the results novel and important. I also think it has made some nice contributions and would like to accept it.
Some possible minor improvements for the camera-ready version: as mentioned by the reviewer, the authors may need to discuss the randomized-response-based alternative method, given its close relation to the optimal one. Another is the discussion of the numerically computed constant.