PaperHub
NeurIPS 2024 · Poster
Overall rating: 6.3/10 (4 reviewers; scores 6, 6, 6, 7; min 6, max 7, std 0.4)
Confidence: 3.5 · Correctness: 3.8 · Contribution: 3.0 · Presentation: 3.0

Exactly Minimax-Optimal Locally Differentially Private Sampling

OpenReview · PDF
Submitted: 2024-05-15 · Updated: 2024-11-06

Abstract

The sampling problem under local differential privacy has recently been studied with potential applications to generative models, but a fundamental analysis of its privacy-utility trade-off (PUT) remains incomplete. In this work, we define the fundamental PUT of private sampling in the minimax sense, using the $f$-divergence between original and sampling distributions as the utility measure. We characterize the exact PUT for both finite and continuous data spaces under some mild conditions on the data distributions, and propose sampling mechanisms that are universally optimal for all $f$-divergences. Our numerical experiments demonstrate the superiority of our mechanisms over baselines, in terms of theoretical utilities for finite data space and of empirical utilities for continuous data space.
Keywords
Local differential privacy · Sampling · Privacy-utility trade-off

Reviews and Discussion

Review (Rating: 6)

Consider the following setting: a public domain $\mathcal{X}$, a family of distributions $\mathcal{P}$ over $\mathcal{X}$, and a mechanism $M$ which, given a distribution $P \in \mathcal{P}$, releases an element $x \in \mathcal{X}$, such that $M$ is $\epsilon$-pure local DP with respect to the input; that is, the output has a similar distribution for any two input distributions $P, P' \in \mathcal{P}$. Can we expect the output distribution, i.e., the distribution of released elements $x$ given an input distribution $P$, to be accurate in any way, where accuracy is measured by some $f$-divergence between the distribution of outputs and the input distribution?

At first, this seems like a hopeless task, since the mechanism has to maintain privacy even against distributions that share no support. Surprisingly, as this paper proves, there is some hope and things are not that bleak.

This setting was first considered by [1], who proposed optimal mechanisms for finite and continuous domains, with respect to some given reference distribution, for the KL divergence. This paper removes the dependence on a given reference distribution, and instead provides an intuitive algorithm that is optimal for the finite case and does not depend on the choice of $f$. It extends these results to the continuous case, for a family of distributions that are point-wise bounded from above and below by some reference distribution up to scale. Finally, it provides numerical evaluations for the continuous case, which demonstrate the improved utility.

[1] Hisham Husain, Borja Balle, Zac Cranko, and Richard Nock. Local Differential Privacy for Sampling. In Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, pages 3404–3413. PMLR, June 2020.

Strengths

This paper is clearly written. Its contribution lies both in its conceptual framing and in its theoretical results.

Weaknesses

Though many of the ideas and proofs in this work are novel, the results can be viewed as a generalization of previously known results as mentioned in the summary.

Though the results achieved in this paper are optimal, they are still quite weak. This is not surprising. As discussed in the summary, the very fact that we can learn anything in this setting is already surprising, given that (in the finite domain setting) the outputs given two different singleton distributions must be indistinguishable. As a result, the output distribution is heavily skewed toward the uniform distribution.

The main motivation for this setting is somewhat unclear. Since each user has access to a single distribution, we want to maintain privacy w.r.t. the user's distribution, and we release only a single element; the output is therefore useful only under some assumption on the relation between the users' distributions.

Questions

See above

Limitations

N/A

Author Response

We thank Reviewer NxAu for thoroughly reviewing our paper and providing valuable comments.

Response to the first weakness on comparison with previous work

We agree that a certain part of our results can be seen as a generalization of the previous work [35], as we mentioned in the "Comparison with Previous Work [35]" paragraph in Section 3.1. However, we would like to emphasize that our work has non-trivial contributions beyond [35] in the following aspects:

  • We prove the universal optimality of the mechanism for all $f$-divergences, whereas the previous work [35] only considered the KL divergence. Our proof involves numerous non-trivial steps relying on general properties satisfied by all $f$-divergences, not specific properties of the KL divergence.

  • The exact characterization of the minimax loss requires a converse proof involving the derivation of a matching lower bound on the loss applicable to all possible mechanisms, not just to mechanisms similar to [35]. By proving the matching lower bound that holds for all possible mechanisms, we have established the exact minimax optimality. Note that the previous work [35] does not provide explicit guidance on the choice of reference distribution; more specifically, the performance of their mechanism heavily depends on this choice. On the contrary, our work implicitly identifies the optimal reference measure with respect to the minimax loss.

Response to the second weakness on weak result

We agree that many existing privacy mechanisms, not limited to sampling scenarios, can be seen as skewing an original distribution towards a specific distribution (such as the uniform distribution). But the challenging part is determining how and to what extent to skew the distribution to achieve the optimal privacy-utility trade-off (PUT). The significance of our research lies in precisely determining the optimal PUT through non-trivial achievability and converse proofs.

Response to the third weakness on main motivation

Our setup has recently gained attention due to its practical applications in generative models. For example, as mentioned in the introduction, [36] uses the private sampling mechanism for fine-tuning large language models.
As mentioned by the reviewer, however, we may consider the scenario where the distributions of the users are correlated. One possible approach is to assume an underlying distribution on distributions from which each user's original distribution is drawn, and then infer such an underlying distribution given samples from the users. Also, we can formulate more general problems such as producing a distribution per user or generating multiple samples per user, which we mentioned in the introduction and the conclusion of the paper, respectively. We believe that the problem we considered is a simple yet fundamental one that serves as a stepping stone towards addressing these more general and practically interesting problems.

Comment

I thank the authors for their clarifications and have no further questions.

Review (Rating: 6)

The paper considers the problem of sampling from a user's distribution under local differential privacy (LDP). Namely, a client has a distribution P in some set of distributions, where the set is known to a data curator. The client wants to perturb P to get Q, such that a sample from Q sent to the data curator satisfies LDP with respect to P (i.e., for any P, P' the probability of sampling a certain outcome differs by at most $e^\epsilon$ multiplicatively), but also minimizes a given f-divergence between P and Q. The authors give lower bounds on the f-divergence achievable for a given LDP guarantee in two settings, one where the set of distributions is over a discrete support, and another where the set of distributions is over a continuous support, but is lower and upper bounded by some constants times a base measure. They also give an explicit perturbation in these settings that exactly matches the lower bound, i.e. is an optimal solution to the problem. They also discuss why in the continuous setting the lower/upper bound assumption is necessary to get meaningful bounds (in particular, to avoid two disjoint point distributions being in the support), show how to extend their results to a setting where clients only have samples from their distribution, and perform experiments showing that their perturbation achieves better divergence bounds than the previous work "Local Differential Privacy for Sampling".

Strengths

  • The paper gives matching upper and lower bounds for a very general form (i.e. for arbitrary f-divergence, rather than a specific divergence) of a natural and important problem of generating private samples with LDP
  • The assumptions for the settings where the authors derive optimal results are well-justified - in the continuous setting, the authors demonstrate that their assumption (10) is somewhat necessary to derive any meaningful results, and the authors discuss how to extend to a more practical setting of clients having samples rather than distributions
  • Numerical results support the theoretical results and make them easier to contextualize, specifically they demonstrate significant improvements in the bounds over the past work

Weaknesses

  • While the proofs of Theorems 3.1 and 3.3 in the appendix are far from trivial, given knowledge of the randomized response mechanism, it's not clear to me that the optimal bounds are as surprising as the authors suggest. See Questions for more detail.
  • The introduction suggests one of the major improvements of the paper over [35] is avoiding optimality depending on the choice of reference distribution. However, it seems that once we relax the reference distribution to a reference measure, as the authors do, in both settings the authors consider there is a natural choice of reference measure to which to apply the approach of [35]. It also seems like the results could be retrieved by just shifting the lower and upper bounds on the probabilities that [35] uses. While the authors justify the assumptions in the continuous setting that lead to an easy choice of reference distribution, and a lot of work goes into proving optimality of the authors' suggested mechanisms, it seems both Theorems 3.1 and 3.3 are achievable via a slight variation of the technique in [35], which suggests limited originality/significance for the algorithmic techniques in the paper.

Questions

  • In the discrete setting, we could sample from P and then apply a randomized response to the sample. This seems to achieve the same f-divergence bound (7) in the extreme case where P is a point distribution. This mechanism might be worse for non-extreme distributions, but for the minimax f-divergence this might not matter. Is this mechanism also minimax optimal? If so, in the continuous case is the following mechanism minimax optimal? We sample from P; with some probability p we keep the sample, and otherwise we sample from the distribution with pdf proportional to the base measure h, where p is chosen to be the largest value that preserves LDP.

Limitations

Yes

Author Response

Response to the first weakness on alternative mechanism

We thank Reviewer 9St7 for raising an interesting point. We are truly grateful for the thoughtful review of our paper.

Before discussing the minimax optimality of the scheme proposed by the reviewer, we would like to emphasize that even though a simple randomized-response-based mechanism provides an alternative achievability result, proving its optimality is highly non-trivial. The exact characterization of the minimax loss requires a converse proof deriving a matching lower bound on the loss applicable to all possible mechanisms, not just to randomized response or similar mechanisms. We believe that one of the technical novelties of our work lies in this converse proof.

Now let us discuss the alternative mechanism suggested by the reviewer. As the reviewer mentioned, for the discrete case, we can simply sample an element from the original distribution $P$ and apply the $k$-randomized response to that element. This mechanism, which we denote by $\mathbf{Q}^{\dagger}_{k,\epsilon}$, can be written as

$$\mathbf{Q}^{\dagger}_{k,\epsilon}(x|P) = \frac{e^\epsilon}{e^\epsilon + k - 1}\,P(x) + \frac{1}{e^\epsilon+k-1}\,(1-P(x)).$$
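As a concrete illustration, this sample-then-randomize mechanism takes only a few lines. The sketch below is ours, not the authors' code: the function names and the representation of a finite-alphabet distribution as a length-$k$ probability vector are assumptions.

```python
import numpy as np

def krr_output_pmf(P, eps):
    """Exact output pmf of Q†(.|P): mix P with weight e^eps/(e^eps+k-1)
    against the 'flip to another symbol' noise, matching the formula above."""
    P = np.asarray(P, dtype=float)
    k = len(P)
    return (np.exp(eps) * P + (1.0 - P)) / (np.exp(eps) + k - 1)

def sample_krr(P, eps, rng=None):
    """One privatized draw: sample x ~ P, keep it w.p. e^eps/(e^eps+k-1),
    otherwise report one of the other k-1 symbols uniformly at random."""
    rng = rng or np.random.default_rng()
    P = np.asarray(P, dtype=float)
    k = len(P)
    x = int(rng.choice(k, p=P))
    if rng.random() < np.exp(eps) / (np.exp(eps) + k - 1):
        return x
    other = int(rng.integers(k - 1))  # uniform over the remaining k-1 symbols
    return other if other < x else other + 1
```

One can check numerically that the output pmf always lies in $\left[\frac{1}{e^\epsilon+k-1}, \frac{e^\epsilon}{e^\epsilon+k-1}\right]$, so the ratio of output probabilities under any two inputs is at most $e^\epsilon$.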

We can design a similar mechanism $\mathbf{Q}^{\dagger}_{c_1,c_2,h,\epsilon}(P)$ for the continuous case as follows:

  • First, toss a biased coin with head probability $\gamma$.
  • If heads, sample from the original distribution $P$.
  • Otherwise, sample from the reference distribution $\mu$ whose pdf is $h$.
  • The head probability $\gamma$ is determined by $(c_1,c_2,\epsilon)$ to satisfy $\epsilon$-LDP tightly. It is given by $\gamma = \frac{e^\epsilon-1}{(e^\epsilon-1)(1-c_1)+c_2-c_1}$.

This mechanism $\mathbf{Q}^{\dagger}_{c_1,c_2,h,\epsilon}(P)$ for the continuous case can be written as

$$\mathbf{Q}^{\dagger}_{c_1,c_2,h,\epsilon}(P) = \gamma P + (1-\gamma)\,\mu.$$
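The coin-toss description above translates directly into code. This is a hedged sketch under our own conventions (the helper names and the use of callables for the two samplers are assumptions); the $\gamma$ formula is the one from the bullet list.

```python
import numpy as np

def tight_gamma(c1, c2, eps):
    """Mixture weight gamma = (e^eps - 1) / ((e^eps - 1)(1 - c1) + c2 - c1),
    which makes gamma*P + (1-gamma)*mu satisfy eps-LDP with equality when
    every admissible density p obeys c1*h(x) <= p(x) <= c2*h(x)."""
    return (np.exp(eps) - 1.0) / ((np.exp(eps) - 1.0) * (1.0 - c1) + c2 - c1)

def sample_mixture(sample_p, sample_mu, c1, c2, eps, rng=None):
    """One draw from Q†(P): with probability gamma sample from P, else from mu."""
    rng = rng or np.random.default_rng()
    if rng.random() < tight_gamma(c1, c2, eps):
        return sample_p()   # heads: the user's own distribution
    return sample_mu()      # tails: the reference distribution
```

A quick sanity check of the tightness claim: at this $\gamma$, the worst-case density ratio $\frac{\gamma c_2 + 1-\gamma}{\gamma c_1 + 1-\gamma}$ equals $e^\epsilon$ exactly.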

It can be shown that in both setups of Theorems 3.1 and 3.3, the aforementioned mechanism $\mathbf{Q}^{\dagger}$ is also minimax-optimal for any $f$-divergence (here, we suppress the subscripts $(k,\epsilon)$ or $(c_1,c_2,h,\epsilon)$ for notational convenience).

However, as the reviewer mentioned, $\mathbf{Q}^{\dagger}$ performs worse for non-extreme distributions. Indeed, we showed that $\mathbf{Q}^{*}$ outperforms $\mathbf{Q}^{\dagger}$ for any distribution:

  • For any $f$-divergence $D_f$ and any $P \in \tilde{\mathcal{P}}$, we have $D_f(P \Vert \mathbf{Q}^{\dagger}(P)) \geq D_f(P \Vert \mathbf{Q}^{*}(P))$.

The above statement is a corollary of a stronger result about $\mathbf{Q}^{*}$, which improves the results of [35] in a non-trivial way:

  • For any $f$-divergence, $\mathbf{Q}^{*}(P)$ is the projection of $P$ onto a specific $\epsilon$-close set, in which $\mathbf{Q}^{\dagger}$ also lies.

We aim to revise the paper accordingly. In particular, we will mention the alternative mechanism that achieves minimax-optimality and provide comments on it in the final version. Thank you once again for highlighting this interesting point.

Response to the second weakness on comparison with previous work

  • First, as described in the response to the first weakness, the proposed mechanism $\mathbf{Q}^{*}$ indeed generalizes the approach of [35]: $\mathbf{Q}^{*}(P)$ corresponds to the projection of $P$ onto an $\epsilon/2$-ball centered at some reference measure, for any $f$-divergence. We would like to emphasize that our work has non-trivial contributions beyond [35] in the following aspects:

    1. Deriving the projection in a closed form for all $f$-divergences is not immediate from [35], which only considers the KL divergence. Although the technicalities cannot be fully detailed in the rebuttal due to space constraints, establishing general properties satisfied by all possible $f$, which may be non-differentiable, involves some non-trivial steps compared to [35].

    2. Finding the right reference measure is an important step in characterizing the exact optimality. Although we can apply the approach of [35] with an arbitrary reference measure, proving minimax optimality with respect to a specific reference measure involves highly non-trivial converse techniques. Our work differs from [35] in that we implicitly identified the optimal reference measure with respect to the minimax loss.

  • Second, to the best of our knowledge, the algorithm in [35] for finding a KL-divergence projection in the continuous case (i.e., MBDE) is only guaranteed to find an approximate projection, not the exact projection, even if a reference measure is given. Therefore, it is unlikely that a variation of MBDE will retrieve the exactly optimal loss for the continuous case in Theorem 3.3, even for the optimal reference measure we identified. In contrast, the proposed mechanism $\mathbf{Q}^{*}$, which has a closed form, achieves the exactly optimal loss for both the discrete and continuous cases, thereby eliminating the need for an iterative projection algorithm as used in [35].

    In more detail, MBDE iteratively updates the candidate $Q_t$ of $\mathbf{Q}(P)$ based on $Q_{t-1}$ until convergence, employing a soft decision rule (called a weak learner) to distinguish a sample from $Q_{t-1}$ from one drawn from $P$. For $Q_t$ to converge to the optimal one, the accuracy of the weak learner ($\frac{\gamma_P + \gamma_Q}{2}$ in [35]) must approach 1 as $t \rightarrow \infty$. However, with the aim of making $Q_t$ as close to $P$ as possible, this accuracy is expected to decrease over $t$. To the best of our understanding of [35], there is no guarantee that this accuracy converges to 1; thus, we cannot guarantee that MBDE finds the exact projection.

Comment

Thanks for your response. The new results comparing your mechanism to randomized response, and the discussion on why working with general f-divergences complicates things compared to [35], are convincing. I have raised my score accordingly.

Review (Rating: 6)

This paper addresses the problem of sampling under local differential privacy (LDP) requirements, focusing on the privacy-utility trade-off (PUT). It defines the fundamental PUT of private sampling in the minimax sense, using f-divergence as the utility measure. The authors characterize the exact PUT for both finite and continuous data spaces and propose sampling mechanisms that are universally optimal for all f-divergences. The numerical experiments demonstrate the superiority of these mechanisms over existing baselines, showcasing their empirical utilities.

Strengths

  1. The paper presents a novel approach to analyzing the privacy-utility trade-off (PUT) in locally differentially private (LDP) sampling, utilizing f-divergence as the utility measure.
  2. The authors offer optimal mechanisms for both finite and continuous data spaces, given certain assumptions. Figure 2 provides a clear visualization that enhances understanding of the proposed mechanisms.

Weaknesses

The constant $r_P$ does not have a closed form, requiring calculation through approximate methods. This introduces potential approximation errors that could degrade the privacy or utility guarantees. See question 1.

Questions

  1. In practice, would the approximation error of calculating $r_P$ degrade the $\epsilon$-LDP guarantee to $(\epsilon,\delta)$-LDP with $\delta>0$?
  2. Is the mechanism $Q^*_{k,\epsilon}$ uniquely optimal?
  3. Is there a typo on line 127: there may be no $r_P>0$ such that (6) does not sum to one?

Limitations

NA

Author Response

Response to the first question on the approximation error

We thank Reviewer KHbV for asking a practically important question. We briefly mentioned the effect of the approximation error in calculating $r_P$ on the privacy budget in Appendix F.2, and we accounted for this effect in the experiment. Let us analyze it in more detail below.

Consider the continuous case in Theorem 3.3, and let $q_{p,r}(x) = \mathrm{clip}\left(\frac{1}{r} p(x); bh(x), be^\epsilon h(x)\right)$, as in the RHS of equation (13). Suppose we set error tolerances $0\leq \delta_1 < 1$ and $\delta_2 \geq 0$, and for each $P \in \tilde{\mathcal{P}}$, we find $r_P$ such that $\int q_{p,r_P}(x)\,dx \in [1-\delta_1,1+\delta_2]$. Then, we define the pdf of $\mathbf{Q}^{*}_{c_1,c_2,h,\epsilon}(P)$ as $q_{p,r_P}(x) / \int q_{p,r_P}(x)\,dx$. Since $q_{p,r_P}(x) \in [bh(x),\, be^\epsilon h(x)]$, we have $q_{p,r_P}(x) / \int q_{p,r_P}(x)\,dx \in \left[\frac{bh(x)}{1+\delta_2}, \frac{be^\epsilon h(x)}{1-\delta_1}\right]$, which implies that this approximated mechanism satisfies $\left(\epsilon + \log \frac{1+\delta_2}{1-\delta_1}\right)$-LDP. Thus, allowing an approximation error does not require introducing $(\epsilon,\delta)$-LDP but instead necessitates a larger privacy budget. Note that to sample from $\mathbf{Q}^{*}_{c_1,c_2,h,\epsilon}(P)$, knowing the precise value of $\int q_{p,r_P}(x)\,dx$ is unnecessary; it suffices to know $q_{p,r_P}(x)$, using standard MCMC methods.

In the experiment for the continuous case, we allowed the error tolerances $\delta_1=\delta_2=10^{-5}$, and for each given $\epsilon \in \{0.1,0.5,1,2,5\}$, we implemented $\mathbf{Q}^{*}_{c_1,c_2,h,\epsilon'}$ for $\epsilon'=\epsilon + \log \frac{1-\delta_1}{1+\delta_2}$, ensuring that the actual privacy budget matched the initially given $\epsilon$. We note that there is a typo in Appendix F.2 of the manuscript, which we found after submission. In lines 961-962, the sentence

"we modify the value of ϵ\epsilon by ϵ=1δ1+δeϵ\epsilon'=\frac{1-\delta}{1+\delta}e^\epsilon"

should be revised as

"we modify the value of ϵ\epsilon by eϵ=1δ1+δeϵe^{\epsilon'}=\frac{1-\delta}{1+\delta}e^\epsilon".

We also checked the code for the experiment and confirmed that the correct formula was used in the code.
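Since the discussion above turns on numerically locating $r_P$, here is one way it could be done on a discretized grid. This is our own hedged sketch, not the authors' implementation: the bisection strategy, uniform grid, and all names are assumptions. It relies only on the fact that $\int q_{p,r}(x)\,dx$ is continuous and nonincreasing in $r$.

```python
import numpy as np

def find_r(p, h, b, eps, dx, delta=1e-5, lo=1e-9, hi=1e9):
    """Bisect for r_P so that q_{p,r}(x) = clip(p(x)/r; b*h(x), b*e^eps*h(x))
    integrates to 1 within tolerance delta. p and h are density values on a
    uniform grid with spacing dx; the integral is a simple Riemann sum.
    Assumes b is such that a root exists, i.e. b*int(h) <= 1 <= b*e^eps*int(h)."""
    def mass(r):
        q = np.clip(p / r, b * h, b * np.exp(eps) * h)
        return float(q.sum() * dx)
    for _ in range(200):            # mass(r) is continuous and nonincreasing in r
        mid = 0.5 * (lo + hi)
        if abs(mass(mid) - 1.0) <= delta:
            return mid
        if mass(mid) > 1.0:
            lo = mid                # mass too large: need a larger r
        else:
            hi = mid                # mass too small: need a smaller r
    return 0.5 * (lo + hi)
```

By construction the returned clipped density stays within $[bh(x), be^\epsilon h(x)]$, so the $\epsilon$-LDP ratio bound holds up to the $\log\frac{1+\delta_2}{1-\delta_1}$ slack discussed above.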

Response to the second question on the uniqueness

A related question was raised by Reviewer 9St7, who conjectured that a simple randomized-response-based mechanism is also minimax-optimal. We verified that this conjecture holds, meaning our mechanism $\mathbf{Q}^{*}_{k,\epsilon}$ is not the unique optimal mechanism.

Specifically, Reviewer 9St7's mechanism works as follows: for the discrete case, an element is sampled from the original distribution $P$ and then subjected to the $k$-randomized response. For the continuous case, the mechanism operates as follows:

  • First, toss a biased coin whose head probability tightly achieves $\epsilon$-LDP.
  • If heads, sample from the original distribution $P$.
  • Otherwise, sample from the reference distribution $\mu$ whose pdf is $h$.

We checked that in both setups of Theorems 3.1 and 3.3, this mechanism is also minimax-optimal for any $f$-divergence. However, the mechanism proposed in the paper has certain advantages over the aforementioned mechanism. Specifically, let $\mathbf{Q}^{*}$ and $\mathbf{Q}^{\dagger}$ denote our proposed mechanism and Reviewer 9St7's mechanism, respectively. Then, we showed that $\mathbf{Q}^{*}$ outperforms $\mathbf{Q}^{\dagger}$ in a point-wise sense, i.e., for any $f$-divergence $D_f$ and any $P \in \tilde{\mathcal{P}}$, we have $D_f(P \Vert \mathbf{Q}^{\dagger}(P)) \geq D_f(P \Vert \mathbf{Q}^{*}(P))$. The detailed arguments can be found in the rebuttal to Reviewer 9St7.

Response to the third question on a typo

We acknowledge that the sentence in question might cause confusion. To clarify, it can be revised as follows:

"there may be no rP>0r_P > 0 such that the RHS of (6) does not sum to one."

Comment

Thank you for your rebuttal. I have no additional questions at this time.

Review (Rating: 7)

This paper considers the issue of protecting probability distributions with local differential privacy when they are sampled. This problem has been recently studied in view of potential applications to generative models. In particular, the paper studies mechanisms that, given a distribution, produce a "sampler", i.e., a sampling mechanism that can be seen as an obfuscated version of the original distribution and which is used to produce samples in replacement of the original distribution. The paper investigates the privacy-utility trade-off of the proposed method, where utility is measured in terms of f-divergence between the original distribution and the sampling distribution.

Strengths

  • The proposed method is shown to be universally optimal, i.e., the produced sampling mechanism minimizes any f-divergence wrt the original distribution

  • For finite data spaces, the paper presents a closed-form expression for the utility of both the proposed mechanism and the baseline method proposed in [35], allowing for an exact comparison of the utilities.

Weaknesses

  • The paper should justify better, in the introduction, why inferring a distribution presents a privacy risk. In general, in the DP literature, we want to protect an individual data point, not the distribution that can be inferred from the data collection.

Questions

Can you please explain why knowing a distribution presents a privacy risk? In general, the problem in privacy consists in protecting the individual data point, while allowing statistics to be derived, including the inference of the distribution from which the data points are drawn.

Limitations

The limitations are discussed. In particular, the paper points out that there are other notions of utility (distances between distributions) that could be considered.

Author Response

Response to the question on the motivation for protecting a distribution

We thank Reviewer D3Yh for thoroughly reviewing our paper and asking an important question regarding the practical motivation for protecting a distribution.

Let us first introduce an equivalent definition of LDP that illustrates its operational meaning. The following equivalent characterization of LDP [R1] suggests that an LDP mechanism $p_{Y|X}$ makes it difficult to infer any sensitive information $S$ about a user from $Y$, when $Y$ is provided instead of the user's data $X$.

  • Theorem [R1, Thm. 14]: A conditional distribution $p_{Y|X}$ satisfies $\epsilon$-LDP if and only if

    $$\sup_{P_X} \sup_{S:\,S-X-Y} \log \max_{y}\max_{s} \frac{P_{S|Y}(s|y)}{P_S(s)} \leq \epsilon. \qquad (\mathrm{R1})$$

As we can see from this characterization, LDP aims to protect any possible sensitive information hidden inside the user data $X$, and hence the variable $X$ in (R1) should represent the whole data possessed by the user that is processed to be sent to the server.

As the reviewer mentioned, many privacy studies consider the situation where each of multiple users has a single data point, and in this case, $X$ is that single data point. On the other hand, in many practical scenarios, it is more natural to assume that each user possesses multiple data points, e.g., photographs and texts stored on a person's smartphone, or multiple medical records held by a hospital. Accordingly, there have been multiple studies considering the situation where each user has multiple data points. For example, [30] considers a scenario where there are $m \geq 1$ data points per user, each generated in an i.i.d. manner from an underlying distribution. In this case, it is appropriate to set $X$ in (R1) to be the whole $m$-tuple of the user's data points. In our setting, the user's data is itself a distribution; hence, the user's distribution takes the place of $X$ in (R1). As mentioned in [35], this can be seen as a more general setup that includes situations where each client has a variable number of data points, by identifying the multiple data points with their empirical distribution.

One naive approach for privatizing multiple data points would be to perturb each data point of the user and send all perturbed data to the server, but it will result in significant privacy leakage due to parallel composition. Hence, developing better privacy mechanisms has gained increasing attention recently to protect the whole collection of multiple data points per user, in the context of not only LDP but also DP.

Reference:

  • [R1] I. Issa, A. B. Wagner, and S. Kamath, "An Operational Approach to Information Leakage," IEEE Transactions on Information Theory, vol. 66, no. 3, pp. 1625–1657, Mar. 2020, doi: 10.1109/TIT.2019.2962804.
Author Response (to All Reviewers)

We thank all the reviewers for thoroughly reviewing our paper and providing numerous valuable comments. All suggestions have helped us better present our work and delve deeper into our theoretical results. In particular, some comments were especially helpful in improving the paper, and we believe they deserve mention in this global rebuttal, along with our plans for revising the paper.

The Effect of the Approximation Error of $r_P$ on the Privacy Budget

Reviewer KHbV raised the following question:

  • In practice, to implement the proposed mechanism, we need to find the constant $r_P$ numerically. Does the approximation error in finding $r_P$ result in a degradation of the privacy guarantee?

We acknowledge the effect of the approximation error on the privacy budget. We briefly explain this effect in Appendix F.2 and account for it in the experiment. However, recognizing its importance during the rebuttal process, we plan to add a detailed explanation of this effect in the revision. More details are provided in the rebuttal to Reviewer KHbV.

Alternative Minimax Optimal Mechanism

Two reviewers, KHbV and 9St7, raised the question of whether the proposed mechanism is the unique mechanism achieving minimax optimality. Additionally, Reviewer 9St7 proposed an alternative mechanism, $\mathbf{Q}^{\dagger}$, conjectured to be optimal as well. Our work during the rebuttal period confirmed that Reviewer 9St7's mechanism is also optimal in the minimax sense. However, we also demonstrated that the proposed mechanism $\mathbf{Q}^{*}$ in the paper has an advantage over the reviewer's mechanism in a point-wise sense. This advantage arises from the stronger statement that $\mathbf{Q}^{*}$ performs a projection onto a specific $\epsilon$-close set for any $f$-divergence. Further details are provided in the rebuttal to Reviewer 9St7.

We believe that these results are worth including in the paper and plan to incorporate them, along with their proofs, in the appendix during the revision. In particular, we believe that the property of $\mathbf{Q}^{*}$ being a projection for any $f$-divergence is a significant advancement over previous work [35], which only considered the KL divergence. Although the proof cannot be fully detailed in the rebuttal due to space constraints, we believe that both the property itself and its proof are highly non-trivial contributions beyond [35]. The basic approach is to verify that $\mathbf{Q}^{*}(P)$ satisfies the KKT conditions corresponding to the optimization problem of minimizing the $f$-divergence. However, the main challenges lie in the following aspects, which distinguish this work from [35]:

  • We need to show that $\mathbf{Q}^{*}(P)$ satisfies all the KKT conditions corresponding to all possible $f$. This requires relying on general properties satisfied by all possible $f$-divergences (possibly non-differentiable), rather than specific properties of the KL divergence.
  • We must also address the continuous space, which involves infinite-dimensional optimization, in addition to the discrete space.

Actually, we had already obtained all the results mentioned in the rebuttal to Reviewer 9St7 for the discrete case before submission. However, at the time of submission, we neither considered the mechanism $\mathbf{Q}^{\dagger}$ for the continuous case nor proved that the proposed mechanism $\mathbf{Q}^{*}$ for the continuous case performs the $f$-divergence projection. Therefore, we did not include these results in the manuscript, for consistency. Thanks to Reviewer 9St7, however, we were able to dig into the continuous case and prove the corresponding results during the rebuttal period. Accordingly, we plan to include these results in the revised paper. In particular, we will add a discussion of the mechanism $\mathbf{Q}^{\dagger}$, as well as a proof that $\mathbf{Q}^{*}$ corresponds to the $f$-divergence projection also in the continuous case.

Final Decision

This paper makes the following contribution: it considers a generalization of standard single-point LDP to the distribution version. It measures utility using the $f$-divergence, which removes the reference distribution used in prior work. One key result is the establishment of matching minimax lower and upper bounds. All reviewers find the results novel and important. I also think the paper makes some nice contributions and would like to accept it.

Some possible minor improvements for the camera-ready version: as mentioned by a reviewer, the authors may want to discuss the randomized-response-based alternative method, given its close relation to the optimal one. Another is the discussion of the parameter $r_P$.