Lightweight Protocols for Distributed Private Quantile Estimation
By using a private noisy binary search, we achieve quantile estimates under local differential privacy using a number of users that grows with the log of the domain size, whereas prior non-adaptive methods use a larger (log-squared) number of users.
Abstract
Reviews and Discussion
This paper considers the estimation of quantiles under the LDP framework with bounded integer-valued data. It derives a series of lower bounds under both shuffle-DP and LDP, and proposes an LDP algorithm in an adaptive setting.
Questions for Authors
- For random variables with infinitely many possible values (e.g., Poisson), does the proposed algorithm or framework still apply?
- Regarding footnote 1, I may be mistaken, but I believe some prior work considers privacy-preserving quantile estimation on a continuous interval (e.g., on [0, 1]), such as [1] in the CDP setting and [2] in the LDP setting. Could the authors clarify how these results relate to their work?
- The paper specifies a particular range for ε. For very large privacy budgets (i.e., ε → ∞), will the results degrade or converge to the non-private results (as seen in [3] and [4])? In other words, do these bounds become consistent with classical results in the absence of privacy constraints?
[1] Lalanne, C., Garivier, A., & Gribonval, R. (2023). Private statistical estimation of many quantiles. In International Conference on Machine Learning (pp. 18399-18418). PMLR.
[2] Liu, Y., Hu, Q., Ding, L., & Kong, L. (2023). Online local differential private quantile inference via self-normalization. In International Conference on Machine Learning (pp. 21698-21714). PMLR.
[3] Chen, L., Keilbar, G., & Wu, W. B. (2023). Recursive quantile estimation: Non-asymptotic confidence bounds. Journal of Machine Learning Research, 24(91), 1-25.
[4] Howard, S. R., & Ramdas, A. (2022). Sequential estimation of quantiles with applications to A/B testing and best-arm identification. Bernoulli, 28(3), 1704-1728.
Claims and Evidence
Yes.
Methods and Evaluation Criteria
Yes.
Theoretical Claims
Yes.
Experimental Design and Analyses
Yes.
Supplementary Material
Yes.
Relation to Prior Literature
See the pros below.
Essential References Not Discussed
No.
Other Strengths and Weaknesses
Pros: Quantile estimation under LDP is a significant problem, and the non-asymptotic results appear to be interesting.
Cons: The authors assume data are drawn from a bounded integer domain (with finitely many values), which seems fairly restrictive and uncommon.
Other Comments or Suggestions
See the questions below.
Thanks for your questions and your consideration of our paper!
For random variables with infinitely many possible values (e.g., Poisson), does the proposed algorithm or framework still apply?
Without any assumptions on the distribution of the random variable, known lower bounds (as we discuss under related work) show that it is impossible to obtain any meaningful error guarantees, even in the central setting of DP. For concrete distributions (like Poisson with a reasonable bound on its rate, or even continuous distributions), one can usually apply our algorithm combined with bucketing, as we discuss in Appendix H. The error guarantees will then depend on how much the CDF can change within a single bucket, which in turn depends on the parameter space of the class of distributions. While it is interesting to ask how well classes of infinite or continuous distributions can be discretized, this direction of work is somewhat orthogonal to ours.
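The bucketing reduction mentioned above can be sketched in a few lines. This is only an illustration of the idea of mapping an unbounded domain to a finite one; the bucket boundaries and the `bucketize` helper are hypothetical, not taken from the paper or its appendix:

```python
import random

def bucketize(x, boundaries):
    """Map a raw value to the index of its bucket.

    boundaries[i] is the right endpoint of bucket i; values beyond the
    last boundary are clipped into the final bucket.
    """
    for i, b in enumerate(boundaries):
        if x <= b:
            return i
    return len(boundaries) - 1

# Example: discretize unbounded counts (e.g., Poisson-like data) into a
# finite domain of size B = 50. With a reasonable bound on the rate, the
# probability mass clipped into the last bucket is tiny, and the rank
# error incurred depends on how much the CDF moves within one bucket.
boundaries = list(range(50))
samples = [random.randint(0, 60) for _ in range(1000)]  # stand-in data
discretized = [bucketize(x, boundaries) for x in samples]
assert all(0 <= d < 50 for d in discretized)
```

Any quantile protocol over a finite domain can then be run on `discretized` in place of the raw samples.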
Regarding footnote 1, I may be mistaken, but I believe some prior work considers privacy-preserving quantile estimation on a continuous interval (e.g., on [0, 1]), such as [1] in the CDP setting and [2] in the LDP setting. Could the authors clarify how these results relate to their work?
[1] considers continuous distributions with the same error function as ours. However, related to our discussion in Appendix H, their results depend on a parameter that expresses how well the continuous distribution can be discretized into a finite set of intervals on which the CDF does not increase too quickly. [2] extends a line of work (see e.g., Duchi et al. [3]) aiming to minimize the average distance between the estimated median and the data points (and a more generalized form for other quantiles). Lower bounds in [3] show that this error must grow linearly with the range in which the true median lies. In our setting, without further assumptions on the data distribution, this would be linear in B. We additionally note that minimizing the distance in the domain between the estimated and true median must also have error Ω(B), even under central DP. Indeed, if n is odd and ⌈n/2⌉ of the x_i's are 0 and the remaining x_i's are B, then the median is 0, but changing the data of just a single user can change the median to B.
On the other hand, our work aims to minimize the rank error of the quantile and is not subject to these lower bounds.
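The hard instance above can be verified directly. A small sanity-check script (not from the paper) with n = 7 and B = 1000:

```python
from statistics import median

B, n = 1000, 7                      # n odd
k = n // 2 + 1                      # ceil(n/2) users hold 0
data = [0] * k + [B] * (n - k)      # [0, 0, 0, 0, B, B, B]
assert median(data) == 0            # the median is 0

# Changing the data of a single user moves the median all the way to B,
# so any estimate of the median's *value* must change by Omega(B)
# between neighboring datasets -- rank error does not suffer from this.
data[0] = B                         # sorted: [0, 0, 0, B, B, B, B]
assert median(data) == B
```

This is exactly why domain-distance error cannot be small under any notion of DP, while rank error can.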
[1] Lalanne, C., Garivier, A., & Gribonval, R. (2023). Private statistical estimation of many quantiles. In International Conference on Machine Learning (pp. 18399-18418). PMLR.
[2] Liu, Y., Hu, Q., Ding, L., & Kong, L. (2023). Online local differential private quantile inference via self-normalization. In International Conference on Machine Learning (pp. 21698-21714). PMLR.
[3] Duchi, J. C., Jordan, M. I., & Wainwright, M. J. (2018). Minimax optimal procedures for locally private estimation. Journal of the American Statistical Association, 113(521), 182–201.
The paper specifies a particular range for ε. For very large privacy budgets (i.e., ε → ∞), will the results degrade or converge to the non-private results (as seen in [3] and [4])? In other words, do these bounds become consistent with classical results in the absence of privacy constraints?
Our result states that O(α⁻² log B) users suffice to find an α-approximate median under LDP with high probability in B, even as ε → ∞. This matches known lower bounds for statistical median estimation that hold even absent privacy constraints (when requiring high probability in B). On the other hand, if we just want constant success probability, then absent privacy constraints O(α⁻²) users suffice for statistical median estimation. It is an interesting open question to design a protocol with the correct convergence to this non-private bound as ε → ∞, but it would likely require quite different techniques than the ones presented in our paper.
In the empirical setting, absent privacy constraints, an algorithm can simply output the median of the data set, so one should expect that as ε → ∞ the required number of users would converge to the non-private requirement. Again, it is interesting to understand this convergence. In any case, we focus on the common regime ε = O(1).
I appreciate the authors' response and their efforts to address my concerns. Deriving non-asymptotic rank-error results is indeed different from deriving results for the mean absolute error, which is a common choice in applications (as another reviewer also mentioned). This paper presents many nontrivial results about LDP quantile estimation, apart from the finiteness assumption on the domain, since the entire framework is based on given data sets rather than on underlying distributions. Therefore, I will raise the score to 3. However, I still suggest that the authors add more explanation in the final version regarding the proposed setting and error metrics, especially the rationale for choosing this particular quantile error and the additional challenges it poses compared to the traditional setting.
The authors study the problem of estimating quantiles under local differential privacy and under shuffle differential privacy, with applications in distributed and private quantile estimation. To do so, the paper presents new algorithms. The article presents both upper and lower bounds for the problems at hand. Finally, the article experimentally compares the newly introduced methods with existing ones.
update after rebuttal
After the rebuttal, I will maintain my original score.
Questions for Authors
- The study focuses on "quantile error," which measures how much probability mass the estimate is off by. However, in some applications, the primary concern is the uncertainty on the actual value (especially when the grid is a discretization of the real line and inherits a distance). Can the results be extended to this setting?
- In Theorem 1.4, the condition n=O(…) suggests that the procedure works as long as there isn’t too much data. This seems counterintuitive, as one might expect more data to make the problem easier. Could you clarify this?
- In the same theorem, why do α and ε appear to interact independently in the bound? Is this a consequence of the specific regime being considered?
- The paper states: “We are typically interested in the high-probability setting, where β=1/poly(B).” Is this because setting β=1/B would be equivalent to making a random guess? If so, adding a sentence to explain this, and clarifying that the polynomial must be greater than B for relevance, could be helpful.
Claims and Evidence
The claims of the article are well-supported by clear proofs and empirical evaluation.
Methods and Evaluation Criteria
The methods and evaluation criteria make sense for the problem at hand.
Theoretical Claims
I checked the proofs that are presented in the main body of the article, and I looked at the proof techniques that are used in the appendix. No issue was found, yet I did not check every single detail.
Experimental Design and Analyses
I checked the soundness of the experiments that are presented in the main body of the article. I did not however check the correctness of the code that was provided by the authors.
Supplementary Material
I did read some of the proofs that are presented in the supplementary material, and I read the comments from the authors on continuous distributions.
Relation to Prior Literature
The authors link their work to prior literature and claim (with theoretical evidence) that their new algorithm improves the prior state of the art by a polylogarithmic factor in the size of the set on which they work.
Essential References Not Discussed
NA
Other Strengths and Weaknesses
Strengths:
- The article is theoretically sound and overall well written.
- The problem that is tackled is of interest for the community of privacy preserving machine learning.
I overall have a positive opinion of the paper, so the following weaknesses are more in order to improve the paper than they are to point out reasons to reject it.
Weaknesses:
- The structure of the paper makes it slightly difficult to follow, as the technical details of the algorithm are introduced quite late. This forces the reader to jump back and forth to understand how the challenge outlined in the introduction is actually addressed. Adding brief “how-to” explanations earlier could improve readability.
- The definition of an approximate median in Definition 2.2 could be clarified. While it is defined for empirical distributions in the introduction, it is not explicitly extended to general distributions. A quick reminder of this concept in Definition 2.2 would be helpful.
- The article would benefit from a brief conclusion summarizing key insights, highlighting open questions, and outlining possible directions for future work.
Other Comments or Suggestions
NA
Thanks for your questions and valuable feedback!
The structure of the paper makes it slightly difficult to follow, as the technical details of the algorithm are introduced quite late. This forces the reader to jump back and forth to understand how the challenge outlined in the introduction is actually addressed. Adding brief “how-to” explanations earlier could improve readability.
We will do our best to add more such explanations. Feel free to let us know if there was any part that was particularly confusing.
The definition of an approximate median in Definition 2.2 could be clarified. While it is defined for empirical distributions in the introduction, it is not explicitly extended to general distributions. A quick reminder of this concept in Definition 2.2 would be helpful.
Thanks for pointing this out. We will clarify in the paper.
The article would benefit from a brief conclusion summarizing key insights, highlighting open questions, and outlining possible directions for future work.
Thanks for the suggestion. We agree, and will include it in the next version of the paper.
The study focuses on "quantile error," which measures how much probability mass the estimate is off by. However, in some applications, the primary concern is the uncertainty on the actual value (especially when the grid is a discretization of the real line and inherits a distance). Can the results be extended to this setting?
Our results cannot be generalized to a domain-based error without making an assumption on the underlying data distribution. Lower bounds from [1] show that any median estimation protocol must have error that grows linearly with B when error is measured as the average distance from the estimate to each data point. Additionally, note that minimizing the distance in the domain between the estimated and true median must also have error Ω(B), even under central DP. Indeed, if n is odd and ⌈n/2⌉ of the x_i's are 0 and the remaining x_i's are B, then the median is 0, but changing the data of just a single user can change the median to B. We expect that similar lower bounds would hold for other domain-based error functions.
[1] Duchi, J. C., Jordan, M. I., & Wainwright, M. J. (2018). Minimax Optimal Procedures for Locally Private Estimation. Journal of the American Statistical Association, 113(521), 182–201. https://doi.org/10.1080/01621459.2017.1389735
In Theorem 1.4, the condition n=O(…) suggests that the procedure works as long as there isn't too much data. This seems counterintuitive, as one might expect more data to make the problem easier. Could you clarify this?
Thanks for pointing this out. The statement should be that only that many users are needed to get an α-approximate median, i.e., it is an upper bound on the number of users required rather than a restriction on the data size. We will update the writing.
In the same theorem, why do α and ε appear to interact independently in the bound? Is this a consequence of the specific regime being considered?
In the following, we ignore logarithmic factors and focus on the polynomial dependence on α and ε. In order to apply privacy amplification by shuffling (Lemma G.5), we have to pick the number of users in a batch to be Ω(1/ε²). With this many users, shuffle DP ensures that essentially each user in the batch answers threshold queries correctly with some constant probability bounded away from 1/2. With a constant probability of correct answers, it suffices to have O(1/α²) users in a batch to estimate the fraction of these users with x_i's below a given threshold within O(α). Additionally, there is another reason that the batch has to be this big, namely to ensure that the threshold query of the batch is within O(α) of the threshold query of the full data set. Together, this gives a batch size of O(1/ε² + 1/α²), in which α and ε appear independently. The additional logarithmic dependencies come from the fact that these events have to hold with less than constant error probability (since we union bound over the O(log B) steps of the binary search). Finally, we need O(log B) times as many users, since there are O(log B) steps to the binary search.
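The per-batch estimation step can be illustrated with plain randomized response. This is a hedged sketch of the idea only: the truthfulness probability `p = 2/3` and the helper names are illustrative, and the actual protocol additionally relies on a shuffler for privacy amplification, which does not change the aggregate estimate below:

```python
import random

def rr_bit(b, p):
    """Report bit b truthfully with probability p, flipped otherwise."""
    return b if random.random() < p else 1 - b

def estimate_fraction_below(data, t, p=2/3):
    """Debiased estimate of the fraction of values below threshold t."""
    reports = [rr_bit(1 if x < t else 0, p) for x in data]
    observed = sum(reports) / len(reports)
    # E[observed] = p*f + (1-p)*(1-f) for true fraction f;
    # invert this linear map to recover an unbiased estimate of f.
    return (observed - (1 - p)) / (2 * p - 1)

random.seed(0)
data = [random.randint(0, 99) for _ in range(30000)]
est = estimate_fraction_below(data, 50)   # true fraction is about 0.5
assert abs(est - 0.5) < 0.05
```

With a constant truthfulness probability, the variance of `observed` is constant, so O(1/α²) reports suffice to pin down the fraction to within α, matching the batch-size discussion above.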
The paper states: “We are typically interested in the high-probability setting, where β=1/poly(B).” Is this because setting β=1/B would be equivalent to making a random guess? If so, adding a sentence to explain this, and clarifying that the polynomial must be greater than B for relevance, could be helpful.
We note that β is the failure probability. A random guess would thus give β close to 1, since the probability of guessing an α-approximate median can be as small as 1/B. Our algorithm guarantees that our median estimate is bad only with very low probability β, say β = 1/poly(B).
Thanks to the authors for their response and for integrating the modifications that they mentioned in the paper. I will maintain my score.
This paper studies quantile estimation under local differential privacy. They are interested in the sequentially adaptive local model, where the aggregator queries each user only once, but in rounds where the set of users and the randomizer they are asked to use can depend on information learned in previous rounds.
They first argue via folklore tricks that estimating any quantile can be reduced to estimating the median with twice as many users. Hence, the main task is to estimate the median of either a distribution (from which each user samples one example i.i.d.) or of a dataset over a discrete domain of size B. They give a sequentially adaptive ε-DP algorithm in the local model that outputs an α-approximate median with high probability, as long as the number of users is large enough (growing only logarithmically in B). This is the main contribution of the paper and follows by using the noisy binary search primitive considered in prior work (Gretta and Price 2024), where you have a number of coins with monotonically increasing success probabilities and a target probability, and you can flip coins with the goal of finding a pair of successive coins whose success probabilities bracket the target. The statistical setting is a straightforward reduction to this problem, but the empirical setting is more challenging since you can't simply sample with replacement from the empirical distribution (since then you might see the same user multiple times, resulting in an algorithm that is not sequentially adaptive). The authors instead use sampling without replacement, along with randomly permuting the users, and argue via martingale-based techniques that, for any threshold and at any point during the process, the probability of drawing one of the remaining users that are below the threshold remains almost unchanged from what it was at the start. They then argue that the techniques of Gretta and Price for the noisy binary search problem satisfy a robustness property; they work even if we don't flip exactly the same coins, but rather ones with close probabilities. This is sufficient to solve the problem in the empirical setting.
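The noisy-binary-search idea can be sketched as follows. This is a simplified illustration with a majority vote over repeated noisy comparisons, not the Gretta-Price algorithm itself; the flip probability, repetition count, and function names are all illustrative assumptions:

```python
import random

def noisy_compare(data, t, flip_prob=0.1, reps=201, rng=random):
    """Noisy test of whether at least half the data lies below t.

    Each 'coin flip' asks one sampled point whether it is < t and flips
    the answer with probability flip_prob, mimicking a noisy oracle
    (e.g., a randomized response from a user).
    """
    votes = 0
    for _ in range(reps):
        x = rng.choice(data)               # simplified: with replacement
        bit = 1 if x < t else 0
        if rng.random() < flip_prob:
            bit = 1 - bit
        votes += bit
    return votes > reps / 2

def noisy_binary_search_median(data, B, rng=random):
    """Binary search over thresholds in [0, B] using noisy comparisons."""
    lo, hi = 0, B
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if noisy_compare(data, mid, rng=rng):
            hi = mid                       # at least half below mid
        else:
            lo = mid
    return lo

random.seed(1)
data = [random.randint(0, 1023) for _ in range(20000)]
est = noisy_binary_search_median(data, 1024)
# est should have small rank error around the true median (about 512).
```

Mistakes can only occur at thresholds where the fraction below is close to the target 1/2, which is exactly what keeps the rank error small.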
They also show a matching lower bound for sequentially adaptive local model algorithms using Fano's inequality-based techniques of Duchi et al. They also appeal to a result of Edmonds, Nikolov, and Ullman to argue that this gives a separation from the non-adaptive local model, where the required number of users grows with the square of the logarithm of the domain size (for constant α and ε). They also give a result in the shuffle model that achieves better round complexity than the local model protocol (with a similar total number of users).
Questions for Authors
See questions on shuffle model in the 'theoretical claims' section.
Claims and Evidence
The claims seem correct and are largely well explained.
Methods and Evaluation Criteria
The primary results are theoretical and experiments are secondary considerations in the paper; however, they do consider experiments where they compare their method to two other more standard baseline algorithms, and run on datasets of different sizes drawn from two types of distributions, Pareto and uniform. They evaluate the absolute quantile error and success rate (the fraction of times a good quantile is released). They also choose reasonable parameter values in their evaluations. Overall, the experimental setup makes sense.
Theoretical Claims
Yes, I checked all the proofs for correctness, and am convinced that the most significant results are correct. The only one I was more unsure of was the shuffle model result, where I didn't fully understand the algorithm that they were using (the proof of Theorem 1.4 is rather vague on this point so fully specifying the algorithm would make it easier to verify).
Experimental Design and Analyses
Yes, see methods and evaluation criteria.
Supplementary Material
Yes, I reviewed all the supplementary material (read the theoretical sections in detail, skimmed the experimental section in the appendix).
Relation to Prior Literature
Prior work has obtained non-adaptive algorithms for quantile estimation in the local model of differential privacy with suboptimal dependence on the domain size; this is the first work to get the correct dependence (leveraging adaptivity). This work fits in the larger literature on the local model of differential privacy (using tools of the model like randomized response and mutual information-based lower bounds), as well as the larger literature on private quantile estimation (extensively explored for the central model of differential privacy as well, and a fundamental problem both theoretically and practically).
Essential References Not Discussed
Not essential, but Cohen et al. (STOC 2023) is the latest paper on quantile estimation/interior point in the central model and should be cited in the related work (where Kaplan et al. and others are mentioned), since it gets asymptotically tight bounds. I would also add a more detailed exposition of work on adaptivity (and separations from non-adaptivity) in the local model (e.g., Joseph, Mao, Neel, and Roth 2019; Joseph, Mao, and Roth 2020; Daniely and Feldman 2019; Acharya, Canonne, Sun, and Tyagi 2022).
Other Strengths and Weaknesses
This paper cleverly uses existing tools, and has to jump through a number of technical hoops in order to get them to work for empirical quantile estimation (in the sequentially adaptive local model). Demonstrating the robustness of the Gretta-Price noisy binary search algorithm may be of independent interest.
Other Comments or Suggestions
- Instead of adversarial monotonic NBS, I'd call the problem robust monotonic NBS (adversarial makes it sound like an adversary can choose coins maliciously).
- The indices in Lemmas 3.1 and 4.1 are overloaded (i and j), which was confusing on first read. Also, in Lemma 4.1 the set in the theorem statement should have c_j = 0, not c_i = 0.
- In the proof of Lemma 3.1, the comment on conditioning on π(1), …, π(i) is confusing, because conditioning on the X_j's does not imply conditioning on π(1), …, π(i) (but you don't need to condition on that part of the permutation for the argument to go through).
Thanks for your interest in our paper and your valuable feedback!
The only one I was more unsure of was the shuffle model result, where I didn't fully understand the algorithm that they were using (the proof of Theorem 1.4 is rather vague on this point so fully specifying the algorithm would make it easier to verify).
Thanks for pointing this out. We agree that the writing surrounding the proof of Theorem 1.4 can be improved. We will update the paper by describing the algorithm in detail and including pseudocode before proving the theorem in the appendix. To clarify, the algorithm randomly partitions the users into batches of equal size. For a given threshold t, our analysis then shows that the shuffled randomized responses to whether the users are above or below t suffice to determine the CDF at t within a small additive error with sufficiently high probability. Our algorithm thus uses one such batch per step of a binary search over the domain. This is similar to the algorithm given by Karp and Kleinberg [1] (Section 1.2), but with extra care needed to ensure privacy.
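The batch-per-round structure described above can be sketched as follows. This is an illustrative skeleton under simplifying assumptions: the shuffler is omitted (anonymizing the bits would not change the aggregate estimate), the truthfulness probability `p = 2/3` stands in for the calibrated randomized response, and all helper names are hypothetical:

```python
import random

def batch_threshold_query(batch, t, p=2/3, rng=random):
    """One round: each batch user sends a randomized bit for 'x < t'.

    Aggregate the bits and debias to estimate the CDF at t.
    """
    obs = sum((1 if x < t else 0) if rng.random() < p else
              (0 if x < t else 1) for x in batch) / len(batch)
    return (obs - (1 - p)) / (2 * p - 1)

def shuffle_median(users, B, rng=random):
    """Binary search over [0, B], spending one fresh batch per round."""
    rounds = max(1, B.bit_length())        # ~log2(B) search steps
    rng.shuffle(users)                     # random partition into batches
    size = len(users) // rounds
    batches = [users[i * size:(i + 1) * size] for i in range(rounds)]
    lo, hi = 0, B
    for batch in batches:
        if hi - lo <= 1:
            break
        mid = (lo + hi) // 2
        if batch_threshold_query(batch, mid, rng=rng) >= 0.5:
            hi = mid                       # estimated CDF(mid) >= 1/2
        else:
            lo = mid
    return lo

random.seed(2)
users = [random.randint(0, 255) for _ in range(60000)]
est = shuffle_median(users, 256)
# est should land near the true median (about 128) in rank error.
```

Each user participates in at most one round, so the protocol remains one-shot per user while the aggregator adapts the threshold between rounds.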
Not essential, but Cohen et al. (STOC 2023) is the latest paper on quantile estimation/interior point in the central model that should be cited in the related work (where Kaplan et al. and others are mentioned) since it gets asymptotically tight bounds. Also I'd add in a more detailed exposition of work on adaptivity (and separations with non-adaptivity) in the local model (for e.g. Joseph Mao Neel Roth 2019, Joseph Mao Roth 2020, Daniely Feldman 2019, Acharya Canonne Sun Tyagi 22 for e.g.).
Thanks for drawing these to our attention. We will include a discussion of these in the paper. In particular, we agree that a more detailed exposition on adaptivity in LDP should be included.
instead of adversarial monotonic NBS I'd call the problem robust monotonic NBS (adversarial makes it sound like an adversary can choose coins maliciously).
We note that the algorithm for NBS works even in the case where an adversary can change coin probabilities maliciously (by at most some amount), hence the name. There is indeed no malicious adversary when we apply the NBS algorithm in our main protocol, but on the other hand, the changing coin probabilities have a complicated distribution. The strong adversarial assumption allows us to handle this, and it is unclear how to analyse our LDP protocol without such an assumption.
The indices in Lemma 3.1 and 4.1 are overloaded (i and j) which was confusing on first read. Also in Lemma 4.1 the set in the theorem statement should have c_j = 0 not c_i=0.
Thanks for making us aware of this, we will fix the typo and streamline the use of free variables in these lemmas.
in the proof of Lemma 3.1, the comment on conditioning on pi(1),...pi(i) is confusing because conditioning on X_js does not imply conditioning on pi(1)....pi(i) (but you don't need to condition on that part of the permutation for the argument to go through).
Thanks for this point. In our first writeup, we defined the martingale as a Doob martingale with respect to the random choices of the permutation, but you are right that with the current definition, we should just condition on the X_j's. We will update the paper.
The paper studies the problem of finding quantiles under the constraints of differential privacy. More specifically, it studies shuffle and local differential privacy. The authors proved that their algorithms achieve higher utility than any previously known algorithm for the problem, and also proved that the local DP algorithm's bounds are tight, so it cannot be improved further.
Questions for Authors
N/A
Claims and Evidence
The evidence sufficiently supports the claims.
Methods and Evaluation Criteria
Methods and evaluations are appropriate for the problem at hand.
Theoretical Claims
The proofs presented in the paper are correct.
Experimental Design and Analyses
The experimental design seems valid.
Supplementary Material
I haven't reviewed supplementary material.
Relation to Prior Literature
The problem of estimating quantiles in a differentially private way is well-studied in the literature, since it is a common routine for many data analysis algorithms. Local and shuffle differential privacy are also well-studied fields, since they simplify the privacy guarantees of systems using DP. However, getting high utility from these algorithms is often hard.
Unfortunately, LDP algorithms for quantile estimation have not been studied enough, so I am really glad that this gap is getting closed.
Essential References Not Discussed
The paper doesn't lack any specific reference.
Other Strengths and Weaknesses
N/A
Other Comments or Suggestions
N/A
Thanks for your interest in our paper!
The reviews on this paper are uniformly positive, and all advocate acceptance. The paper addresses an important problem in the context of machine learning, of privately gathering information on the distribution of data, in the form of quantiles. They show improved results for this task over prior work in the distributed/federated setting, under local and shuffle differential privacy. The reviewers appreciated the technical depth and importance of the work. Some minor concerns were addressed clearly during the rebuttal phase, which led to the reviewers increasing their appreciation of the contribution. On this basis, the paper is (strongly) recommended for acceptance in ICML.