Private Lossless Multiple Release
We study how to coordinate private releases for multiple privacy parameter values with no loss in privacy or utility compared to single releases.
Abstract
Reviews and Discussion
This paper introduces a lossless multiple release mechanism for differential privacy (DP), allowing analysts with different trust levels to receive private data releases with distinct privacy guarantees. Unlike previous methods, this approach ensures that multiple releases do not accumulate unnecessary privacy loss while maintaining the same accuracy as a single release at the most relaxed privacy level. Traditional DP mechanisms operate under a fixed privacy budget, and any additional release of the same dataset typically increases cumulative privacy loss. However, real-world applications often involve multiple access levels—such as different levels of trust, security clearance, or evolving accuracy needs over time. The authors propose a model where an analyst initially receives a high-privacy (low-accuracy) release but can later receive a less private, more accurate release without paying an extra privacy cost beyond that of the latest release.
Questions for Authors
see Weaknesses
Claims and Evidence
Yes
Methods and Evaluation Criteria
Yes
Theoretical Claims
Yes
Experimental Design and Analyses
Yes
Supplementary Material
No
Relation to Broader Literature
The paper presents a framework based on Gaussian noise mechanisms, enabling multiple DP releases while preserving accuracy and privacy guarantees.
Essential References Not Discussed
Yes
Other Strengths and Weaknesses
Strengths: The paper extends the concept of gradual release to a more general multiple release setting, allowing for arbitrary orderings of privacy parameters.
The proposed framework is well-supported by formal theorems.
No additional privacy cost beyond the least private release.
The paper is well-organized, with clear definitions and intuitive explanations.
Supports various data types, including histograms and factorization mechanisms.
Weaknesses: The approach is currently applicable only to additive Gaussian noise mechanisms; further exploration is required for mechanisms like the exponential mechanism.
While the theoretical results are strong, the experimental section is relatively limited.
The computational cost of implementing lossless multiple release is not fully analyzed. A discussion on efficiency, especially for high-dimensional data, would be useful.
The paper could benefit from a comparison with alternative privacy-preserving mechanisms beyond gradual release, such as Rényi DP-based approaches.
Practical deployment may require careful calibration of noise and privacy budgets.
Other Comments or Suggestions
see Weaknesses
We thank the reviewer for their reading and comments on our paper. We address some of the perceived weaknesses below:
W1: only applicable to Gaussian additive noise. Theorem 3.5 holds for any additive noise mechanism based on a noise distribution satisfying a convolution preorder (Definition 1.1). Besides the Gaussian distribution, the Laplace, zero-mean Skellam, and Poisson distributions all meet this criterion. In an updated version of the manuscript, we include more details on sampling for general distributions and for the Poisson distribution in particular.
The exponential mechanism cannot in general be expressed as an additive noise mechanism. Whether it can be implemented to satisfy lossless multiple release is an interesting open problem.
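To make the convolution preorder concrete, here is a minimal sketch (our illustration, not the paper's Algorithm 1) for the Gaussian case: the more private release is obtained from the more accurate one by adding independent "top-up" noise, so the two releases together reveal no more than the accurate one alone, while the coarser release marginally looks like a fresh draw at its own noise level. All concrete values (x, the noise scales) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
x = 10.0         # true statistic (hypothetical value)
sigma_hi = 1.0   # noise scale of the more accurate (less private) release
sigma_lo = 3.0   # noise scale of the less accurate (more private) release
n = 200_000     # trials, used only to check the marginal empirically

# More accurate release: x + N(0, sigma_hi^2).
y_accurate = x + rng.normal(0.0, sigma_hi, size=n)

# Convolution preorder for Gaussians:
# N(0, sigma_lo^2) = N(0, sigma_hi^2) * N(0, sigma_lo^2 - sigma_hi^2),
# so the coarser release is the finer one plus independent top-up noise.
top_up_sd = np.sqrt(sigma_lo**2 - sigma_hi**2)
y_coarse = y_accurate + rng.normal(0.0, top_up_sd, size=n)

# Marginally, y_coarse is distributed exactly like a fresh N(x, sigma_lo^2) release.
```

Since y_coarse is a function of y_accurate plus independent noise, observing both releases gives no more information about x than observing y_accurate alone.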
W2: computational cost. The computational efficiency of our methods is fairly straightforward, which is why we did not discuss it, but we agree with the reviewer that a discussion should be included. Since the additive noise mechanisms we consider add independent noise in each dimension, the space and time complexity of implementing them with lossless multiple release scale linearly in the number of dimensions; we consider one-dimensional noise henceforth. To allow for releases in arbitrary privacy order, we need to store all past releases when making each new release. For time complexity, note that any release requires drawing one (possibly conditioned) noise sample. The time complexity of this sampling depends on the particular distribution: in the case of Gaussian noise, it always amounts to a single draw from a Gaussian.
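As a sketch of that single conditioned draw for Gaussian noise (our own parameterization, where a release at "time" t carries cumulative noise of variance t; this is not the paper's exact algorithm), inserting a release between two existing ones reduces to the standard Brownian-bridge conditional, which is itself one Gaussian sample:

```python
import numpy as np

rng = np.random.default_rng(1)

def bridge_sample(t, s, w_s, u, w_u, rng, size=None):
    """Sample W(t) given W(s) = w_s and W(u) = w_u, for s < t < u,
    where W is Brownian motion (Var[W(t)] = t). The conditional law is
    Gaussian with the classic bridge mean and variance."""
    frac = (t - s) / (u - s)
    mean = w_s + frac * (w_u - w_s)
    var = (t - s) * (u - t) / (u - s)
    return mean + np.sqrt(var) * rng.normal(size=size)

x = 5.0                                     # true statistic (hypothetical)
s, u = 1.0, 9.0                             # existing releases: variances 1 and 9
w_s = np.sqrt(s) * rng.normal()             # noise of the most accurate release so far
w_u = w_s + np.sqrt(u - s) * rng.normal()   # coarser release: independent increment
release_s, release_u = x + w_s, x + w_u

# A later request for an intermediate privacy level t needs one conditioned draw:
t = 4.0
release_t = x + bridge_sample(t, s, w_s, u, w_u, rng)
```

Storing w_s and w_u (i.e., all past releases) is what makes the arbitrary-order case work; each new release still costs one sample.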
W3: comparing to mechanisms beyond gradual release, e.g. Rényi DP. Lossless multiple release (Definition 3.1) does not rely on a fixed privacy notion but rather ensures, information-theoretically, that a set of releases contains no more information than the least private release. In particular, for any fixed Rényi order α, we support α-RDP lossless multiple release with respect to the corresponding RDP privacy parameter.
W4: practical deployment requiring careful calibration. Generally, implementations of differential privacy require carefully choosing privacy parameters. Our contributions in this paper arguably make this easier in some contexts: If multiple releases of a statistic are to be made, we can sometimes (Theorem 3.5) do it dynamically at no cost in the privacy-utility trade-off. Not having to decide on privacy parameters up front saves resources spent on planning.
This paper examines the problem of multiple private data releases with varying privacy parameters, which are assigned sequentially and in arbitrary order. Compared to [LWRR'22], the authors present a simpler analysis that avoids Brownian motion techniques and explicitly provide a sampling method. Their approach extends to additive noise mechanisms whose distributions satisfy a specific convolutional property, termed convolution preorder. As an application, they propose an efficient method for gradually releasing sparse histograms with a runtime independent of the number of dimensions.
Questions for Authors
N/A
Claims and Evidence
The theorems appear to be correct to my best knowledge, and proofs are easy to follow.
Methods and Evaluation Criteria
N/A as this is mainly a theory paper.
Theoretical Claims
The theorems appear to be correct to my best knowledge, and proofs are easy to follow.
Experimental Design and Analyses
N/A as this is mainly a theory paper. The numerical evaluation looks reasonable.
Supplementary Material
I checked the omitted proofs in the supplementary material.
Relation to Broader Literature
See the strength and weakness section below.
Essential References Not Discussed
The references look proper to me.
Other Strengths and Weaknesses
While the paper presents a simple and explicit method for releasing multiple DP levels of a query, its novelty compared to the Brownian mechanism in [LWRR'22] seems rather limited—though it is worth noting that [LWRR'22] studies ex-post DP, a slightly different notion than the one considered here.
For the multiple release problem, the Brownian mechanism provides a natural way to correlate DP noise. While the authors claim to extend [LWRR'22] by allowing arbitrary release orders, this improvement appears to be a straightforward conditioning argument. Given future realizations, the noise distribution follows from a Brownian bridge, making the explicit form and sampling strategy fairly direct. Consequently, the claimed advantages of simpler analysis and explicit sampling seem rather natural, and the postprocessing and factorization theorems also follow in a straightforward manner.
Thus, the main technical contribution of this work lies in its extension to convolutional preorder noise mechanisms and its application to the sparse histogram problem.
Other Comments or Suggestions
Post-rebuttal: I agree that a direct and concise analysis has its merits. While I still feel the core idea is somewhat natural given the Brownian mechanism, I recognize its value in various differential privacy applications, as echoed by other reviewers.
We thank the reviewer for their careful reading and comments on our paper. We address the perceived weaknesses below:
W1: Relation between Algorithm 1 and the Brownian mechanism. We agree with the reviewer that Algorithm 1 can be interpreted as the Brownian mechanism combined with Brownian bridge sampling, but we would argue that it is not a trivial translation. Given the focus on ex-post DP in [LWRR'22], it is not immediate that the Brownian mechanism as stated can be sampled in arbitrary order, nor that it is lossless.
Second, we believe there is value in deriving these results from first principles. Avoiding the language of continuous-time processes makes for a less technically demanding exposition, increasing the chance that these results see real-world uptake.
W2: technical contribution. We emphasize that our Theorem 3.5 makes a statement not only for Gaussian noise but (as the reviewer points out) for any additive noise distribution satisfying Definition 1.1. There are many such distributions, including the Laplace, zero-mean Skellam, and Poisson distributions.
In an updated version of the manuscript, we include a version of Algorithm 1 for generic noise distributions. We also give privacy guarantees for the Poisson distribution, and use it as an illustrative example of lossless multiple release for discrete probability distributions (which is what can be implemented on discrete computers).
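For instance, a minimal sketch (ours, with hypothetical parameters, not the manuscript's algorithm) of the discrete Poisson case: since Pois(a) * Pois(b) = Pois(a + b), a less private release is just the finer release plus an independent Poisson top-up, and every release stays integer-valued.

```python
import numpy as np

rng = np.random.default_rng(2)
count = 42       # true integer-valued statistic (hypothetical)
lam_hi = 2.0     # Poisson rate of the more accurate (less private) release
lam_lo = 8.0     # Poisson rate of the less accurate (more private) release
n = 200_000     # trials, used only to check the marginal empirically

# Finer release: count + Pois(lam_hi).
y_fine = count + rng.poisson(lam_hi, size=n)

# Convolution preorder for Poisson: Pois(lam_hi) * Pois(lam_lo - lam_hi) = Pois(lam_lo),
# so the coarser release is the finer one plus an independent integer top-up.
y_coarse = y_fine + rng.poisson(lam_lo - lam_hi, size=n)

# Marginally, y_coarse is distributed exactly like count + Pois(lam_lo).
```

Because the top-up is integer-valued, this gives lossless multiple release for a genuinely discrete distribution, which is what can be implemented exactly on a discrete computer.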
This paper investigates the private lossless multiple release problem and presents a solution for a broad class of mechanisms (e.g., Gaussian, Laplacian, Poisson) based on additive noise, where the noise distribution satisfies a convolution preorder property. Unlike private gradual release, multiple release does not require an increasing privacy budget. Additionally, this paper explores two applications: one for the factorization mechanism and another for sparse histograms, demonstrating its practicality.
Update after rebuttal
I will keep my score unchanged.
Questions for Authors
The application to the factorization mechanism assumes that the strategy matrix R remains the same across all releases. However, in practice, different users may require different strategy matrices. For example, in the gradient descent setting, the workload matrix A may vary due to differences in learning rates and momentum weights. How would the proposed method address the multiple release problem in such cases?
Claims and Evidence
Yes.
Methods and Evaluation Criteria
Yes.
Theoretical Claims
No. I didn't check the proofs, but the theorems make sense to me.
Experimental Design and Analyses
Yes. The experiment for lossless release demonstrates the correctness of the claims.
Supplementary Material
No.
Relation to Broader Literature
The method can help save privacy budget in applications that need multiple releases.
Essential References Not Discussed
No.
Other Strengths and Weaknesses
Strengths: The paper provides a solution for private multiple releases that does not require an increasing order of privacy budget. The theorem generalizing to multiple distributions is well-developed, demonstrating its practicality. Additionally, the approach helps users conserve their privacy budget. The special case of the Gaussian mechanism is particularly easy to understand.
Weaknesses: The solution is limited to independently distributed mechanisms, whereas, in practice, correlated noises are widely used [A]. The paper does not address this limitation. To my knowledge, the composition of correlated Gaussian mechanisms is fundamentally different [B], and lossless multiple release may not be feasible in such cases.
[A] Koloskova, Anastasiia, Ryan McKenna, Zachary Charles, John Rush, and H. Brendan McMahan. "Gradient descent with linearly correlated noise: Theory and applications to differential privacy." Advances in Neural Information Processing Systems 36 (2023): 35761-35773.
[B] Xiao, Yingtai, Guanhong Wang, Danfeng Zhang, and Daniel Kifer. "Answering private linear queries adaptively using the common mechanism." arXiv preprint arXiv:2212.00135 (2022).
Other Comments or Suggestions
None.
We thank the reviewer for the feedback and questions. Our technique applies to additive noise mechanisms as well as to invertible post-processing of additive noise mechanisms. In particular, it applies to factorization mechanisms for which the noise added to queries is correlated. This is a rather large class of mechanisms, but the reviewer is correct that it does not include all mechanisms. For example, we do not know how to do lossless private multiple release of iterative methods such as private SGD.
Regarding the question about factorization mechanisms with multiple strategy matrices: this is a general question that is not specific to multiple release. The situation is usually modeled by combining all strategy matrices into one. The workload matrix can be changed freely as long as it can be expressed as a matrix product LR, where R is the strategy matrix.
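A minimal sketch of this point (our illustration with a hypothetical prefix-sum strategy, not code from the paper): two different workloads that factor through the same strategy matrix R can both be answered from a single noisy strategy answer, so the noise is paid for only once.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 5
x = rng.integers(0, 10, size=d).astype(float)  # hypothetical data vector

# Shared strategy: R answers prefix sums. Any workload expressible as L @ R
# can be reconstructed from the single noisy strategy answer R @ x + z.
R = np.tril(np.ones((d, d)))       # prefix-sum strategy matrix (invertible)

A1 = np.eye(d)                     # workload 1: point queries
A2 = np.ones((1, d))               # workload 2: the total

R_inv = np.linalg.inv(R)
L1 = A1 @ R_inv                    # A1 = L1 @ R
L2 = A2 @ R_inv                    # A2 = L2 @ R

z = rng.normal(0.0, 1.0, size=d)   # noise added once, to the strategy answers
noisy_strategy = R @ x + z

ans1 = L1 @ noisy_strategy         # noisy point queries
ans2 = L2 @ noisy_strategy         # noisy total
```

Both workloads are post-processings of the same private release, so answering the second workload costs no additional privacy budget.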
This paper studies the problem of lossless multiple release under differential privacy: we want to release noisy answers to a query satisfying various levels of privacy, such that any subset of the noisy answers contains at most as much information about the statistic as the least private noisy answer among them. This work builds on results about gradually releasing noisy answers with increasing budget. The authors identify a condition on a noise distribution, called convolution preorder, that is sufficient for an additive noise mechanism to satisfy lossless multiple release. This approach is illustrated through the case of Gaussian noise, and applications are given for factorization mechanisms such as the matrix mechanism and sparse histograms.
Questions for Authors
One area of improvement is with regard to motivating the lossless multiple release setting. What is a natural setting that would be analogous to the Bell-LaPadula model for privacy? In what setting would you want one analyst to have a less noisy estimate than another?
Claims and Evidence
The claims in this paper are well supported by theoretical results.
Methods and Evaluation Criteria
The proposed methods make sense for the problem.
Theoretical Claims
I checked the main text proofs - but not the appendix proofs - and did not spot any errors.
Experimental Design and Analyses
The empirical evaluation is reasonable and a good sanity check but is unnecessary.
Supplementary Material
I did not review supplemental material for this paper.
Relation to Broader Literature
This paper deepens our understanding of additive noise mechanisms used in differential privacy beyond the gradual release setting.
Essential References Not Discussed
I am not aware of essential references that are not discussed.
Other Strengths and Weaknesses
Interesting paper; the organization made the paper clear and easy to follow. Both the sections on Gaussian noise and factorized mechanisms were helpful in looking at specific instances of the general result.
Other Comments or Suggestions
In the camera ready version, I don't think the empirical evaluation is necessary. I understand why you added it to the submission though.
We thank the reviewer for the positive review and the question about the lossless multiple release setting.
Realistic privacy settings could be the military setting itself (which Bell-LaPadula was designed for), or other settings where analysts have different trust levels (here the more trusted analysts would get more accurate releases). Another example is if a company wants to release a statistic, say, user statistics: they could make an accurate release for their own data analysts, a less accurate release for external consultants, and include an even less accurate release in a report for shareholders or other external actors. A benefit of this is that lossless multiple release allows for dynamic updates: an external consultant that later gets employed directly by the company could get the more accurate release without constituting a privacy violation or requiring an increased privacy budget to reach the same accuracy.
This paper studies the problem of lossless multiple release under differential privacy: the goal is to release noisy answers to several queries while satisfying various levels of privacy, under the following constraint: any subset of the noisy answers reveals no more information about the statistic than the least private answer in the subset. This work builds on prior results about gradually releasing noisy answers with increasing privacy budgets, significantly extending them and introducing new applications. All the reviewers are supportive and agree that this paper would be a valuable addition to ICML.