PaperHub
Average rating: 7.5/10 · Spotlight · 4 reviewers
Ratings: 8, 6, 8, 8 (min 6, max 8, std 0.9) · Average confidence: 3.3
ICLR 2024

Privacy Amplification for Matrix Mechanisms

OpenReview · PDF
Submitted: 2023-09-23 · Updated: 2024-03-13
TL;DR

We propose an algorithm for computing privacy guarantees of the matrix mechanism with privacy amplification.

Abstract

Privacy amplification exploits randomness in data selection to provide tighter differential privacy (DP) guarantees. This analysis is key to DP-SGD's success in machine learning (ML), but is not readily applicable to the newer state-of-the-art (SOTA) algorithms. This is because these algorithms, known as DP-FTRL, use the matrix mechanism to add correlated noise instead of independent noise as in DP-SGD. In this paper, we propose "MMCC" (matrix mechanism conditional composition), the first algorithm to analyze privacy amplification via sampling for any generic matrix mechanism. MMCC is nearly tight in that it approaches a lower bound as $\epsilon\to0$. To analyze correlated outputs in MMCC, we prove that they can be analyzed as if they were independent, by conditioning them on prior outputs. Our "conditional composition theorem" has broad utility: we use it to show that the noise added to binary-tree-DP-FTRL can asymptotically match the noise added to DP-SGD with amplification. Our algorithm also has practical empirical utility. We show that amplification leads to significant improvement in the privacy/utility trade-offs for DP-FTRL style algorithms for standard benchmark tasks.
Keywords
differential privacy, privacy amplification, matrix mechanism

Reviews and Discussion

Review (Rating: 8)

Given a set of workload queries, the matrix mechanism identifies a set of basis vectors, adds noise to these basis vectors, and then computes DP answers to the workload queries via linear combinations of the perturbed basis vectors. This unique algorithm structure poses a challenge to quantifying privacy amplification by subsampling in the matrix mechanism -- each subsampled sensitive record may be reused multiple times in computing different basis vectors (in which case the final noise added to each query is correlated). Due to the lack of such privacy amplification results in prior work, the DP-FTRL algorithm (which uses the matrix mechanism as a crucial component) in general does not exhibit better performance than the DP-SGD algorithm.

  • In this paper, the authors prove novel bounds for privacy amplification by subsampling for matrix mechanism. The proof decomposes the output distribution into the composition of multiple mixtures of Gaussian mechanisms, where each mixture component is the conditional distribution of the next basis vector given the participation histories of sensitive data and noisy observations in prior rounds.
  • Using this analysis, the authors improve the existing unamplified privacy guarantee for the binary-tree-FTRL algorithm under shuffling by a multiplicative factor of $\Omega(\sqrt{\log n})$, where $n$ is the size of the dataset. This improvement is supported by numerical experiments.
  • Finally, the authors show that this improved privacy analysis enables performance gains for the DP binary-tree-FTRL algorithm under small $\varepsilon$ in the centralized CIFAR-10 setup.

Strengths

  • Novel bounds for privacy amplification by subsampling in matrix mechanisms. The proof involves several interesting new techniques, such as conditional composition and the analysis of the mixture of Gaussians mechanism. These techniques are of independent interest and may shed light on privacy amplification by subsampling/shuffling for other algorithms in the literature.

  • Comparisons with related works are discussed thoroughly, making it easy to follow the significance of the results.

Weaknesses

  • The computation cost of the privacy bounds seems high. Namely, the authors need to compute the privacy loss random variable for the product mixture-of-Gaussians mechanism, which involves combinatorially many mixture components. Such procedures seem likely to suffer from high computation cost and numerical instability (especially under a small number of rounds or a small $\varepsilon$).

Questions

  • Could the authors discuss more about the computation cost and numerical stability of the proved privacy bound? See weakness for more details.

  • Related to the above question, given an arbitrary pair of $(\varepsilon, \delta)$ values, how computationally feasible is it to compute the appropriate noise scale that ensures $(\varepsilon, \delta)$-DP of the proposed algorithm?

  • In the proof of Lemma B.4, page 15, in the third inequality from the top, $\Pr_{x\sim M(D)}[x\geq t]\leq \Pr_{x\sim M(D')}[x\geq t]$, is there a typo, and is $M(D')$ meant to be $M'(D)$? I'd also like more clarification on why this inequality suffices to show that the MoG mechanism $\mathcal{M}$ is dominated by a MoG mechanism $\mathcal{M}'$. Specifically, how does this inequality suffice to bound the term $-e^{\varepsilon}\Pr_{x\sim M'(D')}[x\geq t]$ in the dominance term $H_{\varepsilon}(M'(D), M'(D'))$?

Comment

Thank you for the supportive and thoughtful review. Below we address the weaknesses/questions raised by the reviewer.

“Could the authors discuss more about the computation cost and numerical stability of the proved privacy bound?”

In terms of computation cost, in Appendix F.2 we address the challenge of handling the potentially $2^n$ mixture components mentioned in the reviewer's weaknesses. Specifically, by rounding the entries of $C$ to the nearest multiple of some discretization parameter $\Delta$, we can substantially reduce the number of mixture components at a small cost in accuracy. While we have not proven the asymptotic runtime of MMCC, we believe with the above discretization it should be $O(n^3 + n^2 \log^2 n \cdot f)$, where $f$ is some function (independent of $n$) of the above discretization and approximation parameters used in the PLD accounting. Furthermore, many parts of our implementation can be vectorized or parallelized.
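As an illustration of the rounding trick described here, the NumPy sketch below shows how discretizing the entries of a matrix collapses the number of distinct values the accountant has to track. The helper name `discretize_entries` and the matrix sizes are our own illustrative choices, not the paper's implementation.

```python
import numpy as np

def discretize_entries(C, delta):
    # Illustrative helper: round each entry of C to the nearest
    # multiple of the discretization parameter delta.
    return np.round(C / delta) * delta

rng = np.random.default_rng(0)
C = np.tril(rng.random((8, 8)))   # a generic lower-triangular matrix
C_disc = discretize_entries(C, 0.25)

# Fewer distinct entry values means far fewer distinct mixture
# components in the product mixture of Gaussians.
print(len(np.unique(C)), len(np.unique(C_disc)))
```

With $\Delta = 0.25$ the discretized entries can take at most five values in $[0, 1]$, whereas the raw random entries are essentially all distinct; larger $\Delta$ trades accuracy for fewer components.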

In addition, for some specific classes of matrices, the computation can be dramatically sped up. For example, in Appendix F.4 we discuss how the calculation is much faster for the binary tree mechanism and for banded Toeplitz $C$.

For numerical stability, in our implementation, the issues which were inherent to our approach (i.e. that wouldn't appear when doing PLD accounting for DP-SGD) had to do with the fact that the mixtures of Gaussians arising from product distributions may have components with very small probabilities, even when we use the discretization trick in Appendix F.2 to reduce the number of components. In this case, we can always move these small probabilities around and absorb the cost in our delta term. That is, we can take components with at most, say, $10^{-20}$ of the probability mass in the mixture of Gaussians, delete these components, and increase the probability mass of a different component to make sure the total probability remains 1. Each time we do this, we pay a cost of $10^{-20}$ in the final $\delta$ we report, but this is ultimately negligible compared to e.g. the choice of $\delta = 10^{-6}$ we used throughout the paper.
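The mass-moving step described above can be sketched as follows. This is a hypothetical implementation (the name `trim_mixture` and the tolerance are ours, not from the paper's code): tiny components are deleted, their mass is absorbed into the largest surviving component, and the removed mass is tracked as an additive cost to the reported $\delta$.

```python
import numpy as np

def trim_mixture(probs, tol=1e-20):
    # Drop mixture components with mass <= tol, then add the removed
    # mass to the largest surviving component so the mixture still
    # sums to 1. The removed mass is charged to the final delta.
    probs = np.asarray(probs, dtype=float)
    keep = probs > tol
    removed = probs[~keep].sum()
    trimmed = probs[keep].copy()
    trimmed[np.argmax(trimmed)] += removed
    delta_cost = removed  # add this to the reported delta
    return trimmed, delta_cost

probs = np.array([0.6, 0.4, 1e-21])  # one negligible component
trimmed, cost = trim_mixture(probs)
```

Here `cost` is $10^{-21}$, negligible next to a target $\delta = 10^{-6}$, and the trimmed mixture still sums to 1.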

If there are specific questions the reviewer has about the runtime/implementation that were not discussed here, we would be happy to discuss them in followups.

“Related to the above question, how computationally feasible is it to compute the appropriate noise scale that ensures $(\epsilon, \delta)$-DP of the algorithm?”

The best approach we know of is to use binary search to (approximately) invert the function given by MMCC to compute $\epsilon$ given $\sigma$. So, in the aforementioned settings where MMCC can be run very quickly, one would need to pay a $\approx \log(1/\Delta)$ multiplicative factor to compute $\sigma$ up to precision $\Delta$, but the overall procedure is still quick. For general matrices, a single run of MMCC can take several hours or longer for $n \approx 2000$, and the $\log(1/\Delta)$ overhead may make this binary search infeasible.
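The binary-search calibration described here can be sketched as below. `eps_of_sigma` stands in for a run of MMCC (all names are illustrative); the search is valid when the accountant's $\epsilon$ is monotone decreasing in $\sigma$.

```python
def calibrate_sigma(eps_of_sigma, target_eps, lo=1e-2, hi=1e3, tol=1e-4):
    # Binary-search for the smallest sigma with eps_of_sigma(sigma)
    # <= target_eps, assuming eps_of_sigma is monotone decreasing.
    # Each iteration halves the bracket, so ~log((hi - lo) / tol)
    # calls to the accountant are needed.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if eps_of_sigma(mid) <= target_eps:
            hi = mid
        else:
            lo = mid
    return hi

# Toy stand-in accountant: eps = 1/sigma, so the answer is 1/target_eps.
sigma = calibrate_sigma(lambda s: 1.0 / s, target_eps=0.5)
```

With the toy accountant, the returned `sigma` is within the tolerance of 2.0; in practice each call to `eps_of_sigma` is a full MMCC run, which is why the $\log(1/\Delta)$ factor matters for expensive matrices.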

Clarifications on Lemma B.4

Yes, this is a typo! Apologies for any confusion this caused and thanks for pointing this out; we will fix it in the revision.

As for the clarification on why the stated inequality suffices to prove the lemma: we believe the reviewer's question can be rephrased as follows. Since the hockey stick divergence for $M$ is $\max_t \left[ \Pr_{x \sim M(D)}[x \geq t] - e^{\epsilon} \Pr_{x \sim M(D')}[x \geq t] \right]$, why does it suffice to bound $\Pr_{x \sim M(D)}[x \geq t]$ by $\Pr_{x \sim M'(D)}[x \geq t]$ and ignore the term depending on $D'$? The answer is that both $M(D')$ and $M'(D')$ are $N(0, \sigma^2)$, so the term depending on $D'$ is the same for both mechanisms. The proof can definitely be made clearer even after fixing the typos, and we will rewrite it accordingly in the revision.
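Spelled out in the threshold form of the hockey stick divergence used in the question, the argument is the chain:

```latex
H_{\epsilon}(M(D), M(D'))
  = \max_t \Big[ \Pr_{x \sim M(D)}[x \geq t] - e^{\epsilon}\,\Pr_{x \sim M(D')}[x \geq t] \Big]
  \leq \max_t \Big[ \Pr_{x \sim M'(D)}[x \geq t] - e^{\epsilon}\,\Pr_{x \sim M'(D')}[x \geq t] \Big]
  = H_{\epsilon}(M'(D), M'(D')),
```

where the inequality holds termwise: the first terms satisfy $\Pr_{x \sim M(D)}[x \geq t] \leq \Pr_{x \sim M'(D)}[x \geq t]$, and the subtracted terms coincide because $M(D') = M'(D') = N(0, \sigma^2)$.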

Comment

Thanks for the clarification. I appreciate the authors' efforts in making such a seemingly computationally infeasible analysis practical and the frankness in discussing the limitations in detail. I have increased the score accordingly.

Comment

Thanks to the reviewer for raising the score! We are happy to hear of the reviewer's appreciation of the discussion.

Review (Rating: 6)

This paper presents an algorithm for calculating tight privacy guarantees of a generic matrix mechanism. The paper opens with a conditional composition theorem, which improves the analysis when the noise is correlated. Building on the first theorem, the paper then derives privacy guarantees for the matrix mechanism under uniform sampling. The paper also includes a case study where the algorithm is applied to the binary tree mechanism, backed by experimental evidence.

Strengths

  1. The theoretical results look novel and sound. The proposed algorithm is also demonstrated effectively in experiments.
  2. The conditional composition theorem could be potentially useful for analyzing general DP mechanisms that involve correlated noise.

Weaknesses

This paper would be easier to understand if the authors defined the matrix mechanism formally at the beginning, with an intuitive interpretation of each variable.

Questions

None

Comment

Thanks to the reviewer for their supportive review.

“This paper would be easier to understand if the authors defined the matrix mechanism formally at the beginning, with an intuitive interpretation of each variable.”

Thanks for the suggestion on improving the clarity of the paper. To help the reader understand the matrix mechanism, we will discuss two common instances of the matrix mechanism in Section 1.2, where it is first formally defined. Specifically, we will:

  • Move the discussion in Section 5 about the connection between the binary tree mechanism and the matrix mechanism to Section 1.2.
  • Make explicit the connection between the workload matrix A and optimization algorithms such as DP-SGD/DP-FTRL. In particular, we will say that when A is the all-ones lower-triangular matrix and x is a stream of gradients computed in DP-FTRL, the rows of Ax are the sums of the gradients seen so far. Then the i-th iterate in DP-FTRL is the initialization point plus the i-th row of Ax + Bz, i.e. the initialization plus the noisy sum of the first i gradients seen, and if C = I, we recover independently noising each gradient, i.e. DP-SGD.

(This is a partial duplicate of a response to reviewer 9Vpe)
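The prefix-sum connection described in the response above can be checked numerically; the NumPy sketch below is purely illustrative and not from the paper's code.

```python
import numpy as np

n = 5
A = np.tril(np.ones((n, n)))   # all-ones lower-triangular workload matrix
x = np.arange(1.0, n + 1)      # stand-in for a stream of gradients

# Rows of A @ x are the running sums of the gradients seen so far.
prefix_sums = A @ x            # equals np.cumsum(x)

# With C = I in the factorization A = B C, we get B = A, and the
# mechanism adds independent noise to each gradient -- i.e. DP-SGD.
C = np.eye(n)
B = A @ np.linalg.inv(C)
assert np.allclose(B, A)
```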

Review (Rating: 8)

This paper proposes an algorithm to improve the privacy analysis for general matrix mechanisms, a major building block behind DP-FTRL. The paper first proposes a “conditional composition theorem”; then the paper characterizes the "high-probability PLD" of matrix mechanisms through mixture-of-Gaussians (MoG) mechanisms.

Strengths

This work is important for the further development of DP-FTRL. The idea of characterizing the worst case of the matrix mechanism through MoGs is interesting. The experiments show an improved privacy/utility tradeoff for DP-FTRL.

Weaknesses

I don't see any major problems with this paper.

However, I did not check the details of the proofs (though the high-level idea in the main text makes sense to me).

Questions

Can the authors comment on the novelty of Theorem 3.1? It seems to be a simple extension of Lemma A.1 in [1].

[1] Cohen, Edith, and Xin Lyu. "The Target-Charging Technique for Privacy Accounting across Interactive Computations." NeurIPS (2023)

Comment

Thank you for the supportive review. Below we address the question of the reviewer.

“Can the authors comment on the novelty of Theorem 3.1? It seems to be a simple extension of Lemma A.1 in [1].”

Some of the ideas in Theorem 3.1 may not by themselves have a high degree of novelty; as the reviewer mentions, Cohen and Lyu state something similar, and similar ideas appeared in Erlingsson et al. and Balle et al. (which we mention in the paper). There are some subtleties in correctly extending their Lemma A.1 to compositions of mechanisms instead of a single-round mechanism, in a way that fits our use case. In our opinion, the main contributions of Section 3 are:

  • Explicitly writing down the general form
  • The framework given by combining Theorem 3.1 with Observation 3.2 to analyze correlated noise mechanisms via a series of independent mechanisms, which we demonstrate is versatile in the later parts of the paper.
Review (Rating: 8)

This paper considers the problem of privacy amplification due to subsampling or shuffling for the matrix mechanism, which applies to well-known algorithms such as the binary-tree-based algorithm and DP-FTRL. This problem should be interesting to the DP community.

Strengths

This paper proposes MMCC, the first algorithm to analyze privacy amplification for a general matrix mechanism. Using this method, the paper shows an improved privacy amplification result for the binary-tree-based algorithm. Furthermore, the conditional composition theorem, as a byproduct, should have broader utility. Finally, empirical results reveal a performance improvement for FTRL due to the better accounting.

Weaknesses

Generally speaking, I do not have much to complain about in this paper. The paper considers an interesting question with some nice results. It also lists some reasonable future directions which may help further strengthen the paper. Some minor comments:

  1. Although the new privacy amplification applies and improves DP-FTRL, I still believe this paper is more interesting to the DP theory community. Therefore, ICLR might not be the best fit for this paper.
  2. As compared to other FFT-based accounting methods, this algorithm should be much more time-consuming. It remains an interesting question to reduce the running time for some specific use cases.
  3. I would like to see the authors further improve the clarity of the paper. For now, the paper is not quite friendly to readers who do not have a full background on DP-FTRL or DP accounting. One specific suggestion is to explain how DP-FTRL and the binary-tree-based algorithm fit into the framework defined in Section 1.2.

Questions

Please refer to the section above.

Comment

Thank you to the reviewer for their supportive and thoughtful comments. Here we wish to address some of the weaknesses put forth in the review:

“As compared to other FFT-based accounting methods, this algorithm should be much more time-consuming. It remains an interesting question to reduce the running time for some specific use cases.”

We agree this is an interesting direction, and one we are currently pursuing as followup work. If the reviewer has not done so already, they may be interested to read Appendix F where we discuss some tricks we use to speed up the calculation at a small cost in accuracy in our implementation. In addition, in Appendix F we discuss how for some special classes of matrices such as the binary tree mechanism and banded Toeplitz matrices, the computation can be dramatically sped up.

“I would like to see the authors further improve the clarity of the paper. For now, the paper is not quite friendly to readers who do not have a full background on DP-FTRL or DP accounting. One specific suggestion is to explain how DP-FTRL and the binary-tree-based algorithm fit into the framework defined in Section 1.2.”

Thank you for the suggestion! To improve Section 1.2, we will:

  • Move the discussion in Section 5 about the connection between the binary tree mechanism and the matrix mechanism to Section 1.2.
  • Make explicit the connection between the workload matrix A and optimization algorithms such as DP-SGD/DP-FTRL. In particular, we will say that when A is the all-ones lower-triangular matrix and x is a stream of gradients computed in DP-FTRL, the rows of Ax are the sums of the gradients seen so far. Then the i-th iterate in DP-FTRL is the initialization point plus the i-th row of Ax + Bz, i.e. the initialization plus the noisy sum of the first i gradients seen, and if C = I, we recover independently noising each gradient, i.e. DP-SGD.

(This is a partial duplicate of a response to reviewer 5cQp)

Comment

Thank you to the reviewers for their careful reading of the paper and for the insightful questions and suggestions which will help us greatly improve the quality and readability of the paper. In addition to the changes we have mentioned in our responses to specific concerns raised by reviewers, we want to make the reviewers aware of a change we have made to the manuscript.

We found a bug in the code used to generate Figure 4, which invalidates the result in that figure (the rest of the figures/results in the paper are unaffected). In the current Figure 4, we had handcrafted a factorization $A=BC$ to show that “every-round” sampling and MMCC enable better amplification than the banded DP-MF paper's sampling scheme + amplification analysis, due to more randomness in the sampling. However, in light of the bug, the factorization we handcrafted is suboptimal. The good news is that the natural binary tree mechanism from Kairouz et al., which we consider in the rest of the paper, enables us to demonstrate the same phenomenon we initially intended. We have updated Figure 4 and Section 6.3 (and text in the paper referring to this figure/section) appropriately.

We have already uploaded an updated pdf reflecting this change, so the reviewers can make an informed decision. We still plan to make a further revision to incorporate the changes mentioned in our reviewer-specific comments.

AC Meta-Review

This paper introduces the first algorithm to analyze privacy amplification for general matrix mechanisms. As the main technical contribution, a conditional composition theorem is derived that enables new bounds for privacy amplification by subsampling with correlated noise. The paper includes a case study on the binary-tree mechanism with supportive empirical evidence demonstrating performance improvements.

The methods derived in this paper, especially the conditional composition theorem, may be beneficial for analyzing privacy amplification in subsampling/shuffling also for other algorithms. The paper contains a detailed comparison with related works, and a discussion of the limitations of the approach.

Please use the font from the ICLR template in the final version.

Why not a higher score

The paper makes a solid theoretical contribution that allows gaining further understanding of matrix mechanisms in DP training and could spur future work. However, in light of certain limitations that might restrict practical applications for now, I think it is most suitable to feature this paper as a spotlight rather than an oral presentation.

Why not a lower score

The derived techniques could have applications in other settings with correlated noise/updates.

Final Decision

Accept (spotlight)