Distributed Differentially Private Data Analytics via Secure Sketching
We present efficient mechanisms for differentially private data analysis in a distributed setting via cryptographic protocols and linear sketching.
Abstract
Reviews and Discussion
In this paper, the authors attempt to build solutions whose utility is as high as in the central DP model while relying on as few trust assumptions as the local DP model.
The most common example of such an approach is privacy amplification by shuffling. However, the number of mechanisms that could use shuffling is limited. Hence, the paper proposes to use linear transformations instead of shuffling. The paper further suggests some applications of this idea to linear regression and low-rank approximation.
Questions for Authors
Your definition of corrupted clients / servers seems unusual to me since it assumes that they are faithfully following the protocol and we just know their result without privacy protections. Is this correct? (If yes, I would suggest calling them spoofed or something similar.)
Claims and Evidence
The evidence sufficiently supports the claims.
Methods and Evaluation Criteria
Methods and evaluations are appropriate for the problem at hand.
Theoretical Claims
The proofs presented in the paper are correct.
Experimental Designs Or Analyses
The experimental design seems valid.
Supplementary Material
I haven't reviewed supplementary material.
Relation to Broader Scientific Literature
While this is the first paper to discuss using MPC to implement linear transformations in place of the central DP model, it is not the first (as the paper itself admits) to discuss using MPC to avoid assumptions on central aggregators. The paper proposes alternative algorithms for low-rank approximation and for ridge regression which are better than those achievable in the local model, but worse than the central model (which is not a surprise).
Essential References Not Discussed
The paper is not missing any specific reference; however, the discussion of the shuffle model and of other deployments where the central server is replaced by MPC lacks details.
Other Strengths and Weaknesses
The concept of decentralizing the traditional central server model and replacing it with an MPC protocol is a driving force behind the current push to apply DP in real-world scenarios.
Given this trend, the design of algorithms that can be easily translated into straightforward MPC protocols is of paramount importance. Unfortunately, the paper under review tackles problems that lack immediate, real-world applications. This disconnect between the theoretical focus of the paper and practical applicability makes it challenging to argue convincingly for the significance of its results, especially considering that the theoretical results are not very complex either.
Other Comments or Suggestions
N/A
We thank the reviewer for the constructive feedback and questions.
The discussion of shuffle model and other deployments where the central server is replaced by MPC is lacking details.
Regarding missing details on replacing trust in a central server by MPC, we made an extensive comparison in Section 3.3. A complete introduction to MPC is out of scope, but we give some details for our constructions in the appendix. If the reviewer thinks specific points are missing, we would be grateful for pointers.
Your definition of corrupted clients / servers seems unusual to me since it assumes that they are faithfully following the protocol and we just know their result without privacy protections. Is this correct? (If yes, I would suggest calling them spoofed or something similar.)
The privacy guarantees will hold even if a large fraction of corrupted parties deviate from the protocol description. We currently only require the semi-honest assumption for the utility guarantees.
The authors introduce the linear-transformation model (LTM) for distributed differentially private data analytics. The main question that the authors aim to address is "what is the least expressive F that needs to be securely implemented such that distributed DP utility is comparable to that of the central model". In the LTM, the key idea is to leverage efficient secure multiparty computation (MPC) techniques to implement the linear transformation in a distributed setting. Interestingly, the authors provide a new definition (Definition 3.1), the Trusted Computation Model for Differential Privacy, which accounts for clients colluding with the adversary. The authors also discuss the use of the LTM for tasks such as estimating frequency moments and low-rank approximation. They support their contributions with detailed theoretical utility and privacy guarantees and also an empirical evaluation using real-world datasets (Table 2), demonstrating that the error is close to that of the central setting.
Questions for Authors
Q1: What happens with a larger privacy budget?
Q2: Please consider adding an explicit threat model.
Claims and Evidence
I find most of the claims to be supported.
Methods and Evaluation Criteria
Section 5 evaluates the proposed approach for low-rank approximation, and the evaluation is convincing for small epsilon values.
Theoretical Claims
I didn't check all the proofs in detail, but the results look correct.
Experimental Designs Or Analyses
I'm wondering what happens when the privacy budget is large (i.e., larger than the values used in the experiments)?
Supplementary Material
No.
Relation to Broader Scientific Literature
This work has good potential to influence future research in improving the privacy-utility trade-offs of differential privacy using cryptographic techniques.
Essential References Not Discussed
No.
Other Strengths and Weaknesses
The idea is sound and the authors provide detailed theoretical analysis for the proposed algorithms.
Other Comments or Suggestions
D1. Please move Figure 1 to page 2. D2. Please use different line styles/markers for all figures; for example, it is very hard to read Figure 2 based only on the color coding.
We thank the reviewer for the constructive feedback and questions.
We will incorporate the useful editorial suggestions about our Figures in the final version.
What happens with a larger privacy budget?
We discuss our choice of privacy budget in the experiments in our answer to reviewer sQPT as well. We chose relatively small values to showcase the separation between different mechanisms in the LTM, which is more prominent when more noise is added. Increasing certain other parameters while keeping the privacy budget fixed can be interpreted as increasing the budget of our Gaussian noise mechanism in the LTM, so Figure 2 shows that increasing the privacy budget decreases the error significantly in all settings.
Please consider adding an explicit threat model.
We will further clarify the threat model in the next revision and make it explicit already at the beginning of Section 3. Specifically, in our privacy analysis, we assume that up to a bounded number of clients and up to a bounded number of servers may collude and share information with each other, while otherwise following the protocol honestly. This type of adversarial behavior is commonly referred to as passive, semi-honest, or honest-but-curious in the cryptography literature.
The paper introduces the Linear Transformation Model (LTM), a new differential privacy (DP) trust model between central DP, where a single party is trusted, and local DP, where no party is trusted. In the LTM, a linear computation can be used to transform a set of private inputs of parties before they are revealed. This is especially suitable to Multiparty Computation (MPC) techniques based on a set of servers where at least one of them does not collude with the adversary, as linear computations can be performed efficiently with such cryptographic tools.
The paper shows the accuracy gain of the LTM with respect to the Local Model of DP. In addition, it provides arguments for its better suitability compared to the Shuffle Model, which also lies in between central and local DP.
Questions for Authors
Please address my concerns with respect to the characterization of the error induced by the multiplicative approximation term, both in theory and in the empirical evaluation.
Claims and Evidence
- The main claim of the paper is that the LTM provides substantial accuracy advantages with respect to local DP.
I am partially convinced about this claim. It is clear that performing a secure linear transformation can lead to a significant privacy amplification. However, I am not convinced that the utility analysis of Section 4 completely characterizes this gain. The multiplicative error term that impacts the accuracy of the protocol is only analyzed up to its order of magnitude, while it should be quantified more concretely.
Therefore I am not sure about the magnitude of the gain.
- As a second claim, the paper argues that the LTM is more convenient than the Shuffle Model in terms of computation and communication cost. I am convinced by the authors in this point.
Methods and Evaluation Criteria
Empirical evaluation criteria seem reasonable.
Theoretical Claims
I have checked the proofs of Theorems 3.4 and 3.5, which seem correct.
Experimental Designs Or Analyses
In general the paper seems experimentally sound. However, some parts of it are hard to follow, so I am not sure that all the approximation error that impacts the accuracy is included in the comparison (see the previous comments on the multiplicative error term).
Supplementary Material
I have reviewed appendices A, B, D and G.
Relation to Broader Scientific Literature
I think the paper is adequately related to broader scientific literature.
The paper positions fairly well with respect to shuffle DP. However, certain statements seem unfair. For example, at the beginning of Section 3 the paper argues that the shuffle model is weaker because it assumes that all parties are honest and do not collude with the adversary. This seems easy to fix in the shuffle model: in the presence of corrupted parties, one can only consider the honest subset of parties to be part of the effective shuffle.
Essential References Not Discussed
To the best of my knowledge, there are no essential references missed in the paper.
Other Strengths and Weaknesses
As said before, the paper is in general hard to follow, and Theorems 3.4 and 3.5 could be stated more clearly. In mathematical statements, some conditions rely on parameters that are not previously defined; for example, the randomizer function of Sec 3.2 depends on a parameter that is not introduced beforehand.
Other Comments or Suggestions
Table 1 ignores the effect of the multiplicative error for space reasons. However, I think it is important to find a strategy to concisely include this effect.
We thank the reviewer for the constructive feedback and questions.
I am not convinced that the utility analysis of Section 4 completely characterizes this gain. The multiplicative error term that impacts the accuracy of the protocol is only analyzed up to its order of magnitude, while it should be quantified more concretely.
Regarding the concrete quantification of the multiplicative error, we are unsure whether the reviewer means the possible values of the multiplicative error parameter that can be chosen or the concrete constants in our analysis. If the question's focus is on the constants in our theorem statements, for low-rank approximation the leading constant is 1. For linear regression, we give the constants resulting from our analysis in Equation 12 in the appendix. However, the final constants depend on the analysis of the chosen sketching constructions, which is taken from previous work. The target dimensions of the sketching constructions of related work in Table 3 have been analyzed up to absolute constants, but the absolute constants themselves are not known. If the focus of the question is rather on the exact value of the multiplicative error parameter, we remark that it can be chosen to be any number in (0, ½), so one can trade off multiplicative and additive error to get the best possible tradeoff.
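For readers following this exchange, the generic shape of such a mixed guarantee (written here with the multiplicative parameter as α; this is only an illustration, not the paper's exact theorem statement) is:

```latex
% Illustrative shape of a mixed multiplicative/additive utility bound:
\mathrm{cost}(\text{private output})
  \;\le\; (1+\alpha)\cdot \mathrm{OPT} \;+\; \beta(\alpha, \varepsilon, \delta),
  \qquad \alpha \in \left(0, \tfrac{1}{2}\right)
```

Decreasing α tightens the multiplicative factor but typically inflates the additive term β, which is the tradeoff referred to above.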
Experimental Designs Or Analyses: I am not sure that all the approximation error that impacts the accuracy is included in the comparison (see the previous comments on the multiplicative error term).
Regarding the value of the multiplicative error in the experiments, it is not possible in general to experimentally determine which part of a cost function comes from a multiplicative error and which comes from an additive error. We can use the worst-case bounds as a baseline, but these might not be tightly analyzed. The results presented in the experiments implicitly set the multiplicative error to a small constant and measure the excess risk, which is the closest approximation of a possible additive error.
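To illustrate this measurement methodology (a hypothetical sketch, not the paper's actual evaluation code), the excess risk for ridge regression can be computed as the gap between the loss of the privately computed solution and the loss of the exact solution:

```python
import numpy as np

def ridge_loss(X, y, theta, lam):
    """Regularized least-squares objective."""
    return np.sum((X @ theta - y) ** 2) + lam * np.sum(theta ** 2)

def excess_risk(X, y, theta_private, lam):
    """Excess risk: loss of the private solution minus the optimal loss.
    This is the quantity reported in lieu of a separately measured additive error."""
    d = X.shape[1]
    theta_opt = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    return ridge_loss(X, y, theta_private, lam) - ridge_loss(X, y, theta_opt, lam)
```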
Table 1 ignores the effect of the multiplicative error for space reasons. However, I think it is important to find a strategy to concisely include this effect.
The comparisons in Table 1 are with respect to additive error, since there is no multiplicative error in the central model. Multiplicative errors and additive errors cannot be fairly compared to each other, so we simply remark on the existence of our approach’s multiplicative error in the Table caption. These errors are analyzed in the proofs of the theorems in Section 4.
At the beginning of Section 3 the paper argues that the shuffle model is weaker…
We apologize for the confusion caused by the statement in Section 3 — this was not our intention. Our point was not to argue that the shuffle model is inherently weaker, but rather to highlight that the standard formalization of differential privacy in the shuffle model does not, as stated, account for the case of colluding clients. Our motivation for introducing a refined definition was simply to ensure that the privacy guarantee holds even when a bounded number of clients might collude with the adversary, as we believe this is important for the applicability of the definition to real world scenarios. We fully agree with the reviewer that this refinement can naturally be applied to both the shuffle model and the LTM, and our comment should not be interpreted as a point of comparison between the two models. It is solely about the formalization of the security notion itself. As noted in the paper, similar observations have already been made in prior work (e.g., Talwar et al., 2023), fully in line with the reviewer’s comment. We will make sure to revise the text to clearly reflect this intention.
Dear Authors,
Thank you for your thoughtful reply. My concerns about the impact of the multiplicative error have been clarified. I still think that the utility is not fully characterized, given that the bounds only hold with a certain success probability. However, it seems that the impact of failure is small, as the process can be repeated, probably modifying the accuracy only negligibly.
I think that completely integrating the effect of this failure would be a valuable addition to the paper. Nevertheless, it is not a strong concern so I will update my score supporting acceptance.
The authors present a new model for differential privacy, the linear transformation model (LTM). This model interpolates between the local model of DP, that does not require a trusted central server, and the central model of DP that does. Some other intermediate models have been proposed, the most well studied being the shuffle model.
The LTM works by having each user perform a local calculation on their data then add noise to it and report it to an intermediate server. This server collects all the perturbed data from the users then applies a linear transformation to it. This transformed data is then processed in order to solve some task (such as low rank approximation). This is analogous to the shuffle model, replacing a shuffle with a linear transformation. The natural choice for a transformation is a linear sketch.
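To make the data flow concrete, a minimal sketch of the pipeline described above might look as follows; the sketch matrix, noise calibration, and dimensions are illustrative placeholders rather than the paper's actual construction.

```python
import numpy as np

rng = np.random.default_rng(0)

n_clients, d, m = 1000, 20, 64   # illustrative sizes, not the paper's parameters
sigma = 5.0                      # per-client noise scale; proper DP calibration omitted

# Each client computes its local contribution (here simply its data row),
# perturbs it with noise, and submits only the noisy row.
X = rng.normal(size=(n_clients, d))              # private client data
X_noisy = X + rng.normal(scale=sigma, size=X.shape)

# The servers jointly apply a public linear transformation (here a dense
# Gaussian sketch S) to the stacked noisy rows; only S @ X_noisy is revealed.
S = rng.normal(size=(m, n_clients)) / np.sqrt(m)
sketch = S @ X_noisy

# Downstream analysis (e.g., low-rank approximation or ridge regression)
# works only with the (m x d) sketch, never with individual noisy rows.
print(sketch.shape)  # (64, 20)
```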
The authors show that this linear sketch can be computed securely with techniques from multi-party computation (MPC). The shuffle model can also be implemented via MPC, but the implementation is less practical because it is more computationally expensive and includes a chain of communication between servers increasing latency.
The authors show that under this new model, several tasks from numerical linear algebra can be solved by algorithms that satisfy LTM-DP while having error that does not scale with the number of clients. The specific tasks in question are Low-Rank approximation and Ridge regression. Experiments demonstrate that the LTM algorithm for ridge regression gives lower error than the local-DP algorithm.
Questions for Authors
You mention that the MPC protocols needed for shuffle are not practical while the MPC protocols for the LTM are. Can you explain this quantitatively, how much more communication/computation is needed for MPC shuffle?
What other tasks do you expect to be a good fit for the LTM? Do you expect that the LTM will be a strict upgrade over the shuffle algorithm, or will there be tasks less related to numerical linear algebra that will be solvable with less error under the shuffle model?
Claims and Evidence
Most of the claims in this work are theoretical - these are addressed in the theoretical claims section. The claim that LTM algorithms will give lower error than local-DP algorithms is also supported by experiments on real and synthetic data. These experiments corroborate the theory. One claim with fairly weak support is that the MPC procedures needed for the shuffle model are much more expensive than those needed for the LTM. This may be obvious to someone with more MPC background, but it would be good to have this justified more quantitatively, either with asymptotic expressions for the communication/computation needed for these protocols or with experiments.
Methods and Evaluation Criteria
The experimental methods and evaluation criteria are sensible given the theoretical nature of the claims.
Theoretical Claims
The main theoretical claims are the privacy and utility results of the ridge regression and low-rank approximation algorithms. The tools used in this analysis are clear (infinitely divisible noise distributions and oblivious sparse norm-approximating projections) and the results are broadly in line with what I would expect given the combinations of these tools. However, I did not have time to carefully review the proofs to check their validity.
Experimental Designs Or Analyses
One modification that would be informative would be a plot of error versus privacy budget for the various methods. The privacy budgets used in the paper are 0.1 and 0.5, which are fairly narrow.
Supplementary Material
I reviewed the experiments in the supplementary and the related works section there. I did not review the proofs in the appendix.
Relation to Broader Scientific Literature
This work is most closely related to other intermediate DP definitions such as the shuffle model and secure aggregation (which is an instance of LTM). As far as I know, the shuffle model has not been implemented much in practice. The LTM seems promising in that it is clearly stronger than the central model while not being as restrictive as the local model and being easier to implement than the shuffle model.
Essential References Not Discussed
I do not believe that there are essential references not discussed in this paper.
Other Strengths and Weaknesses
Strengths
I find the writing of this paper to be very clear and informative. This manuscript does a great job of giving the reader the necessary background about differential privacy models, linear sketching, and multi-party computation.
The connection between LTM and linear sketching is very natural and I expect that many tasks from numerical linear algebra will be a good fit for this model
The experiments on real datasets give me confidence that these methods will be useful in practice
Weaknesses
The sketch parameters are underexplored; it seems natural to ask whether there is some relationship between the optimal sketch parameters and a particular privacy budget or number of clients.
Other Comments or Suggestions
No other comments
Thank you for your constructive feedback and thoughts.
One modification that would be informative would be a plot of error versus privacy budget for the various methods. The privacy budgets used in the paper are 0.1 and 0.5, which are fairly narrow
Regarding the choice of privacy budgets in the experiments, we chose to include results for budgets of 0.1 and 0.5 as those show a clear separation in the error between the different mechanisms in the LTM. Note that increasing certain other parameters while keeping the privacy budget fixed can be interpreted as increasing the privacy budget of our Gaussian noise mechanism, so Figure 2 shows that an increase of the privacy budget above 0.5 decreases the error drastically in all settings. We will include a plot that explicitly shows the error versus the privacy budget in the final version.
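For intuition on how strongly the error depends on the budget, the snippet below uses the classical (ε, δ)-Gaussian mechanism calibration σ = Δ·sqrt(2 ln(1.25/δ))/ε (valid for ε < 1); the constants in the paper's mechanism may differ, so this is only an illustration:

```python
import math

def gaussian_sigma(sensitivity, eps, delta):
    """Classical (eps, delta)-Gaussian mechanism noise scale, valid for eps < 1."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / eps

for eps in (0.1, 0.25, 0.5, 1.0):
    print(eps, gaussian_sigma(sensitivity=1.0, eps=eps, delta=1e-5))
# The noise scale, and hence the error, shrinks proportionally to 1/eps.
```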
The sketch parameters are underexplored, it seems natural to ask if there is some relationship between the optimal sketch parameters for a particular privacy budget or number of clients
With respect to the sketch parameters, we would like to point out that one strength of the LTM is its modularity, in the sense that any sketch can be used when suitable for the data analytics task at hand. We explore different such sketches from the literature and list their parameters in Table 3 in the appendix.
You mention that the MPC protocols needed for shuffle are not practical while the MPC protocols for the LTM are. Can you explain this quantitatively, how much more communication/computation is needed for MPC shuffle?
We acknowledge that we did not sufficiently substantiate the claim that MPC-based shuffles are more expensive than the LTM-based approach for readers who are not familiar with MPC. Due to space constraints, we could not include a full quantitative comparison, but we clarify here the key differences:
- Round Complexity: In MPC-based shuffles, each server must sequentially shuffle and randomize (or re-encrypt) the ciphertexts, leading to a round complexity proportional to the number of servers involved. This sequential dependency significantly increases the protocol’s latency. In contrast, the LTM enables all servers to perform their computations in parallel, resulting in lower round complexity and latency.
- Computational Overhead: In the LTM, servers only need to perform simple linear operations on secret shares, such as matrix-vector multiplications, which are highly efficient. On the other hand, MPC-based shuffles typically rely on mix-nets or similar constructions, where each server performs, for every data point, multiple expensive public-key operations (e.g., modular exponentiations with large moduli and exponents). This introduces an overhead that is orders of magnitude higher compared to simple matrix multiplications.
- Communication Complexity: The total amount of data transmitted is comparable between the two models, but the size of individual messages differs substantially. In the LTM, secret shares can be as small as the plaintext itself; for example, when working with 64-bit types, each share is just 64 bits per server. In contrast, MPC-based shuffles transmit ciphertexts that are significantly larger due to the inherent expansion of public-key encryption. For example, even with elliptic curve ElGamal, each ciphertext would be at least 512 bits. As a result, the total communication volume in MPC shuffles is larger.
Finally, we emphasize that MPC protocols, in general, tend to incur significant communication and computational costs when dealing with non-linear operations (e.g., multiplications on secret-shared data). The LTM model avoids this overhead by relying exclusively on linear operations on secret-shared data, which are much cheaper to compute.
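To make the communication comparison concrete, here is a back-of-the-envelope calculation under assumed parameters (64-bit additive shares for the LTM, roughly 512-bit elliptic-curve ElGamal ciphertexts for an MPC shuffle, and an assumed deployment size); the exact figures depend on the protocol and are only meant to illustrate the gap:

```python
def traffic_bits(n_clients, entries_per_client, bits_per_message, messages_per_entry):
    """Total network traffic in bits for one round of data collection."""
    return n_clients * entries_per_client * bits_per_message * messages_per_entry

n_clients, entries, n_servers = 10_000, 20, 3   # assumed deployment parameters

# LTM: each entry is additively secret-shared, one 64-bit share sent to each server.
ltm = traffic_bits(n_clients, entries, bits_per_message=64, messages_per_entry=n_servers)

# MPC shuffle: each entry is encrypted once (~512-bit EC-ElGamal ciphertext) and then
# re-randomized and forwarded along the server chain, so the 512-bit payload crosses
# the network roughly once per server.
shuffle = traffic_bits(n_clients, entries, bits_per_message=512, messages_per_entry=n_servers)

print(f"LTM traffic: {ltm / 8 / 1e6:.1f} MB, shuffle traffic: {shuffle / 8 / 1e6:.1f} MB")
```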
What other tasks do you expect to be a good fit for the LTM? Do you expect that the LTM will be a strict upgrade over the shuffle algorithm or will there be tasks less related to numerical linear algebra that will be solvable with less error under the shuffle model.
Good Fit for the LTM: Currently, we expect many problems beyond numerical linear algebra to be addressable in the LTM. Promising examples are certain clustering tasks, such as k-median clustering, which we are currently working on.
Comparison to Shuffle: We do not believe that the LTM will be an upgrade over the shuffle model, rather it is a complementary model with incomparable strengths and weaknesses. Proving a separation between the two models would be interesting and potentially challenging.
I think that a more detailed comparison would be very helpful for readers without the MPC background, and I hope that future versions of this manuscript find space to include that information in the main text or an appendix.
This paper introduces the Linear Transformation Model (LTM) for distributed differentially private data analytics, as an intermediate model between the central and local models of differential privacy. The core idea is to leverage efficient secure multiparty computation (MPC) techniques to apply a public linear transformation, such as a linear sketch, to the data contributed by different clients. This approach aims to avoid the single point of trust in the central model while mitigating the high noise requirements of the local model. The paper demonstrates the utility of the LTM for tasks like private low-rank approximation and private ridge regression, showing that it can achieve error that does not scale with the number of clients. While the reviewers found the model rather natural, more details were deemed necessary for the comparison between the LTM and the shuffle model, as well as MPC-based alternatives. Also, the writing of the paper can be improved to make it easier to follow (including by adding a more explicit specification of the threat model). Moreover, concerns were raised about the comprehensiveness of the utility analysis (since the privacy budgets considered are quite low). Addressing the above concerns, in particular by adding a more detailed comparison to alternative approaches as well as an explicit threat model, would be preferable before the paper is ready for publication at ICML.