PREAMBLE: Private and Efficient Aggregation via Block Sparse Vectors
We show that private vector aggregation can be done with sublinear communication in the two-server setting, with efficient protocols and zero-knowledge proofs.
Summary
Reviews and Discussion
This paper introduces block-sparseness as a principled abstraction for private aggregation, striking a balance between accuracy and efficiency. The authors extend distributed point function (DPF) constructions to support k-block-sparse vector secret sharing, achieving significantly reduced communication and computation overhead by transmitting only $kB$ field elements instead of full sparse vectors. They integrate their scheme with sampling-based privacy accounting and demonstrate that it achieves privacy-utility trade-offs comparable to the Gaussian mechanism while drastically reducing communication costs. For example, on 8-million-dimensional data, their method compresses communication from 64MB to 1MB with only a modest increase in noise at a fixed (ε, δ)-DP guarantee.
Strengths and Weaknesses
Strengths:
- Principled Use of Block-Sparsity: The paper introduces block-sparsity as a natural and effective abstraction for high-dimensional vector aggregation, striking a balance between expressiveness and efficiency. This insight enables meaningful communication and computation reductions without compromising accuracy.
- Communication Efficiency: By extending DPF constructions to handle block-sparse vectors, the authors reduce communication costs from tens of megabytes (e.g., 64MB) to as low as ~1MB, making the approach highly practical for large-scale systems.
- Improved Server-Side Efficiency: The protocol minimizes server workload by leveraging cuckoo hashing and probabilistic batch codes, substantially reducing the number of server-side PRG evaluations, a significant scalability improvement.
- Compatibility with DP Accounting and Utility Guarantees: The authors show that their block-sparse aggregation integrates well with privacy amplification by subsampling and numerical DP accounting, leading to utility comparable to the Gaussian mechanism despite the structural constraints.
- Efficient Zero-Knowledge Proofs: The paper includes a zero-knowledge proof system for validating block-sparse commitments, which does not inflate asymptotic runtime or communication cost, a valuable feature for ensuring integrity in adversarial environments.
- Deployment-Oriented Design: The use of counter-mode PRG evaluations and linear or near-linear runtime makes the construction well-suited for practical, scalable deployment, especially in two-server and federated learning settings.
Weaknesses:
- Lack of Some Definitions: Clearly, this paper must have been written by one or more crypto experts. PRG (assuming it stands for Pseudorandom Generator) is referenced a few times without an actual definition. See line 118 where it first appears...
- Implementation Complexity: The protocol’s reliance on advanced cryptographic components (e.g., DPFs, cuckoo hashing, batch codes, ZKPs) may introduce engineering complexity that could hinder adoption without reference implementations.
Questions
- Can you comment on how sensitive your protocol's performance is to violations of the block-sparsity assumption?
- Are there heuristics or preprocessing techniques to induce block structure when it is not present?
Limitations
N/A
Justification for Final Rating
I think it's a strong submission sitting at the intersection of machine learning and cryptography.
Formatting Issues
N/A
We thank the reviewer for their positive review, and will update the paper to address their valuable feedback. Below we address some of their questions.
Missing definitions: We apologize for the oversight, and will add definitions of any cryptographic terms that we use.
Additional complexity: The reviewer is correct that, much like Prio itself, the use of complex cryptographic components will introduce additional engineering complexity. We hope that PREAMBLE will be implemented in the standard libraries that implement various Prio protocols, such as libprio and daphne, reducing the overhead. Overall, we believe that PREAMBLE enables larger-dimensional mean estimation than was feasible with prior techniques, and the incremental engineering complexity over Prio can be justified in scenarios where such aggregates are needed.
Violations of block-sparsity assumption: In PREAMBLE, block sparsity is not an assumption on the input, but rather ensured by our sub-sampling process. The cryptographic part of the algorithm requires inputs to be block-sparse. However, given an arbitrary input, PREAMBLE first subsamples the right number of blocks to ensure that block-sparsity holds. Indeed, this subsampling can be viewed as a pre-processing technique to induce the block-sparse structure.
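The subsampling step described above can be illustrated with a minimal sketch. This is not the paper's exact sampling scheme; the block layout, inverse-probability rescaling, and RNG here are illustrative assumptions.

```python
import random

def block_sparsify(x, block_size, k, seed=None):
    """Keep k randomly chosen blocks of a dense vector and rescale
    them by (num_blocks / k) so the result is an unbiased estimator
    of x. Illustrative sketch, not PREAMBLE's actual scheme."""
    rng = random.Random(seed)
    num_blocks = len(x) // block_size
    chosen = rng.sample(range(num_blocks), k)
    scale = num_blocks / k  # importance weight for unbiasedness
    out = [0.0] * len(x)
    for b in chosen:
        for i in range(b * block_size, (b + 1) * block_size):
            out[i] = x[i] * scale
    return out, chosen
```

The output is exactly k-block-sparse by construction, which is the property the cryptographic layer requires.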
The authors propose a method for efficient and private aggregation of high-dimensional vectors in the two-server model. They introduce a new cryptographic primitive based on block-sparse distributed point functions to reduce communication and computational overhead. By leveraging block sparsity, random sampling, and privacy amplification by sampling, PREAMBLE achieves strong privacy guarantees with communication costs significantly lower than prior work. The method is particularly relevant in settings like private FL, where clients must send large gradient vectors.
Strengths and Weaknesses
Strengths
- The paper is well-written and easy to read. Ideas and analysis have been clearly presented.
- The paper addresses a practical bottleneck in private FL: the high communication cost of secure aggregation.
- The abstraction of k-block sparse vectors and their integration into DPFs is innovative and leads to notable efficiency gains. The authors have also combined formal privacy guarantees with empirical utility assessments.
Weaknesses
- While training machine learning models with differential privacy is inherently challenging, the evaluation in this paper is limited to relatively small-scale tasks and models, which may not fully demonstrate the method’s scalability or its generalizability to the more complex architectures commonly used in practical federated learning. Does the conclusion change for a more complicated task, e.g., a large model on ImageNet-level classification?
- While communication is reduced, the paper may potentially downplay the increased client- and server-side computational cost due to PRG evaluations and zero-knowledge proofs. A detailed breakdown of computational costs would be helpful.
- Though many baselines are mentioned, the paper lacks a direct comparison with recent alternatives like [CSOK23]. It would be helpful to include more results and discussion.
Questions
- Could you provide insights or preliminary discussions on how PREAMBLE performs with larger models (e.g., ResNet or ViT) and more complex datasets?
- Could you provide a detailed breakdown of runtime or resource usage for both the client and the servers?
- Could you provide a direct comparison with recent alternatives like [CSOK23]?
Limitations
Yes
Justification for Final Rating
The authors' response was satisfactory. I agree with the other reviewers that this is a good paper and will keep my original score of Accept.
Formatting Issues
N/A
We thank the reviewer for their positive evaluation of our work and will address their feedback in the updated version. Below we address their specific questions.
Performance on larger models: We believe that our DP-SGD results demonstrate that the shape of the noise in PREAMBLE does not impact the convergence of DP-SGD, so one can focus on the scale of the noise. In Figure 13 in the appendix, we evaluate the per-batch gradient estimation error, which should allow us to evaluate the impact on DP-SGD convergence for any model. In particular, DP-SGD in the low-privacy regime typically uses a per-batch sigma in the range [0.6, 2]. If the per-batch privacy sigma is 1.0 and the batch size is 10k, Theorem 3.1 implies that sending 100k coordinates per user will ensure an overall sigma of ~1.05. By taking a slightly larger batch of 11k, the effective sigma can be brought back down to 1.0, so that the utility is equivalent to non-compressed DP-SGD. The privacy analysis can then be done with this slightly larger batch using PREAMBLE. This analysis now needs to account for two levels of sampling (sampling a batch of users, and sampling of blocks); we can use [FS25] to handle this and compute the final (ε, δ) parameters. This incurs a small overhead compared to the Moments accountant (and a slightly larger overhead compared to PRV accountants), and for typical parameters we expect this increase to be small. We will add examples of this overhead for some parameter settings from recent works in the final version of the paper.
We remark that mean estimation is a widespread primitive in private learning and statistics well beyond DP-SGD, with applications ranging from learning histograms over dictionaries both big and small, using such histograms in adaptive algorithms for learning heavy hitters over very large dictionaries, training small and large ML models, running k-means and similar clustering algorithms, and running Bayesian inference algorithms. Indeed, learning histograms is by far the most widely used practical application of federated learning, and Fig. 4(c) captures the benefits of our approach for this application.
Runtime and resource usage for client and server: The decreased communication in PREAMBLE does come at a computation cost for clients and servers; however, we introduce improvements to existing DPF constructions to keep this computation very modest. The main computational overhead for clients and servers is the evaluation of PRGs from a small seed; we describe the construction at a high level at the start of Section 4 and in more detail in Appendix H. The total computation, dominated by length-doubling PRG evaluations, is done in a tree-like structure, where the client must evaluate k paths in this tree and the server must evaluate the entire tree.
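The tree-structured evaluation described above can be sketched as a minimal GGM-style expansion. SHA-256 stands in for the AES-based length-doubling PRG, and the function names and tree layout are illustrative assumptions, not the paper's construction; the point is the cost asymmetry: one root-to-leaf path costs `depth` PRG calls, while the full tree costs roughly one call per leaf.

```python
import hashlib

def prg_double(seed: bytes):
    """Length-doubling PRG: one 16-byte seed -> two 16-byte children.
    (Illustrative; a real DPF would use AES in counter mode.)"""
    left = hashlib.sha256(seed + b"L").digest()[:16]
    right = hashlib.sha256(seed + b"R").digest()[:16]
    return left, right

def eval_full_tree(seed: bytes, depth: int):
    """Server-side cost: expand all 2**depth leaves (~2**depth calls)."""
    level = [seed]
    for _ in range(depth):
        nxt = []
        for s in level:
            l, r = prg_double(s)
            nxt.extend((l, r))
        level = nxt
    return level

def eval_path(seed: bytes, depth: int, leaf_index: int):
    """Client-side cost: expand one root-to-leaf path (depth calls)."""
    s = seed
    for bit in range(depth - 1, -1, -1):
        l, r = prg_double(s)
        s = r if (leaf_index >> bit) & 1 else l
    return s
```

A client evaluating k paths thus performs about k·depth length-doubling calls, versus the server's roughly one call per leaf of the tree.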
To give some intuition about the runtime of the PRG operations, we quickly benchmarked the PRG instantiated with AES-128 on an Apple M2 laptop with 16GB of memory. Using the command-line tool openssl speed, we measured a time of approximately 2ms for server computation and 0.5ms for client computation. To obtain these numbers, we fix a particular group size in the final layer and a representative setting of the remaining parameters.
The asymptotic computation cost is stated in the table below. The improvement in server computation stems from constraining the sparsity structure to have k non-zero blocks instead of arbitrary non-zero coordinates overall, which reduces the number of calls to a length-doubling PRG in favor of calls to a PRG whose output is the size of its input. This change could also be applied to the existing approaches for multi-point DPFs in the other rows of the table, but it was not a focus of those previous works. The asymptotic notation for server computation in the probabilistic-batch-codes approach hides a multiplicative constant of approximately 3, while our approach requires approximately D PRG evaluations (1.03D in practice), avoiding the constant overhead.
|  | Server Computation | Client Computation |
|---|---|---|
| Repeated DPFs [BGI16] |  |  |
| Probabilistic Batch Codes [BCGI18] |  |  |
| PREAMBLE |  |  |
Comparison with [CSOK23]: [CSOK23] operates in the multi-message shuffle model and thus relies on different trust assumptions than PREAMBLE. Similarly to their work, PREAMBLE exploits the fact that the sparsity pattern of vectors can be hidden. However, our work crucially relies on blocking to reduce the sparsity, which is necessary to make the DPF construction significantly more efficient. Blocking also helps significantly with the truncation error, as we show in Section 5. Sparsity constraints make regular Poisson subsampling, which is used by [CSOK23], less suitable for our application and necessitate sampling schemes that require a more involved privacy analysis.
Thanks for your response. The additional results on computational cost demonstrate the method’s efficiency, and the authors’ justification for not including further experiments is understandable.
This paper studies the problem of private aggregation of high-dimensional vectors. In this problem, there is a data set of high-dimensional vectors distributed among some large number of clients, and there are two servers whose goal is to compute and output a private estimate of their sum. The main goal of this paper is to reduce the communication cost of protocols that achieve this. One approach from prior work is to randomly sparsify and scale the vectors so that in expectation they are preserved, and then to communicate the lightweight sparse vector. The authors state that this approach still leads to a communication cost that grows with the number of coordinates subsampled during sparsification, the dimensionality of the vectors, and a security parameter used in secret sharing.
The approach taken by the authors is to use block-sparsity: instead of randomly picking individual coordinates to sample, the vector is divided into blocks of a fixed size, and k of these blocks are sampled and picked for communication. By doing so, they state that they obtain a substantially reduced communication cost, potentially with some other side-costs which can be dealt with separately.
Further, since the final goal is differentially private aggregation, the authors show that one of the advantages of their method is that by using block sparsity and privacy amplification by subsampling, the potentially high sensitivity of the subsampled and scaled vector can be offset leading to a reasonable privacy utility trade-off.
In order to construct their protocol, which requires the destination servers to aggregate and privatize the sparsified and encrypted vectors, the authors rely upon zero-knowledge proofs to ensure that the clients can prove that their vectors are indeed k-block sparse.
Finally, some experiments are conducted showing that they are indeed able to reduce communications costs and truncation error with blocking, and that it works well with privacy by conducting a numerical privacy accounting.
Strengths and Weaknesses
Strengths:
- The problem being considered is interesting and impactful.
- The high-level approach makes sense and the gains made seem to be significant.
Weaknesses:
- Although the paper is well-written, it is hard for someone not familiar with Prio to easily gauge correctness, or understand all the roadblocks being encountered and how they are resolved.
- The collaborative computation between the servers for the private output is not clear from the pseudocode.
Questions
- I understand that this is a hybrid trust model of sorts - the communication between the clients and the two servers is cryptographically secure, and the output of the servers is privacy-preserving in the sense of differential privacy. At the outset you mention how it is only required that one of the two servers is assumed to be honest. If a server is malicious but receives some secret shares of the sparsified vector, what exactly is the privacy promise that is achieved here? Is only the final output private or has the malicious server gained some private information?
- In your Overview section you give an example of a secret share using Distributed Point Functions that seems to use 15MB of communication. Is it easy to see what your approach would give here? It might be good to place this in that section for an easy comparison.
- What exactly are the zero-knowledge proofs needed for? Is this proof of sparsity necessary for reasons of privacy? Also, what is the total round complexity of your protocol?
- Would the gains made from privacy amplification by subsampling also extend to the prior work that uses randomly sparsified vectors, potentially offsetting the high worst-case sensitivity?
Limitations
yes
Justification for Final Rating
I think the authors' comments addressed my questions well and increased my confidence in the paper, which is why I have increased my score from a 4 to a 5.
Formatting Issues
None.
We thank the reviewer for their positive evaluation of our work and will incorporate their valuable feedback in the updated version of the paper. Below we address their specific questions.
Collaborative computation: We acknowledge the complexity of the cryptographic parts of the protocol, and have attempted to give a high-level view of the roadblocks and their resolution in Sec. H in the appendix. We will rework the pseudocode of the collaborative decoding to make it clearer.
Hybrid Trust Model: That is a great question, and we will add more discussion of this. Prio guarantees that as long as one server is honest, the other server (even if they were malicious) learns nothing except for the published aggregate. So from the privacy point of view, we have the strong privacy guarantee: nothing except the differentially-private aggregate can be learnt by any of the parties as long as one of the servers is honest. A malicious server can corrupt the output of course, but cannot impact privacy.
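The guarantee above rests on additive secret sharing: each client splits its (sparsified) vector into two shares, one per server, and either share alone is uniformly random. A minimal sketch follows; the field choice and function names are illustrative assumptions, not the paper's protocol.

```python
import secrets

P = 2**61 - 1  # illustrative prime field, not the paper's parameters

def share(x: int):
    """Split x into two additive shares mod P. Either share alone
    is uniformly distributed and reveals nothing about x."""
    r = secrets.randbelow(P)
    return r, (x - r) % P

def reconstruct(s0: int, s1: int) -> int:
    return (s0 + s1) % P

def aggregate(shares0, shares1) -> int:
    """Each server sums its own shares locally; combining only the
    two totals reveals the aggregate, not any individual input."""
    t0 = sum(shares0) % P
    t1 = sum(shares1) % P
    return reconstruct(t0, t1)
```

This is why a single malicious server holding one share per client learns nothing: its view is uniform, independent of the clients' data.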
Concrete numbers for the example: That is a great suggestion. We will add some numbers here and make this example more consistent with the example on lines 78-81. Holding m = kB = 50000 constant and choosing the remaining parameters accordingly, our communication would be ~400KB in this example if the field size is 64 bits.
The need for zero-knowledge proofs (ZKPs): That is a great question, and we will add more discussion of this. The primary reason to have ZKPs here is to make the protocol robust to malicious clients. Absent proofs, a single client can change the whole aggregate arbitrarily. Proofs ensure that a malicious client can only change the output by a bounded amount. Our protocol has 2 rounds, but as we discuss in Section H.2, it can be made single-round using Fiat-Shamir (and Fiat-Shamir is provably secure for our protocols in the random oracle model).
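The Fiat-Shamir step mentioned above replaces the verifier's random challenge with a hash of the prover's transcript, removing one round of interaction. A minimal sketch, where the domain-separation tag and transcript format are illustrative assumptions rather than the paper's specification:

```python
import hashlib

def fiat_shamir_challenge(transcript: bytes, modulus: int) -> int:
    """Derive the verifier's challenge deterministically by hashing
    the prover's first-round message (random-oracle heuristic)."""
    digest = hashlib.sha256(b"FS-domain-sep" + transcript).digest()
    return int.from_bytes(digest, "big") % modulus
```

Because the challenge is a deterministic function of the commitments, any party can recompute and verify it, so the proof becomes a single message.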
Would the gain extend to other uses of randomly-sparsified vectors: Yes and no. Privacy amplification analysis will help in the case of random sparsification whenever we can make sure that the adversary cannot learn which coordinates a client contributes to. In single-server settings (or in typical MPC-amongst-clients protocols), this is seldom the case, though one could imagine designing protocols to ensure this. This need to hide the sparsity pattern is the reason we use DPFs in our work.
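The amplification effect discussed above can be illustrated with the textbook subsampling bound: running an ε-DP mechanism on a random q-fraction subsample is ln(1 + q(e^ε − 1))-DP. This is a simpler, looser bound than the numerical accounting the paper actually uses, shown here only for intuition.

```python
import math

def amplified_eps(eps: float, q: float) -> float:
    """Textbook privacy amplification by subsampling: an eps-DP
    mechanism applied to a q-subsample satisfies this smaller eps.
    (Not the paper's tighter numerical accountant.)"""
    return math.log(1.0 + q * (math.exp(eps) - 1.0))
```

For small sampling rates q, the amplified ε is roughly q·(e^ε − 1), which is why hiding the sampled coordinates (so the adversary cannot condition on them) is worth the cryptographic effort.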
This paper studies the private communication of vectors from each client to an aggregating server under differential privacy, while taking communication cost into account. In particular, the authors propose an extension from 1-block sparse vectors to k-block sparse vectors. Each client divides their vector into B blocks, randomly selects k of them, and sends only those to the server. Assuming each block has a bounded ℓ₂ norm, they show that the DP release of the k-block sparse vector can approximate the sum of all client vectors, up to an error term. By appropriately choosing parameters, they demonstrate that the noise from DP-SGD will dominate this approximation error. The authors also provide theoretical privacy guarantees, which depend on a parameter c, although its construction or approximation is not fully understood. They validate their theoretical results experimentally on toy examples and compare DP-SGD with the block-sparse vector approach using numerical privacy accounting.
Strengths and Weaknesses
Strengths:
- The paper addresses an important and timely problem in differentially private (DP) communication.
- The theoretical results support the proposed approach for DP k-sparse communication. The explanations of the sparsification algorithm, hashing, and decoding mechanisms are clear and well-presented.
Weaknesses:
- In the numerical comparisons with DP-SGD, the authors mention that both algorithms operate under the same privacy budget (via numerical accounting), but they do not specify the values of ε and δ. Moreover, there is no discussion of how the results are impacted in lower-privacy regimes, the ones typically desired in practice.
- The privacy guarantee relies on a parameter c that can be arbitrarily small. While it is theoretically appealing to have a guarantee without approximation, the practical value of such a guarantee is questionable.
- While some parts of the paper provide thoughtful discussions of the results, other sections merely restate the findings in verbal form without offering deeper insights or analysis, e.g., after Theorem 3.1, where the scenario in which the DP noise dominates is described without any discussion of whether such a scenario is realistic.
Questions
- How can we approximate the constant c?
- Can you perform an ablation study with varying levels of Gaussian noise added, specifically in the low and medium privacy regimes, and if so, what would be the utility gap between DP-SGD and the proposed work?
Limitations
NA
Justification for Final Rating
Given the authors' responses and their proposed additions to the paper, I have updated my score.
Formatting Issues
NA
We thank the reviewer for their positive evaluation and their valuable feedback that we will incorporate in an updated version of the paper. Below we address some of the reviewer’s questions.
Privacy parameters: We missed adding the privacy parameters in the main paper and will bring these forward from the supplementary material. The details for our DP-SGD experiments, including the privacy parameters used for Figure 5, are presented in the Appendix (Section I.1).
Constant c: While the constant c in Thm 3.2 can be explicitly calculated, the analytic formula is primarily valuable for giving intuition about the asymptotics. It is off by constant factors and hence not used in practice. In practice (even for simple DP-SGD), one uses numerical accounting methods, and we do the same in our work.
Low-privacy regime: We believe that our DP-SGD results demonstrate that the shape of the noise in PREAMBLE does not impact the convergence of DP-SGD, so one can focus on the scale of the noise. In Figure 13 in the appendix, we evaluate the per-batch gradient estimation error, which should allow us to evaluate the impact on DP-SGD convergence in the low-privacy regime.
In particular, DP-SGD in the low-privacy regime typically uses a per-batch sigma in the range [0.6, 2]. If the per-batch privacy sigma is 1.0 and the batch size is 10k, Theorem 3.1 implies that sending 100k coordinates per user will ensure an overall sigma of about 1.05. By taking a slightly larger batch of 11k, the effective sigma can be brought back down to 1.0, so that the utility is equivalent to non-compressed DP-SGD. The privacy analysis can then be done with this slightly larger batch using PREAMBLE. This analysis now needs to account for two levels of sampling (sampling a batch of users, and sampling of blocks); we can use [FS25] to handle this and compute the final (ε, δ) parameters. This incurs a small overhead compared to the Moments accountant (and a slightly larger overhead compared to PRV accountants), and for typical parameters we expect this increase to be small. This is the analysis in our current plots, and we will add examples of this overhead for some parameter settings from recent works in the final version of the paper.
We remark that mean estimation is a widespread primitive in private learning and statistics well beyond DP-SGD, with applications ranging from learning histograms over dictionaries both big and small, using such histograms in adaptive algorithms for learning heavy hitters over very large dictionaries, training small and large ML models, running k-means and similar clustering algorithms, and running Bayesian inference algorithms. Indeed, learning histograms is by far the most widely used application of federated learning, and Fig. 4(c) captures the benefits of our approach for this application. The parameters of interest here (the batch size, the dimension, and the DP noise scale) will vary considerably across these applications, and our approach will provide benefits in those cases where the dimension is larger than the batch size. Our Theorem 3.1 provides some guidance on what compression one can expect for a given setting of parameters. We will add more discussion of this in the updated paper.
This paper studies aggregation of vectors satisfying cryptographic and differential privacy guarantees, introducing a new aggregation method extending 1-block sparse vectors to k-block sparse vectors. All four reviewers appreciated the paper's algorithms, exposition, and improved communication and utility guarantees. A possible point of improvement for the final camera-ready is to better clarify some of the paper's cryptographic details and framing for the largely ML audience at NeurIPS.