PaperHub
Rating: 8.2/10 · Spotlight · 4 reviewers
Scores: 5, 5, 5, 5 (min 5, max 5, std 0.0)
Average confidence: 3.5
Novelty: 3.3 · Quality: 3.0 · Clarity: 2.5 · Significance: 3.5
NeurIPS 2025

Privacy amplification by random allocation

OpenReview | PDF
Submitted: 2025-05-10 · Updated: 2025-10-29
TL;DR

We consider the privacy guarantees of a new sampling method, which combines aspects of Poisson subsampling and shuffling, and prove that it can be approximated by Poisson subsampling.

Abstract

We consider the privacy amplification properties of a sampling scheme in which a user's data is used in $k$ steps chosen randomly and uniformly from a sequence (or set) of $t$ steps. This sampling scheme has been recently applied in the context of differentially private optimization [Chua et al., 2024a, Choquette-Choo et al., 2024] and is also motivated by communication-efficient high-dimensional private aggregation [Asi et al., 2025]. Existing analyses of this scheme either rely on privacy amplification by shuffling which leads to overly conservative bounds or require Monte Carlo simulations that are computationally prohibitive in most practical scenarios. We give the first theoretical guarantees and numerical estimation algorithms for this sampling scheme. In particular, we demonstrate that the privacy guarantees of random $k$-out-of-$t$ allocation can be upper bounded by the privacy guarantees of the well-studied independent (or Poisson) subsampling in which each step uses the user's data with probability $(1+o(1))k/t$. Further, we provide two additional analysis techniques that lead to numerical improvements in several parameter regimes. Altogether, our bounds give efficiently-computable and nearly tight numerical results for random allocation applied to Gaussian noise addition.
Keywords
Differential Privacy · DP-SGD · subsampling · shuffling

Reviews and Discussion

Review
Rating: 5

This paper addresses the theoretical gap between Poisson subsampling (commonly assumed in the analysis of DP-SGD) and the more practical random shuffling approach used in real-world implementations. Unlike Poisson subsampling, which ensures independence across minibatches, shuffling introduces dependencies that complicate theoretical guarantees. While prior work has largely focused on pure differential privacy (i.e., $\delta = 0$), this work extends the analysis to approximate differential privacy.

To this end, the authors propose three key contributions:

  1. They show that random allocation achieves nearly the same privacy guarantees as Poisson sampling with rate $k/t$, up to negligible lower-order terms.
  2. They provide a non-asymptotic privacy bound, showing that the privacy loss of random allocation remains within a constant factor of Poisson sampling across various regimes.
  3. By using RDP, they offer tight, computable bounds that are particularly effective in low-privacy regimes.

The theoretical contributions are supported by numerical evaluations, demonstrating that the privacy bounds under random allocation are very close to those under Poisson sampling. Moreover, the authors argue that random allocation often provides better utility, avoiding the inefficiencies and increased variance associated with Poisson sampling.

This paper offers an important step toward reconciling theoretical privacy analyses with the practical implementation of DP-SGD. It is a strong contribution both in terms of theoretical depth and empirical completeness. That said, the paper would benefit from further discussions on real-world scalability and broader applicability.

Strengths and Weaknesses

This paper has the following strengths:

  1. The paper is the first to provide a formal privacy analysis for the widely-used random allocation scheme.
  2. Random allocation offers improved utility by reducing variance and making full use of available data, particularly compared to Poisson sampling.
  3. (Extensibility) The results have immediate implications for real-world deployments of DP-SGD and federated learning, helping bridge the gap between theory and implementation.

Although this paper is well-written, it has the following weaknesses:

  • Limited Utility Evaluation: While the paper includes privacy experiments comparing sampling schemes, it lacks empirical results on utility in more complex, realistic settings. The utility claims are mainly based on toy problems such as mean estimation. Demonstrating improvements on neural network training tasks would strengthen the practical impact of the work.

Questions

  1. As mentioned in the introduction, random allocation computes exact means (or sums), thus avoiding the extra variance introduced by Poisson sampling. Can the utility-privacy tradeoff observed in Figure 9 be replicated on more realistic datasets or tasks? I suspect the difference may diminish on large-scale datasets.
  2. In Figure 1, the flatness of the RDP curves is attributed to a limited range of $\alpha \in [2, 60]$. However, based on the alpha-selection method used in DP-SGD, extending this range only incurs minor computational cost (closed-form iterations). Would expanding this range make the curves tighter? (A generic sketch of the relevant RDP-to-DP conversion appears after this list.)
  3. Is it possible to theoretically characterize the crossover point in Figure 9 where the utility advantage shifts from one method to another?
  4. Minor Typos:
    • (page 40) In the caption of Fig. 9, there is an unclosed parenthesis. Also, it is hard to recognize the x-label of the figures (sample size).
    • (Line 1275) "Since The" --> "Since the"
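
On question 2 above: the standard RDP-to-DP conversion optimizes the Rényi order $\alpha$ over a grid, so "extending the range" just widens a one-dimensional minimization. A minimal generic sketch (ours, not the paper's accountant; the plain Gaussian RDP curve `gauss_rdp` is only a stand-in):

```python
import numpy as np

def eps_from_rdp(rdp_of_alpha, alphas, delta):
    # Standard conversion (Mironov 2017):
    # eps(delta) = min_alpha [ RDP(alpha) + log(1/delta) / (alpha - 1) ].
    alphas = np.asarray(list(alphas), dtype=float)
    eps = np.array([rdp_of_alpha(a) for a in alphas]) + np.log(1 / delta) / (alphas - 1)
    return eps.min()

# Stand-in RDP curve: plain Gaussian mechanism, RDP(alpha) = alpha / (2 sigma^2).
sigma = 20.0
gauss_rdp = lambda a: a / (2 * sigma**2)

# With large sigma the optimal alpha exceeds 60, so the wider grid is tighter:
print(eps_from_rdp(gauss_rdp, range(2, 61), delta=1e-5))   # alpha in [2, 60]  -> ~0.27
print(eps_from_rdp(gauss_rdp, range(2, 513), delta=1e-5))  # alpha in [2, 512] -> ~0.24
```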

Limitations

yes

Final Justification

I acknowledge that the authors have adequately addressed my comments and questions. I find this to be an insightful piece of work; however, it is not a groundbreaking one as it does not propose a technically new scheme.

For these reasons, I am keeping my positive score of '5-Acceptance' and raising my confidence score from 3 to 4.

Formatting Issues

No issues

Author Response

We thank the reviewer for their helpful comments. We would like to clarify that both [1, 2] provide an extensive evaluation of the utility of random allocation, with [1] also showing that it behaves similarly to shuffling. Our focus is on establishing the formal privacy guarantees for this scheme. All the privacy analysis requires is the Gaussian noise scale and the sampling and number-of-steps parameters that were used, so we do not need to rerun existing experiments. We will clarify this point in the revision. We have added a toy example in Figure 9 and some formal analysis to give the reader intuition for why, at least in some settings, random allocation has better utility than Poisson sampling.

Replies to the reviewer's questions:

  1. Privacy-utility tradeoff in large-scale datasets: In short, the gap indeed decreases with the sample size at the same rate as the underlying sampling noise, which corresponds to an increase in variance of the underlying distribution by a constant factor. At the same time, as the dimension increases, the effect of the privacy noise becomes dominant relative to the additional sampling gap. A real world example of these effects can be found in the experiments of [1], and a formal asymptotic analysis can be found in the answer to question 3.
  2. Computational complexity of RDP: The RDP computation of the random allocation is inherently different from that of the Poisson sampling commonly used to analyze DP-SGD. While the computational complexity of moment accounting for the Poisson scheme is $O(1)$, it grows exponentially in $\alpha$ in the case of random allocation (Theorem 4.4), as discussed in Section F.1.
  3. Cross point in Figure 9: Since $\text{Var}(\hat{p}^{\mathcal{A}}) = \frac{p(1-p)}{n} + \frac{t \sigma^{\mathcal{A}}}{n^{2}}$ and $\text{Var}(\hat{p}^{\mathcal{P}}) \approx \frac{p(1-p)}{n} + \frac{p}{n} + \frac{t \sigma^{\mathcal{P}}}{n^{2}}$, where $\sigma^{\mathcal{A}}$ and $\sigma^{\mathcal{P}}$ are the scales of the noise required to achieve some fixed privacy goal for the random allocation and Poisson schemes respectively, the cross point occurs when $\sigma^{\mathcal{A}} - \sigma^{\mathcal{P}} = \frac{n p}{t}$. While the gap between the required scales cannot be derived analytically, it can be approximated asymptotically in the high-privacy regime, as discussed in Section 4.1. (A numerical illustration of this condition follows below.)
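
For intuition, a tiny numerical check of this crossover condition (our sketch; the parameter values are hypothetical and chosen only for illustration):

```python
# Hypothetical numbers, purely to illustrate the crossover condition above.
n, t, p = 10_000, 100, 0.3
sigma_A, sigma_P = 2.0, 1.2  # required noise scales for allocation / Poisson (illustrative)

var_alloc   = p * (1 - p) / n + t * sigma_A / n**2
var_poisson = p * (1 - p) / n + p / n + t * sigma_P / n**2

# Random allocation has lower variance exactly when sigma_A - sigma_P < n * p / t.
print(var_alloc < var_poisson)       # True for these values
print(sigma_A - sigma_P, n * p / t)  # 0.8 vs. 30.0
```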

[1] Chua, Lynn, et al. "Balls-and-Bins Sampling for DP-SGD." International Conference on Artificial Intelligence and Statistics. PMLR, 2025.

[2] Choquette-Choo, Christopher A., et al. "Near-Exact Privacy Amplification for Matrix Mechanisms." The Thirteenth International Conference on Learning Representations, 2025.

Comment

Thank you to the authors for their response to the comments. My questions have been addressed with reasonable responses. I am sorry that I missed the existing experimental results in [1], and I agree that the authors do not need to reimplement ML code to produce additional results.

I would like to keep my positive review of this work and increase confidence; however, it seems like I cannot edit the confidence score. If the authors have opinions on this comment, please let me know.

Once again, thank you for your response.

Review
Rating: 5

The authors investigate a specific sampling scheme for privacy amplification where each user’s data is involved in k randomly chosen steps out of t total steps. They present the first theoretical guarantees and numerical estimation methods for this scheme. They show that privacy amplification under this scheme can be upper-bounded by Poisson subsampling with a sampling rate of approximately k/t. The paper also introduces two alternative analyses for tighter bounds and demonstrates their effectiveness, particularly with Gaussian mechanisms.

Strengths and Weaknesses

Strengths

  • The theoretical analysis seems technically correct and sound. In terms of novelty, it is the first comprehensive theoretical analysis of the k-out-of-t random allocation scheme.
  • The proposed bounds are both efficiently computable and nearly tight, which matches the existing results in the literature.
  • The results address a key gap between theoretical analyses and practical implementations of differentially private SGD, confirming the benefits of random allocation over Poisson subsampling.

Weaknesses

  • More detailed experimental validation and corresponding discussion are encouraged. While the paper provides extensive numerical analysis, it lacks clarity on the specific settings under which the figures were generated. For example, Chua et al. [2024a] evaluated the same sampling scheme in the context of DP-SGD training on real-world datasets. In contrast, this paper does not specify critical experimental parameters or setups. Without this information, it's difficult to assess how well the presented results align with practical privacy analyses.

  • The paper feels incomplete in its current form. Notably, it lacks a conclusion section, which is critical for summarizing the key insights, contextualizing the contributions, and highlighting future directions. There is limited discussion of potential limitations or negative results. Section 5 does not provide enough results to reach the conclusions.

  • The presentation needs to be further improved. There are multiple grammar and formatting issues throughout the manuscript: improper article usage (e.g., "a algorithm" instead of "an algorithm"), awkward phrases (e.g., "that it" instead of "that is"), and inconsistent spacing around citations and symbols. Definitions 2.2–2.6 include extra spaces before citation parentheses, which should be removed. These seemingly minor errors accumulate and reduce the readability of the paper, and should be carefully addressed in revision.

Chua et al. "Balls-and-Bins Sampling for DP-SGD." International Conference on Artificial Intelligence and Statistics, 2024.

Questions

  • What are the specific experimental setups used to generate the figures?
  • How do the bounds scale in practical deep learning settings compared to existing methods like Poisson subsampling?
  • Please add a conclusion section elaborating on the takeaways, future work, and potential limitations.

Limitations

Yes

Final Justification

After checking the rebuttal and the other reviews (together with their rebuttals and responses), I am convinced that this paper makes a significant contribution to the study of privacy. I raise my score to 5.

Formatting Issues

N/A

Author Response

We thank the reviewer for the detailed feedback. We intend to polish the presentation for the final version, as it is currently missing some details and includes some typos, and to use the additional page to provide a comprehensive discussion.

Replies to the reviewer's questions:

  1. Experimental setup: All figures presented in this work, except for Figure 9, consider only the privacy of a sequence of noise applications and, as such, do not depend on the actual algorithm. They are purely numerical evaluations of the upper and lower bounds on the privacy parameters of the various sampling schemes, all using a local Gaussian mechanism. Figure 9 presents the privacy-utility tradeoff for a simple toy example detailed in Section F.
  2. Practical deep learning settings: The range of parameters analyzed in the various figures applies to many real-world deep learning settings ($t \in [10^{2}, 10^{6}]$, $\sigma \in [0.2, 2]$, and $k \in [1/t, 50/t]$). The experiments conducted in [1] considered two datasets with two expected batch sizes each, which corresponds to $t \in [1.5 \cdot 10^{3}, 3.6 \cdot 10^{4}]$ and $\sigma \in [10^{-5}, 0.5]$, which partially matches our numerical analysis. From our quick evaluation, our bounds behave similarly in this regime (closely matching the Poisson bounds), and we intend to add a specific analysis for these regimes in the final version.
  3. Discussion: Parts of the discussion are currently in the evaluation section. We thank the reviewer for the suggestion to expand those into a separate section, which we will do in the final version.

[1] Chua, Lynn, et al. "Balls-and-Bins Sampling for DP-SGD." International Conference on Artificial Intelligence and Statistics. PMLR, 2025.

Comment

Thanks for your reply. Overall, the authors have clarified the concerns raised in the review, and have promised to further improve the presentation in the final version. I agree with the other reviewers that this is a solid contribution.

Review
Rating: 5

Privacy amplification by subsampling is one of the most central tools for boosting privacy. In DP-SGD, for example, a privacy boost is achieved because only a small batch is used in each step. The question that this paper studies is how to choose the batches, because there are privacy-utility tradeoffs for different sampling procedures. One option is to use Poisson subsampling (parametrized by $\lambda$), where for each batch of the execution, each data element is chosen to be in the batch w.p. $\lambda$ (independently from the other batches and the other data elements). The advantage of this technique is that it allows good privacy upper bounds due to full independence, but the downside is that it is not very practical, both in terms of implementation constraints and in terms of accuracy. So, in practice, it is common to choose the batches using some type of shuffling, but the problem is that shuffling induces dependencies between the elements and the batches, which makes it harder to analyze, resulting in worse upper bounds compared to Poisson subsampling.

In this paper, the authors analyze a sampling technique called “random allocation” (parametrized by $k$), where each data element appears in exactly $k$ batches (chosen randomly out of the total $t$ batches). This technique induces partial independence (because the locations of the data elements are independent of each other), so it can be seen as a compromise strategy between Poisson sampling and shuffling. From the utility perspective, the authors mention in lines 67-71 the work of Chua et al. [2024a], which shows that in the context of training neural networks via DP-SGD, random allocation with $k=1$ has utility as good as shuffling, which is noticeably better than Poisson sampling.
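
To make the contrast concrete, here is a minimal sketch (ours, not code from the paper or the review; all parameter values are illustrative) of how a single element's participation pattern is drawn under each of the three schemes:

```python
import numpy as np

rng = np.random.default_rng(0)
t = 100     # number of steps/batches
lam = 0.05  # Poisson sampling rate
k = 5       # participations under random allocation

# Poisson subsampling: join each batch independently w.p. lam,
# so the number of participations is Binomial(t, lam).
poisson_mask = rng.random(t) < lam

# Random k-out-of-t allocation: exactly k batches, chosen uniformly
# without replacement.
alloc_mask = np.zeros(t, dtype=bool)
alloc_mask[rng.choice(t, size=k, replace=False)] = True

# Shuffling (the k = 1 special case with n = t elements): a random
# permutation assigns each element to exactly one step.
perm = rng.permutation(t)  # perm[i] = step in which element i is used

print(poisson_mask.sum())  # random, ~ t * lam on average
print(alloc_mask.sum())    # always exactly k
print(perm[0])             # the single step for element 0
```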

The main result of this work is to prove that, in terms of privacy, random allocation is upper-bounded by Poisson sampling upper bounds (up to low-order terms), so the message of this paper is that this sampling essentially enjoys the utility benefits of shuffling while also enjoying the privacy benefits of Poisson sampling.

Similarly to prior works, this paper analyzes the privacy of random allocation using a dominating pair induced by a randomizer and compares the privacy of random allocation with that of Poisson when the mechanism is the simple (non-adaptive) randomizer. They also provide general direct upper bounds (i.e., without using Poisson as a baseline), specific upper bounds for the Gaussian mechanism (used in DP-SGD), and present empirical evaluations which show that their upper bounds are better than the ones of the recent work by [Dong et al., 2025].

Strengths and Weaknesses

I think that at a high level, the problem that this paper studies is important, and the contribution is significant. The results also seem to be technically challenging, and indeed, from a brief look at the full version in the supplementary material, the proofs are definitely not straightforward.

I think that the main weakness of this paper is its clarity. Because of missing or unclear definitions, I feel that I don’t fully understand all the models, statements, and implications (see my questions below).

Overall, I think that this paper should be above the acceptance bar of NeurIPS, but because I don’t feel I understand large parts of the details due to clarity issues, I gave it a borderline acceptance score.

Questions

(1) First, it is not clear to me what exactly “shuffling” is in the context of this paper. The authors first mention it in the context of federated data analysis, where, according to my understanding, given $n$ data elements, we have $n$ batches that are shuffled, and each element appears in exactly one of them. But the context of this paper is not about the federated setting, but about amplifying $t$ adaptive steps of a (centralized) algorithm (e.g., DP-SGD) that gets batches with possibly more than 1 element. So, how are the batches exactly defined in shuffling? Do we shuffle all the data elements and then use the first $t \cdot s$ elements (where $t$ denotes the number of batches, and $s$ denotes the batch size)? Do you assume that $t \cdot s = n$ (where $n$ is the total number of elements) or that $t \cdot s \ll n$? Please also explain the shuffling setting in the context of Figure 1, and why there are non-smooth jumps at the large sigma values.

(2) Another natural amplification technique that we can think of is to sample a random fixed-size subset from the data elements in each step. This technique also has partial independence because the batches are independent of each other, but I haven’t seen any reference for that. What do we know about this technique compared to random allocation?

(3) In lines 201-203, given a mechanism $M$ that expects to get multiple elements as input, you define $P_{t,\lambda}(M)$ (Poisson) and $A_{t,k}(M)$ (random allocation). In lines 250-252, you say that these definitions naturally extend to the case that $M$ is a randomizer $R$. But a randomizer doesn't get multiple elements as input, only a single element or the empty set, so I'm not sure I understand what you mean by $P_{t,\lambda}(R), A_{t,\lambda}(R) \colon \{\ast,\bot\}^\ast \rightarrow Y^t$. I interpret $P_{t,\lambda}(R)$ and $A_{t,\lambda}(R)$ as the mechanisms that get a single element $\ast$, choose the locations of that element out of the $t$ (using Poisson or random allocation), execute $R(\ast)$ in all places where the element appears, and execute $R(\bot)$ in all places where it doesn't. But I'm not sure that my interpretation is correct (because you define them as mechanisms that get inputs from $\{\ast,\bot\}^*$), and this is a very crucial point that has to be clearer because all your theorems use these notations.

Minor comments, questions, and typos:

Figure 1: In the description of the y-axis, what is $\omega$? Should it be $\varepsilon$?

Lines 170-172: “It was recently shown that...” Can you elaborate on these works? How do they measure “true privacy”? Do they use (perhaps loose) upper bounds for shuffling, or do they use other (tighter) privacy accounting techniques? I’m trying to understand if the privacy gap from shuffling to Poisson is just between the known upper bounds (and perhaps just because we don’t have the tools for tighter analysis) or if there is an inherent privacy gap in some cases. (This point is not clear to me.)

Random allocation: When you sample $(i_1,\ldots,i_k) \subset [t]$, is it with replacement or without? (i.e., could $i_1$ and $i_2$ be equal?)

Line 255: “number of allocation” -> allocations.

Line 260: “from the fact random…” -> from the fact that random…

Line 277: $\mathcal{A}(R;\ast) = R^{\otimes t}(\ast)$.

Theorem 4.1: $\eta$ is undefined for $\gamma = 1$.

Lines 293-295: Should we think about $\varepsilon'$ as $\gg 1$? Also, I don’t understand the $O(\varepsilon^2)$ estimation.

Limitations

yes

Final Justification

My main concerns about this paper were clarity issues in several places. The authors have answered my questions and will address them in the next version, so I'm increasing my score from 4 to 5.

Formatting Issues

No formatting concerns

Author Response

We thank the reviewer for the comments and questions. We acknowledge that the submitted version of the work is not sufficiently clear in places, partially due to cuts we made to meet the page limit. We have edited the body of the paper to clarify some of the missing parts and extended the discussion in the appendices, and will further incorporate the useful comments we have received into the final version.

Replies to the reviewer's questions:

  1. Comparison to shuffling: By the shuffling scheme we refer to a data sampling method in which the data is first shuffled and then used to run an algorithm, as is common practice in SGD. In this model, the number of steps ($t$) is equal to the number of elements ($n$), the elements are randomly permuted prior to the training run, and at each step $i$ the response is produced by a local mechanism receiving the data element of (permuted) order $i$ (in addition to previous responses). This model was first proposed in [1], who showed that privacy amplification bounds for this model also imply privacy amplification for shuffling the responses of identical local randomizers as done in federated learning. We omitted this explanation due to page limitations and intend to include it in the final version.
  2. Phase transition in shuffling: The privacy guarantee of shuffling is indeed not smooth, and has a sharp transition in the regime where the local privacy $\varepsilon_{0}$ is $\approx 1$, resulting from the fact that the total privacy $\varepsilon$ scales with $(1-e^{-\varepsilon_{0}})e^{\varepsilon_{0}/2}$ [2]. A similar effect is responsible for the (less) sharp transition of Poisson (and consequently random allocation) in the $\sigma \in [0.4, 0.8]$ regime in the same figure.
  3. Sampling without replacement: Using a fixed subset size per step is commonly referred to as "sampling without replacement", and provides privacy guarantees that closely resemble those of Poisson subsampling [3]. This method can be viewed as the counterpart of random allocation, having the advantage of constant-size batches with the disadvantage of a varying number of steps in which each user participates. It was briefly mentioned in the introduction, and we intend to elaborate on it in the final version.
  4. Extension of schemes to a randomizer: The reviewer is correct; this was indeed a typo. The extension to a randomizer holds for input datasets of size $1$, that is, $P_{t, \lambda}(M), A_{t,k}(M) : \{*, \bot\} \rightarrow Y^{t}$. This is the only type of input we need to consider, following the reduction stated in Lemma 3.1. (A toy illustration of this single-element reading appears after this list.)
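
For concreteness, a minimal sketch (ours, not the authors' code) of this single-element reading, with a local Gaussian randomizer standing in for $R$:

```python
import numpy as np

rng = np.random.default_rng(1)

def R(x, sigma=1.0):
    # Local Gaussian randomizer: encode the absent element (bot) as 0, * as 1.
    return rng.normal(loc=0.0 if x is None else 1.0, scale=sigma)

def poisson_of_R(t, lam):
    # P_{t,lam}(R) on the singleton input *: include * in each step w.p. lam.
    return [R("*" if rng.random() < lam else None) for _ in range(t)]

def alloc_of_R(t, k):
    # A_{t,k}(R) on the singleton input *: * appears in exactly k of the t steps.
    steps = set(rng.choice(t, size=k, replace=False).tolist())
    return [R("*" if i in steps else None) for i in range(t)]

print(poisson_of_R(t=10, lam=0.2))
print(alloc_of_R(t=10, k=2))
```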

Minor comments:

  1. Y-axis in Figure 1: The $\omega$ is simply an $\varepsilon$ printed in a vertical direction, which we realize might be confusing.
  2. Lower bounds on shuffle: For example, [4, Figure 1] provides an empirical lower bound on $\varepsilon$ for shuffling which is significantly higher than the corresponding upper bound for Poisson in some parameter regimes. By definition, such a gap cannot be remedied via a tighter upper bound.
  3. Sampling method in random allocation: The steps are sampled without replacement, so that an element can participate in each step only once.
  4. Line 277: While $\mathcal{A}(R,\bot) = R^{\otimes t}(\bot)$, the same does not hold for $\ast$. We are not sure we understood the comment.
  5. Interpretation of $\varepsilon'$: $\varepsilon'$ should be thought of as smaller than 1 but greater than $\varepsilon$ (e.g. $0.1\varepsilon$). In this setting, the privacy profile for $\varepsilon'$ is negligible and we have $\eta \approx (1+2\varepsilon')/t$. Since the privacy parameter of the Poisson scheme grows approximately linearly in $\eta$ for fixed $\delta$, number of steps, and mechanism parameters (e.g. $\sigma$), the privacy parameter of the random allocation scheme is $\varepsilon_{A} = O(\varepsilon_{P} \cdot \eta t) = O(\varepsilon_{P}(1+\varepsilon')) = O(\varepsilon_{P} + \varepsilon_{P}^{2})$, where $\varepsilon_{P}$ is the privacy of the Poisson scheme with $\eta = 1/t$.
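
Collecting the approximations in item 5 into a single display (our transcription of the argument above, under the same assumptions):

```latex
\varepsilon < \varepsilon' < 1, \qquad
\eta \approx \frac{1 + 2\varepsilon'}{t}
\quad\Longrightarrow\quad
\varepsilon_{A} = O(\varepsilon_{P}\,\eta t)
               = O\big(\varepsilon_{P}(1+\varepsilon')\big)
               = O\big(\varepsilon_{P} + \varepsilon_{P}^{2}\big).
```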

[1] Erlingsson, Úlfar, et al. "Amplification by shuffling: From local to central differential privacy via anonymity." Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, 2019.

[2] Feldman, Vitaly, Audra McMillan, and Kunal Talwar. "Hiding among the clones: A simple and nearly optimal analysis of privacy amplification by shuffling." 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 2022.

[3] Balle, Borja, Gilles Barthe, and Marco Gaboardi. "Privacy amplification by subsampling: Tight analyses via couplings and divergences." Advances in neural information processing systems 31 (2018).

[4] Chua, Lynn, et al. "How Private are DP-SGD Implementations?." International Conference on Machine Learning. PMLR, 2024.

Comment

Thank you for your response. I understand that not everything could be included in the main body due to page limitations. Still, everything should be defined and explained in the supplementary material. Because this wasn't the case with your submission, I found it hard to read. Please address all my comments for the next version.

Regarding the small comment in line 277, I was confused and thought that $\mathcal{A}(R,\ast) = R^{\otimes t}(\ast)$, and so I didn't understand why it was not written. Ignore this comment.

Review
Rating: 5

This paper analyzes the "random k-out-of-t allocation" sampling scheme for amplifying differential privacy guarantees. The authors introduce three novel techniques that provide the first efficiently-computable and nearly-tight theoretical privacy guarantees, showing the scheme's privacy is asymptotically equivalent to the well-studied Poisson subsampling method. Numerical results confirm these new bounds are very close to those for Poisson subsampling, meaning random allocation can be used to improve utility with only a minor loss in privacy, helping to bridge the gap between the theory and practice of algorithms like DP-SGD.

Strengths and Weaknesses

Strengths:

  • The paper tackles a crucial issue in DP: the "random allocation" scheme is more practical and utility-friendly than the commonly analyzed Poisson subsampling, but its privacy guarantees were not well understood. This work directly addresses the discrepancy between practical implementations of DP-SGD and their theoretical privacy analyses.
  • The problem of a better utility-privacy tradeoff is very well motivated and of relevance to the broader ML/AI community.
  • Solid theoretical paper.

Weaknesses:

  • While the paper provides some numerical evaluations, more extensive experiments across a wider range of parameters and tasks would strengthen the claims.
  • Furthermore, while the authors do acknowledge some limitations, a deeper and more detailed discussion of their practical implications would be beneficial. For instance, the authors state that their main asymptotic bound may produce "suboptimal bounds" and that the conversion from RDP is "somewhat lossy". A more thorough exploration of how suboptimal or lossy these can be, perhaps with targeted experiments, would make the work more complete and provide clearer guidance for practitioners.

Questions

  1. Can you provide a more concrete threshold or rule of thumb for practitioners to determine when this bound is no longer reliable and they should switch to one of your other analysis techniques, like the Poisson Decomposition or Direct RDP method?

  2. In your analysis of the direct RDP bound, the reduction from a general k > 1 to k = 1 is described as "potentially loose". Could you discuss this in more depth? For example, can you characterize the factors that influence the magnitude of this looseness or how it compares to other sources of imprecision, such as the RDP-to-DP conversion?

  3. Regarding the computational complexity of the direct RDP analysis: what is the practical upper limit on the privacy guarantee that can be reliably calculated with this method?

Limitations

N/A

Final Justification

Overall, I found this paper interesting and worthy of acceptance. I am satisfied with the response provided by the authors and will maintain my positive score.

Formatting Issues

N/A

Author Response

We thank the reviewer for the detailed feedback and questions. Due to page limitations, some of the extended experiments (e.g. Figures 4, 5, and 6) and discussion appear only in the appendix. We intend to use the additional space to clarify the points raised by the reviewer and to extend the discussion in the appendix.

Replies to the reviewer's questions:

  1. Threshold between methods: First, we note that practitioners don't need to determine a priori which method provides the tightest bound, and can instead simply use the best one per configuration (the bold blue line in Figure 1 and the combined line in Figure 2). As a rule of thumb, the recursive bound is tighter than the others when $k/t \ll 1$ and the local privacy parameter $\varepsilon_{0}$ is not too large. We intend to add a detailed discussion of these effects to the final version.
  2. Looseness of the conversion from $k>1$ to $k=1$: Unfortunately, we don't have a tight measure of these effects, but can provide some approximations. The looseness stems from the reduced entropy resulting from the public splitting of the $t$ rounds into $k$ subsets, and to a lesser extent from the rounding of $t/k$, which increases as $k/t \rightarrow 1$. The comparison in Figure 8 to the DCO method, which does not use this reduction, seems to indicate the effect is relatively minor, but a formal analysis will require a lower bound for $k>1$, which we currently lack.
  3. Computational complexity: We note that while the computational complexity scales with $\alpha$, this does not directly imply a bound on $\epsilon$, since the conversion depends on other parameters such as the number of epochs and $\delta$. On the practical level, increased runtime was experienced around $\alpha = 60$, which typically corresponds to low $\epsilon$, where other bounds usually provide tighter guarantees. A more detailed discussion of this effect can be found in Section F.1.
Comment

Thanks for your response to my questions. I am satisfied with the response and maintain my positive score.

Final Decision

The paper looks at a classic problem of sampling in differentially private SGD and proposes a sampling scheme in which each data element appears in exactly k batches. This technique induces partial independence, so it can be seen as a compromise strategy between Poisson sampling and classical shuffling. The authors prove that, in terms of privacy, random allocation is upper-bounded by Poisson sampling upper bounds. So this sampling scheme essentially enjoys the utility benefits of shuffling while also enjoying the privacy benefits of Poisson sampling. Thus, the paper is the first to give a comprehensive theoretical analysis of the k-out-of-t random allocation scheme.

All reviewers agree that the studied problem is interesting and important, and all reviewers were impressed by the novelty of the work and the analysis. The overall positive feedback from the reviewers makes this paper a clear accept for NeurIPS. Congratulations! However, the authors ought to pay attention to the comments regarding style and presentation, and make the minor changes accordingly prior to submitting the final version.