PaperHub
Rating: 5.5/10
Poster · 4 reviewers (scores: 3, 3, 2, 4; min 2, max 4, std 0.7)
ICML 2025

Revisiting Differentially Private Algorithms for Decentralized Online Learning

OpenReview · PDF
Submitted: 2025-01-23 · Updated: 2025-08-15

Abstract

Although the differential privacy (DP) of decentralized online learning has garnered considerable attention recently, existing algorithms are unsatisfactory due to their inability to achieve $(\epsilon, 0)$-DP over all $T$ rounds, recover the optimal regret in the non-private case, and maintain lightweight computation under complex constraints. To address these issues, we first propose a new decentralized online learning algorithm satisfying $(\epsilon, 0)$-DP over $T$ rounds, and show that it can achieve $\widetilde{O}(n(\rho^{-1/4}+\epsilon^{-1}\rho^{1/4})\sqrt{T})$ and $\widetilde{O}(n (\rho^{-1/2}+\epsilon^{-1}))$ regret bounds for convex and strongly convex functions respectively, where $n$ is the number of local learners and $\rho$ is the spectral gap of the communication matrix. As long as $\epsilon=\Omega(\sqrt{\rho})$, these bounds nearly match existing lower bounds in the non-private case, which implies that $(\epsilon, 0)$-DP of decentralized online learning may be ensured nearly for free. Our key idea is to design a block-decoupled accelerated gossip strategy that can be combined with the classical tree-based private aggregation, and also enjoys a faster average consensus among local learners. Furthermore, we develop a projection-free variant of our algorithm to keep the efficiency under complex constraints. As a trade-off, the above regret bounds degrade to $\widetilde{O}(n(T^{3/4}+\epsilon^{-1}T^{1/4}))$ and $\widetilde{O}(n(T^{2/3}+\epsilon^{-1}))$ respectively, which are nevertheless better than those of the existing private centralized projection-free online algorithm.
Keywords
Decentralized Online Learning · Differential Privacy

Reviews and Discussion

Review 1 (Score: 3)

Two types of algorithms for differentially private online learning (PD-FTGL and PD-OCG) and their theoretical regret bounds are presented. While the conventional D-FTGL (Wan et al., 2024) employs a standard gossip step, the proposed PD-FTGL effectively exploits the accelerated gossip step (Liu & Morse, 2011), which plays a crucial role in achieving a regret bound closer to the lower bound of D-OCO. Additionally, PD-OCG utilizes the CG method (Frank & Wolfe, 1956; Jaggi, 2013) to eliminate the implicit projection operation, making it a more practical algorithm while also successfully improving the regret bound. This paper makes a solid theoretical contribution. However, no experimental or empirical validation is provided, even in the Appendix.

Questions for Authors

I have no further questions.

Claims and Evidence

The proposed algorithms (PD-FTGL and PD-OCG) have been theoretically analyzed in terms of regret bounds, and improvements over related works can be confirmed (Sec. 3.2). I believe the theoretical contributions are sufficient.

On the other hand, it is common to experimentally demonstrate how the regret bound improvements manifest empirically through benchmark tests using convex loss functions. However, no such empirical evidence is provided, even in the Appendix.

Methods and Evaluation Criteria

The proposed method is well-suited for DP-OCO, and the theoretical evaluation criteria (regret bound analysis) are appropriate. However, one would generally expect benchmark experiments with convex loss functions that demonstrate how the theoretical regret improvements manifest empirically; no such experiments are included in the paper.

Theoretical Claims

The regret bound of PD-FTGL in Algorithm 1 is presented in Theorem 3.6, with its proof provided in Section 4 and Appendices B, C, and D. I focused my review on verifying this, which appears to be correct.

Experimental Design and Analysis

No experiments were conducted, including in the Appendix. I believe that demonstrating how the theoretical regret bound improvements manifest empirically through benchmark tests based on convex functions would have made the paper more compelling. This absence has been a factor in my relatively lower score.

Supplementary Material

I specifically reviewed Appendices B, C, and D, as they are related to the proof of the regret bound for PD-FTGL, which is presented in Theorem 3.6 and corresponds to Algorithm 1. I also examined Appendix A, which provides a survey of additional related works.

Relation to Prior Work

This study focuses on D-OCO, and a thorough review of related work has been conducted, including Appendix A. The paper further advances the SOTA method, D-FTGL [Wan et al., 2024], demonstrating improved results.

Missing Essential References

The survey of prior works is highly comprehensive, and the comparison with them regarding regret bounds is sufficiently detailed.

Other Strengths and Weaknesses

Nothing in particular.

Other Comments or Suggestions

The order of the regret bounds appears frequently throughout the paper. To make the theoretical contributions clearer, how about summarizing the results, including those from related works, in a table?

Author Response

Many thanks for the constructive reviews!


Q1: No experiments were conducted … demonstrating how the theoretical regret bound improvements manifest empirically ... would have made the paper more compelling. This absence has been a factor in my relatively lower score.

A1: We agree on the significance of empirical evaluations, and have conducted experiments to verify the performance of our algorithms during the rebuttal period. Since the absence of experiments affected your evaluation of our paper, we hope you will consider raising the score if this concern has been addressed.

Specifically, we conducted experiments on the problem of decentralized online multiclass classification, where the feasible set consists of all matrices with bounded trace norm, and adopt a complete graph with $n=9$ nodes as the communication network (see Section 4.1 of Zhang et al. (2017) for a more detailed setup of this task). We set the bound on the trace norm to 10. The dataset is letter from the LIBSVM repository [1], which contains $15000$ examples and $26$ classes. Moreover, the dataset is reused $n\times 10$ times and then evenly distributed among the $n$ local learners, which implies that $T=150000$.
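For concreteness, the communication setup above can be sketched as follows. This is a minimal illustration, assuming a uniform doubly stochastic weighting of the complete graph and taking the spectral gap as $\rho = 1-\sigma_2(P)$; the paper's exact weighting is not reproduced in this thread.

```python
import numpy as np

n = 9  # number of local learners

# Uniform doubly stochastic gossip matrix for a complete graph (an assumed
# weighting; any symmetric doubly stochastic choice would serve the same role).
P = np.full((n, n), 1.0 / n)

# Spectral gap rho = 1 - (second largest singular value of P). For this
# uniform complete-graph matrix, all non-leading singular values are 0, so
# rho = 1, the most favorable topology for consensus.
sigma = np.linalg.svd(P, compute_uv=False)
rho = 1.0 - sigma[1]
print(f"spectral gap rho = {rho:.3f}")
```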

We compare our PD-FTGL and PD-OCG against Algorithm 1 in Li et al. (2018), which is called PD-OGD. Note that the original PD-OGD can only achieve $(\epsilon, 0)$-DP for each round, instead of all rounds. To make a fair comparison, we increase the scale of the noise added to PD-OGD so that it achieves $(\epsilon, 0)$-DP over all rounds. Figures 1 and 2 in the link https://anonymous.4open.science/r/ICML2025-8764/Experiments@v2.pdf present our experimental results. Specifically, in Figures 1(a) and 1(b), we compare the average loss of these algorithms under $\epsilon=10$, and their corresponding runtime, respectively. In Figures 2(a) and 2(b), we respectively report the average loss of our PD-FTGL and PD-OCG under different privacy levels, i.e., $\epsilon = 10, 5$, and $2.5$.

From these results, we have the following main observations.

  1. From Figure 1(a), both our PD-FTGL and PD-OCG have a much lower average loss than PD-OGD (Li et al., 2018). This is because PD-OGD requires a much larger noise scale to achieve the same privacy level. Moreover, the average loss of our PD-OCG is worse than that of our PD-FTGL, which is consistent with our theoretical results.

  2. From Figure 1(b), the runtime of PD-OGD is much longer than that of our PD-FTGL. This is because PD-OGD performs a projection operation once per round, whereas our PD-FTGL only does so once per block. Moreover, our PD-OCG requires a shorter runtime than our PD-FTGL, which verifies the benefit of the projection-free property.

  3. From Figure 2, in terms of the average loss, both our PD-FTGL and PD-OCG perform better as $\epsilon$ increases. It is worth noting that the average loss of PD-OCG varies less with different $\epsilon$, which implies that the privacy level has a smaller impact on PD-OCG. This is also consistent with the theoretical results, since the privacy-related term in the regret of PD-OCG and PD-FTGL for convex functions is $\widetilde{O}(n\epsilon^{-1}T^{1/4})$ and $\widetilde{O}(n\epsilon^{-1}\rho^{1/4}\sqrt{T})$, respectively.

[1] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2011.


Q2: … how about summarizing the results … in a table form?

A2: Thanks for your helpful suggestion. In the revised version, we will summarize these theoretical results in a table.

Reviewer Comment

I have read the rebuttal and appreciate the additional numerical results provided by the authors. The new experimental results help confirm the effectiveness of the proposed methods. However, the scope of the experiments remains limited in terms of comparison methods, datasets, and hyperparameters (e.g., the number of nodes). Thus, I will keep my score.

Author Comment

Dear Reviewer J5zj,

Many thanks for your responses. In the last few days, we have tried our best to conduct more experiments, and hope these new results could fully address your concern about experiments.

Specifically, we first verify the performance of our algorithms with different numbers of nodes, i.e., $n=9, 16, 25$, still using the letter dataset and the complete graph. Second, we fix the number of nodes at $n=9$, and further try two additional types of graphs, i.e., the Watts-Strogatz (abbr. WS) and cycle graphs (see Section 4.1 of Zhang et al. (2017) for details about these graphs). Note that among these graphs, the complete graph has the highest $\rho$ and the cycle graph has the lowest $\rho$. Finally, we introduce a new dataset, i.e., poker, from the LIBSVM repository, which contains $1000000$ examples and $10$ classes. Except for a minor change that this dataset is reused $n$ times, instead of $n\times 10$ times, we follow the experimental setup of the letter dataset.

Figures 3, 4, and 5 in the link https://anonymous.4open.science/r/Additional-Experiment-for-April-2025-6CBE/Additional_Experiments.pdf present our new results, and we have the following new observations.

  1. From Figure 3, as the number of nodes increases, the average loss of our algorithms becomes slightly worse, which is consistent with the dependence of our regret bounds on $n$.

  2. From Figures 4(a) and 4(b), for graphs with larger $\rho$, the average loss of PD-FTGL becomes worse when $\epsilon=10$, but better when $\epsilon=1000$. These opposite effects are due to the fact that the $\widetilde{O}(n(\rho^{-1/4}+\epsilon^{-1}\rho^{1/4})\sqrt{T})$ regret bound of PD-FTGL is dominated by $\widetilde{O}(n\epsilon^{-1}\rho^{1/4}\sqrt{T})$ for $\epsilon=10$ and by $\widetilde{O}(n\rho^{-1/4}\sqrt{T})$ for $\epsilon=1000$ (see Corollary 3.7 for details; a short calculation of this crossover is given after this list).

  3. From Figures 4(c) and 4(d), the average loss of PD-OCG is almost unaffected by the graph topology for both $\epsilon=10$ and $\epsilon=1000$, which is reasonable because the regret bound of PD-OCG does not depend on $\rho$ (see Corollary 3.10 for details).

  4. From Figure 5, our PD-FTGL and PD-OCG outperform PD-OGD (Li et al., 2018) in terms of both effectiveness and efficiency on the new dataset poker. The detailed comparison is similar to our previous discussions about Figure 1 in the link https://anonymous.4open.science/r/ICML2025-8764/Experiments@v2.pdf.
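To make the opposite effects in observation 2 explicit, note that the two terms of the PD-FTGL bound move in opposite directions in $\rho$ and balance exactly at $\epsilon=\sqrt{\rho}$, consistent with the condition $\epsilon=\Omega(\sqrt{\rho})$ stated in the abstract:

```latex
\underbrace{n\rho^{-1/4}\sqrt{T}}_{\text{decreasing in }\rho}
\,+\,
\underbrace{n\epsilon^{-1}\rho^{1/4}\sqrt{T}}_{\text{increasing in }\rho},
\qquad
n\rho^{-1/4}\sqrt{T} = n\epsilon^{-1}\rho^{1/4}\sqrt{T}
\;\Longleftrightarrow\;
\epsilon = \sqrt{\rho}.
```

Hence a larger $\rho$ hurts when the privacy term dominates (smaller $\epsilon$) and helps when the network term dominates (larger $\epsilon$).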

Best,

Authors

Review 2 (Score: 3)

This paper explores the problem of decentralized online learning in a private setting. Previous works fail to achieve $(\epsilon, 0)$-DP over all $T$ rounds and do not attain an optimal regret bound comparable to the non-private setting. This paper introduces a blocking update mechanism, an accelerated gossip strategy, and a tree aggregation mechanism, which together enable nearly matching regret bounds in the private setting while ensuring $(\epsilon, 0)$-DP over all $T$ rounds. Additionally, the paper leverages the classical conditional gradient method to develop a projection-free algorithm, which reduces computational costs but slightly degrades the regret bound.

Questions for Authors

  1. Line 247: Why is $\bar{\mathbf{d}}$ mentioned here? Please give more explanation.
  2. Algorithm 1 (Lines 10-11):
  • Line 10: The local learner updates its model.
  • Line 11: Does this step indicate that communication with neighbors occurs within the same block update? Could you clarify the update process?

Claims and Evidence

Yes.

Methods and Evaluation Criteria

Yes.

Theoretical Claims

I reviewed the main framework of the proof but did not go through the step-by-step details.

Experimental Design and Analysis

No experiments were performed.

Supplementary Material

Yes, the related work section and the main proof framework.

Relation to Prior Work

N/A

Missing Essential References

There is a lack of discussion on prior work related to DP-FTRL.

Other Strengths and Weaknesses

Strengths:

  • The paper is well-written and structured.
  • It presents a novel result in DP decentralized online learning, achieving $(\epsilon, 0)$-DP over all $T$ rounds with a regret bound close to the non-private case.
  • It addresses practical computational challenges by introducing a projection-free algorithm to reduce computational costs.

Weaknesses:

  • Privacy analysis clarity: The paper does not clearly explain how the private term is bounded or how sensitivity is calculated.
  • Novelty in decentralized online learning: While I am familiar with differential privacy, I am less familiar with decentralized online learning. Since the proposed approach utilizes existing techniques (blocking update mechanism, accelerated gossip strategy, and tree aggregation mechanism), I am unsure about the novelty of its contributions beyond theoretical analysis.
  • Experimental validation is lacking.

Other Comments or Suggestions

See above.

Author Response

Many thanks for the constructive reviews! We hope the reviewer could reevaluate our paper, and we are happy to answer further questions during the reviewer-author discussion period.


Q1: No experimental results ...

A1: Please check our response to Q1 of Reviewer J5zj, which presents experimental results to verify the performance of our algorithms (or check the link https://anonymous.4open.science/r/ICML2025-8764/Experiments@v2.pdf for a quick browse).


Q2: ... discussion on ... DP-FTRL.

A2: First, we want to clarify that the original work of DP-FTRL (Kairouz et al., 2021) and its latest results in a realizable regime (Asi et al., 2023a) have been cited. From lines 146 to 157, we have provided detailed discussions about Kairouz et al. (2021), and will introduce more details about Asi et al. (2023a) in the revised version. Moreover, we are also willing to cite and discuss more works related to DP-FTRL if the reviewer could suggest some concrete references.


Q3: Privacy analysis clarity …

A3: We apologize for this confusion, and will provide more explanations in the revised version. First, the privacy term (we assume you mean the noise added at each node of the binary tree) is analyzed and bounded in lines 403 to 412, and the resulting bound is utilized in Eq. (13), which is critical for establishing our desired regret bounds.

Moreover, we want to clarify that the detailed analysis of the privacy guarantees is provided in Appendix D due to the page limit. As an important part of the analysis, the derivation of the sensitivity for each node in the binary tree is carried out in lines 790 to 818, which finally establishes an $\ell_1$-sensitivity of $6n^{-1}\sqrt{d}(G+\alpha R)$.
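For readers unfamiliar with the mechanism, the following is a minimal, self-contained sketch of the classical binary-tree private aggregation (Dwork et al., 2010; Chan et al., 2011), not the authors' exact implementation; the parameter `l1_sens` would be instantiated with the $6n^{-1}\sqrt{d}(G+\alpha R)$ bound derived in Appendix D.

```python
import numpy as np

rng = np.random.default_rng(0)

def tree_private_prefix_sums(increments, eps, l1_sens):
    """Binary-tree aggregation releasing (eps, 0)-DP prefix sums.

    increments: (T, d) array; each row is one update with l1 norm <= l1_sens.
    Every increment falls into at most `depth` dyadic tree nodes, so Laplace
    noise of scale l1_sens * depth / eps per node gives (eps, 0)-DP for all
    T released prefix sums by basic composition across tree levels.
    """
    T, d = increments.shape
    depth = int(np.ceil(np.log2(max(T, 2)))) + 1
    scale = l1_sens * depth / eps
    cache = {}  # each dyadic node is noised exactly once, then reused

    def node(lo, hi):
        if (lo, hi) not in cache:
            cache[(lo, hi)] = (increments[lo:hi].sum(axis=0)
                               + rng.laplace(scale=scale, size=d))
        return cache[(lo, hi)]

    def prefix(t):
        # cover [0, t) by O(log T) disjoint dyadic intervals
        total, lo = np.zeros(d), 0
        while lo < t:
            span = 1
            while lo % (2 * span) == 0 and lo + 2 * span <= t:
                span *= 2
            total += node(lo, lo + span)
            lo += span
        return total

    return np.stack([prefix(t) for t in range(1, T + 1)])
```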


Q4: Novelty in decentralized online learning …

A4: We want to emphasize that there are genuine technical novelties in our algorithm and analysis, rather than a simple combination of existing techniques.

  1. The closest existing algorithm to ours is the non-private AD-FTGL in Wan et al. (2024b), which has already utilized the blocking update and accelerated gossip. Our algorithm can be viewed as a private extension of AD-FTGL based on the tree-based technique. However, AD-FTGL originally does not satisfy the requirement of applying the tree-based technique due to the block-coupled communication (see Remark below our Algorithm 2 for detailed discussions). To address this issue, our algorithm for the first time adopts block-decoupled communication.

  2. When deriving the privacy guarantees, according to the previous study (Ene et al., 2021), it is natural to bound the sensitivity of the blocking update by the block size. However, this will lead to worse regret bounds under the same privacy level. By contrast, we provide a novel analysis to show that the sensitivity is independent of the block size (see Appendix D), which allows us to recover the non-private regret bounds over a wide range of $\epsilon$.


Q5: Line 247: Why is $\bar{\mathbf{d}}$ mentioned here?

A5: We apologize for this confusion, and will provide more explanation as you suggested. Actually, following Wan et al. (2024b), the role of $\mathbf{z}_i(t+1)$ in Eq. (4) is to approximate $\sum_{\tau=1}^{t}\bar{\mathbf{d}}(\tau)$, where $\bar{\mathbf{d}}(t)= \frac{1}{n}\sum_{i=1}^n (\nabla f_{t,i}(\mathbf{x}_i(t))-\alpha \mathbf{x}_i(t))$. However, as discussed below Eq. (4), $\mathbf{z}_i(t+1)$ cannot be directly rewritten as a partial sum. To address this issue, our key idea is to approximate each $\bar{\mathbf{d}}(t)$ separately, and then compute the sum of these local approximations. Moreover, to satisfy the communication protocol, i.e., only one communication per round, we introduce a blocking update mechanism, and thus need to approximate the block-level quantity $\bar{\mathbf{d}}(z)$ introduced in line 247.


Q6: Algorithm 1 (Lines 10-11) …

A6: We apologize for this confusion, and will provide more explanations as you suggested. Specifically, we want to clarify that the $n$ local learners actually implement lines 6 to 15 of our Algorithm 1 in a synchronously parallel way, even though these steps are contained in a for-loop over the learners.

Based on this parallel strategy, in line 10, we update the variable $\mathbf{d}_i(z)$ for block $z$ and learner $i$, which yields $\mathbf{d}_i(z)= \sum_{t\in\mathcal{T}_{z}}(\nabla f_{t,i}(\mathbf{x}_i(z)) - \alpha \mathbf{x}_i(z))$ at the end of block $z$; this quantity is then utilized as the initialization of the accelerated gossip strategy implemented during block $z+1$ (see line 13 in Algorithm 1).

For this reason, in line 11, we only update $\mathbf{d}^{k+1}_i(z-1)$ by Eq. (5), instead of updating $\mathbf{d}^{k+1}_i(z)$. Moreover, it is worth noting again that the update in line 11 occurs synchronously among all local learners, and the computation of the term $\sum_{j=1}^n P_{ij}\mathbf{d}^{k}_j(z-1)$ in Eq. (5) relies on local communication between these learners.
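Since Eq. (5) itself is not reproduced in this thread, the following sketch uses a common Chebyshev/heavy-ball form of accelerated gossip (in the spirit of Liu & Morse, 2011) only to illustrate the synchronized, communication-local nature of line 11; the recursion and the momentum weight `eta` are stand-ins, not the paper's exact update.

```python
import numpy as np

def accelerated_gossip_block(P, D0, K, eta):
    """K synchronized accelerated gossip steps on stacked local vectors.

    P:   (n, n) doubly stochastic communication matrix.
    D0:  (n, d) matrix whose i-th row is learner i's vector d_i(z-1).
    eta: momentum weight (a stand-in; typically tuned from the spectral gap).
    """
    D_prev, D_cur = D0.copy(), D0.copy()
    for _ in range(K):
        # P @ D_cur is the only communication: its i-th row is
        # sum_j P_ij d_j(z-1); the momentum term is computed locally.
        D_next = (1.0 + eta) * (P @ D_cur) - eta * D_prev
        D_prev, D_cur = D_cur, D_next
    return D_cur  # rows contract toward the average of D0's rows
```

Because $P$ is doubly stochastic, the row-average of the iterates is preserved at every step, so all learners contract toward the same global average at an accelerated rate.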

Reviewer Comment

Thank you for the detailed response. I am satisfied with the response, and I have updated my score.

Author Comment

Dear Reviewer dtv5,

Many thanks for your kind responses! We will improve our paper according to your constructive comments.

Best,

Authors

Review 3 (Score: 2)

This paper revisits the regret bounds for differentially private decentralized online learning. Based on the block-decoupled accelerated gossip strategy, the authors achieve a refined upper bound of $O(n\rho^{1/4}\epsilon^{-1}\sqrt{T})$ under $(\epsilon,0)$-DP. The theoretical results are derived under certain assumptions.

Questions for Authors

See the weaknesses part.

Claims and Evidence

Yes.

Methods and Evaluation Criteria

There is no empirical evaluation on benchmark datasets. The proposed methods are evaluated only through mathematical proofs.

Theoretical Claims

I have not had the opportunity to verify the proof carefully, but the results appear intuitively reasonable.

Experimental Design and Analysis

No experiments.

Supplementary Material

No.

Relation to Prior Work

No.

Missing Essential References

  1. Regarding the references for MIA (membership inference attacks), the authors cited He et al., which is not the original literature. Please incorporate the original work [A].

[A] Membership Inference Attacks against Machine Learning Models. Shokri et al., 2016.

Other Strengths and Weaknesses

Strengths:

The paper is well-written and easy to follow. The theoretical results presented are intuitive and clearly motivated.

Weaknesses:

  1. The paper does not provide an empirical comparison with existing online learning algorithms. Given that the theoretical results rely on restrictive assumptions—which are often challenging to verify in practice—an empirical evaluation against existing methods would significantly strengthen the work.

  2. The comparison with existing literature on differentially private online learning is inadequate. The authors briefly state: "Finally, we notice that many recent efforts have been devoted to improve the regret of private OCO algorithms (Agarwal & Singh, 2017; Agarwal et al., 2023; 2024; Asi et al., 2023a;b; Jain et al., 2023; Hilal et al., 2024), but additional assumptions on functions, decision sets, or the adversary are required."

However, a detailed comparison with these works, particularly the most recent publications from 2023 and 2024, is necessary to clarify the novelty and position the paper within the current research landscape.

  3. The motivation and approach of this paper appear to be primarily derived from Wan et al. (2024). The main contribution seems to be extending the non-private results of Wan et al. (2024) to the privacy-preserving regime, which is incremental. If there are additional, more substantial contributions, the authors should clearly specify them.

  4. The authors argue that the derived upper bound matches the lower bound in the non-private regime ($\epsilon=\infty$). However, it is critical for the paper to establish similarly matching bounds under differential privacy guarantees. Without matching the lower bound in the privacy-preserving regime, the theoretical results may seem incomplete or insufficiently self-contained.

Other Comments or Suggestions

No.

Author Response

Many thanks for the constructive reviews! We hope the reviewer could reevaluate our paper, and we are happy to answer further questions during the reviewer-author discussion period.


Q1: ... the theoretical results rely on restrictive assumptions ... an empirical evaluation against existing methods would significantly strengthen the work.

A1: First, we agree with the significance of empirical evaluations, and have conducted experiments to verify the performance of our algorithms during the rebuttal period. Please check our response to Q1 of Reviewer J5zj, which introduces the detailed results (or check the link https://anonymous.4open.science/r/ICML2025-8764/Experiments@v2.pdf for a quick browse).

Moreover, we want to clarify that our assumptions for the theoretical results are actually not restrictive, because they have been commonly utilized in previous studies and can be easily satisfied in practice, e.g., in our experiments.


Q2: ... comparison with existing literature on differentially private online learning is inadequate ...

A2: We will provide more detailed discussions of those briefly reviewed works. Here, we want to emphasize that the main overlap between our work and previous works on private online learning is the tree-based technique, which has been clearly discussed in our introduction of Jain et al. (2012), Smith & Thakurta (2013), Kairouz et al. (2021), and Ene et al. (2021).

Although those briefly reviewed works provide some new results for private online learning, it is unclear whether their results can be extended to the decentralized setting considered in our paper. Therefore, we believe that those works do not affect the novelty and significance of our results.


Q3: ... The main contribution seems to be extending the non-private results of Wan et al. (2024) ... which is incremental ...

A3: We agree that our algorithm can be viewed as a private extension of AD-FTGL in Wan et al. (2024b), but want to emphasize that both the algorithm and analysis are highly non-trivial, as explained below.

  1. The key idea of our extension is to combine AD-FTGL with the tree-based technique. However, AD-FTGL originally does not satisfy the requirement of applying the tree-based technique due to the block-coupled communication (see Remark below our Algorithm 2 for detailed discussions). To address this issue, our algorithm for the first time adopts block-decoupled communication.

  2. When deriving the privacy guarantees, according to the previous study (Ene et al., 2021), it is natural to bound the sensitivity of the blocking update by the block size. However, this will lead to worse regret bounds under the same privacy level. By contrast, we provide a novel analysis to show that the sensitivity is independent of the block size (see Appendix D), which allows us to recover the non-private regret bounds over a wide range of $\epsilon$.

Moreover, it is also worth noting that our paper is the first to achieve $(\epsilon, 0)$-differential privacy for decentralized online learning (DOL), while also improving the regret bound and the efficiency (for handling complex constraints) of existing private DOL algorithms.


Q4: … it is critical for the paper to establish similar matching bounds under differential privacy guarantees …

A4: We agree with the significance of matching lower bounds under differential privacy guarantees. However, even for online convex optimization with differential privacy, which has been extensively studied, such lower bounds are still lacking. Thus, we would like to leave the problem of lower bounds as future work, and hope the reviewer could reevaluate our paper based on our significant improvements over existing private decentralized online learning algorithms.


Q5: Regarding the references for MIA …

A5: Thanks for this suggestion. We will cite the work of Shokri et al. (2016) in the revised version.

Review 4 (Score: 4)

This paper proposes differentially private decentralized online learning algorithms that achieve regret bounds nearly matching those in the non-private setting. The algorithms can be adapted to be projection-free using Frank-Wolfe methods, resulting in a degradation of the regret bounds. However, these bounds still outperform existing private centralized projection-free online algorithms.

Questions for Authors

N/A

Claims and Evidence

Theoretical evidence is provided to support the claims of improved regret bounds for the proposed algorithms.

Methods and Evaluation Criteria

Yes, for the following reasons:

  • Theoretical evaluation metrics, like regret bounds, are widely used to assess the theoretical performance of online learning algorithms.

  • The regret bounds were compared to existing ones under a consistent set of standard assumptions, including a bounded constraint set, Lipschitz continuity of the gradient, and (strong) convexity of the objective function.

Theoretical Claims

No

Experimental Design and Analysis

No

Supplementary Material

No

Relation to Prior Work

The theoretical findings advance the development of privacy-preserving decentralized algorithms for online learning problems.

Missing Essential References

Essential references are included.

Other Strengths and Weaknesses

1. Strengths

  • The regret bounds for the proposed algorithms under a privacy budget are derived for strongly convex and convex decentralized online learning, based on standard assumptions.
  • These regret bounds outperform existing ones and closely approach the optimal regret bounds in the non-private setting.

  • Additionally, the results for projection-based algorithms are extended to projection-free variants using Frank-Wolfe methods, which yield worse regret bounds.

2. Weaknesses

  • Similar to the projection operator on the constraint set $K$, Frank-Wolfe methods on $K$ can be just as challenging to solve. Therefore, it is inaccurate to claim that Algorithm 2 is a projection-free algorithm.

Other Comments or Suggestions

N/A

Author Response

Many thanks for the constructive reviews!


Q1: ... Frank-Wolfe methods on $K$ can be just as challenging to solve as the projection operator ... it is inaccurate to claim that ... is a projection-free algorithm.

A1: There may be some misunderstanding about our projection-free algorithm. We agree that if the number of linear optimization steps is not limited, an invocation of the Frank-Wolfe (FW) method (i.e., CG in our paper) can be as challenging as performing the projection operation. However, in our Algorithm 3, the number of linear optimization steps is limited to $L$ for each invocation of FW, and FW is only invoked $O(T/L)$ times.

Therefore, the total number of linear optimization steps required by our Algorithm 3 is $O(L\cdot T/L)=O(T)$, and it can be more efficient than our Algorithm 1, which requires almost $O(T)$ projection operations. Moreover, following previous studies, especially the formal definition in Hazan & Minasyan (2020), an algorithm is said to be efficiently projection-free if the number of linear optimization steps is $O(1)$ per round on average. Clearly, our Algorithm 3 satisfies this definition, and thus can be claimed to be a projection-free algorithm.
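To make this accounting concrete, here is a minimal sketch of a budgeted conditional gradient routine with $L$ linear optimization steps per invocation; it is not the paper's Algorithm 3, the helper names are illustrative, and the trace-norm ball is taken from the rebuttal experiments because its linear minimization oracle needs only a top singular pair.

```python
import numpy as np

def lmo_trace_norm_ball(G, radius):
    """Linear minimization oracle over {X : ||X||_tr <= radius}:
    argmin_X <G, X> = -radius * u1 v1^T for the top singular pair of G,
    a rank-1 output that requires no projection."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return -radius * np.outer(U[:, 0], Vt[0, :])

def budgeted_cg(grad_fn, X0, radius, L):
    """L conditional gradient (Frank-Wolfe) steps, one LMO call each."""
    X = X0.copy()
    for k in range(L):
        V = lmo_trace_norm_ball(grad_fn(X), radius)
        gamma = 2.0 / (k + 2)  # standard CG step size (Jaggi, 2013)
        X = (1.0 - gamma) * X + gamma * V
    return X
```

Each LMO call needs only the top singular pair, whereas a projection onto the trace-norm ball requires a full SVD; invoked once per block, this yields $O(1)$ linear optimization steps per round on average.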

Final Decision

The paper discusses the setting of DP decentralized online learning. All reviewers agree that the result is new and improves over existing known guarantees. Nevertheless, the reviewers are divided regarding the level of novelty in the paper.

After going over the authors' rebuttal and correspondence with the reviewers, I think that the authors successfully addressed the primary points raised during the review process. I also believe that this paper presents results that are relevant to the ICML community, and therefore recommend (weak) acceptance.