Revisiting differentially private XGBoost: are random decision trees really better than greedy ones?
Abstract
Reviews and Discussion
This paper revisits the problem of training decision trees under the constraint of differential privacy. Prior work showed that random boosting was more effective than greedy approaches due to the extra privacy overhead incurred by using private data to make greedy selections. This paper shows that greedy approaches can be competitive with random boosting through a series of improvements, including better composition using RDP, different utility functions, and Gaussian noise. Specifically, the paper highlights several cases where the greedy method can have an advantage: when features have interactions and when a low number of decision trees is needed. In general, greedy methods are shown to give the best performance in the regime where few trees are used.
Strengths
- It is interesting to showcase settings where spending the additional privacy budget on node selection makes sense.
- Shows a clear trade-off between greedy and random approaches and that greedy methods were underestimated previously.
- Generally clear and well-written (with some typos).
- Experiments on many datasets.
Weaknesses
Novelty
The main components are a combination of previously known approaches. Namely:
- 4.1 uses Maddock et al.'s work
- 4.2 uses the score function of Li et al.
- 4.3 uses the same Gaussian mechanism approach as Maddock et al./Nori et al.
- The composition appears to be very similar to Dong et al.'s work (see below for more details)
The experiments show that the combination of these approaches performs well. However, it needs to be made clear which components contributed to the success. An ablation study would be very insightful to show if, for example, it was simply the composition theorem that caused the greatest improvement or some other component.
If each component is not novel, then the study of all the components together is the contribution of this work. An ablation study would more rigorously study the effect of each component and give better insight into why previous work failed.
Composition Theorem
The relationship to prior work by Dong et al. needs to be clarified.
- Nowhere in the Dong et al. paper is zCDP mentioned (after searching the pdf), so the derivation of the CDP bound plotted in the comparison needs to be justified.
- The proof of Theorem 1 follows the proof of Theorem 5 in Dong et al. extremely closely. Specifically, see the proof of Lemma 7.3 for very similar expressions on page 35. Perhaps the authors can clarify this relationship, as it seems they compared to Corollary 3.1 of Dong et al. in Figure 1 of the submission and not Theorem 5 of Dong et al. (which seems very similar if not identical to the result in this paper).
Evaluation limitations
- Besides the suggested ablation study above, another missing component is varying epsilon. As far as I can tell, all evaluations were conducted at epsilon = 1. I would suspect that with decreasing epsilon, the results would be much different, as the greedy selection would tend to random selection, and then this method would once again be strictly outperformed by random boosting. Furthermore, increasing epsilon may increase the advantage of the greedy method. The privacy-utility trade-off is an important aspect to study, and only studying a single epsilon gives a very limited view.
- This work would benefit from a test for statistical significance. Most results had a high standard deviation; thus, deciding when a win is statistically significant is difficult. A hypothesis test or plotting confidence intervals would help to show more clearly when one technique outperforms another. Further, the ranking is somewhat misleading, as it hides how close the techniques were to each other.
- 1/n is too large for a delta parameter. It technically allows a trivial mechanism that publishes a single record of the database. The general rule of thumb is that delta should be much smaller than 1/n (Dwork & Roth). I don't believe changing the delta would affect the overall conclusions of the paper, so I am willing to let it go, but I wanted to bring attention to it.
- Figure 2: I could not find the number of runs or what the shaded area represents.
Typos
This paper has numerous grammatical errors. I give a small subset of examples below:
- Citations should be wrapped in parentheses to distinguish them from the text.
- Missing the word "the" in many cases (e.g., footnote 3, "the sample Hessian").
- Page 1: "is a well tested" -> "are a well tested".
- Page 3: missing a prime. x and x' are the same after adding or removing.
- Page 3: RDP definition "M_1 satisfy" -> "M_1 satisfies".
- Page 4: "by choosing f_{k+1} greedy at (k + 1)-th boosting rounds".
Questions
Can the authors clarify the novelty of their work? Most importantly, can they clarify the relationship of the composition theorem with Dong et al.'s work?
Thank you for providing detailed comments on our work. We agree that including an ablation study, hypothesis tests, and a utility analysis with varying epsilon would be beneficial.
Weakness: “Nowhere in the Dong et al. paper is zCDP mentioned (after searching the pdf), so the derivation of the CDP bound plotted in the comparison needs to be justified.”
We thank the reviewer for bringing up this citation issue. The relationship between bounded range and zCDP was introduced by Cesar and Rogers in their paper (https://arxiv.org/abs/2004.07223, page 12). The paragraph above their Section 4 states that an ε-bounded-range mechanism satisfies (ε²/8)-zCDP. This is also well documented in the blog article on differentialprivacy.org: https://differentialprivacy.org/exponential-mechanism-bounded-range/
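For concreteness, here is a minimal sketch of how this conversion composes across boosting rounds (our own illustration, not code from the paper; the round count and parameter values are hypothetical):

```python
import math

def br_to_zcdp(eps_br: float) -> float:
    # Cesar & Rogers (2020): an eps-bounded-range mechanism is (eps^2/8)-zCDP.
    return eps_br ** 2 / 8.0

def zcdp_to_approx_dp(rho: float, delta: float) -> float:
    # Standard conversion: rho-zCDP implies (rho + 2*sqrt(rho*ln(1/delta)), delta)-DP.
    return rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta))

# Hypothetical example: T exponential-mechanism selections, each eps_br-BR.
T, eps_br, delta = 100, 0.1, 1e-5
rho_total = T * br_to_zcdp(eps_br)          # zCDP composes additively
print(zcdp_to_approx_dp(rho_total, delta))  # end-to-end epsilon at this delta
```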
We want to point out that in Figure 1, the zCDP curve is constructed based on the result mentioned above, rather than relying on Corollary 3.1 from Dong et al.'s paper.
Weakness: “Figure 2 I could not find the number of runs and what the shaded area represents”
There are 15 runs (5 repetitions across 3 train/test splits, the same as in the experiments on real data), and the shaded region indicates the standard deviation.
Weakness: “1/n is too large for a delta parameter...I don't believe changing the delta would affect the overall conclusions of the paper, so I am willing to let it go, but I wanted to bring attention to it.”
We agree that the choice of delta may not be optimal. The rationale behind our choice of 1/n is to maintain consistency with Maddock et al.'s paper, which we aim to compare with.
Weakness: Typos
We appreciate your thorough review and have carefully proofread our paper, addressing all the typos we identified.
Question: “Can the authors clarify the novelty of their work?"
Our contribution is a new understanding of the relationship between DP random boosting and DP greedy boosting. In contrast to earlier studies, we find that DP greedy boosting can be comparable to DP random boosting. Additionally, we pinpoint two application scenarios where DP greedy boosting is the more favorable choice. We believe our findings will be beneficial to practitioners and provide new insights into the development of DP tree-based boosting algorithms.
Furthermore, we offer a more precise privacy analysis for the bounded-range mechanism using RDP, which may be of independent interest.
Question: “Most importantly, can they clarify the relationship of the composition theorem with Dong et al.'s work?”
We prove our Theorem 1 by bounding the Rényi divergence between two generalized randomized responses. In contrast, Lemma 7.3 in Dong et al.'s paper is derived by bounding the moment generating function of the privacy loss random variable.
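For intuition, in the binary special case the relevant quantity has a standard closed form (an illustration from the RDP literature, e.g., Mironov, 2017, not the generalized statement proved in our paper):

```latex
% Binary eps-randomized response: the true bit is kept with probability
% p = e^eps / (1 + e^eps) and flipped otherwise. Its Renyi divergence is
\[
  D_\alpha\big(M(x) \,\|\, M(x')\big)
  = \frac{1}{\alpha - 1}
    \log\!\Big( p^{\alpha}(1-p)^{1-\alpha} + (1-p)^{\alpha} p^{1-\alpha} \Big),
  \qquad p = \frac{e^{\varepsilon}}{1 + e^{\varepsilon}}.
\]
```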
Furthermore, as demonstrated through numerical analysis, our RDP-based composition analysis provides a more precise approximate-DP guarantee than both Corollary 3.1 and Theorem 5 of Dong et al.'s work. The result is shown in Appendix D.4 of the revised version.
Thanks for the detailed reply. I appreciate the clarification on the composition theorem and relation to related work. I am a little surprised that the z-CDP bound is tighter than theorem 5 of Dong et al. That is not intuitive to me.
"I am a little surprised that the z-CDP bound is tighter than theorem 5 of Dong et al. That is not intuitive to me."
Thanks for bringing this issue to our attention. We identified an error in our implementation of Theorem 5 of Dong et al. and a bug in converting DP to BR. After correcting them, we attach the updated results in Figure 6 of Appendix D.4.
The paper evaluates tree ensembles in the setting of differential privacy. Boosting methods with random splits and greedy splits were studied, and the empirical evaluation demonstrated a new perspective on the applicability and performance of the most popular boosting methods.
Strengths
A thorough review of related work was done. Key components of the proposed boosting algorithm were selected from the best-performing prior-art models with additional modifications, and key properties were described with relevant references.
Weaknesses
It was hard to separate the novel parts of the work from the existing components, and hence to understand exactly what contributes to the reported positive empirical evaluation. Because of this, it was unclear to what extent there is a novel contribution (the significance of the modifications in Equation 3 and Theorem 1) and what part of the result can be attributed to a permutation of already-known techniques.
Questions
Is it possible to perform an ablation study (or are there already such results) by starting exactly with prior art (such as Maddock et al.) and modifying one component at a time, to evaluate the contribution of each proposed modification and to emphasize the specific novel parts of the paper? (Tables 3 and 4 in the appendix refer to the components used when comparing to prior art, and there appear to be multiple modifications at once.)
Thank you for taking the time to review our paper and providing valuable comments.
Weakness: It was hard to separate novel parts of the work and existing components - for understanding exactly what... attributed to a permutation of already known techniques.
We want to emphasize that our novelty lies in challenging a longstanding belief that DP random boosting dominates DP greedy boosting. Without requiring substantial modifications to existing DP greedy boosting algorithms, we demonstrate that DP greedy boosting can attain comparable performance with DP random boosting, or even surpass it in certain instances mentioned in Section 7, through a systematic empirical study. This achievement is made possible by mitigating side effects and enhancing DP accounting methods, which can also be easily adopted by other practitioners.
Question: Is it possible to perform an ablation study (or are there already such results) by starting exactly with prior art (such as the Maddock et. al), and modifying ....to be able to emphasize the specific novel parts in the paper?
We appreciate your suggestion of an ablation study comparing our method to Maddock et al. and will include related experiments in a future version. We have constructed a “Needle-in-a-haystack”-style example under which greedy selection is exponentially better than purely random selection and included it in the revision (Appendix D.3).
Thank you for answering all my questions and for adding an example (D.3) that illustrates your main point. I respond to both of your comments below.
We want to emphasize that our novelty lies in challenging a longstanding belief that DP random boosting dominates DP greedy boosting. Without requiring substantial modifications to existing DP greedy boosting algorithms, we demonstrate that DP greedy boosting can attain comparable performance with DP random boosting, or even surpass it in certain instances mentioned in Section 7, through a systematic empirical study. This achievement is made possible by mitigating side effects and enhancing DP accounting methods, which can also be easily adopted by other practitioners.
I agree that challenging this belief and proving the contrary would provide a significant contribution to the field. My main concern is the validity of the experimental proof, which I will explain below.
When providing an experimental proof, it is difficult to consider all settings, such as
- all external parameters and hyperparameters (one case is the additional suggestion of Reviewer WvUe about varying epsilon, which I agree needs to be addressed, but there may be more settings that affect the results and conclusions)
- and all kinds of datasets.
So it is desirable to have a way of generalizing from the given experiments (a finite sample) to a broader set of situations (infinitely many variations). One way of generalizing is understanding exactly what change made the positive experiments possible (that it was not a random combination of techniques that worked exactly on the given datasets and settings). An ablation study would point out what changes consistently lead to improvements and would be expected to make the experimental proof more sound.
We have constructed a “Needle-in-a-haystack”-style example under which greedy selection is exponentially better than purely random selection and included it in the revision (Appendix D.3).
I would like to argue that this example is not suitable for comparing greedy and random trees, because it poses a setting where random trees are absolutely certain to fail, regardless of what other components are involved (such as DP).
If there is only a small fraction of informative features, and everything else is Gaussian noise (as described in D.3), the random trees will spend most of their power meaninglessly processing that Gaussian noise. Therefore, even if the prior art had been replicated exactly, such as that of Maddock et al., when presented with this dataset, the conclusion would have been that Greedy algorithms are better than Random (but this conclusion is valid only for this kind of dataset).
I am very motivated to rethink the validity of your claims, but they need to be proven in a general setting (such as using the exact datasets and starting settings from Maddock et al. and performing an ablation study), or through a convincing theoretical study that would make experiments unnecessary (if such a claim can be proven in a general setting without conducting experiments).
Additional question
I would like to note that the cited paper "Extremely randomized trees" by Geurts et al. has the following reasoning in its Section 3.3: there is a tendency in some settings for Random Trees, specifically the Totally Random (TR) variant, to outperform Greedy Trees only when the number of trees is large (equivalently, Greedy outperforms Random when the number of trees is small), and it shows an example where the change of behavior happens at around 40 trees. This does not consider the DP setting. This conclusion is in line with the conclusion in Section 8 of the proposed paper.
This behavior is shown in Figure 5 (top left) of the Geurts et al. paper, where a crossover of the accuracy lines happens at 30-50 trees; it is similar to what happens in Figure 3 (left) of the proposed paper.
Therefore, it is additionally necessary to investigate the case of "small number of trees and no differential privacy" to isolate the effect of having a small number of trees on the results on the given datasets. If the behavior is similar (Greedy outperforms Random only in the case of a small number of trees), it will show that it is not the differential privacy setting but the datasets themselves that lead to such behavior (for example, as explained in the previous section of my answer, I expect this to be the case for the dataset from the added Appendix D.3, but other datasets should be checked as well).
We greatly appreciate your interest in our work and thoughtful comments.
Question: I agree that challenging this belief and proving the contrary would provide a significant contribution...to the improvements and it is expected to make the experimental proof more sound.
To begin with, it is important to note that the paper we compare with, Maddock et al., is also an experimental study. Therefore, the use of experimental evidence is both reasonable and credible.
Although it would be intriguing to explore a theoretical comparison of the performance limits of DP greedy boosting and DP random boosting, to the best of our knowledge neither method currently has a worst-case bound. However, through the construction of a "Needle-in-Haystack" example and other simulation results (see Figure 2), we have already demonstrated that the worst-case performance of DP random decision trees cannot surpass that of DP greedy boosting trees.
Question: "I would like to argue that this example is not suitable for comparing greedy and random trees...the conclusion would have been that Greedy algorithms are better than Random (but this conclusion becomes valid only for this kind of dataset)"
In the case of DP greedy boosting, there is a requirement to allocate extra privacy budget for the selection process, which does place constraints on the performance of leaf weight and histogram releasing when compared to DP random boosting. Despite this inherent disadvantage, it is noteworthy that DP greedy boosting continues to outperform DP random boosting in this particular example, underscoring the significance of allocating budget to the selection process.
Additional Question: "I would like to note that the cited paper "Extremely randomized trees" by Geurts et al. has the following reasoning in its Section 3.3: there is a tendency in some .... This behavior is shown in Figure 5 (top left) of the Geurts et al. paper, where a crossover of the accuracy lines happens at 30-50 trees; it is similar to what happens in Figure 3 (left) of the proposed paper."
We appreciate you highlighting these findings. However, it's important to note that Geurts et al. did not explore gradient boosting decision trees in their study. The results presented in Figure 5, which you referred to, are a comparison between Tree bagging and Extra trees, neither of which is gradient boosting. Therefore, the outcomes from Geurts et al.'s research cannot be regarded as a direct comparison between Random and Greedy boosting, even in non-private settings.
Additional Question: "Therefore it is additionally necessary to investigate the case of "small number of trees and no differential privacy" to isolate the effect of having small number of trees on the result on given datasets... I expect this to be the case for dataset from the added appendix D.3, but other datasets should be checked as well).”
It's already well known that in the non-private setting, greedy boosting tree methods, such as XGBoost, can achieve desirable performance with a significantly smaller number of trees, while random boosting trees need many more trees.
The number of trees needed to achieve good performance is also influenced by the inherent structure of the dataset. This is precisely why we conducted our comprehensive study, which encompassed the analysis of 18 different datasets.
Finally, our conclusion that "DP greedy boosting needs a smaller number of trees than DP random boosting to perform well" is grounded in the experimental results from all datasets considered in our paper, including those examined in Maddock et al.'s paper, rather than relying solely on the dataset in Appendix D.3.
Thank you for a detailed response, I agree with your line of reasoning with the following clarification.
Although it would be intriguing to explore a theoretical comparison of the performance limits of DP greedy boosting and DP random boosting, to the best of our knowledge neither method currently has a worst-case bound. However, through the construction of a "Needle-in-Haystack" example and other simulation results (see Figure 2), we have already demonstrated that the worst-case performance of DP random decision trees cannot surpass that of DP greedy boosting trees.
I agree with the intention to demonstrate a worst-case bound, such as by providing the "Needle-in-Haystack" example in Appendix D.3; my only concern was about identifying the reasons this example becomes such a worst case. The specific concern was whether, in this example, greedy tree ensembles (in general) are expected to always outperform random tree ensembles regardless of the remaining model settings, which may make it difficult to reason from such an example about the performance of the remaining parts of the model related to DP, which are of primary interest.
it's important to note that Geurts et al. did not explore gradient boosting decision trees in their study. The results presented in Figure 5, which you referred to, are a comparison between Tree bagging and Extra trees, neither of which is gradient boosting. Therefore, the outcomes from Geurts et al.'s research cannot be regarded as a direct comparison between Random and Greedy boosting, even in non-private settings.
Thank you for pointing this out; my intention was to decompose the comparison of models involving Greedy trees with models involving Random trees proposed in this paper into the individual components contributing to such a comparison (similar to the ablation study proposed in my initial review of this paper).
The example from Geurts et al. was intended as a demonstration of a comparison between an ensemble of greedily constructed trees and an ensemble of totally random ones, even though the ensembling method was different (bagging versus boosting).
This paper explores methods for differentially private decision trees. It introduces a method for greedy decision trees and, contra prior work, argues that greedy methods can be competitive with random methods.
The paper performs experiments on a number of data sets and goes into some detail analyzing in which settings different methods perform better.
Strengths
Decision trees are a cornerstone of practical data analysis. The best DP versions of these methods are far from settled. I expect there is serious room for improvement and this work helps to fill some gaps. The question of greedy versus random methods is quite important.
The paper bridges theory and practice, aiming to bring strong analytical tools to improve practical algorithms.
I appreciate the level of experimental analysis. Asking and answering questions like "which method is best with few trees?" is valuable to practitioners.
Weaknesses
I feel there are three major issues that leave me with low confidence in the results. I must vote for rejection. I heartily encourage the authors to address these issues and resubmit to a different venue.
First, the algorithm(s), privacy claims, and privacy proofs are not formally stated. This is a cause for serious concern, especially since one of the claimed contributions is the use of better privacy accounting methods. The reader cannot hope to replicate the experiments or verify that the algorithm is private.
Second: one of the central citations is to Maddock et al., who investigate private decision trees in a federated setting, so the data is decentralized. This submission does not appear to discuss this aspect of algorithm design. I am unsure if the proposed algorithm works in a federated setting and, if not, why the work of Maddock et al. is useful as the primary experimental comparison.
Third, one of the ways the paper claims to improve over the work of Grislain and Gonzalvez (Sarus-XGB) and Li et al. (DP-Boost) is through "improved privacy accounting." However, these prior approaches use pure DP, a significantly stronger privacy guarantee than that of the methods described here (which, as I gather from Section 6, is approximate DP). Switching to weaker privacy guarantees enables better composition analysis, but doesn't come for free. Without this discussion, readers may be misled.
Questions
Can you give a complete description of the algorithm(s) you propose?
What privacy guarantee(s) do they satisfy?
Do these algorithms work in the federated setting that Maddock et al. consider?
Thank you for taking the time to review our paper and providing thoughtful comments.
Question: "Can you give a complete description of the algorithm(s) you propose?"
We have added a description of our algorithm in Appendix D.5 of the revised version.
Question: "What privacy guarantee(s) do they satisfy? "
Our algorithm satisfies the (, )-approximated DP and also a slightly cleaner zCDP guarantee. The reason why we use approximate DP rather than pure-DP is as follow:
- The Gaussian mechanism (used for releasing histograms and leaf weights) does not satisfy pure DP (see the sketch after this list).
- Even if we added Laplace noise instead, we would get pure DP; but since we need to leverage advanced composition for privacy-utility tradeoffs, the overall privacy guarantee can no longer be pure DP with meaningful parameters.
- Approximate DP can still provide a meaningful privacy guarantee if δ is chosen appropriately (e.g., δ ≪ 1/n), and zCDP avoids having to choose δ altogether and is widely accepted.
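As a minimal sketch of the first point (our own illustration with hypothetical parameter values, not code from the paper), the Gaussian mechanism yields a zCDP guarantee that converts to (ε, δ)-DP for any δ > 0 but to no finite pure-DP guarantee:

```python
import math
import numpy as np

def gaussian_release(value: float, sigma: float,
                     rng: np.random.Generator = np.random.default_rng()) -> float:
    # Releases value + N(0, sigma^2); with L2-sensitivity `sens`, this
    # satisfies rho-zCDP for rho = sens^2 / (2 * sigma^2).
    return value + rng.normal(0.0, sigma)

def gaussian_zcdp(sens: float, sigma: float) -> float:
    return sens ** 2 / (2.0 * sigma ** 2)

rho = gaussian_zcdp(sens=1.0, sigma=4.0)
for delta in (1e-3, 1e-6, 1e-9):
    eps = rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta))
    print(f"delta={delta:g}: eps={eps:.3f}")  # eps grows without bound as delta -> 0
```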
Weakness: "Third, one of the ways the paper claims to improve over the work of Grislain and Gonzalvez (Sarus-XGB) and Li et al. (DP-Boost) is through "improved privacy accounting." However, these prior approaches use pure DP, a significantly stronger privacy guarantee than that of the methods described here (which, as I gather from Section 6, is approximate DP). Switching to weaker privacy guarantees enables better composition analysis, but doesn't come for free. Without this discussion, readers may be misled."
It is true that the work of Grislain and Gonzalvez (Sarus-XGB) and Li et al. (DP-Boost) uses pure DP, but we are not comparing against the end-to-end privacy bound they get with pure DP; that would indeed be unfair. We compare against their zCDP guarantee, which already improves their privacy-utility tradeoff significantly because zCDP composes better than pure DP.
In general, I think that for problems requiring a lot of composition of basic mechanisms, relaxations such as zCDP and RDP are almost always preferable to pure DP in the interest of permitting more utility. Asymptotically, there is a Central Limit Theorem saying that the composition of k pure-DP mechanisms converges, as k → ∞, to the privacy profile of a Gaussian mechanism no matter what, so zCDP is also a more accurate quantification of the privacy guarantee than the composed pure DP.
Question: "Do these algorithms work in the federated setting that Maddock et al. consider?"
The answer is yes. In Maddock et al.'s paper, the framework assumes that each client possesses only one data item, and their algorithms depend on the aggregated gradient information across all clients, as outlined in Appendix C of their paper. Thus, most of their results are directly applicable in the central learning setting without changes. We focused on the centralized setting; the reason we include Maddock et al. as a reference and baseline is that it is the best existing method applicable to the centralized setting.
This paper challenges the widely held observation that, under the differential privacy (DP) constraint, random boosting trees outperform greedy boosting trees. It argues that the greedy trees in previous studies were not optimized, and proposes to further optimize them with several simple tweaks, including (i) applying Maddock et al. (2022) to node splitting, (ii) a regularized weighted scoring function, and (iii) applying the Gaussian mechanism instead of the Laplacian to release noised data. In experiments, they show the improved DP greedy boosting tree performs better than its random counterpart on simulated data, but performs worse on real-world data sets in most cases.
Strengths
- The proposed research question is interesting and significant.
- Overall, the paper is well-presented.
Weaknesses
[1] I find the point of this paper quite confusing: it challenges an observation, but its experimental results (Fig 3 & Tab 1) largely support the observation -- just like all previous studies.
[2] While the intuitive analysis of why a greedy boosting tree may perform worse looks interesting, there is no formal or theoretical justification for it. Theorem 1 is only loosely connected to the comparison between greedy and random trees.
[3] The technical and theoretical novelty of the paper is limited.
[4] Several mathematical statements are not clear, e.g.,
-- In Definition 2, please clarify the definition of D_α and of neighboring sets.
-- In Lemma 1, please clarify the definition of RR.
-- In the proof of Lemma 1, please elaborate on how the inequality in (5) is derived using the data processing inequality.
Questions
From the current presentation, it is not clear whether this paper is the first to challenge the comparison between DP random boosting trees and DP greedy boosting trees. It is also unclear what the relation is between this study and all the previous studies that improve DP greedy boosting trees while using random boosting trees as the baseline. Could the authors clarify these points?
Thank you for writing the review.
Weakness 1: “I find the point of this paper quite confusing: it challenges an observation, but its experimental results (Fig 3 & Tab 1) largely support the observation -- just like all previous studies.”
Actually, Figure 3 supports our claim that DP greedy boosting outperforms DP random boosting when the number of trees is limited. Furthermore, Table 1 provides evidence that DP random boosting is only marginally superior to DP greedy boosting.
Our point is to challenge the belief that DP Random boosting surpasses DP Greedy boosting. Both results you've mentioned support our argument in this regard.
Weakness 2: “While the intuitive analysis of why a greedy boosting tree may perform worse looks interesting, there is no formal or theoretical justification for it. Theorem 1 is only loosely connected to the comparison between greedy and random trees.”
Existing utility guarantees for the exponential mechanism (McSherry and Talwar, 2007) apply to each selection step. We agree that an end-to-end utility analysis for both DP boosting algorithms would be interesting, though to the best of our knowledge no such guarantees exist for gradient-boosting-style learners. We have constructed “Needle-in-a-haystack”-style examples under which greedy selection is better than purely random selection and included them in the revision (Appendix D.3). The example shown in the simulation illustrates this point as well (Figure 2). The reason random selection works well in practice might be coincidental: the datasets these methods are evaluated on have abundant informative features and large margins, so even if one keeps generating random splits, it hits the mark often enough for the method to work.
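To make this concrete, here is a minimal sketch of such a dataset (our own illustration; the dimensions and construction are hypothetical, not the exact Appendix D.3 setup). With a single informative feature among many noise features, a uniformly random candidate split is informative with probability only 1/d, while a greedy scorer finds the informative feature immediately:

```python
import numpy as np

def needle_in_haystack(n=2000, d=200, seed=0):
    # d - 1 pure-noise features; the label depends only on feature 0.
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    y = (X[:, 0] > 0).astype(int)
    return X, y

X, y = needle_in_haystack()
# A random split on a uniformly chosen feature is informative w.p. 1/d;
# over m random splits, P(never touching feature 0) = (1 - 1/d)^m.
d, m = 200, 50
print((1 - 1 / d) ** m)  # ~0.78: random trees mostly fit noise here
```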
However, we want to point out that Theorem 1 concerns the RDP accounting of bounded-range mechanisms, such as the exponential mechanism employed in DP greedy boosting. DP random boosting, on the other hand, relies on random selection and has no connection to Theorem 1.
Weakness 3: “The technical and theoretical novelty of the paper is limited.”
The main focus of our paper is not introducing technical or theoretical innovations. Instead, our objective is to challenge a long-standing belief in DP boosting-tree research, namely that DP random boosting surpasses DP greedy boosting. Our findings are meaningful for practitioners and contribute new insights to the design of DP boosting-tree algorithms by demonstrating that these two methods are, in fact, comparable in most cases. Moreover, we highlight instances where greedy boosting remains effective while DP random trees fail.
In addition, we improve the privacy analysis for the composition of bounded range mechanisms through RDP analysis, which is of independent interest to the DP community.
Weakness 4: “Several mathematical statements are not clear, e.g., in Definition 2, please clarify the definition of D_α and of neighboring sets… please elaborate on how to derive the inequality in (5) using the data processing inequality.”
The definition of D_α (Rényi divergence) can be found in Definition 2, the definition of the neighboring dataset was provided immediately before Definition 2, and the reference for RR (randomized response) was given just above Lemma 1. All of these were included in the first submission. We have made the reference for RR more visible and added a reference for the data processing inequality in the revision.
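For reference, the standard definition in question (as stated, e.g., in Mironov, 2017) is:

```latex
% Renyi divergence of order alpha > 1 between distributions P and Q:
\[
  D_\alpha(P \,\|\, Q)
  = \frac{1}{\alpha - 1}
    \log \, \mathbb{E}_{x \sim Q}\!\left[\left(\frac{P(x)}{Q(x)}\right)^{\alpha}\right].
\]
% A mechanism M is (alpha, eps)-RDP if D_alpha(M(D) || M(D')) <= eps
% for all neighboring datasets D and D'.
```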
Question: “From the current presentation, it is not clear whether this paper is the first to challenge the comparison between DP random boosting trees and DP greedy boosting trees.”
Our main contribution lies in the discovery of a novel finding: DP greedy boosting, enhanced by modern DP accounting techniques and with all side effects eliminated, can stand on par with DP random boosting and even outperform it in specific application contexts.
This finding contradicts the conventional notion in previous research, which consistently favored DP Random boosting by a substantial margin over DP Greedy boosting. For further details, please refer to our contributions summary in Section 1 and the literature review in Section 2.
Question: “It is also unclear what is the relation between this study and all previous studies that improve DP greedy boosting tree while using random boosting tree as baseline. Could authors clarify these points?”
Can you please specify the aspect that is causing confusion? We have provided an extensive literature review in both Section 2 and Appendix B.3.