Locally Private and Robust Multi-Armed Bandits
We show an interesting interplay between local differential privacy and Huber contamination in MABs.
Abstract
Reviews and Discussion
The study examines the interaction between local differential privacy (LDP) and robustness to Huber corruption and heavy-tailed rewards in multi-armed bandits (MABs). It focuses on two scenarios: LDP-then-Corruption (LTC) and Corruption-then-LDP (CTL). The authors provide a tight characterization of mean estimation error and minimax regret in both scenarios, supported by simulations. The findings indicate that LTC results in worse performance than CTL. Additionally, the study offers the first performance bounds for scenarios with corruption both before and after LDP, and corrects previous errors in regret bounds for locally private, heavy-tailed online MABs.
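The LTC-vs-CTL separation summarized above can be illustrated with a toy simulation (our own construction, not the paper's exact setup): rewards are binarized and privatized via one-bit randomized response, and a Huber adversary replaces an alpha-fraction of values with 1 either after privatization (LTC) or before it (CTL). Because debiasing divides by the factor 2p - 1 < 1, post-privatization corruption is amplified and the LTC error comes out larger.

```python
import numpy as np

def privatize(bits, eps, rng):
    # One-bit randomized response: keep the bit w.p. e^eps/(e^eps+1), else flip.
    p = np.exp(eps) / (np.exp(eps) + 1)
    flip = rng.random(bits.shape) > p
    return np.where(flip, 1 - bits, bits)

def debias(reports, eps):
    # Invert E[report] = (2p-1)*mu + (1-p) to recover a mean estimate.
    p = np.exp(eps) / (np.exp(eps) + 1)
    return (reports.mean() - (1 - p)) / (2 * p - 1)

rng = np.random.default_rng(1)
n, mu, eps, alpha = 500_000, 0.3, 0.5, 0.05
bits = (rng.random(n) < mu).astype(int)  # Bernoulli(mu) rewards
m = int(alpha * n)

# CTL: corrupt an alpha-fraction of raw rewards (set to 1), then privatize.
ctl_bits = bits.copy()
ctl_bits[:m] = 1
err_ctl = abs(debias(privatize(ctl_bits, eps, rng), eps) - mu)

# LTC: privatize first, then corrupt an alpha-fraction of the reports.
ltc_reports = privatize(bits, eps, rng)
ltc_reports[:m] = 1
err_ltc = abs(debias(ltc_reports, eps) - mu)

print(f"CTL error: {err_ctl:.3f}, LTC error: {err_ltc:.3f}")
```

In this toy setting the CTL error is roughly alpha, while the LTC error is roughly alpha / (2p - 1), consistent with the qualitative "LTC is harder" message (the exact rates in the paper are more refined).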
Strengths
This model considering both local differential privacy (LDP) and robustness to Huber corruption is new, and the authors provide a unified algorithm for online MAB with a better minimax upper bound than previous works. They also establish the minimax lower bound for the first time.
Weaknesses
The MAB algorithms still involve hyper-parameter tuning.
Questions
Q1: You mention that the upper bound in Tao's paper is incorrect, can you further point out which step in their proof is wrong?
Q2: What is the novelty of your mean estimation in Algorithm 1?
Limitations
For online MAB, the paper only considers the UCB structure; also considering an extension to Thompson Sampling would make the story more complete.
Technical flaw in Tao et al. 2022. We have provided a detailed discussion that points out the incorrect step in their upper bound proof (see the general response above).
Novelty of Algorithm 1. We remark on the novelty (or insight) of Algorithm 1 from the perspectives of algorithmic design and analysis, respectively.
- Algorithmic design: We use randomized response for privacy rather than the Laplace mechanism. This is due to the key observation that under corruption and LDP, the Laplace mechanism no longer gives the optimal rate, highlighting the interesting interplay of corruption and LDP.
- Analysis: One key novelty in our analysis is that Algorithm 1 can adaptively give optimal rates for all three different scenarios (LTC, CTL, and C-LDP-C) for bounded data.
Hyper-parameter tuning. Yes, our MAB algorithms need to determine the optimistic or pessimistic radius, which requires us to know , , , and some constant . We first note that even in non-private, non-corrupted cases, the radius also requires tuning with respect to in practical implementations. Regarding other parameters (in particular ), we tend to believe that it is necessary to be known in advance to guarantee optimal rates. Finally, to achieve the optimal rate in CTL, one has to carefully choose the right radius, while C-LDP-C (corruption-LDP-corruption) can be the same as LTC.
Other exploration strategies. In this paper, we use UCB as an example. Since the key step in other variants (like Thompson sampling) also relies on a tight mean estimation, we believe one can generalize our results to other exploration strategies.
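The pluggability mentioned above can be sketched with a generic UCB skeleton: the mean estimate and confidence radius are the standard non-private ones for [0, 1] rewards, serving as stand-ins for a private/robust estimator and radius. The variable names and the two Bernoulli arms below are our own illustration, not the paper's algorithm.

```python
import math
import random

def ucb(arms, horizon):
    """Generic UCB loop over a list of reward-sampling callables.

    The bonus sqrt(2 log T / n) is the classical radius for [0, 1] rewards;
    a private/robust variant would swap in a corruption-aware mean estimate
    and a wider (optimistic) radius.
    """
    k = len(arms)
    counts = [0] * k
    means = [0.0] * k
    for t in range(horizon):
        if t < k:
            a = t  # initialization: pull each arm once
        else:
            a = max(range(k),
                    key=lambda i: means[i] + math.sqrt(2 * math.log(horizon) / counts[i]))
        r = arms[a]()
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]  # incremental mean update
    return counts

random.seed(0)
arms = [lambda: float(random.random() < 0.9),  # optimal arm, mean 0.9
        lambda: float(random.random() < 0.1)]  # sub-optimal arm, mean 0.1
counts = ucb(arms, horizon=5000)
```

With this large gap, the sub-optimal arm is pulled only a small (logarithmic) number of times, which is exactly the quantity bounded in the analyses discussed in the general response.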
Thanks for your rebuttal and I will keep my score.
Thanks again for your comments and positive evaluation of our paper!
This paper studies multi-armed bandits where the feedback is a locally differentially private and corrupted version of the true rewards. The main message is that the order in which the rewards are (1) corrupted and (2) made private (i.e., (1) and then (2) or (2) and then (1)) changes the achievable regret rates, which they sharply characterize with new matching lower and upper bounds.
Strengths
- New and sharp upper and lower regret bounds for this locally private and corrupted bandit setting.
- Tight characterization of private and corrupt mean estimation error.
- There is expansive coverage of prior works on DP bandits, which helps place this contribution in the context of the literature.
Weaknesses
- I do think the writing is repetitive at times, and the paper could be better served by simplifying exposition and moving focus to key intuitions (for instance, Propositions 1 and 2 seem redundant given Theorem 1). Additionally, there are many settings and problems (LTC, CTL, C-LDP-C), online and offline MABs, so a summary table or glossary could be helpful to get the high level sense of the results.
- It is a bit difficult to get a sense of the novelties over prior works (Tao et al., 2022; Wu et al., 2023). For instance, the local DP model is stronger than central DP, but what is the new technical difficulty handled in this setting? Also, there is not much discussion of what the mistake of the previous state of the art (Tao et al., 2022) is, other than that it contradicts the results of this submission. It would be easier for a quick reader to gain confidence if the exact mistake in the analysis could be elaborated. Because of the elaborate setting of this paper (i.e., local privacy, corruption, heavy-tailed rewards, ordering of corruption vs. privacy), and since various slices or weaker versions of this specific setting have been studied in other works, it is difficult to tell whether the paper is just a combinatorial composition of prior approaches or whether some important technical obstacle was truly overcome to obtain the results.
- Although I understand this is mostly a theory paper, I think it would also be good to have some practical discussion of when LTC or CTL might appear in application and what kind of implications the "LTC is harder" message has. Likewise, what are application scenarios for LTC or CTL?
Questions
Please see weaknesses above.
Limitations
No broader impact concern.
We thank the reviewer for your time and review. Below is our response to the comments.
Presentation. Thanks for the suggestion on the presentation. We will try to incorporate it in the final version, e.g., a summary table.
Practical scenarios of LTC and CTL. Yes, we have already briefly discussed them in Appendix C.1.
Important contributions compared to previous works. Our result cannot be obtained by directly following previous works.
- Even without corruption, the state-of-the-art result is ungrounded. We discuss it further above (see the general response), which highlights the incorrect step in the upper bound proof of Tao et al. 2022. This complements the points we have made from the lower bound perspective.
- In contrast to Wu et al. 2023's setting of corruption plus central DP, where the standard Laplace mechanism works, we have shown that the Laplace mechanism gives only a sub-optimal rate in the setting of local DP plus corruption. See Remark 4.
Thank you for your response and clarifying the mistake in the previous work. My concerns are addressed and I raise my score.
This paper considers the heavy-tailed online and offline MAB problem with local differential privacy and Huber corruption, where the CTL, LTC, and C-LDP-C models are considered. By first providing tight high-probability concentration bounds for the mean-estimation problem under the CTL and LTC settings, the authors offer new upper bound results for online and offline MAB. Lower bound results are also established to demonstrate the tightness of the upper bound results.
Strengths
The presentation of results in this paper is clear, and the comments are detailed. Related works are well discussed. Several novel results are established that generalize the previous LDP heavy-tailed estimation results well, including:
- New high-probability bounds for LDP heavy-tailed and corrupted mean estimation problems.
- An interesting separation result between the CTL and LTC settings, with detailed explanations.
- New results for offline and online MAB.
Weaknesses
Several claims are not so clear to me and may need further explanation and clarification, including:
- In the regret bound for online MAB, it seems that the upper bound result in Proposition 4 cannot directly imply the result claimed in Theorem 2. I take the CTL result as an example, and the LTC result has the same problem: In Theorem 2, the upper bound reads as . In Proposition 4, there is a factor that balances the trade-off . In particular, when , it seems that Proposition 4 can no longer recover Theorem 2?
- In the offline MAB result, while the authors claimed that the upper bound result almost matches the proposed lower bound, it seems that the dependency on is not optimal when comparing the lower bound term with the upper bound result ?
Questions
1. For the questions regarding the claims of the theorems, see Weaknesses 1 and 2.
2. As discussed in C.3, the authors state that the previous SOTA heavy-tailed LDP MAB paper [1] has a technical flaw. The authors have demonstrated this claim by showing that the upper bound result in [1] can even break the lower bound result proved in this paper. I am wondering: besides this evidence, can the authors explicitly point out the technical flaw in the proof of the result in [1]?
[1] Youming Tao, Yulian Wu, Peng Zhao, and Di Wang. Optimal rates of (locally) differentially private heavy-tailed multi-armed bandits. In International Conference on Artificial Intelligence and Statistics, pp. 1546–1574. PMLR, 2022.
Limitations
N.A.
We thank the reviewer for your time and review. We will provide our response below.
Online MAB. In this paper, as in previous work (e.g., Wu et al. 2023), we treat as a constant (i.e., it does not scale with ). Thus, the third term in the big-O of Proposition 4 is a lower-order term as , which gives us Theorem 2.
Offline MAB. There is a typo in Proposition 5, i.e., the exponent term of is missing. Our proof in Appendix J indeed has the right one, which matches the upper bound in Proposition 6. The updated lower bounds in Proposition 5 are shown below:
(i) LTC:
(ii) CTL:
Technical flaw in Tao et al. 2022. Please see our general response above.
Thank you for your response, I will keep my score positive.
Thanks again for your comments and positive evaluation of our paper!
Technical flaw in Tao et al. 2022
We thank all the reviewers for their time and insightful comments. Since all reviewers would like to see more discussion of the technical flaw in Tao et al. 2022 from the perspective of the upper bound proof, we provide a general response below.
The key flaw in their upper bound proof is the incorrect bound on the number of pulls for each sub-optimal arm (in the proof of their Lemma 13).
This step happens on Page 24 of their paper (see "Lastly, for any fixed sub-optimal arm a..."). It should be , rather than . After fixing this, one gets the correct bound on the number of pulls for each sub-optimal arm as (focus on key terms of and only) , rather than the wrong one in their proof, which has (i.e., missing the square).
In fact, translating this using our notation (i.e., ), the correct bound above becomes , which is exactly the first term in Eq. (9) of our paper.
Thus, following the same proof step in our paper (ignore all corruption terms), one can fix their proof and obtain the correct upper bound.
Remark. We remark that in Tao et al. 2022: (i) the authors did not provide a detailed proof for the problem-independent (minimax) bound; they only provided the proof for the problem-dependent one (which is also ungrounded, as discussed above) and, based on this, claimed to obtain the problem-independent one with standard techniques; (ii) Tao et al. 2022 follow an arm-elimination approach while our analysis is based on UCB. Nevertheless, it is well known that in both approaches, the key step is to upper bound the number of pulls for each sub-optimal arm.
The paper provides a novel characterization of locally differentially private and robust multi-armed bandits under Huber corruption. It also fixes a technical flaw in previous works. All reviewers agree that the paper is interesting and makes a decent contribution to the field.