PaperHub
Rating: 5.3/10 (Poster, 4 reviewers; min 3, max 6, std dev 1.3)
Individual ratings: 6, 3, 6, 6
Confidence: 3.3 · Correctness: 2.5 · Contribution: 2.8 · Presentation: 2.3
ICLR 2025

Lipschitz Bandits in Optimal Space

OpenReview · PDF
Submitted: 2024-09-27 · Updated: 2025-02-20

Abstract

This paper considers the Lipschitz bandit problem, where the set of arms is continuous and the expected reward is a Lipschitz function over the arm space. This problem has been extensively studied. Prior algorithms need to store the reward information of all visited arms, leading to significant memory consumption. We address this issue by introducing an algorithm named Log-space Lipschitz bandits (Log-Li), which achieves an optimal (up to logarithmic factors) regret of $\widetilde{O}\left(T^{\frac{d_z+1}{d_z+2}}\right)$ while using only $O\left(\log T\right)$ bits of memory. Additionally, we provide a complexity analysis for this problem, demonstrating that $\Omega\left(\log T\right)$ bits of space are necessary for any algorithm to achieve the optimal regret. We also conduct numerical simulations, and the results show that our new algorithm achieves regret comparable to the state-of-the-art while reducing memory usage by orders of magnitude.
Keywords
Space complexity · Lipschitz bandits

Reviews and Discussion

Official Review
Rating: 6

The paper considers the problem of continuous-armed bandits for a reward function which is Lipschitz over the action space. This setting is already well studied in the literature, and optimal regret guarantees are well known. The novelty of this work lies in proposing an algorithm which is also efficient from the point of view of space complexity. In fact, the authors show both that their algorithm enjoys $O(\log T)$ space complexity (in number of bits) and that this matches the lower bound for the setting.

Strengths

The topic and setting of the paper are very interesting, and the space complexity is indeed a limitation of previous approaches for this setting.

The contribution is valuable, and the idea of the paper could have an impact beyond the bandit literature.

Weaknesses

Although I have really appreciated this work, which I think brings a valuable contribution to the bandit community, I am also convinced that it is a little too immature for publication at this conference.

I have reached this conclusion after reading many passages of the paper which, in my opinion, need to be deepened. For example,

  1. In Theorem 1, the result seems to hold with a probability of $1-T^{-5}$. I was rather surprised about this, as most of the other regret bounds in this field fix a given $\delta>0$ and show that the regret is bounded by some function of $T$ multiplied by $\sqrt{\log(1/\delta)}$. At first, I thought this was because the authors were using some technique which goes beyond the usual concentration bounds used in the bandit literature. But in fact, looking at the proof of the result, it seems to me that the proof could be adapted to hold with high probability, as usual. Even if the result is correct, putting the statement in this form could affect the possibility of using this result for future work, and create confusion.
  2. The paper assumes that all arm distributions are Gaussian, even though this assumption is never used except at line 263, where in fact only sub-Gaussianity is used. Again, introducing this (very restrictive) assumption without a reason creates unnecessary confusion for anybody needing to use the results of this paper.
  3. Line 277, "By similar argument to Lemma F.1 in Sinclair et al. (2020) and Lemma 1 in Lu et al. (2019), taking a union bound over $C \in A_h^m$ and $1 \leq h \leq m \leq B_{\mathrm{stop}}$ finishes the proof", is not sufficiently precise to be used in a proof.
  4. The proof of the space complexity bound seems to ignore the fact that the algorithm also has (according to line 192) to allocate the subcubes and to remember which of them have already been visited. The way the pseudocode is written, this seems to take order $2^d$ bits, and it is not clear if it can be done in less space.

In the end, I am not against the acceptance of this work, but I wonder whether it is in the interest of the authors to publish it like this, or whether it is better to clarify and improve the mentioned passages.

Questions

  1. Indeed, to achieve a regret bound of order $T^\alpha$ for any bandit setting, one has to be able to allocate numbers with a precision of at least $T^{\alpha-1}$, as a larger misspecification would compromise the regret bound. This already takes $O(\log T)$ bits. Does it make the lower bound for the space complexity trivial?
Comment

Thank you for communicating these questions to us. We will clarify them in the following and improve our presentation in the camera-ready version.

Response to W1

Thank you for pointing this out. We will adapt the results and proof to hold with high probability in our future versions.

Response to W2

Thank you for pointing this out. We do not require the Gaussian assumption; sub-Gaussian conditions are sufficient to meet our requirements. The confusion was caused by a writing error, which we will correct in future versions.

Response to W3

Thank you for pointing this out. We will add more explanations of this point in future versions to make the proof clearer.

Response to W4

The way we wrote our pseudocode is intended to help readers clearly understand how we partition our set. In practice, our algorithm doesn't need to pre-allocate the subcubes but only needs to traverse them one by one. Since the maximum number of subcubes we will access is T, we only need to maintain the IDs for these T subcubes.
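The authors' point, that the subcubes can be enumerated lazily rather than pre-allocated, can be sketched as follows. This is a minimal illustration, not the paper's pseudocode; the function name `iterate_subcubes` and its parameters are ours. Only the current index tuple and the cube geometry are live at any time, so the working memory is a few machine words regardless of how many subcubes exist.

```python
import itertools

def iterate_subcubes(origin, parent_edge, split):
    """Yield (sub_origin, sub_edge) for each of split**d subcubes of a
    d-dimensional cube, one at a time, without storing the full list.
    Only the current index tuple is kept in memory."""
    d = len(origin)
    sub_edge = parent_edge / split
    for idx in itertools.product(range(split), repeat=d):  # lazy iterator
        sub_origin = tuple(o + i * sub_edge for o, i in zip(origin, idx))
        yield sub_origin, sub_edge

# Traverse the 4 subcubes of the unit square without pre-allocating them.
count = sum(1 for _ in iterate_subcubes((0.0, 0.0), 1.0, 2))
```

Because the generator recomputes each subcube's coordinates from its integer index, nothing beyond the current index needs to be remembered, which matches the response's claim that no $2^d$-bit allocation table is required.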

Response to Q1

To achieve the desired regret bound, it is indeed necessary to estimate the expected reward with a specified additive error. However, this requirement does not directly imply a space lower bound. Consider the coin problem: for a difference as small as $n^{-0.306}$, there are functions that can amplify this value to a constant using only $O(\log\log n)$ bits [1]. Similarly, in our problem, while precision is important, proving the non-existence of such functions is essential. Our proof takes a different approach by connecting the lower bound with the number of hard instances, which provides a more intuitive understanding and complements our limited-space algorithm.

[1] M. Braverman, S. Garg, and O. Zamir. Tight space complexity of the coin problem. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), pp. 1068–1079. IEEE, 2022.

Comment

I thank the authors for their rebuttal.

The consideration about the coin problem is interesting, but I did not find the result mentioned by the authors in the paper, so I would like to ask them which result it is and how it applies.

Still, about the lower bound of this paper, I will try to clarify my reasoning. Let us say that we have just the easier problem of best arm identification, so that it is sufficient to output $x_T$ such that $|x_T-x_*|$ is small.

Consider the set of reward functions

$$f_a(x)=-|a-x|,\qquad x\in(0,1),$$

where the optimal arm of $f_a(\cdot)$ is $a$. We can see that to guarantee $|x_T-x_*|<T^{\alpha-1}$ in every case (this must hold for $\alpha\in(0,1)$, and, this being a one-dimensional Lipschitz class, precisely with $\alpha=2/3$ to match the lower bound), we need to distinguish at least $T^{1-\alpha}$ instances. This requires at least $\log(T^{1-\alpha})=(1-\alpha)\log(T)$ bits.

Comment

Thank you for your response. The argument regarding the coin problem can be found in Section 2.1 of that article [1]; at the time, I mistakenly thought you meant approximating the reward of keeping an arm.

Based on my understanding, your reasoning for the lower bound aligns with the logic of our lower bound proof. In our proof, we first demonstrate that best arm identification requires at least $\Omega(\log k)$ space to distinguish all the hard instances, and then, based on this result, we show that achieving optimal regret also requires at least this much space.

In selecting the hard instances, we aimed to provide a tighter bound, resulting in the number of our hard instances being $\Omega\left(T^{\frac{d_z}{d_z+2}}\right)$. This allowed us to ultimately obtain a lower bound of $\Omega(\log K)$ instead of $\Omega((1-\alpha)\log K)$, where the latter would decrease as $d_z$ increases.

Comment

Thank you, I have no further questions.

Official Review
Rating: 3

This paper proposes a novel algorithm, "Log-space Lipschitz bandits" (Log-Li), which reduces space complexity from $O(T)$ to $O(\log T)$ while achieving near-optimal regret for the Lipschitz bandit problem.

Strengths

  1. This paper proposed a novel algorithm (Log-Li), which reduces space complexity from $O(T)$ to $O(\log T)$ while achieving near-optimal regret.

  2. It shows that the regret achievable with this minimal memory is near optimal.

  3. The paper includes theoretical proofs for the space and regret bounds of the algorithm, complemented by experimental results demonstrating that Log-Li maintains regret close to other state-of-the-art algorithms but with significantly lower memory usage.

Weaknesses

There are several weaknesses in the current manuscript.

  1. The authors acknowledge that Log-Li's regret is higher and more variable than that of the benchmark algorithm A-BLiN (as shown in Fig. 3) due to memory limitations, requiring repeated exploration of suboptimal regions. This means that Log-Li cannot guarantee the regret performance if the required memory decreases.

  2. The evaluation is too weak to support the claim made in this paper. It only compares with one benchmark algorithm and the performance of the proposed Log-Li is even worse than that of the benchmark algorithm.

  3. The regret bound is highly dependent on the zooming dimension $d_z$. If $d_z$ is large, the regret bound $O\left(T^{\frac{d_z+1}{d_z+2}}\right)$ grows close to linear in $T$, suggesting that the Log-Li algorithm may struggle with higher-dimensional spaces. The analysis would benefit from a more in-depth exploration of how the zooming dimension affects both regret and space efficiency in high-dimensional spaces.

  4. The presentation of the current manuscript is extremely poor and unfriendly to readers. Though this reviewer may misunderstand the main technique and theoretical results, the poor presentation is an outstanding issue. From this reviewer's point of view, it is way below the standard.

Questions

My detailed questions are as follows:

  1. My first question is about the claimed "optimal regret" in this paper. As indicated in Weakness 3, the regret bound is almost linear with a large zooming dimension $d_z$. If this is the optimal upper bound, does it mean that no algorithm can achieve sublinear regret for Lipschitz bandits in high-dimensional spaces?

  2. What is the Doubling Edge-Length Sequence $r_m$? And why do we need to set it as $2^{-m+1}$?

  3. Corollary 1 implies that with $K \geq 10$, the probability of non-optimal cube selection exceeds $\frac{1}{12}$. How would this probability scale as $T$ grows large? Would the required cube size in terms of $K$ or arm space expand as $T$ increases, potentially impacting the algorithm's ability to achieve optimal regret in very high-dimensional spaces?

  4. In Lemma 2, you define the event $E$ to bound the difference between the estimated and true expected reward. However, the analysis assumes that each cube in $A_h^m$ has been played sufficiently to approximate rewards accurately. Could you clarify how this holds in scenarios with very large or complex arm spaces where $|A_h^m|$ is substantial? Would the regret increase under limited play-per-cube situations, and if so, how could this impact your regret bound?

  5. The paper states that Log-Li’s arm elimination rule requires revisiting previously eliminated areas. In practical terms, this leads to redundant exploration of suboptimal regions, increasing regret. Is there an analytical way to quantify the regret incurred from this repeated exploration? Could adding memory to store limited past elimination history potentially reduce these redundancies?

Comment

Thank you for communicating these questions to us. We will clarify them in the following and improve our presentation in the camera-ready version.

Response to W1 & W2

The core of our paper is theoretical analysis. We first prove that the regret of our algorithm, Log-Li, is asymptotically optimal. Based on this, the primary purpose of our experiments is to verify the correctness of our theoretical conclusions, so we focus more on the trend of regret for our algorithm. From our experimental results, it is evident that the regret of our algorithm is indeed asymptotically optimal, which aligns with our expectations. Moreover, our algorithm only uses a constant number of words.

According to our theoretical results, there might be constant differences compared to other algorithms; however, the differences are not significant. Empirically, since we are only concerned with the trend, we have not fine-tuned the parameters of our algorithm. It is possible that by adjusting the parameters, we could achieve better experimental results, but this is not our main focus.

Response to Q1

The regret lower bound is proved to be $R(T) = \Omega\left(T^{\frac{d_z+1}{d_z+2}}\right)$ [1]. This lower bound indicates that any algorithm for the Lipschitz bandit problem will incur at least this amount of regret. Therefore, according to our theoretical analysis, the regret of Log-Li is already asymptotically optimal.

[1] A. Slivkins, “Contextual bandits with similarity information,” J. Mach. Learn. Res., vol. 15, pp. 2533–2568, Jan. 2014.

Response to Q2

We denote $r_m$ as the edge length because it corresponds to the side length of each subcube in our partition. We describe it as doubling since it decays at a rate of $1/2$. We chose this particular edge-length sequence to achieve the desired optimal regret rate in our regret analysis. Due to space constraints, our algorithm inevitably re-explores incorrect directions; using this sequence helps us control the additional regret from these erroneous explorations.
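A minimal sketch of such a sequence, assuming $r_m = 2^{-m+1}$ as written in the reviewer's question (the helper name `edge_lengths` is ours, not the paper's):

```python
def edge_lengths(num_rounds):
    """Doubling edge-length sequence r_m = 2**(1 - m): the subcube side
    length halves with each round (r_1 = 1, r_2 = 1/2, r_3 = 1/4, ...)."""
    return [2.0 ** (1 - m) for m in range(1, num_rounds + 1)]

r = edge_lengths(5)
# Each refinement splits every surviving cube into (r_m / r_{m+1})**d
# children; the geometry alone determines the partition, so nothing about
# earlier rounds has to be stored.
halving = [r[i] / r[i + 1] for i in range(len(r) - 1)]
```

Since the sequence is a fixed function of the round index, an algorithm can recompute $r_m$ on the fly from $m$, in keeping with the logarithmic memory budget.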

Response to Q3

Your first point is incorrect: this probability will not scale as $T$ grows large. Our proof for the lower bound of space complexity does not depend on the size of $T$. As demonstrated in our proof, regardless of the size of $T$, as long as $K>10$ and the space complexity is less than $\frac{1}{2}\log K$, our Corollary 1 holds.

As $T$ increases, our cube size decreases and $K$ increases. You can see this specifically in the construction of our hard instance (line 394). This is because as $T$ increases, the allowed error decreases, requiring us to more precisely characterize the landscape of the reward and thus increasing the space requirement (it increases by $\log T$).

Response to Q4

Your point is incorrect; we do not need each cube to be played sufficiently. In Lemma 2, our estimation range for the true reward is related to the number of times we play each arm, denoted as $n_h$. If the number of plays for a particular cube is small, the estimation range is wider; if the number of plays is large, the range is narrower. Regardless of the number of subcubes, the total number of arm plays is $T$. Therefore, using the union bound, our Lemma 2 always holds. Consequently, the regret bound we obtain under event $E$ is always valid with high probability.
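The play-count-dependent estimation range described above can be illustrated with a standard sub-Gaussian confidence interval. This is a sketch only: the constant `c` and the exact $\log$ factor are illustrative choices, not the paper's values.

```python
import math

def confidence_interval(rewards_sum, n_plays, horizon, c=2.0):
    """Confidence interval for one cube's mean reward, with width
    shrinking as 1/sqrt(n_plays): few plays give a wide interval, many
    plays a narrow one, matching the response's description of Lemma 2.
    The constant c and the log(horizon) factor are illustrative."""
    mean = rewards_sum / n_plays
    width = c * math.sqrt(math.log(horizon) / n_plays)
    return mean - width, mean + width

lo1, hi1 = confidence_interval(5.0, 10, horizon=10_000)    # 10 plays
lo2, hi2 = confidence_interval(50.0, 100, horizon=10_000)  # 100 plays
# Ten times as many plays shrink the interval by a factor of sqrt(10).
```

The point is that validity never requires a minimum number of plays per cube; a rarely played cube simply carries a wider (but still correct) interval.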

Response to Q5

In the theoretical analysis section, we carefully analyzed all the components of regret incurred by Log-Li in each round (line 338). Under our carefully chosen edge sequence, the extra regret incurred is also within the $O\left(T^{\frac{d_z+1}{d_z+2}}\right)$ bound.

Adding memory is definitely helpful because it allows us to retain information about previously explored subcubes, reducing exploration in incorrect directions. However, the main focus of our work is on theoretical analysis, and experimental results are not our primary concern.

Official Review
Rating: 6

This paper introduces a new algorithm for the Lipschitz bandit problem which uses only $O(\log T)$ bits of memory while achieving an optimal regret of $O\left(T^{\frac{d_z+1}{d_z+2}}\right)$. It also shows that $\Omega(\log T)$ bits of space are necessary for any algorithm to achieve optimal regret. Finally, it numerically shows that the proposed algorithm is comparable to an existing method in terms of regret while reducing memory usage significantly.

Strengths

The proposed algorithm, named Log-Li (Log-space Lipschitz), achieves an optimal regret rate in the Lipschitz bandit problem while using a memory of $O(\log T)$ bits. This is a significant improvement compared to other existing methods, which have to store the information of $\mathrm{poly}(T)$ arms. They also showed that using $O(\log T)$ bits is unavoidable if we want to achieve an optimal regret rate. In summary, the Log-Li algorithm is optimal in terms of both regret and space, which is the contribution of this paper.

Weaknesses

  • The only algorithm that is compared to the Log-Li algorithm is the A-BLiN algorithm. Even though other existing methods use $\mathrm{poly}(T)$ bits of memory space, I believe it is still necessary to empirically compare cumulative regret with them.
  • In line 176, the Log-Li algorithm generates $\left(\frac{r_h}{r_{h+1}}\right)^d$ subcubes, which may consume a large amount of memory.
  • The Log-Li algorithm empirically has a larger cumulative regret than an existing method.

[Missing citation]

  • Stefan Magureanu, Richard Combes, and Alexandre Proutiere. Lipschitz Bandits: Regret Lower Bound and Optimal Algorithms. COLT2014.

[Minor comments]

  • The authors should refer to the definition of $d_z$ in the abstract.
  • Periods are missing, and upper- and lower-case letters are mixed inappropriately.
    • P9, Figure 2 → "The landscape of $\mu$."
    • P10, Figure 4 → "Logarithm of the number of words saved by Log-Li and A-BLiN for different $\mu$."
    • Line 178: space between "for" and "each"
    • Lines 99, 208, 380: "theorem" → "Theorem"

Questions

  • Related to the second weakness above, it seems Log-Li uses a large memory space when partitioning the arm set (Line 176). How much memory does the algorithm consume in this procedure? Does it exceed the memory used to store the information of each arm?
  • Isn't the "for loop" in Line 194 computationally heavy? In my understanding, the size of $\mathcal{B}$ is exponential in $d$, and therefore hurts the computational complexity of the algorithm.
Comment

We would like to thank the reviewer for communicating these questions to us. We will clarify them in the following and correct the citation and grammar issues in the camera-ready version.

Response to missing citation & minor comments

Thank you for pointing that out. We will further refine and edit the text.

Response to W1 & W3

The primary focus of our paper is on theoretical analysis. We begin by demonstrating that our algorithm, Log-Li, achieves asymptotically optimal regret. Consequently, the main aim of our experiments is to validate the accuracy of our theoretical findings, which is why we concentrate on observing the trend of regret in our algorithm. Our experimental results clearly show that the regret of our algorithm is asymptotically optimal, confirming our expectations. Furthermore, our algorithm operates using only a constant number of words.

While our theoretical results suggest there may be differences when compared to other algorithms, these differences are not large. Since our empirical focus is on the overall trend, we did not fine-tune the parameters of our algorithm. Although parameter adjustments might yield improved experimental outcomes, this is not our primary concern.

Response to Q1

We present the partitioning in the algorithm this way to make it easier for readers to understand how we perform the partition operation on the set. In the actual algorithm, we only need to iterate through the newly partitioned subcubes one by one, without needing to explicitly store information for each of them. Our algorithm only needs to store information about 3 arms, and therefore it only requires $O(\log T)$ space.

Response to Q2

The "for loop" in Line 194 does not incur significant computational overhead. We explicitly include the "for loop" over all subcubes in our algorithm just to help readers understand our partitioning method. Considering what we do at each time step from 1 to $T$, we simply collect a reward and occasionally update $\tilde{\mu}^m$. These actions do not result in substantial computational overhead.

Comment

Thank you for the reply. I have no further questions.

Official Review
Rating: 6

This paper introduces the Log-space Lipschitz bandits (Log-Li) algorithm to address high memory usage in the Lipschitz bandit problem, achieving optimal regret with only $O(\log T)$ bits of memory. Next, the paper provides a space lower bound for the Lipschitz bandit problem, showing that the algorithm is optimal in terms of both regret and space.

Strengths

  • The paper is well written: it provides adequate motivation and highlights the technical challenges involved.
  • To the best of my knowledge, the proofs in the main paper seem fine.
  • The space complexity improvement is significant and makes the solution practical.

Weaknesses

No major weakness.

Questions

  • This is regarding the novelty of the analysis. The authors say "Similar to prior research, Log-Li identifies and removes the undesirable region of the arm set while exploring and partitioning the remaining area". Could the authors provide references for this prior research? Further, could the authors also provide more details on the novelty of the analysis, how it differs from existing analyses, and where, if at all, they have used modifications of existing results?
Comment

Thank you for communicating these questions to us. We will clarify them in the following and improve our presentation in the camera-ready version.

Response to Questions

Since arms belong to a very large set in our problem, it is a classic approach to gradually partition the set to learn the landscape of the reward function, such as in [1,2]. Our algorithm also employs this design, resulting in similarities to past algorithms. In the analyses of previous algorithms, the regret caused by exploring in wrong directions was controlled by saving all explored arms and promptly eliminating and marking them. This strategy helped limit the number of explorations in unpromising directions. However, this strategy fails in our setting because we can only store information for a constant number of arms and will experience some inevitable repeated exploration in incorrect directions. The novelty of our analysis lies in the control of this extra regret, which is not typically seen in analyses of previous algorithms. By improving our search strategy and making a proper choice of the edge sequence of partitions, we show that even with this additional regret, we can still achieve the desired optimal regret bound.

[1] Robert Kleinberg, Aleksandrs Slivkins, and Eli Upfal. Multi-armed bandits in metric spaces. In Proceedings of the fortieth annual ACM symposium on Theory of computing, pp. 681–690, 2008.

[2] Yasong Feng, Tianyu Wang, et al. Lipschitz bandits with batched feedback. Advances in Neural Information Processing Systems, 35:19836–19848, 2022.

Comment

I thank the authors for answering my question. I do not have any further questions.

AC Meta-Review

The paper proposes Log-Li, a novel algorithm that achieves near-optimal regret while significantly reducing space complexity to logarithmic levels. Theoretical analyses validate the space and regret bounds, while experimental results show that Log-Li maintains competitive regret compared to state-of-the-art algorithms, albeit with considerably lower memory usage.

However, there are notable weaknesses. Log-Li's regret is higher and more variable than the benchmark A-BLiN, especially under memory constraints, and its performance degrades in high-dimensional spaces due to the dependency on the zooming dimension. I also agree (with Reviewer pV8P) that the presentation of the current manuscript is quite poor, and that the rebuttal is not very comprehensible and does not address all the comments from the reviewers; e.g., it is not clear how the results and analysis differ from Kleinberg et al. '08. The experimental evaluation is limited to the A-BLiN benchmark, and even there, the proposed algorithm's regret performance as well as its computational efficacy is in question. Based on these, I recommend that the authors submit the manuscript to the next suitable venue with all the necessary modifications.

Additional Comments on Reviewer Discussion

See above.

Final Decision

Accept (Poster)