PaperHub
Average rating: 5.6 / 10 (Poster; 5 reviewers; min 4, max 7, std 1.2)
Individual ratings: 5, 5, 4, 7, 7
Confidence: 3.2 · Correctness: 3.2 · Contribution: 2.8 · Presentation: 2.4
NeurIPS 2024

Nearly Minimax Optimal Submodular Maximization with Bandit Feedback

OpenReview · PDF
Submitted: 2024-05-15 · Updated: 2024-11-06
TL;DR

We provide the first minimax regret lower bound for online submodular maximization with stochastic bandit feedback, and an upper bound matching it in terms of the time horizon $T$ and the number of arms $n$.

Abstract

We consider maximizing an unknown monotonic, submodular set function $f: 2^{[n]} \rightarrow [0,1]$ with cardinality constraint under stochastic bandit feedback. At each time $t=1,\dots,T$ the learner chooses a set $S_t \subset [n]$ with $|S_t| \leq k$ and receives reward $f(S_t) + \eta_t$ where $\eta_t$ is mean-zero sub-Gaussian noise. The objective is to minimize the learner's regret with respect to an approximation of the maximum $f(S_*)$ with $|S_*| = k$, obtained through robust greedy maximization of $f$. To date, the best regret bound in the literature scales as $k n^{1/3} T^{2/3}$. And by trivially treating every set as a unique arm one deduces that $\sqrt{ {n \choose k} T }$ is also achievable using standard multi-armed bandit algorithms. In this work, we establish the first minimax lower bound for this setting that scales like $\tilde{\Omega}(\min_{L \le k}(L^{1/3}n^{1/3}T^{2/3} + \sqrt{{n \choose k - L}T}))$. For a slightly restricted algorithm class, we prove a stronger regret lower bound of $\tilde{\Omega}(\min_{L \le k}(Ln^{1/3}T^{2/3} + \sqrt{{n \choose k - L}T}))$. Moreover, we propose an algorithm Sub-UCB that achieves regret $\tilde{\mathcal{O}}(\min_{L \le k}(Ln^{1/3}T^{2/3} + \sqrt{{n \choose k - L}T}))$ capable of matching the lower bound on regret for the restricted class up to logarithmic factors.
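For context, the offline benchmark referenced above is the classic greedy procedure for cardinality-constrained monotone submodular maximization (Nemhauser-Wolsey style). Below is a minimal illustrative sketch of that offline routine, not the paper's bandit algorithm; the toy coverage function and all names are our own.

```python
def greedy_max(f, ground_set, k):
    """Classic offline greedy for monotone submodular f under |S| <= k:
    repeatedly add the element with the largest marginal gain."""
    S = set()
    for _ in range(k):
        best, best_gain = None, float("-inf")
        for e in ground_set:
            if e in S:
                continue
            gain = f(S | {e}) - f(S)
            if gain > best_gain:
                best, best_gain = e, gain
        S.add(best)
    return S

# Toy example: a coverage function (monotone and submodular).
sets = {0: {1, 2}, 1: {2, 3}, 2: {4}, 3: {1, 4, 5}}
f = lambda S: len(set().union(*(sets[i] for i in S)))
print(greedy_max(f, list(sets), k=2))  # -> {1, 3}, covering {1, 2, 3, 4, 5}
```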
Keywords
Bandits, Submodular optimization, Minimax optimal

Reviews and Discussion

Review (Rating: 5)

The authors consider stochastic bandit submodular maximization. For this problem, there are two known regret upper bounds: a $\sqrt{T}$ regret upper bound with large coefficients obtained by naively considering the problem as the multi-armed bandit problem, and a $T^{2/3}$ regret upper bound with a relatively small coefficient obtained by the well-known greedy maximization procedure. This paper clarifies that the minimax regret of stochastic bandit submodular maximization can be roughly characterized by the minimum of these two bounds. Additionally, the paper demonstrates that this minimax regret can be achieved by a UCB-based algorithm.
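To make the "minimum of these two bounds" concrete: using the scalings quoted in the abstract ($kn^{1/3}T^{2/3}$ for the greedy-based bound and $\sqrt{\binom{n}{k}T}$ for the naive MAB bound), a rough calculation, ignoring constants and logarithmic factors, puts the crossover at
$$k n^{1/3} T^{2/3} \approx \sqrt{\binom{n}{k} T} \;\Longleftrightarrow\; k^2 n^{2/3} T^{4/3} \approx \binom{n}{k} T \;\Longleftrightarrow\; T \approx \frac{\binom{n}{k}^3}{k^6 n^2},$$
so the greedy-style $T^{2/3}$ bound is the smaller of the two below roughly this horizon and the naive $\sqrt{T}$ bound takes over beyond it. (This is a back-of-the-envelope reading of the regime split, not a statement from the paper.)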

Strengths

In online submodular maximization with bandit feedback, the focus has often been on achieving a $T^{2/3}$ regret guarantee with reasonable coefficients, and focusing on the $\sqrt{T}$ regret upper bound obtained by naively applying MAB is new. To my knowledge, this paper presents the first lower bound for bandit submodular maximization, which is an important contribution to the community. Instead of using the commonly employed approximation regret in bandit submodular maximization, it uses regret defined based on the comparator determined by the greedy algorithm, which is somewhat appealing. Although the paper gives an impression of not being thoroughly proofread (see Weaknesses), it is overall easy to read.

Weaknesses

One of the major contributions of this paper is the lower bound, but its explanation could be improved. While an intuitive explanation is provided from Lines 168 to 172, it is unclear why regret minimization algorithms (such as ETC and the "standard" MAB algorithm) appear in the discussion of the lower bound.

The authors are expected to provide a more detailed explanation of existing algorithms in Section 3. The current explanation lacks clarity on how $\hat{f}$ is computed.

The regret definition involves a comparison with $S^{k,\mathbf{0}}$. Although the authors state that comparing with $S^{k,\mathbf{0}}$ may be impossible in a noisy setting, can the authors present a lower bound for this statement?

In Section 1.2, the authors mention the use of the Lovasz extension for online submodular maximization. Isn't this technique used for submodular minimization?

Additionally, the paper overall gives the impression of not being thoroughly proofread. To highlight some minor points: L120-122: [22] in Theorem 4.1, [24] in Theorem 1, ... -> Theorem 4.1 in [22], Theorem 1 in [24], ...; L160: Showed -> showed; L232: the abbreviation ETC is not defined; Sec. 3: there are both $S^{i}$ and $S^{(i)}$, which is confusing; L268: UBC -> UCB.

Questions

The reviewer expects the authors to address the questions mentioned above.

Limitations

Not applicable

Author Response

We are grateful for your review and suggestions.

The mention of two types of algorithms in the lower bound section is intended to provide more intuition for the $T^{2/3}$ term of the lower bound, which is uncommon in the bandits literature. To avoid any confusion, we will move this intuition to the introduction section.

To answer your question about $\mathcal{S}^{k, \mathbf{0}}$, the lower bound for regret against a noiseless greedy solution is $\Omega(\min(T, \sqrt{\binom{n}{k} T}))$, indicating that submodularity does not improve the regret. The intuition behind this is that the gap between the optimal set of cardinality $i$ and any other set of the same cardinality can be arbitrarily small (e.g., $T^{-10}$) for any $i < k$ and large only for sets of size $k$. This means that while the noiseless greedy solution can equal the optimal set of size $k$, distinguishing the optimal greedy elements in lower cardinalities with only $T$ queries is infeasible. Therefore, only querying sets of size $k$ provides any information about the optimal set. Although this is known in the literature, we will add it to the appendix to give more intuition on why robust greedy solutions are the appropriate benchmark for regret.

In the paragraph starting on line 155, we briefly mention two extensions of discrete submodular functions to the continuous domain and no-regret algorithms for these continuous functions, not as a technique for maximizing discrete submodular bandits. We will expand this paragraph to survey the domain expansion techniques for our problem setting (in which multilinear extension is used for maximization) as well. We also appreciate the proofreading corrections and will incorporate them.

We appreciate the reviewer for highlighting the typographical errors and will simplify the redundant notation in the upper bound section (e.g., only using $\hat{\mu}$ for estimation of the expected reward).

Comment

Thank you for the rebuttal to the review. I have reviewed the contents of the rebuttal and I will maintain my current score.

Review (Rating: 5)

The authors investigate lower bounds for regret for combinatorial multi-armed bandit (CMAB) problems under submodular rewards and bandit feedback. For stochastic environments, there has been an open question about the gap between the $O(\sqrt{T})$ dependence typically seen in MAB problems (and submodular CMAB with extra feedback) and the $O(T^{2/3})$ dependence typically achieved using explore-then-commit (ETC) algorithms for bandit feedback in past works. The authors propose a related but different notion of regret than used in previous works and show this regret does indeed have an $\Omega(T^{2/3})$ dependence. Their formulation generalizes the "small" horizon $T$ regime studied in those past works and identifies a minimax bound that interpolates between $\Omega(T^{2/3})$ for small $T$ and $\Omega(\sqrt{T})$ for large $T$ (i.e., essentially large enough that separate super-arms can be treated as arms in a standard MAB algorithm). The authors include experiments on toy functions to illustrate the interpolation.

Strengths

Major

  • This paper is among the first to tackle lower bounds for combinatorial (cardinality constrained) MAB with monotone submodular rewards and bandit feedback in stochastic environments.
    • To do so, the authors argue that a notion of regret based on nesting of near-greedily selected sets, which the authors point out was implicit in the proofs of several prior papers (all of which adapted the same classic greedy approximation algorithm from the offline setting), should be used instead of the $1-1/e$-regret that has been used in the past.
    • For this alternative (greedy) regret, the authors prove an $\Omega(T^{2/3})$ lower bound (for the so-called "small $T$" regime), strongly suggesting that there is indeed a hardness gap between CMAB problems with bandit feedback and other studied classes (standard MAB, linear CMAB, even submodular CMAB with semi-bandit feedback).
    • The authors also propose an algorithm with a matching regret upper bound. The algorithm basically adapts the greedy algorithm (like previous ETC algorithms had) for some of the horizon and then switches to a UCB-style procedure over nested super-sets (a rough illustrative sketch follows this list).
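Read purely as an illustration of the two-phase structure described in the last bullet, and not as the paper's Sub-UCB, a minimal sketch might look as follows; the sampling budget `m`, the confidence radius, and the budget accounting are our own assumptions.

```python
import itertools
import math

def greedy_then_ucb(f_noisy, n, k, L, T, m):
    """Illustrative two-phase strategy: (1) noisy greedy grows a prefix of
    size k - L by averaging m samples of each candidate set's reward;
    (2) UCB1 over all cardinality-k supersets of that prefix.
    This sketch ignores careful budget accounting between the phases."""
    prefix, t = set(), 0
    # Phase 1: noisy greedy up to cardinality k - L.
    for _ in range(k - L):
        estimates = {}
        for e in range(n):
            if e in prefix:
                continue
            estimates[e] = sum(f_noisy(prefix | {e}) for _ in range(m)) / m
            t += m
        prefix.add(max(estimates, key=estimates.get))
    # Phase 2: UCB1 over nested super-arms (prefix plus L extra elements).
    rest = [e for e in range(n) if e not in prefix]
    arms = [prefix | set(A) for A in itertools.combinations(rest, L)]
    counts, sums = [0] * len(arms), [0.0] * len(arms)
    while t < T:
        ucb = [sums[i] / counts[i] + math.sqrt(2 * math.log(t + 1) / counts[i])
               if counts[i] > 0 else float("inf") for i in range(len(arms))]
        i = max(range(len(arms)), key=ucb.__getitem__)
        sums[i] += f_noisy(arms[i])
        counts[i] += 1
        t += 1
    # Return the index of the empirically best super-arm and the arm list.
    return max(range(len(arms)), key=lambda i: sums[i] / max(counts[i], 1)), arms

# Toy usage: noisy coverage reward with n = 6 elements, k = 3, stop level L = 1.
import random
cover = {i: {i, (i + 1) % 6} for i in range(6)}
f = lambda S: len(set().union(*(cover[i] for i in S))) / 6 + random.gauss(0, 0.1)
best, arms = greedy_then_ucb(f, n=6, k=3, L=1, T=5000, m=50)
```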

Minor

  • The lower bound and algorithm proposed are not only for the so-called "small" $T$ regime but also for $T$ large enough that one can sample super-arms as regular arms in a standard MAB algorithm to get $\sqrt{T}$ dependence. The lower bound and the matching algorithm identify the trade-off (basically, to what cardinality $i^*$ one follows the greedy procedure before switching to treating all (nested) cardinality-$k$ super-arms as standard MAB arms).
  • The authors include experiments with two classes of toy functions to illustrate the trade-offs between ETC-style algorithms (designed in prior works for "small" $T$) and UCB-style algorithms (known to get a $\sqrt{T}$ regret bound for very large $T$).

Weaknesses

Major

  • [21] proved $\Omega(T^{2/3})$ lower bounds for the adversarial setting under a slightly different feedback model (though it is relevant for any explore-then-commit strategies) and for a general class of problems (including submodular CMAB as a special case), which is not discussed.

Minor

  • The experiment section does not clearly specify $n$, $k$, or $i^*$ for the problem instances considered. In the main text, for $n$, there is only a mention in line 263 for the weighted set cover ("For $n=15$, we use..."), but that makes it sound like you are using multiple values of $n$. Since $i^*$ is described as a "critical quantity" (line 231), its value should be transparent in the experiments. Only Figure 1's caption, which is for the submodular cover, clearly mentions $n$ and $k$ (without mentioning the value of $i^*$).
  • I think more discussion on what "small" entails for some potential applications would be valuable. In line 169 the example of $T = O(n^4)$ is given. In the experiments, even for small $n$ and $k$ in toy instances, it is only when $T$ is in the millions to billions that there is a trade-off (see the rough calculation after this list); would such large $T$ be plausible for problems mentioned in the introduction as motivation? Would the rewards remain i.i.d. for each super-arm over such a large horizon?
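As a rough illustration of that scale (our arithmetic, not the paper's): equating the two bounds from the abstract while dropping the cardinality and log factors gives a crossover near $T \approx \binom{n}{k}^3 / n^2$. For instance, with $n = 15$ and an illustrative $k = 7$,
$$\binom{15}{7} = 6435, \qquad T \approx \frac{6435^3}{15^2} \approx \frac{2.66 \times 10^{11}}{225} \approx 1.2 \times 10^9,$$
which is consistent with the trade-off only appearing at horizons in the billions for toy problem sizes like these.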

Very Minor

  • For the summary in lines 127-133: this is pointing out that for instances of a special sub-class which have a strictly better $\alpha = 1$ worst-case approximation guarantee, the $\alpha$-regret of the larger class with provably harder instances behaves weirdly (even giving negative regret values). That is an important observation, but $\alpha$-regret bounds are for a class of problems, arising from the worst cases, and there are harder cases for cardinality-constrained submodular optimization than modular functions, where the $(1-1/e)$ approximation coefficient comes from.
  • line 80 missing parentheses
  • line 268 “UBC”
  • line 287 ‘were’--> ‘where’
  • In the experiments, 1-Gaussian noise seems strange given the functions are bounded in [0,1].
  • Figure 2 legend: use power-of-10 notation, or add commas
  • I think the intro may flow better if you open with problems that are submodular and mention that they are important, then say people have been trying to solve them but it is unclear to what extent current methods are the best that could be had

Questions

Major

  1. The greedy algorithm from Nemhauser and Wolsey is classic and was the most widely adapted for online settings, but there are other approximation algorithms for monotone cardinality-constrained submodular maximization that behave differently in subset selection, such as the threshold greedy https://theory.stanford.edu/~jvondrak/data/submod-fast.pdf. From a quick look it is not clear to me that the sets would satisfy a nesting like that in line 102 (though I didn't check carefully; maybe it would in a different order than elements get added). So while the $R_{gr}$ lower bound is over any online algorithm, I would wonder if it is possible it might be too strict, in the sense that a different regret notion based on some structure underlying a different algorithm (like an "$R_{thresh\text{-}gr}$" regret) could be defined, still with $R_{1-1/e} \leq R_{thresh\text{-}gr}$, perhaps permitting a $\sqrt{T}$ upper bound even in the regime where $T$ is small relative to $\binom{n}{k}$?

  2. How would this work generalize to other problems, even similar but harder problems like monotone submodular maximization with knapsack constraints or intersections of matroid and knapsack constraints, or non-monotone problems? There are known approximation coefficients, so the $\alpha$-regret could be easily defined accordingly (of course with the limitations on interpretability you point out), but would some analog of Lemma 1.1 need to be identified for each of those problem classes and then (3) defined, and in some sense would we need to choose a reference algorithm if there were more than one available that achieved the same approximation guarantee?

Very Minor

  1. Lemma 1.1: what is the definition of $S_{gr}^{k,\epsilon}$? $S_{gr}^{f}$ was defined earlier w.r.t. an exact value oracle for $f$. But $S_{gr}^{k,\epsilon}$ is ambiguous. If in Lemma 1.1 any set $S \in \mathcal{S}^{k,\epsilon}$ satisfies it, then I'd suggest the dummy variable $S$ for clarity.
  2. What is the size of $i$ for which it is advantageous to be less than $k$? If this happens in the "small $T$" regime, that is interesting.
  3. Theorem 2.1 requires $k \leq \lfloor n/3 \rfloor$ -- is there a simple reason for that?
  4. In the experiments, there are no error bars on the figures – was each marker for a single run?

Limitations

Yes

Author Response

We appreciate your thorough review and valuable suggestions.

In [21], Theorem 7 provides a lower bound on the convergence rate of bandit Blackwell games. In our setting, as you correctly pointed out, this reduction implies that all explore-then-commit greedy algorithms in the adversarial setting have $\Omega(T^{2/3})$ regret, which is straightforward to show. In our work, we prove a lower bound on stochastic submodular maximization with a cardinality constraint for any algorithm, not just explore-then-commit algorithms. We will add this to the related work section for completeness.

The time horizon can be very large for the web-layout example mentioned in the introduction, as the number of users can be much larger than ${n \choose k}$. This makes the tradeoff relevant in these applications. It is reasonable to assume that the rewards of the users visiting the webpage are i.i.d.

The value of $i^*$ in the experiments is equal to the definition in line 219, and its value for each $T$ is highlighted in Figure 2. We will repeat the definition for clarity.

Regarding your questions:

  1. The fast submodular algorithm in [1] is indeed an $\epsilon$-greedy solution, given that the function is bounded by 1, as it is in our problem setting. In the proof of Claim 3.1, they first show that for the added element $a$, the inequality $f_S(a) \ge (1 - \epsilon) f_S(x)$ holds for any $x \in O \setminus S$, thus $f_S(a) \ge \max_{x \in O \setminus S} f_S(x) - \epsilon$. The existence of another approximation for submodular maximization which achieves the $1 - e^{-1}$ rate but is looser than the greedy procedure (specifically for modular or low-curvature submodular functions) that could achieve lower regret bounds is an interesting open problem.

  2. To generalize this to other problem domains, an equivalent of Lemma 1.1 is needed to show that a robust greedy solution approximates the optimal solution, similar to Definition 5 in [21]. Generalizing this, especially to the continuous domain DR-submodular maximization, is an interesting future direction.

  3. $S^{k, \epsilon}_{gr}$ represents any set in the collection of robust greedy solutions $\mathcal{S}^{k, \epsilon}$, as defined in line 102. Thank you for the suggestion; we will simplify the notation in Lemma 1.1 for clarity.

  4. In the algorithm, a stop level $i$ less than $k$ is used when $T$ is large, allowing the algorithm to explore at least $\binom{n}{k - i}$ sets of size $k - i$, thereby reducing the number of greedy steps needed.

  5. This choice is made for technical reasons in the proof to ensure that the class of alternate functions with different optimal sets and no intersection is sufficiently large (i.e., $\binom{n - k}{k}$).

[1] Ashwinkumar Badanidiyuru and Jan Vondrák, "Fast algorithms for maximizing submodular functions", SODA 2014, https://theory.stanford.edu/~jvondrak/data/submod-fast.pdf

Comment

We thank the reviewer for the time and effort spent reviewing our paper. As the discussion phase is about to end, we would like to make sure our responses have sufficiently addressed your concerns. We look forward to your feedback.

Comment

Thanks to the authors for their response. I have read the rebuttal and looked over comments/responses to other reviewers. I have decided to keep my score. I do not have further questions.

Review (Rating: 4)

The paper addresses the problem of stochastic submodular bandits. The main contributions are twofold: it provides a new lower bound for the problem and proposes a novel UCB-type algorithm whose regret matches this lower bound.

Strengths

  • The paper introduces notation clearly and provides a sufficiently thorough comparison with related work.
  • The lower bound proof is new. While it adheres to the standard framework for proving regret lower bounds in stochastic bandits, the challenging aspect lies in constructing the problem instance. The problem instance constructed is original.
  • The proposed algorithm showcases an interesting insight on the hardness of the problem, even though it combines well-known approaches for online submodular maximization.

Weaknesses

  • Sub-UCB is a modified version of the algorithm in Nie et al. (2022). While this algorithm has a lower regret upper bound, the algorithm itself is an incremental change compared to prior work.
  • The lower bound is demonstrated to hold for the minimax expected regret $R_{gr}$, which differs from the $\alpha$-regret commonly used in the literature. While the former is shown to be an upper bound for the latter, it can potentially be much larger. Consequently, a lower bound for $R_{gr}$ does not necessarily imply a corresponding lower bound for $R_\alpha$. Therefore, I believe it is not sound to argue for the minimax optimality of the problem at hand based on this specific change in the regret notion adopted and analyzed.
  • It seems that the lower bound result needs to be conditioned on a harsh prerequisite that $T$ needs to be very large. For example, when $k = 10$, $T$ needs to be at least $10^{11}$ for the lower bound to hold.

Questions

I make the following suggestions:

  • It would be better to formally prove that the constructed problem instances (both the $k=2$ and general $k$ cases) are submodular functions (possibly in a lemma). It is not obvious to me why those functions are submodular for general $k$.
  • Also, the minimax optimal version of UCB or Thompson Sampling can be used in the second phase to fully match the lower bound.

Limitations

See Weaknesses and Questions.

Author Response

Thank you for your review.

We noticed that this review appears to be copied from a previous conference. We would like to highlight that the concerns raised have already been addressed in the current submission. Specifically:

  • In the introduction section (from line 119), we discuss the indirect use of $R_{gr}$ in previous works (i.e., the proofs are upper bounding this measure of regret), and then using Lemma 1.1 to upper bound $R_{\alpha}$. We also argue that $\alpha$-regret is zero or negative in many instances, and therefore a loose measure of regret in our setting.

  • The detailed proof of submodularity of the instance is provided in Lemma A.1 (with a proof sketch for $k=2$ on line 181).

  • We discuss the main novelty of our upper bound, which includes adding an optimal stopping cardinality to the greedy procedure depending on the time horizon.

We would appreciate receiving your feedback on the current version of the paper.

Comment

There are two major contributions of this work, and here I address my concerns for each of them.

  1. A regret lower bound is derived for submodular maximization with bandit feedback. If I understand correctly, the regret lower bound applies specifically to $R_{gr}$, which differs from the commonly referenced $R_\alpha$ in the existing literature. While it is acceptable to use $R_{gr}$ for a regret upper bound, as an upper bound on $R_{gr}$ suggests an upper bound on $R_\alpha$, the converse does not hold true for lower bounds. The modified definition of regret, $R_{gr}$, appears to be more aligned with algorithms of the Greedy+ETC style (thus not surprisingly leading to $T^{2/3}$), suggesting it may be a specialized metric rather than a universally applicable one. Consequently, it may be premature to assert the minimax optimality of the problem based solely on this particular shifted concept of regret.

  2. An algorithm with a regret upper bound matching the established lower bound is proposed. It is a mixture of the ETC and UCB strategies, which explains the presence of both the $T^{1/2}$ and $T^{2/3}$ terms in the regret formula. To me, the alignment of the regret with the lower bound is expected given how it was formulated. Moreover, the selection process for the optimal stopping level $l$ remains unclear. Even if it can be calculated efficiently, as indicated in Figure 2, an optimal stopping level does not indicate the best regret. Thus, how $l$ can be selected to get the best regret is questionable.

I apologize I did not notice the new Lemma A.1 provided. However, the paper's contribution may not sufficiently meet the rigorous standards of novelty and impact that NeurIPS is known for. As such, I will maintain my current scoring, reflecting the above considerations.

Comment

We appreciate your concerns; however, we've already addressed them in the current version of the paper. More specifically:

  1. In lines 119-133, we provide a detailed discussion on why $R_{gr}$ can be a more appropriate measure of regret for our problem setting. Firstly, previous works are providing upper bounds for $R_{gr}$, albeit indirectly. Furthermore, the offline greedy procedure provides the optimal approximation rate (which is close to a $1$-approximation for low-curvature submodular functions), so it is reasonable to directly measure the regret against it.

  2. In lines 275-277, we explain that the stop level $l$ is chosen as a minimizer of the worst-case regret bound in Theorem 3.1, making the algorithm (nearly) minimax optimal (see the expression below). Consequently, it might not be the exact optimal stop level, as the gaps between the expected rewards of different sets of the same cardinality can be large, causing the greedy procedure to halt faster.
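Based on the upper bound stated in the abstract (and up to constants and logarithmic factors; Theorem 3.1's exact expression may differ), this choice would read as
$$l \in \operatorname*{arg\,min}_{L \le k} \left( L\, n^{1/3} T^{2/3} + \sqrt{\binom{n}{k - L}\, T} \right),$$
i.e., the stop level that balances the greedy-exploration term against the cost of running a MAB routine over the remaining $\binom{n}{k-L}$ super-arms.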

It appears that these sections and the changes we have made were not taken into account in the recent review. We kindly ask the reviewer to base their evaluation on the current version of the paper. We would greatly appreciate any feedback on the updated manuscript.

Comment

I have carefully considered the current version of your work and I have fully noticed the parts you pointed to. However, I don't feel your answers fully addressed my questions and concerns.

Review (Rating: 7)

This paper studies the stochastic submodular bandit problem. In this problem, the decision-maker needs to select a subset with size at most k from a known ground set, and then the decision-maker gains a stochastic reward associated with the subset. The expectation of the reward is a submodular function. This problem is a natural extension of combinatorial linear bandit which relaxed the linearity assumption of the reward function.

This paper mainly focuses on the lower bound of the submodular bandit problem, which is a recognized open question. Different from other literature on submodular bandits, this paper defines a new notion of regret which is called robust greedy regret. Roughly speaking, the new regret compares the reward gained by the algorithm with the reward gained by the output subset of an offline "approximated" greedy algorithm. The authors prove a lower bound on their defined notion of regret. This lower bound changes from $kn^{1/3} T^{2/3}$ to $n^k T^{1/2}$ as $T$ grows, and this result implies that a $T^{2/3}$-type regret is inevitable when $T$ is relatively small (polynomial in $k$ and $n$). This paper also proposes an algorithm that nearly matches the lower bound. All their results are in the sense of the newly defined "robust greedy regret" rather than the commonly used $\gamma$-regret.

Strengths

The major contribution of this paper is the lower bound. Even though the first algorithm for (adversarial) submodular bandits was proposed 17 years ago, we still do not have any regret lower bound. Given this fact, I think a lower bound result on this problem should be extensively encouraged.

Coming back to the previous algorithms for submodular bandits (both the stochastic setting and the adversarial setting), all of these algorithms only achieve $O(T^{2/3})$ regret. The main reason is that these algorithms must clearly distinguish whether the current round is an exploration round or an exploitation round. The technique in this paper somewhat explains the intuition behind that. That is, a small-size subset will bring more information but incur large regret, and a full-rank subset incurs low regret but brings less information. This naturally divides the subsets of different sizes into exploration actions and exploitation actions. In this sense, I think this paper does give some intuition on the hardness of submodular bandits, and this difficulty also suggests that the submodular bandit is completely different in nature from the linear bandit.

Overall, this paper provides me with important knowledge, and I believe it will also bring new knowledge to people who are working on submodular bandits.

Weaknesses

The main weakness of this paper is that the regret lower bound is in terms of the newly defined notion of regret, "robust greedy regret". However, almost all papers about submodular bandits, or more generally combinatorial bandits, consider the notion of a scaled "$\gamma$-regret". As the authors state, robust greedy regret is always larger than $\gamma$-regret, so a lower bound on robust greedy regret does not imply anything about the lower bound on $\gamma$-regret. Also, the major difficulty in proving a lower bound on $\gamma$-regret is that "the immediate regret can be negative", which does not exist for the traditional 1-regret. However, by defining the notion of robust greedy regret, the authors can avoid this difficulty. The robust greedy regret lower bound is actually a traditional 1-regret lower bound in the hard case the authors have constructed. So I think this result is still quite far from figuring out the hardness of $\gamma$-regret.

As the authors state, $\gamma$-regret may not be a good metric for the submodular bandit problem or other combinatorial bandit problems. But the robust greedy result depends too much on the form of the offline algorithm. Also, not all algorithms for submodular bandits are based on the greedy algorithm. For example, the algorithms in [1][2] are totally different from the offline greedy algorithm, and they are also not based on the "offline algorithm to online" style. So it's even hard to define a robust "xxx" regret. These papers are also missing from the related literature. Thus, I think it's inappropriate to state "$\gamma$-regret is not appropriate for this problem". At least, $\gamma$-regret provides a way to evaluate all types of algorithms rather than restricting the algorithm to the category of "offline to online conversion".

Please note that, even though I raise this weakness, I still appreciate the authors' contributions to the lower bound results, and I think the contribution of this paper outweighs the weaknesses.

[1] Zongqi Wan, Jialin Zhang, Wei Chen, Xiaoming Sun, Zhijie Zhang, "Bandit Multi-linear DR-Submodular Maximization and Its Applications on Adversarial Submodular Bandits", ICML 2023

[2] Stephen Pasteris, Alberto Rumi, Fabio Vitale, Nicolò Cesa-Bianchi, “Sum-max Submodular Bandits”, AISTATS 2024.

Questions

Some suggestions:

  1. On page 19, the proof of Lemma C.1 should be added. Although I have checked that the claim is correct, a proof is necessary for a complete paper, or the authors can give a reference for the proof.
  2. In the abstract, the authors use the notation L, but in the main text, they use the notation i instead of L.
  3. Line 80, there should be a “(” before “min”.
  4. It's better to remove the claim in line 132: "$R_{gr}$ is the more appropriate measure of performance ... not $R_{\alpha}$". Or use more rigorous language to limit the scope of the claim to those algorithms that use offline-to-online conversion.
  5. Some related literature on submodular bandits has been missed, for example, [1][2]. Also, the current version of the literature review of online continuous submodular maximization looks very weird; the authors missed all the important papers in this field, for example, [3][4][5] and more. It is better to survey more of this field.

[1] Zongqi Wan, Jialin Zhang, Wei Chen, Xiaoming Sun, Zhijie Zhang, "Bandit Multi-linear DR-Submodular Maximization and Its Applications on Adversarial Submodular Bandits", ICML 2023

[2] Stephen Pasteris, Alberto Rumi, Fabio Vitale, Nicolò Cesa-Bianchi, "Sum-max Submodular Bandits", AISTATS 2024.

[3]Zhang, M., Chen, L., Hassani, H., and Karbasi, A. “Online continuous submodular maximization: from full information to bandit feedback.” NeurIPS 2019

[4]Chen, L., Hassani, H., and Karbasi, A. “Online continuous submodular maximization”, AISTATS 2018

[5] Zhang, Q., Deng, Z., Chen, Z., Hu, H., and Yang, Y. “Stochastic continuous submodular maximization: Boosting via non-oblivious function”. ICML 2022

Limitations

Yes

Author Response

We appreciate your thorough review and valuable suggestions.

We agree that the statement in line 132 that $R_{gr}$ is always more appropriate than $R_{\alpha}$ is too strong. We will change it to: "In studying regret against approximations attained by an offline step-wise greedy procedure, $R_{gr}$ can be a more appropriate measure than $R_{\alpha}$."

Regarding Lemma C.1, it is a variant of Hölder's inequality. We will add a short proof in the Appendix to provide clarity and support.

We will expand the discussion on continuous submodular optimization within the Related Works section to survey important results in this area. Thank you for these additional references. Extending $R_{gr}$ to the continuous domain remains an interesting open problem that we will highlight.

Comment

Thank you for your response. I would like to keep my score.

Review (Rating: 7)

The authors of this paper study the submodular maximization problem with bandit feedback. By adopting regret as the metric, the authors prove the minimax bound of the regret for their problem. A UCB-based algorithm is devised to tackle the submodular optimization problem and the authors prove that this algorithm's regret almost matches the minimax bound. In sum, this paper provides some new and solid theoretical results for submodular optimization.

Strengths

S1. The authors prove the minimax regret bound of the submodular maximization problem with bandit feedback.

S2. The authors also devise the algorithm Sub-UCB that can match the minimax bound up to logarithmic factors.

S3. The theoretical results in this paper seem correct.

Weaknesses

W1. There are some typos in the paper. For example, in the abstract "we prove, we prove that".

W2. The experiment is not interesting as it is a pure numerical simulation. What about applications of submodular optimization such as influence maximization where the objective can only be obtained via sampling and is uncertain?

W3. Instead of the (1-1/e)-regret, the authors adopt regret as the metric, for the reason that the (1-1/e)-regret may be loose. Although the greedy solution's approximation guarantee is 1-1/e in theory, in practice the approximation ratio can often be close to 1. For example, for the modular objective case mentioned by the authors, the greedy-based algorithm is actually optimal. Therefore, the justification for using regret rather than (1-1/e)-regret is not that strong.
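As a quick illustration of the modular case (our own note, added only to make the point concrete): if $f(S) = \sum_{e \in S} w_e$ with non-negative weights, each greedy step picks the largest remaining weight, so after $k$ steps the greedy set is exactly a top-$k$ set and
$$f(S_{gr}) = \max_{|S| \le k} f(S),$$
i.e., the offline greedy is a $1$-approximation rather than a $(1-1/e)$-approximation, and the $(1-1/e)$-regret of an algorithm that simply plays this set every round is negative whenever $f(S_*) > 0$.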

Questions

W2

Limitations

W1, W2, W3

Author Response

We appreciate your review and valuable suggestions.

Regarding your question on the applications of this setting, one example is antibiotic prescription, as mentioned in the introduction section. The reason for noisy feedback in this context is that patients come from a population with an unknown response distribution to an antibiotic for a specific disease. Our experiments were intended to demonstrate the theoretical interpolation between the $T^{1/2}$ and $T^{2/3}$ regret bounds in our main theorems, and showing an application of submodular maximization wasn't our primary goal.

We agree with the reviewer's point W3 that the offline greedy solution for modular functions is optimal. In our paper, we also argue that regret against the greedy solution (which would be equal to the $1$-regret) is more natural than the $(1 - e^{-1})$-regret for these functions.

Final Decision

Reviewers expressed mostly positive opinions on the novelty and significance of this work. Still, some doubts were cast on the level of the authors' contributions due to a possibly not broad enough scope and some disagreement on the significance of the setting/novelty of the algorithmic ideas. Overall, I agree with the majority of the reviewers that the positives outweigh the negatives, and I recommend acceptance (poster).