PaperHub
Overall rating: 5.7/10 (Poster, 3 reviewers; scores 6, 5, 6; min 5, max 6, std 0.5)
Confidence: 3.0 · Correctness: 2.7 · Contribution: 2.3 · Presentation: 3.0
NeurIPS 2024

Gradient Methods for Online DR-Submodular Maximization with Stochastic Long-Term Constraints

OpenReview · PDF
Submitted: 2024-05-15 · Updated: 2024-11-06

Abstract

In this paper, we consider the problem of online monotone DR-submodular maximization subject to long-term stochastic constraints. Specifically, at each round $t\in [T]$, after committing an action $\mathbf{x}_t$, a random reward $f_t(\mathbf{x}_t)$ and an unbiased gradient estimate of the point $\widetilde{\nabla}f_t(\mathbf{x}_t)$ (semi-bandit feedback) are revealed. Meanwhile, a budget of $g_t(\mathbf{x}_t)$, which is linear and stochastic, is consumed of its total allotted budget $B_T$. We propose a gradient ascent based algorithm that achieves $\frac{1}{2}$-regret of $\mathcal{O}(\sqrt{T})$ with $\mathcal{O}(T^{3/4})$ constraint violation with high probability. Moreover, when first-order full-information feedback is available, we propose an algorithm that achieves $(1-1/e)$-regret of $\mathcal{O}(\sqrt{T})$ with $\mathcal{O}(T^{3/4})$ constraint violation. These algorithms significantly improve over the state-of-the-art in terms of query complexity.
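As a quick orientation, the following is a minimal, self-contained sketch of the semi-bandit protocol together with a generic primal-dual gradient-ascent update that uses one gradient query per round. This is not the paper's Algorithm 1: the toy objective, linear cost, feasible box (chosen so that the origin is infeasible), budget, and step sizes are all illustrative assumptions of ours.

```python
# Schematic sketch of the online semi-bandit protocol with a generic
# primal-dual gradient-ascent update (NOT the paper's Algorithm 1).
# Assumptions (ours, for illustration): K is a box [0.1, 1]^d so 0 is
# infeasible, g_t(x) = <c_t, x> is linear and stochastic, and
# f_t(x) = sum_i log(1 + w_{t,i} x_i) with random w_t >= 0, whose
# expectation is monotone DR-submodular.
import numpy as np

rng = np.random.default_rng(0)
d, T = 5, 2000
lo, hi = 0.1, 1.0                                    # box K = [lo, hi]^d
B_T = 0.4 * d * T                                    # total allotted budget
eta_x, eta_lam = 1.0 / np.sqrt(T), 1.0 / np.sqrt(T)  # illustrative step sizes

def project_K(x):
    # Projection onto the box K; a real instance would use a projection oracle.
    return np.clip(x, lo, hi)

x = np.full(d, lo)     # feasible starting point
lam = 0.0              # dual variable for the long-term budget constraint
spent, reward = 0.0, 0.0

for t in range(T):
    w_t = rng.uniform(0.5, 1.5, size=d)   # stochastic reward parameters
    c_t = rng.uniform(0.5, 1.5, size=d)   # stochastic linear cost

    # Semi-bandit feedback: one noisy value and one gradient estimate at x_t.
    f_val = np.sum(np.log1p(w_t * x))
    grad_f = w_t / (1.0 + w_t * x)        # gradient estimate at x_t
    g_val = c_t @ x                       # budget consumed this round

    reward += f_val
    spent += g_val

    # Primal ascent along a Lagrangian-style direction (one query per round),
    # then projected dual ascent on the per-round budget violation.
    x = project_K(x + eta_x * (grad_f - lam * c_t))
    lam = max(0.0, lam + eta_lam * (g_val - B_T / T))

print(f"avg reward {reward / T:.3f}, budget used {spent:.1f} of {B_T:.1f}")
```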
Keywords
DR-submodular, long-term constraint, gradient ascent

Reviews and Discussion

Review
Rating: 6

In this work, the authors address the problem of online DR-submodular maximization subject to long-term stochastic (linear) constraints. At each round $t\in[T]$, after committing an action $\mathbf{x}_t$, a random reward and an unbiased gradient estimate at that point are revealed.

This paper focuses on the stochastic DR-submodular maximization setting, where the objective functions are i.i.d. sampled from a distribution. Moreover, this work also considers the semi-bandit feedback setting where only the noisy function value and an unbiased gradient estimator of that point can be observed.

The main contributions of this work can be summarized as follows:

  • A stochastic gradient ascent-based algorithm for stochastic online DR-submodular maximization with stochastic long-term constraints achieving $\mathcal{O}(\sqrt{T})$ $\frac{1}{2}$-regret and $\mathcal{O}(\sqrt{T})$ constraint violation with high probability in the semi-bandit feedback setting.
  • A stochastic gradient ascent-based algorithm for stochastic online DR-submodular maximization with stochastic long-term constraints achieving $\mathcal{O}(\sqrt{T})$ $(1-\frac{1}{e})$-regret and $\mathcal{O}(\sqrt{T})$ constraint violation with high probability in the first-order full-information feedback setting.

It is worth noting that the query complexity is significantly lower compared to previous works in both settings.

Strengths

The presentation was clear, and I enjoyed reading this paper.

The results were clearly explained, and I especially liked the proof of Algorithm 1.

Weaknesses

I have two concerns:

  • The paper lacks empirical evaluation, which would be valuable in highlighting a use case and demonstrating the effectiveness of your proposed approach.
  • The theoretical results, while commendable, are not particularly groundbreaking.

Questions

I have no questions

Limitations

The authors adequately addressed the limitations.

Author Response

Thank you for your time and your encouraging review. We address your concerns in the following:

Weaknesses

The paper lacks empirical evaluation, which would be valuable in highlighting a use case and demonstrating the effectiveness of your proposed approach.

We agree with you that empirical evaluation can be valuable to demonstrate the effectiveness of proposed approaches in real-world applications. However, we highlight a few points:

  1. Our paper primarily emphasizes the theoretical contribution of our approach.

  2. Our Algorithm 1 is the first algorithm for online stochastic DR-submodular optimization with long-term constraints under semi-bandit feedback. There would not be any baselines to compare it to. All of the algorithms from previous works listed in Table 1 are for simpler settings (full information feedback, $0 \in \mathcal{K}$, some requiring exact gradients). We could potentially empirically evaluate Algorithm 1 against those in simpler settings (full information feedback, $0 \in \mathcal{K}$, exact gradients) but we do not think it would be fair to do so.

  3. Similarly, our Algorithm 2 is the first algorithm for online stochastic DR-submodular optimization with long-term constraints with full-information feedback (when $0 \in \mathcal{K}$ is queryable). There would not be any baselines to compare to (aside from our Algorithm 1 designed for a harder setting of semi-bandit feedback). We could potentially apply Algorithm 2 in a simpler setting ($0 \in \mathcal{K}$, exact gradients) to evaluate against other algorithms listed in Table 1, though we think it would not be entirely fair to do so.

  4. Existing methods for DR-submodular maximization with long-term constraints typically exhibit high query complexity, often requiring $\sqrt{T}$ gradient queries per round. In contrast, our method achieves a query complexity of just 1 per round; a simple count of the total query cost is given below.
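As a simple count (ours, following directly from the per-round figures above), the total number of gradient queries over the horizon differs by a polynomial factor:

$$
\underbrace{\sqrt{T}\ \text{queries/round}\times T\ \text{rounds}}_{\text{prior methods}}
= \mathcal{O}\bigl(T^{3/2}\bigr)
\qquad\text{versus}\qquad
\underbrace{1\ \text{query/round}\times T\ \text{rounds}}_{\text{our methods}}
= \mathcal{O}(T).
$$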

The theoretical results, while commendable, are not particularly groundbreaking.

We highlight some aspects we believe are significant in Q1 of the general response above.

Questions

I have no questions

Comment

Thank you for the response.

Regarding the empirical evaluation baselines, the choice largely depends on the specific experiments. In most cases, you can compare your results to random baselines and existing methods (I suggest plotting the function value and the query complexity).

I've decided to maintain my score since the theoretical contribution of this work is nice but not groundbreaking.

Comment

Thank you again for your time and consideration.

In most cases, you can compare your results to random baselines and existing methods

We would like to re-confirm that we agree experiments can be valuable, but would like to clarify that there are no existing methods to compare to for the problem setting we considered. Our algorithms are the first. At best, we could apply our methods to simpler problems (full information, $0\in\mathcal{K}$), though other methods designed for that special setting could have an advantage. We would also note that random baselines would incur linear regret, and we anticipate that such a comparison may not be illuminating.

Review
Rating: 5

This paper investigates online learning of DR-submodular functions with a stochastic budget (linear) constraint. The constraints vary across rounds: in each round, the constraint is sampled independently from an unknown distribution, and the constraint in round $t$ can only be observed after the learner has submitted her action $x_t$. The goal is to minimize the regret and the total constraint violation. This paper achieves $O(\sqrt{T})$ regret and $O(\sqrt{T})$ total constraint violation in both the semi-bandit setting and the first-order feedback setting. Here, the semi-bandit setting means the learner can only obtain the gradient at the played action. The first-order feedback setting means the learner can query a gradient at any point each round.

For the semi-bandit setting, the authors achieve $O(\sqrt{T})$ $1/2$-regret and $O(\sqrt{T})$ total constraint violation. For the first-order feedback setting, the authors achieve $O(\sqrt{T})$ $(1-1/e)$-regret and $O(\sqrt{T})$ constraint violation. These results match the bounds of existing works, but the algorithms in this paper only need to query 1 gradient each round, whereas existing algorithms need $\sqrt{T}$ gradient queries each round. Reducing the number of gradient queries is the main contribution of this paper.

Strengths

  1. The problem studied is well motivated and has many potential applications.
  2. The proposed algorithm is computationally efficient compared with existing results, and the authors provide rigorous theoretical guarantees.

Weaknesses

  1. The main weakness is the technical contribution. The method used in this paper is a simple combination of non-oblivious PGA [37] and regularized Lagrangian function techniques [26]. I do not see a new analysis or algorithmic challenge.
  2. The authors' references to previous works do not seem appropriate, especially the citations around the paper that proposed the non-oblivious PGA. For example, in lines 135–137, Lemma 2 is a simple corollary of [37] (just setting the curvature to 1 in that paper). However, the authors use a vague statement: “The proof can be derived from [37] and is presented in Lemma 2 of [39]”. First, I think the authors should cite [37] directly after Lemma 2 rather than use another statement to show the reference, since this lemma was proposed in [37]. Second, [39] does not seem to contribute to this lemma (that paper just writes down the lemma with the submodular curvature set to 1), so the authors should not mention [39] here. Also, in lines 286–289, Lemma 7 is a direct corollary of Proposition 1 of [37] (by setting curvature γ=1). However, the authors did not cite [37] directly after Lemma 7; they just write “[37] present a computational approach for…” and do not mention that Lemma 7 is also from [37].
  3. The 1/2 approximation ratio is not satisfactory, even in the semi-bandit setting. It is known that the 1-1/e approximation ratio can be achieved in the full-bandit setting, which is a harder setting than the semi-bandit setting. As the authors say, these bandit algorithms with the 1-1/e approximation ratio require the assumption that 0∈K, but I do not think it is necessary to remove this assumption.

Questions

  1. There is a new paper extending the non-oblivious PGA technique to the non-monotone function setting: https://arxiv.org/abs/2401.08330. Can we use this result to generalize your results to the non-monotone setting?
  2. Why do the authors assume that the utility functions are sampled i.i.d. from a distribution? We know that online gradient ascent works in the adversarial environment. Can we still derive the current results if the utility functions are adversarially selected? This question also applies to the constraint: what if the constraint is adversarially selected?
  3. See weakness 1: what is the new challenge in your algorithm design and analysis?

Limitations

Yes

Author Response

Thank you very much for your careful reading and helpful suggestions. Next, we will address some of your concerns in "Weakness" and "Questions".

Weaknesses

  1. The main weakness is the technical contribution. The method used in this paper is a simple combination of non-oblivious PGA and regularized Lagrangian function techniques. I do not see a new analysis or algorithmic challenge.

We address technical contributions in Q1 of the general response above.

  2. The authors' references to previous works do not seem appropriate …

Thank you for your detailed suggestions. We agree with you and will cite [37] for Lemmas 2 and 7 to clarify [Zhang et al.]'s contributions.

  3. The 1/2 approximation ratio is not satisfactory, even in the semi-bandit setting. It's known that the 1-1/e approximation ratio can be achieved in the full-bandit setting, which is a harder setting than the semi-bandit setting. As the authors say, these bandit algorithms with 1-1/e approximation ratio require the assumption that $0\in \mathcal{K}$. But I don't think it is necessary to remove this assumption.

First, in the final version of the paper we will include application examples that showcase the importance of general convex regions K\mathcal{K} for which the origin is not feasible. Please refer to Q2 in the general response above for examples.

Second, we note that even when $0\in \mathcal{K}$, there is no clear path to obtaining $O(\sqrt{T})$ $(1-1/e)$-regret with semi-bandit feedback, even without considering long-term constraints. When the origin is included, the best known $(1-1/e)$-regret bound for semi-bandit feedback without long-term constraints is $O(T^{3/4})$, in Pedramfar et al., 2023 [24]. As indicated by the non-oblivious function in Lemma 2, the $1-1/e$ approximation needs information at two locations (e.g., a function value at $x$ and a gradient value at $z\ast x$); a sketch of this construction is given below. One possibility is that when $0\in \mathcal{K}$ and we have semi-bandit feedback, we could utilize an explore-then-commit approach to bound the $(1-1/e)$-regret, but the dependence on the horizon $T$ would be worse than $O(\sqrt{T})$. For reference, Table 1 in [Pedramfar et al., 2024] is a good summary of regret bounds for online DR-submodular optimization under different types of feedback (semi- and full-bandit), objective functions, and feasible regions.
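For readers unfamiliar with the construction, the following is our paraphrase of the non-oblivious (boosted) auxiliary function from [37] with the curvature set to 1; the exact constants and conditions in the paper's Lemma 2 may differ slightly:

$$
F(\mathbf{x}) \;=\; \int_{0}^{1} \frac{e^{z-1}}{z}\, f(z \ast \mathbf{x})\,\mathrm{d}z,
\qquad
\langle \nabla F(\mathbf{x}),\, \mathbf{y}-\mathbf{x} \rangle \;\ge\; \bigl(1-e^{-1}\bigr) f(\mathbf{y}) \;-\; f(\mathbf{x}).
$$

Since $\nabla F(\mathbf{x}) = \int_0^1 e^{z-1}\nabla f(z\ast\mathbf{x})\,\mathrm{d}z$, an unbiased estimate of $\nabla F(\mathbf{x})$ requires a gradient of $f$ at a rescaled point $z\ast\mathbf{x}$ with $z$ drawn from a density proportional to $e^{z-1}$ on $[0,1]$, which is the "information at two locations" referred to above.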

Questions

  1. There is a new paper extending the non-oblivious PGA technique to the non-monotone function setting: https://arxiv.org/abs/2401.08330. Can we use this result to generalize your results to the non-monotone setting?

That is a great question. We think it might be possible but were not able to carefully verify this in the short rebuttal time. We will leave it as a future direction.

  2. Why do the authors assume that the utility functions are sampled i.i.d. from a distribution? We know that online gradient ascent works in the adversarial environment. Can we still derive the current results if the utility functions are adversarially selected? This question also applies to the constraint: what if the constraint is adversarially selected?

First, we note that the analysis we perform does not extend to adversarial objectives. Specifically, the operations in equation (38), which use the tower rule of expectations to establish an upper bound on the difference of Lagrangian functions, rely on the stochastic nature of the objective. On the other hand, in adversarial objective scenarios, we anticipate that the methodologies applied in Lemma 7.5 of [Hassani et al., 2017] could be used to go through those steps, but we will leave it as future work.

Second, we emphasize that in DR-submodular maximization, the stochastic and adversarial settings are distinctly separate scenarios, as we've detailed in lines 61-63 of the paper: "Note that this setting is still of interest because as opposed to assuming each arriving function $f_t$ being DR-submodular, we only assume the expectation of $f_t$ to possess DR-submodularity". Working with stochastic objectives holds equal significance to adversarial ones, given that many real-world applications are commonly framed within a stochastic framework, such as maximum coverage and budget allocation problems (described in the general response), among others.

When the constraints are selected adversarially, for either stochastic or adversarial objectives, there is a known hardness result: [Mannor et al., 2009] provided a simple counterexample showing that achieving sub-linear regret against the best fixed benchmark action in hindsight while maintaining sub-linear total constraint violation is not always possible. Subsequently, works addressing adversarial constraints alter the regret definition to consider only benchmarks satisfying the constraint proportionally over some window of size $W\in [T]$, e.g., [Sadeghi et al., 2020]. We have included a discussion of this point in Appendix G. It is not clear if our results in this paper could be extended to that setting.

  3. See weakness 1: what is the new challenge in your algorithm design and analysis?

We address technical contributions in Q1 of the general response above.

Comment

Thank you for your response. I generally agree with your clarifications, especially regarding the motivation for removing the $0\in\mathcal{K}$ assumption and the consideration of a stochastic constraint.

However, I respectfully disagree with your rebuttal concerning the "technical contribution, 2." While the $1/2$-approximation algorithm is indeed simple and efficient, I believe that its design and analysis do not introduce new ideas or require significant effort. Researchers might not have documented this algorithm because the $1/2$ approximation ratio is not particularly compelling. But I agree that Lemma 6 provides some new ideas. So I will raise my score to 5, and I recommend that you highlight the distinctions in your analysis compared to previous works in your next revision.

Comment

Thank you again for your time and feedback.

In the next revision we will more clearly highlight distinctions with prior works.

Researchers might not have documented this algorithm because the approximation ratio is not particularly compelling.

We would like to clarify that in our rebuttal, we meant our algorithms that handle long-term constraints (using a single query per round and yielding $O(\sqrt{T})$ constraint violation). We do not claim novelty for algorithm design in the simpler setting without long-term constraints.

The 1/2 ratio for monotone DR-submodular maximization (no long-term constraints) using a projected gradient method was shown by Hassani et al. [10]. (It appears it was not until Pedramfar et al. [24] that it was pointed out that all reported $(1-1/e)$ guarantees relied on $0\in \mathcal{K}$, or at least on the origin being queryable, while the 1/2 guarantee did not, leading to a conjecture that 1/2 may be the best achievable when $0\notin \mathcal{K}$ under semi-bandit or bandit feedback.)

Review
Rating: 6

The paper studied the problem of online monotone DR-submodular maximization subject to long-term stochastic constraints. In detail, it explored the stochastic DR-submodular maximization setting, where the objective functions are i.i.d. sampled from a distribution. Previous works considered adversarial objective functions at each step. Additionally, it also considered the semi-bandit feedback setting, where only the noisy function value $f_t(\mathbf{x}_t)$ and an unbiased gradient estimator of that point $\widetilde{\nabla} f_t(\mathbf{x}_t)$ can be observed. They proposed the first stochastic gradient ascent based algorithm in both the semi-bandit feedback setting and the first-order full-information setting.

Strengths

  • The paper introduced two interesting settings: stochastic utility and semi-bandit feedback, which are more general than those previously studied in the literature.

  • They proposed the first stochastic gradient ascent based algorithm under these novel settings. Their algorithms require only 1 gradient query per round, while all previous works require $\sqrt{T}$ queries per round. This makes their algorithms significantly more query-efficient compared to prior approaches. These algorithms were inspired by primal-dual updates in online convex optimization, and it was interesting to see this connection.

  • The paper is well written, with a clear and easy-to-follow presentation.

Weaknesses

  • The authors did not motivate the stochastic DR-submodular setting well. It is not explained why previous works used the adversarial setting and what the motivation is for relaxing this scenario.
  • It is not clear what the significance is of not requiring that 0 be in the constraint region.

Questions

  • What is the significance, for your algorithm compared to other algorithms, of handling search spaces that do not necessarily include 0? Could you provide more details?
  • Why do most prior works assume the adversarial choice of functions? Are there any example applications that would fall into the stochastic setting considered in this paper?

Limitations

Yes

Author Response

Thank you very much for your review. Here we address the questions in "Weakness" and "Questions" as follows.

Weaknesses

  • The authors did not motivate the stochastic DR-submodular setting well. It is not explained why previous works used the adversarial setting and what the motivation is for relaxing this scenario to the stochastic setting.

First, for objectives, we emphasize that in DR-submodular maximization, the stochastic and adversarial settings are distinctly separate scenarios, as we've detailed in lines 61-63 of the paper: "Note that this setting is still of interest because as opposed to assuming each arriving function $f_t$ being DR-submodular, we only assume the expectation of $f_t$ to possess DR-submodularity". Working with stochastic objectives holds equal significance to adversarial ones, given that many real-world applications are commonly framed within a stochastic framework, such as influence maximization and facility location problems (described in the general response), among others.

Second, when the constraints are selected adversarially, for either stochastic or adversarial objectives, there is a known hardness result: [Mannor et al., 2009] provided a simple counterexample showing that achieving sub-linear regret against the best fixed benchmark action in hindsight while maintaining sub-linear total constraint violation is not always possible. Subsequently, works addressing adversarial constraints alter the regret definition to consider only benchmarks satisfying the constraint proportionally over some window of size $W\in [T]$, e.g., [Sadeghi et al., 2020]. We have included a discussion of this point in Appendix G. It is not clear if our results in this paper could be extended to that setting.

  • It isn't clear what the significance is of not requiring that 0 be in the constraint region.

See response to the first question below.

Questions

  • What is the significance, for your algorithm compared to other algorithms, of handling search spaces that do not necessarily include 0? Could you provide more details?

In the offline DR-submodular maximization literature (without long-term constraints), there appears to be a significant hardness gap between $0\in \mathcal{K}$ and otherwise when only feasible points can be queried (e.g., with semi-bandit and bandit feedback). With $\mathcal{K}$ being a general convex set, the best-known approximation ratio is 1/2 [Hassani et al., 2017]. Whether this ratio is tight is not known, but 1/2 is conjectured to be the best achievable [Pedramfar et al., 2023]. With the assumption $0\in \mathcal{K}$, the best-known result is $1-1/e$ [Mokhtari et al., 2020].

In online scenarios, we note that even when $0\in \mathcal{K}$, there is no clear path to obtaining $(1-1/e)$-regret of only $O(\sqrt{T})$ in the semi-bandit setting, even without considering long-term constraints. When the origin is included, the best known $(1-1/e)$-regret in the semi-bandit setting without long-term constraints is $O(T^{3/4})$ [Pedramfar et al., 2023]. As indicated by Lemma 2, the $1-1/e$ approximation needs additional information (a query of the function value at $z\ast x$). One possibility is that when $0\in \mathcal{K}$, we could utilize an explore-then-commit style algorithm in the semi-bandit setting, where we alternately sample $z_t\ast x_t$ (giving estimates for $\nabla F$) and $x_t$ (giving the reward signal), to achieve the $1-1/e$ approximation ratio but with regret worse than $O(\sqrt{T})$. Table 1 in [Pedramfar et al., 2023] gives a great summary of regret results for online DR-submodular optimization.

In the final version of the paper, we will also include application examples that showcase the importance of general convex regions K\mathcal{K} for which the origin is not feasible. Please refer to Q2 in the general response above for those examples.

  • Why do most prior works assume the adversarial choice of functions? Are there any example applications that would fall into the stochastic setting considered in this paper?

Please see response to the first Weakness above.

Comment

Thank you for the response and answering my questions. As I am not an expert in the area, I will defer to the other reviewers regarding the theoretical / technical contributions and their novelty.

Author Response

We thank all the reviewers for carefully reading the paper and their constructive suggestions. In the following we address some common questions.

Regarding technical contribution.

Some reviewers have concerns about limited technical contribution. Here we emphasize two technical novelties:

  1. Lemma 6, which upper bounds the relaxed constraint violation ($\tilde{g}_t$) using the dual variables ($\lambda_t$), is completely new and is a critical result for deriving the $\mathcal{O}(\sqrt{T})$ constraint violation; a generic illustration of this mechanism is sketched after this list. Previous works did not have an analogous bound. Without this bound, we would only be able to obtain a worse $\mathcal{O}(T^{3/4})$ constraint violation. We obtained Lemma 6 by carefully digging into the updates of the $\lambda_t$'s and discovering some relations that we could leverage. We note that this lemma could be of independent interest to the online (non-)convex optimization community. For other papers dealing with the simpler setting of OCO that achieve better than $\mathcal{O}(T^{3/4})$ constraint violation, some special conditions (e.g., strong duality [Akhtar et al., 2021] or a strongly convex surrogate function [Yu et al., 2020]) are needed. However, when the objective is non-convex and such conditions are difficult to establish, our approach could potentially be used to obtain $\mathcal{O}(\sqrt{T})$ constraint violation; the DR-submodular objective is one such setting.

  2. Our algorithm design is simple, yet it remarkably outperforms existing benchmarks in both regret and constraint violation. We believe achieving superior guarantees with a simple algorithm is a strength of our work, not a weakness. Furthermore, this simplicity also translates into computational efficiency, as evidenced by the substantial reduction in the number of gradient oracle calls required by our algorithm (1 oracle query per round for our methods versus $O(\sqrt{T})$ for previous methods).
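To illustrate the role played by the dual variables (a generic argument under an assumed projected dual-ascent step, not the paper's actual update or Lemma 6):

$$
\lambda_{t+1} \;=\; \Bigl[\lambda_t + \eta\bigl(\tilde{g}_t(\mathbf{x}_t) - \tfrac{B_T}{T}\bigr)\Bigr]_{+},
\quad \lambda_1 = 0
\;\;\Longrightarrow\;\;
\lambda_{t+1} \;\ge\; \lambda_t + \eta\bigl(\tilde{g}_t(\mathbf{x}_t) - \tfrac{B_T}{T}\bigr)
\;\;\Longrightarrow\;\;
\sum_{t=1}^{T}\Bigl(\tilde{g}_t(\mathbf{x}_t) - \tfrac{B_T}{T}\Bigr) \;\le\; \frac{\lambda_{T+1}}{\eta}.
$$

In this schematic form, any bound on the dual iterates immediately bounds the cumulative (relaxed) constraint violation; a Lemma 6-type bound on the $\lambda_t$'s is what makes the sharper rate possible.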

We will emphasize these arguments in the final version of the paper.

Application examples where $0 \notin \mathcal{K}$.

We will discuss the following motivating application examples in the final paper to highlight the importance of general convex regions $\mathcal{K}$ for which the origin is not feasible. For such problems, the best-known approximation ratio is 1/2 [Hassani et al., 2017].

  1. Maximum Coverage [Krause et al., 2014]. Imagine a city's emergency management agency aiming to optimize the deployment of emergency response teams (firefighters, medical personnel, rescue teams) over many rounds during crises like major fires or earthquakes. The objective is to maximize coverage across different city zones. Allocating 0 teams to any zone is not feasible, as it means no emergency response. The goal is to allocate resources in each round to maximize overall response coverage, while satisfying the long-term agency budget.

  2. Budget Allocation [Soma et al., 2014]. This is a special case of the influence maximization problem. Let a bipartite graph $G = (S, T; W)$ represent a social network, where $S$ and $T$ are the collections of advertising channels and customers, respectively. The edge weight represents the influence probability of channel $s$ on customer $t$ (the standard form of the resulting objective is given after this list). The goal is to distribute the per-round budget among the source nodes so as to maximize the expected influence on the potential customers over multiple rounds, while satisfying a long-term constraint (e.g., money spent). Corporate management may require a minimum level of customer engagement with each campaign overall or from select target groups. There may also be per-round minimum contractual purchase requirements with the advertising partners. Thus, allocating 0 budget in any round may not be permitted.

  3. Facility Location [Krause et al., 2014]. Consider a scenario where a company needs to decide on the (virtual) locations at which to set up new service centers to maximize service coverage over multiple rounds, while satisfying a total budget constraint over all rounds. At each round, each customer segment must receive at least a certain level of service or coverage, which means 0 is not a feasible solution because opening no facilities provides no service.
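For the budget allocation example above, the standard continuous objective (e.g., as formulated in Soma et al. 2014 and its continuous extensions; stated here only for illustration) is

$$
f(\mathbf{x}) \;=\; \sum_{t \in T} \Bigl( 1 - \prod_{s \in S} \bigl(1 - p(s,t)\bigr)^{x_s} \Bigr),
$$

where $x_s$ is the budget allocated to channel $s$ and $p(s,t)$ is the influence probability of channel $s$ on customer $t$. This function is monotone DR-submodular in $\mathbf{x}\ge 0$, and per-round minimum purchase requirements then translate into a feasible region $\mathcal{K}$ that excludes the origin.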

All the problems above were initially studied in the discrete domain and extended to the continuous domain in [Bian et al., 2019]. Furthermore, when faced with a discrete objective, one can always use a "relax and round" strategy to transition from a discrete problem to a continuous one. Such techniques are frequently utilized within the submodular maximization community, as exemplified by the work of [Chen et al., 2018].

Additional References

Bian et al. Continuous Submodular Function Maximization. 2020.

Krause and Golovin. Submodular function maximization. In Tractability: Practical Approaches to Hard Problems. Cambridge University Press, 2014

Soma et al. Optimal budget allocation: Theoretical guarantee and efficient algorithm. ICML 2014.

Chen et al. Projection-Free Online Optimization with Stochastic Gradient: From Convexity to Submodularity. ICML 2018.

Final Decision

Overall, the reviewers appreciated this paper's results, but there were some concerns regarding the technical contribution and the comparison with prior work, which were cleared up during the author-reviewer discussion phase. The authors are encouraged to include the modifications suggested by the reviewers in the final version.