PaperHub
6.0/10
Poster · 5 reviewers
Ratings: 5, 4, 3, 1, 3 (lowest 1, highest 5, standard deviation 1.3)
ICML 2025

Implicit Riemannian Optimism with Applications to Min-Max Problems

OpenReview · PDF
Submitted: 2025-01-22 · Updated: 2025-07-24

Abstract

Keywords
Riemannian Optimization, Optimistic algorithms, Online Optimization, Min-Max optimization

Reviews and Discussion

Review (Rating: 5)

The paper addresses online optimization and min-max optimization on Hadamard manifolds. The authors propose a Riemannian optimistic online learning algorithm called RIOD, designed to match state-of-the-art Euclidean regret bounds while handling in-manifold constraints. Leveraging RIOD, they develop a min-max solver that likewise exhibits near-optimal rates and can enforce constraints.

Questions to Authors

No questions.

Claims and Evidence

The claims are well supported.

Methods and Evaluation Criteria

The paper addresses the previously incomplete story of optimistic methods on manifolds by introducing implicit updates that exactly preserve in-manifold constraints, resolving a limitation of prior works.

Theoretical Claims

The theorems and propositions are well posed and formally proved.

Experimental Design and Analysis

The experiments are well designed.

Supplementary Material

The appendix contains the necessary proofs.

Relation to Prior Work

Most prior works either avoided constraints or accepted “improper” iterates that stray from the feasible set. By contrast, RIOD ensures proper iterates with classical regret bounds.

Missing Important References

The references are well discussed.

Other Strengths and Weaknesses

The results in the paper are novel and strong. This work is a substantial step in unifying the benefits of optimistic online learning with the geometry of negatively curved manifolds.

Other Comments or Suggestions

No comments

Author Response

We want to thank the reviewer for the time taken reviewing our manuscript, as well as the appendix and the experimental section.

Review (Rating: 4)

This paper studies Riemannian online optimization as well as min-max optimization problems. The main contribution is to propose algorithms for these problems and provide theoretical analyses.

Questions to Authors

None.

Claims and Evidence

Yes.

Methods and Evaluation Criteria

Yes.

Theoretical Claims

I did not check the proofs in any detail.

Experimental Design and Analysis

There are no experiments, but I don't think they are necessary.

Supplementary Material

I lightly checked the proof of their online optimization analysis.

Relation to Prior Work

For online Riemannian optimization, the authors apparently remove a spurious curvature-dependent term from the upper bound in previous analyses. However, curvature-dependent terms re-appear in the time complexity of the inner steps of their algorithm. It appears that the main improvement relative to previous work is the removal of pre-existing assumptions that circularly linked the curvature and the step sizes, those being difficult to verify.

Missing Important References

None that I am aware of.

Other Strengths and Weaknesses

None.

Other Comments or Suggestions

None.

Author Response

For online Riemannian optimization, the authors apparently remove a spurious curvature-dependent term from the upper bound in previous analyses. However, curvature-dependent terms re-appear in the time complexity of the inner steps of their algorithm.

Note that in the online setting, the computational or gradient oracle complexity is a metric independent of the regret. That being said, in minmax and other applications, both metrics translate to computational or gradient oracle complexity, since they are multiplied to obtain the final complexity.
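
Schematically (the symbols here are illustrative, not the paper's notation):

$$\text{total gradient complexity} \;=\; \underbrace{T}_{\text{online rounds}} \;\times\; \underbrace{N_{\mathrm{inner}}}_{\text{oracle calls per subproblem}},$$

where, per the discussion below, $N_{\mathrm{inner}}$ depends on the curvature constant $\zeta$ only logarithmically for CRGD, but polynomially for PRGD.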

The implementation using our CRGD subroutine has an oracle complexity which does not introduce curvature terms, except in a mild logarithmic term. This showcases the power of CRGD when it comes to reducing the subproblem complexity. We provided the time complexity of PRGD as well for comparison, since it is a widely used and natural subroutine; that one does have curvature terms.

It appears that the main improvement relative to previous work is the removal of pre-existing assumptions that circularly linked the curvature and the step sizes, those being difficult to verify.

We would like to note that, beyond the contribution mentioned by the reviewer, we also make the following improvements:

  • Our method can explicitly handle bounded in-manifold constraints, which are interesting in their own right, since constraints are also critical for online optimization algorithms, even in the Euclidean setting.

  • We further obtain important new results in min-max optimization.

    • Our methods reduce the computational complexity with respect to other constrained methods, obtain a near-optimal gradient oracle complexity, and do not require knowledge of the initial distance to a solution $R$ or the Lipschitz constant $G$ (see Table 2).
    • It was previously unknown whether achieving our rate was possible: in fact, in g-convex and $L$-smooth minimization, nearly matching Euclidean upper bounds is impossible due to lower bounds implying an inherent hardness coming from the geometry, but we showed that minmax problems are of a different nature.

Review (Rating: 3)

This paper studies Riemannian online optimization and min-max optimization. The authors propose a novel online optimistic Riemannian optimization algorithm with inexact updates, which achieves optimal dynamic regret and also enforces in-manifold constraints. The authors then apply their online optimization method to Riemannian min-max optimization, obtaining either reduced complexity or the same rates without requiring knowledge of certain parameters.

Updated Review

I have read the rebuttal and the other reviews. I would like to thank the authors for their detailed feedback, which addresses most of my major concerns. Regarding the issue raised by Reviewer fAnn, I agree with the authors that online learning on Hadamard manifolds is a challenging problem, and there has been a long line of research in this area.

Questions to Authors

Please see above.

Claims and Evidence

The claims made in the submission are supported by clear and convincing evidence.

Methods and Evaluation Criteria

The evaluation criteria are dynamic regret for online optimization and convergence rate for min-max optimization, which are standard.

Theoretical Claims

All theoretical claims are accompanied by proofs.

Experimental Design and Analysis

This is a theory paper and does not have experiments.

Supplementary Material

I have checked the proof in general.

Relation to Prior Work

This paper did a good job on discussing and comparing with previous work. I particularly appreciate the two tables on page 4, which clearly present all the convergence results and conditions for existing methods.

Missing Important References

N/A

Other Strengths and Weaknesses

Strengths:

The major advantage of the proposed method is that it is the first algorithm for minimizing the dynamic regret on Riemannian manifolds that can enforce in-manifold constraints, which is a significant and important contribution to this area. In previous work, such as [Hu et al. (2023a)], the algorithm can only minimize "improper" dynamic regret, that is, the decision set is larger than the competitor's. This is not very satisfying, as it means that the learner can pick decisions outside of the constrained set. This paper successfully solves this problem with novel inexact updates.
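
To make the "proper vs. improper" distinction concrete (notation here is assumed, not the paper's own): with losses $\ell_t$ and a comparator sequence $u_1, \dots, u_T$ in the constraint set $\mathcal{X}$, the dynamic regret is

$$\mathrm{Reg}_T^{\mathrm{dyn}} \;=\; \sum_{t=1}^{T} \ell_t(x_t) \;-\; \sum_{t=1}^{T} \ell_t(u_t),$$

and a guarantee is "proper" when the learner's iterates satisfy $x_t \in \mathcal{X}$, while "improper" guarantees only ensure $x_t \in \mathcal{X}' \supsetneq \mathcal{X}$ for some larger decision set $\mathcal{X}'$.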

Moreover, the authors applied their optimistic methods to Riemannian min-max optimization, which improves on prior works by either reducing the complexity or not requiring knowledge of certain parameters.

Weaknesses:

My major concern about the proposed methods is about the computational complexity.

In [Hu et al. (2023a)] and [Wang et al. (2023b)], the algorithms are usually iterative, that is, to update the decision, the algorithm only needs to query the Riemannian gradients once and commit the update. However, in this paper (i.e., lines 4 and 6 of Algorithm 1), to implement the proposed algorithm we need to solve two optimization problems at each round, which requires multiple queries of the gradients; this makes the setting not truly "online" and also introduces extra dependence on $L$ and $\zeta$. Finally, Corollary 2 says that the number of inner iterations depends on $\eta$, so I am not sure how to set it if $\eta$ is not known (since $\eta = 1/\sqrt{V_T}$ and $V_T$ is not known).

Since in the sub-problems the number of iterations depends on $\zeta$, I think the final "real" regret should also depend on $\zeta$, as the total number of iterations is not $T$ (but $T$ times the number of inner iterations).

Other Comments or Suggestions

Please see above.

Author Response

this paper does not have experiments.

We note that as we stated in the paper, we did provide experiments in Appendix E to validate our theory, with an implementation of our method in a constrained problem setting, for an application that only one other method (RAMMA, an impractical method) could tackle before.

to implement the proposed algorithm we need to solve two optimization problems at each round, which requires multiple queries of the gradients, and it makes the setting not truly "online"

The action $\tilde{x}_t$ which RIOD chooses at time $t$ is computed based on an implicit step on the hint function $\tilde{\ell}_t$ and the loss from the previous round $\ell_{t-1}$ (this dependence is implicit via the iterate $x_t$). RIOD does not require knowledge of the loss function $\ell_t$ before choosing its action. Therefore, our algorithm fits in the classical online optimization setting, which assumes that after choosing an action the agent observes the full loss function $\ell_{t-1}$ and does not specify what the agent does with it: querying the gradient more than once is always permitted.
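
For intuition, a simplified schematic of such an implicit (proximal) step — the exact objective in Algorithm 1 may differ (cf. the corrected OFTRL expression later in this thread, which also accumulates past losses):

$$\tilde{x}_t \;\approx\; \operatorname*{arg\,min}_{x \in \mathcal{X}} \left\{ \tilde{\ell}_t(x) + \frac{1}{2\eta}\, d(x_t, x)^2 \right\},$$

where $\mathcal{X}$ is the in-manifold constraint set, $d$ is the geodesic distance, and the arg min is only computed inexactly by the subroutine (CRGD or PRGD); everything on the right-hand side is known before $\ell_t$ is revealed.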

Note that one of the previous optimistic methods in Riemannian optimization Hu et al. (2023) also queries the gradient more than once per round.

In fact, in many other online learning settings, exact minimization of the previous loss plus possibly some regularizer is assumed, and other online learning papers have studied implicit updates where several gradients are taken (Kivinen & Warmuth, 1997; Campolongo & Orabona, 2020; Chen & Orabona, 2023), with some advantages over explicit methods. Note, however, that no prior work obtained an optimistic online learning algorithm using implicit updates.

However, in this paper (i.e., lines 4 and 6 of Algorithm 1), to implement the proposed algorithm we need to solve two optimization problems at each round, which [...] [introduces] extra dependence on $L$ and $\zeta$

The error criteria of the updates in lines 4 and 6 of Algorithm 1 depend on $L$ and $\zeta$, but this only introduces a logarithmic dependence when using CRGD. In any case, the regret metric is independent of the gradient complexity, so this does not imply a dependence on $L$ and $\zeta$ in the regret bounds, not even a logarithmic one.

Finally, in Corollary 2, it says that the number of inner iterations depends on $\eta$, so I am not sure how to set it if $\eta$ is not known (since $\eta = 1/\sqrt{V_T}$ and $V_T$ is not known).

In general, RIOD works for any $\eta > 0$, in contrast to the previous explicit optimistic methods, which require $\eta \lesssim 1/L$ (and possibly some geometric terms in the Riemannian setting) in order to show the regret bound matching that of OOMD (Rakhlin et al., 2013). As in Rakhlin et al. (2013), in this work we do not focus on adapting to the best value of $\eta$. But we note that (A) we did not require this to obtain optimal convergence for smooth g-convex g-concave optimization, and (B) in the general case, when $V_T$ is not known in advance, we can follow the method laid out in Theorem 3 of Zhao et al. (2020), which is based on an expert meta-algorithm and does not depend on the type of algorithm being run. This method was shown to work in the Riemannian setting in Hu et al. (2023).
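
To illustrate the expert meta-algorithm recipe for unknown $V_T$, here is a minimal Euclidean sketch in the spirit of Zhao et al. (2020, Theorem 3): run copies of a base learner on a geometric grid of step sizes and aggregate them with Hedge. All names and the toy learner below are hypothetical; the Riemannian version (Hu et al., 2023) combines expert actions geodesically rather than by averaging.

```python
import numpy as np

class OGDExpert:
    """Toy base learner: online projected gradient descent on a Euclidean ball."""
    def __init__(self, eta, dim, radius=1.0):
        self.eta, self.radius = eta, radius
        self.x = np.zeros(dim)

    def predict(self):
        return self.x

    def update(self, grad):
        self.x = self.x - self.eta * grad(self.x)
        norm = np.linalg.norm(self.x)
        if norm > self.radius:            # project back onto the feasible ball
            self.x *= self.radius / norm

def hedge_over_step_sizes(losses, grads, dim, T):
    etas = [2.0**k / np.sqrt(T) for k in range(-3, 4)]   # geometric grid of step sizes
    experts = [OGDExpert(eta, dim) for eta in etas]
    w = np.ones(len(etas)) / len(etas)                   # Hedge weights over experts
    meta_lr = np.sqrt(np.log(len(etas)) / T)             # meta learning rate
    total_loss = 0.0
    for loss, grad in zip(losses, grads):
        xs = [e.predict() for e in experts]
        x_t = sum(wi * xi for wi, xi in zip(w, xs))      # Euclidean mixing of actions
        total_loss += loss(x_t)
        ell = np.array([loss(x) for x in xs])
        w = w * np.exp(-meta_lr * (ell - ell.min()))     # stabilized Hedge update
        w /= w.sum()
        for e in experts:
            e.update(grad)
    return total_loss

# Usage: quadratic losses with drifting minimizers (hypothetical toy data).
T, dim = 200, 5
rng = np.random.default_rng(0)
targets = [rng.normal(size=dim) * 0.1 for _ in range(T)]
losses = [lambda x, c=c: 0.5 * np.sum((x - c)**2) for c in targets]
grads = [lambda x, c=c: x - c for c in targets]
print(hedge_over_step_sizes(losses, grads, dim, T))
```

The point of the construction is that no single $\eta$ needs to be tuned to the unknown $V_T$; the meta-algorithm pays only a logarithmic overhead in the number of experts.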

Since in the sub-problems the number of iterations depends on $\zeta$, I think the final "real" regret should also depend on $\zeta$, as the total number of iterations is not $T$ (but $T$ times the number of inner iterations).

In the online setting, the computational or gradient complexity is a metric independent of the regret, so there is no extra factor. But indeed, in the minmax problem both metrics multiply to give the total gradient complexity, yielding our rates in Table 2, which contain a dependence on $\zeta$ for PRGD or RGD but only a logarithmic factor for CRGD. This showcases the power of CRGD when it comes to reducing the subproblem complexity.

We appreciate that the reviewer found the paper well-written and valued our results in Riemannian online learning and minmax problems, overcoming several difficulties in existing work. If you now have a better opinion of our work, we kindly ask you to consider increasing your score.

Review (Rating: 1)

In this paper the authors consider Riemannian optimism on Hadamard manifolds.

Questions to Authors

NA

Claims and Evidence

Theoretical paper, no evidence presented.

Methods and Evaluation Criteria

None present.

Theoretical Claims

Yes, they were briefly checked.

Experimental Design and Analysis

None.

Supplementary Material

Did not consider.

Relation to Prior Work

NA

Missing Important References

NA

Other Strengths and Weaknesses

The authors present it as a strength that their concentration bounds no longer rely on the curvature of the space; however, Hadamard manifolds have non-positive curvature. The issue I see with this is that positively curved spaces are, in a sense, the challenging manifolds. For instance, in the estimation of the Karcher mean there are strong limitations on positively curved spaces, but on non-positively curved spaces there are no such issues. Generally speaking, non-positively curved manifolds behave very similarly to Euclidean spaces, and thus I don't see the results of this paper as surprising. Lastly, the main paper didn't have any experimental or simulation results, which I see as a huge limitation.
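(For reference, the Karcher mean mentioned here is the Fréchet mean: given points $p_1, \dots, p_n$ on a manifold $\mathcal{M}$ with geodesic distance $d$,

$$\bar{x} \;=\; \operatorname*{arg\,min}_{x \in \mathcal{M}} \; \frac{1}{2n} \sum_{i=1}^{n} d(x, p_i)^2,$$

which on a Hadamard manifold exists and is unique, since each $d(\cdot, p_i)^2$ is g-convex, whereas on positively curved spaces existence and uniqueness require the data to lie in a sufficiently small ball.)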

Other Comments or Suggestions

NA

Author Response

Experimental results

We note that, as stated in the paper, we did provide experiments in Appendix E with an implementation of our method in a constrained problem setting, an application that only one other method (RAMMA, an impractical method) could tackle before, showing that it is now possible to efficiently solve such problems.

The Hadamard assumption is standard in the field and quite general (accelerated minimization algorithms [1], lower bounds [2], [3], online learning [4], [5], [6], among many others [7], [8], [9], [10], [11], [12]).

The Hadamard case is challenging enough: note that, besides the other optimization problems listed above, 7 papers before ours studied our minmax problem, and none of them could achieve our parameter-freeness, handle constraints, or achieve the optimal convergence rates. Similarly, in the online learning case, prior work could not achieve the optimal regret, while at the same time relying on unreasonable assumptions, like a circular link between the step sizes and where the iterates lie (requiring a specific bound on the iterates a priori).

Review (Rating: 3)

This paper proposes implicit Riemannian optimistic online methods for problems with g-convex and g-concave objectives, accommodating in-manifold constraints and matching state-of-the-art Euclidean rates. The analysis removes the dependence on geometric constants, which, in some sense, closes the gap between the Riemannian problem and its Euclidean counterpart. The principle is applied to the (Riemannian) minmax problem and obtains new results. Experiments on the Karcher mean problem demonstrate the effectiveness of the proposed methods.

Questions to Authors

  1. One of the main contributions emphasized by the authors is the removal of the dependence on geometric constants (e.g., $\zeta$) from complexity bounds. Could the authors provide intuitive explanations regarding why the proposed framework successfully circumvents this geometric hurdle, whereas previous methods have not?
  2. Corollary 2 reveals that implementing the proposed framework with PRGD still introduces dependence on the geometric constant $\zeta$, while using CRGD does not. This raises a natural question: is the removal of geometric constants primarily due to the choice of the optimization subroutine (CRGD) rather than the proposed framework (RIOD) itself? Clarifying this point would help better highlight the true sources of improvement in the paper.
  3. Adopting CRGD as a subroutine removes the dependence on $\zeta$. However, implementing the CRGD update steps might introduce additional computational complexity in practice. Therefore, it would be valuable to experimentally implement and compare the CRGD-based version of the algorithm, rather than limiting the discussion solely to theoretical advantages.
  4. In line 225 (right column), where the authors explain OFTRL algorithms without linearizing the loss functions, the statement "minimizing $\sum_{i=1}^t \ell_i(x_i) + \tilde{\ell}_t(x_t) + \frac{1}{2\eta} d(x_t, x)^2$" seems confusing in that the variable $x$ only appears in the last term of the objective.
  5. The comparison presented in Table 2 clearly outlines the superiority of this work over existing methods. However, in Appendix E, the implementation only includes the proposed approach. It would be insightful to experimentally compare the proposed methods with some of the algorithms listed in Table 2.

I would like to increase my grade if the concerns are addressed.

Claims and Evidence

The paper is well-written, and the presentation of Table 1 and Table 2 is detailed, clarifying contributions to Riemannian optimization, online learning, and minmax problem.

Methods and Evaluation Criteria

The extension to the minmax problem is interesting, which broadens the work's applicability in the machine learning community.

Theoretical Claims

The theory is solid, and the method (RIOD), with the min-max extension (RIODA), is shown to yield good regret and oracle-complexity bounds that overcome difficulties in existing work.

Experimental Design and Analysis

The experiments provided in this paper are limited exclusively to the minmax setting. However, since the original motivation of this work is to address the Riemannian online learning problem, the minmax experiments should serve as an illustrative application rather than the primary experimental validation. Expanding the experimental section to assess performance in online learning settings would significantly strengthen the paper and better align with its stated purpose.

Supplementary Material

The appendices are well-organized and easy to follow.

Relation to Prior Work

The extension to the Riemannian minmax problem is interesting, which broadens the application of Riemannian optimization.

Missing Important References

It seems that all the highly related references are discussed.

Other Strengths and Weaknesses

See above.

Other Comments or Suggestions

See questions.

Author Response

Experiments in online learning settings would strengthen the paper

We note that the online learning setting assumes an adversarial agent generating losses against our algorithm. This means that, in order to empirically validate the bound, we would need to craft a worst-case adversary to play against our algorithm. This is not only impractical but also undesirable, since the online learning framework guarantees that we perform well with respect to the best fixed action, and one also wants to see whether the algorithm performs well in favorable scenarios.

In the literature, typical ways of validating online learning bounds in practice use settings where losses are unpredictable, just like the landscape of min-max optimization losses.

One of the main contributions emphasized by the authors is the removal of dependence on geometric constants (e.g., $\zeta$) from complexity bounds. Could the authors provide intuitive explanations regarding why the proposed framework successfully circumvents this geometric hurdle, whereas previous methods have not?

There are several independent factors that made our final result possible. We found that (A) implicit (proximal) approaches make the outer loop of our algorithms free of these constants when used with optimism, as opposed to most prior work, which uses explicit methods and has to rely on Riemannian cosine-law inequalities that introduce $\zeta$ factors (Appendix D.1). Then (B), achieving such implicit optimism requires a careful structure where losses are not linearized, since the linearizations are not g-convex (unlike in the Euclidean space). Finally, (C) the use of a subroutine such as CRGD, as opposed to the most commonly used Riemannian gradient descent ones, makes the inner problem also free of these constants, up to a log factor.
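
To unpack point (B): in the Euclidean case, replacing each loss by its linearization preserves convexity, whereas the natural Riemannian analogue via the inverse exponential map generally does not preserve g-convexity (a schematic; the notation is assumed, not the paper's):

$$\ell_i(x) \;\longrightarrow\; \ell_i(x_i) + \langle \nabla \ell_i(x_i),\, x - x_i \rangle \ \text{(affine, hence convex)}, \qquad \ell_i(x) \;\longrightarrow\; \ell_i(x_i) + \big\langle \operatorname{grad} \ell_i(x_i),\, \mathrm{Exp}_{x_i}^{-1}(x) \big\rangle_{x_i} \ \text{(not g-convex in general)},$$

which is why the implicit scheme keeps the losses un-linearized.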

The comparison presented in Table 2 clearly outlines the superiority of this work over existing methods. It would be insightful to experimentally compare the CRGD-based version and other proposed methods with some of the algorithms listed in Table 2.

We focused on showcasing that we can effectively optimize problems with in-manifold constraints, which was not possible before, except with RAMMA in Martínez-Rubio et al. (2023). While this method is of theoretical interest, as it establishes new upper bounds, it is highly complex and consists of 5 loops, making it impractical.

In line 225 (right column)...

Thanks for pointing this typo out; it should have been $\sum_{i=1}^{t}\ell_i(x) + \tilde{\ell}_t(x) + \frac{1}{2\eta}\, d(x_t, x)^2$.

Corollary 2 reveals that implementing the proposed framework with PRGD still introduces dependence on the geometric constant while using CRGD does not. Is the removal of geometric constants primarily due to the choice of the optimization subroutine (CRGD) rather than the proposed framework (RIOD) itself?

Note that in the online setting, the regret depends only on the cumulative cost $\sum_{t=1}^T \ell_t(x_t)$ paid for the agent's actions, and is independent of the oracle complexity. Hence, the improved regret guarantees (Table 1) are independent of the oracle complexity of the optimization subroutine used to compute the actions.

The improvement in terms of the regret is achieved via our inexact implicit optimistic technique. Corollary 2 only discusses the oracle complexity of solving the subproblems up to the specified precision, which is orthogonal to the regret. Removing the geometric constant in the subroutine is mostly due to CRGD. On the other hand, when applying our method to optimization, the complexities do multiply, making it necessary to be free of these constants at both levels; CRGD only adds a log factor.

The design of the subroutine with our implicit approach has other advantages orthogonal to the use of CRGD or any other subroutine, such as removing the need to know the initial distance or the Lipschitz constant a priori (see Table 2).

We appreciate that the reviewer found the paper well-written and valued our results in Riemannian online learning and minmax problems, overcoming several difficulties in existing work. If you now have a better opinion of our work, we kindly ask you to consider increasing your score.

Reviewer Comment

Thanks for the explanation and clarification. I would keep my original score.

Final Decision

All the reviewers are appreciative of the work and believe that the paper presents interesting insights for enforcing the manifold constraint. The proposed updates are definitely novel. A concern which has been raised is the lack of experiments.