Boosting Perturbed Gradient Ascent for Last-Iterate Convergence in Games
This paper proposes a novel payoff-perturbed gradient ascent algorithm that achieves fast last-iterate convergence rates.
Abstract
Reviews and Discussion
This paper studies last-iterate convergence rates of online learning in monotone games. The main contribution is an algorithm called Gradient Ascent with Boosting Payoff Perturbation (GABP). The GABP algorithm achieves (1) a near-optimal last-iterate convergence rate with full gradient feedback; and (2) an O(1/T^{1/7}) last-iterate convergence rate with noisy gradient feedback (the noise is zero-mean with bounded variance). The latter result improves the prior result of Abe et al. [2024]. Moreover, the GABP algorithm guarantees an individual dynamic regret bound under full gradient feedback that is slightly worse than the state-of-the-art bound. This paper also contains numerical experiments on small game instances to demonstrate the effectiveness of GABP.
Strengths
The problem of last-iterate convergence rates of no-regret learning algorithms in monotone games is relevant and interesting. Most existing results focus on full gradient feedback, while only a few provide concrete convergence rates under noisy gradient or bandit feedback. The proposed GABP algorithm has a near-optimal last-iterate convergence rate under full gradient feedback. It also improves the best known last-iterate convergence rate under noisy gradient feedback. This is a solid contribution to learning in games, although the rate for the noisy gradient feedback setting may not be tight.
Weaknesses
- The proposed GABP algorithm does not achieve the optimal last-iterate convergence rate under full gradient feedback. The last-iterate convergence rate is also not tight under noisy feedback.
- The relationship between the proposed GABP algorithm and the AOG algorithm in [1], and the intuition behind the fast last-iterate convergence rates, are not clearly discussed. These two algorithms are different (as shown in Appendix F) but share similar ideas. The anchoring term in both algorithms comes from the (implicit) Halpern iteration algorithm, which cannot be run directly. The difference is that GABP views each step of Halpern iteration as a fixed-point problem (Line 170) and uses an inner loop to compute an approximate solution (this is called updating the reference strategy in the paper); in contrast, AOG directly uses optimism to approximate the implicit update (a schematic contrast of the two schemes is sketched after the reference below). This leads to GABP being a log factor slower than AOG in the full gradient setting. However, the fixed-point approximation approach is more robust in the noisy gradient setting due to strong monotonicity. Moreover, the potential function and the approximately non-increasing potential analysis are very similar to those used in [1]. If they are inspired by [1], this should be acknowledged.
[1] Doubly Optimal No-Regret Learning in Monotone Games, Cai and Zheng, ICML 2023.
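To make the contrast described above concrete, here is a minimal schematic of the two ways to approximate an implicit Halpern step. This is our own illustrative sketch, not the pseudocode of either paper: `F` stands for the monotone gradient operator of the game, and the step sizes, anchoring weight `beta`, and inner iteration count are placeholder choices.

```python
import numpy as np

def halpern_step_inner_loop(F, z, anchor, beta, eta, n_inner=50, eta_in=0.05):
    """GABP-style (schematic): treat the implicit Halpern step
        z* = beta * anchor + (1 - beta) * (z* - eta * F(z*))
    as a fixed-point problem. Rearranging, z* solves the perturbed problem
        F(x) + mu * (x - anchor) = 0,  with mu = beta / ((1 - beta) * eta),
    which is strongly monotone, so plain gradient steps converge linearly;
    the inner loop is what costs the extra log factor."""
    mu = beta / ((1.0 - beta) * eta)
    x = z.copy()
    for _ in range(n_inner):
        x = x - eta_in * (F(x) + mu * (x - anchor))
    return x

def halpern_step_optimistic(F, z, F_prev, anchor, beta, eta):
    """AOG-style (schematic): approximate the implicit update in a single
    step via optimism, using 2*F(z) - F(z_prev) as a predictor of F(z*)."""
    Fz = F(z)
    return beta * anchor + (1.0 - beta) * (z - eta * (2.0 * Fz - F_prev)), Fz

# Toy usage: bilinear game min_x max_y x^T A y with z = (x, y) and
# monotone operator F(z) = (A y, -A^T x).
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
def F(z):
    x, y = z[:2], z[2:]
    return np.concatenate([A @ y, -A.T @ x])

z0 = np.ones(4)
z1 = halpern_step_inner_loop(F, z0, anchor=z0, beta=0.5, eta=0.1)
```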
Questions
- Could you explain Equation (5) and the related discussion? This inequality does not seem consistent with the text above it; in particular, a superscript that appears in the surrounding text is missing from one of the terms in the inequality.
- What is the intuition behind the convergence rate in the noisy gradient setting? In your opinion, what is the best possible rate that can be achieved by the current approach?
Limitations
N/A
We are deeply appreciative of your positive feedback and constructive comments, especially your invaluable suggestions on improving our presentation. We will incorporate your feedback to improve the presentation. The detailed answers to each of the questions can be found below.
Weakness and Question 1
The proposed GABP algorithm does not achieve the optimal last-iterate convergence rate under full gradient feedback. The last-iterate convergence rate is also not tight under noisy feedback.
What is the intuition behind the convergence rate in the noisy gradient setting? In your opinion, what is the best possible rate that can be achieved by the current approach?
Answer
As you pointed out, whether our convergence rate in the noisy feedback setting is tight remains open, given the absence of existing lower bounds or faster rates than ours. Nevertheless, we anticipate that the rate we have achieved in the noisy feedback setting may not be optimal. This is largely because we derived the relevant upper bounds somewhat loosely via the Cauchy–Schwarz inequality in the proof of Lemma B.2. If we can tighten these bounds, it could lead to an improved convergence rate in the noisy feedback setting for the same algorithm.
Weakness and Question 2
The relationship between the proposed GABP algorithm and the AOG algorithm in [1], and the intuition behind the fast last-iterate convergence rates, are not clearly discussed. These two algorithms are different (as shown in Appendix F) but share similar ideas.
Moreover, the potential function and the approximately non-increasing potential analysis are very similar to those used in [1]. If they are inspired by [1], this should be acknowledged.
Answer
We agree that GABP can be viewed as an approximation of Halpern iteration, although its update scheme differs from that of the AOG algorithm. From the perspective of theoretical guarantees, our potential function (in Eq. (8)) is indeed inspired by the one used in [1], with slight variations. We will elucidate the detailed relationship between our study and [1] in the revised manuscript.
Weakness and Question 3
Could you explain Equation (5) and the related discussion? This inequality does not seem consistent with the text above it; in particular, a superscript that appears in the surrounding text is missing from one of the terms in the inequality.
Answer
Thank you for pointing out the typo in Equation (5)! As you noted, one of the terms should carry the missing superscript. We will correct this in the revised version of the manuscript.
I acknowledge that I have read the rebuttal. I will keep my current score.
The paper introduces a novel algorithmic approach to enhance the convergence of first-order methods in monotone games. The authors propose a payoff perturbation technique that introduces strong convexity into players' payoff functions, which is crucial for achieving last-iterate convergence. The technique is designed in particular for scenarios where the gradient of the payoff functions is monotone and potentially noisy. The paper presents a method called Gradient Ascent with Boosting Payoff Perturbation (GABP), which incorporates a unique perturbation into the payoff function and maintains a periodically re-initialized anchoring strategy. The authors demonstrate that GABP offers faster last-iterate convergence rates than existing algorithms, even in the presence of additive noise.
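For intuition about the perturbation described above, a common form of such a payoff perturbation (a sketch of the general idea only; the paper's exact definition, including the choice of distance function and the weight $\mu$, may differ) is:

```latex
% Player i's perturbed payoff: the original payoff u_i minus a squared
% distance to an anchoring (reference) strategy \sigma_i, which makes the
% perturbed payoff \mu-strongly concave in player i's own strategy \pi_i:
u_i^{\mu}(\pi_i, \pi_{-i})
  = u_i(\pi_i, \pi_{-i}) - \frac{\mu}{2}\,\lVert \pi_i - \sigma_i \rVert^2,
\qquad
\nabla_{\pi_i} u_i^{\mu}(\pi)
  = \nabla_{\pi_i} u_i(\pi) - \mu\,(\pi_i - \sigma_i).
```

Each player then runs gradient ascent on the perturbed payoff, and the anchor $\sigma_i$ is periodically re-initialized, matching the anchoring behavior described in the summary above.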
Strengths
Originality: The paper presents a unique perturbation technique that addresses the challenge of last-iterate convergence in monotone games. The proposed GABP algorithm is an innovative modification of Adaptively Perturbed Mirror Descent (APMD), offering improved convergence rates.
Quality: The theoretical development is thorough, with rigorous proofs provided for the convergence rates of GABP in both full and noisy feedback settings. The paper also includes a detailed analysis of the algorithm's performance in terms of individual regret.
Clarity: The paper is well-organized, with clear explanations of the algorithm, theoretical results, and experimental setup. The use of pseudo-code for GABP aids in understanding the algorithm's implementation.
Significance: The work contributes to the field of online learning in games, providing a solution that is particularly relevant for applications such as Generative Adversarial Networks (GANs) and large language model fine-tuning, where last-iterate convergence is desirable.
Weaknesses
Experimental Validation: While the paper provides empirical results, the experiments could be expanded to include a broader range of game types and noise levels to further validate the robustness and generalizability of GABP.
Comparison with State-of-the-Art: The paper compares GABP with APMD and Optimistic Gradient Ascent (OGA) but could benefit from a more comprehensive comparison with other existing methods in the literature to better situate its contributions.
Practical Considerations: While the paper addresses the theoretical aspects of GABP, it could provide more insights into practical considerations, such as the implementation challenges and potential modifications needed for real-world applications.
Questions
None
Limitations
None
We thank you for your positive feedback and constructive comments. The detailed answers to each of the questions can be found below.
Weakness 1
Experimental Validation: While the paper provides empirical results, the experiments could be expanded to include a broader range of game types and noise levels to further validate the robustness and generalizability of GABP.
Answer
We would like to note that the main contribution of this paper lies in its theoretical aspects, and the experimental validation serves as supporting evidence. Indeed, most existing studies on learning in monotone games either lack experimental results entirely or focus solely on experiments with zero-sum games at a single noise level. That being said, if this remains a concern, we would be happy to conduct additional experiments.
Weakness 2
Comparison with State-of-the-Art: The paper compares GABP with APMD and Optimistic Gradient Ascent (OGA) but could benefit from a more comprehensive comparison with other existing methods in the literature to better situate its contributions.
Answer
We would argue that OGA is a representative baseline that ensures last-iterate convergence under full feedback. Among payoff-perturbed algorithms, there are some highly regarded ones, such as Sokota et al. [2023] and Liu et al. [2023], although they have no theoretical guarantees under noisy feedback. In contrast, we are the first to provide a theoretical guarantee for last-iterate convergence under noisy feedback.
Weakness 3
Practical Considerations: While the paper addresses the theoretical aspects of GABP, it could provide more insights into practical considerations, such as the implementation challenges and potential modifications needed for real-world applications.
Answer
We believe that our technique could be applied to real-world applications due to its simplicity. While this is beyond the scope of this paper, as you point out, our perturbation approach could help solve large-scale imperfect-information games [Perolat et al., 2022] and minimax optimization for preference learning with LLMs [Munos et al., 2023], especially because our technique remains effective even when training deep neural network architectures. We plan to extend our algorithm in a follow-up paper.
- Munos, Rémi, et al. "Nash learning from human feedback." 2023.
- Perolat, Julien, et al. "Mastering the game of Stratego with model-free multiagent reinforcement learning." 2022.
This work focuses on last-iterate convergence of game dynamics. A payoff perturbation technique is proposed that adds strong convexity to players' payoff functions. Although this is a well-studied technique for learning in repeated games with first-order methods, especially for last-iterate convergence, the novel perturbation scheme introduced in this paper provides faster last-iterate convergence than previous works.
Strengths
This paper provides a relatively complete set of results, covering last-iterate convergence rates of the proposed algorithm GABP under both full feedback and noisy feedback. The faster convergence rates are an improvement over existing works. Apart from some weaknesses (stated later), the presentation of the paper is clear and easy to understand. To the best of my knowledge, the authors have reviewed most related works, so the claimed contributions are easy to follow. In addition to the theoretical results, the paper provides experiments (sufficient in my opinion) comparing GABP with existing algorithms such as Adaptively Perturbed Gradient Ascent and Optimistic Gradient Ascent.
Weaknesses
One obvious way to improve the presentation is the following. The games considered in this paper are motivated by real-life examples, but the authors give only one example motivating monotone games. Part of the paper's claimed contribution is the study of two feedback models, full feedback and noisy feedback, yet there are no specific examples or applications illustrating the importance of these settings. Readers can of course find related works just by searching for the keywords, but providing concrete application scenarios, where the payoff gradient can be observed either perfectly or only imperfectly, is important, especially since "noisy feedback" can serve as a model for many cases.
Questions
Noisy feedback has been studied in some previous works, e.g., Mertikopoulos and Zhou [2019] and Hsieh et al. [2019]. What are the technical differences/challenges of this paper compared to these works?
Limitations
This is a theory-based paper and poses no potential negative impact.
We thank you for your positive feedback and constructive comments. The detailed answers to each of the questions can be found below.
Weakness 1
The games considered in this paper are motivated by real-life examples, but the authors give only one example motivating monotone games.
Answer
As elaborated in the introduction, monotone games cover a wide range of applications beyond convex-concave games, such as Cournot competition, two-player zero-sum imperfect-information games, and zero-sum polymatrix games. For instance, Munos et al. [2023] and Perolat et al. [2022] have demonstrated that perturbing the payoff function is effective for equilibrium learning in large-scale imperfect-information games, as well as for minimax optimization for preference learning with large language models. We believe these examples highlight the adaptability and practicality of our technique in real-world applications.
- Munos, Rémi, et al. "Nash learning from human feedback." 2023.
- Perolat, Julien, et al. "Mastering the game of Stratego with model-free multiagent reinforcement learning." 2022.
Weakness 2
Part of the paper's claimed contribution is the study of two feedback models, full feedback and noisy feedback, yet there are no specific examples or applications illustrating the importance of these settings. Readers can of course find related works just by searching for the keywords, but providing concrete application scenarios, where the payoff gradient can be observed either perfectly or only imperfectly, is important, especially since "noisy feedback" can serve as a model for many cases.
Answer
The full feedback setting serves as an ideal and significant benchmark for evaluating algorithms. We would argue that the noisy feedback setting is more practical, as feedback or observations in real-world scenarios often fluctuate. For instance, when training neural networks or learning in large-scale imperfect-information games, it becomes necessary to estimate the gradient from sampled data. This process inevitably prevents the observation of a perfect gradient vector.
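As a toy illustration of this distinction (our own example, not taken from the paper): in a matrix game, the row player's exact payoff gradient is $Ay$, while an estimate built from the opponent's sampled actions is unbiased but noisy, with zero-mean error and variance that shrinks as the sample size grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def full_feedback(A, y):
    """Exact payoff gradient for the row player in a matrix game: A @ y."""
    return A @ y

def noisy_feedback(A, y, n_samples=10):
    """Noisy feedback: estimate A @ y from actions sampled from the
    opponent's mixed strategy y. The estimate is unbiased (the noise is
    zero-mean) with variance shrinking as n_samples grows."""
    actions = rng.choice(len(y), size=n_samples, p=y)
    return A[:, actions].mean(axis=1)

A = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([0.3, 0.7])
print(full_feedback(A, y))   # exact gradient: [0.3, 0.7]
print(noisy_feedback(A, y))  # sample-based, noisy estimate
```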
Question 1
Noisy feedback has been studied in some previous works, e.g., Mertikopoulos and Zhou [2019] and Hsieh et al. [2019]. What are the technical differences/challenges of this paper compared to these works?
Answer
While Hsieh et al. [2019] and Mertikopoulos and Zhou [2019] assume strict variational stability or strong monotonicity, our study achieves last-iterate convergence in (not necessarily strongly) monotone games. Last-iterate convergence in non-strongly monotone games with noisy feedback has been largely unexplored, with only asymptotic convergence provided, except for the work by Abe et al. [2024]. In contrast, we derive a faster last-iterate convergence rate in the noisy feedback setting by introducing a novel payoff perturbation.
Thanks for the response, I have no further questions and will keep my original evaluation.
This paper studies first-order methods for solving monotone games, where the gradient of the payoff function is monotone in the strategy profile and may be corrupted by additive noise. The authors introduce a payoff perturbation technique that adds strong convexity to the payoff functions and thereby derive last-iterate convergence rates.
Strengths
Overall the paper is well written and the method and results are interesting.
Weaknesses
The authors should include a table comparing their paper with others in the literature. This would make it easier for the reader to place the results in context and to see where improvements are made (for example, a comparison to [Yoon and Ryu, 2021; Cai and Zheng, 2023], including constants).
Questions
The idea of changing the anchoring slowly seems very interesting. How different is this approach from the two-timescale GDA-type algorithms that have been studied recently (for example, see Lin, Tianyi, Chi Jin, and Michael Jordan. "On gradient descent ascent for nonconvex-concave minimax problems." International Conference on Machine Learning, PMLR, 2020, and follow-up papers)? Can a similar algorithm be derived from the updates of the algorithms proposed by the authors of this paper?
Limitations
See Weaknesses and Questions.
We thank you for your positive feedback and constructive comments. The detailed answers to each of the questions can be found below.
Weakness 1
The authors should include a table comparing their paper with others in the literature. This would make it easier for the reader to place the results in context and to see where improvements are made (for example, a comparison to [Yoon and Ryu, 2021; Cai and Zheng, 2023], including constants).
Answer
Thank you for your insightful suggestion. We agree that a comparative table would indeed provide a clearer context for the readers. Please find below a table that compares our work with the existing literature, including Yoon and Ryu [2021] and Cai and Zheng [2023]. In summary, our GABP enjoys a nearly optimal convergence rate under full feedback and a much faster rate under noisy feedback, compared to existing studies. We will include this table in the updated manuscript.
| Algorithm | Results under Full Feedback | Results under Noisy Feedback |
|---|---|---|
| Extragradient [Cai et al., 2022a,b] | O(1/√T) | N/A |
| Optimistic Gradient [Golowich et al., 2020a, Gorbunov et al., 2022, Cai et al., 2022a] | O(1/√T) | N/A |
| Extra Anchored Gradient [Yoon and Ryu, 2021] | O(1/T) | N/A |
| Accelerated Optimistic Gradient [Cai and Zheng, 2023] | O(1/T) | N/A |
| Iterative Tikhonov Regularization [Koshal et al., 2010, Tatarenko and Kamgarpour, 2019] | N/A | Asymptotic (also holds for bandit feedback) |
| Adaptively Perturbed Gradient Ascent [Abe et al., 2024] | | |
| Ours | O(log T / T) (near-optimal) | O(1/T^{1/7}) |
Question 1
The idea of changing the anchoring slowly seems very interesting. How different is this approach from the two-timescale GDA-type algorithms that have been studied recently (for example, see Lin, Tianyi, Chi Jin, and Michael Jordan. "On gradient descent ascent for nonconvex-concave minimax problems." International Conference on Machine Learning, PMLR, 2020, and follow-up papers)? Can a similar algorithm be derived from the updates of the algorithms proposed by the authors of this paper?
Answer
GABP and two-timescale GDA are fundamentally different from both the algorithmic and the convergence-guarantee perspectives. From an algorithmic viewpoint, two-timescale GDA uses different learning rates for each player, whereas our GABP employs identical learning rates for all players. Furthermore, GABP introduces the payoff perturbation technique, which is not used in two-timescale GDA. This perturbation is an essential element for GABP to achieve last-iterate convergence even under noisy feedback.
From the perspective of convergence guarantees, our GABP achieves last-iterate convergence, whereas two-timescale GDA has only been shown to converge in an average-iterate sense. That said, it is indeed an intriguing direction to improve the convergence results of two-timescale GDA by incorporating our payoff perturbation technique.
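To make the algorithmic contrast concrete, here is a minimal schematic of the two update schemes (our own illustration; the step sizes, the perturbation weight `mu`, and the update details are placeholders rather than the exact updates of either paper):

```python
def two_timescale_gda_step(x, y, grad_x, grad_y, eta_x=1e-3, eta_y=1e-1):
    """Two-timescale GDA (Lin et al., 2020 style): the two players use
    different step sizes (eta_x != eta_y) and no payoff perturbation."""
    return x - eta_x * grad_x(x, y), y + eta_y * grad_y(x, y)

def gabp_style_step(x, y, grad_x, grad_y, anchor_x, anchor_y,
                    eta=1e-2, mu=0.1):
    """GABP-style step (schematic): both players use the SAME step size,
    but each gradient is perturbed toward an anchoring strategy, which
    makes each player's perturbed problem strongly convex/concave."""
    x_new = x - eta * (grad_x(x, y) + mu * (x - anchor_x))
    y_new = y + eta * (grad_y(x, y) - mu * (y - anchor_y))
    return x_new, y_new

# Toy bilinear game f(x, y) = x * y: x minimizes, y maximizes.
gx = lambda x, y: y  # df/dx
gy = lambda x, y: x  # df/dy
x, y = gabp_style_step(1.0, 1.0, gx, gy, anchor_x=0.0, anchor_y=0.0)
```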
The paper introduces a modified perturbation scheme and shows last-iterate convergence of the proposed algorithm, with rates of O(1/T) and O(1/T^{1/7}) under the full feedback and noisy feedback models, respectively, in monotone games. The reviewers agree that the paper has merits and novelty but were not enthusiastic enough (especially the experts) to make it clear that the paper is above the bar. After careful discussion between the AC and SAC, we came to the conclusion that the current writing of the paper does not highlight the actual technical differences and modifications relative to Abe et al., and as a result the paper needs a revision to address those differences. In a nutshell, the idea of the modified perturbation scheme is well appreciated, but a revision that includes a discussion of the actual modifications to the proofs of Abe et al. would be beneficial.