On Tractable $\Phi$-Equilibria in Non-Concave Games
We initiate the study of *tractable* $\Phi$-equilibria in non-concave games and examine several natural families of strategy modifications.
Abstract
Reviews and Discussion
The paper presents new results about no-$\Phi$-regret learning when the utilities are not concave. When $\Phi$ is either finite or, in a precise sense, "local", it is shown that no-$\Phi$-regret learning is possible, and hence that the corresponding notions of $\Phi$-equilibria can be efficiently approximated at a polynomial rate.
Strengths
This presents some interesting and novel results in a regime (non-concave games) where positive results seem to be relatively difficult to come by. The paper was also very well written and easy to read. I vote to accept and only have some minor comments.
Weaknesses
It was useful to me to keep in mind that the regime $\epsilon = \Omega(\delta)$ (where $\Omega(\cdot)$ hides game/instance-dependent constants) is trivial, because no $\delta$-local change can affect a Lipschitz utility function by more than $O(\delta)$. It would be nice to explicitly state this somewhere.
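Spelled out (with $G$ denoting the Lipschitz constant of the utility, a symbol introduced here for concreteness): for any $\delta$-local modification $\phi$,
$$|u(\phi(x)) - u(x)| \le G\,\|\phi(x) - x\| \le G\delta,$$
so once $\epsilon \ge G\delta$, every strategy profile is trivially an $\epsilon$-approximate equilibrium against $\delta$-local deviations.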
In Table 1, it would be nice to distinguish the contributions of this paper from past contributions. Perhaps for each cell, include either the past citation or the theorem number in the current paper.
Minor things and typos (not affecting score):
- Table 1: effifient -> efficient
- Missing close-parens: Algorithm 1, line 5; two in the bottom half of page 18
- Algorithm 2, line 3: form -> from
Questions
- Could you elaborate on Corollary 1? It is not obvious to me how that bound follows from Theorem 2.
- I find it interesting that the algorithms in this paper are presented as randomized algorithms outputting a single strategy rather than deterministic algorithms outputting a distribution (as done by Peng and Rubinstein [2024], Dagan et al. [2024], and Zhang et al. [2024]). At least with your techniques, this difference seems to be an unavoidable consequence of the fact that the utilities $u_t$ are nonlinear, and therefore one cannot say $u_t(\mathbb{E}_{x \sim \mu}[x]) = \mathbb{E}_{x \sim \mu}[u_t(x)]$. Do I understand this correctly?
- If I understood the previous point correctly: in Theorem 5, the nonlinearity of $u_t$ is a nonissue by assumption. So, is it possible to de-randomize that result by instead deterministically outputting the uniform distribution over the iterates $x_1, \dots, x_T$?
- Is there an efficient algorithm for $\Phi^{\text{Proj}}$-regret minimization?
- Do the results of this paper have any novel implications for interesting special cases, such as nonconvex-concave zero-sum games, or simply the classical setting of concave no-regret learning? For example, the results of Section 4 seem to have interesting (new?) implications for local (mixed) Nash equilibria in nonconvex-nonconcave zero-sum games.
Papers cited here but not in the submission:
- Dagan, Y., Daskalakis, C., Fishelson, M., & Golowich, N. (STOC 2024). From External to Swap Regret 2.0: An Efficient Reduction for Large Action Spaces.
- Peng, B., & Rubinstein, A. (STOC 2024). Fast swap regret minimization and applications to approximate correlated equilibria.
Limitations
yes
Thanks for your positive review and constructive comments! We will add a note on the trivial regime where $\epsilon = \Omega(\delta)$. We will update Table 1 to distinguish our contributions better. Below, we address your questions.
Q: Could you elaborate on Corollary 1? It is not obvious to me how that bound follows from Theorem 2.
A: We will add the proof of Corollary 1 in the revised version of the paper. In Corollary 1, we consider the case where $\Phi$ is the class of $L$-Lipschitz functions from $\mathcal{X}$ to $\mathcal{X}$ and the utility is $G$-Lipschitz. The $\epsilon$-covering number $N_\epsilon$ of $\Phi$ satisfies $\log N_\epsilon = O((L/\epsilon)^d)$ (this fact is also used in [1, Example 23]). Thus, we can run Algorithm 1 over the finite $\epsilon$-cover of $\Phi$. This leads to a regret bound of $O(\sqrt{T \log N_\epsilon}) + G\epsilon T$, where the second term is due to the covering error. By choosing $\epsilon = T^{-1/(d+2)}$, we get a regret bound of $c \cdot T^{(d+1)/(d+2)}$, where $c$ is some constant that only depends on $G$ and $L$.
[1] Stoltz, Gilles, and Gábor Lugosi. "Learning correlated equilibria in games with compact sets of strategies." Games and Economic Behavior, 2007
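To make the balancing step explicit (a sketch, assuming the covering bound stated above; exact constants suppressed):
$$\mathrm{Reg}_T \le \underbrace{O\big(\sqrt{T \log N_\epsilon}\big)}_{\text{Algorithm 1 on the cover}} + \underbrace{G\epsilon T}_{\text{covering error}}, \qquad \log N_\epsilon = O\big((L/\epsilon)^d\big).$$
Setting $\epsilon = T^{-1/(d+2)}$ makes the two terms match up to constants: $\sqrt{T \cdot \epsilon^{-d}} = \sqrt{T^{(2d+2)/(d+2)}} = T^{(d+1)/(d+2)}$, while $G\epsilon T = G\,T^{(d+1)/(d+2)}$.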
Q: I find it interesting that the algorithms in this paper are presented as randomized algorithms outputting a single strategy rather than deterministic algorithms outputting a distribution (as done by Peng and Rubinstein [2024], Dagan et al. [2024], and Zhang et al. [2024]). At least with your techniques, this difference seems to be an unavoidable consequence of the fact that the utilities $u_t$ are nonlinear, and therefore one cannot say $u_t(\mathbb{E}_{x \sim \mu}[x]) = \mathbb{E}_{x \sim \mu}[u_t(x)]$. Do I understand this correctly?
A: Two issues prevent us from outputting a distribution. The first one, as you pointed out, is the nonlinearity of the utility function. The second is that the distribution we constructed has an exponentially large support of size $|\Phi|^{\sqrt{T}}$. To achieve an $\epsilon$-approximate $\Phi$-equilibrium, we need $T$ to be $\mathrm{poly}(1/\epsilon)$, so outputting the entire distribution requires time exponential in $1/\epsilon$. Instead, using a sampling procedure allows us to significantly improve the dependence on $1/\epsilon$ and obtain an algorithm whose running time is only polynomial in $1/\epsilon$.
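A minimal sketch of why sampling sidesteps the exponential support (schematic Python; the name `phi_dists` and the exact composition depth are illustrative assumptions, not the paper's verbatim pseudocode):

```python
import random

def sample_strategy(phi_dists, x0):
    """Draw one strategy from the implicitly defined output distribution.

    phi_dists: a list of ~sqrt(T) per-step distributions, each a dict
    mapping a modification function phi to its sampling weight.
    Materializing the full output distribution would enumerate
    |Phi|^sqrt(T) composition paths; sampling walks a single random path.
    """
    x = x0
    for dist in phi_dists:
        phis = list(dist.keys())
        weights = list(dist.values())
        phi = random.choices(phis, weights=weights)[0]  # pick one modification
        x = phi(x)                                      # apply it to the current point
    return x
```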
Q: If I understood the previous point correctly: in Theorem 5, the nonlinearity of $u_t$ is a nonissue by assumption. So, is it possible to de-randomize that result by instead deterministically outputting the uniform distribution over the iterates $x_1, \dots, x_T$?
A: We can derandomize this result by directly outputting the uniform distribution over the iterates $x_1, \dots, x_T$. In this case, we allow the algorithm to output a mixed strategy and consider regret over the expected utility in each iteration. Thank you for this insightful observation.
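A one-line check of why this works (with $\bar{\mu}$ denoting the uniform distribution over the iterates): for any utility $u$ and modification $\phi$,
$$\mathbb{E}_{x \sim \bar{\mu}}\big[u(\phi(x)) - u(x)\big] = \frac{1}{T}\sum_{t=1}^{T}\big(u(\phi(x_t)) - u(x_t)\big) \le \frac{\mathrm{Reg}_T}{T},$$
so the deterministic output $\bar{\mu}$ inherits the regret guarantee directly, with no sampling error.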
Q: Is there an efficient algorithm for $\Phi^{\text{Proj}}$-regret minimization?
A: We currently don't have an efficient algorithm and believe this is an interesting open question. We introduce $\Phi^{\text{Proj}}$-regret to show that even for simple local strategy modification sets $\Phi$, the landscape of efficient local $\Phi$-regret minimization is already quite rich, and many basic and interesting questions remain open.
Q: Do the results of this paper have any novel implications for interesting special cases, such as nonconvex-concave zero-sum games, or simply the classical setting of concave no-regret learning? For example, the results of Section 4 seem to have interesting (new?) implications for local (mixed) Nash equilibria in nonconvex-nonconcave zero-sum games.
A: To the best of our knowledge, the notion of $\Phi^{\text{Proj}}$-regret is new, and its efficient minimization is unknown even in the classical setting of no-regret learning with concave utilities. We think our upper and lower bound results for $\Phi^{\text{Proj}}$-regret are, therefore, interesting even in this setting. We do not see any immediate implications for local mixed Nash equilibria in nonconvex-nonconcave zero-sum games, but this is a very interesting open question.
Thank you. My opinion of the paper was and remains positive, and I will keep my score.
This paper describes the concept of $\Phi$-equilibrium in non-concave games, a rarely discussed notion in the recent literature. This notion of equilibrium is defined over a set $\Phi$ of strategy modifications; with specific choices of $\Phi$, $\Phi$-equilibria can recover commonly known notions such as CCEs, though this paper mostly focuses on more restrictive choices of $\Phi$. The paper shows that there exist efficient algorithms that find an $\epsilon$-approximate $\Phi$-equilibrium in tractable time for some specific families of sets $\Phi$.
Strengths
Although some assumptions on the families of sets $\Phi$ seem restrictive, it is known that without any assumptions, the task of finding e.g. a CCE, even approximately and locally, is intractable. This paper provides a solid direction towards a better understanding of algorithms in non-concave games.
This paper is quite self-contained, and Section 1 explains the problem formulation in a way that the reviewer found easy to follow.
For infinitely large $\Phi$, the three families offer good insight and can potentially cover many existing strategy modification schemes in applications.
Weaknesses
The structure of this paper is questionable. Section 1 alone takes 4 pages, and the remainder of the paper seems rushed and/or crammed. The authors did address the space constraints of the paper, but I believe the paper could be greatly improved in terms of readability. For instance, I believe it would be beneficial to include Algorithm 1 in the main paper, and more space could be assigned to the finite setting as a warm-up. The reviewer suggests that some paragraphs or even theorems in Section 4 could be moved to the appendix.
Some statements throughout the paper are slightly repetitive, reiterating previous sections; many of them could be shortened and made more concise. For example, Section 1.1 interweaves introduction and contributions.
Although this paper is quite high-level and technical, it would be reasonable to include some empirical results to validate and strengthen the theoretical results, such as the realistic complexity of the sampling procedure. These could perhaps be added in the appendix.
Questions
No
Limitations
The authors adequately address the limitations of this paper.
Thank you for the positive review and constructive comments on the structure and presentation of the paper!
We will incorporate your suggestions and improve the presentation in the revised version of the paper. Since one more page is allowed in the camera-ready version, we will assign more space to the finite setting and include Algorithms 1 and 2 and the relevant discussion in the main paper. We will also adjust the structure of Section 4 and make all statements in the main paper more concise.
Complexity of our algorithms: The algorithms we study (except for Algorithms 1 and 3) are gradient-based algorithms like gradient descent and optimistic gradient descent, which have efficient practical implementations. Regarding the complexity of the sampling procedure (Algorithm 2), the per-iteration complexity is exactly $\sqrt{T}$ times the cost of one sampling-and-evaluation step: we need to sample from a distribution supported on $|\Phi|$ points, evaluate the sampled strategy modification function $\phi$, and repeat this procedure for $\sqrt{T}$ steps. The above complexity analysis for Algorithm 2 is tight. There is an equivalent implementation that improves the expected running time by a factor of 2: we could first sample a number $N$ uniformly between $1$ and $\sqrt{T}$, then run the sampling procedure for only $N$ steps rather than $\sqrt{T}$ steps, and finally return the last point. This improves the expected per-iteration complexity to $\sqrt{T}/2$ such steps. We will add this discussion in the revised version of the paper.
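A schematic sketch of the factor-2 implementation (illustrative Python; `phi_dist`, the depth parameter `K`, and returning the $N$-th iterate are assumptions matching the description above, not the paper's verbatim pseudocode):

```python
import random

def sample_strategy_early_stop(phi_dist, x0, K):
    """Factor-2 variant of the sampling procedure.

    Rather than composing K (~ sqrt(T)) sampled modifications and then
    returning a uniformly random intermediate iterate, draw the stopping
    index N ~ Uniform{1, ..., K} first and compose only N modifications.
    The output distribution is unchanged, but the expected number of
    steps drops from K to (K + 1) / 2, roughly a factor of 2.
    """
    n = random.randint(1, K)  # inclusive on both ends
    x = x0
    for _ in range(n):
        phis = list(phi_dist.keys())
        weights = list(phi_dist.values())
        phi = random.choices(phis, weights=weights)[0]  # sample a modification
        x = phi(x)                                      # apply it
    return x
```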
Thank you for the comments; my concerns have been addressed. Given that the issues will be addressed in the revised manuscript, and also in light of the discussion between the authors and the other reviewers, I have decided to increase my score.
This work studies $\Phi$-equilibria in non-concave games and discusses when the $\Phi$-equilibria of a non-concave game can be learned in polynomial time. The theoretical results presented in this paper indicate that if the set $\Phi$ is finite, then there exists an efficient uncoupled algorithm that converges to the corresponding $\Phi$-equilibria. Moreover, this work also considers the case when $\Phi$ is not finite but consists of local modifications.
Strengths
Significance: This paper gives non-trivial results on $\Phi$-equilibria of non-concave games. It provides a simple condition to tell whether $\Phi$-equilibria can be learned in polynomially many iterations.
Originality: No existing literature has resolved the $\Phi$-equilibrium problem, so the results of this paper are new. This paper extends the existing framework to consider infinite sets of local strategy modifications, which also leads to new technical novelty.
Clarity: This work includes sufficient background introduction and puts adequate material in the appendix for reference. I found it easy to understand the main contribution of this work.
Weaknesses
I believe this is a high-quality paper, and I give a clear accept. But I do have some points of confusion, which I list in the next section.
Questions
-
I found some papers showing that for Markov games, CE or CCE can be learned in polynomial time. For example: Jin, Chi, et al. "V-Learning--A Simple, Efficient, Decentralized Algorithm for Multiagent RL." arXiv preprint arXiv:2110.14555 (2021). From my understanding, when setting their horizon $H = 1$, the Markov game seems to be a classical game. Is there any difference between this setting and yours?
-
Are there any applications of $\Phi$-equilibrium? For example, when do we need to efficiently solve for some $\Phi$-equilibrium?
-
The definition of $\Phi$-equilibrium looks really similar to the Correlated Equilibrium (CE) in your Definition 6. Is CE the same as $\Phi$-equilibrium if $\Phi$ is chosen as the whole space?
Limitations
This is a purely theoretical work, so there is no negative impact.
Thank you for your very positive review. Below, we address your questions.
Q: I found some papers showing that for Markov games, the CE or CCE can be learned in polynomial time. For example: Jin, Chi, et al. "V-Learning--A Simple, Efficient, Decentralized Algorithm for Multiagent RL." arXiv preprint arXiv:2110.14555 (2021). From my understanding, when setting their horizon $H = 1$, the Markov game seems to be a classical game. Is there any difference between this setting and yours?
A: If the horizon $H = 1$, then a general-sum Markov game becomes a general-sum normal-form game where each player's utility function is linear in their own strategy. This is a special case of the non-concave games we consider in this paper, since in a non-concave game each player's utility can be non-concave in their own strategy. Additionally, Markov games with $H > 1$ are special cases of non-concave games that include additional structure, such as the Markovian property. Since our focus is on the more general class of non-concave games, these specific results do not apply.
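To spell out the linearity claim for $H = 1$ (a standard identity, with $R_i$ denoting player $i$'s reward function and $\pi = (\pi_1, \dots, \pi_n)$ a profile of mixed strategies):
$$u_i(\pi) = \mathbb{E}_{a \sim \pi}[R_i(a)] = \sum_{a_i} \pi_i(a_i)\, \mathbb{E}_{a_{-i} \sim \pi_{-i}}\big[R_i(a_i, a_{-i})\big],$$
which is linear in player $i$'s own strategy $\pi_i$. Non-concave games drop exactly this structure.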
Q: Are there any applications of $\Phi$-equilibrium? For example, when do we need to efficiently solve for some $\Phi$-equilibrium?
A: The notion of $\Phi$-equilibrium is very general. If $\Phi$ is the set of all constant strategy modifications, the corresponding $\Phi$-equilibrium coincides with the notion of coarse correlated equilibrium (CCE); if $\Phi$ is the set of all strategy modifications, then the corresponding $\Phi$-equilibrium coincides with the notion of correlated equilibrium (CE). CE and CCE are both central and classical notions extensively studied in the literature and used in practice. Generally, a $\Phi$-equilibrium guarantees that no single agent has an incentive to unilaterally deviate by using strategy modifications from their set $\Phi$. This provides a stability guarantee that we believe will be relevant for many economics and machine learning applications, including game AI and multi-agent reinforcement learning.
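In symbols (a schematic restatement of the definition; the shorthands $\Phi_{\text{const}}$ and $\Phi_{\text{all}}$ are ours for this note): a distribution $\mu$ over strategy profiles is a $\Phi$-equilibrium if, for every player $i$ and every $\phi \in \Phi_i$,
$$\mathbb{E}_{x \sim \mu}\big[u_i(\phi(x_i), x_{-i})\big] \le \mathbb{E}_{x \sim \mu}\big[u_i(x)\big].$$
Taking $\Phi_{\text{const}}$ (all constant maps $x_i \mapsto x_i'$) recovers CCE, while taking $\Phi_{\text{all}}$ (all strategy modifications $\mathcal{X}_i \to \mathcal{X}_i$) recovers CE.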
Q: The definition of $\Phi$-equilibrium looks really similar to the Correlated Equilibrium (CE) in your Definition 6. Is CE the same as $\Phi$-equilibrium if $\Phi$ is chosen as the whole space?
A: Yes. If we choose $\Phi$ as the set of all possible strategy modifications, the corresponding $\Phi$-equilibrium is exactly CE.
Thanks for your response. I have read the rebuttal and I will keep my current positive score.
The paper studies non-concave games and $\Phi$-equilibria. When $\Phi$ is finite, the paper shows that there exists an uncoupled learning algorithm that can efficiently find the equilibria. Under certain classes of strategy modifications, online gradient descent can also approximate $\Phi$-equilibria when $\Phi$ is infinite.
Strengths
I am not an expert in non-concave games and equilibria. Thus, I suggest referring to the other reviewers for the strengths and weaknesses.
Weaknesses
I am not an expert in non-concave games and equilibria. Thus, I suggest referring to the other reviewers for the strengths and weaknesses.
Questions
I am not an expert in non-concave games and equilibria. Thus, I suggest referring to the other reviewers for the strengths and weaknesses.
Limitations
yes
The paper focuses on no-$\Phi$-regret learning when the utilities of the agents are not concave. It is shown that when $\Phi$ is either finite or, in a precise sense, "local" (local deviations are allowed), no-$\Phi$-regret learning is possible, and therefore the corresponding notions of $\Phi$-equilibria can be efficiently approximated in time poly($1/\epsilon$). On the negative side, the authors did not discuss the suboptimality aspects of the equilibrium notion considered (most notably, that even a global utility minimizer can be a $\Phi$-equilibrium in the range of $\epsilon$-$\delta$ parameters considered by the authors), but nonetheless the reviewers were positive about this paper and the AC recommends acceptance.