ICLR 2024 · Oral

Ratings: 8, 8, 8 from 3 reviewers (average 8.0/10; min 8, max 8, std 0.0)
Average confidence: 3.3

Approximating Nash Equilibria in Normal-Form Games via Stochastic Optimization

Submitted: 2023-09-21 · Updated: 2024-04-12
TL;DR

We propose the first stochastic NE loss for normal-form games.

Abstract

Keywords

game theory, stochastic optimization, Nash equilibrium, normal-form games, x-armed bandits

Reviews and Discussion

Review
Rating: 8

This paper investigates the problem of computing a Nash equilibrium (NE) of a general-sum normal-form game (NFG). Unlike previous works, the authors propose a new kind of loss that serves as an upper bound on $\epsilon$ (the coefficient of an $\epsilon$-Nash equilibrium). More importantly, the authors show that an unbiased gradient estimate of the loss can be obtained using a symmetrization technique. As a result, computing an NE can be treated as a stochastic optimization problem, so that well-known algorithms such as SGD and BLiN can be exploited. Moreover, the authors also consider the case where the NE lies not in the interior of the simplex but on its boundary, and they show that this issue can also be handled by optimizing a refined game with additional bonuses. Finally, empirical studies validate the effectiveness of the proposed method.

Strengths

I have not read papers on how to solve an NE efficiently before. However, this paper's presentation is quite clear, and I have now obtained a whole picture of this problem. The problem setup is clearly introduced. The solution is illuminated step by step, which is clear and intuitive. It is worth mentioning that the authors provide several illustrative figures (e.g., Figures 1 and 2) to help readers understand the paper better, which is very good. The experiments are also sufficient and complete. Moreover, this work offers the community a new and principled way of solving NEs efficiently, and many more stochastic optimization algorithms can be leveraged in this field.

Weaknesses

I do not see major weaknesses in this work, though I am not an expert in this field.

Questions

I do not have any questions, as the presentation of this work is quite clear and intuitive, along with some illustrations to help readers understand it well.

Comment

Thank you for your encouraging feedback! One of the goals of this paper is to open up the challenge of approximating Nash equilibria to researchers without a predominant game theory background (and in particular, to those with more of a machine learning / optimization background). We are glad to hear that you valued the step-by-step explanation and accompanying illustrations!

Review
Rating: 8

The paper proposes a new loss function for approximating Nash equilibria in normal-form games. The key idea is that the loss can be estimated without bias through Monte Carlo sampling of joint strategies. This allows the use of stochastic optimization, eliminating the need to read all the payoffs, whose number grows exponentially with the number of players.

The loss function upper-bounds the exploitability of an approximate equilibrium. The authors establish this theoretically and corroborate it empirically. This approach enables scaling equilibrium computation to larger games than previously thought feasible. Experiments are conducted on games with up to 286 actions.

Two algorithms are explored: vanilla SGD and a bandit-based method. The bandit algorithm comes with theoretical guarantees regarding exploitability.
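To make the recipe concrete, the following is a minimal, self-contained Python sketch (an illustration under our own assumptions, not the paper's actual loss or code). It optimizes a simple "indifference" surrogate that vanishes at fully mixed equilibria of a two-player game, and it shows the two mechanics the reviews highlight: unbiased stochastic gradients obtained by sampling opponents' actions, and the two-independent-sample symmetrization needed to estimate a squared expectation without bias. With two players this toy objective is convex; the saddle points discussed elsewhere in the reviews arise with three or more players.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 3                                # number of actions per player
A = rng.standard_normal((m, m))      # player 1's payoffs: u1(i, j) = A[i, j]
B = rng.standard_normal((m, m))      # player 2's payoffs: u2(i, j) = B[i, j]

def proj_tangent(z):
    """Orthogonal projection onto the simplex's tangent space {v : 1^T v = 0}."""
    return z - z.mean()

def proj_simplex(v):
    """Euclidean projection onto the probability simplex (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def indifference_loss(x, y):
    """Zero iff each player is indifferent across all actions; at a fully mixed
    (interior) point this is exactly the Nash condition."""
    return np.sum(proj_tangent(A @ y) ** 2) + np.sum(proj_tangent(B.T @ x) ** 2)

x = np.full(m, 1.0 / m)
y = np.full(m, 1.0 / m)
lr = 0.02
for t in range(1, 20001):
    i = rng.choice(m, p=x)           # sampled action of player 1
    j = rng.choice(m, p=y)           # sampled action of player 2
    # A[:, j] is an unbiased estimate of A @ y (and B[i, :] of B^T x), so the
    # vectors below are unbiased stochastic gradients of the loss. In a large
    # game the remaining exact matrix products would be sampled too; we keep
    # them exact here for brevity.
    gy = 2.0 * A.T @ proj_tangent(A[:, j])
    gx = 2.0 * B @ proj_tangent(B[i, :])
    x = proj_simplex(x - lr * gx)
    y = proj_simplex(y - lr * gy)
    if t % 5000 == 0:
        # Symmetrization: the loss is a square of an expectation, so an
        # unbiased estimate of its *value* needs two independent samples.
        j1, j2 = rng.choice(m, p=y, size=2)
        i1, i2 = rng.choice(m, p=x, size=2)
        est = (proj_tangent(A[:, j1]) @ proj_tangent(A[:, j2])
               + proj_tangent(B[i1, :]) @ proj_tangent(B[i2, :]))
        print(t, indifference_loss(x, y), est)
```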

Strengths

  1. The paper proposes a new loss function for approximating Nash equilibria in normal-form games that is amenable to unbiased Monte Carlo estimation. This allows for the use of powerful sample-based stochastic optimization techniques, eliminating the need to read all the payoffs.
  2. The loss function upper-bounds the exploitability of an approximate equilibrium. The authors provide theoretical proofs and present empirical results.
  3. The proposed approach enables the scaling of equilibrium-finding techniques to larger games than was previously possible. The paper presents experiments on games with up to 286 actions.
  4. Two algorithms are analyzed: vanilla SGD and a bandit-based method. The bandit approach offers theoretical guarantees, such as a high-probability bound on exploitability.

Weaknesses

  1. The loss function only captures fully mixed equilibria. The authors address this by considering quantal-response equilibrium. As a result, a zero loss can only serve as an approximation to a Nash equilibrium.
  2. There is limited empirical evaluation on real-world games. Most experiments involve small synthetic games from previous research. It's possible that more complex games may expose certain limitations.
  3. SGD encounters issues with saddle points in certain games, which is a common challenge in non-convex optimization.

Questions

  1. How well does the approach scale to larger real-world games with structures like symmetry? Are techniques such as grouping actions necessary?
  2. Have the authors attempted more advanced optimization methods like Momentum SGD or Adam to address saddle points?
  3. What heuristics could be employed to guide equilibrium selection when multiple equilibria exist?

Comment

Thank you for taking the time to carefully review our paper and for your interesting questions.

We agree with each of the weaknesses you describe and hope you are satisfied with how we identified these clearly in our text. We point these out below for convenience.

  • The bound in Lemma 14 consists of two terms. The first term is a positive constant that remains even when our proposed loss is zero. Therefore, yes, as you say, this clearly indicates our loss can only be used to approximate Nash equilibria (hence the title of the paper).
  • The scope of this work is normal-form games, which is the natural starting point for studying more complex game settings and arguably the bedrock of most traditional economic theory. If by real-world games, you mean games like poker (extensive-form) or some video games (Markov games), then yes, we do not cover that in this work. We mention extensive-form games as one of our future goals in the conclusion.
  • Figure 2 presents an empirical analysis of the existence of saddle points, as indicated by the colors of points in the scatter plots. We also verify in Appendix K.4 that the Hessian of our loss, evaluated at the final iterate of SGD in Figure 3 in both Blotto experiments, contains both positive and negative eigenvalues, i.e., the landscape is saddle-shaped.

We now answer your questions:

  1. The structure of symmetric games can be exploited in our approach. Recall that every symmetric NFG has a symmetric NE. Therefore, we can reduce our search space over joint profiles of size $nm$ to a search over a single player's mixed strategy of size $m$. Moreover, the gradient of our loss in equation (9) contains a sum over all players $k$; this can likewise be simplified to a sum over player $\ell$ and any single player $k \ne \ell$ (scaled by $n-1$). Due to symmetry, and the fact that all players play the same background strategy in a symmetric NE, each of the summands is the same, so we only need to retain one. See Figure 4, where we explicitly take advantage of the symmetry of the game to both solve it and visualize it; we present all player strategies as a single one-dimensional variable ($p$) on the x-axis. If instead of player/payoff symmetry you meant action symmetry, then yes, grouping actions would help to scale our approach when two actions are equivalent.
  2. We believe this is a very interesting direction for future work. We have in fact tried Adam on Blotto (10c, 3f, 4p) by reparameterizing the mixed strategy as a softmax over real-valued logits (vanilla Adam assumes unconstrained optimization); a minimal sketch of this reparameterization appears after this list. We found that Adam improves slightly on SGD ($\epsilon$ drops from $0.029$ to $0.020$), but we still see a saddle shape in the landscape when we inspect the loss at Adam's final step. This suggests Adam helped cope with the saddle-point problem but is not a complete solution. We explicitly decided against including these preliminary results so as to encourage follow-up work by non-convex / deep-learning optimization experts on a variety of non-vanilla SGD algorithms, such as Adam, Adagrad, saddle-point-avoiding solvers, learning-to-learn approaches, and others. We believe that the ICLR audience would be the perfect set of experts to take this on.
  3. A simple heuristic would be to regularize our proposed loss with a second loss term encoding an additional criterion, e.g., maximizing welfare. You could anneal this term over the course of learning to try to find the equilibrium with the highest welfare.
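For concreteness, below is a minimal sketch of the reparameterization described in answer 2 (an illustration under our own assumptions, not the authors' code). The mixed strategies are softmax transforms of unconstrained logits, updated by a hand-rolled Adam; the loss is the same illustrative two-player "indifference" surrogate as in the sketch earlier in this thread, standing in for the paper's actual loss, and all names are ours.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 3
A = rng.standard_normal((m, m))      # player 1's payoffs
B = rng.standard_normal((m, m))      # player 2's payoffs

def proj_tangent(z):
    return z - z.mean()

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def softmax_vjp(p, g):
    """Chain rule through p = softmax(theta): returns J(theta)^T g,
    where J = diag(p) - p p^T."""
    return p * (g - p @ g)

theta_x, theta_y = np.zeros(m), np.zeros(m)   # unconstrained logits
mx, vx = np.zeros(m), np.zeros(m)             # Adam state for theta_x
my, vy = np.zeros(m), np.zeros(m)             # Adam state for theta_y
b1, b2, eps, lr = 0.9, 0.999, 1e-8, 0.05

for t in range(1, 10001):
    x, y = softmax(theta_x), softmax(theta_y)
    i, j = rng.choice(m, p=x), rng.choice(m, p=y)
    gx = 2.0 * B @ proj_tangent(B[i, :])      # stochastic grad w.r.t. x
    gy = 2.0 * A.T @ proj_tangent(A[:, j])    # stochastic grad w.r.t. y
    for g, mom, sec, th in ((softmax_vjp(x, gx), mx, vx, theta_x),
                            (softmax_vjp(y, gy), my, vy, theta_y)):
        mom[:] = b1 * mom + (1 - b1) * g              # first moment
        sec[:] = b2 * sec + (1 - b2) * g * g          # second moment
        th -= lr * (mom / (1 - b1 ** t)) / (np.sqrt(sec / (1 - b2 ** t)) + eps)
    if t % 2000 == 0:
        L = np.sum(proj_tangent(A @ y) ** 2) + np.sum(proj_tangent(B.T @ x) ** 2)
        print(t, L)
```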

Comment

Thank you for your responses, which answered my questions. I will maintain my score.

Review
Rating: 8

The paper provides a loss function for computation of approximate Nash equilibria in general-sum multiplayer matrix games for which unbiased estimators can be constructed. This allows the exploitation of stochastic optimization algorithms for computation of Nash equilibria in general matrix games. They provide numerical evidence that this approach is competitive with existing approaches.

Strengths

  1. The paper is generally well-written.
  2. The formulation of the loss function which allows for unbiased sampling is a solid novel technical contribution. As the authors note, this is a step towards enabling computation of Nash equilibria in large-scale settings (in spite of theoretical hardness results), which is quite important for operationalizing game theory in the real world.
  3. The paper provides numerical evidence supporting that the loss function is "easy" to optimize in many benchmark games of interest, and that it is competitive with other SOTA approaches.

Weaknesses

  1. The empirical section (6.2) could use some additional explanation and discussion. The baselines that are compared against should be explicitly stated in the body text (and cited appropriately) instead of merely noting that they are the baselines used in the simulations of Gemp et al. (2022).
  2. The flow of Section 6 generally could be improved (one suggestion is provided below).

Questions

  1. There is a typo in the definition of the projected gradient (it should be $\Pi_{T\Delta^{d-1}}(z) = z - \frac{1}{d}\,\mathbf{1}^\top z\,\mathbf{1}$; a short code check follows this list).
  2. In Section 6, it is confusing to introduce SGD first, then the bandit algorithm, and then immediately have a subsection discussing the bandit algorithm, especially since it is stated that "in the next section, we find it performs well empirically in games previously examined by the literature." Perhaps it makes sense to include the SGD experiments before any discussion of the bandit approach.
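As a quick sanity check of the corrected formula in question 1, the projection is a one-liner (illustrative snippet; the function name is ours):

```python
import numpy as np

def proj_tangent_simplex(z):
    """Pi_{T Delta^{d-1}}(z) = z - (1/d) (1^T z) 1: orthogonal projection of z
    onto the tangent space of the simplex, i.e., onto {v : 1^T v = 0}."""
    return z - z.mean()

z = np.array([0.5, -1.0, 2.0])
p = proj_tangent_simplex(z)
assert abs(p.sum()) < 1e-12            # the projected vector sums to zero
assert np.isclose((z - p) @ p, 0.0)    # the residual is orthogonal to p
```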

Comment

Thank you for your positive review and for your constructive feedback!

We agree: the empirical Section 6.2 is quite short, and there are several details (in particular, descriptions of the baselines) in Appendix K that we will pull into the main body.

We also appreciate your suggestion regarding flow. Would you agree with a structure such as this?

  • SGD
    • Discussion
    • Experiments
  • Bandits
    • Discussion
    • Theory
    • Experiments

And finally, great catch on the typo. Yes, we are missing the “ones vector”.

Comment

Thank you for your response. I think that the structure you have proposed for Section 6 makes sense. I will maintain my score.

Comment

Thank you to all the reviewers for taking the time to read our paper and provide helpful feedback. We are very encouraged by your words! We were quite pleased to hear that many of you feel this approach is both novel and intuitively clear to a non-game-theory audience. We hoped precisely for this: that by transforming the fundamental problem of game theory (approximating Nash equilibria) into a stochastic optimization problem, we might open the game-theory door to world-class ML researchers in the ICLR community.

AC Meta-Review

There is consensus---and I agree---that this paper proposes an interesting combination of techniques that deserves to be highlighted at the conference, for its somewhat broad appeal and the future work it may inspire.

Why not a higher score

NA

Why not a lower score

The paper has broad appeal to people working in computational game theory. I think it deserves either a spotlight or an oral presentation.

Final Decision

Accept (oral)