PaperHub
5.5
/10
Poster4 位审稿人
最低2最高4标准差0.7
4
3
3
2
ICML 2025

Securing Equal Share: A Principled Approach for Learning Multiplayer Symmetric Games

OpenReviewPDF
提交: 2025-01-23更新: 2025-07-24

摘要

This paper examines multiplayer symmetric constant-sum games with more than two players in a competitive setting, such as Mahjong, Poker, and various board and video games. In contrast to two-player zero-sum games, equilibria in multiplayer games are neither unique nor non-exploitable, failing to provide meaningful guarantees when competing against opponents who play different equilibria or non-equilibrium strategies. This gives rise to a series of long-lasting fundamental questions in multiplayer games regarding suitable objectives, solution concepts, and principled algorithms. This paper takes an initial step towards addressing these challenges by focusing on the natural objective of *equal share*—securing an expected payoff of $C/n$ in an $n$-player symmetric game with a total payoff of $C$. We rigorously identify the theoretical conditions under which achieving an equal share is tractable and design a series of efficient algorithms, inspired by no-regret learning, that *provably* attain approximate equal share across various settings. Furthermore, we provide complementary lower bounds that justify the sharpness of our theoretical results. Our experimental results highlight worst-case scenarios where meta-algorithms from prior state-of-the-art systems for multiplayer games fail to secure an equal share, while our algorithm succeeds, demonstrating the effectiveness of our approach.
关键词
Multiplayer symmetric games; Securing Equal Share

评审与讨论

审稿意见
4

This submission studies multiplayer (n-player where n >= 3) symmetric zero-sum games (such as Mahjong and Poker). Unlike two-player zero-sum games, equilibria in multiplayer games are neither unique nor non-exploitable, which poses a challenge when competing against opponents who play strategies from different equilibria or non-equilibria strategies. Motivated by these observations, the authors propose a new learning objective: ensuring that each player secures at least C/n utility in a n-player symmetric game with total payoff C.

The authors derive conditions under which obtaining this equal share is tractable, and design learning algorithms to approximate this objective which are inspired by no-regret learning. The authors also derive complementary lower bounds which largely match their upper bounds.

They empirically evaluate their algorithms in two symmetric zero-sum games: majority vote and switch dominance game. They find that self play-based methods often fail to guarantee an equal share, and are often out-performed by the authors’ algorithms.

给作者的问题

Do you think any of your ideas could generalize to multiplayer games which are not symmetric? Or are completely new solution concepts needed?

论据与证据

Yes.

方法与评估标准

Yes.

理论论述

Yes.

实验设计与分析

Yes.

补充材料

Yes - Appendix C.

与现有文献的关系

This work falls under the category of learning in games. Specifically, they propose novel objectives for learning in multiplayer symmetric games.

遗漏的重要参考文献

No.

其他优缺点

Strengths: The downsides pointed out by the authors about multiplayer equilibria are well-known “pain points” in algorithmic game theory, and I appreciate the authors’ attempt to propose a new objective. This new objective is very natural in symmetric games. The authors provide nearly-matching upper- and lower-bounds on learning algorithms for accomplishing this objective, which, when combined, give a clean solution to this problem.

Weaknesses: One weakness of this submission is that the proposed algorithms are not terribly novel. When playing against stationary opponents, the authors use the well-known Hedge algorithm to obtain an approximately equal share. Against adapting opponents, they adapt the Sttrongly Adaptive Online Learning algorithm of Daniely et al., 2015 to achieve an approximately equal share. With that being said, there is still value in showing that existing techniques can accomplish this new objective and so I do not view this as a major weakness.

其他意见或建议

n/a

作者回复

Thank you for your inspiring comments. We address your concerns as follows.

Q1: Novelty and contributions

A1: Please refer to A1 to Reviewer Ada2 for clarification on the novelty and technical contribution of our work.

Q2: Extension to asymmetric games

A2: As mentioned on page 4, L184-186 (left column), we can effectively transform all asymmetric games into symmetric ones by introducing an initial stage where each player’s position is randomly assigned. This randomization process effectively symmetrizes the game, as each player has an equal probability of assuming any role, ensuring that no player has a consistent positional advantage. Therefore, under this transformed setting, our analysis remains applicable.

To further clarify the mathematical transformation, we enlarge the action space to incorporate both the player's position and their actions. Specifically, for each player ii, we define (Pi,ai)(P_i, a_i) where PiP_i represents the player's position or role in the game, and aia_i denotes their action. We then define a symmetric utility function UisymU^{sym}_i as follows:

Uisym((P1,a1),,(Pn,an))U^{sym}_i ((P_1, a_1), \ldots, (P_n, a_n))

:=Uσ1(i)asym(aσ1(1),,aσ1(n)),:=U_{\sigma^{-1}(i)}^{asym} (a_{\sigma^{-1}(1)}, \ldots, a_{\sigma^{-1}(n)}),

where σ\sigma is a permutation that reorders player positions such that σ(P1,,Pn)=(1,,n)\sigma(P_1, \ldots, P_n) = (1, \ldots, n).
The action space is defined as j=1n(Pj×Aj)\bigcup_{j=1}^n (P_j \times \mathcal{A}_j), where Aj\mathcal{A}_j denotes the set of actions available to players when they are in position or role PjP_j.
This transformation aligns the players' positions with a standardized symmetric framework, enabling our analysis to apply to games that are asymmetric by nature but can be symmetrized under random initial positioning.

审稿意见
3

The paper proposes a new solution concept for constant-sum, multi-player, symmetric games, which is referred to as equal share. What this means is that each player secures the same utility. The paper observes that usual equilibrium concepts do not necessarily satisfy equal share, and it then proceeds by identifying necessary and sufficient conditions under which equal shares can be attained--namely, players adopting symmetric dynamics and limited opponent adaptivity. Under those two conditions, they provide algorithms that indeed attain equal share. Finally, experimental results are provided to support some of the theoretical claims.

给作者的问题

I have no further questions.

论据与证据

Overall, the claims made in the paper are well-grounded, although there are certain points that are very much debatable. In particular, I want to mention two claims made in Section 4.1, which serve to justify the main assumptions of the paper. The first one is that "Condition 1 is further implicitly adopted by most prior state-of the-art AI agents for multiplayer games." It is true that self-play algorithms satisfy the symmetry condition postulated in the paper, but I see no reason whatsoever why this is a reasonable assumption when interacting with opponents in a realistic game (in the self-play setting, one, of course, controls the algorithms employed by all players, but this is not so in practice).

The second claim is that "it is often safe to assume that the population meta-strategy will not quickly adapt to one particular player’s strategy." As far as I can see, no evidence is given to justify this claim. In many ways, limited opponent adaptivity trivializes the problem and goes against basic game-theoretic principles.

方法与评估标准

The evaluation in Section 6 supports some of the claims of the paper, but it does not go far enough. The benchmarks tested have been cherry-picked to serve the narrative of the paper, and they are not comprehensive enough. The paper would benefit from providing a more complete evaluation based on more benchmarks. The main question is why existing self-play algorithms perform so well in practice despite such clear deficiencies. Is it the case that the games tested by the authors do not reflect practical examples? It would be interesting if the authors could provide a more general characterization of games with practical relevance where similar phenomena are present.

理论论述

All theoretical claims appear to be sound; I did not find any notable issue.

实验设计与分析

The experimental evaluation is sound in the methodology of the games tested, but, as I pointed out above, it is not comprehensive enough. It's hard to draw concrete conclusions from a very limited benchmark evaluation.

补充材料

I checked the proofs in the supplementary material, and I did not find any notable issues. The supplementary material is overall polished.

与现有文献的关系

The paper tries to address a thorny issue in multi-agent learning concerning the deficiencies of traditional equilibrium concepts. These issues are well understood, and so the observations made in Section 3 are fairly immediate from existing results. The paper departs from most of the line of work in game theory by going beyond equilibrium, and is mostly related to opponent modeling.

遗漏的重要参考文献

One paper that could be discussed is "Game Theory-Based Opponent Modeling in Large Imperfect-Information Games," and more broadly I believe that the authors could elaborate more on how their results relate to opponent modeling.

其他优缺点

On the positive side, the paper attempts to address an important problem in multiagent learning. The proposed solution concept is fairly natural, and as far as I know has not been analyzed before. The paper fully characterizes conditions under which guaranteeing equal share is possible. I believe that there are certain settings in which the results provided by the paper will be relevant.

On the negative side, the theoretical results are not surprising, and mostly follow directly from existing results, although there is value in stating and formalizing such results. Furthermore, as I said above, the assumption that opponents are not fully adaptive is a very strong assumption, which really goes against basic game-theoretic principles. I understand that making progress on this problem necessitates departing from the existing normative assumptions, but the paper hasn't made an entirely convincing case.

其他意见或建议

As I pointed out earlier, I believe that the paper would benefit considerably by a more comprehensive experimental evaluation.

作者回复

Thanks a lot for reading our paper and for your insightful comments.

Q1: The theoretical results are not surprising

A1: Novelty. We remark that, compared to developing complex algorithmic techniques, identifying and formulating the right question to solve is equally—if not more—important. This paper precisely addresses the latter: propose novel, practical, and important formulations, study basic properties, and bring the right tools to give principled solutions in a variety of settings.

In multiplayer games, much of the existing research has focused on achieving equilibrium-based outcomes. While extensively studied, equilibrium-based approaches have well-known limitations and are often unsuitable for developing AI agents in multiplayer settings. This concern was explicitly raised in prior work on real-world multiplayer game systems (e.g., Brown & Sandholm, 2019); however, such works do not resolve the issue or propose alternative objectives. Our paper is the first to formulate the problem as achieving equal share, and to clarify that the objective is theoretically well-behaved only under the condition that all opponents deploy identical strategies. While this condition seems strong, we justify its practical relevance and note that it aligns with the settings implicitly enabled in most prior state-of-the-art empirical work. We emphasize that these findings, though not technically challenging, are very fundamental and entirely novel—they have not been previously discovered or formally articulated in the literature.

Technical Contribution on Efficient Algorithms. In addition to our contribution to identifying the correct solution concept, this paper also provides a comprehensive set of algorithmic results for achieving equal share in various settings: (1) fixed opponents; (2) slowly changing opponents; (3) opponents that adapt at intermediate rates; (4) matching lower bounds.

For (1) and (2), we leverage established techniques from the no-regret and no-dynamic-regret learning literature, achieving equal shares with provable guarantees. Our contribution here mostly lies in identifying and bringing the right tools and adapting them to solve the right setup of the multiplayer games, where, to the best of our knowledge, prior state-of-the-art AI systems on multiplayer games (for Poker, etc) have not utilized techniques such as no-dynamic-regret.

For (3), we believe our discovery here is completely new, that simple algorithms like behavior cloning can sometimes even outperform sophisticated no-dynamic regret algorithms. Such a result is only possible by leveraging the symmetric structure of the multiplayer game beyond treating it as one versus an adversarial group of opponents.

For (4), while similar lower bounds have been shown in more general games, they do not apply to our settings as the hard instances constructed in prior lower bounds are not symmetric games. In this paper, we carefully construct new hard instances showing that the algorithmic results we proved in (2) and (3) are near-optimal.

Q2: Two claims made in Section 4

A2: About Condition 1: In Section 4.1, we introduce the concept of population meta-strategy and show that within a large player base, even opponents with different strategies can be viewed as adopting the same population meta-strategy. As mentioned in Section 4.1, various forms of games, such as card games, board games, and online video games, all exemplify the concept of population games via matchmaking in a large population of players.

About Condition 2: Our paper studies both scenarios where the opponents' policy is fixed (see Section 5.1) and where it is slowly adapting (see Section 5.2). While capable of updating their policies, the opponents do not run algorithms specifically targeting our AI agent, as it is merely one player within a vast pool of participants. Moreover, Proposition 4.2 points out that attaining equal shares is impossible if opponents can change their meta-strategy arbitrarily fast across different rounds, supporting Condition 2.

Q3: About experiment

A3: Please refer to A1 to Reviewer Ada2 for details on our experiment design. Our experimental cases are adversarially constructed, based on the common characteristic of many multiplayer games: the existence of multiple NE. This can cause self-play variants to converge to a bad NE, resulting in negative payoffs against carefully chosen opponent policies. Since many real-life games have multiple NEs, the worst-case results of our experiments are relevant to practical games.

While self-play algorithms have achieved impressive results in games like multiplayer poker, these successes are often context-specific and rely on extensive engineering and human input. We agree that understanding why these algorithms succeed and whether they can generalize across different game settings is an important future question, though it is beyond the scope of this paper.

审稿人评论

I thank the authors for the detailed response. I certainly agree with the authors that identifying the right questions is essential. My main concern is about the proposed solution concept. It can only be attained under very strong assumptions that in some sense trivialize the problem. As I said in my evaluation, I understand that some concessions have to be made to make progress in this problem, but I am not entirely convinced that the paper makes reasonable concessions. That being said, I appreciate the novelty of the paper and the fact that it tries to approach a fundamental problem from a new angle, so I will increase my score.

作者评论

Thank you for acknowledging our contribution and raising the score! We would like to respectfully clarify our position regarding the claim that our results rely on “very strong assumptions that in some sense trivialize the problem.” We disagree with this characterization.

Our paper makes two assumptions, both of which we have argued in Section 4 are, in some sense, necessary for achieving an equal share in multiplayer games:

(1) All opponents deploy identical mixed strategies (referred to as the meta-strategy);

(2) The meta-strategy evolves slowly over time.

Assumption (1) approximately holds when opponents are randomly drawn from a large pool of players, as shown in Proposition 4.3. Assumption (2) is naturally satisfied in many real-world settings, such as online games where population-level strategies evolve gradually. These assumptions directly apply to games, such as Poker, Mahjong, and others, in casino or online platforms with a large player base. As such, they do not trivialize the problem; rather, they reflect realistic features of the environments we aim to model.

Moreover, many successful AI systems for multiplayer games---including those for Poker and Mahjong---implicitly rely on both assumptions, even if not explicitly stated. For instance, self-play algorithms often use a shared neural network to sample actions for all opponents, which effectively assumes (1). Similarly, many works aim to build strong agents against a fixed or slowly evolving meta-human policy, implicitly assuming (2). Therefore, we believe our assumptions are not strong, but instead essential for enabling principled learning in these settings.

审稿意见
3

The authors present an algorithms for learning in symmetric constant-sum multiplayer games, where the solution concept is equal allocation of social welfare among the players. They show that standard no-regret learning algorithms in a self-play setup cannot achieve "equal share". They demonstrate necessary conditions for a game and algorithmic assumptions necessary to allow an equal share strategy to be computable, and then adapt algorithms from the online learning literature to compute equal share strategy under different assumptions on the opponent.

给作者的问题

  1. Can you explain the experiments in more detail? What are they aiming to illustrate?

论据与证据

Yes, the claims are supported, and theoretical and empirical evidence are presented to support the claims.

方法与评估标准

Yes. they do.

理论论述

Yes, I checked the correctness of all proofs.

实验设计与分析

The experimental analysis is a bit confusing. The comparison is done against a set of fixed meta-strategies, but then the self-play algorithm is run, trying to compute optimal strategies for each player in a no-regret fashion. It is not clear to me what the experiments aim to illustrate.

补充材料

Yes, I have reviewed all of the supplementary material.

与现有文献的关系

The authors do a good job of contextualizing their work in the broader line of work on this topic. In general, computing NE in general multiplayer games is hard. The authors choose to focus on the case of zero-sum symmetric multiplayer games and come up with a tractable solution concept for this reason.

遗漏的重要参考文献

N/A

其他优缺点

While the authors propose an interesting theoretical framework, the novelty of the results is limited. The sufficient conditions are immediate observations that can be made. The extension of no-static-regret algorithms and dynamic regret algorithms to the equal share setting while interesting, is a fairly straightforward extension.

其他意见或建议

N/A

作者回复

Thanks a lot for reading our paper and for your insightful comments.

Q1: Clarification of the experiment

A1: Our experiment aims to demonstrate that although self-play-based meta-algorithms have been effective in many real-world multiplayer games, they can still fail to secure an equal share in worst-case scenarios. Specifically, we show that the meta-algorithms under consideration — SP_scratch, SP_BC, and SP_BC_reg — which are shown to converge to NE, do not guarantee equal outcomes in our constructed cases.

Consistent with the theoretical analysis in Section 5.1, our experimental setup assumes that all opponents play a fixed, non-adaptive strategy across all rounds. This setup represents a fundamental and arguably the simplest scenario in which a well-designed algorithm should perform reliably, especially given that real-world opponents can be adaptive or even adversarial. For self-play-based algorithms, we first run each method until convergence and then evaluate the learned policy against the fixed opponent strategy. While SP_scratch has no access to the opponent’s strategy, we include SP_BC and SP_BC_reg as baselines for fair comparison. These methods incorporate the opponent’s strategy as either initialization or regularization, following design principles from recent self-play variants that achieve state-of-the-art performance in multi-player games. Finally, as discussed in Section 3, self-play from scratch without any opponent knowledge can fail even in simple settings, such as the majority-vote game.

Q2: The novelty of the results is limited

A2: Novelty. We would like to point out that, compared to developing complex algorithmic techniques, identifying and formulating the right question to solve is equally—if not more—important. This paper precisely addresses the latter: propose novel, practical, and important formulations, study basic properties, and bring the right tools to give principled solutions in a variety of settings.

In multiplayer games, much of the existing research has focused on achieving equilibrium-based outcomes. While extensively studied, equilibrium-based approaches have well-known limitations and are often unsuitable for developing AI agents in multiplayer settings. This concern was explicitly raised in prior work on real-world multiplayer game systems (e.g., Brown & Sandholm, 2019); however, such works do not resolve the issue or propose alternative objectives. Our paper is the first to formulate the problem as achieving equal share, and to clarify that the objective is theoretically well-behaved only under the condition that all opponents deploy identical strategies. While this condition seems strong, we justify its practical relevance and note that it aligns with the settings implicitly enabled in most prior state-of-the-art empirical work. We emphasize that these findings, though not technically challenging, are very fundamental and entirely novel—they have not been previously discovered or formally articulated in the literature.

Technical Contribution on Efficient Algorithms. In addition to our contribution to identifying the correct solution concept, this paper also provides a comprehensive set of algorithmic results for achieving equal share in various settings: (1) fixed opponents; (2) slowly changing opponents; (3) opponents that adapt at intermediate rates; (4) matching lower bounds.

For (1) and (2), we leverage established techniques from the no-regret and no-dynamic-regret learning literature, achieving equal shares with provable guarantees. Our contribution here mostly lies in identifying and bringing the right tools and adapting them to solve the right setup of the multiplayer games, where, to the best of our knowledge, prior state-of-the-art AI systems on multiplayer games (for Poker, etc) have not utilized techniques such as no-dynamic-regret.

For (3), we believe our discovery here is completely new, that simple algorithms like behavior cloning can sometimes even outperform sophisticated no-dynamic regret algorithms. Such a result is only possible by leveraging the symmetric structure of the multiplayer game beyond treating it as one versus an adversarial group of opponents.

For (4), while similar lower bounds have been shown in more general games, they do not apply to our settings as the hard instances constructed in prior lower bounds are not symmetric games. In this paper, we carefully construct new hard instances showing that the algorithmic results we proved in (2) and (3) are near-optimal.

审稿意见
2

This paper proposes a novel Monte Carlo Tree Search–inspired algorithm for multi-agent, simultaneous-move games under imperfect information.

给作者的问题

How does your method adapt if the game includes partial observability of the environment state itself (not just hidden opponent actions)? Do you foresee major changes to the design of the no-regret worker or the neural network architecture in more general imperfect-information settings?

论据与证据

The proposed NN-CCE algorithm achieves performance that is superior or competitive with some multi-agent reinforcement learning algorithms (e.g., MAPPO, MADDPG) across cooperative, competitive, and mixed tasks.

方法与评估标准

The authors benchmark performance mainly via win rate in competitive scenarios or average reward/success rate in cooperative tasks.

理论论述

The key theoretical claim is that following a no-regret learning procedure (like EXP-IX) in repeated states leads to approximate CCE in the time-averaged strategy profile. This is a standard result from the game theory literature on no-regret dynamics.

实验设计与分析

The authors evaluate their algorithm on 17 tasks, spanning different complexities and team-based structures (2-player zero-sum, 2-team with multiple agents, purely cooperative).

They compare to diverse baselines: MAPPO, MADDPG, s-MCTS, DORA, plus references to prior results for variants of PSRO or CFR on smaller tasks.

补充材料

N/A

与现有文献的关系

The paper is well positioned within the literature on multi-agent reinforcement learning (MADDPG, MAPPO) and game-theoretic equilibrium approximation (PSRO, CFR, DORA).

遗漏的重要参考文献

No glaringly missing citations.

其他优缺点

Strengths: The authors introduce a method that abandons deeper search in favor of single-step lookahead, using an online no-regret learner (EXP-IX) at each state to approximate a CCE distribution over actions.

They decouple the “data collection” (policy rollout) from the “equilibrium approximation” (no-regret workers) and from the “value/policy network training"

Weaknesses: By restricting the MCTS procedure to depth 1, the algorithm might miss deeper strategic foresight. While the neural value function can somewhat compensate, there may be domains where deeper lookahead is crucial.

其他意见或建议

A brief discussion contrasting Coarse Correlated Equilibria with Correlated Equilibria and (in zero-sum) Nash Equilibria would be beneficial for readers less familiar with these solution concepts. The paper touches on it, but a succinct subsection might be useful.

伦理审查问题

N/A

作者回复

Thank you for your feedback on our submission. Upon reviewing your comments, we believe there may have been a misunderstanding, as the points raised do not seem to be relevant to our paper.

We appreciate the time and effort you have put into your review and are happy to provide further clarification or address any specific questions if needed.

最终决定

[Note: Review SUo1 was excluded from my evaluation of the paper given that the review appears to not be about the paper at hand and the reviewer has not responded to queries.]

Overall, the reviewers feel the reasons to accept are more than the reasons to reject. The reviewers feel the problem domain is unique and challenging. However, they feel the evaluations are somewhat trivial -- more extensive evaluations with more sophisticated opponents would improve the paper substantially. However, as one reviewer noted, you have to start somewhere. As such, I am marking this paper as "weak accept" (if there is room in the program).

I encourage the authors to take into account the reviewer feedback as they iterate on the paper.