PaperHub
Overall: 6.3/10 · Poster · 4 reviewers (min 6, max 7, std 0.4)
Ratings: 6, 6, 7, 6
Confidence: 2.5 · Correctness: 3.3 · Contribution: 3.0 · Presentation: 3.0
NeurIPS 2024

On the Optimality of Dilated Entropy and Lower Bounds for Online Learning in Extensive-Form Games

OpenReview · PDF
Submitted: 2024-05-15 · Updated: 2024-11-06

Abstract

First-order methods (FOMs) are arguably the most scalable algorithms for equilibrium computation in large extensive-form games. To operationalize these methods, a distance-generating function, acting as a regularizer for the strategy space, must be chosen. The ratio between the strong convexity modulus and the diameter of the regularizer is a key parameter in the analysis of FOMs. A natural question is then: what is the optimal distance-generating function for extensive-form decision spaces? In this paper, we make a number of contributions, ultimately establishing that the weight-one dilated entropy (DilEnt) distance-generating function is optimal up to logarithmic factors. The DilEnt regularizer is notable due to its iterate-equivalence with Kernelized OMWU (KOMWU)---the algorithm with state-of-the-art dependence on the game tree size in extensive-form games---when used in conjunction with the online mirror descent (OMD) algorithm. However, the standard analysis for OMD is unable to establish such a result; the only current analysis is by appealing to the iterate equivalence to KOMWU. We close this gap by introducing a pair of primal-dual treeplex norms, which we contend form the natural analytic viewpoint for studying the strong convexity of DilEnt. Using these norm pairs, we recover the diameter-to-strong-convexity ratio that predicts the same performance as KOMWU. Along with a new regret lower bound for online learning in sequence-form strategy spaces, we show that this ratio is nearly optimal. Finally, we showcase our analytic techniques by refining the analysis of Clairvoyant OMD when paired with DilEnt, establishing an $\mathcal{O}(n \log |\mathcal{V}| \log T/T)$ approximation rate to coarse correlated equilibrium in $n$-player games, where $|\mathcal{V}|$ is the number of reduced normal-form strategies of the players, establishing the new state of the art.
Keywords
extensive-form games, first order methods, dilated entropy, equilibrium computation

Reviews and Discussion

Review
Rating: 6

This work studies first-order methods for equilibrium computation in extensive-form games (EFGs) and the effect of the choice of regularizer (distance-generating function). The key parameter for the performance of these methods is the ratio between the strong convexity modulus and the diameter of the regularizer. The main result is that a particular regularizer, namely the dilated entropy, is an optimal choice in terms of the aforementioned ratio. This regularizer can be used to recover prior results in EFGs, such as the regret performance of Kernelized OMWU.

Strengths

The main result of the paper is essentially a toolbox for the dilated entropy (DilEnt) regularizer that can then be employed to get improved regret bounds. The main corollary of this toolbox is that running OMD with DilEnt gives the same performance as Kernelized OMWU, which employs the "normal-form reduction" of the EFG. I find this result interesting since it provides a nice tool unifying prior works and showcasing the power of first-order methods.

The paper is in general well-written; some parts are a little dense and the notation is heavy (this is a general issue in this line of work).

Essentially, the main result of the paper is a matching-performance first-order method compared to KOMWU. I think that the problem studied by the authors has a concrete motivation and the solution is nice. Hence, I believe that the paper is above the borderline for acceptance.

Weaknesses

In terms of contribution, I believe that the main result complements the existing literature well. I do not have any concrete weaknesses to point out, but I have a couple of clarifying questions for the authors.

Questions

  1. Is there some motivation in preferring a FOM rather than kernelization?

  2. In the Clairvoyant OMD result, is the regret bound achieved by an efficient algorithm? Because clairvoyance requires solving a fixed-point problem, right?

  3. Could the results obtained by either first-order methods with DilEnt or KOMWU hold for last-iterate convergence?

  4. Regarding the lower bound of Theorem 5: is the result in the adversarial setting, where one player employs some algorithm and the other players decide "in the worst case" for the algorithm?

Limitations

Author Response

Thanks for your feedback!

Q1: “Is there some motivation in preferring a FOM rather than kernelization?”

A1: Given that FOMs have been extensively explored in the literature, they can benefit from the existing techniques. An example of this is the adoption of the COMD approach in our paper, which leads to a better theoretical convergence rate for learning CCE in extensive-form games. Also, FOMs are more flexible than kernelization, in general. With kernelization, one would be restricted to (variants of) the multiplicative weights update algorithm. In contrast, our DilEnt result can be applied to any first-order method that requires a strongly convex regularizer (including adaptive-stepsize methods, mirror prox, Nesterov’s excessive gap technique, etc., which do not have a kernelization equivalent). Furthermore, the analytic tools we introduce, including the treeplex norms, are likely of independent interest for first-order methods on extensive-form strategy sets and might apply to other regularizers as well.

Q2: “In the Clairvoyant OMD result, is the regret bound achieved by an efficient algorithm? Because clairvoyance requires solving a fixed-point problem, right?”

A2: Yes, COMD is indeed efficient. The fixed-point problem is solved via fixed-point iteration. By proving that the fixed-point iteration steps are contractions with respect to the treeplex norm (Lemma 4.5), the fixed-point subproblem is guaranteed to be solved to $\varepsilon$-accuracy in a number of iterations logarithmic in $1/\varepsilon$.
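To illustrate the iteration-count claim, here is a minimal, self-contained sketch on a toy map (not the paper's actual COMD update, whose contractivity is what Lemma 4.5 establishes): fixed-point iteration on any $c$-contraction converges linearly, so $\varepsilon$-accuracy requires only $O(\log(1/\varepsilon))$ iterations.

```python
import math

def fixed_point(f, x0, eps, c):
    """Iterate x <- f(x). For a c-contraction (c < 1), the iterates satisfy
    |x_k - x*| <= c^k |x0 - x*|, so O(log(1/eps)) steps reach eps-accuracy."""
    x, iters = x0, 0
    while True:
        x_next = f(x)
        iters += 1
        # Stopping rule: |x_{k+1} - x_k| <= eps*(1-c) implies
        # |x_{k+1} - x*| <= c/(1-c) * |x_{k+1} - x_k| <= c*eps < eps.
        if abs(x_next - x) <= eps * (1 - c):
            return x_next, iters
        x = x_next

# Toy contraction on the reals with modulus c = 1/2, since |f'(x)| <= 1/2.
f = lambda x: 0.5 * math.cos(x)
sol, k1 = fixed_point(f, 1.0, 1e-4, 0.5)
_, k2 = fixed_point(f, 1.0, 1e-8, 0.5)
# Squaring the accuracy requirement (1e-4 -> 1e-8) only adds a constant
# factor of extra iterations: that is linear (geometric) convergence.
```

The same linear-rate argument is what makes the overall per-iteration cost of a clairvoyant method only logarithmically larger than a plain OMD step.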

Q3: “Could the results obtained by either first-order methods with DilEnt or KOMWU hold for last-iterate convergence?”

A3: While this paper was not concerned with last-iterate convergence guarantees, by applying our new analysis framework to the result in [1], it should be straightforward to establish last-iterate convergence guarantees with a better dependency on game size in the settings for which [1] guarantees convergence.

Q4: “Regarding the lower bound of Theorem 5: is the result in the adversarial setting, where one player employs some algorithm and the other players decide "in the worst case" for the algorithm?”

A4: Yes, the result given by Theorem 5.1 is in the adversarial setting, which exactly matches the upper bound provided in Theorem 4.4. This is the standard online learning setting in which the usual lower bounds (for example, $\Omega(\sqrt{T \log n})$ for a probability simplex) are given. Lower bounds in the learning-in-games setting are largely unknown.
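For concreteness, here is a minimal Hedge (multiplicative weights) sketch of that standard adversarial simplex setting; the step size $\eta = \sqrt{\log n / T}$ is the textbook choice giving $O(\sqrt{T \log n})$ regret against any loss sequence, matching the $\Omega(\sqrt{T \log n})$ lower bound mentioned above. The random losses below are purely illustrative.

```python
import math
import random

def hedge(losses, eta):
    """Multiplicative-weights (Hedge) over n actions; losses in [0, 1].
    Returns the algorithm's cumulative expected loss."""
    n = len(losses[0])
    w = [1.0] * n
    total = 0.0
    for loss in losses:
        s = sum(w)
        p = [wi / s for wi in w]                      # play current mixture
        total += sum(pi * li for pi, li in zip(p, loss))
        w = [wi * math.exp(-eta * li) for wi, li in zip(w, loss)]
    return total

random.seed(0)
T, n = 1000, 4
losses = [[random.random() for _ in range(n)] for _ in range(T)]
eta = math.sqrt(math.log(n) / T)                      # textbook step size
alg_loss = hedge(losses, eta)
best = min(sum(l[i] for l in losses) for i in range(n))
regret = alg_loss - best  # bounded by O(sqrt(T log n)) for ANY loss sequence
```

The point of the lower bound is that no algorithm, Hedge or otherwise, can beat this $\sqrt{T \log n}$ dependence in the adversarial setting.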

[1] Lee, Chung-Wei, Christian Kroer, and Haipeng Luo. "Last-iterate convergence in extensive-form games." Advances in Neural Information Processing Systems 34 (2021): 14293-14305.

Review
Rating: 6

The paper examines distance-generating functions (DGFs), a framework developed for providing online learning and equilibrium-computing first-order methods for extensive-form games (EFGs). Specifically, DGFs were introduced in the literature as a form of regularization tailored to the strategy space of EFGs. Combining Online Mirror Descent with DGFs leads to first-order methods for equilibrium computation in EFGs. A DGF $\phi$ is required to be $\mu$-strongly convex with respect to some norm. An important quantity of interest is the ratio between the diameter induced by the Bregman divergence of $\phi$ and the $\mu$-strong convexity constant, i.e., $\mathcal{D}_\phi/\mu$. This ratio appears in the regret bounds and affects the convergence rates. Consequently, the authors investigate which DGF achieves the best diameter/strong-convexity ratio. They demonstrate that a specific DGF, the weight-one dilated entropy (DilEnt), achieves the optimal diameter/strong-convexity ratio up to logarithmic factors among all DGFs. More precisely, DilEnt admits an $\mathcal{O}(\log |V|)$ diameter/strong-convexity ratio, where $V$ denotes the set of possible pure strategies of the EFG. Additionally, they demonstrate that combining DilEnt with the recently proposed framework of Clairvoyant Mirror Descent leads to state-of-the-art convergence rates for computing Coarse Correlated Equilibria (CCE) in EFGs.

Strengths

The DGF framework has been instrumental in designing no-regret and efficient equilibrium computation algorithms for EFGs. Consequently, I believe that characterizing the DGF that achieves the optimal diameter/strong-convexity ratio is a significant result for the game theory community. Additionally, the paper introduces intriguing technical concepts by incorporating primal-dual treeplex norms to bound the diameter/strong-convexity ratio. Finally, by combining their approach with the Clairvoyant framework, the paper presents a substantial improvement in convergence rates for computing Coarse Correlated Equilibria (CCE) in EFGs, reducing the rate from $\mathcal{O}(\log^4 T/T)$ to $\mathcal{O}(\log T/T)$.

Weaknesses

Despite the fact that all the presented results are interesting and introduce compelling technical ideas, I have some doubts about the main takeaway of the paper. As far as I understand, DilEnt was introduced by [1] to establish an equivalence between Mirror Descent and the kernelized approach of [2], which already achieved the $\mathcal{O}(\sqrt{\log |V|\, T})$ regret bound for EFGs. From this perspective, the $\mathcal{O}(\log |V|)$ bound on the diameter/strong-convexity ratio is somewhat expected. While I acknowledge that establishing this bound is far from trivial and technically challenging, it can also be seen as an alternative way of retrieving an already known result.

The main takeaway message of the paper might be the lower bound, which establishes that the results of [2] cannot be improved by adopting the DGF approach. If this is the case, highlighting this point would greatly benefit the paper.

Regarding the improved rate for CCE through the use of Clairvoyant Mirror Descent, is the current result truly necessary to achieve this improvement? I am not entirely sure, but I have a feeling that coupling the clairvoyant approach with the previous results of [2] would result in the same improvement.

That being said, I believe this is an interesting paper, and I am willing to further increase my score if these points are addressed.

[1] Bai et al. Efficient phi-regret minimization in extensive-form games via online mirror descent

[2] Farina et al. Kernelized multiplicative weights for 0/1-polyhedral games: Bridging the gap between learning in extensive-form and normal-form games

Questions

See weaknesses above.

Limitations

Yes

Author Response

Thanks for your feedback!

Q1: “The main takeaway message of the paper might be the lower bound, which establishes that the results of [2] cannot be improved by adopting the DGF approach. If this is the case, highlighting this point would greatly benefit the paper.”

A1: Thanks for the suggestions. In our opinion, however, the main takeaway of the paper is twofold. One is the lower bound, which suggests that KOMWU [2] is nearly optimal—as you said. The other, which we think is very important, is the analysis technique based on the treeplex norms. We believe that this contribution creates important analytical tooling that can be of independent interest for future research on regularization of these important combinatorial domains. As a byproduct, we are able to establish that the DilEnt regularizer achieves the near-optimal diameter-to-strong-convexity ratio for EFGs with full-information feedback. This also reconciles the more ad-hoc analysis based on kernelization (references [1,2] in your review) with the standard analysis of first-order methods and mirror descent, resolving the disconnect in the analysis.

Q2: “Regarding the improved rate for CCE through the use of Clairvoyant Mirror Descent, is the current result truly necessary to achieve this improvement?”

A2: This question is more speculative, and it’s hard to know for sure. However, we are fairly confident that the new analytic tools we introduced, which are in line with the standard analysis of mirror descent-based algorithms, provide the natural groundwork for analyzing the algorithm. In particular, Clairvoyant MWU uses fixed-point iterations on a map that has to be shown to be contractive with respect to some norm. The treeplex norm is a very natural candidate for extensive-form games, as we show in the paper. It does not seem immediately straightforward to replicate these results for the kernelization approach.

[1] Bai, Yu, et al. "Efficient Phi-regret minimization in extensive-form games via online mirror descent." Advances in Neural Information Processing Systems 35 (2022): 22313-22325.

[2] Farina, Gabriele, et al. "Kernelized multiplicative weights for 0/1-polyhedral games: Bridging the gap between learning in extensive-form and normal-form games." International Conference on Machine Learning. PMLR, 2022.

Comment

Thank you for the response. I still do not understand why the kernelization approach cannot be directly coupled with the Clairvoyant framework; I believe this should be elaborated in detail in the revised manuscript. Nevertheless, I think that the paper is interesting and I will keep my score.

Review
Rating: 7

This paper studies the optimality of dilated entropy functions for online learning in extensive-form games. The weight-one dilated entropy is shown to be optimal up to logarithmic factors, for which a new lower bound is also provided.

Strengths

The paper studies a very relevant problem, the optimal choice of dilated entropy function, as it provides insights into the practical implementation of online learning with dilated entropy in real-world applications. The result that the weight-one dilated entropy is optimal, together with the new matching lower bound, is a valuable addition to the field of online learning in games. Additionally, the new analysis of OMD, with the use of treeplex norms, is also new (to the best of my knowledge), and may be of separate interest for solving extensive-form games.

Weaknesses

I would suggest adding some more explanation and providing some intuition for the proofs of the theorems in the main paper, i.e., how the treeplex norms and the diameter-to-strong-convexity ratio of DilEnt are used.

Questions

Could you provide some intuition on how the treeplex norms and the diameter-to-strong-convexity ratio of DilEnt are used?

Limitations

Yes

Author Response

Thanks for your feedback!

Q: “Could you provide some intuition on how the treeplex norms and the diameter-to-strong-convexity ratio of DilEnt are used?”

A: The ratio between the diameter and the strong convexity of the regularizer on the feasible set (in extensive-form games, the feasible set of every player is their sequence-form strategy polytope, also known as a treeplex) is a key quantity in the performance of mirror descent-based algorithms. Intuitively, performance degrades the larger the diameter is (as there is more “space” to search over), and improves the more the regularizer is bowl-shaped (i.e., strongly convex), as the minimum of the function becomes easier to predict. Of course, there is a tradeoff between these two parameters: if the regularizer is too strongly convex, then it grows very fast, making the diameter of the set, $D_\varphi := \max_{x \in Q} D_\varphi(x \,\|\, x_1) = \max_{x \in Q} \varphi(x) - \min_{x \in Q} \varphi(x)$, larger.
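As a concrete instance of this tradeoff on the simplest domain (an illustration on the probability simplex, not the paper's treeplex setting): negative entropy is 1-strongly convex with respect to the $\ell_1$ norm on the simplex, and its diameter is exactly $\log n$, so its diameter-to-strong-convexity ratio is $\log n$.

```python
import math

def neg_entropy(x):
    """phi(x) = sum_i x_i log x_i, with the convention 0 log 0 = 0."""
    return sum(xi * math.log(xi) for xi in x if xi > 0)

n = 8
vertex = [1.0] + [0.0] * (n - 1)   # a pure strategy: phi = 0 (maximum)
uniform = [1.0 / n] * n            # uniform mixture: phi = -log n (minimum)

# Diameter of negative entropy on the simplex: max phi - min phi = log n.
# Combined with 1-strong convexity w.r.t. ||.||_1, the
# diameter-to-strong-convexity ratio on the n-simplex is log n.
diameter = neg_entropy(vertex) - neg_entropy(uniform)
```

The paper's DilEnt result is the treeplex analogue of this simplex computation: the ratio scales with $\log |\mathcal{V}|$ instead of $\log n$.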

Thanks for the suggestion, we will include this intuitive discussion in the revision.

Review
Rating: 6

The paper offers significant advancements in understanding the efficacy of distance-generating functions (DGFs) for extensive-form games (EFGs), specifically focusing on the optimization and application of first-order methods (FOMs). Central to the study is the exploration of the weight-one dilated entropy (DilEnt) regularizer, which is proven to be optimal up to logarithmic factors for these games. Overall, the paper makes a strong contribution to the field of algorithmic game theory, particularly in optimizing computational approaches for EFGs, and sets a new benchmark for subsequent research in this area. It opens up several avenues for further exploration, particularly in refining these methods for broader classes of games and in practical, real-world settings.

Strengths

  • The paper successfully establishes that DilEnt provides an optimal distance-generating function in terms of the diameter-to-strong convexity ratio for the strategy spaces in EFGs.
  • By developing new primal-dual treeplex norms, the authors enhance the analytical framework for FOMs in EFGs, allowing for more precise performance predictions that align with the capabilities of the Kernelized Optimistic Multiplicative Weight Update (KOMWU) algorithm.
  • The research presents a comprehensive analysis of the DilEnt regularizer's performance, including establishing new state-of-the-art approximation rates for achieving coarse correlated equilibrium in multiplayer games.

Weaknesses

  • While the paper makes substantial theoretical advancements, there is little empirical validation of the proposed methods across real-world datasets or diverse game settings, which may be necessary to fully understand the practical implications.
  • The work primarily focuses on EFGs with perfect recall and does not extensively address scenarios with imperfect recall or partial information, which could be relevant for a broader range of applications.

Questions

Is it possible to conduct some proof of concept experiments to validate the theoretical findings?

Limitations

The authors discussed in their conclusions limitations and future directions.

Author Response

Thanks for your feedback and your praise of our paper!

We address the two weaknesses you mentioned:

  • Your claim that our paper does not deal with “partial information” is incorrect. The techniques of the paper apply to imperfect-information extensive-form games. You are, however, right that the paper focuses on the case of perfect-recall games (i.e., players who do not forget their observations). This is a standard assumption: equilibrium computation and learning in imperfect-recall games are known to be computationally intractable.

  • Regarding empirical validation: you are right that we dedicated the entire paper to developing theoretical contributions. However, other work has examined the empirical performance of the DilEnt regularizer (e.g., [1]), though without the theoretical understanding and machinery that we introduce in this paper to analyze its metric properties.

[1] Lee, Chung-Wei, Christian Kroer, and Haipeng Luo. "Last-iterate convergence in extensive-form games." Advances in Neural Information Processing Systems 34 (2021): 14293-14305.

Comment

Thank you for your response. I have read the rebuttal and decide to keep my score.

Final Decision

The paper advances the understanding of extensive-form games. The authors show that the weight-one dilated entropy (DilEnt) regularizer, used within the OMD framework, leads to state-of-the-art results in EFGs. While this result was already established, it was previously obtained via an iterate equivalence between OMD with DilEnt and Kernelized Online Multiplicative Weights Update. The paper provides a more 'standard' OMD analysis by showing that one can bound the ratio of the diameter to the strong convexity modulus of this regularizer. They also show this ratio is optimal up to logarithmic factors for this setting. This is achieved via the development of appropriate treeplex norms, which is interesting in its own right. They further showcase the use of their result by applying it to a Clairvoyant OMD setting for achieving coarse correlated equilibria.

There have been a sequence of results in this area in recent times. The paper provides a nice addition to this sequence and the result as well as the techniques seem useful to the online learning and learning in games community and perhaps beyond. Overall the paper was found to be well written and the reviewers were also unanimous in acceptance. I recommend acceptance.