Comparator-Adaptive $\Phi$-Regret: Improved Bounds, Simpler Algorithms, and Applications to Games
Abstract
Reviews and Discussion
The paper considers minimizing the regret in the expert problem and proposes new algorithms achieving regret bounds that adapt to a sparsity-based complexity measure of each comparator instance, improving the results of a recent work (Lu et al., 2025). Different from the existing algorithm, which uses wavelet-inspired matrix features, the proposed algorithms are based on the arguably simpler idea of expert aggregation, with the twists being the incorporation of a carefully designed prior distribution over all possible transformations $\phi$, as well as the idea of kernelization from (Farina et al., 2022). Finally, a variant of the algorithm is proposed to achieve an adaptive version of the "fast rate" for learning in games, improving upon (Anagnostides et al., 2022).
Strengths and Weaknesses
Overall, I think the paper has several notable strengths making it well above the acceptance threshold.
- The results are strong, as the work removes several suboptimalities from the prior work and tightens its complexity description for the vast majority of transformation rules that are non-binary.
- The ideas behind the algorithms are substantially different from those of Lu et al., bringing a novel, complementary perspective to the problem. In general, adaptivity in online learning is typically achieved either by aggregating different non-adaptive subroutines or by "parameter-free" types of approaches. Their connections and differences have provided interesting insights to the field, and the present work contributes to this thread as well.
- The application to fast rates in games is a nontrivial addition.
- The writing is exceptionally clear despite the complicated technicalities.
On the flip side, I would suggest adding an in-depth discussion of the computational complexity. I understand the key motivation of this adaptive $\Phi$-regret problem as providing a beyond-the-worst-case improvement over the classical Blum-Mansour algorithm, so BM could be seen as the natural baseline. There, given the fixed-point computation oracle, which takes matrix multiplication time ($O(d^3)$ in practice but $O(d^\omega)$ in theory), the essential online learning is computationally light (it amounts to $d$ instances of MWU over $d$ experts), so the fixed-point oracle is the only computational bottleneck, and there have been some attempts at accelerating it (such as the power method, discussed in Appendix I.3 of Zhang et al. 2024). Lu et al. (2025) is computationally similar, but in the present work the essential online learning appears to take $O(d^3)$ time per iteration for both of the two proposed algorithms, dominating the fixed-point computation. This makes the comparison to BM and Lu et al. a bit more subtle, and some clarification would be nice.
Zhang, Brian Hu, et al. "Efficient $\Phi$-Regret Minimization with Low-Degree Swap Deviations in Extensive-Form Games." Advances in Neural Information Processing Systems 37 (2024): 125192-125230.
Questions
In the fast kernelized algorithm (Algorithm 6), I wonder if the computation of $\phi_{t+1}$ can be further reduced to matrix multiplication, thus $O(d^\omega)$ time.
Limitations
Yes
Formatting Issues
I do not see any formatting concerns.
Q1: In the fast kernelized algorithm (Algorithm 6), I wonder if the computation of $\phi_{t+1}$ can be further reduced to matrix multiplication, thus $O(d^\omega)$ time.
This is an excellent question. In fact, we now realize that each $\phi_{t+1}$ can be computed in $O(d^2)$ time (so better than the $O(d^3)$ time we provided in Algorithm 6, and also better than what the reviewer suspected, $O(d^\omega)$). This makes the computational complexity of our algorithm exactly the same as Blum-Mansour and Lu et al. [2025]. We provide the proof below and will update Algorithm 6 accordingly in the next version. Thanks again for your comment.
First, note that using the definitions in Algorithm 6, one can verify that the $(i,j)$ entry of $\phi_{t+1}$ takes one of the following two forms. In the first case,
$$
(\phi_{t+1})_{ij}=\frac{d-1}{c_{t}d(d-2)}(V_t)_{ij}\left(\frac{d-2}{d-1}(C_t)_{ij}+\frac{1}{d(d-1)}(S_t)_i+\frac{1}{d-1}(C_t)_{i,d+1}\right),
$$
and in the second case,
$$
(\phi_{t+1})_{ij}=\frac{d-1}{c_{t}d(d-2)}(V_t)_{ij}\left(\frac{d-2}{d-1}(C_t)_{ij}+\frac{1}{d(d-1)}(S_t)_i+(d-1)(C_t)_{i,d+1}\right),
$$
where
- is defined by and can be calculated in time.
- is defined as . With , can be computed in time.
- is defined as follows: for all , and where and . Again, with , and can be computed in time, and then can be computed in time too.
- is defined by for . With , can be computed in time too.
Therefore, it is now clear that computing $\phi_{t+1}$ also takes only $O(d^2)$ time overall.
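To make the cost claim concrete, below is a minimal illustrative sketch (my own, not the paper's actual implementation) of assembling $\phi_{t+1}$ entrywise from precomputed quantities. The arrays `V`, `C`, `S` and the scalar `c` are placeholders for $V_t$, $C_t$, $S_t$, $c_t$ as defined above, and only the first of the two entrywise formulas is used; the point is simply that each entry costs $O(1)$ arithmetic, so the whole matrix costs $O(d^2)$.

```python
import numpy as np

def assemble_phi(V, C, S, c, d):
    """Illustrative O(d^2) assembly of phi_{t+1} from precomputed quantities.

    V: (d, d) array standing in for V_t
    C: (d, d+1) array standing in for C_t (its last column plays the role of (C_t)_{., d+1})
    S: (d,) array standing in for S_t
    c: scalar standing in for c_t
    The two-case distinction from the formulas above is omitted here,
    since it does not affect the O(d^2) cost.
    """
    pref = (d - 1) / (c * d * (d - 2))
    inner = ((d - 2) / (d - 1)) * C[:, :d] \
        + S[:, None] / (d * (d - 1)) \
        + C[:, d][:, None] / (d - 1)
    # Elementwise products and sums over a d x d grid: O(d^2) total work.
    return pref * V * inner
```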
Thank you for your rebuttal; this is a very nice result that perfectly addresses my comment. I'll strongly recommend this paper in the discussion phase.
This paper investigates instance-dependent $\Phi$-regret bounds. Building on techniques from prior work, the authors propose a refined complexity measure for the comparator, develop two efficient algorithms, and derive improved regret bounds.
Strengths and Weaknesses
The paper is well written: the discussion of existing work is fair, the motivations behind the algorithmic development are clearly articulated, and the limitations are also acknowledged.
The construction of the prior is interesting, but the paper lacks a clear motivation for introducing it, and it is unclear whether this prior (and the resulting $c_\phi$) is the best possible choice. While the results presented are fairly comprehensive, the paper would benefit from a more thorough justification of the complexity measure $c_\phi$. It indeed interpolates between external and swap regret, but it is unclear what precise middle ground it captures. Moreover, $c_\phi$ is defined as the minimum of two complexities, which seems somewhat ad hoc. It would strengthen the paper to provide a justification showing that $c_\phi$ is inherent: for example, demonstrating that for some reasonable class of comparators (lying between $\Phi$-external/internal and $\Phi$-best), $c_\phi$ is in fact the best complexity measure achievable.
Questions
See above
Limitations
Yes
Justification for Final Rating
I have no outstanding concerns.
Formatting Issues
N/A
Thanks for your positive review and valuable comments. We address the issues mentioned in your review below.
Q1: The construction of the prior is interesting, but the paper lacks a clear motivation for introducing it, and it is unclear whether this prior (and the resulting $c_\phi$) is the best possible choice. While the results presented are fairly comprehensive, the paper would benefit from a more thorough justification of the complexity measure $c_\phi$. It indeed interpolates between external and swap regret, but it is unclear what precise middle ground it captures. Moreover, $c_\phi$ is defined as the minimum of two complexities, which seems somewhat ad hoc. It would strengthen the paper to provide a justification showing that $c_\phi$ is inherent: for example, demonstrating that for some reasonable class of comparators (lying between $\Phi$-external/internal and $\Phi$-best), $c_\phi$ is in fact the best complexity measure achievable.
While we also do not believe that our $c_\phi$ is the "best" complexity measure (in the sense that any reasonable algorithm must incur a matching regret for all $\phi$), we note that we can at least say the following: for any integer $k$ and any algorithm, there exists an expert problem and a $\phi$ whose complexity $c_\phi$ is of order $k$, such that the regret of this algorithm against $\phi$ is at least of the matching order.
To see this, simply let some of the experts be dummy experts that always suffer the maximum loss, while the remaining experts follow the swap regret lower bound construction of Ito [2020] (suitably scaled). We then define $\phi$ as follows. First, $\phi$ maps each non-dummy expert to another non-dummy expert optimally (to minimize the total loss after the swap). Then, to define $\phi$'s behavior on the dummy experts, we consider two cases: if the algorithm picks dummy experts more than a certain number of times, we let $\phi$ map all dummy experts to a fixed non-dummy expert; otherwise, all dummy experts are mapped to themselves. Note that in both cases, $c_\phi$ is bounded as required. Moreover, in the first case, the regret against $\phi$ is already large enough, since whenever the algorithm picks a dummy expert $i$, it suffers the maximum loss while expert $\phi(i)$ suffers a much smaller loss (and the rest of the regret is non-negative by construction). In the second case, the regret against $\phi$ is essentially the swap regret of the algorithm on the non-dummy experts over sufficiently many rounds, and since that sub-instance comes from the lower bound construction of Ito [2020], the claimed lower bound follows.
Thank you for the response. It’s helpful to see the lower bound, though I was hoping for something more instance-specific. I maintain my positive evaluation of the paper and support its acceptance.
This paper introduces a novel framework for comparator-adaptive $\Phi$-regret, significantly improving upon the work of Lu et al. (2025). The authors propose two efficient algorithms, one based on kernelized Multiplicative Weight Update (MWU) and one on a prior-aware BM-reduction, to achieve tighter regret bounds, removing unnecessary terms from the prior work.
Strengths and Weaknesses
Strengths:
- The paper achieves tighter regret bounds, matching the minimax-optimal bounds for standard regret notions (external, internal, swap).
- The proposed kernelized MWU and BM-reduction approaches are computationally more efficient than prior methods.
- The extension to games is notable, providing the first adaptive and accelerated $\Phi$-equilibrium guarantee.
Weaknesses:
- The work is purely theoretical, lacking experimental results to demonstrate practical performance. Empirical evaluation on real-world datasets or game instances would strengthen the claims.
- The $\Phi$-equilibrium bound (Theorem 6.4) includes a term that scales quadratically with the number of players. This may limit applicability to large multi-agent systems.
Questions
- Given the focus on upper bounds for comparator-adaptive $\Phi$-regret, is it possible to provide a lower bound analysis to establish the optimality of the proposed bounds, especially in relation to the complexity measure $c_\phi$?
- Regarding the kernelized MWU algorithm (Section 4), the paper mentions an $O(d^3)$ complexity per iteration. How does this complexity impact the algorithm's scalability for large $d$? Are there practical optimizations considered to handle high-dimensional scenarios?
- Can the framework be extended to handle heterogeneous agents with different action spaces and loss functions, as encountered in real-world games, and if so, how would the current analysis need to be adapted to accommodate such heterogeneity?
Limitations
- Game Assumptions: The nonnegative-social-external-regret assumption restricts the applicability to specific game classes, limiting generality.
- Lack of Experiments: The absence of empirical results prevents validation of the algorithms’ efficiency and practical utility, especially compared to the method in Lu et al. (2025).
Justification for Final Rating
The rebuttal has addressed my questions and improved upon the original results. As this is a solid theoretical paper that exceeds the acceptance threshold, I have decided to increase my score.
Formatting Issues
No formatting issues were noticed.
Thank you for your positive review and comments. We address your questions below.
Q1: Given the focus on upper bounds for comparator-adaptive $\Phi$-regret, is it possible to provide a lower bound analysis to establish the optimality of the proposed bounds, especially in relation to the complexity measure $c_\phi$?
We already mentioned in the paper that our bound is optimal for the special case of external/internal/swap regret. In fact, by a simple argument, one can establish the following stronger lower bound as well: for any integer $k$ and any algorithm, there exists an expert problem and a $\phi$ whose complexity $c_\phi$ is of order $k$, such that the regret of this algorithm against $\phi$ is at least of the matching order.
To see this, simply let some of the experts be dummy experts that always suffer the maximum loss, while the remaining experts follow the swap regret lower bound construction of Ito [2020] (suitably scaled). We then define $\phi$ as follows. First, $\phi$ maps each non-dummy expert to another non-dummy expert optimally (to minimize the total loss after the swap). Then, to define $\phi$'s behavior on the dummy experts, we consider two cases: if the algorithm picks dummy experts more than a certain number of times, we let $\phi$ map all dummy experts to a fixed non-dummy expert; otherwise, all dummy experts are mapped to themselves. Note that in both cases, $c_\phi$ is bounded as required. Moreover, in the first case, the regret against $\phi$ is already large enough, since whenever the algorithm picks a dummy expert $i$, it suffers the maximum loss while expert $\phi(i)$ suffers a much smaller loss (and the rest of the regret is non-negative by construction). In the second case, the regret against $\phi$ is essentially the swap regret of the algorithm on the non-dummy experts over sufficiently many rounds, and since that sub-instance comes from the lower bound construction of Ito [2020], the claimed lower bound follows.
Q2: Regarding the kernelized MWU algorithm (Section 4), the paper mentions an $O(d^3)$ complexity per iteration. How does this complexity impact the algorithm's scalability for large $d$? Are there practical optimizations considered to handle high-dimensional scenarios?
First, we point out that, in response to Reviewer zKH9's comment, we have in fact further improved the per-iteration time complexity of Algorithm 6 from $O(d^3)$ to $O(d^2)$ (see the proof therein). So the final per-round complexity of our algorithm is $O(d^2)$ plus the cost of computing the fixed point of a stochastic matrix, which is matrix-multiplication time in theory but can be much faster in practice using, e.g., the power method, as also pointed out by Reviewer zKH9. Importantly, our algorithm is thus computationally as efficient as the classic Blum-Mansour algorithm and the algorithm of Lu et al. [2025] (while enjoying strictly better and more adaptive regret bounds).
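As an illustration of the power-method acceleration mentioned above, here is a minimal sketch (my own, not taken from the paper) of approximating the fixed point $p = pQ$ of a row-stochastic matrix $Q$, assuming the induced chain mixes (e.g., all entries positive, as is typical for MWU-induced matrices). Each iteration is a single $O(d^2)$ matrix-vector product, typically much cheaper than an exact cubic-time solve when only a few iterations are needed.

```python
import numpy as np

def fixed_point_power_method(Q, iters=1000, tol=1e-12):
    """Approximate the fixed point p = p Q of a row-stochastic matrix Q."""
    d = Q.shape[0]
    p = np.full(d, 1.0 / d)          # start from the uniform distribution
    for _ in range(iters):
        p_next = p @ Q               # one O(d^2) step of the dynamics
        p_next /= p_next.sum()       # guard against numerical drift
        if np.abs(p_next - p).sum() < tol:
            break
        p = p_next
    return p_next
```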
Q3: Can the framework be extended to handle heterogeneous agents with different action spaces and loss functions, as encountered in real-world games, and if so, how would the current analysis need to be adapted to accommodate such heterogeneity?
First, we point out that we definitely do not require the same loss functions for different agents, since we are considering general-sum games. Second, as pointed out in our Footnote 4, we assume that the action set size is the same for all players, but that is purely for notational conciseness, and our analysis can be directly extended to games with different action set sizes, with $d$ now representing the maximum action set size among the agents.
Thank you for your rebuttal. It addressed my concerns and questions well. I think this is a technically solid paper, and I have therefore raised my score to Accept.
This paper builds upon the recent work of Lu et al. (2025), providing $\Phi$-regret bounds based on a sparsity-based complexity measure that recovers the near-optimal no-external, no-internal, and no-swap regret guarantees. $\Phi$-regret is a well-established regret notion: given a sequence of played distributions $p_1, \dots, p_T$ and losses $\ell_1, \dots, \ell_T$, it measures the difference between the experienced cost and the cost of the best transformation $\phi \in \Phi$, i.e., $\max_{\phi \in \Phi} \sum_{t=1}^{T} \left( \langle p_t, \ell_t \rangle - \langle \phi(p_t), \ell_t \rangle \right)$. Lu et al. (2025) provide an algorithm with a comparator-adaptive bound of this type, while the authors provide an improved bound with a significantly simpler algorithm.
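For concreteness, here is a minimal sketch (my own illustration, not from the paper) of this regret notion in the classical swap-regret special case, where the best transformation can be computed exactly because the optimum over row-stochastic matrices is attained at a deterministic map:

```python
import numpy as np

def swap_regret(plays, losses):
    """Swap regret of a play sequence.

    plays:  (T, d) array of played distributions p_t
    losses: (T, d) array of loss vectors l_t
    The best swap sends each action i to argmin_j sum_t p_t[i] * l_t[j],
    since the optimum over row-stochastic matrices is attained at a
    deterministic (0-1) map.
    """
    incurred = np.sum(plays * losses)      # sum_t <p_t, l_t>
    A = plays.T @ losses                   # A[i, j] = sum_t p_t[i] * l_t[j]
    best_swapped = A.min(axis=1).sum()     # best deterministic transformation
    return incurred - best_swapped
```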
The basic technical idea of the authors is to introduce a prior probability distribution on the set of 0-1 row-stochastic matrices and to establish $\Phi$-regret bounds by running the famous MWU algorithm over this set. The authors then use MWU as a meta-algorithm in order to select the optimal step size. The authors additionally show that, despite dealing with an exponentially large action set, their algorithm can be efficiently implemented using kernelization ideas.
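The prior-plus-MWU idea can be illustrated with a minimal sketch over a generic finite set (this is my own simplification: the paper's actual algorithm works over an exponentially large set of transformations and relies on kernelization to avoid enumerating it):

```python
import numpy as np

def prior_weighted_mwu(prior, losses, eta):
    """MWU over a finite set with a non-uniform prior.

    prior:  (K,) initial distribution over the K elements
    losses: (T, K) losses in [0, 1]
    eta:    step size
    The regret against element k scales with log(1 / prior[k]) / eta,
    which is how a carefully chosen prior yields comparator-dependent bounds.
    """
    log_w = np.log(np.asarray(prior, dtype=float))  # log-space for stability
    plays = []
    for loss in losses:
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        plays.append(p)                  # predict with current weights
        log_w -= eta * loss              # multiplicative weights update
    return np.array(plays)
```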
In Section 5, the authors introduce an alternative approach based on the Blum-Mansour reduction. Finally, in Section 6, the authors show how to extend their algorithm so as to achieve faster convergence to approximate $\Phi$-equilibria in a specific class of normal-form games.
Strengths and Weaknesses
I think the paper presents solid theoretical results that will be of interest to the online learning community at NeurIPS. Beyond the improved regret guarantees over those of Lu et al., the proposed algorithms are both simple and elegant. The presentation of the paper is very clear, and the technical ideas are well-explained.
Perhaps the only result I found somewhat expected was the convergence rate to $\Phi$-equilibria, given the extensive recent literature on the subject. Although the considered class of games has been studied previously in the literature, the assumption feels a bit artificial. At the same time, I believe that a similar result for general normal-form games would add value to the paper.
Questions
- What are the limitations of extending your results to general-sum normal-form games?
- Can you elaborate more on the class of games satisfying Assumption 6.3?
Limitations
Yes
Justification for Final Rating
After the author-reviewer discussion, I am very confident in the significance of the results provided by the paper. I thus recommend acceptance.
Formatting Issues
N/A
Thank you for the very positive review. We address your questions below.
Q1: What are the limitations of extending your results for general sum normal form games?
We use the key nonnegative-social-external-regret assumption to establish a small path-length bound for the entire learning dynamics (see Theorem D.7). Without this assumption, in principle, we could also use the fact that swap regret is by definition non-negative to establish another path-length bound, but it would be larger by some polynomial factors, making the final bound at least as large as the best existing bound for swap regret [Anagnostides et al., 2022a,b] even for a $\phi$ with small $c_\phi$ (and thus vacuous for our purpose).
Q2: Can you elaborate more on the class of games satisfying Assumption 6.3?
While there is no obvious intuitive explanation for this assumption, as argued by Anagnostides et al. [2022c], this class of games is intricately connected with the admission of a minimax theorem. They show (in their Proposition A.10) that all the following standard and well-studied games satisfy this property:
- Two-player zero-sum games;
- Polymatrix zero-sum games;
- Constant-sum polymatrix games;
- Strategically zero-sum games;
- Polymatrix strategically zero-sum games;
- Convex-concave (zero-sum) games;
- Zero-sum games whose objective satisfies the condition stated in their Proposition A.10;
- Quasiconvex-quasiconcave (zero-sum) games;
- Zero-sum stochastic games.
Thank you very much for your prompt response. I maintain my positive impression of the paper and I keep my score.
This paper was met with unanimous enthusiasm from all four reviewers. The work introduces a new framework for achieving comparator-adaptive $\Phi$-regret that is simpler, more efficient, and achieves tighter bounds than previous state-of-the-art methods. A key contribution is the extension of these results to game theory, providing the first adaptive and accelerated convergence rates to $\Phi$-equilibria. The authors' rebuttal was comprehensive and successfully addressed all reviewer concerns, including a key point about computational complexity that led one reviewer to raise their score. The work is certainly strong enough for a spotlight presentation.