PaperHub
8.2/10
Spotlight · 4 reviewers
Ratings: 5, 5, 5, 5 (min 5, max 5, std 0.0)
Confidence: 3.3
Novelty: 3.3
Quality: 3.3
Clarity: 3.3
Significance: 3.0
NeurIPS 2025

Comparator-Adaptive $\Phi$-Regret: Improved Bounds, Simpler Algorithms, and Applications to Games

OpenReview · PDF
Submitted: 2025-05-11 · Updated: 2025-10-29

Abstract

In the classic expert problem, $\Phi$-regret measures the gap between the learner's total loss and that achieved by applying the best action transformation $\phi \in \Phi$. A recent work by Lu et al., [2025] introduced an adaptive algorithm whose regret against a comparator $\phi$ depends on a certain sparsity-based complexity measure of $\phi$, recovering and interpolating optimal bounds for standard regret notions such as external, internal, and swap regret. In this work, we propose a general idea to achieve an even better comparator-adaptive $\Phi$-regret bound via much simpler algorithms compared to Lu et al., [2025]. Specifically, we discover a prior distribution over all possible binary transformations and show that it suffices to achieve prior-dependent regret against these transformations. Then, we propose two concrete and efficient algorithms to achieve so, where the first one combines multiple copies of the kernelized Hedge algorithm of Farina et al., [2022], and the second one combines multiple copies of a variant of the BM-reduction [Blum and Mansour, 2007]. To further showcase the power of our methods and the advantages over Lu et al., [2025] besides the simplicity and better regret bounds, we also show that our second approach can be extended to the game setting to achieve accelerated and adaptive convergence rate to $\Phi$-equilibria for a class of general-sum games. When specified to the special case of correlated equilibria, our bound improves over the existing ones from Anagnostides et al., [2022a,b].
Keywords
Swap Regret · $\Phi$-Regret · Learning in Games

Reviews and Discussion

Review
Rating: 5

The paper considers minimizing the $\Phi$-regret in the expert problem and proposes new algorithms achieving regret bounds that adapt to a sparsity-based complexity measure of each comparator $\phi$, improving the results of a recent work (Lu et al., 2025). Unlike the existing algorithm, which uses wavelet-inspired matrix features, the proposed algorithms are based on the arguably simpler idea of expert aggregation, with the twists being the incorporation of a carefully designed prior distribution over all possible $\phi$'s, as well as the idea of kernelization from (Farina et al., 2022). Finally, a variant of the algorithm is proposed to achieve an adaptive version of "fast rates" for learning in games, improving (Anagnostides et al., 2022).
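As background for the expert-aggregation idea mentioned above, the vanilla Hedge update that the kernelized variants build on can be sketched as follows (a textbook sketch for illustration, not code from the paper):

```python
import numpy as np

def hedge(losses, eta):
    """Vanilla Hedge over d experts.

    losses: T x d array of per-round expert losses in [0, 1]
    eta:    learning rate
    Returns the sequence of played distributions p_1, ..., p_T.
    """
    T, d = losses.shape
    cum = np.zeros(d)                          # cumulative losses L_{t-1}
    plays = np.empty((T, d))
    for t in range(T):
        w = np.exp(-eta * (cum - cum.min()))   # shift exponent for stability
        plays[t] = w / w.sum()                 # exponential weights
        cum += losses[t]                       # observe this round's losses
    return plays
```

The comparator-adaptive algorithms in the paper aggregate many such copies under a carefully chosen prior rather than running a single instance.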

Strengths and Weaknesses

Overall, I think the paper has several notable strengths that place it well above the acceptance threshold.

  1. The results are strong, as they remove several suboptimalities from the prior work and tighten its complexity description for the vast majority of transformation rules that are non-binary.

  2. The ideas behind the algorithms are substantially different from those of Lu et al., bringing a novel, complementary perspective to the problem. In general, adaptivity in online learning is typically achieved either by aggregating different nonadaptive subroutines or by "parameter-free" approaches. Their connections and differences have provided interesting insights to the field, and the present work contributes to this thread as well.

  3. The application to the fast rates in games is a nontrivial addition.

  4. Writing is exceptionally clear despite the complicated technicalities.

On the flip side, I would suggest adding an in-depth discussion of the computational complexity. I understand the key motivation of this adaptive $\Phi$-regret problem as providing a beyond-worst-case improvement over the classical Blum-Mansour algorithm, so BM could be seen as the natural baseline. There, given the fixed-point computation oracle, which takes matrix-multiplication time ($d^3$ in practice but $d^\omega$ in theory), the essential online learning occurs in a $d^2$-dimensional space; the fixed-point oracle is therefore the only computational bottleneck, and there have been some attempts at accelerating it (such as the power method, discussed in Appendix I.3 by Zhang et al., 2024). (Lu et al., 2025) is computationally similar, but in the present work the essential online learning appears to take $d^3\log d$ time per iteration for both of the two proposed algorithms, dominating the fixed-point computation. This makes the comparison to BM and Lu et al. a bit more subtle, and some clarification would be nice.

Zhang, Brian, et al. "Efficient $\Phi$-Regret Minimization with Low-Degree Swap Deviations in Extensive-Form Games." Advances in Neural Information Processing Systems 37 (2024): 125192-125230.

Questions

In the fast kernelized algorithm (Algorithm 6), I wonder if the computation of $\phi_t$ can be further reduced to matrix multiplication, thus $d^\omega$ time.

Limitations

Yes

Formatting Concerns

I do not see any formatting concerns.

Author Response

Q1: In the fast kernelized algorithm (Algorithm 6), I wonder if the computation of $\phi_t$ can be further reduced to matrix multiplication, thus $d^\omega$ time.

This is an excellent question. In fact, we now realize that each $\phi_t$ can be computed in $\mathcal{O}(d^2)$ time (so better than the $\mathcal{O}(d^3)$ time we provided in Algorithm 6, and also better than what the reviewer suspected, $\mathcal{O}(d^{\omega})$). This makes the computational complexity of our algorithm exactly the same as Blum-Mansour and Lu et al. [2025]. We provide the proof below and will update Algorithm 6 this way in the next version. Thanks again for your comment.

First, note that using the definition of $\phi_{t+1}$ in Algorithm 6 and the definition of $\psi^k$, one can verify that the entries of $\phi_{t+1}$ take the following form: if $i\neq j$,

$$
(\phi_{t+1})_{ij} = \frac{d-1}{c_{t}d(d-2)}(V_t)_{ij}\left(\frac{d-2}{d-1}(C_t)_{ij}+\frac{1}{d(d-1)}(S_t)_i+\frac{1}{d-1}(C_t)_{i,d+1}\right),
$$

and if $i=j$,

$$
(\phi_{t+1})_{ij} = \frac{d-1}{c_{t}d(d-2)}(V_t)_{ij}\left(\frac{d-2}{d-1}(C_t)_{ij}+\frac{1}{d(d-1)}(S_t)_i+(d-1)(C_t)_{i,d+1}\right),
$$

where

  • $V_t\in\mathbb{R}^{d\times d}$ is defined by $(V_t)_{ik} = \frac{\exp(-\eta(L_t)_{ik})}{\sum_{j=1}^d \exp(-\eta(L_t)_{ij})}$ and can be calculated in $\mathcal{O}(d^2)$ time.
  • $c_{t}\in \mathbb{R}$ is defined as $\frac{1}{d}\sum_{k=1}^d\prod_{i=1}^d\left((V_t)_{ik}+\frac{1}{d(d-2)}\right) + \prod_{i=1}^d\left((V_t)_{ii}+\frac{1}{d(d-2)}\right)$. With $V_t$, $c_{t}$ can be computed in $\mathcal{O}(d^2)$ time.
  • $C_t\in\mathbb{R}^{d\times (d+1)}$ is defined as follows: for all $k\in[d]$, $(C_t)_{ik} = \frac{(v_{t})_k}{(V_t)_{ik}+\frac{1}{d(d-2)}}$ and $(C_t)_{i,d+1} = \frac{u_t}{(V_t)_{ii}+\frac{1}{d(d-2)}}$, where $(v_{t})_k = \prod_{i=1}^d\left((V_t)_{ik}+\frac{1}{d(d-2)}\right)$ and $u_t = \prod_{i=1}^d\left((V_t)_{ii}+\frac{1}{d(d-2)}\right)$. Again, with $V_t$, $v_t$ and $u_t$ can be computed in $\mathcal{O}(d^2)$ time, and then $C_t$ can be computed in $\mathcal{O}(d^2)$ time too.
  • $S_t\in \mathbb{R}^d$ is defined by $(S_t)_i = \sum_{k=1}^d (C_t)_{ik}$ for $i\in [d]$. With $C_t$, $S_t$ can be computed in $\mathcal{O}(d^2)$ time too.

Therefore, it is now clear that computing $\phi_{t+1}$ also only takes $\mathcal{O}(d^2)$ time.
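For concreteness, the closed form above can be transcribed directly into code. This is only a sketch of the arithmetic under stated assumptions: the loss matrix `L` and step size `eta` are illustrative inputs, and we do not claim this reproduces Algorithm 6 itself.

```python
import numpy as np

def compute_phi(L, eta):
    """Transcription of the O(d^2) closed form for phi_{t+1}.

    L:   d x d cumulative loss matrix (illustrative input)
    eta: learning rate
    V, v, u, C, S, c follow the definitions in the response;
    requires d >= 3 because of the 1/(d(d-2)) constant.
    """
    d = L.shape[0]
    r = 1.0 / (d * (d - 2))                      # recurring constant 1/(d(d-2))
    # V: row-wise softmax of -eta * L, O(d^2)
    E = np.exp(-eta * L - np.max(-eta * L, axis=1, keepdims=True))
    V = E / E.sum(axis=1, keepdims=True)
    v = np.prod(V + r, axis=0)                   # (v)_k = prod_i (V_ik + r)
    u = np.prod(np.diag(V) + r)                  # u = prod_i (V_ii + r)
    c = v.sum() / d + u                          # normalizer c_t
    C = v[None, :] / (V + r)                     # (C)_ik for k in [d]
    C_last = u / (np.diag(V) + r)                # (C)_{i,d+1}
    S = C.sum(axis=1)                            # (S)_i = sum_k (C)_ik
    # assemble phi entrywise from the two displayed formulas
    inner = (d - 2) / (d - 1) * C + S[:, None] / (d * (d - 1)) \
            + C_last[:, None] / (d - 1)          # off-diagonal inner term
    phi = (d - 1) / (c * d * (d - 2)) * V * inner
    # diagonal uses (d-1) * C_{i,d+1} instead of C_{i,d+1} / (d-1)
    diag_inner = (d - 2) / (d - 1) * np.diag(C) + S / (d * (d - 1)) \
                 + (d - 1) * C_last
    np.fill_diagonal(phi, (d - 1) / (c * d * (d - 2)) * np.diag(V) * diag_inner)
    return phi
```

Every intermediate quantity is a product, sum, or elementwise ratio over at most $d^2$ entries, which is where the $\mathcal{O}(d^2)$ total comes from.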

Comment

Thank you for your rebuttal; this is a very nice result that perfectly addresses my comment. I will strongly recommend this paper in the discussion phase.

Review
Rating: 5

This paper investigates instance-dependent $\Phi$-regret bounds. Building on techniques from prior work, the authors propose a refined complexity measure for the comparator, develop two efficient algorithms, and derive improved regret bounds.

Strengths and Weaknesses

The paper is well written: the discussion of existing work is fair, the motivations behind the algorithmic development are clearly articulated, and the limitations are also acknowledged.

The construction of the prior is interesting, but the paper lacks a clear motivation for introducing it, and it is unclear whether this prior (and the resulting $c_\phi$) is the best possible choice. While the results presented are fairly comprehensive, the paper would benefit from a more thorough justification of the complexity measure $c_\phi$. It indeed interpolates between external and swap regret, but it is unclear what precise middle ground it captures. Moreover, $c_\phi$ is defined as the minimum of two complexities, which seems somewhat ad hoc. It would strengthen the paper to provide a justification showing that $c_\phi$ is inherent: for example, demonstrating that for some reasonable class of comparators (lying between $\Phi$-external/internal and $\Phi$-best), $c_\phi$ is in fact the best complexity measure achievable.

Questions

See above

Limitations

Yes

Final Justification

I have no outstanding concerns.

Formatting Concerns

N/A

Author Response

Thanks for your positive review and valuable comments. We address the issues mentioned in your review below.

Q1: The construction of the prior is interesting, but the paper lacks a clear motivation for introducing it, and it is unclear whether this prior (and the resulting $c_\phi$) is the best possible choice. While the results presented are fairly comprehensive, the paper would benefit from a more thorough justification of the complexity measure $c_\phi$. It indeed interpolates between external and swap regret, but it is unclear what precise middle ground it captures. Moreover, $c_\phi$ is defined as the minimum of two complexities, which seems somewhat ad hoc. It would strengthen the paper to provide a justification showing that $c_\phi$ is inherent: for example, demonstrating that for some reasonable class of comparators (lying between $\Phi$-external/internal and $\Phi$-best), $c_\phi$ is in fact the best complexity measure achievable.

While we also do not believe that our $c_\phi$ is the "best" complexity measure (in the sense that any reasonable algorithm must incur $\mathrm{Reg}(\phi)=\Omega(\sqrt{c_\phi T\log d})$ for all $\phi$), we note that we can at least say the following: for any integer $k \in [d]$ and any algorithm, there exists a $d$-expert problem and a $\phi$ with $c_\phi \leq k+1$, such that $\mathrm{Reg}(\phi)$ is at least $\Omega(\sqrt{c_\phi T\log c_\phi})$ for this algorithm.

To see this, simply let $d-k$ experts be dummy experts who always have maximum loss $1$, while the other $k$ experts follow the swap regret lower bound construction of Ito [2020] scaled by $1/2$. We then define $\phi$ as follows. First, for a non-dummy expert, $\phi$ maps it to another non-dummy expert optimally (to minimize the total loss after swap). Then, to define $\phi$'s behavior for dummy experts, we consider two cases: in the case where the algorithm picks dummy experts more than $\sqrt{kT\log k}$ times, we let $\phi$ map all dummy experts to a fixed non-dummy expert (in which case $d_\phi^{\text{unif}} \geq d-k$); otherwise, all dummy experts are mapped to themselves (in which case $d_\phi^{\text{self}} \geq d-k$). Note that in both cases, we have $c_\phi \leq k+1$. Moreover, in the first case, $\mathrm{Reg}(\phi)$ is at least $\frac{1}{2}\sqrt{kT\log k}$, since whenever the algorithm picks a dummy expert $i$, it gets loss $1$ while expert $\phi(i)$ gets loss at most $1/2$ (and the rest of the regret is non-negative by construction). In the second case, $\mathrm{Reg}(\phi)$ is essentially the swap regret of the algorithm for a $k$-expert problem that lasts for at least $T-\sqrt{kT\log k}$ rounds, and since the instance comes from the lower bound construction of Ito [2020], we also have $\mathrm{Reg}(\phi)=\Omega(\sqrt{k\log(k)(T-\sqrt{kT\log k})}) = \Omega(\sqrt{kT\log k})$.

Comment

Thank you for the response. It’s helpful to see the lower bound, though I was hoping for something more instance-specific. I maintain my positive evaluation of the paper and support its acceptance.

Review
Rating: 5

This paper introduces a novel framework for comparator-adaptive $\Phi$-regret, significantly improving upon the work of Lu et al. (2025). The authors propose two efficient algorithms, based on kernelized Multiplicative Weight Update (MWU) and a prior-aware BM-reduction, to achieve tighter regret bounds, removing unnecessary terms from the prior work.

Strengths and Weaknesses

Strengths:

  • The paper achieves tighter regret bounds, matching the minimax-optimal bounds for standard regret notions (external, internal, swap).
  • The proposed kernelized MWU and BM-reduction approaches are computationally more efficient than prior methods.
  • The extension to games is notable, providing the first adaptive and accelerated $\Phi$-equilibrium guarantee.

Weaknesses:

  • The work is purely theoretical and lacks experimental results demonstrating practical performance. Empirical evaluation on real-world datasets or game instances would strengthen the claims.
  • The $\Phi$-equilibrium bound includes an $O(N^2\log d)$ term (Theorem 6.4), which scales quadratically with the number of players $N$. This may limit applicability to large multi-agent systems.

Questions

  • Given the focus on upper bounds for comparator-adaptive $\Phi$-regret, is it possible to provide a lower bound analysis to establish the optimality of the proposed bounds, especially in relation to the complexity measure $c_{\phi}$?
  • Regarding the kernelized MWU algorithm (Section 4), the paper mentions an $O(d^3)$ complexity per iteration. How does this complexity impact the algorithm's scalability for large $d$? Are there practical optimizations considered to handle high-dimensional scenarios?
  • Can the framework be extended to handle heterogeneous agents with different action spaces and loss functions, as encountered in real-world games, and if so, how would the current analysis need to be adapted to accommodate such heterogeneity?

Limitations

  • Game Assumptions: The nonnegative-social-external-regret assumption restricts the applicability to specific game classes, limiting generality.
  • Lack of Experiments: The absence of empirical results prevents validation of the algorithms’ efficiency and practical utility, especially compared to the method in Lu et al. (2025).

Final Justification

The rebuttal has addressed my questions and improved upon the original results. As this is a solid theoretical paper that exceeds the acceptance threshold, I have decided to increase my score.

Formatting Concerns

No formatting issues were noticed.

Author Response

Thank you for your positive review and comments. We address your questions below.

Q1: Given the focus on upper bounds for comparator-adaptive $\Phi$-regret, is it possible to provide a lower bound analysis to establish the optimality of the proposed bounds, especially in relation to the complexity measure $c_{\phi}$?

We already mentioned in the paper that our bound is optimal for the special case of external/internal/swap regret. In fact, by a simple argument, one can establish the following stronger lower bound as well: for any integer $k \in [d]$ and any algorithm, there exists a $d$-expert problem and a $\phi$ with $c_\phi \leq k+1$, such that $\mathrm{Reg}(\phi)$ is at least $\Omega(\sqrt{c_\phi T\log c_\phi})$ for this algorithm.

To see this, simply let $d-k$ experts be dummy experts who always have maximum loss $1$, while the other $k$ experts follow the swap regret lower bound construction of Ito [2020] scaled by $1/2$. We then define $\phi$ as follows. First, for a non-dummy expert, $\phi$ maps it to another non-dummy expert optimally (to minimize the total loss after swap). Then, to define $\phi$'s behavior for dummy experts, we consider two cases: in the case where the algorithm picks dummy experts more than $\sqrt{kT\log k}$ times, we let $\phi$ map all dummy experts to a fixed non-dummy expert (in which case $d_\phi^{\text{unif}} \geq d-k$); otherwise, all dummy experts are mapped to themselves (in which case $d_\phi^{\text{self}} \geq d-k$). Note that in both cases, we have $c_\phi \leq k+1$. Moreover, in the first case, $\mathrm{Reg}(\phi)$ is at least $\frac{1}{2}\sqrt{kT\log k}$, since whenever the algorithm picks a dummy expert $i$, it gets loss $1$ while expert $\phi(i)$ gets loss at most $1/2$ (and the rest of the regret is non-negative by construction). In the second case, $\mathrm{Reg}(\phi)$ is essentially the swap regret of the algorithm for a $k$-expert problem that lasts for at least $T-\sqrt{kT\log k}$ rounds, and since the instance comes from the lower bound construction of Ito [2020], we also have $\mathrm{Reg}(\phi)=\Omega(\sqrt{k\log(k)(T-\sqrt{kT\log k})}) = \Omega(\sqrt{kT\log k})$.


Q2: Regarding the kernelized MWU algorithm (Section 4), the paper mentions an $\mathcal{O}(d^3)$ complexity per iteration. How does this complexity impact the algorithm's scalability for large $d$? Are there practical optimizations considered to handle high-dimensional scenarios?

First, we point out that in response to Reviewer zKH9's comment, we have in fact further improved the time complexity of Algorithm 6 from $\mathcal{O}(d^3)$ to $\mathcal{O}(d^2)$ (see the proof therein). So the final per-round complexity of our algorithm is $\mathcal{O}(d^2)+\mathrm{FP}_d$, where $\mathrm{FP}_d$ is the complexity of computing the fixed point of a $d\times d$ stochastic matrix, which is $\mathcal{O}(d^\omega)$ in theory with $\omega \approx 2.37$, but can be much faster in practice using, e.g., the power method, as also pointed out by Reviewer zKH9. Importantly, our algorithm is thus computationally as efficient as the classic Blum-Mansour algorithm and also the Lu et al. [2025] algorithm (while enjoying strictly better and more adaptive regret bounds).
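As an illustration of the practical speedup mentioned here, the fixed point of a row-stochastic matrix $Q$ (a distribution $p$ with $Q^\top p = p$) can be approximated by power iteration. This sketch is purely illustrative and not taken from the paper:

```python
import numpy as np

def fixed_point(Q, iters=1000, tol=1e-12):
    """Power iteration for p with Q^T p = p, Q row-stochastic.

    Each step costs one O(d^2) matrix-vector product, versus
    O(d^omega) for an exact linear-algebra solve.
    """
    d = Q.shape[0]
    p = np.full(d, 1.0 / d)          # start from the uniform distribution
    for _ in range(iters):
        p_next = Q.T @ p             # one power-iteration step
        p_next /= p_next.sum()       # keep it a probability vector
        if np.abs(p_next - p).sum() < tol:
            break
        p = p_next
    return p_next
```

For matrices with strictly positive entries (as produced by exponential-weights updates), the iteration converges geometrically, which is why the power method is attractive here.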


Q3: Can the framework be extended to handle heterogeneous agents with different action spaces and loss functions, as encountered in real-world games, and if so, how would the current analysis need to be adapted to accommodate such heterogeneity?

First, we point out that we definitely do not require the same loss functions for different agents, since we are considering general-sum games. Second, as pointed out in our Footnote 4, we assume that the action set size is the same for all players, but that is purely for notational conciseness, and our analysis can be directly extended to games with different action set sizes, with $d$ now representing the maximum action set size among the agents.

Comment

Thank you for your rebuttal. It addressed my concerns and questions well. I think this is a technically solid paper, and I have therefore raised my score to Accept.

Review
Rating: 5

This paper builds upon the recent work of Lu et al. (2025), providing $\Phi$-regret bounds based on a sparsity complexity measure $c_\phi$ that recovers the near-optimal no-external, no-internal, and no-swap regret guarantees. $\Phi$-regret is a well-established regret notion: given a sequence of probabilities $p_1,\ldots,p_T \in \Delta_d$, it measures the difference between the experienced cost and the cost of the best transformation $\phi: \Delta_d \mapsto \Delta_d$, i.e., $\mathcal{R}(\phi):=\sum_{t=1}^T \ell_t^\top p_t - \ell_t^\top \phi(p_t)$. Lu et al. (2025) provide an algorithm with $\mathcal{R}(\phi) = O\left(\sqrt{c_\phi(T + d) \log^3 d}\right)$, while the authors provide an improved bound of $\mathcal{R}(\phi) = O\left(\sqrt{c_\phi T \log d}\right)$ with a significantly simpler algorithm.

The basic technical idea of the authors is to introduce a prior probability distribution on the set of $0$-$1$ row-stochastic matrices and to establish $\Phi$-regret bounds by running the famous MWU on this set. The authors then use MWU as a meta-algorithm in order to select the optimal step size. The authors additionally show that, despite dealing with an exponentially large action set, their algorithm can be efficiently implemented using kernelization ideas.

In Section 5, the authors introduce an alternative approach based on the Blum–Mansour reduction. Finally, in Section 6, the authors show how to extend their algorithm so as to achieve $\tilde{\mathcal{O}}(1/T)$ convergence to an approximate $\Phi$-equilibrium in a specific class of normal-form games.
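The regret notion summarized in this review is easy to state in code. In the following sketch (illustrative, not from the paper), a linear transformation $\phi$ is represented by a row-stochastic matrix `M` acting as $\phi(p) = M^\top p$:

```python
import numpy as np

def phi_regret(losses, plays, M):
    """Phi-regret of a play sequence against one transformation.

    losses: T x d array of loss vectors ell_t
    plays:  T x d array of played distributions p_t
    M:      d x d row-stochastic matrix; phi(p) = M^T p
    """
    transformed = plays @ M          # row t equals (M^T p_t)^T
    return float(np.sum(losses * plays) - np.sum(losses * transformed))
```

For example, taking `M` to be the identity gives $\phi(p)=p$ and hence zero regret, while a matrix whose rows all equal $e_j$ recovers external regret against expert $j$.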

Strengths and Weaknesses

I think the paper presents solid theoretical results that will be of interest to the online learning community at NeurIPS. Beyond the improved regret guarantees over those of Lu et al., the proposed algorithms are both simple and elegant. The presentation of the paper is very clear, and the technical ideas are well-explained.

Perhaps the only result I found somewhat expected was the $\tilde{O}(1/T)$ convergence rate to $\Phi$-equilibrium, given the extensive recent literature on the subject. Although the considered class of games has been studied before in the literature, the assumption feels a bit artificial. At the same time, I believe that a similar result for general normal-form games would add value to the paper.

Questions

  1. What are the limitations of extending your results to general-sum normal-form games?

  2. Can you elaborate more on the class of games satisfying Assumption 6.3?

Limitations

Yes

Final Justification

After the author-reviewer discussion I am very confident in the significance of the results provided by the paper. I thus recommend acceptance.

Formatting Concerns

N/A

Author Response

Thank you for the very positive review. We address your questions below.

Q1: What are the limitations of extending your results to general-sum normal-form games?

We use the key nonnegative-social-external-regret assumption to establish a small path-length bound $\mathcal{O}(N\log d)$ for the entire learning dynamic (see Theorem D.7). Without this assumption, in principle, we could also use the fact that swap regret is by definition nonnegative to establish another path-length bound, but it would be larger than $\mathcal{O}(N\log d)$ by some factors of $\mathrm{poly}(d)$, making the final bound on $\mathrm{Reg}_n(\phi)$ at least as large as the best existing bound for swap regret [Anagnostides et al., 2022a,b], even for a $\phi$ with small $c_\phi$ (and thus vacuous for our purpose).


Q2: Can you elaborate more on the class of games satisfying Assumption 6.3?

While there is no obvious intuitive explanation for this assumption, as argued by Anagnostides et al. [2022c], this class of games is intricately connected with the admission of a minimax theorem. They show (in their Proposition A.10) that all the following standard and well-studied games satisfy this property:

  1. Two-player zero-sum games;
  2. Polymatrix zero-sum games;
  3. Constant-sum polymatrix games;
  4. Strategically zero-sum games;
  5. Polymatrix strategically zero-sum games;
  6. Convex-concave (zero-sum) games;
  7. Zero-sum games with objective $f(x,y)$ such that $\min_{x\in \mathcal{X}}\max_{y\in \mathcal{Y}} f(x,y)=\max_{y\in \mathcal{Y}} \min_{x\in \mathcal{X}}f(x,y)$;
  8. Quasiconvex-quasiconcave (zero-sum) games;
  9. Zero-sum stochastic games.
Comment

Thank you very much for your prompt response. I maintain my positive impression of the paper and I keep my score.

Final Decision

This paper was met with unanimous enthusiasm from all four reviewers. The work introduces a new framework for achieving comparator-adaptive $\Phi$-regret that is simpler, more efficient, and achieves tighter bounds than previous state-of-the-art methods. A key contribution is the extension of these results to game theory, providing the first adaptive and accelerated convergence rates to $\Phi$-equilibria. The authors' rebuttal was comprehensive and successfully addressed all reviewer concerns, including a key point about computational complexity that led one reviewer to raise their score. The work is certainly strong enough for a spotlight presentation.