PaperHub
Score: 7.3 / 10
Poster · 4 reviewers
Ratings: 4, 4, 5, 5 (min 4, max 5, std 0.5)
Confidence: 2.3
Novelty: 3.0 · Quality: 3.0 · Clarity: 3.0 · Significance: 2.8
NeurIPS 2025

Scalable Neural Incentive Design with Parameterized Mean-Field Approximation

OpenReview · PDF
Submitted: 2025-05-12 · Updated: 2025-10-29
TL;DR

We design a mean-field explicit differentiation method for incentive design in large-scale games with exchangeable agents.

Abstract

Designing incentives for a multi-agent system to induce a desirable Nash equilibrium is both a crucial and challenging problem appearing in many decision-making domains, especially for a large number of agents $N$. Under the exchangeability assumption, we formalize this incentive design (ID) problem as a parameterized mean-field game (PMFG), aiming to reduce complexity via an infinite-population limit. We first show that when dynamics and rewards are Lipschitz, the finite-$N$ ID objective is approximated by the PMFG at rate $\mathcal{O}(\frac{1}{\sqrt{N}})$. Moreover, beyond the Lipschitz-continuous setting, we prove the same $\mathcal{O}(\frac{1}{\sqrt{N}})$ decay for the important special case of sequential auctions, despite discontinuities in dynamics, through a tailored auction-specific analysis. Built on our novel approximation results, we further introduce our Adjoint Mean-Field Incentive Design (AMID) algorithm, which uses explicit differentiation of iterated equilibrium operators to compute gradients efficiently. By uniting approximation bounds with optimization guarantees, AMID delivers a powerful, scalable algorithmic tool for many-agent (large $N$) ID. Across diverse auction settings, the proposed AMID method substantially increases revenue over first-price formats and outperforms existing benchmark methods.
Keywords
Mean-Field Games · Equilibrium · Optimization · Mechanism Design · Auctions · Game Theory

Reviews and Discussion

Review
Rating: 4

The paper aims to solve the incentive design (ID) problem for general N-player dynamic games, which is intractable. To make it tractable, the authors propose to construct mean-field game (MFG) approximations to general dynamic games. The authors develop such methods for games parameterized with Lipschitz continuity and for auction design where the Lipschitz assumptions do not hold.

Strengths and Weaknesses

Strengths
The paper is overall well written, with the clarity and technical formality expected of such work. This review will emphasize the weaknesses to justify the rating.

Weaknesses
W1. The paper lacks context for readers unfamiliar with the topic to appreciate the significance of the key contributions in Sections 2 and 3. Here we take Section 2 as an example but similar concerns might apply to Section 3.

  • Theorem 1 extends prior results on the quality of the MFG approximation (i.e. unparameterized or with a single theta) to the parameterized setting for ID. It’s unclear how straightforward such an extension is. What are the key challenges in extending the prior results, if it is not straightforward?
  • Lemma 1’s formality is appreciated, but it is neither surprising by itself nor helpful to the method in any algorithmic manner.
  • The significance of the AMID algorithm is unclear. Lemma 2 confirms the soundness of AMID, but what are the key innovative techniques that make it more efficient than standard methods? What is the intuition and key idea behind those techniques? Are those techniques specific to AMID or general enough to apply to other T-step approximate update functions? Is there any prior work or existing technique to address the efficiency problem?

W2. Some key points regarding the problem formulation are under-explained. Please refer to Questions.

W3. Notation and clarity are overall good but can be further improved. To name a few issues:

  • In the ID objective (Line 108), $\mathbf{e}$ is not defined.
  • At Line 145, it should be $\to \Pi_H$.
  • Inconsistent usage of $F_{omd}$ vs $\bar{F}^\tau_{omd}$. It seems $F_{omd}$ should be reserved for the update in the log-probability space (Line 153) but it first appears at Line 145.
  • At Line 166, $\rho$ is not defined.

Questions

In addition to the questions in the Strengths And Weaknesses section:

Q1. In the (ID) objective below Line 24, is the maximization over both $\theta$ and $\pi$? If $\pi$ is also a maximization variable, how does this paper (and other works) address the Nash selection problem?

Q2. How does the Lipschitz constant in Assumption 1 affect the bounds in Theorem 1?

Q3. In Section 2, aren't we given a PDG $\mathcal{G}$ first and then seek a PMFG $\mathcal{M}$ approximation? If so, why does Theorem 1 state it the other way around (i.e., given $\mathcal{M}$ first … let $\mathcal{G}$ be such that …)?

Q4. At Line 165, $q^\tau_h$ is required to be $C^1$ with respect to $(\theta, \zeta^*)$. What about its continuity in $L$, the population density?

Q5. At Line 170, what do you mean by “locally”?

Limitations

No, the authors did not discuss limitations.

Justification for Final Rating

The rebuttal is comprehensive, so I have raised my score to the positive side.

However, I am not an expert in this field and cannot confidently evaluate the technical aspects. Therefore, I have decreased my confidence.

Formatting Concerns

None.

Author Response

We thank the reviewer for recognizing the strengths of the work and the insightful questions. We summarize our responses below.

W1:

We respond to the concerns of the reviewer for Sections 2-3 jointly.

Theorems 1 and 2: Theorem 1 uses the proof techniques developed for Lipschitz-continuous MFGs (as acknowledged in Appendix D.2) with modifications to handle entropy regularization and the incentive function $g$. The theorem justifies our PMFG approximation for $N$-player ID in the first place. However, our main contribution to MFG approximation comes from Theorem 2, which incorporates non-Lipschitz (in fact, discontinuous) dynamics and requires special use of the properties of auctions. To the best of our knowledge, no existing approximation result applies to this case (see our discussion of related work between lines 83-93). Critically, both theorems are required to justify our overall framework and demonstrate the well-posedness of incentive design via MFGs.

Lemma 1: We disagree that the lemma is not relevant for the algorithm, as it provides two important conclusions. First, it justifies the use of our explicit differentiation scheme by showing that the scheme approximates $\partial_\theta F_{omd}^{\infty}$ under reasonable assumptions, which is not clear a priori. Second, it provides intuition on how to tune the parameters $T, \eta, \tau$ and their impact on the quality of explicit differentiation. Lemma 1 also provides two conditions that can be evaluated empirically: while $\partial_\theta F_{omd}^{\infty}$ is difficult to evaluate, one can easily verify the requirements 1 and 2 of the lemma.

Significance of AMID: The main significance of AMID in our context is that it reduces an expensive (in fact, intractable) computation into an equivalent but tractable one. The key innovations associated with it are presented in Theorems 1 and 3 (showing that PMFGs form a low-bias approximation to ID) and Lemmas 1 and 2 (showing that PMFGs are also efficiently solvable). Without the PMFG model, many-player incentive design would be intractable.

In principle, apart from the PMFG approximation that is specific to the incentive design problem, the technique could be applied to other settings where there are repeated evaluations of an operator. However, PMFG incentive design can be considered unique since the operator $F_{omd}^\tau$ itself is quite complicated and involves forward-backward iterations on the MFG. For instance, the explicit differentiation scheme of [2] does not require our adjoint optimization, since the iterated operator is much simpler. Our technique permits, to the best of our knowledge, the first explicit differentiation of Nash equilibria of MFGs. We have seen tangential ideas employed in Neural ODEs [1], with possible applications to recurrent neural networks, although those works do not consider incentive design for MFGs and diverge from our goals and the unique challenges of ID.
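For intuition, the sketch below (our illustration, not the authors' AMID code) shows the general adjoint idea: an objective is differentiated through repeated applications of a fixed-point operator via a vector-Jacobian recursion, avoiding storage of all intermediates; the operator `F`, objective `G`, and dimensions are toy placeholders.

```python
import jax
import jax.numpy as jnp

def F(zeta, theta):
    # Toy contractive update standing in for an equilibrium operator such as F_omd^tau.
    return 0.9 * jnp.tanh(theta @ zeta)

def G(zeta, theta):
    # Toy design objective standing in for the incentive-design objective.
    return jnp.sum(zeta ** 2) + 0.1 * jnp.sum(theta ** 2)

def grad_naive(theta, zeta0, T):
    # Unrolls all T iterations; autodiff must keep every intermediate zeta_t in memory.
    def rollout(theta):
        zeta = zeta0
        for _ in range(T):
            zeta = F(zeta, theta)
        return G(zeta, theta)
    return jax.grad(rollout)(theta)

def grad_adjoint(theta, zeta0, T):
    # Forward pass to (approximately) reach the fixed point, without taping all T steps.
    zeta = zeta0
    for _ in range(T):
        zeta = F(zeta, theta)
    # Truncated Neumann series for the implicit gradient:
    # dG/dtheta + (d_theta F)^T sum_k ((d_zeta F)^T)^k dG/dzeta, accumulated via repeated VJPs.
    dG_dzeta, dG_dtheta = jax.grad(G, argnums=(0, 1))(zeta, theta)
    _, vjp_F = jax.vjp(F, zeta, theta)
    lam, total = dG_dzeta, dG_dtheta
    for _ in range(T):
        lam_zeta, lam_theta = vjp_F(lam)
        total = total + lam_theta
        lam = lam_zeta
    return total

theta = 0.5 * jnp.eye(4)
zeta0 = jnp.ones(4)
print(grad_naive(theta, zeta0, 50))    # memory grows with the number of iterations
print(grad_adjoint(theta, zeta0, 50))  # constant memory in T, nearly identical gradient
```

For a contractive operator, both routes converge to the same implicit gradient; the adjoint recursion only ever stores the current iterate and cotangent, which is what drives the memory gap reported in the table below.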

We also refer to our explicit measurements on various sizes of auctions (see the Table below) in our response to reviewer DFyx, copied for reference. The measurements were conducted in the setting (A2) and demonstrate the effectiveness of AMID even for small problem sizes: in all cases, AMID requires less memory and computational time than a naive gradient backpropagation. Here, ❌ indicates that the naive backprop method was not able to run on a single GPU, and the time ranges indicate one standard deviation. The table (which will be incorporated in the form of a plot in the paper) demonstrates that for explicit differentiation of PMFGs, the AMID approximation is crucial.

| Horizon $H$ | Naive (time, s) | Naive (memory) | AMID (time, s) | AMID (memory) |
| --- | --- | --- | --- | --- |
| $H=5$ | 0.19 ± 0.02 | 2760 MiB | 0.09 ± 0.01 | 560 MiB |
| $H=10$ | 0.25 ± 0.10 | 8746 MiB | 0.208 ± 0.08 | 586 MiB |
| $H=25$ | 0.71 ± 0.15 | 16960 MiB | 0.67 ± 0.12 | 826 MiB |
| $H=50$ | ❌ | ❌ | 1.72 ± 0.41 | 1076 MiB |

W2 and relevant questions:

We thank the reviewer for the questions. We address these below, and will incorporate additional explanations in the final version using our extra page.

Q1

To the best of our knowledge, there is no standard formulation when there are multiple equilibria. Most works implicitly or explicitly assume uniqueness [2,3] and incorporate regularization when it does not hold [3]. Some works sidestep the issue by adopting a Nash oracle [4] or restricting to the local implicit map [5]. Our approach is similar to [5], where we differentiate with respect to the implicit map induced by $F_{omd}^\tau$, under the conditions of Lemma 1.

Q2

This was already alluded to in Remark 2, which will be expanded and clarified. In the structured setting of auctions, for many relevant cases the Lipschitz constant $L$ is upper bounded by 1; therefore, the dependence on the time horizon $H$ scales as $\mathcal{O}(H^2)$, with no explicit dependence on $L$ arising. For instance, if the valuations of agents are fixed across rounds (or evolve with a Markov kernel), our results imply $L \leq 1$; see Appendix E.

Our framework is general enough to incorporate complex models where valuations of the bidders can evolve depending on the valuations of others. In this case, the Lipschitz constant $L$ of the valuation dynamics $\omega$ in $\mu$ will incur an $\mathcal{O}(\frac{L^H}{\sqrt{N}})$ approximation error, which is the best possible upper bound, matching the lower bound of (Yardim et al., 2024).

Q3

We agree with the reviewer that $\mathcal{G}$ is the given object and $\mathcal{M}$ is a useful algorithmic construct. The theorem is stated in the mentioned way because it holds for all $\mathcal{G}$ with $N$ players, with asymptotic rates in $N$, and the mathematical statement of the inverse problem could obfuscate the main message (cf. [4]). However, for the theorem to be useful, an appropriate MFG formulation $\mathcal{M}$ must be provided, as the reviewer has noticed.

This becomes clearer in Section 3: Definition 3 (of BA-MFGs) does the heavy lifting of defining the appropriate MFG; consequently, Theorem 2 provides the approximation guarantees for all relevant $N$-player auctions. More generally, our work is readily compatible with “inverse” problem designs in MFGs (e.g. [6,7]); however, we leave the formal development of this as future work.

Q4

For the purpose of the lemma, it is sufficient that the “composite” map $(\theta, \zeta) \rightarrow q^\tau_h(\cdot, \cdot \mid \Lambda(\operatorname{softmax}(\zeta)), \operatorname{softmax}(\zeta), \theta)$ is $C^1$ in a neighborhood of $(\theta, \zeta^*)$. This holds in the specific case where $\Lambda$ is $C^1$ and $q_h^\tau$ is $C^1$ in $L, \pi, \theta$, but this is not a necessary condition for the composition to be $C^1$, hence the current statement of the lemma.

We have noticed that the statement of the lemma (in particular, $q_h^\tau$ being $C^1$ in a neighborhood) could be improved for legibility, which will be done in the final submission.

Q5

For the gradient computation at some $\theta$, $\theta' \rightarrow \pi^{\theta'}$ must be a map such that $\pi^{\theta'}$ is the Nash equilibrium corresponding to design $\theta'$ whenever $\theta' \in U$ for some open set $U$ containing the point of differentiation. The current statement is meant to clearly distinguish between computing gradients (which is a local operation) and computing a Nash equilibrium for all $\theta$: computing a Nash equilibrium for all $\theta$ might be intractable, but a map that locally computes Nash equilibria is sufficient for first-order information.

W3:

To assist reading our paper and help readers who are used to other conventions, we have provided a full notation table in Appendix A. This table was moved to the appendix due to space limitations. For the notation on $\rho$ and $\mathbf{e}$ and potentially others, we refer the reviewer to the appendix.

We thank the reviewer for pointing out the typo on line 146 and the usage of $\bar{F}_{omd}$. These will be fixed in the final submission.

References:

[1] Chen, et al. Neural ordinary differential equations. NeurIPS 2018.

[2] Li et al. End-to-end learning and intervention in games. NeurIPS 2020.

[3] Liu et al. Inducing Equilibria via Incentives: Simultaneous Design-and-Play Ensures Global Convergence. NeurIPS 2022.

[4] Wang, et al. Coordinating followers to reach better equilibria: End-to-end gradient descent for Stackelberg games. AAAI 2022.

[5] Fiez, et al. Implicit Learning Dynamics in Stackelberg Games: Equilibria Characterization, Convergence Analysis, and Empirical Study. ICML 2020.

[6] Yardim et al., Exploiting Approximate Symmetry for Efficient Multi-Agent Reinforcement Learning. L4DC 2025.

[7] Chen et al. Individual-Level Inverse Reinforcement Learning for Mean Field Games. AAMAS 2022.

Comment

Thanks. I am raising my score to 4 as the response is comprehensive.

Review
Rating: 4

This paper reframes the study of incentive design (ID) as a mean-field game (MFG) under an exchangeability assumption across agents. It establishes non-asymptotic guarantees between the finite-agent ID problem and its mean-field counterpart, covering both Lipschitz-continuous settings and non-Lipschitz cases such as sequential auctions. To solve the resulting mean-field ID problem efficiently, it also proposes an adjoint-based gradient estimation algorithm (AMID) that reduces memory overhead during optimization. The proposed algorithm is tested on classical congestion pricing and various large-scale auction design tasks.

Strengths and Weaknesses

Strengths:

  • The paper is well-motivated from a practical perspective. Setting the right incentives in large multi-agent systems is a crucial real-world problem with applications in traffic management, auctions, and resource allocation.

  • The paper frames the problem from an innovative perspective by leveraging mean-field games (MFGs) with an exchangeability assumption to approximate and solve the original incentive design problem at scale.

  • The proposed method outperforms the selected baseline methods.

Weaknesses:

  • It is not entirely clear how strong or restrictive the key assumptions are. For example, the theoretical guarantees depend on Lipschitz continuity (or specific structural properties like no-zero-dominance for auctions).

Questions

  • The theoretical results rely on several assumptions, such as Lipschitz continuity or the no-zero-dominance property for auctions. Could the authors provide more justification or empirical evidence on how often these assumptions hold in real-world scenarios?

Limitations

  • The paper does not provide a discussion on how strong its assumptions are.

Justification for Final Rating

The authors' response helped me understand the assumptions. After reading the rebuttal and the reviews from other reviewers, I decide to maintain my current rating.

Formatting Concerns

No formatting concerns.

Author Response

We thank the reviewer for recognizing the strengths of the work and the insightful questions. We summarize our responses below.

W1, Q1, and the discussion of our limitations:

In the main paper, the assumptions (in particular, NZD) as well as their consequences have been discussed (see for instance lines 253-255, lines 245-246, 264-266, and Remark 2). Here, we provide further commentary that will be incorporated into the paper.

Lipschitz continuity is natural in many settings, as it implies that small changes in population behaviour do not cause major changes in game dynamics. It is a standard and minimal assumption in the MFG literature, as it ensures the existence and stability of solutions. In fact, without Lipschitz continuity, even basic properties such as approximation of finite-player Nash equilibria with explicit bounds might not hold (see Yardim et al., 2024). Theorem 1 can therefore be considered as stated under the most relaxed conditions.

That said, a key contribution of our work is showing that such assumptions can be relaxed in structured settings. Specifically, we performed a dedicated analysis for the auction setting, which is inherently non-Lipschitz due to discontinuities in the reward and allocation functions. Despite this, we were able to prove the same approximation results without relying on Lipschitz continuity (Theorem 2). This demonstrates that our framework can handle important real-world cases where standard assumptions do not hold.

The NZD (No zero-dominance) property is realistic in auction settings. When optimizing with entropy regularization, NZD is guaranteed: every entropy-regularized MFG-NE $\pi^*$ assigns non-zero probability to all actions (see lines 253-255).

Even without entropy regularization, the NZD property can naturally arise in auction MFGs due to the structure of the reward and transition dynamics. When the item allocation function depends only on time, all losing actions yield the same immediate reward, zero, and the next state depends solely on the bidder’s valuation, not on the specific action taken. As a result, all losing actions are indistinguishable in value: for all states $s$ and all losing actions $a, a'$, $q_h(s, a \mid L, \pi, \theta) = q_h(s, a' \mid L, \pi, \theta)$, and any MFG-NE can be perturbed into an equivalent equilibrium that assigns non-zero probability to the highest losing action, satisfying NZD.

This is also reflected in the behavior of Online Mirror Descent (OMD) when used to approximate MFG-NEs. Since all losing actions have equal expected rewards, OMD assigns equal probability among them, including the highest losing action. Thus, if OMD converges to a MFG-NE, the resulting policy satisfies NZD. The same reasoning extends to many auction settings where allocation depends on the bid distribution. For instance, in reserve-price auctions where goods are allocated proportionally to the number of agents bidding above a threshold, the learned MFG-NE still satisfies NZD due to the equal treatment of losing actions in the optimization dynamics.
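To make this concrete, here is a tiny numerical illustration (ours, with hypothetical values, not taken from the paper) of how an exponentiated-gradient / OMD-style update keeps equal, strictly positive probability on all losing actions, including the highest losing bid:

```python
import jax.numpy as jnp

# Hypothetical per-action values: the first three entries are "losing" bids with value 0.
q = jnp.array([0.0, 0.0, 0.0, 0.3, 0.5])
pi = jnp.full(5, 0.2)   # uniform initial policy
eta = 1.0
for _ in range(5):
    logits = jnp.log(pi) + eta * q             # mirror-descent step in log-probability space
    pi = jnp.exp(logits) / jnp.exp(logits).sum()
print(pi)  # the three losing actions keep identical, non-zero probabilities
```

Because the multiplicative update rescales actions with equal values by the same factor, their probabilities can shrink but never become unequal or exactly zero, which is precisely what NZD requires of the highest losing action.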

Moreover, in our experimental setting, we observed that the NZD property consistently holds across the learned policies. In particular, the policies obtained by AMID and the other baselines assign non-zero probability to the highest losing actions, consistent with the theoretical argument above. To support this, we will include plots in the camera-ready version that illustrate the action distributions and confirm that NZD is satisfied in all experiments.

Comment

Thanks for the response, which helped me understand the assumptions. After reading the rebuttal and the reviews from other reviewers, I decide to maintain my current rating.

Review
Rating: 5

This paper studies the incentive design (ID) problem in multi-agent systems. The paper approximates the finite-player game with a parameterized mean-field game (PMFG) and subsequently optimizes the incentives in the PMFG. The authors show that under Lipschitzness, the PMFG objective well-approximates the original problem with rate O(1/sqrt(N)). The authors then propose an algorithm (adjoint mean-field incentive design), which differentiates through the iterated equilibrium operators. They then show empirical results on auction settings.

Strengths and Weaknesses

Strengths:

  • The theory that bridges PDG with PMFG is interesting and provides a useful tool for solving incentive design problems for large numbers of agents.
  • The contributions of the paper allow for scalable first-order optimization methods to solve this challenging problem, which I believe to be of importance to the field.
  • The introduction of the paper (with clear desiderata) is well-written and clear.
  • The large-scale auction examples are well-developed and deeper than typical empirical examples in ML papers.
  • The authors provide PyTorch and JAX implementations.

Weaknesses:

  • The notation can be hard to digest and could use some explanation to improve the readability of the paper. One example is in Defn 2, where $L$ isn't really defined in words. The concept of mean-field exploitability could also be explained in words instead of just the equation. In general, I felt that the notation in the paper is often hard to comprehend, and some rethinking or additional explanations may significantly improve the paper, especially for non-experts.
  • In Theorem 1, is there anything novel about the analysis compared to similar results in MFGs without ID?
  • Is NZD realistic for real-world auction equilibria?
  • The paper uses discrete S, A, but real bidding is continuous. What would be necessary to extend this paper to function approximation settings?
  • I would be curious to see what the compute/memory requirements are when using naive back-prop compared to the adjoint method. This would better justify the approach.

Questions

See weaknesses.

Limitations

Yes

Justification for Final Rating

The author gave convincing responses to each of my concerns. The additional measurements concerning backprop are especially useful.

Formatting Concerns

N/A

Author Response

We thank the reviewer for recognizing the strengths of the work and the insightful questions. We summarize our responses below, by numbering the weaknesses suggested by the reviewer W1 to W5.

(W1): To support readers unfamiliar with our notation, we included a full table of symbols in Appendix A. Due to space constraints, this was placed in the appendix, but we hope it assists the reading of our paper. An explicit reference to Appendix A (which was missing in our submission) will be added in the camera-ready version, before the start of Section 2.

In particular, the quantity $L$ is the idealised distribution of the population over states and actions across time, at the infinite-player limit. Mean-field exploitability can be considered an infinite-player extension of exploitability in classical game theory, which quantifies how much each player is incentivised to deviate from a strategy profile (and must be 0 at an exact Nash equilibrium). We will incorporate the corresponding explanations in our final version to assist the understanding of our work.
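For example, with $L^{\pi}$ denoting the mean-field flow induced by policy $\pi$ and $J$ the expected (regularized) return of a representative agent, a standard way to write mean-field exploitability (our notation here, which may differ from the paper's exact definition) is

$$\mathcal{E}(\pi) \;=\; \max_{\pi'} J(\pi', L^{\pi}) \;-\; J(\pi, L^{\pi}) \;\ge\; 0,$$

and $\mathcal{E}(\pi)=0$ exactly when no agent can profit by unilaterally deviating, i.e., when $\pi$ is a mean-field Nash equilibrium.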

(W2): Theorem 1 builds on previous results to prove the convergence result for our setting: it relies on two extensions to handle exploitability analysis in the entropy-regularized objective function and the approximation in terms of the design objective $G$. A detailed proof is provided in Appendix E. While Theorem 1 is an extension of existing analysis, our key contribution is Theorem 2, which relies on an entirely novel analysis going beyond the Lipschitz-continuous case.

(W3): The NZD (No zero-dominance) property is realistic in auction settings. When optimizing with entropy regularization, NZD is guaranteed: every entropy-regularized MFG-NE $\pi^*$ assigns non-zero probability to all actions (see lines 253-255). Even without entropy regularization, the NZD property can naturally arise in auction MFGs due to the structure of the reward and transition dynamics. When the item allocation function depends only on time, all losing actions yield the same immediate reward, zero, and the next state depends solely on the bidder’s valuation, not on the specific action taken. As a result, all losing actions are indistinguishable in value: for all states $s$ and all losing actions $a, a'$, it holds that $q_h(s, a \mid L, \pi, \theta) = q_h(s, a' \mid L, \pi, \theta)$, and any MFG-NE can be perturbed into an equivalent equilibrium that assigns non-zero probability to the highest losing action, satisfying NZD.

This is also reflected in the behavior of Online Mirror Descent (OMD) when used to approximate MFG-NEs. Since all losing actions have equal expected rewards, OMD assigns equal probability among them, including the highest losing action. Thus, if OMD converges to an MFG-NE, the resulting policy satisfies NZD. The same reasoning extends to many auction settings where allocation depends on the bid distribution. For instance, in reserve-price auctions where goods are allocated proportionally to the number of agents bidding above a threshold, the learned MFG-NE still satisfies NZD due to the equal treatment of losing actions in the optimization dynamics.

Moreover, in our experimental setting, we observed that the NZD property consistently holds across the learned policies. In particular, the policies obtained by AMID and the other baselines assign non-zero probability to the highest losing actions, consistent with the theoretical argument above. To support this, we will include plots in the camera-ready version that illustrate the action distributions and confirm that NZD is satisfied in all experiments.

(W4): We would like to emphasize that in many auctions, a discrete action space is natural, either due to the discrete nature of currency or due to technical limitations, particularly in online auctions. As such, discrete auctions have been a topic of study in the literature (e.g. [1,2]), and in fact can introduce unique theoretical difficulties compared to the continuous setting. In order to encapsulate realistic discrete-bid auctions, in our experiments we take a relatively fine discretization ($|\mathcal{S}| = 100$).

Separately, it is possible to extend our work to the continuous-action setting by replacing the discrete state-action MFG formalism with the continuous-space formalism standard in e.g. [3]; we leave this for future work.

(W5): To illustrate the effectiveness of AMID, we provide the following measurements on single-iteration time and memory usage of naive backpropagation and AMID using the JAX implementation (on the setting (A2)). The results indicate clearly that naive backpropagation has intractable growth in memory and time as a function of $H$, and in fact becomes prohibitively expensive on a single GPU for $H$ as small as $H=25$. Here, ❌ indicates that we were not able to run the naive algorithm.

| Horizon $H$ | Naive (time, s) | Naive (memory) | AMID (time, s) | AMID (memory) |
| --- | --- | --- | --- | --- |
| $H=5$ | 0.19 ± 0.02 | 2760 MiB | 0.09 ± 0.01 | 560 MiB |
| $H=10$ | 0.25 ± 0.10 | 8746 MiB | 0.208 ± 0.08 | 586 MiB |
| $H=25$ | 0.71 ± 0.15 | 16960 MiB | 0.67 ± 0.12 | 826 MiB |
| $H=50$ | ❌ | ❌ | 1.72 ± 0.41 | 1076 MiB |

We agree that these figures will contribute to the paper significantly, and plan to incorporate these measurements in our final submission, in Section 4. Currently, the computational benefits of AMID are highlighted in the main paper, specifically in Contribution 3 (line 65) and in Remark 1 (line 189). In fact, the experimental results in the paper are already in the regime where the improvement due to AMID is significant.

References:

[1] David et al., 2011. Optimal design of English auctions with discrete bid levels. ACM Transactions on Internet Technology.

[2] Chwe, 1989. "The discrete bid first auction." Economics Letters.

[3] Saldi et al., 2018. Markov--Nash equilibria in mean-field games with discounted cost. SIAM Journal on Control and Optimization.

Comment

Thanks for the responses. I will raise my score.

Review
Rating: 5

This paper proposes a method for incentive design in multi-agent systems. The authors introduce the Adjoint Mean-Field Incentive Design algorithm, which combines parameterized mean-field games and adjoint differentiation to compute gradients and optimize incentive design at scale efficiently. Experimental results validate the theoretical findings, showing that AMID outperforms first-price auctions and existing benchmark methods across various auction settings.

Strengths and Weaknesses

Strengths:

  • The writing is clear
  • The combination of mean-field approximation and the adjoint method enables the optimization of complex incentive design problems.
  • The paper provides strong theoretical contributions by proving that under Lipschitz continuity assumptions, the mean-field approximation for incentive design converges at a rate of $\mathcal{O}(1/\sqrt{N})$. The authors extend this result to non-Lipschitz-continuous settings, particularly focusing on sequential auction settings.

Weaknesses:

  • It would be helpful to include a comparison of the convergence speed between AMID and the baseline algorithms, especially as the number of agents increases.
  • There are still some formatting issues, such as inconsistent spacing in lines 295-299.

Questions

It would be helpful to include numerical comparisons of the optimization efficiency between AMID and baseline methods, such as the optimization time for different numbers of agents.

Limitations

yes

Justification for Final Rating

The additional experiments provided by the author addressed my concerns. Considering the opinions of other reviewers, I maintain my score.

Formatting Concerns

No

Author Response

We thank the reviewer for recognizing the strengths of the work and the insightful questions. We respond to the weaknesses and questions in order.

Weakness 1 and Question:

The PMFG framework makes it possible to solve the incentive design (ID) problem independently of the number of agents. Without the MFG approximation, the problem quickly becomes intractable even for moderate $N$, due to the high computational complexity of computing and differentiating through Nash equilibria. The PMFG construction helps handle this complexity while still providing theoretical guarantees. For the bias-efficiency trade-off due to the PMFG approximation and corresponding AMID algorithm, we refer the reviewer to Figure 2 and Figure 3 (left plot) in our submission.

All the methods compared in the experimental section are built on the PMFG framework to address the ID problem. The per-iteration cost of AMID is similar to that of the other methods, as AMID requires one forward and one backward pass, both of which have similar computational cost. In contrast, zero-order methods require at least two forward passes to approximate the gradient. As a result, while the per-iteration cost is comparable, AMID converges faster because it uses the exact gradient rather than approximating it.
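To illustrate this cost comparison, here is a toy sketch (ours, not the paper's baselines; the objective is a placeholder): a two-point zero-order estimator already requires two forward evaluations per iteration, whereas automatic differentiation reuses one forward pass plus one backward pass.

```python
import jax
import jax.numpy as jnp

def objective(theta):
    # Placeholder standing in for the PMFG design objective evaluated by one forward solve.
    return jnp.sum(jnp.sin(theta) ** 2)

def zero_order_grad(theta, key, delta=1e-3):
    u = jax.random.normal(key, theta.shape)     # random perturbation direction
    f_plus = objective(theta + delta * u)       # forward evaluation 1
    f_minus = objective(theta - delta * u)      # forward evaluation 2
    return (f_plus - f_minus) / (2.0 * delta) * u

theta = jnp.ones(3)
print(jax.grad(objective)(theta))                     # exact gradient: one forward + one backward pass
print(zero_order_grad(theta, jax.random.PRNGKey(0)))  # noisy estimate from two forward passes
```

The zero-order estimate is also a random directional projection of the gradient, so it typically needs many more iterations to reach the same accuracy as the exact first-order direction.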

That said, we agree that a numerical comparison of optimization efficiency would be helpful to have in the main paper. In the camera-ready version, we will update the appendix to include actual optimization runtimes for each algorithm, giving a clearer picture of their practical performance.

To illustrate the effectiveness of AMID, we provide the following measurements on single-iteration time and memory usage of naive backpropagation and AMID using the JAX implementation (on the setting (A2)). The results indicate clearly that naive backpropagation has intractable growth in memory and time as a function of $H$, and in fact becomes prohibitively expensive on a single GPU for moderate $H$. Here, ❌ indicates that we were not able to run the naive algorithm.

| Horizon $H$ | Naive (time, s) | Naive (memory) | AMID (time, s) | AMID (memory) |
| --- | --- | --- | --- | --- |
| $H=5$ | 0.19 ± 0.02 | 2760 MiB | 0.09 ± 0.01 | 560 MiB |
| $H=10$ | 0.25 ± 0.10 | 8746 MiB | 0.208 ± 0.08 | 586 MiB |
| $H=25$ | 0.71 ± 0.15 | 16960 MiB | 0.67 ± 0.12 | 826 MiB |
| $H=50$ | ❌ | ❌ | 1.72 ± 0.41 | 1076 MiB |

Weakness 2:

We thank the reviewer for pointing this out. While we are not sure which specific spacing inconsistency was referred to between lines 295–299, we will carefully review this section and ensure that any formatting issues, including spacing and alignment, are corrected in the camera-ready version.

Comment

Thank you. Considering all the reviewers' comments, I will maintain my score.

Final Decision

The paper proposes the Adjoint Mean-Field Incentive Design algorithm for incentive design in multi-agent systems. Experimental results are provided to validate the theoretical findings. Overall, the reviewers all seem to be positive about the paper, and the paper is definitely worth publishing at NeurIPS. Please incorporate the discussion/rebuttal responses as appropriate into the paper.