PaperHub
Overall: 6.8/10 (Poster; 4 reviewers; min 4, max 5, std 0.4)
Individual ratings: 4, 5, 4, 4
Confidence: 3.3 · Novelty: 3.0 · Quality: 2.8 · Clarity: 2.8 · Significance: 2.5
NeurIPS 2025

Look-Ahead Reasoning on Learning Platforms

OpenReview | PDF
Submitted: 2025-05-12 · Updated: 2025-10-29
TL;DR

We analyze how collective strategic behavior influences predictive models, introducing level-$k$ reasoning and showing how user coordination affects equilibrium outcomes.

Abstract

Predictive models are often designed to minimize risk for the learner, yet their objectives do not always align with the interests of the users they affect. Thus, as a way to contest predictive systems, users might act strategically in order to achieve favorable outcomes. While past work has studied strategic user behavior on learning platforms, the focus has largely been on strategic responses to the deployed model, without considering the behavior of other users, or implications thereof for the deployed model. In contrast, *look-ahead reasoning* takes into account that user actions are coupled, and---at scale---impact future predictions. Within this framework, we first formalize level-$k$ thinking, a concept from behavioral economics, where users aim to outsmart their peers by looking one step ahead. We show that, while convergence to an equilibrium is accelerated, the equilibrium remains the same, providing no benefit of higher-level reasoning for individuals in the long run. Then, we focus on collective reasoning, where users take coordinated actions by optimizing through their impact on the model. By contrasting collective with selfish behavior, we characterize the benefits and limits of coordination; a new notion of alignment between the learner's and the users' utilities emerges as a key concept. We discuss connections to several related mathematical frameworks, including strategic classification, performative prediction, and algorithmic collective action.
Keywords

performative prediction, strategic classification, collective action, performative shift, performativity

Reviews and Discussion

Official Review
Rating: 4

This paper investigates the effects of sophisticated user strategies in the context of performative prediction, where a machine learning model's predictions influence the data distribution it is trained on. The authors introduce and analyze two main forms of strategic user behavior:

Level-k Thinking: Drawing from behavioral economics, the paper formalizes a hierarchy of strategic depth where "level-k" agents best-respond to a model they anticipate will be trained on the actions of "level-(k-1)" agents.

Collective Reasoning: This models a scenario where a group of users coordinates their actions to "steer" the learning algorithm toward a more favorable outcome for the population, effectively acting as a leader in a Stackelberg game.

The paper provides theoretical characterizations for both scenarios. It finds that higher levels of individual strategic thinking (level-k) accelerate convergence to a stable equilibrium but do not change the equilibrium point itself. In contrast, collective action can lead to a qualitatively different and more beneficial equilibrium for the users. The paper quantifies this benefit, termed the "price of selfish reasoning," and shows it is governed by the alignment between the population's utility and the learner's loss function. Finally, it analyzes mixed populations containing both collective and selfish agents, exploring the dynamics of free-riding, the benefits of participation, and the costs of steering the model. These theoretical findings are supported by simulations on a credit scoring task.

Strengths and Weaknesses

Strength: The paper's primary strength is its successful integration of concepts from behavioral economics (level-k thinking) and game theory (Stackelberg games, collective action) into the machine learning framework of performative prediction. The paper delivers clear and significant theoretical contributions.

Weakness: The theoretical results depend on several strong, standard assumptions that may not hold in all practical settings.

问题

  1. The analysis relies on the loss function being smooth and strongly convex. While common for theoretical analysis, many complex machine learning models, such as neural networks, operate in non-convex landscapes. The paper does not discuss how the results might change in such settings.

  2. Assumption 2, which posits that the data distribution is a linear function of the population's strategy, is a significant simplification. While the authors note it can be seen as an approximation, the implications of violating this assumption on the derived bounds are not explored.

  3. The paper models the collective as a unified actor capable of selecting and implementing a globally optimal strategy ($h^\sharp$). It successfully analyzes the consequences of this optimal collective action but abstracts away the significant practical challenge of how a large, decentralized population would coordinate to find and execute such a strategy.

  4. The model of strategic manipulation, where an agent can transform their data from z to h(z), is highly abstract. In the simulations, this is instantiated as a simple additive change to feature values. Real-world strategic behavior often involves complex, feature-dependent costs and hard constraints (e.g., it is costly to change income but impossible to change age). A more nuanced cost model could lead to different strategic equilibria.

Limitations

See question section.

Formatting Issues

N/A

Author Response

Thank you for your thorough review and the valuable feedback. Please see below for our point-by-point response to your questions.

Q1. Convexity assumption
Most of our theoretical results rely on the standard convexity and smoothness assumptions, which are difficult to avoid. However, we note that the simulation conducted in Section 6.2 violates the strong convexity assumption on the collective's utility. As the experimental results indicate, there is no qualitative inconsistency with our theoretical findings. The insights gained in convex settings are often indicative of the phenomena arising in more complex landscapes.

Q2. Linearity assumption
We note that only Theorems 4 and 6 require the linearity assumption, and only in the last step of the proof to permit a cleaner and more interpretable expression. We will rewrite the proof to expose the expression before applying the linearity assumption. For Theorem 4, for example, $\Phi$ would need to be replaced by $\Vert\nabla_h\nabla_\theta\mathbb{E}_{z\sim \mathcal{D}_{h^\ast}} u(z, \theta)\,(\mathbf{H}^\star)^{-1}\,\mathbb{E}_{z \sim \mathcal{D}_{h^\ast}} \ell(z, \theta) \Vert$. In this form the bound still holds without the assumption.

Q3. Practical challenges of coordinating populations
We agree that coordination is an important aspect to implement collective action strategies in practice. Reported cases have successfully used forums and social media groups to coordinate such efforts; see, for example, the #DeclineNow movement on Doordash. There is also ongoing research dedicated to developing digital tools for facilitating information exchange among platform participants. See, for example, [1, 2] who develop tools for gig workers on digital platforms. All of this is to say that empirically there is some evidence of large, decentralized populations unifying to execute collective strategies.

[1] Carlos Toxtli and Saiph Savage. Designing AI Tools to Address Power Imbalances in Digital Labor Platforms. 2023.
[2] Kashif Imteyaz, Claudia Flores-Saviaga, and Saiph Savage. GigSense: An LLM-Infused Tool for Workers' Collective Intelligence. 2024.

Q4. Complexity of strategies $h$
In the context of collective reasoning, our goal is to understand the effectiveness of a given strategy $h$ that a collective might deploy, rather than to model organic human behavior based on incentives and costs. As you pointed out in your previous question, implementing and coordinating large-scale efforts can be challenging in practice. This is why simple strategies, such as simple feature manipulations, play an important role (especially since they are often implemented by people who do not have algorithmic expertise). For example, in the #DeclineNow movement on Doordash, drivers on a food-delivery platform coordinate to decline orders that pay below a threshold of $7 in order to raise the overall pay level. The numerical threshold they picked is not necessarily optimal, but it is very easy to communicate and implement. It is an example of a simple strategy that was effective in practice.

Comment

In the response to Q2: ... For Theorem 4, for example, $\Phi$ would need to be replaced by $\Vert\nabla_h\nabla_\theta\mathbb{E}_{z\sim \mathcal{D}_{h^\ast}} u(z, \theta)\,(\mathbf{H}^\star)^{-1}\,\nabla_\theta\mathbb{E}_{z \sim \mathcal{D}_{h^\ast}} \ell(z, \theta) \Vert$ ... where the last term was missing a "$\nabla_\theta$". Apologies for the typo; the OpenReview platform does not render LaTeX equations very well.

Official Review
Rating: 5

This work studies the impacts of collective reasoning in predictive systems. The authors formalize level-k thinking and analyze how these more sophisticated levels of reasoning affect outcomes within the context of performative prediction. They show that individuals making use of higher levels of strategic thinking influence the rate of convergence to an equilibrium, while not affecting the stable point that is converged to. They then consider the impact of collective reasoning in trying to achieve better outcomes. In this setup, the authors consider situations in which populations are mixed between agents engaging in collective optimization and agents who engage in level-k reasoning. They first establish a benefit of 'free-riding' by considering the case when k = 1. They then establish results on the price of partial participation and the implications for convergence to a stable point. Finally, they consider the case of an unstrategic population mixed with a strategic population and prove a set of results on the cost of steering the learner, the benefit of participation, and the benefit of implementation.

Strengths and Weaknesses

Strengths:

  • Clearly presented and well structured: This paper is well written, and the structure makes it easy to follow the key contributions the authors present.
  • Formalizing and instantiating key behavioural economics concepts: I found the incorporation of level-k reasoning within the framework of performative prediction to be particularly interesting. I think this is particularly valuable as these ideas are natural, particularly within the context of performative prediction.
  • Wide range of formal settings explored: Another key strength of the paper is that it provides a wide range of results spanning many settings, from results on equilibrium convergence given differences in k-level reasoning of sub-populations to the quality of stable points under different mixtures of agents.

Weaknesses:

  • The justification for considering an unstrategic segment of the population is not made particularly clear in the broader context of the paper, wherein strategic agents were the focus before. This transition seemed somewhat arbitrary to get to the results they had?

Questions

In the collective reasoning section, it seems as though the work only considers k=1 for selfish participants. What would be the implications if we were to extend this to higher levels of k reasoning for the selfish participants?

Limitations

Authors adequately addressed limitations

Final Justification

I retain my positive assessment of this work.

Formatting Issues

None immediately apparent

Author Response

Thank you for the positive assessment of our work. We elaborate on your questions and comments below:

Justifying unstrategic segments in the population
Throughout, our goal is to study heterogeneous populations of agents and to contrast different reasoning strategies, including the lack of strategic thought. In the context of collective action, there is ample evidence that in practice typically not all agents in a population participate in collective strategies. For example, Burrell et al. report that a strategic control action was found in only 17% of the tweets they coded for their study [1]; still, the strategy was effective. Our goal is to model this level of heterogeneity by incorporating non-strategic users.

[1] J. Burrell, Z. Kahn, A. Jonas, and D. Griffin. "When Users Control the Algorithms: Values Expressed in Practices on the Twitter Platform." ACM CHI, 2019.

What would be the implications if we were to extend this to higher levels of k reasoning for the selfish participants?

In our results we characterize the stable point in scenarios with heterogeneous populations, meaning both agents who act selfishly (as studied in Section 3) and agents who reason collectively. In the following we explain why considering a mixture with only a single selfish component at $k=1$ suffices as a representative case for selfish agents.

At the performatively stable point, the behavior of agents at any level $k \geq 1$ is the same as the behavior of a homogeneous population with only $k=1$ agents, since the second argument of the utility in Eq. (2) is fixed. Further, by Theorem 3 we know that the interplay between the population and the model converges to a unique stable point for any instantiation of the mixture weights in Eq. (3). As a result, the stable point is the same across varying mixtures of selfish agents, no matter what their $k$ is. We will clarify this and explain that our results apply to selfish agents more broadly.

Official Review
Rating: 4
  • This paper investigates performative retraining dynamics under different models of strategic response.
  • In the setting, the population is initially distributed according to a base distribution $D_0$ over feature-label pairs. A population may alter its distribution to $D$ to improve utility, where each individual moves from its initial point $z \sim D_0$ to a point $h_\theta(z)$, where $\theta$ represents the currently deployed model. The learner performs repeated risk minimization to optimize $\theta$ at each step.
  • In Section 3, user response is inspired by $k$-level thinking: the level-$0$ response is the identity function $h^{(0)}_\theta(z) = z$, and the response of level-$k$ thinkers is a best response to the distribution induced by level-$(k-1)$ thinkers. The main theoretical result shows that retraining converges to a performatively stable point at an exponential rate depending on the composition of different-level thinkers in the population, that the convergence rate is higher when there are higher-order thinkers in the population, and that the performatively stable point does not depend on the composition of different-order thinkers in the population.
  • In Section 4, the analysis presents a bound on the gap between performatively optimal and "utility maximizing" points based on Hessian-weighted gradient alignment. Section 5 presents convergence bounds for mixture populations (collective action and level-$k$ selfish agents in Section 5.1), a lower bound for the amount of "collective action" needed to reach a target state (Section 5.2), a characterization of utilitarian agents' utility at equilibrium as a function of their collective size (Section 5.3), and a similar characterization for fixed strategies (Section 5.4). Finally, Section 6 presents experiments on empirical data with simulated responses, showing alignment with the theoretical analysis.

Strengths and Weaknesses

Strengths:

  • Interesting bridge between performative prediction and behavioral dynamics.
  • Results further widen the applicability of repeated risk minimization.
  • Work has substantial breadth.

Weaknesses:

  • Attached code does not seem to generate the graphs presented in the paper, has missing modules and dataset, and most of its document is not in English, making it hard to verify results and build upon them.
  • Theoretical results in sections 4,5 characterize alignment as a function of an inverse-Hessian weighted gradient inner product of the performative response. However, it is unclear how the gradient alignment conditions can be computed or verified in practice.
  • There doesn't seem to be a direct discussion of limitations and broader impact.

Questions

  • What are the information/computational requirements of level-$k$ thinkers? What do they need to know, and what do they need to be able to compute?
  • The setting presented in Section 5.2 seems very similar to the setting analyzed by Hardt et al. (“Algorithmic Collective Action in Machine Learning”, ICML 2023). How does Proposition 8 relate to the results they present?
  • Small question about notation: In L127, the agent utility is denoted by $u(z, \theta)$ (i.e., it only depends on a single feature-label pair $z$); however, in the classic strategic classification model of Hardt et al., the utility of each agent depends on the "distance" between the original point $z \sim D_0$ and the strategic response $\Delta(z)$. Does the utility $U(h)$ as defined in L126 capture such dynamics?
  • For Theorem 6, it seems that the bound is vacuous for $\alpha=1$. However, if I understand correctly, then for this case convergence is "immediate" in some sense, as $\mathcal{D}_{h^\sharp}$ does not depend on the currently deployed model. Maybe it is possible to derive a "best of both worlds" bound based on this observation?

Limitations

Limitations and broader impact are not discussed in a dedicated section.

Final Justification

The authors have addressed my concerns in the rebuttal. I now find that the paper offers new and interesting perspectives on performative prediction, which I believe will be of interest to the community.

Formatting Issues

Minor typos:

  • L242 - Strategic agent -> Strategic agents?
  • L295 - fractures -> features?
Author Response

Thank you for your thorough review. We incorporated your feedback and we will use the additional page available for the revision to include a dedicated section on limitations, discussing the points that came up in the reviews. Below we discuss your comments in order.

Q1. Computation requirements
Level-$k$ thinkers assume all other agents are level-$(k-1)$ thinkers. Hence, they need to be able to compute the moves of level-$\{k-1, k-2, \ldots, 1\}$ thinkers (in order to compute $\mathcal{D}_{k-1}(\theta)$). Further, they need to be able to compute the output of the learning algorithm. The former assumption, that agents at higher levels are able to emulate agents at lower levels, is standard in economic models of cognitive hierarchies; see Nagel (1995). The latter is specific to our learning setting and requires that the agents know the specifics of the learner's objective (which is common in strategic classification).

Additional experiment in response to the comment
To investigate the sensitivity to violations of this assumption, we performed an additional empirical ablation where agents can only approximately solve the learning problem. In particular, we assume all agents can only perform one stochastic gradient descent step on the deployed model, with a learning rate of 0.01, to estimate the model update. We set up the experiment with two levels of agents (same as the experiment in Section 6.1). The following table reports the convergence of the model parameters as the difference between iterations, $\Vert \theta_{t+1} - \theta_t \Vert$ (same as Figure 1):

| Iteration | 1 | 2 | 3 | 4 | 5 | 6 | >=7 |
|---|---|---|---|---|---|---|---|
| Level-1 agents | 0.37 | 3.4*10^{-5} | 4.3*10^{-7} | 4.5*10^{-9} | 5.3*10^{-11} | 5.7*10^{-13} | <= 10^{-13} |
| 50% Level-1 + 50% Level-2 | 0.47 | 2.6*10^{-5} | 3.16*10^{-7} | 3.0*10^{-9} | 3.7*10^{-11} | 4.4*10^{-13} | <= 10^{-13} |
| Level-2 agents | 0.009 | 2.9*10^{-5} | 9.1*10^{-8} | 2.1*10^{-9} | 5.8*10^{-12} | 2.9*10^{-13} | <= 10^{-13} |

We observe that convergence is preserved, and a larger proportion of higher-level agents still accelerates convergence.
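For reference, the following is a minimal runnable sketch of this kind of ablation on synthetic data (our illustration, not the original experiment code): level-2 agents anticipate the retrained model with a single stochastic gradient step at learning rate 0.01, while the learner itself retrains exactly. The squared loss, the synthetic data, and all names and constants other than the learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, eps, lr = 500, 3, 0.1, 0.01
X0 = rng.normal(size=(n, d))                      # base features
y = rng.normal(size=n)                            # labels, kept fixed across deployments

def retrain(X, y):
    """Learner's exact risk minimizer for a squared loss."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def one_sgd_step(theta, X, y, batch=32):
    """Agents' approximate model of retraining: one stochastic gradient step."""
    idx = rng.choice(len(y), size=batch, replace=False)
    grad = 2 * X[idx].T @ (X[idx] @ theta - y[idx]) / batch
    return theta - lr * grad

def respond(theta):
    """Closed-form response of Section 6.1, applied to all features for simplicity."""
    return X0 - eps * theta

def simulate(level2_frac, T=7):
    theta, gaps = retrain(X0, y), []
    for _ in range(T):
        X1 = respond(theta)                       # level-1 agents respond to the deployed model
        theta_hat = one_sgd_step(theta, X1, y)    # level-2 agents' approximate look-ahead
        X2 = respond(theta_hat)                   # level-2 agents respond to the anticipated model
        mask = rng.random(n) < level2_frac
        X = np.where(mask[:, None], X2, X1)       # mixed population acts
        theta_new = retrain(X, y)                 # learner retrains on the induced data
        gaps.append(np.linalg.norm(theta_new - theta))
        theta = theta_new
    return gaps

for frac in (0.0, 0.5, 1.0):
    print(f"level-2 fraction {frac}:", ["%.1e" % g for g in simulate(frac)])
```

The printed per-iteration parameter gaps $\Vert\theta_{t+1}-\theta_t\Vert$ play the same role as the rows of the table above.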

Q2. How does Proposition 8 relate to Hardt et al.?
We assume you refer to Section 4.1 in Hardt et al. They analyze a fixed strategy, whereas our bound holds for any possible strategy that could potentially shift the model to the target state. We demonstrated that the collective's effort to implement such a strategy (modeled by the shift needed) is inversely proportional to the size of the collective. This is not a quantity studied in prior work. However, what is similar is that the norm of the gradient at the target solution naturally emerges as a notion of suboptimality (under the base distribution) for what the collective aims to achieve.

Q3. Connection to the strategic classification utility model
Thanks for pointing this out. For simplicity, we removed the cost term from our problem statement to be more consistent with the performative prediction formalism. To fully recover the strategic classification utility model of Hardt et al., one would need to extend the notation to $U(h) = \mathbb{E}_{z\sim \mathcal{D}_0}[u(z, h(z), \mathcal{A}(\mathcal{D}_h))]$. However, this does not change the results in our paper, as the cost acts as a regularizer that applies equally to any level of reasoning for any fixed agent. For example, in Theorem 3, convergence still holds as long as the sensitivity of the distribution map satisfies $\epsilon \leq \frac{\gamma}{\beta}$. Later results also still hold, as the envelope theorem still applies. We will clarify this point in the revised version.

Q4. Clarification on Theorem 6
Yes, the model indeed converges immediately, since $h^\sharp$ does not depend on the current model parameter. Our goal is to characterize the suboptimality due to deviating from the case $\alpha=1$. We are not sure we understand what you are suggesting - could you please clarify what you mean by a best-of-both-worlds bound?

Code
The code requires additional modules that can be found in the GitHub repo of the paper "Performative Prediction" (ICML 2020). The code is runnable if the dataset (Kaggle contest: Give Me Some Credit, cs-training.csv) and the modules (data_prep.py, optimization.py, strategic.py) are placed in the root folder of the notebook. We will work on the documentation and release our polished code publicly alongside the revised version.

Alignment condition
As long as the loss function and the agents' utility functions are available, the gradient alignment expressions are in principle computable. The Hessian of the utilities in differential games is indeed an important indicator of the "alignment" of the players' objectives (see Section 2 of Balduzzi et al., "The Mechanics of n-Player Differentiable Games", ICML'18); second-order information is unavoidable when characterizing the exact alignment of the players' objectives. In practice, practitioners can also infer how adversarial the relationship between the learner and the collective is through qualitative analysis, without direct computation. For example, in the simulations we measure alignment using feature importance, and the empirical results align with our theoretical observations. In general, however, we see the main value of our analysis in highlighting the role of steering in collective reasoning (which is absent from strategic classification) and in offering the first formal characterization of the tradeoff involved in strategizing collectively against a risk-minimizing learner.
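As a small illustration of this computability point (a sketch under an assumed quadratic loss and linear utility, not the exact expression from the paper, which additionally involves the performative response term $\nabla_h$), the inverse-Hessian-weighted gradient inner product can be evaluated directly once gradients and the Hessian are available:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
theta = rng.normal(size=d)

# Assumed quadratic learner loss  l(theta) = 0.5 * theta^T A theta - b^T theta
A = rng.normal(size=(d, d)); A = A @ A.T + np.eye(d)   # strongly convex Hessian
b = rng.normal(size=d)
grad_loss = A @ theta - b                              # gradient of the expected loss
hessian = A                                            # Hessian H (positive definite)

# Assumed linear population utility  u(theta) = c^T theta
c = rng.normal(size=d)
grad_util = c                                          # gradient of the expected utility

# Inverse-Hessian-weighted inner product between the two gradients
alignment = grad_util @ np.linalg.solve(hessian, grad_loss)
print(f"Hessian-weighted gradient alignment: {alignment:.3f}")
# Under a Newton step theta -> theta - H^{-1} grad_loss, the population utility changes
# by approximately -alignment, so the sign indicates (mis)alignment of the objectives.
```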

Comment

Thank you for the detailed response, and for the additional experiments! My concerns are addressed, and I'm adjusting my score accordingly. Regarding Q4, your clarification resolved my confusion - after going through the formal statements in Section 5.1 again, I think the question arose from a misunderstanding. Your explanation of the intent was very helpful, and may be worth emphasizing.

Comment

Thank you for your response and for explaining where your confusion was coming from. We will make sure to incorporate the explanations from the rebuttal in the paper to clarify this upfront.

Official Review
Rating: 4

This paper studies strategic user behavior in the context of performative prediction, where the data distribution depends on the deployed predictive model. Prior work has largely assumed independent, myopic user behavior. This paper generalizes the strategic behavior of agents by introducing two models from behavioral economics and game theory: (1) Level-k reasoning, where agents reason recursively about other agents’ strategies. (2) Collective reasoning, where agents treat themselves as part of a statistically significant population capable of steering the learning system through coordinated actions.

The paper characterizes learning dynamics and equilibrium outcomes under these models and introduces the concept of the price of selfish reasoning to compare selfish and coordinated strategies. It also analyzes the role of mixed populations—part selfish, part collective—and studies how alignment between population utility and model loss impacts convergence, utility, and incentives to participate in collective strategies. Simulations are provided using a credit scoring dataset to support theoretical claims.

Strengths and Weaknesses

Strengths

  1. Novel conceptual framing: Introduces collective reasoning into performative prediction, bridging the gap between strategic classification and algorithmic collective action.

  2. Technically sound derivations: Provides multiple formal results on convergence, utility trade-offs, and price of selfish reasoning under various assumptions.

  3. Insightful alignment analysis: Shows how gradient alignment between utility and loss determines the benefits or costs of collective action.

  4. Simulation studies: Includes a few simulations that qualitatively support theoretical insights, especially regarding convergence speed and utility curves in mixed populations.

Weaknesses

  1. Limited Formalization of Game-Theoretic Structures: Although the problem involves Stackelberg-like interactions, the game-theoretic formalism is weakly developed.

The notion of agents “best-responding” collectively is used, but no explicit utility functions or equilibria for individual agents are presented beyond collective averages. Whether the agents receive utility in a single round or repeatedly in a repeated game can significantly change the analysis.

The Stackelberg structure is only implicitly referenced and lacks formal modeling of individual incentives in multi-agent dynamics.

  2. Absence of Agent-Level Utility Analysis:

There is no detailed discussion of agent best responses, deviation incentives, or whether the proposed collective strategies are individually rational.

The conditions under which agents would prefer to improve or manipulate features are not clearly contrasted, as is standard in the strategic learning literature.

  3. Simulation Limitations:

The simulations are qualitative and small-scale, with limited quantitative rigor (e.g., no standard deviations, statistical tests, or multiple runs reported).

Only a single real-world dataset is used, and no baselines (e.g., existing performative prediction or strategic classification methods) are included for comparison.

Claims about convergence speed and utility alignment are not empirically validated across diverse models or settings.

  4. Missing Empirical Demonstration of Key Ideas:

Core concepts such as the price of selfish reasoning, collective shift bounds, and benefit of participation are not clearly illustrated with examples or real decision-making scenarios.

The paper lacks simple examples or visualizations to build intuition about higher-order reasoning or the effects of collective action.

  5. Some Assumptions Are Strong or Unverified:

Key results rely on linearity assumptions of the distribution, strong concavity of utility, and smoothness of losses—all of which may be hard to verify or hold in practice.

These assumptions are presented without discussion of their practical implications or limitations.

Questions

  1. Can the authors formally model the Stackelberg game between the learner and the population under collective reasoning? What are the individual agents’ incentives, and do they form an equilibrium? Do they receive benefit from each model update or only in the PS state?

  2. Can the authors characterize which subset of agents are more willing to collaborate and which subsets are more willing to free-ride? How does this depend on the transition mapping, the initial data distribution and the choice of model hypothesis space?

  3. Can the authors add simple examples (even synthetic) to show concrete strategies under level-1 vs. level-k vs. collective reasoning? How do individual utilities compare?

  4. Are the results robust to violations of the linearity and strong concavity assumptions? Would the conclusions still hold with more realistic agent behavior or stochasticity?

  5. The simulation section would benefit from baseline comparisons, error bars, and quantitative metrics (e.g., average regret, time to convergence, etc.), as well as more of the real-world datasets used in previous works. Can the authors extend this?

Limitations

not beyond weaknesses and questions

Formatting Issues

no

Author Response

Thank you for your feedback. In the following we address your comments in sequence. Please let us know if anything requires further clarification.

Q1. Stackelberg structure
In the context of collective reasoning (Section 4), the actions of the learner and the population are defined as follows:

The learner is assumed to be risk minimizing (recall Assumption 1): observing $\mathcal{D}$, they choose $\mathcal{A}(\mathcal{D}) = \arg\min_{\theta \in \Theta} \mathbb{E}_{z \sim \mathcal{D}}[\ell(z; \theta)]$. In words, the learner best-responds to the population; their utility corresponds to the negative expected loss, and their action to a choice of parameters. The collective in turn finds a strategy that maximizes their collective utility $U$ by planning through the learner's response. This optimal strategy is defined in L176 as $h^\sharp = \arg\max_h \mathbb{E}_{z \sim \mathcal{D}_h}[u(z, \mathcal{A}(\mathcal{D}_h))]$.

Together, this structure corresponds to a two-player Stackelberg game where the collective acts as the leader and the learner as the follower. Note that the agents in the population are not necessarily optimizing their individual utilities. Like in traditional cases of collective action, participating in a collective is not always individually rational, but individuals may opt in for altruistic reasons.

In later sections we consider deviations from this model, where the strategy space of the collective only permits the manipulation of a subset of the population ($\alpha < 1$). This corresponds to some agents prioritizing their individual utilities, and others acting altruistically in the interest of the collective. We hope this clarifies your question about the game-theoretic formalization of collective reasoning.
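To make the leader-follower structure concrete, the following is a minimal numerical sketch (a toy illustration under assumed forms, not the paper's model): the follower is a least-squares learner that best-responds to the induced distribution, the leader is a collective searching over a simple family of additive feature shifts, and the collective utility is taken to be the average model score. The strategy family, the utility, and all names in the code are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=(1000, 2))                     # base population D_0 as (feature, label) pairs
z[:, 1] = 0.5 * z[:, 0] + 0.1 * rng.normal(size=1000)

def learner_best_response(data):
    """Follower A(D): least-squares fit of the label on the single feature."""
    x, y = data[:, 0], data[:, 1]
    return (x @ y) / (x @ x)                       # argmin_theta E[(y - theta * x)^2]

def apply_strategy(data, shift):
    """Candidate collective strategy h: an additive shift of the feature."""
    shifted = data.copy()
    shifted[:, 0] += shift
    return shifted

def collective_utility(data, theta):
    """Assumed population utility: the average score assigned by the deployed model."""
    return float(np.mean(theta * data[:, 0]))

# The leader (the collective) optimizes its strategy *through* the follower's response.
best_shift, best_util = None, -np.inf
for shift in np.linspace(-2.0, 2.0, 81):
    induced = apply_strategy(z, shift)             # induced distribution D_h
    theta = learner_best_response(induced)         # follower's response A(D_h)
    util = collective_utility(induced, theta)      # leader's objective U(h)
    if util > best_util:
        best_shift, best_util = shift, util

print(f"h_sharp ~ additive shift {best_shift:.2f}, collective utility {best_util:.3f}")
```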

Do individuals benefit from model updates? Agents’ utilities are not guaranteed to be monotone over time and even at stability a benefit is not guaranteed at the individual level. As explained above, our goal is to study agent behavior that is driven by collective goals rather than individual incentives. We demonstrate how a group of agents can improve their aggregate utility through coordination, more so than if all of them perform individually rational actions. The reason is that collectives can ‘steer’ the learner, rather than just their own data.

Q2. Participation dynamics
This question relates to an important conceptual departure between our work and strategic classification. We study agents that purposefully optimize the collective utility; this behavior does not arise from rationality with respect to an individual utility. Taking this collective perspective, our results show when the collective is better off coordinating rather than acting myopically. In the present work, we do not study the dynamics by which collectives form, or mechanisms to incentivize participation. Understanding the role of heterogeneous incentives in the context of algorithmic collective action is an interesting and, to the best of our knowledge, unexplored question.

Q3. Example
One concrete example of level-$k$ strategies is demonstrated in the simulation of Section 6.1. It is an extension of the classical strategic classification setting with a quadratic cost and linear utility. Here, agents at level $k$ modify their strategic features $x_S$ assuming the other agents act according to level $(k-1)$. The actions of agents can be written in closed form as $x_S^k = x_S - \epsilon \cdot \theta^{k-1}_S$, where $\theta^k = \arg\min_\theta \sum_{i} \ell(x^k_i, y_i, \theta)$. In the following we use this example to numerically compare the utilities of level-1 and level-2 agents with the same initial features $x = (0, \ldots, 0)$. We report averages over 10 runs.

| Timestep | 1 | 2 | 3 | 4 | 5 | >= 6 |
|---|---|---|---|---|---|---|
| $u^{k=2} - u^{k=1}$ | 7.2*10^{-1} | 4.5*10^{-5} | 2.1*10^{-9} | -2.8*10^{-9} | 2.1*10^{-9} | <= 10^{-9} |

We observe that the utility gap decreases across time steps as the procedure converges towards stability. However, there is no evidence that utilities are monotone in the agents' cognitive level. We will include an extended version of this numerical analysis in the paper. Thank you for the suggestion!
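A minimal runnable reconstruction of this example (ours, not the paper's code): a ridge-regression learner on synthetic data stands in for the unspecified loss and the credit-scoring features, the probe agent starts at $x = (0, \ldots, 0)$, and its level-2 move anticipates the retrained model exactly because the rest of the population acts at level 1. Everything except the closed-form response and the linear-utility-plus-quadratic-cost structure is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, eps = 400, 3, 0.1
X0 = rng.normal(size=(n, d))                 # base features of a level-1 population
y = (X0[:, 0] > 0).astype(float)             # synthetic labels

def retrain(X, y):
    """Learner's risk minimizer; ridge regression stands in for the unspecified loss."""
    return np.linalg.solve(X.T @ X + 1e-3 * np.eye(d), X.T @ y)

def utility(x, theta, x_init):
    """Linear utility with quadratic cost; its best response is x_init - eps * theta."""
    return -(x @ theta) - np.sum((x - x_init) ** 2) / (2 * eps)

theta = retrain(X0, y)                       # initially deployed model
x0 = np.zeros(d)                             # probe agent with initial features (0, ..., 0)
for t in range(1, 7):
    X1 = X0 - eps * theta                    # level-1 responses: x^1 = x - eps * theta
    theta_new = retrain(X1, y)               # retrained model, anticipated by level-2 agents
    x_lvl1 = x0 - eps * theta                # probe's level-1 move (against the deployed model)
    x_lvl2 = x0 - eps * theta_new            # probe's level-2 move (against the anticipated model)
    gap = utility(x_lvl2, theta_new, x0) - utility(x_lvl1, theta_new, x0)
    print(f"t={t}: u^(k=2) - u^(k=1) = {gap:.2e}")
    theta = theta_new
```

In this idealized sketch the gap works out to $\tfrac{\epsilon}{2}\Vert\theta_{t+1}-\theta_t\Vert^2$, so it shrinks to zero as retraining approaches the stable point, mirroring the trend in the table above.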

Q4. Assumptions:

We discuss the three assumptions individually:

  • Strong concavity. The simulation in Section 6.2 is an example when the utility is not strongly concave, but the results still qualitatively align with our theoretical findings.
  • Linearity. The condition is only used in Theorems 4 and 6, in the last step of the proof, to permit a cleaner and more interpretable bound. A "less clean" version of the result does not require this assumption and preserves the qualitative take-aways. For Theorem 4, for example, $\Phi$ would need to be replaced by $\Vert\nabla_h\nabla_\theta\mathbb{E}_{z\sim \mathcal{D}_{h^\ast}} u(z, \theta)\,(\mathbf{H}^\star)^{-1}\,\mathbb{E}_{z \sim \mathcal{D}_{h^\ast}} \ell(z, \theta) \Vert$. In general, for simple densities the assumption can always be satisfied by reparameterizing the density functions, which allows differentiating through the individual terms.
  • Agent behavior. The analysis of the mixture distribution (Equation 3) is our attempt to understand heterogeneous populations and to deviate from the typical model of a homogeneous population with a fixed level of reasoning. One insight is that convergence to stability is achieved for varying proportions of agents at different levels of the cognitive hierarchy. We find that convergence is robust to these deviations.

Q5. Experiments
Thank you for the suggestions; we are happy to extend our experimental section along the dimensions you mentioned. We added error bars (one standard deviation) for all plots in the paper. See below for the numbers.

Figure 1:

| Timestep | 1 | 2 | 3 | 4 | 5 | 6 | >=7 |
|---|---|---|---|---|---|---|---|
| standard deviation: $\alpha = \{0, 1, 0\}$ | 2.9*10^{-1} | 2.48*10^{-5} | 2.18*10^{-7} | 1.95*10^{-9} | 1.76*10^{-11} | 1.53*10^{-13} | <= 10^{-14} |
| standard deviation: $\alpha = \{0, 0.5, 0.5\}$ | 1.9*10^{-1} | 1.6*10^{-5} | 2.2*10^{-7} | 2.0*10^{-9} | 1.8*10^{-11} | 1.61*10^{-13} | <= 10^{-14} |
| standard deviation: $\alpha = \{0, 0, 1\}$ | 1.7*10^{-1} | 1.5*10^{-7} | 1.4*10^{-9} | 1.2*10^{-11} | 1.0*10^{-13} | <= 10^{-14} | <= 10^{-14} |

Figure 2:

| $\alpha$ | 0.02 | 0.13 | 0.23 | 0.34 | 0.45 | 0.55 | 0.66 | 0.77 | 0.87 | 0.98 |
|---|---|---|---|---|---|---|---|---|---|---|
| utility std - #Age | 4.6*10^{-3} | 2.0*10^{-3} | 1.2*10^{-3} | 8.2*10^{-4} | 4.6*10^{-4} | 4.9*10^{-4} | 2.8*10^{-4} | 1.8*10^{-4} | 7.6*10^{-5} | 6.0*10^{-6} |
| utility std - #Dependents | 7.2*10^{-3} | 1.6*10^{-3} | 7.4*10^{-4} | 6.8*10^{-4} | 5.0*10^{-4} | 3.0*10^{-4} | 1.9*10^{-4} | 1.5*10^{-4} | 8.0*10^{-5} | 6.0*10^{-6} |
| Benefit of Participation - #Age | 4.8*10^{-3} | 2.6*10^{-3} | 1.9*10^{-3} | 1.6*10^{-3} | 1.1*10^{-3} | 1.5*10^{-3} | 1.1*10^{-3} | 1.1*10^{-3} | 9.5*10^{-4} | 3.9*10^{-4} |
| Benefit of Participation - #Dependents | 7.4*10^{-3} | 2.1*10^{-3} | 1.2*10^{-3} | 1.3*10^{-3} | 1.2*10^{-3} | 9.4*10^{-4} | 7.9*10^{-4} | 9.1*10^{-4} | 1.0*10^{-3} | 3.6*10^{-4} |

In terms of convergence: note that Figure 1 (in the paper) illustrates convergence for the experiment of Section 6.1. For the experiment conducted in Section 6.2, for the inference of the collective strategy, convergence is achieved after 200 steps for all cases. We will give a more comprehensive report on these quantitative metrics in the revised version.

Regarding real-world datasets, we would like to emphasize that the credit scoring simulator is based on real data (data collected from 250,000 borrowers including historical information about the individual and information on whether they defaulted on a loan). More generally, in related literature, there are few experiments on real-world data since that would require obtaining samples under different model deployments. If the reviewer has a suggested dataset we could look at, we would be happy to incorporate it.

Comment

In the response to Q4: ... For Theorem 4, for example, $\Phi$ would need to be replaced by $\Vert\nabla_h\nabla_\theta\mathbb{E}_{z\sim \mathcal{D}_{h^\ast}} u(z, \theta)\,(\mathbf{H}^\star)^{-1}\,\nabla_\theta\mathbb{E}_{z \sim \mathcal{D}_{h^\ast}} \ell(z, \theta) \Vert$ ... where the last term was missing a "$\nabla_\theta$". Apologies for the typo; the OpenReview platform does not render LaTeX equations very well.

Final Decision

This paper studies a performative learning setting in which user behavior is modeled as k-level reasoning and as collective reasoning. All reviewers appreciate the novelty of the setup and agree that the paper makes a nice link between retraining and micro behavioral modeling through the dependencies that follow from collective action. However, while the authors claim that most works have so far considered individual responses, there are several recent works that do consider interdependent user behavior (e.g., see [1-4] and references therein), which should be addressed. The reviews were also helpful in identifying areas in which the paper can be improved, such as a more formal game-theoretic description of the setup, the need for strong assumptions, and the breadth and robustness of the experiments. The authors' rebuttal was helpful in addressing some of the concerns, but they are nonetheless encouraged to incorporate the reviewers' suggestions into the final version.

[1] Strategic Classification with Graph Neural Networks. ICLR 2023.
[2] Performative Prediction on Games and Mechanism Design. AISTATS 2025.
[3] Strategic Classification with Externalities. ICLR 2025.
[4] Learning Classifiers That Induce Markets. ICML 2025.