PaperHub
5.5 / 10
Poster · 4 reviewers
Scores: 2, 4, 4, 2 (lowest 2, highest 4, standard deviation 1.0)
ICML 2025

Contract Design Under Approximate Best Responses

OpenReview · PDF
Submitted: 2025-01-24 · Updated: 2025-07-24

Abstract

Keywords
Principal-Agent · Contract Design · Robustness

Reviews and Discussion

Official Review
Rating: 2

This paper studies a repeated contracting game between a principal and an agent. The principal offers contracts to the agent, who can accept the contract or not. The paper assumes a hidden action on which the outcome (to which the principal's and the agent's utilities are tied) depends. Instead of the classic setup where the agent best responds to the contract, the agent here may deviate by a quantity $\delta$ from his best action. In that setup, Algorithm 1 gives a way to compute the "optimal $\delta$-robust contract". Then, it is assumed that the agent's type changes over time in a repeated game. For that scenario, the principal discretizes the space of contracts as an $\epsilon$-grid of the $m$-dimensional hypercube. The UCB bandit algorithm is used, with the grid points of the hypercube as the action set.
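To make the learning procedure concrete, a minimal sketch of UCB1 run over an $\epsilon$-grid of contracts could look as follows (the grid resolution, the cap on payments, and the `env_utility` callback are illustrative assumptions, not the paper's construction):

```python
# Hypothetical sketch: UCB1 over an epsilon-grid of contracts in [0, p_max]^m.
# env_utility(p) returns the principal's realized utility in [0, 1] for one
# round in which contract p is posted; the agent's (approximate) best response
# is hidden inside this callback.
import itertools
import numpy as np

def ucb1_over_contract_grid(env_utility, m, T, eps=0.1, p_max=1.0):
    grid_1d = np.arange(0.0, p_max + 1e-9, eps)
    arms = [np.array(p) for p in itertools.product(grid_1d, repeat=m)]
    K = len(arms)
    counts, sums = np.zeros(K), np.zeros(K)
    for t in range(T):
        if t < K:
            k = t                                    # play every arm once
        else:
            ucb = sums / counts + np.sqrt(2.0 * np.log(t + 1) / counts)
            k = int(np.argmax(ucb))                  # UCB1 index
        counts[k] += 1
        sums[k] += env_utility(arms[k])
    return arms[int(np.argmax(sums / np.maximum(counts, 1)))]
```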

From my understanding, $\delta$ is known to the principal (otherwise Algorithm 1 cannot run because of equations (6a), (6b), (6c)). Thus, since it is assumed that the agent picks the worst action for the principal within the set $A^\delta$, the term "robustness" seems an abuse of language. The principal knows the agent's deviation, which makes it way easier to tackle and not very far from analyzing a best response.

Questions for the Authors

Could you compare the technical novelty of your approach to Zhu, B., Bates, S., Yang, Z., Wang, Y., Jiao, J., and Jordan, M. I., "The sample complexity of online contract design"?

Could you explain how it makes a difference to assume that the agent picks the worst action for the principal in $A^\delta$ as long as $\delta$ is known? I am not sure it makes it a harder problem as compared to a best-responsive agent.

Claims and Evidence

I am concerned about Remark 2, which states the following:

"Our algorithm can be extended to deal with settings in which the agent can play any δ-best response within the set Aθt,δ(pt). In such a setting, the principal’s utility is not fully stochastic. However, our Algorithm 2 can be easily extended by instantiating an adversarial no-regret algorithm instead of UCB1."

An important difficulty would come from dealing with agents who potentially play any action in the set $A^\delta$ (for $\delta$ not close to $0$, I am not even sure that a solution exists). I would appreciate a clear algorithm and a proof for this remark, which seems overclaimed to me.

As I mentioned above, the term "robustness" for this setup is also overclaimed.

Methods and Evaluation Criteria

There are no experiments in this paper, which is absolutely not a concern for me. The paper is theoretical and is more about proposing a framework to study contract design than about offering possibly practical implementations.

Theoretical Claims

I checked the proofs for the theoretical claims and they seem correct to me. My concern is, on one side, the lack of mathematical precision. Typically, UCB1 is invoked without any reference. Of course, I understand that it refers to the Upper Confidence Bound algorithm, but I would appreciate seeing it written down, especially when it comes to analyzing it.

I feel like the proofs and techniques are very close to Zhu et al. 2023, except that they are made easier. The fact that the agent chooses the worst action for the principal within the set $A^\delta$ does not change the way the analysis can be done. At the end of the day, it is equivalent in the analysis (at least for the learning part) to an agent choosing the best action. I am not convinced by the technical novelty of the theoretical claims.

Experimental Design and Analysis

NA.

Supplementary Material

I appreciate the fact that the proofs are given in the supplementary material. However, I would have appreciated it if the authors had taken advantage of the extra space to give clear and rigorous mathematical explanations.

Relation to Prior Work

I am concerned about the novelty of the paper as compared to Zhu et al. 2023. The claimed advantages are the following:

  • apparently the method by Zhu et al. 2023 needs a more complex discretization, which makes "the approach by Zhu et al. (2023) challenging to be employed in practice compared to ours". The current state of this line of work, with UCB run over an action set of size $T^{1/m+1}$, seems to me a sufficient burden to keep it a very theoretical field right now. If the authors believe that the practical implementation is a significant advantage of their method, then I would have appreciated experiments, or at least clearly stated concrete applications.
  • second, the method in the paper "does not require apriori knowledge of the principal’s reward, which is instead required by the algorithm of Zhu et al. (2023)." I agree that it can be of some interest, but to me it is a very marginal advantage.

Missing Essential References

Yes. A very classic reference in contract theory could be mentioned:

  • Laffont and Martimort, The theory of incentives: the principal-agent model.

There exists a literature on robust contract design, such as

  • Yu and Kong, Robust Contract Designs: Linear Contracts and Moral Hazard
  • Miao and Rivera, Robust contracts in continuous time

but the only reference given is:

  • Carroll, Robustness and linear contracts.

Other Strengths and Weaknesses

My first concern is the assumption on the agent's behavior. The paper assumes that the agent picks the principal's worst action within the set $A^\delta$. This assumption does not make any sense from a rational agent's perspective, does not exist in the literature, and seems to be there only to allow the proofs to work.

The learning part of the paper seems very weak to me. It does not go beyond running UCB on a discretized hypercube to approximate a very specific notion of regret. Also, the last section does not provide anything more than recalling the hidden-type setup of Zhu et al. 2023, then proposing a discretization of the hypercube and running UCB on it (similar techniques were already proposed). Hence, despite the pros and cons one could find in the paper, I do not believe that it is a good fit for ICML, which is a machine learning conference.

The capacity of the method to handle any $\delta$-best response seems overclaimed.

I appreciate the problem of designing robust contracts, which seems very interesting to me. However, I believe that the setup and the approach are poorly formulated and do not bring additional value as compared to the existing state-of-the-art.

Other Comments or Suggestions

The regret definition in Section 5 should be discussed in more detail, since it is not obvious.

Authors' Response

We believe there is a misunderstanding about the main contributions of our paper and the challenges in obtaining efficient optimization algorithms.

  • Computing robust equilibria is much harder than standard one. The Reviewer claims: “The principal knows the agent's deviation, which makes it way easier to tackle and not very far from analyzing a best response.” We disagree with this statement, as it contradicts known results for similar bilevel problems, such as Stackelberg games and Bayesian persuasion, where computing an equilibrium in the best-response case is computationally easy, but it is hard under worst-case approximate best responses.

  • Main contribution of our paper. The Reviewer seems mostly focused on the learning part, while ignoring the more “surprising” computational results. As should be clear from our writing (see, e.g., the abstract), the learning part is not the main contribution but rather a valuable addition. While we agree with the Reviewer that the learning part alone would not be enough for ICML, we hope the Reviewer would agree that it includes nice extensions and simplifications with respect to the approach by Zhu et al.

  • On agent’s behavior. The Reviewer claims: “The paper assumes that the agent picks the principal's worst action within the set $A^\delta(p)$. This assumption does not make any sense from a rational agent's perspective, does not exist in the literature and seems to be there only to allow the proofs to work.” We respectfully, but strongly, disagree with this statement. First, the assumption is not intended to be predictive of the agent's behavior: rather, it is a worst-case perspective based on the fact that the agent's behavior is unpredictable within a small $\delta$ utility difference (e.g., due to noise in the model parameters or the agent's perception of their own utility). In such a scenario, the principal adopts a worst-case approach by computing a robust contract. In this way, the utility the principal secures, OPT($\delta$), is a lower bound on the actual utility the principal will obtain. Indeed, this robust equilibrium model is not our own invention; it has been previously proposed and studied in both Stackelberg and Bayesian persuasion settings (Gan et al., 2023; Yang & Zhang 2024).

Re: "I am concerned about Remark 2.."

The Reviewer’s observation is right: if the agent is allowed to break ties in any way when indifferent among multiple $\delta$-best responses, then an optimal contract for the principal may not exist. However, Remark 2 is concerned with the learning part, where the goal is not to compute an optimal contract, but rather to attain no-regret against an optimal $\delta$-robust contract. Notice that such a contract always exists, given the correctness of Algorithm 1. Thus, non-existence is not an issue. Intuitively, the extension of Algorithm 2 informally described in Remark 2 has to deal with the fact that the feedback actually observed by the algorithm is not about the $\delta$-best response that minimizes the principal’s utility, but about some other $\delta$-best response instead. This requires the adoption of an adversarial regret-minimization algorithm, but it intuitively does not hurt the regret against an optimal $\delta$-robust contract, since the observed $\delta$-best responses cannot be worse than the one played under a $\delta$-robust contract. Due to space constraints, we cannot provide a complete formal proof of this to the Reviewer, but we will certainly add it in the final version of the paper.
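As a rough illustration of the extension sketched in this reply, one could swap UCB1 for an adversarial no-regret algorithm such as EXP3 over the same contract grid (the algorithm choice, exploration parameter, and names below are illustrative assumptions, not the paper's actual construction):

```python
# Hypothetical sketch: EXP3 over a fixed set of candidate contracts, suitable
# when the observed utility may come from an arbitrary (adversarially chosen)
# delta-best response of the agent.
import numpy as np

def exp3_over_arms(env_utility, arms, T, gamma=0.05):
    K = len(arms)
    weights = np.ones(K)
    for _ in range(T):
        probs = (1.0 - gamma) * weights / weights.sum() + gamma / K
        k = np.random.choice(K, p=probs)
        reward = env_utility(arms[k])                # assumed to lie in [0, 1]
        weights[k] *= np.exp(gamma * reward / (probs[k] * K))  # importance-weighted update
    return arms[int(np.argmax(weights))]
```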

Re: "Comparison with Zhu et al. (2023)"

As we have already highlighted, the learning part is not our main contribution. However, our algorithm builds on Zhu et al. to deal with δ\delta-best responses of the agent. In doing so, it provides a simple algorithmic approach and analysis. While we agree that our approach is still far from practical applications, it should be acknowledged that a simpler approach and analysis are always beneficial.

Re: "Could you explain how it makes a difference to assume that the agent picks the worst action for the principal in as long as δ\delta is known? I am not sure it makes it a harder problem as compared to a best-responsive agent."

Computing a robust contract when $\delta$ is known still poses several challenges. For instance, in Stackelberg games, computing a robust commitment is computationally intractable even when $\delta$ is known. This is because, unlike the non-robust case, the principal’s utility function is piecewise linear over an exponential number of regions (see, for instance, the hard instances in the reduction of Gan et al. (2023)). This contrasts with classical contract design, where the utility is piecewise linear over $n$ regions (one per action). It is only thanks to our analysis, which exploits the particular structure of contract design (not available in other bilevel problems), that we obtain a polynomial-time algorithm.
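For comparison, the standard non-robust picture (textbook background recalled here for readability, not a statement taken from the paper under review) is:

```latex
% Non-robust hidden-action contract design: for each agent action a, the set of
% contracts that incentivize a is a polytope, and the principal's utility on it
% is linear, so the utility is piecewise linear over at most n regions.
\[
  \mathcal{P}_a \;=\; \bigl\{\, p \in \mathbb{R}_+^m \;:\; F_a \cdot p - c_a \;\ge\; F_{a'} \cdot p - c_{a'} \ \ \forall a' \,\bigr\},
  \qquad
  u_P(p) \;=\; F_a \cdot (r - p) \ \text{ for } p \in \mathcal{P}_a .
\]
```

In the robust case, the relevant regions are instead determined by which subsets of actions belong to $A^\delta(p)$, which is what can blow up combinatorially in general Stackelberg-style settings.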

Reviewer Comment

I thank the authors for their detailed and interesting response.

Concerning "Computing robust equilibria is much harder than standard one.", I still do not agree with the authors. In their setup, the agent's action is the δ\delta-worst case (see line 182), while the meaning of "robust" encompasses any action for which the loss of utility is between 0 and δ\delta, especially in Stackelberg Games or Bayesian Persuasion. Typically, the authors refer to Gan et al. 2023. In their paper, the agent's δ\delta-approximate best-response is the set (j,u(x,j)>maxiu(x,i)δ) ( j, u(x,j)>max_i u(x,i)-\delta ) for a principal's action xx and the agent is assumed to pick any action in this set. It makes a major difference between both setups and definitely facilitates the analysis in the paper here.

It is linked with the discussion "On agent’s behavior...". The authors answer "the agent's behavior is unpredictable within a small $\delta$ utility difference": although, it is written in the paper (line 182) that "the agent selects a $\delta$-best response that minimizes the principal’s expected utility, namely an action $a^\delta(p) \in A^\delta(p)$ such that $a^\delta(p) \in \arg\min_{a \in A^\delta(p)} F_a \cdot (r - p)$." It thus means that the agent's behavior is determined and fixed within the set of $\delta$-deviations.

To me, the proof of the statement given in Remark 2 would be interesting, and I still think it is a pity that the authors did not state it as a theorem. The last response, "For instance, in Stackelberg games, computing a robust commitment is computationally intractable even when $\delta$ is known.", is unsatisfying in the sense that the agent's behavior in this paper is not what is generally assumed by "robust" in Bayesian persuasion or Stackelberg games.

I really appreciate the time that the authors took to answer my questions but a lot of issues remain unsolved to me. Therefore, I maintain my grade.

Authors' Comment

We thank the Reviewer for taking the time to read our responses. We think that the Reviewer misunderstood the robust equilibrium definition of Gan et al. (2023). Indeed, our robust equilibrium definition is exactly the same as that of Gan et al. (2023), and we are absolutely certain about this.

In fact, while you pointed out Equation (2) in Gan et al. (2023) (https://arxiv.org/abs/2304.14990), just a few lines below this equation, they formally define the robust Stackelberg equilibrium, where Equation (3) clearly indicates that the follower selects a worst response for the leader from the $\delta$-best response set: $j^* \in \arg\min_{j \in \mathrm{BR}_\delta(x^*)} u_l(x^*, j)$.

Indeed, this follows exactly the rationale that the follower may choose any action in the $\delta$-best response set. Since the exact choice is unpredictable, the worst-case perspective is applied (as in Equation (3) of Gan et al. (2023)). Notice that this is exactly what we say at line 182 when we say that the agent selects $a^\delta(p) \in \arg\min_{a \in A^\delta(p)} F_a \cdot (r - p)$.
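For readers following the thread, the two objects under discussion, assembled here from the quantities quoted above (the strict inequality matches the authors' later remark on the definition of $A^\delta(p)$), read:

```latex
% delta-best response set and the worst-case (robust) objective, as discussed
% in this thread; notation (F_a, c_a, r, p) follows the quoted definitions.
\[
  A^\delta(p) \;=\; \bigl\{\, a \;:\; F_a \cdot p - c_a \;>\; \max_{a'} \bigl( F_{a'} \cdot p - c_{a'} \bigr) - \delta \,\bigr\},
  \qquad
  \mathrm{OPT}(\delta) \;=\; \sup_{p \in \mathbb{R}_+^m} \; \min_{a \in A^\delta(p)} \; F_a \cdot (r - p).
\]
```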

We’d therefore like to request the Reviewer to kindly review the definition in Gan et al. (2023) and ours. In fact, we’d also like to know your thoughts on the following: if Gan et al. (2023) did not use the worst action in the $\delta$-best response set in the definition, then on what basis would the leader’s payoff in a robust equilibrium be defined? Notice that an exact payoff must be defined in order to obtain a formal optimization problem. We invite the Reviewer to discuss with the other Reviewers in order to have an additional opinion on the correctness of the definition.

Official Review
Rating: 4

The paper studies hidden-action principal-agent problems where a principal designs a contract (an outcome-dependent payment scheme) to incentivize an agent to take actions that are in favor of the principal. Unlike traditional models assuming the agent always plays an exact best response, here the agent may choose any action that is within a $\delta \in (0, 1)$ suboptimality range.

  • The authors derive upper and lower bounds on the maximum utility the principal can achieve with robust contracts, characterizing the "price" the principal pays (in terms of utility loss) for robustness as a function of $\delta$. Notably, these bounds do not depend on the inducibility gap (a parameter common in Stackelberg games).
  • The paper presents a polynomial-time algorithm to compute an optimal $\delta$-robust contract by first fixing the optimal arm and a $\delta$-robust arm, and then solving the induced LP.
  • The work extends to an online learning framework where the principal does not know the agent’s underlying parameters (costs, types) and shows a $T^{1-1/(2(m+1))}$ regret bound.

Questions for the Authors

The online learning setting assumes the follower's action to be invisible, rendering the follower's response model a "black box", and thus an exponential dependency is unavoidable. My question is: if the follower consistently takes the worst $\delta$-suboptimal action (from the leader's perspective), and the leader can observe the follower's taken action, can the leader leverage this adversarial behavior to gain more information about the follower's cost structure? If so, could this lead to an improved regret bound compared to the standard setting?

Claims and Evidence

Yes

Methods and Evaluation Criteria

Yes

Theoretical Claims

Yes

Experimental Design and Analysis

NA

Supplementary Material

NA

Relation to Prior Work

The paper addresses an interesting problem: how to learn a $\delta$-robust contract, from both computational and online learning perspectives. The previous work of Zhu et al. 2023 focuses on a more general Stackelberg game setting, where many of the results can be further refined for the problem of contract design due to the specific (linear) problem structure.

Zhu, Banghua, et al. "The sample complexity of online contract design." arXiv preprint arXiv:2211.05732 (2022).

Missing Essential References

No

Other Strengths and Weaknesses

The analysis is novel, and the paper is well written and easy to follow.

Other Comments or Suggestions

No further suggestions

Authors' Response

Re: "The online learning setting assumes the follower's action to be invisible, rendering the follower's response model a "black box" and thus an exponential dependency is unavoidable. My question is, If the follower consistently takes the worst -suboptimal action (from the leader's perspective), and the leader can observe the follower's taken action. Can the leader leverage this adversarial behavior to gain more information about the follower's cost structure? If so, could this lead to an improved regret bound compared to the standard setting?"

We thank the Reviewer for the question. Learning an optimal contract with observable actions in a polynomial number of queries (with respect to the parameters of the problem instance) remains an open problem, even in the non-robust case. Exponential lower bounds exist in Stackelberg games, but it is unclear whether they extend to contract design due to the particular structure of the utilities. However, we do not believe that the adversarial behavior of the follower would benefit the principal, since when $\delta$ is close to zero, the problem nearly reduces to the classical one (see Proposition 1).

Official Review
Rating: 4

This paper studies optimal contract design under approximate best-response agents in hidden-action principal-agent games. First, they propose an efficient algorithm to compute an optimal contract. They also show that the principal's utility is $\delta$-close to that of the optimal contract under best-responding agents. Finally, the paper relaxes the full-knowledge assumption and devises a no-regret learning algorithm.

To design an efficient algorithm, they first show that the parameter space of optimal $\delta$-robust contracts can be formulated via a collection of disjunctive constraints (2a) and use linear programming to solve it. The proof idea of the $\delta$-closeness in Proposition 1 is similar to Zhu et al. 2023.

Questions for the Authors

Can you elaborate more on which part of the results in Section 5 can be applied to the non-robust setting ($\delta = 0$)?

Claims and Evidence

Yes

Methods and Evaluation Criteria

Yes

Theoretical Claims

Mostly

Experimental Design and Analysis

NA

Supplementary Material

I checked the proof of Proposition 1 and skimmed through Theorem 2.

Relation to Prior Work

The paper considers optimal contract design under approximate best-response agents. If the agent is best responding, the optimal contract can be computed by a simple LP. Approximate best responses introduce an additional layer of complexity.

Missing Essential References

Nothing I am aware of.

Other Strengths and Weaknesses

Strengths

  • The results are interesting in the context of principal-agent games under approximate best response, e.g., compared with Gan et al. 2023; the paper shows that hidden-action principal-agent games are smooth in the principal's objective and also admit an efficient algorithm.
  • The writing is good.

Minor issues

The comparison to Zhu et al is not very clear to me.

  • Can the simple discretization apply to the non-robust setting?
  • Can Algorithm 2 take $\delta = 0$?
  • Why does the regret definition $\mathcal{R}_T(\mathcal{C})$ coincide with Zhu et al.'s? If $\delta \neq 0$, the second part is an approximate best response.

Other Comments or Suggestions

NA

Authors' Response

Re:"Can the simple discretization apply to non-robust settings? "

Yes, the simple discretization can be employed in the non-robust version of the problem. Intuitively, it is sufficient to follow the same steps as in the proof of Theorem 2, with $\delta$ going to zero.

Re: "Can algorithm 2 take δ=0\delta=0?"

No, when δ=0\delta=0 the δ\delta-best responses are not well defined. (Please refer to the first equation in Section 2.2.)

Re: "Why does the regret definition RT(C)\mathcal{R}_T(\mathcal{C}) coincide with Zhu et al's? …."

The Reviewer is right; we will fix this in the final version. Only the baseline (i.e., OPT) in that regret definition coincides with that considered by Zhu et al. (2023). However, when δ\delta is approximately zero, the two regret definitions essentially coincide. Indeed, as δ\delta tends to zero, the problem nearly reduces to the non-robust one (see Proposition 1). **

Re: "Can you elaborate more on which part of the results in section 5 can be applied to the non-robust setting?"

“As previously discussed, when δ\delta is approximately zero, our problem essentially coincides with that of Zhu et al. (2023). Thus, our results can be extended to the non-robust problem. However, this requires addressing some technical details, since when δ=0\delta = 0, the agent's best response regions do not coincide with those in the non-robust case due to the strict inequality “>” in the definition of Aδ(p)A^\delta(p).

Official Review
Rating: 2

This paper explores hidden-action principal-agent problems where the agent follows approximate best responses rather than exact optimal strategies. The authors propose a polynomial-time algorithm for computing optimal contracts under these conditions and introduce a no-regret learning algorithm for scenarios where the principal lacks prior knowledge.

Questions for the Authors

  1. Could the authors specify the unique and new challenges in their proposed model and clarify how their techniques effectively address these challenges?
  2. Is there any new technique developed during this process?

Claims and Evidence

Yes.

Methods and Evaluation Criteria

Yes.

Theoretical Claims

In the first 8 pages.

Experimental Design and Analysis

N.A.

Supplementary Material

N.A.

Relation to Prior Work

N.A.

Missing Essential References

N.A.

Other Strengths and Weaknesses

Strengths:

  1. The paper addresses a novel aspect of contract design in principal-agent problems, considering realistic scenarios where agents do not always act optimally. The introduction of approximation in an agent's action is new, though the idea is similar to Gan et al., 2023.
  2. The authors provide a thorough analysis, rigorously establishing bounds on contract robustness, demonstrating the existence of optimal solutions, and introducing polynomial-time algorithms that improve upon prior intractability results in similar settings.

Weaknesses:

  1. The techniques presented in this paper do not appear particularly novel. The analyses of both Algorithm 1 and Algorithm 2 are relatively standard. The novelty of the technical results should be further clarified.

Other Comments or Suggestions

N.A.

Authors' Response

Re: "Challenges in Our Setting and Technical Novelty.”

We believe that the computational results presented in Section 4 are rather “surprising”. Indeed, such positive results are unexpected, as very similar problems are computationally intractable. In particular, computing an optimal $\delta$-robust commitment in Bayesian persuasion and Stackelberg game settings is known to be computationally intractable (see Gan et al. (2023), Yang & Zhang (2024)). Our case actually appears even more challenging because the strategy space $\mathbb{R}_+^m$ is much less amenable compared with the simplex $\Delta_m$ in Bayesian persuasion and Stackelberg games. Because of this, the quasi-polynomial-time approximation schemes (QPTAS) developed in previous works do not apply to our setting. We initially conjectured that our setting was harder and attempted to prove that it was inapproximable.

It was only after a series of attempts and a very careful analysis that we discovered that contract design scenarios are actually more structured and allow for efficient computation. Based on these unique structures, our method works entirely differently from previous approaches to computing robust equilibria. It revolves around fixing the $\delta$-robust best response and the actual best response, then computing an optimal robust contract for such a scenario. By iterating this process for each pair of actions and solving an LP at each step, we prove that it is possible to recover an optimal robust contract. Our algorithm is guaranteed to work in contract design due to the particular structure of the principal’s and agent’s utilities. This approach is entirely new compared to existing ones, which, for example, require solving an exponential number of LPs. Indeed, notice that a similar approach cannot be employed in the settings studied in previous works.
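A highly simplified sketch of this enumeration-plus-LP structure is given below. It only enforces that the fixed action is a $\delta$-best response; the actual per-pair constraints (including the condition that the fixed action is the principal-worst $\delta$-best response, which Algorithm 1 handles) are not reproduced here, and all names are illustrative assumptions:

```python
# Hypothetical, simplified sketch of "enumerate candidate responses, solve an LP
# for each, keep the best".  F: (n x m) outcome distributions, r: (m,) principal
# rewards, c: (n,) agent costs, delta: robustness level.
# NOTE: the LP below only requires a_w to be a delta-best response; it omits the
# worst-case condition from the paper, so the value it returns is optimistic.
import numpy as np
from scipy.optimize import linprog

def robust_contract_sketch(F, r, c, delta):
    n, m = F.shape
    best_value, best_p = -np.inf, None
    for a_w in range(n):                       # candidate delta-best response
        # maximize F[a_w] @ (r - p)  <=>  minimize F[a_w] @ p
        cost = F[a_w]
        # (F[a'] - F[a_w]) @ p <= c[a'] - c[a_w] + delta   for all a'
        A_ub = F - F[a_w]
        b_ub = c - c[a_w] + delta
        res = linprog(cost, A_ub=A_ub, b_ub=b_ub,
                      bounds=[(0, None)] * m, method="highs")
        if res.success:
            value = F[a_w] @ (r - res.x)
            if value > best_value:
                best_value, best_p = value, res.x
    return best_value, best_p
```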

We agree with the Reviewer that some of the techniques we adopted are common in algorithmic game theory, such as solving a set of LPs. However, the main contribution and most challenging step of our work is determining which LPs need to be solved to recover an optimal robust contract. (Indeed, a naive approach would require solving exponentially many LPs!) Hence, while the algorithmic approach may appear standard, this is in fact not the case for the analysis, which, as highlighted above, is entirely disjoint from previous works on the topic and requires non-trivial arguments.

Final Decision

The paper studies a robust version of contract design where the agent may take any action that is close enough to the optimal one. Overall, all reviewers find the (high-level) model reasonable, and the contributions valuable. There were questions concerning the technical novelty compared to prior work on robustness in game-theoretic settings, as well as whether the specific notion of robustness in this paper is the right notion. Accordingly, we encourage the authors to take this chance to improve their paper and address the specific concerns raised by the reviewers, regardless of the outcome.