6.0

/10

Poster4 位审稿人

最低5最高7标准差0.7

2.8

置信度

正确性3.0

贡献度2.8

表达2.3

NeurIPS 2024

Improved learning rates in multi-unit uniform price auctions

Marius Potfer,Dorian Baudry,Hugo Richard,Vianney Perchet,Cheng Wan

OpenReview PDF

提交: 2024-05-09更新: 2025-01-13

TL;DR

We improve known regret rates in repeated uniform multi-unit auctions under bandit feedback, and introduce a novel partial feedback specific to the auctions.

摘要

Motivated by the strategic participation of electricity producers in electricity day-ahead market, we study the problem of online learning in repeated multi-unit uniform price auctions focusing on the adversarial opposing bid setting. The main contribution of this paper is the introduction of a new modeling of the bid space. Indeed, we prove that a learning algorithm leveraging the structure of this problem achieves a regret of $\tilde{O}(K^{4/3}T^{2/3})$ under bandit feedback, improving over the bound of $\tilde{O}(K^{7/4}T^{3/4})$ previously obtained in the literature. This improved regret rate is tight up to logarithmic terms. %by deducing a lower bound of $\Omega (T^{2/3})$ from the dynamic pricing literature, proving the optimality in $T$ of our algorithm up to log factors. Inspired by electricity reserve markets, we further introduce a different feedback model under which all winning bids are revealed. This feedback interpolates between the full-information and bandit scenarios depending on the auctions' results. We prove that, under this feedback, the algorithm that we propose achieves regret $\tilde{O}(K^{5/2}\sqrt{T})$.

关键词

Online LearningAuctionsBandits

评审与讨论

审稿意见

评分: 5置信度: 22024-06-30

This paper considers online learning in repeated multi-unit uniform price auctions, motivated by the strategic participation of electricity producers in the electricity day-ahead market. By introducing a new modeling of the action space to EXP3, the authors propose an algorithm that achieves $\tilde{O}(K^{3/2}\sqrt{T})$ , $\tilde{O}(K^{5/2}\sqrt{T})$ , and $\tilde{O}(K^{4/3}T^{2/3})$ regret bounds for full information, all-winner, and bandit feedback, respectively. They also provide lower bounds for each case.

优点

They provide regret bounds for various feedback settings: full feedback, partial feedback, and bandit feedback.
The proposed algorithm improves the regret bound by using a new model of the action space.

缺点

I think that the setting is not clearly written and seems to differ from previous work [1] without justification. In more detail, for the allocation described in (1), for player $i$ , any items with a value larger than a price are allocated. However, in [1], the auctioneer allocates the $j$ -th unit to the player who submitted the $j$ -th highest bid. Additionally, the role of the valuation $v_l$ is not clearly described.
The motivation for the allocation policy in this setting may not be well described.

[1]Brânzei, Simina, et al. "Learning and collusion in multi-unit auctions." Advances in Neural Information Processing Systems 36 (2024).

问题

Could you provide a motivating example to justify the allocation policy in setting (1)? According to (1), it seems that multiple players may receive the same item, which may not be convincing to me.
According to the paper, the valuation $v_l$ is known to the bidder. What is the motivating example?

局限性

The authors have addressed some of the open problems in the conclusion.

作者回复

2024-08-06

Thank you for your review.

W1/Q1 It is inexact to say that our setting is not the same as in [1]. While the way our allocation policy is described indeed differs from the description in [1] it results in the same allocations as allocating the $j^{th}$ items to the $j^{th}$ highest bid. Indeed, since all items are identical, it is sufficient to keep count of how many bids from player $i$ are above the price (and hence amongst the $K$ highest). Therefore strictly allocating the $j^{th}$ items to the $j^{th}$ highest bid and allocating one item for each bid in the $K$ highest gives the same allocation (as permuting identical items does not change the allocation). While it is possible for two players $i,k$ to have the same allocation $x_i(\mathbf{b})=x_k(\mathbf{b})$ this only means they get allocated the same number of items. We take note that this might not be straighforward and will ensure to make it clearer.

W2. Multi-unit auctions with uniform pricing and the associated allocation rule [1] used in this paper is a natural model for the short-term electricity market. We give some references l99.

Q2. From both a motivation and technical point of view, it is reasonable to assume some knowledge of the valuations :

The value is a quantity that represents how useful each item will be to the bidder. In that sense, any bidder trying to acquire items should have some knowledge of the valuation of this item $*for them*$ .
We are studying the adversarial opposing bid setting, yet, as pointed out in [2] without any assumption on the valuation (ie adversarial valuation that can vary across auctions) no non-trivial regret guarantees can be obtained. Since their setting ( first price single item auction with valuation observed before each round ) is strictly easier this impossibility result also applies to multi-unit uniform auctions.

We chose to focus on known valuation as it allows for a more concise and understandable analysis. Another possible model would be to assume that valuations are unknown but constant across auctions and that a noisy version of the valuation is observed whenever an item is won. Adapting our algorithm to this setting would require to use estimates of valuations instead of the real values. We leave this adaptation to future work as we believe the main ideas of our work are easier to get in the setting we consider.

[1] Brânzei, Simina, et al. "Learning and collusion in multi-unit auctions." Advances in Neural Information Processing Systems 36 (2024).

[2] Balseiro, S., Golrezaei, N., Mahdian, M., Mirrokni, V., & Schneider, J. (2019). Contextual bandits with cross-learning. Advances in Neural Information Processing Systems, 32.

评论- Thank you for your response

2024-08-08

Thank you for your response. I have some further questions based on your response.

According to your response, the items are identical. Then, what is the reason for having different valuations for each item? Also, based on your motivation example of the electricity market, I'm wondering whether the items are identical or may be different based on the usage.

评论- Response to your questions

2024-08-09

Thank you for these questions.

First, we need to clarify that both our allocation function and valuation are parameters that relate to the number of items attributed to a player (because items are identical, their indexes are irrelevant). Furthermore, the valuations are marginal: for bidder $i$ and for $j \in [K]$ , $v_{i,j}$ quantify how much more bidder $i$ values getting $j$ items than $j-1$ items.

With this in mind, the reason we have different valuations is to allow for diminishing marginal utility, a common microeconomics assumption introduced in [1], chapter 2. The main intuition is that the more someone has of an item, the less he values getting more of it (drinking water for instance is only very valuable for the first liters of the day). In the example of electricity markets, while the first MWh and the tenth MWh attributed to a bidder are identical, the bidder will dedicate the first to make critical infrastructure work, while the tenth would be used to activate processes that are more easily shut down and restarted.

In our setting, an item cannot be allocated to multiple bidders at the same time. The description of our setting only specifies the number of items to be allocated to each player, hence ensuring that the total amount of allocated items is equal to $K$ is sufficient to ensure no items need to be allocated twice. Because items are identical it is unnecessary to give them an index and determine which item goes to which bidder.

[1] Greenlaw, Steven A., et al. Principles of Microeconomics 2e. United States, OpenStax, 2017.

评论- Thank you for your response.

2024-08-09

Your response addressed my concern regarding the bidding settings. However, in my opinion, the applicability of this model seems to be limited when dealing with identical items. Therefore, I raise my score to 5 while maintaining low confidence.

审稿意见

评分: 6置信度: 32024-07-09

This work studies the problem of multi-unit uniform price auctions. By introducing a new modeling of the action space, the paper improves the regret of the online learning problem to $\tilde{O}(K^{4/3}T^{2/3})$ under bandit feedback, and $\Omega(T^{2/3})$ is a regret lower bound under this feedback model. Under the all-winner (partial feedback), the algorithm achieves a regret of $\tilde{O}(K^{5/2} \sqrt{T})$ .

优点

The main strength of this work is to provide a new modelling of the action space to largely improve the regret of the considered problem. This new finding, together with the improvement, if correct, already makes a solid contribution in my view.

缺点

The manuscript has the following weaknesses in my view:

Although the all-winner feedback model could be first raised in the multi-unit auctions, other similar partial feedback models have already been introduced in different settings, e.g., Chen et al., 2024.
The paper's method is an improvement over the method of Branzei et al., 2024, as claimed. Yet, the authors do not provide details on their DAG equivalence method, which makes it a bit hard to appreciate all the details of the newly proposed method.
The authors could provide more intuitions on some technical details; see Questions.
There seems to be an extra "}" in Equation (4); And should Equation (6) be $b_k \geq (j + 1) \epsilon \geq j \epsilon \geq b_{k + 1}$ ?

[Ref]
Chen et al., Dynamic Budget Throttling in Repeated Second-Price Auctions, AAAI 2024.

问题

Could the authors provide more intuitions and details on the method of Branzei et al., 2024?
Could the authors provide some intuitions on Algorithm 2, the weight-pushing sampling? In my sense, this algorithm makes a sequential sampling of all $h$ 's in $\mathbf{h}$ . Please correct me if I am wrong.
In bandit and all-winner feedback models, why do you need to subtract $K$ in the numerator of the estimator? Could you please provide some intuition on this by comparing it with your treatment in the full feedback model where you do not do this step?

局限性

The authors have addressed the limitations of their work.

作者回复

2024-08-06

Thank you for your review.

W1. The all-winner feedback model is indeed a particular case of partial feedback that is widely studied. However, it has never been studied in multi-unit auctions which is the focus of our study. We will clarify this in the related work section.

W2/Q1. The key idea in [1] is the decomposition of the utility in a sum of terms where each term only depends on two consecutive bids. They therefore associate to every set of bids a path and the utility is seen as the sum over the utility of each edge. In contrast, we are able to decompose the utility as a sum of terms that can be estimated individually (without any need for a dependency in previous terms) and therefore obtain improved guarantees.

W4. Indeed equation (4) has an extra “}“ and equation (6) should be $b_k \geq (j+1) \epsilon > j \epsilon \geq b_{k+1}$ .

Q2. Your understanding is correct. Algorithm 2 indeed performs a sequential sampling of the $h$ ’s in $\mathbf{h}$ , this allows for both a more efficient sampling and the use of specific weights for each $h$ , see [3] for a more complete overview. One could also directly sample $\mathbf{h}$ , ensuring the weights of each $\mathbf{h}$ are the sum of all the weights of their components $h$ ’s. Intuitively this would have a higher complexity cost as it would sample in the product space instead of successively in each space. (for $N,M$ integers computing weights and sampling once in $[N]^M$ is exponentially less efficient than computing weights and sampling $M$ times an element of $[N]$ ).

Q3. Subtracting $K$ in the numerator is a technical detail which ensures that the value the estimator $\hat{u}^t(\mathbf{h})$ takes can be upper bounded which is necessary for the analysis of EXP3. In the full-information feedback, we directly use the utility (and not an estimator) $u^t(\mathbf{h})$ , which is already upper bounded by $K$ . The difference between Hedge and Exp3 is discussed in [2] notes 11.5 remark 3.

[1] Brânzei, Simina, et al. "Learning and collusion in multi-unit auctions." Advances in Neural Information Processing Systems 36 (2024).

[2] Lattimore, Tor, and Csaba Szepesvári. Bandit algorithms. Cambridge University Press, 2020.

[3] TAKIMOTO, Eiji et WARMUTH, Manfred K. Path kernels and multiplicative updates. In : International Conference on Computational Learning Theory. Berlin, Heidelberg : Springer Berlin Heidelberg, 2002. p. 74-89.

2024-08-08

Thanks for your response.

审稿意见

评分: 6置信度: 32024-07-11

The paper studies no-regret bidding algorithms in multi-unit uniform-price auctions with adversarial competing bids. In this problem, the bidder has diminishing marginal values for units of the item, and submits one bid for each unit. The top K bids win, and the payment is uniformly the lowest winning bid for each unit of the item. The authors present an algorithm that (1) under bandit feedback, achieves regret $\tilde{O}(T^{2/3})$ , and (2) in a slightly more informative model where all winning bids are revealed, achieves regret $\tilde{O}(T^{1/2})$ . Both bounds are almost optimal.

优点

The paper makes concrete improvement and presents almost optimal bounds for a meaningful problem that has received considerable attention. The techniques appear nontrivial.

缺点

The paper is somewhat hard to navigate. For example, I was lost a few times when new concepts / definitions / constructions are introduced while it's not clear at the moment what purposes they serve. Some explanatory text would help. I'd also appreciate more motivation for the problem setup (e.g., diminishing marginal values). The paper could also use some polishing (there are many more small issues in addition to the ones listed below in detailed comments).

问题

(also including detailed comments)

Line 9: "this feedback interpolate ..."

Line 27: "this represent ..."

Line 28: "... others bidding strategies"

Line 41, "... with uniform pricing is strictly harder than with uniform pricing": the latter "uniform" should be "discriminatory"? Also I wouldn't say they "suggest the former is strictly harder". Something like "the former might be strictly harder" sounds more accurate.

"Auction rules" paragraph: it might help to quickly motivate diminishing marginal values here. I imagine someone unfamiliar with auctions / microeconomics may not immediately see why this makes sense.

Line 59: competing bids being adversarial (and not stochastic) seems like an important modeling choice, and I'd mention this upfront (e.g., in the abstract or earlier in the introduction).

Line 137: the gaps here seems to be $K^{3/2}$ instead of $K$ ?

Line 183, eq (4): brackets don't match.

局限性

No concerns.

作者回复

2024-08-06

Thank you for your review. Thank you for noticing the typos and for suggesting some improvements in the presentation of our approach. We will clarify this part and make the necessary correction in the revision.

Line 41: Indeed you are right, we will reformulate according to your suggestion.

“Auction rule paragraph”: The non-increasing assumption on the values $v_1,v_2,...,v_K$ results from the law of diminishing returns, a standard assumption in microeconomics see [2] chapter 2. It is also a standard assumption in multi-units uniform auction and is studied in [1] in the non repeated setting. We will add these comments just after line 61.

Line 59: We agree and will add that competing bids are adversarial in the abstract

Line 137: The gap with the lower bound is indeed a multiplicative factor of $K^{3/2}$ . In that sentence we meant to highlight that under all winner feedback our algorithms upper bound is only worse of a factor $K$ compared to the full-information upper bounds ( $\mathcal{O}(K^{5/2} \sqrt{T})$ versus $\mathcal{O}(K^{3/2} \sqrt{T})$ ). We take note that it can be ambiguous and will reformulate it.

Line 183: Thank you for noticing, it should be $h_{k+\frac{1}{2},j} (\mathbf{b}) = \mathbb{1} \left \\{ \exists \boldsymbol{\beta} \in B_{ \setminus \epsilon}, \\{x(\mathbf{b},\boldsymbol{\beta})=k \\} \cap \\{ (j+1)\epsilon > p(\mathbf{b},\boldsymbol{\beta}) > j\epsilon \\} \right \\}$

[1] Ausubel, Lawrence M., Peter Cramton, Marek Pycia, Marzena Rostek, and Marek Weretka. “Demand Reduction and Inefficiency in Multi-Unit Auctions.” The Review of Economic Studies 81, no. 4 (289) (2014): 1366–1400.

[2] Greenlaw, Steven A., et al. Principles of Microeconomics 2e. United States, OpenStax, 2017.

2024-08-13

Thank you for your helpful response. I don't have further questions.

审稿意见

评分: 7置信度: 32024-07-13

The paper analyzes repeated multi-unit uniform price auctions through the lens of online learning. At each time step, a bidder submits a sequence of bids $(b_1,\dots,b_K)$ to win up to $K$ identical items for which the bidder holds known valuations that depends only on the number of won items and are the same from one round to the other. Other bidders participate to these multi-unit uniform price auctions and the first $K$ higher bids are the one that get the items, paying the $K$ -th highest price or the $K+1$ -th highest price, depending on the model. This setting was already studied by Branzei et al. (2024), but they obtained suboptimal learning rates for this problem. In this paper, the authors close the gap in the regret rate and they further analyze and characterize the learning rates arising from a different type of feedback that was not previously considered.

优点

The paper provides interesting insights on how to tackle repeated multi-unit uniform price auctions and this understanding translates into improved (and matching in the time horizon) learning rate with respect to the SOTA. The paper also provides a model for a new type of feedback that could be of interest in concrete applications and also characterize (in the time horizon) the learning rate of this problem. Overall, the paper provides a valid contribution to the development of online learning in auctions.

缺点

Though this is quite common in this literature, the paper does not explore what can be done in strategic settings where also the other bidders learn, and what kind of dynamic might arise in this case. It would be interesting to model other bidders not as an oblivious environment, but rather as other learning agents.

Typo. Line 41. "suggesting that bidding multi-unit auctions with uniform pricing is strictly harder than with uniform pricing." I assume that the second should be discriminatory pricing.

问题

I see that there's still a mismatching rate when it comes to the other parameter $K$ . Do you think that this is an artifact of the proof or we need better algorithms / better lower bounds to close this gap?
What if also the other bidders learn? I think that in this case we probably need a different definition of the regret. Do you have in mind any way to tackle this problem?
What kind of dynamic could arise if also the other bidders utilize the algorithm you proposed? Does the dynamic converge somewhere?

局限性

The authors correctly state the assumptions under which the theorem they proved hold.

作者回复

2024-08-06

Thank you for your review.

Q1. Indeed our upper and lower bound do not match for the parameter K. We believe the lower bound in the bandit setting is not tight with respect to its dependency in $K$ , we expect the tight bound to depend on the scale of the utility $K$ . Regarding upper bounds, we do not see any obvious way to improve them. We therefore leave the question of closing the gap to future work and in particular the task to show whether the all-winner feedback is strictly harder than the full information feedback.

Q2. If the other bidders learn, we need to move from an oblivious adversary to an adaptive one. The notion of regret we should use remains the same : comparing the average utility to the best strategy in hindsight, but we need to allow the opposing bids at time $t$ , $\boldsymbol{\beta}^t$ to depend on the history up to time $t-1$ . Our algorithm is a variation of EXP3 for which the regret bounds can be generalized to an adaptive (or $*reactive*$ ) adversary, as discussed in [3] notes 11.5, point 4, this can be done without any change in the analysis. We are confident that the regret bounds therefore also hold in this setting.

Q3.The dynamics of play that arise when players use no-regret algorithms in a game is an active field of research. The multi-unit uniform auction is a game with continuous action space. However, even in finite games the dynamics are not fully characterized. In finite games, if all players play EXP3, the average play converges to a Coarse Correlated Equilibrium or Hannan set (see chapter 7.4 of [1]). The convergence of iterates towards a Nash equilibrium is only shown for finite potential games [2] which does not fit multi-unit uniform auctions. The current theory is therefore not sufficient to characterize the dynamics of play in our framework.

[1] Cesa-Bianchi, Nicolo, and Gábor Lugosi. Prediction, learning, and games. Cambridge university press, 2006.

[2] Heliou, Amélie, Johanne Cohen, and Panayotis Mertikopoulos. "Learning with bandit feedback in potential games." Advances in Neural Information Processing Systems 30 (2017).

[3] Lattimore, Tor, and Csaba Szepesvári. Bandit algorithms. Cambridge University Press, 2020.

最终决定Accept (poster)

2024-09-25

This manuscript studies the problem of online learning in repeated multi-unit uniform price auctions, an important auction format in the electricity market. The main contribution is the improvements over the regret bounds in [Branzei et al. 2024], mainly an improvement from $O(T^{3/4})$ to $O(T^{2/3})$ in the upper bound, and an improvement from $\Omega(T^{1/2})$ to $\Omega(T^{2/3})$ in the lower bound, both under the bandit feedback. The improved upper bound uses a clever and novel representation of the action space, together with a decomposition of the utility into a sum of functions of independent actions, rather than the action pairs in [Branzei et al. 2024]. The lower bound uses a simple reduction to first-price auctions with binary feedback when the LAB pricing rule is used; no improvement for the FRB pricing rule is available though. The reviewers unanimously appreciate the new action representation and utility decomposition, and I would also like to join the reviewers and recommend acceptance.