PaperHub
6.8
/10
Poster5 位审稿人
最低3最高5标准差0.7
5
3
5
4
4
3.2
置信度
创新性2.8
质量2.8
清晰度2.4
重要性3.0
NeurIPS 2025

Contextual Dynamic Pricing with Heterogeneous Buyers

OpenReviewPDF
提交: 2025-05-12更新: 2025-10-29
TL;DR

contextual dynamic pricing with heterogenous buyers using optimistic posterior sampling

摘要

We initiate the study of contextual dynamic pricing with a heterogeneous population of buyers, where a seller repeatedly posts prices (over $T$ rounds) that depend on the observable $d$-dimensional context and receives binary purchase feedback. Unlike prior work assuming homogeneous buyer types, in our setting the buyer's valuation type is drawn from an unknown distribution with finite support size $K_{\star}$. We develop a contextual pricing algorithm based on optimistic posterior sampling with regret $\widetilde{O}(K_{\star}\sqrt{dT})$, which we prove to be tight in $d$ and $T$ up to logarithmic terms. Finally, we refine our analysis for the non-contextual pricing case, proposing a variance-aware zooming algorithm that achieves the optimal dependence on $K_{\star}$.
关键词
dynamic pricingbanditscontextual

评审与讨论

审稿意见
5

In this paper the authors propose algorithms for posting prices when buyers have unknown, diverse preferences and the seller sees some buyer-specific context. It learns these preferences over time to make better pricing decisions. The methods work both when the number of buyer types is known or unknown and adapt to different feedback levels, achieving near-optimal performance in all cases.

优缺点分析

To me, this paper makes an interesting contribution to the community working in this field, and I cannot identify any real weaknesses. The algorithms are simple, and the analysis is non-trivial. Furthermore, I believe that modifying the zooming algorithm to achieve tighter results in the presence of one-sided Lipschitzness is an interesting idea. As a minor suggestion, the authors could include a table summarizing all the notation, as it is quite annoying to search for symbols across different pages.

问题

Can these techniques be extended to different (and more general) settings? Can the author provide a short discussion on this point?

局限性

yes

最终评判理由

I gave the paper a positive score because I think it's a good article that offers an interesting contribution. Some improvements to the presentation of the paper should be implemented in the camera-ready version.

格式问题

n/a

作者回复

Thank you for taking the time to review our submission and for the clarifying questions.

Thank you for the suggestion. We will include a table before “Basic pricing facts” in Section 2 summarizing all of the main notation (rev, dem, proj, br, etc.).

Yes, the techniques extend to a broader class of pricing problems. As noted in Appendix C.1, we present a more general formulation involving a class of contextual demand functions that are not necessarily Lipschitz. Provided that the disagreement coefficient of this class is bounded, our algorithm yields a regret guarantee that scales with the disagreement coefficient.

评论

I keep my positive score even if I believe the authors should put more effort into making the presentation of the paper clearer, as suggested by other reviewers.

审稿意见
3

This paper studies the contextual dynamic pricing problem. In this problem, in each round, an unknown type θt\theta_t is drawn from an unknown distribution, and a context utu_t is adversarially chosen. The value is then vt=θtutv_t = \theta_t \cdot u_t, and the agent should pick a price ptp_t. The problem is to maximize the total revenue. The paper proposes an algorithm with a regret of O~(KT)\tilde{O}(\sqrt{K_* T}) when KK_* is known, and another algorithm with regret O~(KT)\tilde{O}(K_*\sqrt{T}) when KK_* is unknown. Here KK_* is the support size of the type distribution. The paper also considers the non-contextual case and the model with ex-post type observations and presents corresponding algorithms.

优缺点分析

Overall, the results given in this paper are interesting. Nevertheless, I think the paper has a lot to improve in its presentation. In general, the paper is written in a too technical and notation-heavy manner with many technical definitions, while the intuitions of the given algorithms/analysis are lacking. This makes the paper hard to follow, and also makes it hard to evaluate the contributions of this work. For example, for Section 3, I highly recommend that the authors put technical details into the appendix and present more intuition on the presented analysis (e.g., the disagreement coefficient), and address how the underlying techniques are related to previous works. In Sections 4 and 5, since the two algorithms discussed are not even given in the main body, the pure analysis seems too dry and hard to follow.

问题

In section 3.2, what do you mean by "take D\mathcal D to be a large, finite cover of the full distribution space"? What is D\mathcal D formally?

局限性

Yes.

最终评判理由

I will keep my score.

格式问题

No.

作者回复

Thank you for taking the time to review our submission and for the clarifying questions.

Weakness, overly technical: As suggested by the reviewer, we will update the writing of the paper to present more intuition for the algorithms/concepts and move more of the technical analysis to the appendix. In particular, for the disagreement coefficient, we will add the following:

"In our application, each function ff will represent the gap between the demand function predicted by some model DD and that of the true model D_D\_\star, for a fixed context. In our analysis, the distribution ν\nu will be supported on prices which our algorithm estimates to be near-optimal based on historical feedback. The disagreement coefficient is small if, by playing price pp sampled from ν\nu, it is unlikely for any model DD to disagree substantially with D_D\_\star at pp if this model agrees with D_D\_\star on average."

This is inherently a bit tricky to conceptualize due to the nested quantifiers, but we emphasize that the disagreement coefficient is a standard quantity of interest in active learning and empirical process theory (see the existing Remark 3.7).

In Section 4, we will remove the technical lemmas (4.2 and 4.3) and extend the discussion after the theorem to better describe the variance-aware zooming dimension and why it can be significantly smaller. The key here is that, when expected revenue at a price is small, so is the variance of the revenue at this price.

In Section 5, we will update the discussion after Theorem 5.2 to be less technical, instead focusing on the high-level fact that best responding to historical data leads to dependence on the VC dimension of the relevant function class.

Q1, cover details: D\mathcal{D} is taken to be a large set of distributions such that every distribution DD over the type space Θ\Theta is close to some DDD’ \in \mathcal{D} under the Levy distance. See Appendix C.9 for the precise construction, where we specify “close” and also construct the needed nonuniform prior over D\mathcal{D}. Since the exact details require some careful parameter tuning and several concerns were raised about Section 3 being too technical, we prefer to leave the full construction in the appendix. However, we will add a pointer after introducing D\mathcal{D} to this appendix section to direct the reader to a more detailed discussion.

评论

Thanks the authors for their reply. I appreciate that they will take efforts in improving their writing, but after all, this is not committed. I choose to maintain my score.

审稿意见
5

The paper studies the well-established problem of contextual dynamic pricing, assuming the existence of several buyers. Formally, the valuation vtv_t at every round is a linear function of a context, known by the agent, and a vector of coefficients θt\theta_t, drawn in an i.i.d. fashion with respect to the distribution DD with finite support, in which each element represents a different buyer. By employing a Thompson Sampling-inspired algorithm, the paper manages to bound the expected regret by KdTK_{\star}\sqrt{dT} where KK_{\star} is the number of buyers, dd the dimension of the context space and TT the number of rounds. Concurrently, they prove a lower bound for the same model of the order of KdT\sqrt{K_{\star}dT}. The question of which is the right dependency with respect to KK_{\star} is answered in the paper for the non-contextual case, for which a bound of the order of O(min{KT,T2/3})\mathcal{O}(\min\{\sqrt{K_{\star}T}, T^{2/3}\}) is obtained. Lastly, they provide some additional results for the case in which the learner can rely on some additional information, specifically, after every round, it has access to either the type of the buyer or the whole parameter vector θt\theta_t.

优缺点分析

The paper investigates the well-known problem of dynamic pricing, focusing on the less-studied setting of multiple buyers, therefore the results are novel and of interest to the community. Furthermore, the paper is written in a clear and precise way where all the assumptions needed are clearly stated and the results, together with the steps to achieve them, are well-motivated and easy to follow.

The weaknesses of this work are the theoretical assumptions posed, which represent a limitation to the applicability of the results stated. Specifically, the setting analysed by this paper only considers the case in which the number of buyers is finite, by assuming that the distribution with which the vectors θt\theta_t are drawn has bounded support.

I also want to point out that in [1], they also study the problem of contextual dynamic pricing with multiple buyers. While the model they consider is different and therefore the results they obtain are not comparable, I think it should be mentioned nonetheless.


[1] Contextual Pricing for Lipschitz Buyers, J. Mao et al., NeurIPS 2018

问题

  1. On the line of what I stated in the Strengths and Weaknesses section, I wonder if the assumption on the boundedness of support of the buyers could be relaxed, for instance, assuming that only a finite amount of them are chosen with probability bigger than a threshold or assuming concentrated zones of buyers for which the corresponding parameters θ\theta are similar.

  2. The paper leaves the question of which is the correct dependency on KK_{\star} in the general case open. Do you think it is possible to generalize the analysis of Section 4 to the contextual case, or do you have any insights about what the right rate could be?

局限性

The limitations have been addressed.

最终评判理由

I think this is a valid paper, hence I keep my positive score unchanged.

格式问题

I have not noticed any major formatting issues.

作者回复

Thank you for taking the time to review our submission and for the clarifying questions.

Thank you for pointing out [1], we will update our literature review to include their work.

Regarding Q1, this is an interesting question. We first note that our bound on the disagreement coefficient extends even to unbounded type distributions (it is a consequence of monotonicity and a bounded number of jumps for the demand functions). That being said, our bound on the covering number requires that the support of the types is bounded. However, if instead of being supported in the unit cube, the types were supported in [0,R]d[0,R]^d, we would still get a covering number bound of O~(dK_log(R/ε))\widetilde{O}(d K\_\star \log(R/\varepsilon)), leading to regret O~(K_dTlogR)\widetilde{O}(K\_\star \sqrt{d T \log R}) with only logarithmic dependence on RR. If one imposes additional structure on the type distribution, this would be reflected in a smaller covering number and lower regret. We will add a short remark to this effect at the end of Section 3.2.

Regarding Q2, our attempts to generalize the analysis of Section 4 to higher dimensions encountered a number of significant challenges. It is difficult to maintain uncertainty sets in high dimensions without incurring an exponential dependence in the dimension. While this is achievable for d=1d=1, a key difficulty for d>1d > 1 with multiple types is that you do not know which type the feedback you obtain at time tt corresponds to. This is why, in Section 5, we consider a model where types are identified, allowing for simpler, more efficient algorithms. Currently, we lack substantial evidence to make a conjecture on the optimal regret.

评论

I thank the authors for their thoughtful answers to the points I have raised.

I see where the need for a bounded support arises, and I agree that it is not so trivial to relax that assumption. I had no doubt that it was possible to modify the result from the unit cube to [0,R]d[0, R]^d without impacting the result; it might still be worth mentioning it, as it is a slight generalization and it clarifies the dependency with respect to the support, but the decision is (of course) up to you.

Reading the other reviews, I find myself agreeing with what was said by reviewers 4ydy and 9ki1 regarding the clarity of the exposition and the notation, which can appear quite heavy. Therefore, I invite you to really implement the changes you have suggested in the rebuttal to their review.

Nonetheless, I think this is a valid paper with sound contributions, I will keep my score unchanged and support its acceptance.

审稿意见
4

This paper considers the contextual pricing setting with a twist: the incoming agents/buyers have heterogeneous types that are sampled from an unknown distribution with support KK^* . Prior works considered the buyers to have fixed but unknown type. Clearly, the optimal context-dependent pricing strategy depends on knowing the unknown type distribution (this is the offline optimal against which regret is defined). The work proposes algorithms for this setting based optimal posterior sampling. When the distribution is known to be from a given set, and KK^* is known, a O~(KdT)\tilde{O}(K^*\sqrt{dT}) bound is given (Theorem 3.2) that is tight upto the KK factor (lower bound is for K\sqrt{K}). This guarantees are maintained when assumptions about knowing KK and distribution set are relaxed to an extent. The work also focuses on tightening the bound for KK for non-contextual settings (via zooming) and the setting where agent types are observed after each round (called ex-post observability)

优缺点分析

Overall I am in the middle with this paper

Strengths

The overall model of the paper is fairly attractive and natural. Agents having heterogenous types from an unknown distribution is very natural and standard in many dynamic pricing settings. The paper also answers the natural questions of learning under both type feedback (ex-post observability) and just realized actions. The technical work in the paper is fairly competent.

Weaknesses

My main criticism of the work is it seems to not mention or hand-wave some important claims. There's also parts of the work that seem a bit unfocused. I expand below.

In section 3.2, you assume that D\mathcal{D} is not known but the algorithm uses a a finite large cover of the whole distribution space Δ(Θ)\Delta(|\Theta|). This clearly is an epsilon cover, the size of which scales exponentially in d. So it is not clear to me whether the algorithm 2 is computationally efficient/feasible since we need to sample and update from this very large cover. Further, this also relies on the support size being discrete - this isn't a big issue, but it inconsistent with line 150 where you claim that KK^* may be infinite. I don't see any results in the paper that support this.

I also find the results on the non-contextual setting that leverages the zooming algorithm also a bit unfocused. The work is ostensibly about the contextual pricing setting, so it's uncertain what considering the non-contextual setting as a major section in the work achieves. This is akin to pricing a single item to a heterogenous set of buyers that arrive online - there have been relevant works here.

问题

See above.

局限性

yes

最终评判理由

I think the work has a fair technical contribution, but the exposition of the results and formalness of the arguments could be improved. The rebuttal maintained this perspective.

格式问题

none

作者回复

Thank you for taking the time to review our submission and for the clarifying questions.

Weakness, exponential cover: Yes, the cover is exponential in dd, and Algorithm 2 is statistically efficient but computationally inefficient (as stated in the intro to Section 3). In this case, even designing a statistically efficient algorithm required significant care. These computational issues were the motivation for Section 5, where we incorporate additional information in order to design computationally efficient algorithms.

Weakness, infinite types case: You are correct to point out that the submission did not treat infinite K_K\_\star in the contextual setting — we explored this regime but ended up removing the relevant remark due to space. We will add the following back to the end of Section 3.3 in the revised manuscript:

"Large/infinite K_K\_\star: Our lower bound above can be restated as Ω(minK_dT,d1/3T2/3)\Omega(\min\\{\sqrt{K\_\star d T}, d^{1/3}T^{2/3}\\}). When K_=Ω((T/d)1/3)K\_\star = \Omega((T/d)^{1/3}) and the second term is active, POPS no longer has an advantage over EXP4 with naively discretized actions. In particular, one can run EXP4 with a policy cover of log cardinality O~(dε2)\widetilde{O}(d \varepsilon^{-2}) and price set {ε,2ε,,1}\{\varepsilon,2\varepsilon, \dots, 1\}, incurring regret overhead εT\varepsilon T due to discretization. This gives a regret bound of O~(Td2ε3+εT)\widetilde{O}(\sqrt{Td^2\varepsilon^{-3}} + \varepsilon T), which balances out to d2/5T4/5d^{2/5}T^{4/5} after tuning. Characterizing the optimal regret in this large KK_\star regime is an interesting question beyond the current scope."

In the non-contextual case, Theorem 4.1 features a worst-case T2/3T^{2/3} bound which holds for all K_K\_\star (even infinite).

Weakness, non-contextual contribution: The non-contextual setting in the paper is significant in trying to understand what the optimal dependence on K_K\_\star, the number of types, is. For the contextual setting, we obtain regret K_dTK\_\star \sqrt{dT} but are unable to achieve K_dT\sqrt{K\_\star dT}. However, our d=1d=1 result shows that K_\sqrt{K\_\star} dependence is possible in some cases. Moreover, our K_T\sqrt{K\_\star T} bound closes a gap in the existing literature on dynamic pricing with finitely many types, fully resolving the optimal regret up to logarithmic factors. As discussed in Remark 4.4, the recent work of Cesa-Bianchi, Cesari, & Perchet includes an additional instance-dependent term that can explode to infinity. Finally, we believe that our algorithmic methodology and analysis, which replaces the standard zooming dimension with a smaller variance-aware dimension, may be of independent interest.

审稿意见
4

This paper studies dynamic pricing policy optimization in the presence of KK^\star heterogeneous buyers, each associated with an unknown parameter θt\theta_t. The authors build upon the recently proposed Optimistic Posterior Sampling (OPS) approach to develop their algorithm. They establish a near-optimal regret bound of O(KdT)O(K^\star \sqrt{dT}), which is optimal in both the dimension dd and the time horizon TT. Furthermore, they derive a sharper upper bound of O(KT)O(\sqrt{K^\star T}) in the special case where covariates are not involved.

优缺点分析

Strength: This paper explores a novel and important setting in dynamic pricing, where buyers are heterogeneous and not drawn from a single population. The authors propose a new algorithm along with theoretical guarantees that are near-optimal, making a valuable contribution to the literature.

Weakness: The writing could be improved for clarity. In particular, the paper introduces many non-standard notations, which would benefit from clearer explanations. It would be helpful if each assumption or definition were followed by an intuitive explanation or example to aid reader understanding.

问题

  1. The authors model the buyer's valuation as vt=utθtv_t = \mathbf{u}_t^\top \theta_t , implying that the customer's value is fully determined by the observed covariates. It would be more realistic and flexible to include a noise term in this formulation—for example, vt=utθt+Ztv_t = \mathbf{u}_t^\top \theta_t + Z_t , where ZtZ_t is an i.i.d. random noise term. This would better capture potential randomness in customer behavior not explained by covariates.

  2. As mentioned in the weakness section, the clarity of the paper could be significantly improved by refining the notation. For example, the function proj()\mathrm{proj}(\cdot) is defined but not clearly explained, and some notations like demQ\mathrm{dem}_Q may be confusing to readers. Additionally, when citing results or introducing new concepts, it would be helpful to provide more detailed explanations or context. Without this, readers may find it difficult to follow the technical developments.

  3. It would be beneficial to include more real data experiments to demonstrate the effectiveness of this proposed method in reality and compare it with existing methods.

局限性

See the questions.

最终评判理由

I have read the authors' rebuttal and the other reviews. I would prefer to keep my current score, as my concerns have not been fully addressed. The paper would benefit from an empirical evaluation to support the proposed method.

格式问题

NA

作者回复

Thank you for taking the time to review our submission and for the clarifying questions.

Q1, added noise: In the presence of such added noise, our existing analysis mostly goes through, except for the bound on the disagreement coefficient. This noise smooths out demand functions, and we would then require a bound on the disagreement coefficient for the induced family of smoothed demand functions. When the noise has low variance, matching our existing disagreement coefficient bound is still possible (after introducing price discretization). In fact, our extension from a finite to infinite model class in Section 3.2 employs such an argument (see end of proof in Section B.6). The analysis there can tolerate bounded noise up to scale 1/T, without worsening our regret bound. A slight extension of this argument can extend to sub-Gaussian noise of the same scale. We will add a remark summarizing this explanation for the noisy setting to the end of Section 3.2.

We suspect that noise at substantially larger scales would worsen the achievable regret, but an exact characterization of this question is beyond the current scope. We remark that a noise threshold of 1/T appears in previous literature on contextual search with a single type, while noise at much larger levels leads to a jump in regret from logarithmic to polynomial. We expect that a similar jump arises in our setting but this is definitely an intriguing open direction and we will explicitly add it as an open question.

Q2, clarity: We apologize for the confusion regarding notation. We will tweak the introduction of the projection operator as follows:

"Once we restrict to a fixed context uUu \in \mathcal{U}, each type θΘ\theta \in \Theta induces value v=u,θv = \langle u,\theta \rangle. Thus, each type distribution DΔ(Θ)D \in \Delta(\Theta) induces a projected value distribution Q=proj(D,u)Δ([0,1])Q = \mathsf{proj}(D,u) \in \Delta([0,1]), where proj(D,u)\mathsf{proj}(D,u) denotes the probability law of u,θ\langle u,\theta \rangle when θD\theta \sim D."

Moreover, we will add a table in Section 2, before “Basic pricing facts”, which contains all of the main notation terms with their definitions (proj, rev, dem, gap, etc.).

We will seek to include clarifying explanations where appropriate throughout the paper. In particular, for the disagreement coefficient, we will add the following:

"In our application, each function ff will represent the gap between the demand function predicted by some model DD and that of the true model D_D\_\star, for a fixed context. In our analysis, the distribution ν\nu will be supported on prices which our algorithm estimates to be near-optimal based on historical feedback. The disagreement coefficient is small if, by playing price pp sampled from ν\nu, it is unlikely for any model DD to disagree substantially with D_D\_\star at pp if this model agrees with D_D\_\star on average."

This is inherently a bit tricky to conceptualize due to the nested quantifiers, but we emphasize that the disagreement coefficient is a standard quantity of interest in active learning and empirical process theory (see the existing Remark 3.7).

Q3, experiments: Unfortunately, there are significant limitations to performing meaningful experiments (beyond fully synthetic ones which simply verify the theorem statements). Indeed, while there are large datasets containing purchase records of the form “item - buyer - price - purchase decision”, the buyers’ values are unavailable. Thus, in order to determine purchase decisions for prices not used in the dataset, we must simulate buyer behavior. One could fit a K_K\_\star-types model to the dataset, and then use this for simulations/evaluation, but such an experiment would not tell us whether or not the performance of POPS translates to practice (since we are using the K_K\_\star-types assumption to simulate). In summary, it is non-trivial to design a proper experiment for our setting due to the necessary reliance on synthetic purchase feedback, and we view this as beyond the scope of this work.

最终决定

The paper studies a contextual dynamic pricing problem in which buyers can be categorized into KK_* types, each characterized by an unknown feature vector. Buyers arrive according to an unknown i.i.d. distribution (with features θt\theta_t), alongside contextual information utu_t, and the buyer's valuation is equal to vt=θtTutv_t=\theta_t^Tu_t. The seller posts a price only based on utu_t, and the buyer purchases the item only if its valuation is higher than the price. The seller aims to maximize its cumulative revenue. The authors propose a pricing algorithm based on Optimistic Posterior Sampling, which exhibits a regret bound of O(KdT)O(K_*\sqrt{dT}) after TT rounds (and for d d-dimensional contexts). The bound is shown to be tight in dd and TT. They additionally prove a tighter bound of O(KT)O(\sqrt{K_*T}) for the non-contextual case. Finally, they provide additional results when the seller receives additional information on the buyer at the end of each round.

The reviewers found the model to be novel and well-motivated. They also mostly considered the technical contributions to be valuable and the results interesting and relevant to the community. The only major concern raised by the reviewers was due to the clarity of the exposition and the notations used. The authors have committed to improving this aspect and are expected to implement the modifications they promised in their response.