Contextual Dynamic Pricing with Heterogeneous Buyers
contextual dynamic pricing with heterogenous buyers using optimistic posterior sampling
摘要
评审与讨论
In this paper the authors propose algorithms for posting prices when buyers have unknown, diverse preferences and the seller sees some buyer-specific context. It learns these preferences over time to make better pricing decisions. The methods work both when the number of buyer types is known or unknown and adapt to different feedback levels, achieving near-optimal performance in all cases.
优缺点分析
To me, this paper makes an interesting contribution to the community working in this field, and I cannot identify any real weaknesses. The algorithms are simple, and the analysis is non-trivial. Furthermore, I believe that modifying the zooming algorithm to achieve tighter results in the presence of one-sided Lipschitzness is an interesting idea. As a minor suggestion, the authors could include a table summarizing all the notation, as it is quite annoying to search for symbols across different pages.
问题
Can these techniques be extended to different (and more general) settings? Can the author provide a short discussion on this point?
局限性
yes
最终评判理由
I gave the paper a positive score because I think it's a good article that offers an interesting contribution. Some improvements to the presentation of the paper should be implemented in the camera-ready version.
格式问题
n/a
Thank you for taking the time to review our submission and for the clarifying questions.
Thank you for the suggestion. We will include a table before “Basic pricing facts” in Section 2 summarizing all of the main notation (rev, dem, proj, br, etc.).
Yes, the techniques extend to a broader class of pricing problems. As noted in Appendix C.1, we present a more general formulation involving a class of contextual demand functions that are not necessarily Lipschitz. Provided that the disagreement coefficient of this class is bounded, our algorithm yields a regret guarantee that scales with the disagreement coefficient.
I keep my positive score even if I believe the authors should put more effort into making the presentation of the paper clearer, as suggested by other reviewers.
This paper studies the contextual dynamic pricing problem. In this problem, in each round, an unknown type is drawn from an unknown distribution, and a context is adversarially chosen. The value is then , and the agent should pick a price . The problem is to maximize the total revenue. The paper proposes an algorithm with a regret of when is known, and another algorithm with regret when is unknown. Here is the support size of the type distribution. The paper also considers the non-contextual case and the model with ex-post type observations and presents corresponding algorithms.
优缺点分析
Overall, the results given in this paper are interesting. Nevertheless, I think the paper has a lot to improve in its presentation. In general, the paper is written in a too technical and notation-heavy manner with many technical definitions, while the intuitions of the given algorithms/analysis are lacking. This makes the paper hard to follow, and also makes it hard to evaluate the contributions of this work. For example, for Section 3, I highly recommend that the authors put technical details into the appendix and present more intuition on the presented analysis (e.g., the disagreement coefficient), and address how the underlying techniques are related to previous works. In Sections 4 and 5, since the two algorithms discussed are not even given in the main body, the pure analysis seems too dry and hard to follow.
问题
In section 3.2, what do you mean by "take to be a large, finite cover of the full distribution space"? What is formally?
局限性
Yes.
最终评判理由
I will keep my score.
格式问题
No.
Thank you for taking the time to review our submission and for the clarifying questions.
Weakness, overly technical: As suggested by the reviewer, we will update the writing of the paper to present more intuition for the algorithms/concepts and move more of the technical analysis to the appendix. In particular, for the disagreement coefficient, we will add the following:
"In our application, each function will represent the gap between the demand function predicted by some model and that of the true model , for a fixed context. In our analysis, the distribution will be supported on prices which our algorithm estimates to be near-optimal based on historical feedback. The disagreement coefficient is small if, by playing price sampled from , it is unlikely for any model to disagree substantially with at if this model agrees with on average."
This is inherently a bit tricky to conceptualize due to the nested quantifiers, but we emphasize that the disagreement coefficient is a standard quantity of interest in active learning and empirical process theory (see the existing Remark 3.7).
In Section 4, we will remove the technical lemmas (4.2 and 4.3) and extend the discussion after the theorem to better describe the variance-aware zooming dimension and why it can be significantly smaller. The key here is that, when expected revenue at a price is small, so is the variance of the revenue at this price.
In Section 5, we will update the discussion after Theorem 5.2 to be less technical, instead focusing on the high-level fact that best responding to historical data leads to dependence on the VC dimension of the relevant function class.
Q1, cover details: is taken to be a large set of distributions such that every distribution over the type space is close to some under the Levy distance. See Appendix C.9 for the precise construction, where we specify “close” and also construct the needed nonuniform prior over . Since the exact details require some careful parameter tuning and several concerns were raised about Section 3 being too technical, we prefer to leave the full construction in the appendix. However, we will add a pointer after introducing to this appendix section to direct the reader to a more detailed discussion.
Thanks the authors for their reply. I appreciate that they will take efforts in improving their writing, but after all, this is not committed. I choose to maintain my score.
The paper studies the well-established problem of contextual dynamic pricing, assuming the existence of several buyers. Formally, the valuation at every round is a linear function of a context, known by the agent, and a vector of coefficients , drawn in an i.i.d. fashion with respect to the distribution with finite support, in which each element represents a different buyer. By employing a Thompson Sampling-inspired algorithm, the paper manages to bound the expected regret by where is the number of buyers, the dimension of the context space and the number of rounds. Concurrently, they prove a lower bound for the same model of the order of . The question of which is the right dependency with respect to is answered in the paper for the non-contextual case, for which a bound of the order of is obtained. Lastly, they provide some additional results for the case in which the learner can rely on some additional information, specifically, after every round, it has access to either the type of the buyer or the whole parameter vector .
优缺点分析
The paper investigates the well-known problem of dynamic pricing, focusing on the less-studied setting of multiple buyers, therefore the results are novel and of interest to the community. Furthermore, the paper is written in a clear and precise way where all the assumptions needed are clearly stated and the results, together with the steps to achieve them, are well-motivated and easy to follow.
The weaknesses of this work are the theoretical assumptions posed, which represent a limitation to the applicability of the results stated. Specifically, the setting analysed by this paper only considers the case in which the number of buyers is finite, by assuming that the distribution with which the vectors are drawn has bounded support.
I also want to point out that in [1], they also study the problem of contextual dynamic pricing with multiple buyers. While the model they consider is different and therefore the results they obtain are not comparable, I think it should be mentioned nonetheless.
[1] Contextual Pricing for Lipschitz Buyers, J. Mao et al., NeurIPS 2018
问题
-
On the line of what I stated in the Strengths and Weaknesses section, I wonder if the assumption on the boundedness of support of the buyers could be relaxed, for instance, assuming that only a finite amount of them are chosen with probability bigger than a threshold or assuming concentrated zones of buyers for which the corresponding parameters are similar.
-
The paper leaves the question of which is the correct dependency on in the general case open. Do you think it is possible to generalize the analysis of Section 4 to the contextual case, or do you have any insights about what the right rate could be?
局限性
The limitations have been addressed.
最终评判理由
I think this is a valid paper, hence I keep my positive score unchanged.
格式问题
I have not noticed any major formatting issues.
Thank you for taking the time to review our submission and for the clarifying questions.
Thank you for pointing out [1], we will update our literature review to include their work.
Regarding Q1, this is an interesting question. We first note that our bound on the disagreement coefficient extends even to unbounded type distributions (it is a consequence of monotonicity and a bounded number of jumps for the demand functions). That being said, our bound on the covering number requires that the support of the types is bounded. However, if instead of being supported in the unit cube, the types were supported in , we would still get a covering number bound of , leading to regret with only logarithmic dependence on . If one imposes additional structure on the type distribution, this would be reflected in a smaller covering number and lower regret. We will add a short remark to this effect at the end of Section 3.2.
Regarding Q2, our attempts to generalize the analysis of Section 4 to higher dimensions encountered a number of significant challenges. It is difficult to maintain uncertainty sets in high dimensions without incurring an exponential dependence in the dimension. While this is achievable for , a key difficulty for with multiple types is that you do not know which type the feedback you obtain at time corresponds to. This is why, in Section 5, we consider a model where types are identified, allowing for simpler, more efficient algorithms. Currently, we lack substantial evidence to make a conjecture on the optimal regret.
I thank the authors for their thoughtful answers to the points I have raised.
I see where the need for a bounded support arises, and I agree that it is not so trivial to relax that assumption. I had no doubt that it was possible to modify the result from the unit cube to without impacting the result; it might still be worth mentioning it, as it is a slight generalization and it clarifies the dependency with respect to the support, but the decision is (of course) up to you.
Reading the other reviews, I find myself agreeing with what was said by reviewers 4ydy and 9ki1 regarding the clarity of the exposition and the notation, which can appear quite heavy. Therefore, I invite you to really implement the changes you have suggested in the rebuttal to their review.
Nonetheless, I think this is a valid paper with sound contributions, I will keep my score unchanged and support its acceptance.
This paper considers the contextual pricing setting with a twist: the incoming agents/buyers have heterogeneous types that are sampled from an unknown distribution with support . Prior works considered the buyers to have fixed but unknown type. Clearly, the optimal context-dependent pricing strategy depends on knowing the unknown type distribution (this is the offline optimal against which regret is defined). The work proposes algorithms for this setting based optimal posterior sampling. When the distribution is known to be from a given set, and is known, a bound is given (Theorem 3.2) that is tight upto the factor (lower bound is for ). This guarantees are maintained when assumptions about knowing and distribution set are relaxed to an extent. The work also focuses on tightening the bound for for non-contextual settings (via zooming) and the setting where agent types are observed after each round (called ex-post observability)
优缺点分析
Overall I am in the middle with this paper
Strengths
The overall model of the paper is fairly attractive and natural. Agents having heterogenous types from an unknown distribution is very natural and standard in many dynamic pricing settings. The paper also answers the natural questions of learning under both type feedback (ex-post observability) and just realized actions. The technical work in the paper is fairly competent.
Weaknesses
My main criticism of the work is it seems to not mention or hand-wave some important claims. There's also parts of the work that seem a bit unfocused. I expand below.
In section 3.2, you assume that is not known but the algorithm uses a a finite large cover of the whole distribution space . This clearly is an epsilon cover, the size of which scales exponentially in d. So it is not clear to me whether the algorithm 2 is computationally efficient/feasible since we need to sample and update from this very large cover. Further, this also relies on the support size being discrete - this isn't a big issue, but it inconsistent with line 150 where you claim that may be infinite. I don't see any results in the paper that support this.
I also find the results on the non-contextual setting that leverages the zooming algorithm also a bit unfocused. The work is ostensibly about the contextual pricing setting, so it's uncertain what considering the non-contextual setting as a major section in the work achieves. This is akin to pricing a single item to a heterogenous set of buyers that arrive online - there have been relevant works here.
问题
See above.
局限性
yes
最终评判理由
I think the work has a fair technical contribution, but the exposition of the results and formalness of the arguments could be improved. The rebuttal maintained this perspective.
格式问题
none
Thank you for taking the time to review our submission and for the clarifying questions.
Weakness, exponential cover: Yes, the cover is exponential in , and Algorithm 2 is statistically efficient but computationally inefficient (as stated in the intro to Section 3). In this case, even designing a statistically efficient algorithm required significant care. These computational issues were the motivation for Section 5, where we incorporate additional information in order to design computationally efficient algorithms.
Weakness, infinite types case: You are correct to point out that the submission did not treat infinite in the contextual setting — we explored this regime but ended up removing the relevant remark due to space. We will add the following back to the end of Section 3.3 in the revised manuscript:
"Large/infinite : Our lower bound above can be restated as . When and the second term is active, POPS no longer has an advantage over EXP4 with naively discretized actions. In particular, one can run EXP4 with a policy cover of log cardinality and price set , incurring regret overhead due to discretization. This gives a regret bound of , which balances out to after tuning. Characterizing the optimal regret in this large regime is an interesting question beyond the current scope."
In the non-contextual case, Theorem 4.1 features a worst-case bound which holds for all (even infinite).
Weakness, non-contextual contribution: The non-contextual setting in the paper is significant in trying to understand what the optimal dependence on , the number of types, is. For the contextual setting, we obtain regret but are unable to achieve . However, our result shows that dependence is possible in some cases. Moreover, our bound closes a gap in the existing literature on dynamic pricing with finitely many types, fully resolving the optimal regret up to logarithmic factors. As discussed in Remark 4.4, the recent work of Cesa-Bianchi, Cesari, & Perchet includes an additional instance-dependent term that can explode to infinity. Finally, we believe that our algorithmic methodology and analysis, which replaces the standard zooming dimension with a smaller variance-aware dimension, may be of independent interest.
This paper studies dynamic pricing policy optimization in the presence of heterogeneous buyers, each associated with an unknown parameter . The authors build upon the recently proposed Optimistic Posterior Sampling (OPS) approach to develop their algorithm. They establish a near-optimal regret bound of , which is optimal in both the dimension and the time horizon . Furthermore, they derive a sharper upper bound of in the special case where covariates are not involved.
优缺点分析
Strength: This paper explores a novel and important setting in dynamic pricing, where buyers are heterogeneous and not drawn from a single population. The authors propose a new algorithm along with theoretical guarantees that are near-optimal, making a valuable contribution to the literature.
Weakness: The writing could be improved for clarity. In particular, the paper introduces many non-standard notations, which would benefit from clearer explanations. It would be helpful if each assumption or definition were followed by an intuitive explanation or example to aid reader understanding.
问题
-
The authors model the buyer's valuation as , implying that the customer's value is fully determined by the observed covariates. It would be more realistic and flexible to include a noise term in this formulation—for example, , where is an i.i.d. random noise term. This would better capture potential randomness in customer behavior not explained by covariates.
-
As mentioned in the weakness section, the clarity of the paper could be significantly improved by refining the notation. For example, the function is defined but not clearly explained, and some notations like may be confusing to readers. Additionally, when citing results or introducing new concepts, it would be helpful to provide more detailed explanations or context. Without this, readers may find it difficult to follow the technical developments.
-
It would be beneficial to include more real data experiments to demonstrate the effectiveness of this proposed method in reality and compare it with existing methods.
局限性
See the questions.
最终评判理由
I have read the authors' rebuttal and the other reviews. I would prefer to keep my current score, as my concerns have not been fully addressed. The paper would benefit from an empirical evaluation to support the proposed method.
格式问题
NA
Thank you for taking the time to review our submission and for the clarifying questions.
Q1, added noise: In the presence of such added noise, our existing analysis mostly goes through, except for the bound on the disagreement coefficient. This noise smooths out demand functions, and we would then require a bound on the disagreement coefficient for the induced family of smoothed demand functions. When the noise has low variance, matching our existing disagreement coefficient bound is still possible (after introducing price discretization). In fact, our extension from a finite to infinite model class in Section 3.2 employs such an argument (see end of proof in Section B.6). The analysis there can tolerate bounded noise up to scale 1/T, without worsening our regret bound. A slight extension of this argument can extend to sub-Gaussian noise of the same scale. We will add a remark summarizing this explanation for the noisy setting to the end of Section 3.2.
We suspect that noise at substantially larger scales would worsen the achievable regret, but an exact characterization of this question is beyond the current scope. We remark that a noise threshold of 1/T appears in previous literature on contextual search with a single type, while noise at much larger levels leads to a jump in regret from logarithmic to polynomial. We expect that a similar jump arises in our setting but this is definitely an intriguing open direction and we will explicitly add it as an open question.
Q2, clarity: We apologize for the confusion regarding notation. We will tweak the introduction of the projection operator as follows:
"Once we restrict to a fixed context , each type induces value . Thus, each type distribution induces a projected value distribution , where denotes the probability law of when ."
Moreover, we will add a table in Section 2, before “Basic pricing facts”, which contains all of the main notation terms with their definitions (proj, rev, dem, gap, etc.).
We will seek to include clarifying explanations where appropriate throughout the paper. In particular, for the disagreement coefficient, we will add the following:
"In our application, each function will represent the gap between the demand function predicted by some model and that of the true model , for a fixed context. In our analysis, the distribution will be supported on prices which our algorithm estimates to be near-optimal based on historical feedback. The disagreement coefficient is small if, by playing price sampled from , it is unlikely for any model to disagree substantially with at if this model agrees with on average."
This is inherently a bit tricky to conceptualize due to the nested quantifiers, but we emphasize that the disagreement coefficient is a standard quantity of interest in active learning and empirical process theory (see the existing Remark 3.7).
Q3, experiments: Unfortunately, there are significant limitations to performing meaningful experiments (beyond fully synthetic ones which simply verify the theorem statements). Indeed, while there are large datasets containing purchase records of the form “item - buyer - price - purchase decision”, the buyers’ values are unavailable. Thus, in order to determine purchase decisions for prices not used in the dataset, we must simulate buyer behavior. One could fit a -types model to the dataset, and then use this for simulations/evaluation, but such an experiment would not tell us whether or not the performance of POPS translates to practice (since we are using the -types assumption to simulate). In summary, it is non-trivial to design a proper experiment for our setting due to the necessary reliance on synthetic purchase feedback, and we view this as beyond the scope of this work.
The paper studies a contextual dynamic pricing problem in which buyers can be categorized into types, each characterized by an unknown feature vector. Buyers arrive according to an unknown i.i.d. distribution (with features ), alongside contextual information , and the buyer's valuation is equal to . The seller posts a price only based on , and the buyer purchases the item only if its valuation is higher than the price. The seller aims to maximize its cumulative revenue. The authors propose a pricing algorithm based on Optimistic Posterior Sampling, which exhibits a regret bound of after rounds (and for -dimensional contexts). The bound is shown to be tight in and . They additionally prove a tighter bound of for the non-contextual case. Finally, they provide additional results when the seller receives additional information on the buyer at the end of each round.
The reviewers found the model to be novel and well-motivated. They also mostly considered the technical contributions to be valuable and the results interesting and relevant to the community. The only major concern raised by the reviewers was due to the clarity of the exposition and the notations used. The authors have committed to improving this aspect and are expected to implement the modifications they promised in their response.