Fairness-Regularized Online Optimization with Switching Costs
Abstract
Reviews and Discussion
The paper studies an online optimization problem in which an algorithm must make an irrevocable decision in every round without knowing future costs. The authors study an objective composed of three parts: the cost of the chosen action in each round (hitting cost), the cost of switching the action from the previous round (switching cost), and a cost that measures the fairness of the solution. Importantly, the fairness regularizer can only be evaluated at the end of the instance. This smoothed, fairness-regularized setting thus generalizes the previously studied smoothed online optimization without fairness and fairness-regularized online optimization without switching costs (smoothness).
The authors present an online algorithm and prove regret and competitive-ratio bounds, which recover the results of the previous settings. A crucial component of the authors' approach is to decompose the fairness cost into online costs via a new regularizer and auxiliary variables, and to maintain approximate fairness through a Lagrangian-multiplier mirror-descent update, while balancing hitting cost, switching cost, and fairness cost in each round.
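Schematically, and with illustrative notation rather than the paper's exact symbols, the offline objective being approximated online has the form

$$ \min_{x_1,\dots,x_T}\; \sum_{t=1}^{T} f_t(x_t) \;+\; \sum_{t=1}^{T} \frac{\lambda}{2}\lVert x_t - x_{t-1}\rVert_2^2 \;+\; \beta\, g\!\Big(\frac{1}{T}\sum_{t=1}^{T} A_t x_t\Big), $$

where $f_t$ is the hitting cost revealed at step $t$, the middle term is the switching cost, and $g$ is the long-term fairness regularizer applied to the average of the per-step utilities $A_t x_t$.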
Strengths and Weaknesses
Strengths:
- Formulation and motivation of the problem
- Asymptotically tight guarantees; good model of resource augmentation (restricted optimum)
- The algorithm is neat; it uses mirror descent with an additional auxiliary variable.
- Empirical evaluation that confirms the theoretical findings.
Weaknesses:
- Requires strong assumptions on the fairness cost: convexity and Lipschitz continuity.
- The result is somewhat incremental (given that different subsets of the problem have been studied before)
Questions
- What can be done if the fairness cost is non-convex but Lipschitz?
- How should the different hyperparameters (e.g., $\lambda$) be chosen for good performance in practice?
Limitations
All limitations have been addressed.
Final Justification
The authors sufficiently addressed my concerns. I think the paper has sufficient content and results to be accepted at NeurIPS.
Formatting Issues
Formatting looks good.
We sincerely appreciate the reviewer's valuable feedback on our paper, and we are happy to address the reviewer's concerns.
Assumptions for Fairness Cost: For the long-term fairness cost, depending on whether we are minimizing costs or maximizing rewards, convexity or concavity is a commonly adopted assumption in the existing literature (e.g., [6], [34]). Convexity of the fairness cost is commonly observed in real-world applications, such as the min-max fairness consideration in cloud resource provisioning. This is intuitive because the reward or cost does not increase linearly with the actions. Instead, for rewards, as the law of diminishing marginal utility indicates, the more resources we allocate, the smaller the marginal improvement becomes, which directly implies concavity. Similarly, health risks become more and more severe as the air pollution in a certain region increases.
In addition, this convexity is technically essential to our analysis and to other related studies (e.g., [6]) that deal with similar long-term fairness regularizers. At a high level, we decouple the long-term fairness cost in Eq. (1) into the temporally decoupled costs in Eq. (3) using an auxiliary variable. During this conversion, it is important to ensure that the optimal solutions of Eq. (3) and Eq. (1) are identical. If the fairness function is not convex, then when we substitute the optimal solution of Eq. (1) into Eq. (3), there may exist sequences of the auxiliary variable that further minimize the last term of Eq. (3). By rebalancing the fairness, hitting, and switching costs, the optimal solution of Eq. (3) may then differ from that of Eq. (1), which invalidates this essential conversion step. To ensure that no sequence can further minimize the last term in Eq. (3), we need to guarantee that the average of the function values is always at least the function of the average, which is exactly the convexity property. To sum up, convexity of the fairness cost is fundamental to our design as well as to other related studies (e.g., [6]) that deal with similar fairness regularizers.
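In symbols, the property we rely on is Jensen's inequality: for a convex fairness function $g$ and any utility sequence $y_1, \dots, y_T$ (illustrative notation),

$$ \frac{1}{T}\sum_{t=1}^{T} g(y_t) \;\ge\; g\!\Big(\frac{1}{T}\sum_{t=1}^{T} y_t\Big), $$

so no auxiliary sequence can drive the decoupled per-step fairness terms below the original horizon-average fairness cost.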
The Lipschitz assumption further bounds the impact of variations in the frame-wise average action across the horizon on the overall cost. It is also used to constrain the step-wise cost differences in our proofs. If the fairness costs are naturally bounded over their domain, these effects remain controlled. Alternatively, enforcing a stricter benchmark constraint could obviate the need for Lipschitz continuity when bounding the impact of variations. In summary, while relaxing these assumptions would require alternative proof techniques and potentially different results, removing Lipschitz continuity remains technically feasible.
Technical Contributions:
First of all, in the presence of switching costs, our algorithm significantly differs from existing methods dealing with long-term fairness. Existing works (e.g., [6], [7]) require the cost objectives (except the long-term cost) to be temporally independent, which is a crucial requirement for their weak-duality proofs. In contrast, our objective, after the fairness decomposition, is more challenging than [6] due to the switching cost. Specifically, the switching cost breaks the temporal independence, thereby voiding the direct use of the existing techniques in [6]. Our novel proof eliminates the need for such independence by focusing on frame-wise optimal action sequences, enabling comparison between different action sequences. Therefore, due to the added switching cost, our proof fundamentally differs from the proofs in [6], which require the per-step cost to be independent of the previous actions once the current action is decided. On the other hand, the proof in [16] for SOCO only allows a difference in the actions between the online algorithm and the optimal offline solution, but cannot accommodate another decision variable in the per-step objective. Simply combining the proofs for SOCO (e.g., [10], [16]) and fairness regularization (e.g., [6]) does not yield provable bounds.
Secondly, we address the significant challenges in generalizing algorithms designed for standard smoothed online convex optimization (SOCO) problems to incorporate long-term fairness costs. This difficulty arises because fairness costs couple all actions within the horizon, unlike SOCO's typical one-step dependence. Specifically, SOCO's per-step objective relies only on the previous action, whereas our per-step objective (Equation (5) in Algorithm 1 of FairOBD) depends on a variable influenced by all prior actions, highlighting the non-trivial nature of integrating fairness into SOCO.
Last but not least, our hardness analysis of the long-term regularizer provides crucial lower bounds against the unconstrained offline optimal benchmark. As demonstrated in Theorems 4.1 and 4.2, we prove that any online algorithm's total regret grows linearly (and its competitive ratio becomes infinite) as the horizon $T \to \infty$. While some related works (e.g., [6]) consider no switching costs and conjectured that no online algorithm could achieve vanishing regret or a finite competitive ratio, our theorems are, to the best of our knowledge, the first to rigorously prove this inherent difficulty in this setting. Then, building on our lower bounds showing the problem hardness, we establish an important offline benchmark where the offline optimal algorithm is constrained by the $(R,\delta)$ constraint. Just as constraining the offline optimal action is needed for online algorithm analysis in other contexts, this constraint is essential for meaningful analysis in our problem. It is also practical, especially in applications like data center scheduling, where contextual information often exhibits natural periodicity, such as diurnal workloads.
The Choice of Hyperparameters: The hyperparameters in FairOBD, such as $\lambda$, serve to balance the contributions of the hitting cost, switching cost, and long-term fairness cost in determining each per-step action. For example, selecting a larger $\lambda$ drives the solution toward maintaining proximity to the previous configuration, thereby reducing the switching cost. In practice, these weights can be tuned to reflect the relative importance of the three cost components, depending on the requirements of different applications. Moreover, Theorem 5.2 offers theoretically optimal choices for these hyperparameters, providing a principled guideline for parameter selection. The learning rate requires similar consideration: it should be neither too large (leading to instability) nor too small (causing slow convergence), and our theory suggests scaling it inversely with the horizon length. Generally speaking, while our analysis provides important guidance for parameter selection, fine-tuning based on empirical observations remains important for achieving optimal performance in practice and can be of independent interest.
I thank the authors for their response. I am now confident to support the paper for acceptance at NeurIPS.
We sincerely appreciate the reviewer’s effort in evaluating our paper and providing insightful feedback. We are pleased that all concerns have been fully addressed.
The paper studies a smoothed online optimization problem with a long-term horizon fairness regularizer. In the problem, the agent observes a cost function and a matrix at each time slot. The agent then selects an action, incurring three costs: the hitting cost given by the cost function, the switching cost induced by changing to a different action from the last time slot, and the horizon fairness cost. The problem aims to minimize the total cost. In this paper, the authors first analyze the worst-case regret and competitive ratio of the problem, using the offline setting as a benchmark. Then the authors provide an algorithm and analyze its performance.
Strengths and Weaknesses
Strengths:
The paper studies an interesting and challenging problem, which is well-motivated by several real world applications.
The paper is well organized.
Weaknesses:
The lower bound and the upper bound do not match. That is, the lower bound compares the optimal online algorithm against the optimal offline algorithm, while the upper bound compares the proposed algorithm against the so-called $(R,\delta)$-optimal algorithm. These are two different stories.
The technique is standard.
Questions
In Theorems 5.1 and 5.2, is the $(R,\delta)$-optimal algorithm online or offline? Or are they the same? Please clarify.
Limitations
NA
Final Justification
The reviewer's major concern is the mismatch between the upper bound and the lower bound (which use different offline benchmarks). The authors also acknowledge that difference in their rebuttal and explicitly mention that they are not able to derive bounds using the same adapted offline benchmark. Being a theory person, I consider this mismatch to affect the quality of the paper significantly. Although the reviewer does consider the upper bound and the lower bound nicely proved, since they are compared against different benchmarks, the story of the paper is not complete. Hence, the reviewer decides to maintain the score.
Formatting Issues
NA
We're very thankful for the reviewer's careful review and for acknowledging the technical strengths of our work. We're happy to address any concerns and offer additional insights into our motivation and technical approaches.
$(R,\delta)$-Optimal Algorithm: This provides the offline optimal action sequence for minimizing the total cost (i.e., hitting, switching, and fairness costs) subject to the constraint in Line 209. Both Theorem 5.1 and Theorem 5.2 use this benchmark but differ in their specific objectives. Theorem 5.1 highlights the impact of the long-term regularizer on our online algorithm's regret by considering a zero switching cost. We demonstrate that our algorithm achieves sublinear regret even with adversarial context sequences. Theorem 5.2 considers a more general case by incorporating both long-term and switching costs. We establish our algorithm's asymptotic competitive ratio and show its tightness by comparison to the optimally competitive algorithm when there is no fairness cost. Notably, Theorem 5.2 can also recover Theorem 5.1 by setting the hyperparameters appropriately when the switching cost is absent. The comparison of these theorems reveals that the asymptotic competitive ratio primarily stems from the switching cost, which is inevitable even without the long-term fairness cost (as indicated by Theorem 1 in [16]).
Lower Bound vs Upper Bound: For regret analysis in online algorithms, selecting an appropriate benchmark is crucial, as it defines the strength of the offline optimal comparator. A common choice is the static benchmark, where the offline policy must commit to a single action throughout the entire horizon. This constraint enables meaningful theoretical results, such as sublinear regret bounds in online convex optimization (OCO).
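For concreteness, the static benchmark mentioned above is the standard one in OCO:

$$ \mathrm{Regret}_T \;=\; \sum_{t=1}^{T} f_t(x_t) \;-\; \min_{x \in \mathcal{X}} \sum_{t=1}^{T} f_t(x), $$

where the offline comparator must commit to a single action $x$ for the entire horizon.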
In our setting, we initially considered a stronger benchmark in which the offline policy is unconstrained and may choose arbitrary actions over time. However, we found this infeasible: without knowledge of future contexts, any online algorithm can be forced to incur unbounded regret. This impossibility is formally shown through our hardness analysis in Theorems 4.1 and 4.2, where we prove that the total regret of any online algorithm grows linearly (and its competitive ratio becomes infinite) as the horizon $T \to \infty$.
The impossibility and hardness analysis (with respect to the unconstrained offline policy) motivates our introduction of the $(R,\delta)$-constrained offline algorithm, which adds some restrictions to the offline policy benchmark. With this new constrained offline benchmark, we derive upper bounds (Theorem 5.1 and Theorem 5.2) for our algorithm under adversarial contexts. Therefore, the upper bounds and lower bounds consider different benchmarks and cannot be directly compared: the upper bounds in Theorems 5.1 and 5.2 consider the $(R,\delta)$-constrained offline algorithm, whereas the lower bounds in Theorems 4.1 and 4.2 consider the unconstrained offline policy benchmark. Determining whether matching lower bounds exist for the $(R,\delta)$-constrained benchmark is an open problem and a promising avenue for future research.
Technical Contributions:
Our work offers four key technical contributions.
First, our hardness analysis of the long-term regularizer provides crucial lower bounds against the unconstrained offline optimal benchmark. As demonstrated in Theorems 4.1 and 4.2, we prove that any online algorithm's total regret grows linearly (and its competitive ratio becomes infinite) as the horizon $T \to \infty$. While some related works (e.g., [6]) consider no switching costs and conjectured that no online algorithm could achieve vanishing regret or a finite competitive ratio, our theorems are, to the best of our knowledge, the first to rigorously prove this inherent difficulty in this setting.
Second, building on our lower bounds showing the problem hardness, we establish an important offline benchmark where the offline optimal algorithm is constrained by the $(R,\delta)$ constraint. Just as constraining the offline optimal action is needed for online algorithm analysis in other contexts, this constraint is essential for meaningful analysis in our problem. It is also practical, especially in applications like data center scheduling, where contextual information often exhibits natural periodicity, such as diurnal workloads.
Third, we address the significant challenges in generalizing algorithms designed for standard smoothed online convex optimization (SOCO) problems to incorporate long-term fairness costs. This difficulty arises because fairness costs couple all actions within the horizon, unlike SOCO's typical one-step dependence. Specifically, SOCO's per-step objective relies only on the previous action, whereas our per-step objective (Equation (5) in Algorithm 1 of FairOBD) depends on a variable influenced by all prior actions, highlighting the non-trivial nature of integrating fairness into SOCO.
Finally, in the presence of switching costs, our algorithm significantly differs from existing methods dealing with long-term fairness. Existing works (e.g., [6], [7]) require the cost objectives (except the long-term cost) to be temporally independent, which is a crucial requirement for their weak-duality proofs. In contrast, our objective, after the fairness decomposition, is more challenging than [6] due to the switching cost. Specifically, the switching cost breaks the temporal independence, thereby voiding the direct use of the existing techniques in [6]. Our novel proof eliminates the need for such independence by focusing on frame-wise optimal action sequences, enabling comparison between different action sequences. Therefore, due to the added switching cost, our proof fundamentally differs from the proofs in [6], which require the per-step cost to be independent of the previous actions once the current action is decided. On the other hand, the proof in [16] for SOCO only allows a difference in the actions between the online algorithm and the optimal offline solution, but cannot accommodate another decision variable in the per-step objective. Simply combining the proofs for SOCO (e.g., [10], [16]) and fairness regularization (e.g., [6]) does not yield provable bounds.
Thank you very much for your response!
About the different measures used to derive the upper bound and the lower bound: it is good to see that the authors acknowledge the difference and propose resolving it as future work. However, this mismatch does put the paper at the boundary. Therefore, I will keep my current score.
We are grateful for your acknowledgment of the challenges involved in applying a long-term regularizer to smoothed online optimization, a problem with significant real-world relevance. We also thank you for your insightful feedback. The mismatch between our upper and lower bounds is, in fact, a central contribution of our work. The lower bounds characterize the inherent difficulty of the problem and hence, by definition, must consider any possible online algorithm, whereas the upper bounds characterize the performance of our proposed algorithm. Our lower bounds (Theorems 4.1 and 4.2) demonstrate the impossibility of achieving meaningful performance metrics (linear regret and unbounded competitive ratio) against an unconstrained optimal offline benchmark. The lower bounds imply that we have to constrain the offline benchmark for more meaningful analysis. Therefore, we propose and analyze a more appropriate $(R,\delta)$-constrained optimal offline benchmark. With this new benchmark, our algorithm's performance is not only meaningful but also achieves a bounded asymptotic competitive ratio, as shown in our upper bounds (Theorems 5.1 and 5.2). In other words, our upper bound using the new constrained benchmark is in fact a more meaningful and better bound (hence, a "mismatch") than the pessimistic lower bound that uses the unconstrained benchmark. Such constrained benchmarks are also a recognized approach in other contexts where unconstrained analysis fails to yield meaningful results.
In this work, the authors model an online learning setting. A decision-maker is tasked with selecting a multi-dimensional decision at each step. They observe a complete cost function that maps this chosen decision to a non-negative cost. Furthermore, a weighting matrix transforms the decision into a utility vector, upon which a long-term fairness objective is imposed, yielding a scalar fairness value. The decision-maker also incurs a switching cost, which captures the expense associated with changing decisions over time. The core objective is to devise a policy that guarantees a worst-case asymptotic competitive ratio compared to the offline optimum as the time horizon tends towards infinity.
Strengths and Weaknesses
Strengths.
The paper is well-written, and its results are clearly presented. The authors' contributions are distinct, technically sound, and easily understood. They begin by establishing significant negative results in Theorems 4.1 and 4.2, demonstrating the impossibility of deriving meaningful guarantees for both regret and competitive ratio against an unrestricted adversary, like the one typically assumed in Online Convex Optimization (OCO). The authors then address this by relaxing their objective, introducing the concept of an $(R,\delta)$-optimal algorithm, and deriving guarantees against this new benchmark. Their main result is presented as a competitive ratio guarantee for the proposed algorithm.
Weaknesses.
-
The significance of the results and the problem statement's motivation remain unclear. It is not evident how an additive combination of a fairness criterion with a cost objective is meaningful. Fairness goals, such as min-max or proportional fairness (as exemplified by the authors), typically incorporate notions of efficiency, like Pareto optimality. Therefore, the system designer's secondary objective is ambiguous, especially given that efficiency is already implicitly accounted for in the fairness metric's definition. For instance, providing zero allocation to involved parties, while potentially fair, may not represent an efficient solution.
-
The algorithmic complexity of Algorithm 1 is not transparent. The cost function is defined only as continuous and non-negative over the action set, providing no inherent structure. Consequently, there is no clear justification that the crucial step in Equation (5) can be solved in polynomial time, rendering the algorithm impractical in real-world applications. If solving Equation (5) requires approximation for efficiency, this could potentially compromise the theoretical guarantees proposed by the authors.
-
The authors claim to use mirror descent as an integral part of their algorithm's design. However, the specific advantages or efficiencies that mirror descent, particularly the choice of its associated mirror map (regularizer), offers are unclear, as no apparent link to the obtained results is provided. It is not evident why standard gradient descent would not suffice.
Questions
- Given that fairness objectives often inherently account for efficiency (e.g., Pareto optimality), could you clarify the system designer's intended secondary goal when combining these objectives additively?
- The algorithmic complexity of Algorithm 1 remains unclear. Since the cost function has no specified structure beyond being continuous and non-negative, could you justify how the crucial step in Equation (5) can be solved in polynomial time? If approximations are necessary for practical efficiency, how might this impact the theoretical guarantees presented in the paper?
- You state that mirror descent is integral to the algorithm's design. Could you please elaborate on the specific advantages or efficiency gains it provides over simpler optimization methods like standard gradient descent?
Limitations
yes
Final Justification
The reviewer believes that concerns 1 and 2 were not satisfactorily addressed by the authors during the discussion phase, so he is maintaining his original score.
Formatting Issues
It appears your literature review might be missing some highly relevant works, specifically:
Molina, Mathieu, et al. "Trading-off price for data quality to achieve fair online allocation." Advances in Neural Information Processing Systems 36 (2023): 40096-40139. (This paper investigates the same problem in a stochastic setting.)
Si Salem, Tareq, et al. "Enabling long-term fairness in dynamic resource allocation." Proceedings of the ACM on Measurement and Analysis of Computing Systems 6.3 (2022): 1-36. (This work presents a similar setup involving long-term fairness with restrictions on the adversary.)
We sincerely appreciate the reviewer's valuable feedback on our paper, and we are happy to address the reviewer's concerns.
Implication of Fairness Objectives: We acknowledge that, in certain scenarios, optimizing for efficiency can naturally enhance fairness, and some fairness objectives may implicitly incorporate efficiency considerations. However, this alignment does not hold universally—particularly for fairness costs evaluated over longer horizons. Solely prioritizing efficiency can inadvertently lead to inequitable outcomes over time.
For example, in our empirical study on fair resource provisioning for AI inference across geographically distributed data centers, the hitting and switching costs capture instantaneous system performance. While minimizing these costs motivates selecting the most cost-efficient data center at each time slot, this approach overlooks localized environmental impact, such as air pollution from electricity generation. These public health risks primarily affect nearby communities. Focusing exclusively on efficiency may potentially concentrate workloads in specific locations, thereby imposing disproportionate public health burdens. Consequently, beyond immediate efficiency metrics, it is essential to explicitly incorporate long-term fairness costs to ensure sustainable and equitable system behavior.
Beyond our application, the need to explicitly incorporate fairness regularizers has been recognized in other contexts such as online resource allocation [6].
Algorithmic Complexity: Our Assumption 3.1 indicates that both the hitting and fairness costs are convex, and the switching cost is the standard squared $\ell_2$-norm. (See Section 3.1 for discussions of these assumptions.) This ensures that the objective in Eq. (5) is convex with respect to the action and the auxiliary variable. Furthermore, inspired by practical applications, the action sets for the action and the auxiliary variable are bounded (and convex). Consequently, while the specific complexity depends largely on the solver, the problem at each step of Algorithm 1 is a convex optimization, which can be efficiently solved by many established solvers in polynomial time.
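As a minimal sketch of why each step is tractable (the cost forms, weights, dual term, and constraint sets below are placeholders, not the paper's exact Eq. (5)), the per-step problem can be handed to an off-the-shelf convex solver:

```python
import cvxpy as cp
import numpy as np

np.random.seed(0)
n, m = 4, 3                            # illustrative action / utility dimensions
x_prev = np.zeros(n)                   # previous action
A_t = np.abs(np.random.randn(m, n))    # context matrix revealed at step t
q_t = np.random.rand(n)                # placeholder linear hitting-cost coefficients
dual = np.random.rand(m)               # current dual multipliers (placeholder)
lam, beta = 1.0, 0.5                   # switching / fairness weights (placeholders)

x = cp.Variable(n, nonneg=True)        # action variable
z = cp.Variable(m)                     # auxiliary (fairness) variable

hitting = q_t @ x                                   # convex hitting cost
switching = (lam / 2) * cp.sum_squares(x - x_prev)  # squared l2 switching cost
fairness = beta * cp.max(z)                         # a min-max style convex proxy
coupling = dual @ (A_t @ x - z)                     # Lagrangian coupling term

prob = cp.Problem(cp.Minimize(hitting + switching + fairness + coupling),
                  [cp.sum(x) <= 1, z >= 0, z <= 1])  # bounded convex sets
prob.solve()
print(x.value, z.value)
```

Because every term is convex in $(x, z)$ and both feasible sets are convex and bounded, each step is a standard convex program.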
The Advantage of Mirror Descent: Mirror descent (MD) can be viewed as a generalized form of standard gradient descent (GD). At a high level, MD augments GD by introducing a carefully chosen potential, or reference, function. The algorithm maps the primal variable into a dual space via the gradient of the reference function, performs a gradient step in the dual space, and then maps back to the primal space through the inverse map. When the reference function is the squared $\ell_2$-norm, the primal and dual spaces are identical, recovering the standard GD update. By selecting different reference functions, MD adapts to the geometry of the optimization problem to improve efficiency. For example, using negative entropy yields the multiplicative weights update (MWU) algorithm, which is particularly well suited to simplex constraints on probabilities. More broadly, in high-dimensional settings, an appropriate choice of reference function can substantially accelerate convergence. Thus, while GD remains widely applicable, MD offers a more flexible framework that tailors the optimization to the problem's geometry. For this reason, we adopt the generalized MD in our analysis, which reduces to GD when the squared $\ell_2$-norm is used as the reference function.
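A compact sketch of the two special cases mentioned above (a generic mirror-descent step, not the paper's exact dual update):

```python
import numpy as np

def md_step_gd(x, grad, eta):
    """Mirror descent with reference h(x) = 0.5 * ||x||_2^2: the mirror map
    (the gradient of h) is the identity, so the update is standard GD."""
    return x - eta * grad

def md_step_mwu(x, grad, eta):
    """Mirror descent with the negative-entropy reference on the simplex:
    the update becomes the multiplicative weights update (MWU)."""
    w = x * np.exp(-eta * grad)
    return w / w.sum()

x = np.ones(5) / 5
g = np.array([0.1, 0.5, 0.2, 0.9, 0.3])
print(md_step_gd(x, g, 0.1))    # unconstrained GD step
print(md_step_mwu(x, g, 0.1))   # stays on the probability simplex
```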
Related works: We're grateful to the reviewers for highlighting relevant works; we'll be sure to include them in the related works section of our camera-ready version.
We sincerely appreciate the reviewer’s efforts in evaluating our paper and providing constructive suggestions. We hope that all concerns have been successfully addressed, and we are happy to provide any additional information should further questions arise.
Point #1. In the response, the misalignment is only justified in one direction, namely that if the goal is to optimize for efficiency, fairness may be compromised. However, the reviewer's point concerns the opposite direction: when the objective is to optimize a well-defined fairness criterion, such as $\alpha$-fairness (e.g., max-min fairness, proportional fairness, or utilitarian), the resulting solutions inherently lie on the Pareto frontier. This is well established, for instance, through the connection between fairness metrics and axiomatic bargaining theory, where proportional fairness corresponds to the Nash bargaining solution and max-min fairness aligns with the Kalai–Smorodinsky solution. These fairness criteria are constructed with efficiency (Pareto optimality) embedded in their axioms. For example, in dividing a divisible resource among agents, allocating zero to all may seem fair in a purely equal sense, but it is clearly inefficient and thus undesirable. Meaningful fairness notions, like $\alpha$-fairness, reject such outcomes by embedding efficiency (Pareto optimality) into their definitions. Therefore, it is conceptually incoherent to introduce an alternative, potentially conflicting notion of efficiency and simply incorporate it into the objective.
Point #2. The reviewer does not observe a convexity assumption in Assumption 3.1 regarding the hitting cost. It is only stated that the cost function is "continuous and non-negative within the action set," but the convexity of the function---and even the convexity of the action set itself---are not explicitly stated.
Point #3. While the reviewer understands that mirror descent can offer sharper bounds and improved performance under certain structural assumptions, the paper does not explore how the Lipschitz constant may vary under different choices of norms and mirror maps. As a result, the advantages of using MD in this setting remain unclear. For example, it is not evident whether the action set X possesses a simplex structure (i.e., is a subset of the probability simplex) that would justify the use of an entropic mirror map.
We appreciate the reviewer’s response and constructive suggestions.
Point #1: We agree that in some simple settings, where both the fairness function and the hitting cost (or efficiency) are defined at each time slot, optimizing the fairness function may place the outcome on the Pareto frontier. However, in our problem setting, the fairness function is evaluated over the long term, while the hitting cost (or efficiency) is assessed at each individual time slot. This creates a challenging, yet commonly studied, setup in related works (e.g., [6]). In such cases, optimizing the long-term fairness regularizer does not directly guarantee efficient resource allocation. For example, in the cloud resource provisioning scenario, even if we allocate the AI workloads across locations according to the long-term fairness objective, poor temporal allocation of these workloads could still lead to inefficiency. By further optimizing the hitting cost, we can improve the system's operational efficiency. Moreover, this type of long-term fairness regularizer is a function of the average action, which is mathematically different from the average of the per-step function values. This difference means the long-term fairness regularizer cannot be optimized independently at each time slot. Therefore, both the hitting cost and the long-term fairness cost are essential components of our formulation, and there may not be a direct one-to-one correspondence between the fairness cost and system efficiency. Moreover, we also account for the switching cost associated with action changes over time, which adds another layer of challenge to the problem.
Point #2: As shown in [10] and [17], when switching costs are present, assuming convexity of the hitting cost is essential to achieving a dimension-free competitive ratio for any online algorithm. Motivated by this fundamental challenge, we adopt a similar convexity assumption to obtain meaningful results in this setting, as stated on Line 315 of Theorem 5.2. In contrast, in the absence of switching costs, our algorithm design and analysis are more general, and we do not require this assumption in the theoretical analysis of Theorems 4.1, 4.2, and 5.1. For this reason, we did not explicitly impose convexity of the hitting cost in Assumption 3.1. We acknowledge that in certain cases, non-convex hitting costs may necessitate approximation in the solution at each step (Equation (5) in Algorithm 1). Nevertheless, since the primary focus of this paper is on jointly optimizing hitting, switching, and long-term fairness costs with sequentially revealed context, the problem of efficiently optimizing the per-step cost is orthogonal to our approach.
Point #3: In the empirical study presented in this paper, we select the squared $\ell_2$-norm as the reference function, which reduces the mirror descent method to standard gradient descent. In future revisions, we will make it clearer what Theorem 5.2 implies in this special case, where standard gradient descent is used. Although we did not adopt the negative-entropy reference function to handle action sets with a simplex structure, we believe it is still valuable to keep our analysis general. In other words, our algorithm design and analysis remain applicable even when the squared $\ell_2$-norm is not chosen as the reference function, such as in real-world scenarios involving simplex constraints, provided that the assumptions are satisfied. Maintaining this generality introduces additional challenges in the theoretical analysis, but we believe it is worthwhile for the community, as it enables application to settings with inherent simplex constraints.
This paper introduces FairOBD, an online algorithm designed to address the challenge of incorporating both long-term fairness and smoothness (via switching costs) in online convex optimization (OCO). The work formalizes a new problem setting that jointly penalizes hitting costs, action changes, and fairness violations over time. The approach decomposes the long-term fairness regularizer using auxiliary variables and dynamically updates dual multipliers through mirror descent. Theoretical analysis demonstrates an asymptotic competitive ratio guarantee against a constrained offline oracle, while empirical experiments on AI inference workload balancing across real data center traces show meaningful improvements in both fairness and overall cost compared to established baselines.
Strengths and Weaknesses
Strengths:
-
The paper tackles an important gap in the literature by studying online optimization with both switching (smoothness) costs and long-term fairness, providing a model relevant for a variety of impactful applications, such as energy-efficient and socially equitable resource allocation.
-
The introduction of FairOBD represents a meaningful algorithmic contribution. It reformulates long-term fairness regularizer as long-term constraint, which is then addressed through the use of auxiliary variables and dual updates via mirror descent. The theoretical analysis is also carefully done, clearly establishing the competitive ratio and explicitly accounting for the complexities introduced by switching costs and fairness costs.
-
The paper validates FairOBD on a realistic setting: trace-driven experiments for AI inference load balancing across several Microsoft data centers using real-world traces. The results demonstrate that FairOBD achieves improved fairness and overall cost compared to five baselines, including the offline optimum and established algorithms.
Weaknesses:
-
The transformation from long-term fairness regularizer to long-term fairness constraints in equation (3) is interesting. However, the subsequent treatment largely follows standard techniques, combining a conventional primal-dual framework with Online Balanced Descent [10,16].
-
The model assumes convexity of the fairness cost and squared $\ell_2$-norm switching costs. While justified by the need for theoretical tractability, this restricts the applicability of the results. Although the limitations section briefly mentions that relaxing the convexity assumption or generalizing the form of the switching costs is left for future work, a more concrete discussion of how the algorithm might be adapted in such cases would significantly strengthen the contribution.
-
Some relevant works [Li et al., 2025; Chen et al., 2024; Ma et al., 2024] are missing and should be cited.
[Li et al., 2025] Learning for Sustainable Online Scheduling with Competitive Fairness Guarantees. [Chen et al., 2024] Network Revenue Management with Demand Learning and Fair Resource-Consumption Balancing. [Ma et al., 2024] Optimal Regularized Online Allocation by Adaptive Re-Solving.
Questions
Have you considered your problem in the stochastic setting where $\{(f_t, A_t)\}_t$ are i.i.d.? Would it be possible to achieve sublinear regret in this case?
Limitations
Yes
Final Justification
Having reviewed the rebuttal, I believe the paper is suitable for acceptance at NeurIPS.
Formatting Issues
None.
We sincerely appreciate the reviewer's thorough review and the valuable points raised. We are pleased to offer clarifications regarding our technical contributions.
Technical Contribution for Long-term Regularizer:
Though decoupling the long-term fairness cost into a sequence of online costs and optimizing the primal and dual variables with online mirror descent has been considered, our algorithm is significantly different from the existing works due to the switching costs. Specifically, for the cost objectives apart from the long-term fairness cost, existing methods (e.g. [6], [7]) require them to be temporally independent.
This is a key requirement for the weak duality and the existing proof technique (e.g. [6], [7]), as the inequalities in the proof heavily rely on the optimal solution to a temporally-independent modified objective function at each time slot.
In contrast, our objective after the fairness cost decomposition is significantly more challenging than [6], as it does not meet this requirement due to the added switching cost. Our novel proof eliminates the need for such independence by focusing on frame-wise optimal action sequences, enabling comparison between different action sequences. Technically speaking, due to the added switching cost, our proof fundamentally differs from the general proofs in [6], which require the per-step cost to be independent of the previous actions once the current action is decided. On the other hand, the proof in [16] only allows a difference in the actions between the online algorithm and the optimal offline solution, but cannot accommodate another decision variable in the per-step objective. Simply combining the proofs for SOCO (e.g., [10], [16]) and fairness regularization (e.g., [6], [7]) does not yield provable bounds.
Moreover, we also make other technical contributions such as the hardness analysis by lower bounds on both regret and competitive ratio (Theorem 4.1 and Theorem 4.2), which are the first in the literature to our knowledge.
Relaxed Assumptions for Switching Cost: For the switching cost, though we consider the squared $\ell_2$-norm for ease of presentation, it is technically tractable to consider a more general distance function for the switching cost, such as the Bregman divergence. The Bregman divergence is more general than the squared $\ell_2$-norm and is also commonly considered [10, 16]. When the reference function of the Bregman divergence is the squared $\ell_2$-norm, the switching cost reduces to the squared $\ell_2$ distance.
By making some common assumptions on the reference function of the Bregman divergence (e.g., strong convexity and smoothness), our proof technique can be applied to handle this more general distance function. At a high level, our proof can still convert the action differences between our online algorithm and the offline optimal policy into an upper bound on the switching costs. Though technically tractable, due to the space limit, we will explore this direction in future work.
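For reference, the Bregman divergence induced by a reference function $h$ is

$$ D_h(x, y) \;=\; h(x) - h(y) - \langle \nabla h(y),\, x - y \rangle, $$

and choosing $h(x) = \frac{1}{2}\lVert x \rVert_2^2$ gives $D_h(x, y) = \frac{1}{2}\lVert x - y \rVert_2^2$, i.e., the squared $\ell_2$ switching cost arises as the special case mentioned above.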
Relaxed Assumptions for Fairness Cost: For the long-term fairness cost, depending on whether we are minimizing costs or maximizing rewards, convexity or concavity is a commonly adopted assumption in the existing literature (e.g., [6], [34]). Convexity of the fairness cost is commonly observed in many real-world applications, such as the min-max fairness consideration in cloud resource provisioning. This is intuitive because the reward or cost does not increase linearly with the actions. Instead, for rewards, following the law of diminishing marginal utility, the more resources we allocate, the smaller the marginal improvement becomes, which directly implies concavity. Similarly, health risks become more and more severe as the air pollution in a certain region increases.
In addition, this convexity is technically essential to our analysis and to other related studies (e.g., [6]) that deal with similar long-term fairness regularizers. At a high level, we decouple the long-term fairness cost in Eq. (1) into the temporally decoupled costs in Eq. (3) using an auxiliary variable. During this conversion, it is important to ensure that the optimal solutions of Eq. (3) and Eq. (1) are identical. If the fairness function is not convex, then when we substitute the optimal solution of Eq. (1) into Eq. (3), there may exist sequences of the auxiliary variable that further minimize the last term of Eq. (3). By rebalancing the fairness, hitting, and switching costs, the optimal solution of Eq. (3) may then differ from that of Eq. (1), which invalidates this essential conversion step. To ensure that no sequence can further minimize the last term in Eq. (3), we need to guarantee that the average of the function values is always at least the function of the average (Jensen's inequality), which directly corresponds to the convexity property. To sum up, convexity of the fairness cost is fundamental to our design as well as to other related studies (e.g., [6]) that deal with similar fairness regularizers.
The I.I.D. Setting: In the stochastic setting where the contexts are sampled from an i.i.d. distribution, it is interesting to analyze the performance of an online algorithm. When there is only a long-term cost without the switching cost, it is generally possible to obtain sublinear regret. As Theorem 1 in [6] implies, by carefully choosing the learning rate, the regret can be made sublinear. However, as we discussed above, their technique requires the objective functions to be temporally independent. This fundamental limitation restricts their applicability to our problem setting. On the other hand, when there is only a switching cost without the long-term cost, additional assumptions or considerations are required to achieve sublinear regret. For instance, one needs to consider static benchmarks, where the online algorithm can choose any action but the offline optimum can only choose a static action across the whole horizon. In addition to the static benchmark, another relaxation is constraining the total distance between subsequent optimal actions, which is also known as the path length in [16]. To sum up, when combining the switching cost and the long-term cost altogether, it is challenging to achieve sublinear regret even in the i.i.d. setting, and further assumptions are required to obtain meaningful results.
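For reference, the path length of an offline optimal action sequence $x_1^*, \dots, x_T^*$ is commonly defined as

$$ P_T \;=\; \sum_{t=2}^{T} \lVert x_t^* - x_{t-1}^* \rVert, $$

and such relaxations bound the regret in terms of $P_T$ rather than against a fully unconstrained comparator.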
Related works: We thank the reviewer for pointing out the additional related works and will add them to the related work section in our revision.
I appreciate the authors’ responses and efforts in the rebuttal. My concern is that considering the squared norm switching costs somewhat simplifies the analysis. Actually, the analysis for squared norm switching costs or Bregman divergence is largely similar. I would appreciate it if the current analysis could be extended to the non-squared forms of switching costs, which are more challenging but remain an interesting direction for future work (I understand that this is beyond the current scope of the paper). So I tend to maintain my original score.
We sincerely appreciate the reviewer's efforts in evaluating our work and providing such insightful feedback.
While we consider the squared norm as the switching cost, our proof technique can actually be extended to switching costs beyond the Bregman divergence. In particular, our proof technique still applies, after modification, if the (possibly non-squared) switching cost is convex and smooth. Our current squared-norm switching cost satisfies this convexity and smoothness condition. A high-level intuition is that, under the convexity and smoothness condition, the switching cost satisfies a generalized triangle inequality, enabling us to bound the switching-cost difference given the action-sequence difference. In our revision, we will add details and show how to extend our proof to convex and smooth switching costs.
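As a concrete instance for the squared $\ell_2$ switching cost: for any $x, y, z$,

$$ \lVert x - z \rVert_2^2 \;\le\; 2\lVert x - y \rVert_2^2 \;+\; 2\lVert y - z \rVert_2^2, $$

which is the kind of generalized triangle inequality (with constants depending on the smoothness parameter in the general convex-and-smooth case) that converts action-sequence differences into switching-cost bounds.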
We fully agree that exploring more general switching costs is a challenging and interesting direction, and hope that our work can inspire the research community.
This paper presents FairOBD, a novel algorithm addressing online convex optimization with both switching and long-term fairness costs, motivated by compelling applications such as fair and efficient AI resource allocation. The paper's contributions are significant, providing the first rigorous hardness results, a meaningful new constrained offline benchmark, and competitive empirical validation. Reviewers agree the problem is important and the work is technically solid, especially in extending the theoretical frontier and highlighting practical challenges in unifying fairness and efficiency. However, weak points remain: key steps leverage standard primal-dual and mirror descent techniques, and the theoretical analysis does not tightly close the gap between upper and lower bounds, instead using different offline benchmarks for each, which some reviewers consider a major limitation. Additionally, open questions persist about the necessity of combining efficiency and fairness cost objectives, and about practical tractability for non-convex settings. While not all reviewers are fully satisfied with the current form, the consensus supports a borderline accept: the work makes clear progress, despite technical incompleteness, and is likely to inspire follow-up work on tackling the identified limitations.