Maximizing the Value of Predictions in Control: Accuracy Is Not Enough
We study the value of stochastic predictions in online optimal control with random disturbances.
Abstract
Reviews and Discussion
This paper considers a control problem where we have some external way to obtain predictions of the disturbances. The paper then defines predictive power as the improvement in control cost one gets by using the predictions vs. ignoring them. The authors make the important observation that this goal is quite different from accuracy! They then derive closed-form expressions for the simple LQR case. The paper also discusses promising ideas for extension beyond the basic LQR case.
Strengths and Weaknesses
Strengths:
- Written well and motivates the problem nicely
- Timely problem: value of predictions for various downstream tasks is being actively studied as ML pervades all areas
- High level message that prediction power and accuracy are not the same will resonate with non-theory folks as well
Weakness:
- More connections with the predict-then-optimize framework, where similar issues arise, would have been welcome
Questions
Can you comment on the connection with the predict-then-optimize literature? There, a similar distinction between the value of a prediction for the downstream task and the usual measures of prediction accuracy is emphasized.
Limitations
yes
Justification for Final Rating
The authors agreed to include a bit more discussion about comparisons to work on predict-then-optimize aka decision-focused learning problems. I continue to think this is a good paper and maintain my score.
Formatting Concerns
no
Thank you for your comments! Please find our response below.
- Connection with predict-then-optimize literature.
We would like to point the reviewer to our section on Related Literature, where we specifically highlighted the connection of our work with the decision-focused learning (DFL) literature, which includes the "predict-then-optimize" setting. (We indeed cite the seminal "Smart Predict, then Optimize" (Elmachtoub and Grigas, 2022) as reference [14].) In our revision, we will clarify that "decision-focused learning" and "predict-then-optimize" are different terms that refer to the same idea.
As you mentioned, a key insight is that minimizing a standard prediction error (e.g., MSE) may lead to suboptimal downstream decisions. Thus, the loss function for training the predictor needs to depend on the downstream optimization task. This is similar to the insight behind our Example 3.3.
A major difference between our work and typical predict-then-optimize literature, though, is that our controller must decide control actions sequentially in a dynamical system, where the predictions are revealed in an online process. The controller can also build its knowledge from past disturbances and predictions to infer future disturbances or the optimal control action. Therefore, our online setting presents unique challenges compared with making a one-shot decision for a classic optimization problem. For this reason, we also focus on the evaluation of a given predictor and leave an end-to-end optimization of the predictor parameter as future work (see Section 5).
We will discuss this connection in the revision.
Thanks for that clarification. The predict-then-optimize literature is related and should be discussed, especially in light of your comment above that, despite a high-level similarity, there is a difference due to the sequential nature of the control task. This means that actions roll out sequentially and information is also collected over time, which is not present in the usual predict-then-optimize literature. It is also good to point out that "decision-focused learning" refers to the same idea.
Thank you for your valuable feedback. In the revision, we will improve the Related Literature part so that it emphasizes both the similarities and the key distinctions between our work and the predict-then-optimize literature.
This paper develops a mathematically grounded framework for quantifying the value of predictions in control, building on and extending the notion of prediction power introduced in [1]. The authors motivate the need for such a measure by showing that standard notions of prediction accuracy do not fully capture the impact of predictions on control performance. In particular, they demonstrate that the structure of the predictor can influence the optimal policy even when its accuracy is fixed. The paper is technically dense but includes illustrative examples that clarify the main contributions. Overall, it provides an interesting perspective that seems overlooked in much of stochastic optimal control: the idea that accuracy is not sufficient on its own may be relevant beyond the scope of this work.
Strengths and Weaknesses
Strengths:
- The paper is generally well motivated. Despite its technical depth, it offers insights that extend beyond the formal mathematics and are relevant to a broader control audience.
- The mathematical derivations are rigorous and clearly applied to illustrative examples, which help ground the theoretical claims.
Weaknesses:
- While the motivation is clear, the paper may be difficult to follow for readers without a solid mathematical background. Some sections could benefit from additional qualitative explanations or intuition to complement the technical derivations.
- Condition 4.1 seems central to the theoretical results, but its applicability is not fully characterized. While it holds in standard LQR settings, the paper does not discuss whether and how a nonzero matrix can be found in more general cases.
- The paper focuses on fully observed state settings. It would be interesting to discuss how the results might extend to partially observed settings, where the policy only has access to a noisy or compressed observation of , as is common in stochastic control.
- The paper does not provide a method to estimate or optimize . While this is acknowledged in the text, it might limit the immediate applicability of the results.
Questions
- Does the agent (i.e., the policy) have access to the system dynamics ? If so, and if the state and past inputs are observable, then under invertible dynamics the agent may be able to reconstruct the disturbance . Is this possibility excluded by assumption, or is there something I'm missing in how the disturbance remains unpredictable?
- Are there any further assumptions on the predictors, particularly regarding their bias or statistical relationship to the true disturbances ? For instance, are predictors assumed to be unbiased, or can they be arbitrarily structured?
- Could the framework be extended to partially observed settings, where the policy does not observe the full state but only a noisy or filtered version (e.g., through a Kalman filter)? This would bring the framework closer to standard formulations in stochastic optimal control.
- How does the analysis generalize when disturbances and/or predictions are temporally correlated? Would the proposed definition of prediction power and its theoretical properties still hold?
- It is possible that a less accurate predictor (in terms of mean squared error) could lead to better control performance, for instance by reducing the impact of noisy or biased predictions. Would such a case be compatible with the proposed framework, or does the definition of prediction power implicitly assume that more accurate predictions always improve performance?
Limitations
The authors have addressed the limitations.
Justification for Final Rating
The main concerns I raised in my initial review have been addressed thoroughly and satisfactorily in the rebuttal and subsequent discussion. The authors provided clear clarifications on the points I found most critical, and their explanations strengthened my understanding and appreciation of the work. I therefore maintain my positive assessment and confirm my recommendation for acceptance, as I believe the manuscript makes a solid and valuable contribution to the field.
Formatting Concerns
No formatting concerns.
Thank you for your comments! Please find our response below.
- Adding more qualitative explanations or intuitions.
Thank you for the suggestion! We will add more explanations in the revision based on the feedback from the review process.
- Applicability of Condition 4.1 beyond LQR
Thank you for pointing this out! Since Condition 4.1 is about the Q and C functions, specific assumptions on the stage cost and the dynamics are required to verify it. We want to clarify that Condition 4.1 is more general than the LQR setting. For example, we show it holds under Assumption 4.5 in Section 4.1, which allows non-quadratic cost functions. We will add a clarification in the revision.
- Access to the dynamics and reconstruction of the past disturbances.
Yes, we assume the controller has full knowledge of the dynamical function , while the disturbance captures the uncertainties in the dynamics. With linear dynamics discussed in Sections 3 and 4, the controller can reconstruct using the transition tuple . This does not affect our prediction power because we include past disturbances (until time step ) in the history , which is available to the controller at time step .
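To make this point concrete, here is a minimal numerical sketch of disturbance reconstruction under linear dynamics. The matrices and trajectory values are purely illustrative assumptions (not from the paper); the sketch only shows that, with full knowledge of the dynamics, the transition tuple pins down the past disturbance exactly.

```python
import numpy as np

# Hypothetical linear system x_{t+1} = A x_t + B u_t + w_t.
# A and B are illustrative, not taken from the paper.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])

rng = np.random.default_rng(0)
x_t = rng.standard_normal(2)
u_t = rng.standard_normal(1)
w_t = rng.standard_normal(2)          # true disturbance (unknown to controller)
x_next = A @ x_t + B @ u_t + w_t      # observed next state

# Knowing (x_t, u_t, x_{t+1}) and the dynamics, the controller
# recovers the past disturbance exactly:
w_hat = x_next - A @ x_t - B @ u_t
assert np.allclose(w_hat, w_t)
```

This is why including past disturbances in the history is without loss of generality here: the controller can always reconstruct them after the fact.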
- Potential extensions to partially observed settings.
Thanks for pointing out this extension! We do not think there is a straightforward way to achieve this, due to the need to distinguish the disturbances in the system dynamics from the noises in the observation model; moreover, the functional form of the optimal policy is hard to define because it depends on the whole history. We will add this point to Section 5 in the revision.
- Clarification about the bias and the structure of predictions.
Our setting does not assume the prediction to have any specific form of relationship with future disturbances. Nor does it need to be an unbiased estimator of or any future disturbances.
We note that our Examples 3.3 and 3.4 may leave the false impression that must be some structured function (e.g., linear function) of . However, our intention is to show how a general phenomenon like the misalignment between accuracy and the prediction power may occur with simple constructions. We will add a clarification in the revision.
- Applicability to temporally correlated disturbances and predictions.
Our definition of the prediction power and the theoretical results (except Theorem 4.8 in Section 4.1) allow the predictions and the disturbances to have temporal correlations across different time steps. When the cost function is non-quadratic, we require stronger joint-Gaussian and independence assumptions (see Assumption 4.7) to establish Condition 4.2. However, we believe it is possible to relax them and provide a justification in Appendix E.
- Intuitions about why more accurate predictions might not improve the prediction power.
First, we want to clarify that this is not because the more accurate prediction is noisy or biased. To see this, note that any prediction needs to pass through its corresponding optimal policy to decide the action. If needed, the function can help eliminate the bias and reduce the variance.
Perhaps a simple example that captures the intuition by analogy is computing . If we replace with , the variance of the input increases, but remains the same. Returning to our setting, the prediction power will not change if we replace the prediction with , because the optimal control policy changes correspondingly.
Second, we would like to provide an intuitive explanation of the idea behind Example 3.3: The key takeaway is that MSEs of estimating each individual entry of the multi-dimensional disturbance are insufficient to decide the prediction power. The off-diagonal entries of the covariance matrix also matter, and their impact on the prediction power depends on the specific dynamics . Therefore, a general accuracy metric (like MSE) that is unaware of the costs/dynamics can be misaligned with the prediction power.
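To make this intuition concrete, here is a small numerical sketch (the cost direction `a` and the error construction are illustrative assumptions, not the exact construction of Example 3.3): two predictors with identical entrywise MSE yield very different one-step control costs, because only the cross-correlation of their errors differs.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Two hypothetical predictors of a 2-d disturbance w.
# Both have the SAME per-entry MSE, but differently correlated errors.
w = rng.standard_normal((n, 2))
e = rng.standard_normal(n)
pred_corr = w + np.stack([e, e], axis=1)    # errors fully correlated
pred_anti = w + np.stack([e, -e], axis=1)   # errors anti-correlated

mse_corr = ((pred_corr - w) ** 2).mean(axis=0)   # ≈ (1, 1)
mse_anti = ((pred_anti - w) ** 2).mean(axis=0)   # ≈ (1, 1), identical accuracy

# A toy one-step cost in which the dynamics sum the disturbance entries,
# so the residual a^T (w - pred) is what hurts the controller.
a = np.array([1.0, 1.0])
cost_corr = (((w - pred_corr) @ a) ** 2).mean()  # ≈ 4: errors add up
cost_anti = (((w - pred_anti) @ a) ** 2).mean()  # = 0: errors cancel
```

A cost-aware direction `a` aligned differently with the dynamics would reverse which predictor wins, which is exactly why a dynamics-agnostic metric like MSE cannot determine prediction power.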
Please let us know if the above clarifications help, and we are happy to add them in the revision.
Thank you for the clarifications and responses. The main points I raised in the review have been addressed satisfactorily. I therefore confirm my rating and support acceptance.
Thank you for confirming that our responses addressed your concerns and for supporting the paper’s acceptance. We value your constructive feedback and will integrate it into the revision.
This paper introduces a new framework for analyzing stochastic online control with predictions. Because prior literature only considered pessimistic settings for the potentially stochastic predictions, the main contribution of this paper is to analyze a much more general class of parametric predictions. With this new framework, the authors introduced the notion of "prediction power" and later showed that this is stronger than a simple measure of the accuracy of the predictions.
This paper discussed two applications of the new framework:
- It first considered linear time-varying dynamics with quadratic costs. Given the prediction framework, the resulting solution (at a high level) is a blend of the Riccati equation and the Bellman equation.
- It then discussed a general setting where the Q functions and optimal policies satisfy certain mild assumptions. Examples were given to justify these assumptions.
Strengths and Weaknesses
I find this paper to be well-written and I especially appreciate the level of rigor displayed in Section 2. The theoretical contributions are also solid that they convincingly extended the analysis on the significance of predictions in online control. I really appreciate that the settings of this paper are both very general and well thought-out that they do not come across as sounding artificial.
However, the notation of this paper is quite dense. In particular, the usage of uncommon Greek letters (e.g. and ) may throw many readers off. Also, the distinctions between and could be clarified.
Question: In equation (3), shouldn't and also depend on the particular realization ? Or did you mean to denote them as random functions whose randomness comes from ?
Suggestions:
- I feel the MPC formulation around equation (7) is quite interesting, but this part needs more discussion for the readers to understand the principles behind this formulation. Maybe you should consider moving parts of Appendix A.3 to the main body.
- My feeling is that the proof techniques used in this paper are on the more standard side, but if there are any clever tricks being employed, I feel they should be highlighted in a proof sketch.
- I feel that given examples and results are too much on the abstract side and I would like to see more concrete applications. In particular, I would like to see how the baseline prediction and the prediction parameter space are defined for a relevant problem.
Minor concerns:
- On line 209, I believe that should be instead?
- On line 222, there is a naming collision between the sequence of (disturbance, prediction) pairs and the identity matrix.
Overall, I think the paper makes a solid contribution to the literature. However, I am a little hesitant to give a higher rating right now because the paper is very dense in notation but somewhat lacking in practicality. It may be difficult for readers to fully appreciate the results. I will recommend a borderline accept right now, and reserve the possibility of raising my rating if the authors can adequately address my concern and justify the practical relevance of their work with more concrete examples.
Questions
see above
Limitations
see above
Justification for Final Rating
The overall technical contribution of this work is solid, and the authors' rebuttal clarified most of my questions. But I am still afraid that the notation of the paper is inherently too cumbersome to be accessible to a wider audience. Therefore, I will maintain my original rating while being supportive of this paper's acceptance.
Formatting Concerns
I have no concerns regarding the formatting.
Thank you for your comments! Please find our response below.
- The distinctions between and .
The former is the realization of the problem instance, which contains all disturbances and predictions for all time steps; the latter is the realization of the history , which contains all past disturbances and predictions observed until an intermediate time step . We will add a combined explanation of both in the revision, because they are used frequently in our paper.
- How and functions in (3) depend on the particular realization .
We define our instance-dependent function and the cost-to-go function based on the problem instance . Therefore, the and values are determined by the specific realizations of (see Eq. (3)). Since the history realization is a part of the problem instance realization , our and functions depend on but are not decided by . To recover the classic definition of and cost-to-go functions, one can take the conditional expectations of them given the history .
- Moving parts of Appendix A.3 to the main body.
Thanks for providing this suggestion! We will move (21) and some discussions from Appendix A.3 to the main body to further clarify what is “planning according to the .”
- Highlighting technical novelties in a proof sketch.
Thank you for providing this suggestion! Here are some technical novelties that we would like to highlight in the revision. First, in Theorem 4.3, we decompose the task of bounding the prediction power over the whole horizon into the verification of two conditions at each intermediate time step . Our proof takes inspiration from the performance difference lemma, but we adopt novel methods to bound the difference between cost-to-go functions with the conditional covariance of the predictive policy's actions. Second, to show Lemma 4.6 and Equation (14), we prove and use properties of the infimal convolution (see (35) in Appendix C.1) to preserve strong convexity, smoothness, and the covariance of the input functions or variables. Therefore, we can verify Conditions 4.1 and 4.2 through backward induction, while the optimal policy does not have a closed-form solution.
- Definitions of the baseline prediction and the prediction parameter space in a relevant problem.
In the adaptive video streaming problem, a controller determines which quality level of a video segment to download, aiming to optimize the user's experience. The predictions are about the future network bandwidth within a lookahead window. The baseline () could be "no prediction" (e.g., some classic methods are not planning-based). Other candidates () can be a classic exponentially weighted moving average predictor, an ML-based predictor trained on a specific dataset, and other alternative sources of predictions.
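For illustration, here is a minimal sketch of the classic exponentially weighted moving average (EWMA) bandwidth predictor mentioned as one candidate. The smoothing parameter and the bandwidth samples are hypothetical values chosen for the example, not figures from the paper.

```python
def ewma_predictor(samples, alpha=0.3):
    """One-step-ahead EWMA bandwidth predictions.

    Returns a list where entry t is the prediction made BEFORE
    observing sample t (None for the very first step, when no
    history is available). alpha is an illustrative smoothing weight.
    """
    preds = []
    estimate = None
    for s in samples:
        preds.append(estimate)  # predict before seeing the new sample
        estimate = s if estimate is None else alpha * s + (1 - alpha) * estimate
    return preds

bandwidth = [5.0, 4.0, 6.0, 5.5]  # hypothetical Mbps measurements
print(ewma_predictor(bandwidth))
```

In the paper's framework, such a predictor would be one point in the parameter space (indexed, e.g., by its smoothing weight), to be compared against the no-prediction baseline via prediction power rather than via its raw forecasting error.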
- On Line 209, should be .
Sorry for the confusion. Here, we refer to using the predictions to compute , which is defined as the conditional expectation of future disturbances given the current history (see Line 176). We will add a pointer to remind the readers in the revision.
- Naming collision between the sequence of (disturbance, prediction) pairs and the identity matrix on Line 222.
Thanks for catching this! We will change the notation of the identity matrix to and add an explanation in the revision.
I appreciate the authors' clarifications. I think the biggest downside of this paper is the notational burden. I am afraid that most readers would get tripped up by the zoo of variables. So, while this paper is good and I would vouch for its acceptance, I think it may not be award-worthy. Therefore, I will keep my current score.
Thank you for your valuable feedback and for supporting the paper's acceptance. We will work on improving clarity to make the presentation more accessible to a broader audience.
The paper studies the concept of prediction power and extends the analysis of previous works by studying time-varying predictions with more general costs. The motivation is clear, and the paper introduces clear concepts. The analysis of the stated results follows from well-known counterparts and offers good intuition in understanding the forms and implications of the results. The paper addresses some general cases where convex costs can be applied with stability conditions on the linear time-varying system. The applicability of the results and extensions for relaxed conditions are briefly discussed at the end and in the appendix.
Strengths and Weaknesses
Strengths: Overall, the paper is well-constructed, and the concepts are clear. The authors propose a concept extending the existing literature and analyze it under reasonable settings with a thorough analysis. The results and the theoretical analysis offer good intuition and follow from well-known counterparts in the literature. The simple examples for the different settings and counterintuitive concepts help the reader understand the situation better, which strengthens the paper overall.
Clarity: The language of the paper is good and clear. Overall, the paper is well constructed, but some of the arguments are not very easy to follow due to the notation overhead and a few definitions that are not explicitly given. Please consider the notational clarity of the paper and especially the time indices of the time-varying arguments. Some of the concepts are not well-defined, such as the predictor class and the setting of the disturbances. Further clarity about these will help the reader.
Significance: The results offer insights into the use of the power of predictions in the time-varying setting with general costs. The theoretical results are all somewhat expected, and most examples are given for time-invariant cases with quadratic costs (even linear), even though the paper claims its novelty in time-varying cases. Showing simulation examples for the nonlinear or at least linear time-varying cases with general costs would certainly improve the significance of the paper. Limitations of the work are not well outlined.
Originality: The paper is largely inspired by the previous literature in [1]. The formal treatment of the power of predictions and the setting of the paper are original. However, many of the developments seem incremental with respect to the existing works in the literature and a reorganization of results rather than new ones. The authors should better highlight the originality beyond the power definition and time-varying adaptations of existing results.
Questions
- The concepts of predictions in this work are presented rather general; however, many recent works are studying online prediction algorithms, also touching on the importance of multi-step predictions for better control performance. Can the authors please clarify these points, especially in the introduction, and relate their work to the existing literature studying online prediction algorithms? Can you adapt existing prediction methods into your framework? Can you give examples from the existing literature in the setting that you study?
- The connection between the disturbance W and the realizations is not precise. It seems that w_t is the sampled version of W_t at each time instance, as the shorthands below (6) suggest; however, the notation sometimes seems inconsistent. What sort of structure and assumptions do you need on W or the underlying distribution? Is W allowed to have temporal cross-correlation? Assumption 2.2 seems to circumvent most of the issues related to time correlation, but do you need any additional assumptions on the support of the distribution, martingale structures, etc.?
- Do you implicitly assume a distribution over W in Assumption 4.7? The distributions are not explicitly stated in the paper and could use a careful treatment to clarify the conditions throughout.
- The parametric prediction vector is unclear in the way the authors present it. Based on the theta, the parameter of a prediction algorithm, we will have imperfect predictions as illustrated in the examples. In this case, is the prediction power guaranteed to be positive? Your zero-theta case seems clear in the linear case, but this is not clear for general V, as the form of V and requirements on the predictor are not well addressed. Does this have any further implications? Please clarify.
- Example 3.3 is interesting; however, the construction depends on the specific choice of the estimating function V. Can you make claims about what the same example looks like for standard estimators such as RLS or similar? Do you think the issue stems also from the lack of cumulative information in the estimator structure, in other words, a lack of memory in the estimator? Please comment.
- The use of M-GAPS in the example is somewhat distracting. Could you provide at least some intuition on how the algorithm works for the reader to better understand the example? The notation of V(1) and V(2) is confusing in the example; consider improving it for readability.
- What assumptions are needed on h for Condition 4.1? Please elaborate.
- The conditions in Assumption 4.5 are rather restrictive and somewhat unclear. Do you require the \mu_A and \mu_B to be positive? Do you think you can exchange the strong convexity and smoothness assumptions with growth rate assumptions that may be less restrictive?
- The first step in A.2 (16) is not clear, please clarify.
Limitations
There is a brief discussion of limitations. However, the applicability of the results to time-varying examples is not shown, and the authors do not provide clear claims about settings that satisfy their assumptions.
Justification for Final Rating
The authors provided good responses to the points I raised, and the suggested revisions will improve the paper. The authors provide satisfactory clarifications and explanations of the unclear points in the text. They also consider potential future directions and discuss their framework in the context of common prediction methods. I improve my initial score accordingly.
Formatting Concerns
No
Thank you for your comments! Please find our response below.
- Key differences between our work and the previous literature in [1].
We respectfully disagree with the reviewer's point regarding the originality and would like to highlight several key novelties of our work in comparison to [1]. First, the misalignment issue between prediction power and accuracy (Section 3.1) would not exist under the exact-prediction setting in [1]. Second, discussing the connections with online policy optimization (Section 3.2) only makes sense in our setting, because the optimal predictive policy in [1] is fixed and known even when the distribution of the disturbances is unknown. Third, we extend our proof framework beyond the LQR setting in Section 4, while the proof techniques in [1] rely on the closed-form solutions of LQR. To address the challenge, we develop a novel proof framework in Theorem 4.3, which reduces the problem of comparing two policies over the whole horizon to verifying conditions about each individual policy at every intermediate time step . Further, we prove and use properties of the infimal convolution (see (35) in Appendix C.1) to preserve strong convexity, smoothness, and the covariance through backward induction, which is critical for bounding the prediction power when closed-form solutions do not exist.
- Relationship with online prediction algorithms and how to adapt them.
Online prediction algorithms study methods for making predictions. In contrast, our work focuses on the process of using various forms of predictions to decide control actions. For example, uncertainty quantification (UQ) has recently received significant attention in the field of online prediction algorithms, providing not only point estimates but also confidence intervals. To use UQ as the predictor in our framework, the prediction should be the output of UQ at time , including both the point estimate and the confidence interval.
- Clarifications about the distribution of , the temporal cross-correlation, and the underlying assumptions.
We would like to clarify that we never assume is an independent distribution. Instead, all disturbances and predictions (i.e., the problem instance ) follow a joint distribution. In other words, conditioned on the realizations of past disturbances and predictions, the distribution of may change. We need Assumption 2.2 about , which means its sampling cannot be affected by the controller’s states/actions. For example, the weather and its forecasts do not change because a person goes outside. Under Assumption 2.2, the disturbances and the predictions at different time steps can have correlations.
- Implicit assumptions about the distribution of in Assumption 4.7.
There are no implicit assumptions. We explicitly state the distribution of the pair at each time step in Assumption 4.7. Since different time steps are independent, you can derive the distribution of from Assumption 4.7, which is a joint Gaussian distribution without temporal correlations.
- Clarification about the sign of the prediction power and the parametric form of the predictions.
When the baseline predictor is no prediction, the prediction power is guaranteed to be non-negative. This is because the optimal policy for the no-prediction case is also a predictive policy when predictions are available, i.e., the controller can simply ignore the predictions. But the prediction power could be zero, for example, when the predictions are independent of the disturbances. We will clarify this in the revision.
We do not require the prediction to have any specific form as a function of . Generally speaking, is primarily used for distinguishing different predictor candidates. Additional assumptions on the form of may be necessary for future directions, such as optimizing the prediction power as a function of (Section 5).
- The specific choice of function in Example 3.3.
The intuition behind Example 3.3 does not require a very specific choice of the function : The underlying idea is that the MSEs of estimating each individual entry of the multi-dimensional disturbance are insufficient to decide the prediction power. The off-diagonal entries of the covariance matrix also matter, and their impact on the prediction power depends on the specific dynamics . Therefore, a general accuracy metric (like MSE) that is unaware of can be misaligned with the prediction power. We hope this intuition is convincing, and we will add more discussion in the revision.
- Applying estimators (with memory) on top of the provided prediction vector .
Under our definition, the prediction power does not increase if one replaces the original prediction by any function of the history , because this additional step cannot increase the information available at time step . Indeed, since we do not require any particular form of the decision-maker, the predictive policy class (Definition 2.3) is general enough to allow using filters like RLS as an intermediate step for deciding the control actions.
- Intuition behind M-GAPS and the notations in Example 3.4.
Intuitively, M-GAPS works by taking the gradient of the cost function with respect to the policy parameters at every time step, and it takes gradient steps to update , allowing it to converge towards the optimal policy parameters under certain assumptions. Our goal is to highlight the connection between prediction power and online policy optimization, so the specific online policy optimization algorithms and their proofs are not the primary focus here. We will add more intuition in the revision.
As we mentioned before, the prediction parameters “1” and “2” here are used solely to distinguish between different predictors. We will clarify this in the revision.
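To give a bit more intuition about the per-step gradient idea described above, here is a toy sketch in the same spirit. This is a simplified illustration, not the actual M-GAPS algorithm: the scalar dynamics, the policy parameterization (theta scales how much the policy trusts the prediction), and the learning rate are all our own assumptions.

```python
import numpy as np

# Toy online policy optimization: at each step, take a gradient step on
# the realized stage cost with respect to the policy parameter theta.
# Scalar dynamics x' = x + u + w; policy u = -x - theta * pred.
# With a perfect predictor (pred = w), the optimal theta is 1.
rng = np.random.default_rng(2)
theta, lr, x = 0.0, 0.05, 0.0
for t in range(2000):
    w = rng.standard_normal()
    pred = w                          # perfect prediction of w_t (assumption)
    u = -x - theta * pred             # cancel the state, lean on the prediction
    x_next = x + u + w                # residual state is w * (1 - theta)
    grad = 2.0 * x_next * (-pred)     # d/dtheta of the stage cost x_next**2
    theta -= lr * grad                # per-step gradient update of the policy
    x = x_next
# theta converges toward 1, i.e., fully trusting the perfect predictor
```

The takeaway mirrors the discussion above: the update uses only per-step gradient information, yet the policy parameter converges toward the one that extracts the full prediction power of the predictor.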
- Assumptions on for Condition 4.1 to hold.
Condition 4.1 does not impose explicit assumptions on the stage cost function . However, since and functions depend on , structural assumptions about are necessary to establish Condition 4.1. An example is Assumption 4.5, which assumes that is both strongly convex and smooth for every time step .
- The signs of and in Assumption 4.5.
Thanks for pointing this out! does not need to be strictly positive, but does. This is because the prediction power can be zero if equals zero (see Theorem 4.8). We will clarify this in the revision.
- Exchanging the strong convexity and smoothness assumptions with more general growth rate assumptions.
Thanks for suggesting this direction of further generalization! We believe the quadratic growth/positive-definite covariance matrix conditions are natural generalizations of the LQR setting in Section 3. Following the intuition of Figure 3, we believe that further generalization of the growth rate condition in Condition 4.1 is possible if one can find the corresponding "variance" condition (i.e., the counterpart of Condition 4.2). This is an interesting future direction.
- Clarification about the first step in Appendix A.2 (16).
Thank you for catching this! The equation holds because the cost-to-go function can be expressed as . Substituting the expression of into the expression of in Proposition 3.1 gives (16) in Appendix A.2. We will add this clarification in the revision.
Thank you for the comments. I think a discussion about the use of predictors leveraging past information, and the implications of this for the prediction power, similar to your explanation here, would be interesting. I think adding explanations about integrating RLS or similar schemes under your framework would be valuable and could potentially increase the readership.
Clarifications about the notation and the use of theta are helpful. Carefully revising these and some of the notation overhead will certainly help the reader. My main points are addressed, and there is a clear pathway for revisions/clarifications. I will update my score accordingly.
Thank you for your thoughtful feedback. We will revise the paper to include a discussion on predictors that leverage past information, clarify the integration of RLS and similar schemes within our framework, and clarify the notations as suggested.
This paper studies the notion of prediction power for disturbances in online control, motivated by a demonstrated gap between prediction accuracy and actual utility for control. The authors develop a rigorous theoretical framework, establish tight upper and lower bounds on prediction power, and provide conditions under which predictions meaningfully improve control performance.
Reviewers agreed that the paper is technically solid and relevant to the NeurIPS community. Strengths include the clear conceptual advance, rigorous proofs, and timeliness of the problem. The main concerns centered on heavy notation and readability, limited concrete examples, and closeness to prior work. The rebuttal effectively clarified assumptions, emphasized applications (e.g., video streaming), and highlighted technical novelties, which increased reviewer confidence.
Overall, I recommend acceptance. The paper makes a strong theoretical contribution with potential impact in bridging prediction and control. The result is solid but not strong enough to merit a spotlight IMO.