Disentangling Linear Quadratic Control with Untrusted ML Predictions
Our work introduces DISC, a novel policy that merges online control with learning disentangled predictions, effectively optimizing performance in dynamic systems and maintaining robustness through competitive ratio guarantees.
Abstract

Reviews and Discussion
This paper presents a novel policy, DISC, designed to manage uncertain perturbations in dynamical systems by learning a confidence parameter online. The focus is on integrating predictions from machine learning (ML) tools, which are often unreliable, into linear quadratic control (LQC) frameworks. The key innovation of DISC is its ability to harness accurate predictions when available and mitigate the impact of erroneous forecasts, achieving a balance between consistency and robustness.
The paper begins by discussing the challenges of using ML predictions in control systems, particularly the issue of reliability due to factors like high model variability and out-of-distribution generalization issues. The authors propose a policy, λ-CON, which extends existing methods by adapting a vectorized confidence parameter for each latent variable. However, λ-CON has limitations in achieving optimal consistency and robustness. DISC addresses these limitations by dynamically learning the confidence parameter through online learning, ensuring better performance regardless of prediction accuracy.
Theoretical results provide competitive ratio bounds for DISC under both linear and general mixing functions, demonstrating its robustness and consistency. The paper also includes experiments in two real-world scenarios: drone navigation with mixed disturbances and voltage control in power grids. These experiments validate DISC's effectiveness, showing that it outperforms baseline policies in terms of cost and adaptability to rapid changes in the environment.
Strengths

- The introduction of a dynamic confidence parameter that adapts online is an important advancement in integrating ML predictions into control systems.
- The paper provides theoretical guarantees for DISC, offering competitive ratio bounds that highlight its robustness and consistency.
- The experimental validation in real-world scenarios, such as drone navigation and voltage control, demonstrates the practical relevance and effectiveness of DISC.
- DISC's ability to adapt to varying levels of prediction accuracy and environmental changes is a valuable feature for real-world applications.
Weaknesses

- The theoretical analysis relies on several assumptions, such as the continuity and invertibility of the mixing function, which may not hold in all real-world scenarios.
- While the experiments support the theoretical findings, the datasets used might not fully capture the diversity and complexity of real-world applications.
- The method may be complex to implement in practice, especially in real-time systems where computational resources are limited.
Questions

- How does DISC scale with the increase in the number of latent variables and the complexity of the system? Are there any practical limits to the number of latent components DISC can handle efficiently?
- What are the computational requirements for implementing DISC in real-time systems? Can the authors provide insights or benchmarks on the computational overhead introduced by the online learning of the confidence parameter?
- Have the authors considered extending DISC to handle nonlinear dynamical systems? If so, what challenges do they anticipate, and are there any preliminary results or insights they can share?
- Can the authors provide a more extensive comparison between DISC and other existing robust control methods? How does DISC's performance compare in terms of both consistency and robustness?
Limitations

- The study mainly focuses on specific applications like drone navigation and voltage control, and its applicability to other fields remains uncertain.
- The competitive ratio bounds depend on prediction errors, which might limit the policy's effectiveness in scenarios with high prediction inaccuracies.
Question 2. Implementation complexity:
We thank Reviewer 6dfr for this question and we argue that the additional computational overhead of DISC is minimal.
Like MPC, DISC requires the prediction of future disturbances at each time step for decision-making. Given this prediction, DISC's overhead primarily stems from two sources: the disentanglement algorithm and the online learning procedure for the confidence parameter $\lambda$. The latter mainly involves computing the gradient of the cost with respect to $\lambda$.
ICA theory requires that the latent dimension be smaller than the observation dimension for identification. The computational cost of disentanglement varies by algorithm but is generally efficient. For instance, in the setting of our experiments, the FastICA algorithm completes in under a second.
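As a concrete illustration of this disentanglement step (not the authors' exact pipeline; the dimensions, sources, and mixing matrix below are invented placeholders), scikit-learn's FastICA can separate linearly mixed latent perturbations in well under a second at this scale:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
T, k, n = 2000, 2, 4   # time steps, latent dim k, observation dim n (k < n)

# Two hypothetical non-Gaussian latent sources (e.g., wind speed, rain intensity)
S = np.column_stack([
    np.sign(np.sin(np.linspace(0.0, 40.0, T))),  # square-wave source
    rng.laplace(size=T),                         # heavy-tailed source
])
theta = rng.normal(size=(n, k))                  # unknown linear mixing matrix
W = S @ theta.T                                  # observed perturbations w_t = theta s_t

# Disentangle the observed perturbations into independent components
ica = FastICA(n_components=k, random_state=0)
S_hat = ica.fit_transform(W)                     # (T, k), recovered up to permutation/scale
```

ICA recovers the latent sources only up to permutation and scaling, which is why a component-wise confidence parameter (rather than a single scalar) is the natural object to learn.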
For the online learning procedure under linear mixing, the gradient computation involves several matrix multiplications, with matrix dimensions upper bounded by the observation dimension. For nonlinear mixing, one can employ a neural network to represent the mixing function and use inference and the Jacobian computation of this neural network to calculate the gradient. Both can be efficiently completed within seconds for observation dimensions typical in control tasks. In our experiments with linear mixing, the online learning procedure takes less than a second.
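To make the linear-mixing case concrete, the small self-contained check below (all matrices are random placeholders, not quantities from the paper) shows that the gradient of a quadratic form $\zeta^\top H \zeta$, with $\zeta$ linear in $\lambda$, reduces to a few matrix products, and verifies it against finite differences:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 3                      # observation and latent dimensions (placeholders)

H = rng.normal(size=(n, n))
H = H @ H.T                      # symmetric cost matrix, as in a quadratic cost gap
J = rng.normal(size=(n, k))      # Jacobian of zeta with respect to lambda
lam = rng.uniform(size=k)
zeta = J @ lam                   # zeta is linear in lambda under linear mixing

# Closed-form gradient of zeta^T H zeta: 2 J^T H zeta (uses symmetry of H)
grad = 2 * J.T @ H @ zeta

# Finite-difference verification of the closed form
def cost(l):
    z = J @ l
    return z @ H @ z

eps = 1e-6
fd = np.array([(cost(lam + eps * e) - cost(lam - eps * e)) / (2 * eps)
               for e in np.eye(k)])
assert np.allclose(grad, fd, atol=1e-4)
```

The dominant cost is a handful of matrix-vector products of size at most the observation dimension, which matches the claim that the per-step overhead is negligible.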
Overall, DISC's computational requirements remain very manageable for most practical applications.
Besides the responses in the formal rebuttal (to be revealed after the rebuttal phase), due to space limitations we include in this official comment our additional responses to the limitations mentioned at the end of the review:
Limitation 1: While our study focuses on applications like drone navigation and voltage control, the underlying principles and methods of DISC are designed to be broadly applicable. We acknowledge that real-world deployment may vary across domains; in future work, we plan to evaluate DISC on a broader range of datasets and more diverse scenarios to better understand its generalizability, and we are also exploring how the methodology can be adapted to other fields.
Limitation 2: We'd like to highlight that the dependency on the prediction error in the derived competitive ratio bound offers the first bound of this nature, grounded in a term that is a special function of the prediction error, without restrictive assumptions on the errors.
For reference, the competitive ratio bound is outlined informally as follows: the hidden term vanishes as the total number of steps $T$ increases; $k$ is the number of latent variables generating the perturbations; $w$ denotes the prediction window size; and each $\overline{\varepsilon}(i)$ (for $i = 1, \ldots, k$) denotes the prediction error corresponding to each latent component.
The last term in this result depends on the prediction error, but it highlights the desired performance guarantee:
Consistency: When a component-wise prediction error $\overline{\varepsilon}(i)$ is small, the corresponding individual term will be negligible. Precisely, if $\overline{\varepsilon}(i) = 0$, the resulting term becomes zero as well. This actually yields a smaller competitive ratio compared to traditional robust control policies that do not utilize disentangled predictions.
Robustness: Otherwise, the last term always remains bounded, regardless of how high the prediction error becomes: even if $\overline{\varepsilon}(i)$ becomes large, the term stays bounded since the error appears in both its numerator and denominator. It is worth mentioning that such a bounded competitive ratio is not achievable by traditional control policies such as MPC.
Thanks for the response. I will keep my score.
Thank you so much for the comment.
Due to space limitations, we have used the official comment section above to provide additional responses to the limitations mentioned at the end of the review.
Further responses to the major critical questions, including those regarding assumptions and implementation complexity, are included in the formal rebuttal and will be revealed on August 6, 11:59 PM AoE, after the rebuttal period. We hope you find our main responses satisfactory too.
Thank you for all the comments and we appreciate your positive feedback on our work!
Regarding the issues pointed out in the weaknesses part, here are our explanations and future plans:
Assumptions: We acknowledge the limitations of assuming the continuity and bijectivity of the mixing function $f$. It is worth highlighting that these assumptions are standard in nonlinear independent component analysis (ICA) models [Hyvarinen 2016, Khemakhem 2020, Yang 2022, Zheng 2022]. These advances leverage additional assumptions on the latent variables $s_t$ and the mixing function $f$ to make the model identifiable. Below we summarize the common assumptions (besides the standard assumptions that the latent variables are mutually independent and that at most one latent variable is Gaussian) made in these models as examples. A similar table can be found in our Appendix B.1.
Standard Assumptions on $f$ and $s_t$ for a Subset of Nonlinear ICA Models

| Nonlinear ICA Models | Key Assumptions on $f$ |
|---|---|
| Identifiable VAE [Khemakhem 2020] | Mixing function $f$ is bijective and smooth |
| Contrastive learning [Hyvarinen 2016] | Mixing function $f$ is bijective and smooth |
| Structural sparsity model [Zheng 2022] | Support of the Jacobian of $f$ is sparse |
| Volume-preserving model [Yang 2022] | Mixing function $f$ is bijective and volume-preserving |

| Nonlinear ICA Models | Key Assumptions on $s_t$ |
|---|---|
| Identifiable VAE [Khemakhem 2020] | Latent variables are conditionally independent given an auxiliary variable |
| Contrastive learning [Hyvarinen 2016] | $s_t$ is non-stationary or has temporal dependencies |
Datasets: Thank you for the comment. In future work, we plan to evaluate DISC on a broader range of datasets and more diverse scenarios.
Implementation Complexity: We discuss the practicality of DISC in the following one-to-one responses to the questions raised.
Question 1. The core step in DISC is the computation of the action through the following $\lambda$-CON policy:

$$u_t = -Kx_t - Y\sum_{\tau=t}^{\overline{t}}\left(F^\top\right)^{\tau-t} P f\left(\lambda\circ\widetilde{s}_{\tau|t};\widetilde{\theta}_t\right),$$

where $\widetilde{s}_{\tau|t}$ and $\widetilde{\theta}_t$ denote the predicted latent variable and mixing parameter at time $t$. When the number of latent variables $k$ increases, the complexity comes from the computation of the Hadamard product ($O(k)$ operations). The learning of $\lambda$ involves solving an online optimization whose variable is $k$-dimensional. In practice, the state dimension and action dimension are often much larger than $k$, and the number of latent variables is not a bottleneck in any of these key steps. For example, in our drone navigation example, external disturbances are caused by $k=2$ latent variables, i.e., the wind speed and rain intensity. Similarly, in the second voltage control example, $k=3$, corresponding to PV integration, wind generation, and residential consumption.
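As a rough numerical sketch of this computation (every matrix and dimension below is a random placeholder rather than a quantity from the paper, and $f$ is taken to be linear for simplicity), the $\lambda$-CON action is a feedback term plus a confidence-weighted prediction correction:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, k, w = 4, 2, 3, 5          # state, action, latent dims; prediction window (placeholders)

# Random stand-ins for the policy matrices appearing in the lambda-CON formula
K = rng.normal(size=(m, n))       # feedback gain
Y = rng.normal(size=(m, n))
F = 0.3 * rng.normal(size=(n, n))
P = rng.normal(size=(n, n)); P = P @ P.T
theta_hat = rng.normal(size=(n, k))   # predicted mixing parameter
s_pred = rng.normal(size=(w, k))      # predicted latents over the window

def lam_con_action(x_t, lam):
    """u_t = -K x_t - Y sum_j (F^T)^j P theta_hat (lam ∘ s_pred[j]), with linear f."""
    correction = sum(
        np.linalg.matrix_power(F.T, j) @ P @ (theta_hat @ (lam * s_pred[j]))
        for j in range(w)
    )
    return -K @ x_t - Y @ correction

x_t = rng.normal(size=n)
u_full = lam_con_action(x_t, np.ones(k))   # fully trust the disentangled predictions
u_none = lam_con_action(x_t, np.zeros(k))  # ignore the predictions entirely
assert np.allclose(u_none, -K @ x_t)       # lam = 0 recovers the pure feedback action
```

The Hadamard product `lam * s_pred[j]` is the only place the per-component confidence enters, so the extra cost over a plain prediction-based controller is $O(k)$ per window step.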
Question 2. Due to space limitations, we provide a detailed discussion in the following official comment.
Question 3. This is a great question. Dealing with nonlinear dynamics is challenging, since there may not exist an explicit expression of the total regret as a function of some trust parameter $\lambda$. It is therefore nontrivial whether optimizing $\lambda$ online is still possible. We have been thinking about this issue by constructing surrogate functions and would like to share more insights if you are interested.
Question 4. We have compared DISC with vanilla MPC and LQR in our experiments. We will add more experiments that compare DISC with robust control methods such as the $H_\infty$ controller to compare their consistency and robustness.
Thank you once again for your constructive feedback. We will incorporate all your suggestions to enhance the clarity and applicability of our work. Due to space limitations, we defer our responses to the remaining issues in the official comment.
REFERENCES
[Hyvarinen 2016] Aapo Hyvarinen, and Hiroshi Morioka. "Unsupervised feature extraction by time-contrastive learning and nonlinear ica." Advances in neural information processing systems 29 (2016).
[Khemakhem 2020] Ilyes Khemakhem, Diederik Kingma, Ricardo Monti, and Aapo Hyvarinen. "Variational autoencoders and nonlinear ica: A unifying framework." In International conference on artificial intelligence and statistics, pp. 2207-2217. PMLR, 2020.
[Yang 2022] Xiaojiang Yang, Yi Wang, Jiacheng Sun, Xing Zhang, Shifeng Zhang, Zhenguo Li, and Junchi Yan. "Nonlinear ICA using volume-preserving transformations." In International Conference on Learning Representations. 2022.
[Zheng 2022] Yujia Zheng, Ignavier Ng, and Kun Zhang. "On the identifiability of nonlinear ICA: Sparsity and beyond." Advances in neural information processing systems 35 (2022): 16411-16422.
The submission considers LQC with perturbations. The considered problem is interesting, with potential applications to LQC with noisy/unreliable ML predictions.
The submission improves the existing method in *Robustness and Consistency in Linear Quadratic Control with Untrusted Predictions*, where a constant parameter determines the confidence of predictions. The submission is novel in learning this parameter in an online manner.
Both theoretical and empirical results are provided supporting the proposed method.
Strengths
The proposed method is novel by learning the confidence parameter in an online manner. Such a method can have many applications in control with noisy predictions.
The proposed method is analyzed both theoretically and empirically. Improved results are achieved.
Weaknesses
The writing of the submission is really hard to follow.
First of all, the submission mentions some concepts that are abstract and imprecise, like "best-of-both-worlds" and "consistency and robustness".
Further, the introduction gets into technical details without proper definitions and explanations. Examples include "(1 + o(1))-consistent", "ω(1)-robust", CR(DISC), and "Best-of-both-worlds utilization".
Finally, the update rule of $\lambda$ in Section 3.2 is missing. It is not self-contained to just refer to some other papers for such an important piece. Without a clear and explicit definition of the update rule of $\lambda$, it is very hard to reproduce or evaluate the proposed method.
Questions
The claim on line 157 is confusing. Why does the term there disappear? This statement requires further assumptions on $f$ which are currently missing.
What is the update rule of $\lambda$?
Also, I am wondering whether it might be an easy/doable direction to extend the proposed method to the case with dynamics where the ML predictions also affect the effect of the actions.
Limitations
NA
Thank you for your review. We appreciate your comments on the strengths and acknowledge the concerns regarding the clarity of our writing and presentation. Please find our clarification below.
**Clarification of Basic Concepts in Abstract**

The detailed meanings of "best-of-both-worlds" and "consistency and robustness" in our abstract are explained below. FYI, the original sentence reads
"Our results highlight a first-of-its-kind “best-of-both-worlds” integration of machine-learned predictions, thus lead to a near-optimal consistency and robustness tradeoff"
In our study, the predicted latent variable time series and the mixing parameters can vary in accuracy, potentially being either precise or erroneous. The term "best-of-both-worlds" highlights our method's ability to ensure reliable performance (a feature that we refer to as robustness) in worst-case scenarios. Mathematically, this corresponds to a bounded competitive ratio even if the prediction errors are significant. Conversely, if the predictions are near-optimal, our method demonstrates adaptability and behaves similarly to an optimal controller (a.k.a. consistency). This dual capability is encapsulated in our main result, which establishes a bound on the (expected) competitive ratio, illustrating our method's effectiveness across varying levels of prediction accuracy. Note that the control policy is not given the prediction accuracy beforehand.
For further context, this can be seen in our main result that bounds the (expected) competitive ratio:
where the hidden term vanishes as the total number of steps $T$ increases; $k$ is the number of latent variables generating the perturbations; $w$ denotes the prediction window size; and each $\overline{\varepsilon}(i)$ (for $i = 1, \ldots, k$) denotes the prediction error corresponding to each latent component. The expectation is over the randomness of the control policy if $f$ is a general mixing function (note that the control policy becomes deterministic in the linear mixing case).
The meaning of "best-of-both-worlds" utilization in the last term reflects the following guarantees on robustness and consistency:

**Robustness** The robustness can be inferred from the bound above: even if $\overline{\varepsilon}(i)$ becomes large, the last term is still bounded, since the error appears in both the numerator and the denominator. It is worth mentioning that such a bounded competitive ratio is not achievable by traditional control policies such as MPC.

**Consistency** On the other hand, if $\overline{\varepsilon}(i) = 0$, the resulting term becomes zero as well. This yields a smaller competitive ratio compared to traditional robust control policies that do not utilize disentangled predictions.
We hope the above explanations make sense. We will revise our abstract and introduction based on these concerns, and please let us know if there are further questions. Thank you again for the comment.
Technical Details
The formal definitions of $c$-consistent and $r$-robust are provided in Definition 2.2 (Section 2). Let $\mathrm{CR}(\varepsilon)$ denote the competitive ratio of a control policy with a fixed prediction error $\varepsilon$. Specifically, a policy is $c$-consistent if its competitive ratio satisfies $\mathrm{CR}(\varepsilon)\leq c$ when $\varepsilon = 0$, and $r$-robust if $\mathrm{CR}(\varepsilon)\leq r$ for all $\varepsilon$.
We will add a pointer to clarify in our introduction section. Thank you for pointing out this issue.
Update Rule
In fact, the update rules of $\lambda_t$ are provided formally in later sections for both the linear mixing and general mixing cases. Section 3.2 briefly overviews our control policy, without specifying the ONLINE-PROCEDURE. The main reason is that we use different update rules for the linear mixing and general mixing settings, so we chose to specify them in later sections, together with the theoretical results.
Update Rule for Linear Mixing In Section 4.2, Eq. (11), we have detailed the FTRL procedure for learning in our context:
\\right)\lambda + \frac{1}{\beta}\|\lambda-\lambda_0\|^2\right)$$ for some $\beta>0$ that can be optimized. **Update Rule for General Mixing** Similarly, in Section 4.2, Eq. (12), we have detailed the FTPL procedure for learning $\lambda_t$. We will revise Section 3.2 to incorporate the comment and improve our presentation. ***Question 1.*** *The claim on line 157 ...* Thank you so much for the comment. Our statement there is not accurate with the current assumption on $f$. We will revise the context correspondingly. Note that our proof for the general mixing case only requires the Lipschitz continuity and bijectivity of $f$. ***Question 2.*** *Extend ... the ML predictions also affect the actions*. In fact, in our $\lambda$-CON policy, $u_t$ indeed depends on the predicted mixing parameters and latent variables, i.e., $u_t =-Kx_t- Y\sum_{\tau=t}^{\overline{t}}\left(F^\top\right)^{\tau-t} P f\left(\lambda\circ\widetilde{s}_{\tau|t};\widetilde{\theta}_t\right)$ as specified in Eq. (5), Line 153. Going beyond our current setting, an interesting future direction is to consider the disentanglement of latent variables when they depend on current system states and actions.Thank you very much for your clarification. I have one follow-up question, for Update Rule for Linear Mixing, how does the algorithm solve the argmin for " FTRL procedure for learning "? Are we using gradient-based method? Or can we get a closed-form solution? Such information is crucial for the implementation of the proposed method.
Thank you.
Thank you for the great question. In the implementation of the algorithm, we solve the follow-the-regularized-leader (FTRL) optimization in Eq. (11) via the following online mirror descent (OMD) update for the linear mixing setting:

$$\lambda_{t+1} = \Pi_{\Lambda}\left(\lambda_t - \eta\, \nabla_{\lambda} c_t(\lambda_t)\right),$$

where $\lambda_t$ represents our internal parameter updates, $\nabla_{\lambda} c_t$ denotes the gradient of the cost with respect to $\lambda$, and $\Pi_{\Lambda}$ is the projection onto the feasible set $\Lambda$ of $\lambda$.
The OMD above is shown in [Hazan 2010] to be equivalent to the FTRL procedure in Eq. (11).
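For concreteness, a minimal sketch of one such projected update follows; the Euclidean regularizer, the box feasible set $[0,1]^k$, and the fixed step size are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def omd_step(lam, grad, eta=0.05, lo=0.0, hi=1.0):
    """One OMD step with the Euclidean regularizer: a gradient descent step on
    the confidence parameter followed by projection onto the (assumed) box
    [lo, hi]^k, which for a box reduces to component-wise clipping."""
    return np.clip(lam - eta * grad, lo, hi)

lam = np.array([0.5, 0.5, 0.5])
lam = omd_step(lam, grad=np.array([2.0, -30.0, 0.0]))
print(lam)   # approximately [0.4, 1.0, 0.5]: the second entry is clipped at 1
```

With a Euclidean regularizer, OMD is exactly projected gradient descent, which is why the per-step cost is just one gradient evaluation plus a clip.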
More precisely, the gradient of the cost gap at time $t$ is given by

$$\nabla_{\lambda}\left(\zeta_t^\top H \zeta_t \right) = \mathsf{J}^{\top}(\zeta_t)\left(H + H^\top\right)\zeta_t = 2 \mathsf{J}^\top\left(\zeta_t\right)H\sum_{\tau=\underline{t}}^{t}\left(F^\top\right)^{t-\tau} P \left(\theta s_t - \widetilde{\theta}_\tau\left(\lambda\circ\widetilde{s}_{t|\tau}\right)\right),$$

where we have used the fact that $H$ is symmetric. Moreover,

$$\mathsf{J}\left(\zeta_t\right) = -\sum_{\tau=\underline{t}}^{t}\left(\left(F^\top\right)^{t-\tau} P\widetilde{\theta}_\tau\begin{bmatrix} \widetilde{s}_{t|\tau}(1) & &\\ & \ddots &\\ & & \widetilde{s}_{t|\tau}(k) \end{bmatrix}\right)$$

denotes the $n\times k$ Jacobian matrix of $\zeta_t$ with respect to the confidence parameter $\lambda$, whose $(i,j)$-th entry is $\frac{\partial \zeta_t^{(i)}}{\partial \lambda^{(j)}}$.

Details we provided here regarding the implementation can also be found in **Lines 250-257 (Section 5. Experimental Setup)** and **Lines 793-794 (Section E in the Appendix)**. As suggested, we will revise our manuscript to further clarify the implementation. We appreciate your time and the critical role you play in the review process, and we remain fully open to any further questions or comments you might have during the discussion period.

[Hazan 2010] Elad Hazan, and Satyen Kale. "Extracting certainty from uncertainty: Regret bounded by variation in costs." Machine learning 80 (2010): 165-188.

Dear Reviewer 4nx1,
We hope our updated clarification of the Update Rule for Linear Mixing is helpful. Please don't hesitate to reach out if you have any further questions or need additional information. Thank you.
The paper considers the problem of LQR where the added perturbations may not be iid Gaussian noise but depend on some latent variables (which are themselves predicted using an ML model). In particular, the authors consider the dynamics given by
x_{t+1} = A x_{t} + B u_t + f(s_t, \theta)
where s_t and \theta are themselves predicted (up to some future window of length w) by an ML procedure. The goal is to provide an algorithm that (a) enjoys a good competitive ratio w.r.t. the optimal policy when the predictions are correct, and (b) is robust to the prediction of the latent variables. The learner knows A, B and f.
They provide theoretical results when the perturbations could be linear functions of the latent variables, as well as general functions, and provide experiments on drone navigation and power grid. The key idea in the algorithm is to first predict a \lambda vector that decides how much to trust each latent vector, and then to run a \lambda-confident policy. The authors use FTRL style online algorithm to predict \lambda.
Strengths
The paper considers an important problem of using ML predictions in control and being robust to them. As far as I know, bounding competitive ratio for both consistency and robustness simultaneously has not been considered before in the control literature.
The lower bound in Theorem 4.1 that shows that a fixed \lambda does not suffice is interesting. In hindsight, it makes sense since the ML predictions can change arbitrarily (but it is useful to present it).
Sufficient experimental evaluation.
Weaknesses
The techniques are not fundamentally different from prior works; however, the fact that an online predictor of the \lambda parameter followed by the plug-in control policy in eqn (5) suffices for robustness is interesting. I still support accepting the paper.
Minor typos: Line 311 -- "do not reply" -> "do not imply"; Line 21 -- "adaptively" -> "can adapt"
Questions
- Is there a straightforward way to incorporate unknown A and B matrices?
- Is there a way to use a fixed \lambda but get o(T)-robustness while being O(1)-consistent, or vice-versa in Theorem 4.1?
Limitations
NA
Further responses to the major concerns, including those regarding technical novelty, are included in the formal rebuttal and will be revealed on August 6, 11:59 PM AoE, after the rebuttal period.
Question 1: Learning $A$ and $B$
Yes, in our work, we focus on learning the latent variable time series and the mixing parameters, and designing a corresponding control policy. For ease of presentation, we suppose $A$ and $B$ are given. Standard system identification methods can be used to estimate $A$ and $B$ offline. If $A$ and $B$ are unknown before applying the control policy, a new online control policy needs to be designed, and this is nontrivial for the dynamics considered in this work. For instance, it is not straightforward whether learning $A$ and $B$ online would lead to an additional logarithmic regret as in [Agarwal 2019], while simultaneously guaranteeing the near-optimal consistency and robustness tradeoff presented in our work, as well as the competitive ratio guarantee. This suggests an interesting future direction, and we will add a discussion in the revised manuscript.
Question 2: Stronger statement in Theorem 4.1
This is a great question. In fact, the statement in Theorem 4.1 can be tightened to the following:
"There exists a constant such that if the -confident policy -CON is -consistent, then it cannot be -robust, even if the mixing parameter estimate is perfect, i.e., ."
In the proof of Theorem 4.1, Appendix G, assuming -CON is -consistent implies has to be . Setting directly implies this stronger claim. Indeed, the adversary can choose any to make the total policy cost as large as possible if is fixed. Moreover, the assumption on the consistency can also be tightened from to , i.e., -CON is -consistent for some satisfying
$$C\leq \frac{1}{J^{\star}} \lambda_{\min} \left((R+B^\top P B)^{-1}\right) \sum_{t=0}^{T-1} \left\|B^{\top} \sum_{\tau=\ell}^{\overline{T}}\left(F^\top\right)^{\tau-\ell} P \theta_{\tau} \left( \left(\tfrac{1}{2}\mathbf{1}_{k}\right)\circ s_{\tau|\ell} \right)\right\|^2.$$
We can construct matrices and latent variables to make the right-hand side a positive constant. Then, applying inequality (33) in Appendix G, we know $\lambda$ cannot be the all-zero vector. The same argument in Lines 818-821 implies that $\lambda$-CON cannot be $o(T)$-robust.
In our original presentation, we used the weaker statement to indicate that learning $\lambda$ yields a fundamentally better tradeoff. We thank the reviewer once again for the constructive feedback, and we will add additional remarks in our manuscript to further clarify the points discussed above.
REFERENCES
[Agarwal 2019] Naman Agarwal, Elad Hazan, and Karan Singh. "Logarithmic regret for online control." Advances in Neural Information Processing Systems 32 (2019).
Thank you for recognizing the novel aspects and strengths of our work, especially the dual focus on consistency and robustness in the use of machine learning predictions within control systems. We also appreciate your acknowledgment of the importance of the problem we are addressing and the novel contribution of Theorem 4.1, which highlights the limitations of a fixed $\lambda$.
Regarding the concern about the novelty of the techniques used in our research:
**1. Technical Novelty** While it is true that the foundational benchmarks such as competitive ratios [Goel 2022a, Sabag 2022] and the basic idea of learning trust parameters online [Li 2022] that we employed are established within the field, our work is still technically novel in the following aspects.
a. Latent variables: We study a control problem whose perturbations depend on latent variables. Bounding the competitive ratio of the proposed control policy requires a significantly different technique (see Appendix E, the competitive analysis in Step 4 of the proof of Theorem 4.2) compared to those appearing in the previous literature. For example, special treatments of the total regret are necessary to decompose it (see Eq. (43)) in the proof of Lemma 6, Appendix H.
b. Nonlinear $f$: Compared with the linear quadratic control models in [Goel 2022b] and [Li 2022], nonlinear perturbations lead to further technical challenges on learning $\lambda$, since it forms a Hadamard product inside the mixing function as in the $\lambda$-CON policy (Eq. (5)): $u_t =-Kx_t- Y\sum_{\tau=t}^{\overline{t}}\left(F^\top\right)^{\tau-t} P f\left(\lambda\circ\widetilde{s}_{\tau|t};\widetilde{\theta}_t\right)$.
c. Quadratic form transformation lemma: More importantly, we do not assume bounded variations of system perturbations and errors, which are often required in the existing self-tuning policy [Li 2022] and in the robust control literature, such as robust model predictive control (MPC) [Berberich 2020], to ensure worst-case guarantees. For instance, the competitive ratio bound in [Li 2022] depends on a self-variation term, where the self-variation of a sequence $\mathrm{y}$ is defined as
$$\mu_{\mathrm{VAR}}(\mathrm{y}):=\sum_{s=1}^{T-1} \max_{\tau=0, \ldots, s-1}\left\|y_\tau-y_{\tau+T-s}\right\|.$$

The main reason for having such a term is that the self-tuning policy there basically updates $\lambda$ through an FTL-type online optimization. In contrast, proving our main result above is nontrivial: although it is known that an input-disturbed linear system can be reduced to an online convex optimization (OCO) with structured memory [Shi 2020], the connection between the problem with $\lambda$-CON and a memoryless online optimization was not previously discovered. In the *quadratic form transformation lemma* (Lemma 1), we provide a result that decouples the loss terms in the total regret and future perturbations, thereby reducing the problem of choosing $\lambda_t$ to an online optimization instance. Then we use a two-stage analysis that combines a dynamic regret bound for our control policy and static regret bounds induced by online learning algorithms to derive the main result. Furthermore, the perturbations are assumed to be bounded in [Li 2022]. Our technical achievement lies in offering the first bound of this nature, grounded in a term as a function of the prediction error, without restrictive assumptions on the errors $(\overline{\varepsilon}(1),\ldots,\overline{\varepsilon}(k))$. Besides, the novelty of our work lies not only in the techniques themselves but also in how they are integrated together and applied to address a problem that has not been previously tackled in the learning and control community, as demonstrated by our practical examples. We will add remarks to address these issues in the revised manuscript.

***2. Typos*** Thank you so much for catching them. We have changed "reply" to "rely" and "adaptively" to "can adapt" in the revised manuscript.

***3. Questions*** Thank you for the great questions. Due to space limitations, we discuss them in the following comment.
REFERENCES

[Berberich 2020] Julian Berberich, Johannes Köhler, Matthias A. Müller, and Frank Allgöwer. "Data-driven model predictive control with stability and robustness guarantees." IEEE Transactions on Automatic Control 66, no. 4 (2020): 1702-1717.

[Shi 2020] Guanya Shi, Yiheng Lin, Soon-Jo Chung, Yisong Yue, and Adam Wierman. "Online optimization with memory and competitive control." Advances in Neural Information Processing Systems 33 (2020): 20636-20647.

[Goel 2022a] Gautam Goel, and Babak Hassibi. "Competitive control." IEEE Transactions on Automatic Control 68, no. 9 (2022): 5162-5173.

[Goel 2022b] Gautam Goel, and Babak Hassibi. "The power of linear controllers in LQR control." 2022 IEEE 61st Conference on Decision and Control (CDC). IEEE, 2022.

[Li 2022] Tongxin Li, Ruixiao Yang, Guannan Qu, Guanya Shi, Chenkai Yu, Adam Wierman, and Steven Low. "Robustness and Consistency in Linear Quadratic Control with Untrusted Predictions." ACM SIGMETRICS Performance Evaluation Review 50, no. 1 (2022): 107-108.

[Sabag 2022] Oron Sabag, Sahin Lale, and Babak Hassibi. "Competitive-ratio and regret-optimal control with general weights." 2022 IEEE 61st Conference on Decision and Control (CDC). IEEE, 2022.

The paper investigates the setting of linear quadratic control with latent perturbations. In particular, the state evolution does not only depend on the current state and the control input, but also on various latent variables that are mixed through a linear or nonlinear mixing function of unknown parameterization. The authors attempt to address this problem by considering a $\lambda$-confident policy that balances consistency and robustness. However, they show that a fixed confidence value cannot achieve a good consistency and robustness tradeoff. Motivated by this, the authors suggest using a disentangled confidence policy, where the algorithm adaptively learns the trust parameter $\lambda$ in an online manner.
The authors are able to prove that their scheme achieves near-optimal competitive ratio bounds for both linear and general mixing cases (under reasonable assumptions). The authors confirm the significance and superior performance of adaptive learning for the trust parameter using interesting use cases inspired by real-world applications.
Strengths
- The problem statement is well-motivated and nice practical examples are provided.
- The paper makes several interesting contributions. I particularly liked the fundamental gap in Theorem 4.1, which motivates the need to learn the confidence parameter in an online manner. Furthermore, the authors provide competitive ratio results not only for the linear case but also for the general mixing case. The assumptions (e.g., Lipschitz continuity and bijectivity of the mixing function) are not so weak, but also not unreasonable.
- On the theory side, the paper is quite rich and the proofs seem to work in general (even though I do have some concerns, as I explain in the following sections).
- The experimental evaluation confirms that online learning of the $\lambda$-confidence parameter is beneficial in two interesting settings. Results look good overall.
- The authors discuss several aspects of their work in the Appendix and most aspects are explained quite well.
Weaknesses
- The paper contains several errors in various formulas. In the Questions section below, I have mentioned several problems I was able to identify, but there could be more. These errors do not seem to be critical, but the manuscript still needs a very careful proofreading to fix them.
- I was not clear about certain theoretical claims, like the use of the $\ell_4$-norm or certain assumptions. Please also see the Questions section below. I believe the authors should be able to address all of them.
- Even though the paper offers theoretical result on general mixing functions, all experiments seem to focus on linear mixing. The authors explain that the nonlinear mixing case is indeed much harder than the linear one (where Fast ICA is used). To me, the problem in the general nonlinear case is that the mixing parameter can be much harder to estimate accurately, so the estimation of the latent variables will also be negatively affected. So, I am not sure how practical the proposed algorithms would be for the general nonlinear case. Perhaps simple experiments with just 2 latent variables (but nonlinearly mixed) could offer some preliminary insights.
Questions
- In Equation (8), it seems to me that the matrix was never previously introduced.
- In Equation (8), it seems to me that the index is not present in the right hand side of (8). Is it possible that the authors meant instead of ?
- Similarly, in Equation (9), shouldn't the upper index on the summation operator be ? At least, this is what the authors probably imply in Line 169.
- In Line 164, it is explained that any takes values in , but then in Line 168 appears to have components. Obviously, this cannot work for the Hadamard product . Shouldn't any given have components in total? I believe the authors probably mean the sequence of 's in Line 168, but I think this should be explained more clearly.
- In Figure 2a, how exactly is the offline optimal policy generated? Do the authors rely on the prior literature for this purpose?
- The experiments by the authors seem to rely on linear mixing matrices. As the authors explain, this is by far the simplest setting, unlike general nonlinear mixing functions. Have the authors experimented with nonlinear mixing functions, even with just 2 latent variables? It would be interesting to see how the whole framework (including the estimation of the latent variables) is affected in such a case. Furthermore, the paper's theory covers nonlinear/general mixing functions.
- In Line 779, the authors make use of the $\ell_4$-norm. They claim that this is because of the Cauchy-Schwarz inequality. My understanding is that the Cauchy-Schwarz inequality (where the norms are $\ell_2$-norms) is only true for the $\ell_2$-norm. I was not clear how the $\ell_4$-norm suddenly appeared there. Could the authors detail their calculations?
- Equation (42) in Line 850 also involves the $\ell_4$-norm, but I think it should be totally analogous to the point above.
- In Equation (32), the authors define $\psi_{t,T}$, but then in the right-hand side of (32) there is no $t$. Why not just use the same inner summation expression as in (31) instead of introducing new indexes? Furthermore, the inner summation expression in (31) has a different upper index on the summation operator; why is this changed in (32)?
- Line 813 makes sense mathematically if $P$ is a positive definite matrix. Have the authors shown that formally? I was not so clear what is going on with the matrix appearing inside it.
- I am not sure I understood the claim in Line 801. The authors claim that Lemma 2 implies the result because the quantity is never 0 by assumption. However, in theory someone might argue that the claim could be violated if the quantity can become arbitrarily close to 0 while never equaling 0 exactly. Obviously, this is not possible for a continuous function, but it might be true for pathological functions. For general functions, one might instead want to impose that the quantity is lower bounded by some positive constant. That said, I feel that this is not a problem in this work, because $f$ is assumed Lipschitz, hence it must also be continuous (and even uniformly continuous).
- For a more rigorous exposition, could the authors discuss why the per-step losses are convex functions of $\lambda$ in the linear mixing case or when the mixing function is convex, so that OCO regret bounds are applicable in these cases?
Limitations
No concerns.
We greatly appreciate all the questions received. Further responses to the major questions, including those regarding Questions 1-11, are included in the formal rebuttal and will be revealed on August 6, 11:59 PM AoE, after the rebuttal period.
In this official comment we summarize our responses provided earlier and provide detailed answers to Questions 3 and 12.
***1. Errors in Formulas*** Thank you for identifying errors in various formulas throughout the manuscript. We will conduct an extensive review of the entire manuscript and present responses to the issues one by one in the rebuttal section.
***2. Theoretical Claims and Assumptions*** Thank you for your feedback regarding the clarity of certain theoretical claims in our manuscript. We defer our responses to the rebuttal section, where we carefully respond to each question to ensure that all your concerns are adequately addressed.
***3. Nonlinear Mixing Case*** Thank you for your insightful comments regarding our focus on linear mixing in the experiments. We provide additional discussion in the Global Rebuttal.
Below we include additional answers to Question 3 in this comment.
Question 3. For further context, consider the application of online optimization strategies to minimize total regret as depicted in Equation (8). At each time , we obtain . For , since the summation is over , is a function that depends on . But in the original total regret formula (up to time ) relies on future . Thus, the formulation of the offline optimization (8) does not conform to a canonical framework suitable for online optimization, necessitating the role of Lemma 1. We will revise the manuscript to clarify.
I appreciate the authors' very detailed rebuttal. I believe these clarifications significantly improve the exposition and the manuscript clarity. For this reason, I will increase my score. Some clarifications may have seemed obvious, but I still think it is good to incorporate them in the revised version of the manuscript, as not all readers will be familiar with all the details. Regarding the experiments with nonlinear mixing functions, I do realize this case is much more challenging but the preliminary results presented in the rebuttal are encouraging.
Thank you so much for your high-quality, very detailed, and insightful comments. They are very helpful, and we sincerely appreciate the time and effort you dedicated to the reviewing process. Below are our responses to the questions:
Question 1. Thank you for pointing out the typo. The matrix is indeed defined in Appendix E (see Theorem E.1). We will revise the manuscript to include its definition before Eq. (8).
Question 2. Thank you for catching this error. The correct notation indeed involves summation over rather than , aligning with the notation used in Lemma 1. The corrected equation should read:
We will update this in the manuscript.
Question 3. In Equation (9), the upper index on the summation operator is indeed accurate. The clarification in Lines 168-169 highlights that the per-step cost in Equation (8) depends on the time horizon $T$. Therefore, it cannot be solved as a canonical online optimization problem at each time, since in general the per-step costs do not coincide (see Figure 7 in Appendix D for a pictorial explanation). This necessitates a tailored approach to online learning, enabled by Lemma 1, which addresses this challenge and derives an equivalent optimization.
Question 4. Yes, in Equation (8), the total regret is defined as a function of the sequence of $\lambda_t$'s, where each $\lambda_t$ within the sequence, with components in $[0, 1]$, is learned online. We will clarify this in the revised manuscript. Thank you for the comment!
Question 5. In Figure 2(a), the offline optimal controller is derived given all future perturbations of the following discrete-time kinematic model (similar models can be found in [Li et al. 2019] and [Yu et al. 2020]):
$$\begin{bmatrix} \Delta x_{t+1} \\ v_{t+1} \end{bmatrix} = A \begin{bmatrix} \Delta x_{t} \\ v_{t} \end{bmatrix} + B u_t + C w_t,$$

for some matrices $A$, $B$, $C$. A detailed description of the drone piloting problem can be found in Example 1, Appendix B.2 and Appendix C.1. The offline optimal control is known as (see [Goel et al. 2022])
where , assuming the terminal cost-to-go matrix is . Note that this is equivalent to the MPC-type action in Eq. (5) with true predictions of latent variables and mixing parameters.
Question 6. Thank you for the great suggestion. We have been working on examples for nonlinear mixing functions, and have discussed them in the Global Rebuttal.
Question 7. Thank you for the question. The $\ell_4$-norm appears because of the Hadamard product between $\mathbf{1}_k-\lambda$ and $s_\tau$. The detailed derivation is as follows.
Applying the Cauchy–Schwarz inequality, the term in Line 778 satisfies
$$\left(\sum_{i=1}^{k}(1-\lambda(i))^{2}(s_{\tau}(i))^{2}\right)^{\frac{1}{2}}\leq \left[\left(\sum_{i=1}^{k}(1-\lambda(i))^{4}\right)^{\frac{1}{2}}\left(\sum_{i=1}^{k}(s_{\tau}(i))^{4}\right)^{\frac{1}{2}}\right]^{\frac{1}{2}},$$

where the RHS equals

$$\left(\sum_{i=1}^{k}(1-\lambda(i))^{4}\right)^{\frac{1}{4}}\left(\sum_{i=1}^{k}(s_{\tau}(i))^{4}\right)^{\frac{1}{4}}=\|\mathbf{1}_{k}-\lambda\|_{4}\,\|s_{\tau}\|_{4}.$$

Note that the Cauchy-Schwarz inequality is applied to the inner product of the vectors $\left((1-\lambda(1))^{2},\ldots,(1-\lambda(k))^{2}\right)$ and $\left((s_{\tau}(1))^{2},\ldots,(s_{\tau}(k))^{2}\right)$, whose entries are already squared.

***Question 8***. Yes, Eq. (42) can be derived analogously.

***Question 9***. We apologize for the typo in Eq. (32). The term $\psi_{t,T}$ should be replaced by $\psi_{\ell,T}$. We changed the time index in the summation from $t$ to $\ell$ to avoid overloading the online time index $t$, but neglected to update the index in $\psi_{t,T}$. Given a fixed prediction window size $w$, the upper index in Eq. (31) would be $\min\{\ell+w-1,T-1\}$. Thank you again for the correction; we have checked all the equations thoroughly in the revised manuscript to make them consistent.

***Question 10***. The matrix $P$ is guaranteed to be positive definite as a solution of the Discrete-time Algebraic Riccati Equation (DARE). More specifically, given positive definite $Q$ and $R$, the solution $P$ of the DARE

$$P=Q+A^{\top} P A - A^{\top} P B \left(R+B^{\top} P B\right)^{-1} B^{\top} P A$$

is symmetric and positive definite. This is a well-known result in classical optimal control theory (cf. the discussion in [Richardson 1986]), and $P$ determines the solutions of LQR and LQG.
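As an illustrative aside (our own sketch, not part of the original rebuttal), both facts above can be sanity-checked numerically: the Cauchy-Schwarz step from Question 7 yielding the $\ell_4$ bound, and the positive definiteness of the DARE solution from Question 10. The vectors and system matrices below are arbitrary placeholders, not the paper's actual problem data.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Question 7: (sum_i (1-lam_i)^2 s_i^2)^(1/2) <= ||1-lam||_4 * ||s||_4 ---
k = 8
lam = rng.uniform(0, 1, size=k)      # placeholder confidence vector
s = rng.normal(size=k)               # placeholder latent vector
lhs = np.sqrt(np.sum((1 - lam) ** 2 * s ** 2))
rhs = np.sum((1 - lam) ** 4) ** 0.25 * np.sum(s ** 4) ** 0.25
assert lhs <= rhs + 1e-12

# --- Question 10: fixed-point iteration of the DARE; P is symmetric PD ---
def solve_dare(A, B, Q, R, iters=1000, tol=1e-12):
    """Iterate P <- Q + A'PA - A'PB (R + B'PB)^{-1} B'PA to a fixed point."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P_next = Q + A.T @ P @ (A - B @ K)
        if np.max(np.abs(P_next - P)) < tol:
            return P_next
        P = P_next
    return P

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # placeholder double-integrator dynamics
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.array([[1.0]])
P = solve_dare(A, B, Q, R)
assert np.allclose(P, P.T, atol=1e-9)
assert np.all(np.linalg.eigvalsh(P) > 0)  # symmetric positive definite
```

The fixed-point iteration is just value iteration for the Riccati recursion; for a stabilizable pair $(A, B)$ with positive definite $Q$ and $R$ it converges to the unique stabilizing, positive definite solution.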
***Question 11***. Yes, we have indeed used the Lipschitz continuity of $f$ there. We will clarify this in the revised proof.
***Question 12***. Due to space limitations, we have appended a proof in **Global Rebuttal** above.
```REFERENCES```
1. [Li 2019] Yingying Li, Xin Chen, and Na Li. "Online optimal control with linear dynamics and predictions: Algorithms and regret analysis." Advances in Neural Information Processing Systems 32 (2019).
2. [Yu 2020] Chenkai Yu, Guanya Shi, Soon-Jo Chung, Yisong Yue, and Adam Wierman. "The power of predictions in online control." Advances in Neural Information Processing Systems 33 (2020).
3. [Goel 2022] Gautam Goel, and Babak Hassibi. "The power of linear controllers in LQR control." 2022 IEEE 61st Conference on Decision and Control (CDC). IEEE, 2022.
4. [Richardson 1986] T. J. Richardson and R. Kwong. "On positive definite solutions to the algebraic Riccati equation." Systems & Control Letters 7, no. 2 (1986): 99-104.

Thank you once again for all the comments. We appreciate your feedback on the preliminary results with nonlinear mixing functions and will incorporate the clarifications in the revised manuscript. Please let us know if there are any additional concerns or suggestions.
I meant to raise my score to 7, not 6. Sorry for the inconvenience.
We greatly appreciate all the questions received. In this global rebuttal, we provide more detailed discussions.
Nonlinear Mixing Function:
Regarding Question 6 from Reviewer F6Jr, we provide a simple experiment involving two latent variables to offer preliminary insights. While our current experiments focus on the linear setting, DISC's implementation can be efficiently generalized to nonlinear scenarios.
Following the setting in our paper, we use the same dynamics and cost matrices as in the paper's tracking example. The disturbance is generated nonlinearly from a two-dimensional latent variable $(s_1, s_2)$, where $s_1$ records the absolute value of a sinusoidal curve, and $s_2$ is that of Laplacian noise. We set , and .
For disentanglement in this nonlinear setting, we employ Slow Feature Analysis (SFA) with nonlinear features. SFA is an unsupervised learning algorithm that extracts slowly varying features from quickly varying input signals. It is closely related to ICA and is a strong candidate for blind source separation problems. We extend the linear mappings of vanilla SFA to nonlinear ones by expanding into a nonlinear basis using scikit-learn's PolynomialFeatures.
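The SFA step above can be sketched as follows. This is an illustrative toy, not the rebuttal's actual pipeline: the hand-rolled quadratic expansion stands in for scikit-learn's PolynomialFeatures, and the latents and mixing function below are placeholders inspired by the described setup.

```python
import numpy as np

def quadratic_features(X):
    """Expand [x1, x2] -> [x1, x2, x1^2, x1*x2, x2^2] (degree-2, no bias)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1**2, x1 * x2, x2**2])

def sfa(features, n_components=2):
    """Linear SFA on the given features: whiten, then find the directions
    minimizing the average squared temporal difference."""
    Phi = features - features.mean(axis=0)
    C = Phi.T @ Phi / len(Phi)
    d, E = np.linalg.eigh(C)
    W = E / np.sqrt(d)                 # whitening matrix (columns)
    Z = Phi @ W
    dZ = np.diff(Z, axis=0)
    Cdot = dZ.T @ dZ / len(dZ)
    _, V = np.linalg.eigh(Cdot)        # ascending eigenvalues: slowest first
    return Z @ V[:, :n_components]

rng = np.random.default_rng(0)
T = 2000
s1 = np.abs(np.sin(0.01 * np.arange(T)))   # slow, predictable latent
s2 = np.abs(rng.laplace(size=T))           # fast, noisy latent
S = np.column_stack([s1, s2])
X = np.tanh(S @ rng.normal(size=(2, 2)))   # toy nonlinear mixing
slow = sfa(quadratic_features(X))
```

By construction, the first extracted component has the smallest average squared temporal difference, so it is the natural candidate for the slowly varying sinusoidal latent.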
The online learning of the confidence parameter involves computing the gradient of the cost w.r.t. $\lambda$, which can be computed similarly to the linear mixing setting. In this nonlinear example, we employ a neural network to approximate the mixing function, using its inference and Jacobian computation to calculate the gradient efficiently (also see our response under 'Implementation complexity' to Reviewer 6dfr for more details). In addition, DISC's performance naturally depends on the performance of the employed nonlinear disentanglement algorithm. Developing more efficient and stable disentanglement methods is an interesting direction for future research.
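The online update of $\lambda$ can be sketched as projected online gradient descent. The quadratic surrogate loss, step size, and per-latent optima below are hypothetical stand-ins (not the actual DISC gradient), chosen so that one latent is trustworthy and one is not:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-latent optima: one trustworthy latent (optimum 1.0)
# and one untrustworthy latent (optimum 0.3).
target = np.array([1.0, 0.3])

def noisy_grad(lam):
    """Gradient of the stand-in quadratic loss ||lam - target||^2, plus noise
    mimicking stochastic per-step feedback."""
    return 2.0 * (lam - target) + 0.5 * rng.normal(size=lam.shape)

lam = np.full(2, 0.5)   # initial confidence vector
eta = 0.05              # step size
for t in range(2000):
    # gradient step followed by projection onto the feasible box [0, 1]^k
    lam = np.clip(lam - eta * noisy_grad(lam), 0.0, 1.0)
```

In this toy, the first component drifts toward (and is clipped at) 1 while the second settles near 0.3, mirroring high confidence for predictable latents and lower confidence for noisy ones.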
Experimental results show that the baseline self-tuning policy achieves a cost ratio (CR) of 3.22, MPC a CR of 3.30, and LQR a CR of 8.28. Nonlinear DISC achieves a CR of 2.22 in this nonlinear mixing example, beating the other baselines. All results are averaged over 5 runs. We also include nonlinear DISC's learning curves for each component of $\lambda$ in the attached PDF. We find that $\lambda_1$, which corresponds to the more predictable latent $s_1$ (sinusoidal curve), quickly converges to the optimal value of 1. Conversely, $\lambda_2$, which corresponds to the more unpredictable latent $s_2$ (Laplacian noise), stays below 0.5. We will add more examples in the revised manuscript and make the code public.
Convexity of the Per-Step Loss:
Regarding Question 12 from Reviewer F6Jr, below we verify that the per-step loss is a convex function of $\lambda$.
By definition,
which equals (since the matrix is symmetric)
Now, we simplify the remaining two terms. Notice that for some matrix . To see this, denote We get
Therefore, it suffices to validate that the matrix is positive semi-definite to show the loss is convex, where . Note that . Since the cost matrix is positive definite by assumption and the DARE solution is also positive definite (see the response to Question 10), the loss is a sum of a quadratic form and a linear function of $\lambda$, validating its convexity.
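As a quick numerical companion to this argument (with an arbitrary placeholder PSD matrix and linear term, not the actual matrices from the proof), one can check that a quadratic form with a PSD matrix plus a linear term satisfies the midpoint convexity inequality:

```python
import numpy as np

rng = np.random.default_rng(2)
k = 4
G = rng.normal(size=(k, k))
M = G.T @ G                          # positive semi-definite by construction
c = rng.normal(size=k)

def loss(lam):
    """Stand-in loss: quadratic form with PSD matrix M plus a linear term."""
    return lam @ M @ lam + c @ lam

# midpoint convexity: f((a+b)/2) <= (f(a)+f(b))/2 for all a, b
for _ in range(100):
    a, b = rng.normal(size=k), rng.normal(size=k)
    assert loss(0.5 * (a + b)) <= 0.5 * (loss(a) + loss(b)) + 1e-9
```

Since the quadratic part is defined by a PSD matrix and the linear part is affine, the sum is convex, which is exactly the structure invoked in the argument above.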
Dear Reviewers,
We sincerely thank you for your constructive and insightful reviews of our paper. We have carefully addressed your feedback and will revise our manuscript accordingly to incorporate these improvements.
As we approach the end of the discussion phase, we welcome any final questions or comments you may have. Your expertise and insight are invaluable to us.
We deeply appreciate the time and effort you have contributed to reviewing our paper. Thank you once again for your invaluable contributions.
This paper generated a fair body of discussion between the authors and the reviewers during which the authors were able to convince the reviewers regarding the merit of their approach; this is evident in the revised scores. The reviewers appreciated the introduction of prediction quality that goes beyond a scalar measure, and thorough experiments that accompany the theory. Glad to recommend this paper for acceptance.