Stabilizing LTI Systems under Partial Observability: Sample Complexity and Fundamental Limits
We design a novel algorithm to stabilize an unknown partially observable LTI system with a complexity bound independent of the state dimension.
Abstract
Reviews and Discussion
The paper proposes an approach to designing a stabilizing controller for partially observable linear time-invariant systems with disturbances. Based on a number of (partially) observed trajectories, obtained by applying random Gaussian control inputs, the unstable system components are approximated for a given rank. By raising the system matrix to a given power, the influence of the unstable eigenvalues is amplified, which is crucial for their identification. This is referred to in the paper as lifting the Hankel matrix. The resulting approximation is then used to design the stabilizing controller. The paper provides probabilistic guarantees that the derived controller actually stabilizes the system, and clearly identifies the influence of the system parameters as well as the hyperparameters of the algorithm.
Strengths and Weaknesses
Strengths:
- The paper extends existing work from fully observable to partially observable systems.
- The probabilistic guarantees are clear and expose the influence of critical parameters, both of the system and the algorithm.
- The sample complexity is relatively low. As in prior related work, the system dimension does not directly influence it. In particular, even modest increases in the number of trajectories substantially raise the probability of success.
Weaknesses:
- The paper is somewhat technical and compact. I admit this is somewhat in the nature of the contribution and not easy to improve. For example, it took me a while to remember what m,M,p,q stand for. Since the result at the end proposes p=q=m, this could be mentioned earlier to ease some of the cognitive load. Reading Algorithm 1 requires referring to 8 equations that are listed on the next page. An intuitive explanation would be just as useful.
- The paper is largely inspired by [50], where, thanks to full observability, only a single trajectory is required.
Questions
In practice, an unstable system will only be observed or simulated over a short time horizon. This situation seems to merit some discussion. In my understanding, it suffices, roughly speaking, that m is as large as the unstable dimension k and T is larger than 3m. Then any success probability can be guaranteed by making M large enough.
The claim about boundedness in line 67 could be clarified: In a strictly finite observation scenario (which is considered here), Hankel matrices are finite and bounded.
Limitations
I did not see any discussion of numerical difficulties that might arise from observations of unstable systems, the inversion of the matrices used in the algorithm, etc. In fairness to the authors, I must admit that the paper is already heavily packed and I have no suggestions on how to make more room for such an analysis.
Note that I did not check the lengthy math appendix in detail, but I have no reason to doubt its correctness.
Justification for Final Rating
The author response answered my questions. I maintain my judgement that this is a paper that should be accepted.
Formatting Issues
none
We thank the reviewer for the careful and thorough review and provide the explanation below:
- Trajectory length: The reviewer is absolutely correct that by picking a minimally satisfactory trajectory length, with m on the order of the unstable dimension k and T larger than 3m, simulations have shown we can increase the probability of obtaining accurate system identification and a stabilizing policy almost surely by increasing the number of trajectories M. This roughly matches the order in the theorem, though we do acknowledge that the theorem has very large numerical constants and the required T needs to depend on other constants as well. We have added discussions on this in the paper.
- Bounded Hankel around line 67: The reviewer is right that we should have been more careful with word choice. While it is true that in a strictly finite observation scenario the Hankel matrix is always finite and bounded, estimating Hankel matrices accurately becomes much more difficult when the system is unstable. The (i, j)-th entry of the Hankel matrix is a Markov parameter of the form C A^t B, where the power t grows with i + j. If A is stable, the entry's magnitude decreases as t increases, so the error of each entry decreases as the trajectory gets longer. However, if A is unstable, the magnitude increases, so any error in the estimate of A (and by extension of C A^t B) is amplified, making the problem more difficult. We will clarify this in the revision.
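The magnitude argument above can be sketched numerically. This is our own toy illustration (a hypothetical 2-state system, not the paper's code): the Markov parameters C A^t B decay with t for a stable A and blow up for an unstable one.

```python
import numpy as np

# Toy 2-state system: the (i, j) block of the Hankel matrix is a Markov
# parameter C A^t B. For a stable A its norm decays with t, so per-entry
# estimation error shrinks along the trajectory; for an unstable A it
# grows, amplifying any error in the estimate of A.
C = np.array([[1.0, 0.0]])
B = np.array([[0.0], [1.0]])

A_stable = np.array([[0.5, 1.0], [0.0, 0.5]])    # spectral radius 0.5
A_unstable = np.array([[1.5, 1.0], [0.0, 1.5]])  # spectral radius 1.5

def markov_norm(A, t):
    """Norm of the Markov parameter C A^t B."""
    return np.linalg.norm(C @ np.linalg.matrix_power(A, t) @ B)

print(markov_norm(A_stable, 2), markov_norm(A_stable, 10))      # decays
print(markov_norm(A_unstable, 2), markov_norm(A_unstable, 10))  # blows up
```

For this Jordan-block example the parameter equals t·a^(t-1), so the stable case shrinks and the unstable case grows by orders of magnitude over the same horizon.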
We hope the above have answered your questions and provided some more context. Please let us know if you have further questions or suggestions.
Thank you for your detailed response. Your comments clarify the issues I raised and I maintain my positive evaluation.
This paper studies the problem of stabilizing unknown partially observable LTI systems, a natural and necessary extension of previous results in the fully observable setting. The authors propose LTS-P, a novel algorithm that efficiently stabilizes the system by isolating the unstable system modes using compressed SVD on a "lifted" Hankel matrix, and by designing a robust controller under a small-gain-type assumption on the stable residue. The algorithm features a reduced sample complexity that depends only on the number of unstable modes and is thus independent of the system dimension. Preliminary simulation results confirm that it outperforms existing algorithms.
Strengths and Weaknesses
Strengths:
- The paper is overall well-written and well-organized. It provides an abundance of intuition via simplified settings and comparisons against known results in the main text, while retaining mathematical rigor in the appendix by providing additional technical details.
- The research question is of great theoretical interest, as it is a natural follow-up to several recent works in the fully observable setting.
- The algorithm design builds on the solid idea of isolating the unstable modes of the system, and it also provides new insights that leverage the transfer function, the "lifted" Hankel matrix, and small-gain-type arguments to properly handle partially observable systems.
- The theoretical results are significant, and the proofs are checked to be sound.
Weaknesses:
- Assumption 5.2, though coming with a detailed comment on its necessity, still seems a little suspicious. Specifically, it is unclear how the constants in it scale with the system dimension n. The order of the sample complexity could be made more convincing if the growth rate of these constants were, at least, illustrated with a representative family of systems of increasing dimension.
- The simulation results presented in Appendix A are, at best, preliminary and illustrative. Since the total amount of data is selected as the complexity metric, it is natural to ask whether there are potential tradeoffs between the number of trajectories M and the trajectory length T, which is not reflected in the simulations. Instead, performance is only compared in a restrictive setting, with either M or T fixed. The experimental results could be more convincing with more settings and more results.
- Some notations, though easy to judge from context, are not defined before first use. For example, the block notation in eq. (18) seems to denote the i-th block of the corresponding matrix.
- A few typos are observed, including "ovservable" -> "observable" (line 32), "folloing" -> "following" (line 175), etc.
Questions
- How do the system parameters in Assumption 5.2 scale with the system dimension n? See above for details.
- Additional experimental results would be appreciated.
Limitations
N/A
Formatting Issues
N/A
We thank the reviewer for the careful and thorough review and provide the explanation below:
- Scale of constants: The first constant is a stability margin linking the necessary condition for the existence of a stabilizing controller in Lemma 3.1 with the actual bound in Assumption 5.2, which requires some margin on top of Lemma 3.1. If we interpret this margin to be of the same order as the quantity in Lemma 3.1, then it does not inherently scale with n, as it characterizes the input-output frequency response, which relates to the input and output dimensions rather than the internal state dimension. The second constant covers all other constants needed in Theorem 5.3; it upper bounds the matrix norms of the system matrices. If the system is a large-scale networked LTI system, i.e., one with very large n whose matrices admit a sparsity pattern corresponding to a sparse graph, then this constant scales with the maximum degree of the graph (assuming each entry of the system matrices is O(1)).
The (lack of) dependence on n is also shown in simulation: as we increased the state dimension n with randomly generated matrices, the required trajectory length T and number of trajectories M stayed the same.
Finally, we point out that even if a constant depends on n polynomially, its effect on the final complexity bound is mild, as it only enters through logarithmic terms.
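The bounded-degree claim can be sanity-checked numerically. Below is our own toy sketch (hypothetical banded matrices, not from the paper): for a sparsity pattern with at most d nonzero O(1) entries per row and column, the spectral norm is bounded by d via the inequality ‖A‖₂ ≤ √(‖A‖₁ ‖A‖∞), independently of n.

```python
import numpy as np

# Hypothetical example: A has d = 2 superdiagonals of ones, so every row
# and column has at most 2 nonzero entries. Then
# ||A||_2 <= sqrt(||A||_1 * ||A||_inf) <= 2, however large n is.
def banded_A(n, d=2):
    A = np.zeros((n, n))
    for offset in range(1, d + 1):
        A += np.eye(n, k=offset)  # the offset-th superdiagonal of ones
    return A

norm_small = np.linalg.norm(banded_A(50), 2)
norm_large = np.linalg.norm(banded_A(500), 2)
# Both spectral norms stay below the max-degree bound d = 2.
```

Growing n tenfold leaves the norm pinned under the same degree bound, which is the dimension-independence used in the argument above.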
- M and T tradeoff: Thanks for your suggestions. The tradeoff between the number of trajectories M and the trajectory length T is indeed what we want to analyze in the simulation. Although the preliminary results show that fewer trajectories (smaller M) require longer trajectories (larger T) and shorter trajectories (smaller T) require more trajectories (larger M), we only included one simulation for a single fixed T with varying M and another for a single fixed M with varying T. We did not include more results because the trends in the plots look similar to what is presented. However, the reviewer is right in pointing out that there might be value in collecting the percentage of successful stabilizations for a fixed data budget across different (M, T) pairs. We are currently completing this simulation and will include it in the final version.
We hope the above have answered your questions and provided some more context. Please let us know if you have further questions or suggestions.
Thank you for your response, and also for this great work! I'll keep my positive evaluation of this paper.
This paper gives an algorithm for learning stabilizing controllers for unstable LTI systems. The key insight is that the unstable dynamics can be learned via a specialized subspace identification algorithm. Then, as long as the stable part has a sufficiently small norm, a robust controller designed for the unstable component will stabilize the system with high probability.
Strengths and Weaknesses
Strengths
- This gives at least a partial solution to a long-standing problem in learning-based control: learning a stabilizing controller for partially observed systems.
- The writing is reasonably clear.
Weaknesses
- Theorem 5.3 appears to be stated incorrectly. Namely, I'm guessing (based on Lemma B.1 and Theorem 3.2) that the statement should hold for all values of the confidence parameter, i.e., the system is stabilized with at least the corresponding probability. The theorem is currently stated as holding only "for some" value of that parameter.
- The work doesn't give a full solution to the stabilization problem: It really only works for sufficiently small stable parts. (I.e. Assumption 5.2 is required.) While an argument about how to extend the method is sketched, a complete solution that works for all stabilizable and detectable LTI systems would be desirable.
- The paper is somewhat inconsistent in its treatment of process noise. Namely, process noise is included in (14) and (26), but not in (1) or the main theory. My guess is that the problem would be substantially harder with process noise. Namely, (31) would include terms that get amplified through the unstable dynamics, leading to substantially larger errors in the Hankel matrix estimation.
- Similarly, the paper assumes that the state can be reset to 0 at the beginning of each trajectory. If this were not the case, similar to the problem with process noise, unobserved, non-zero initial conditions would make the Hankel matrix errors much larger.
- The choices of m, T, and M used in the simulations are inconsistent with the theory.
- The choice of m in Theorem 5.3 requires knowledge of the eigenvalues and of properties of various Gramians. As written, this does not appear to be very useful, since it requires choosing a parameter of the algorithm, m, based on properties of the unknown system. The result could be stated a bit differently, so that for a given m, the algorithm can stabilize the system provided the bounds on the eigenvalues and Gramians hold. I.e., for a given m, the algorithm would work for all systems of a particular class.
Questions
- What happens if there are marginally stable eigenvalues?
- Can you do the simulations with the theoretical choices for m, T, and M?
- Do you run into numerical problems when m is large?
- Can Assumption 5.2 be removed completely?
Limitations
Yes.
Justification for Final Rating
The authors did a good job convincing me that the contribution was reasonably strong.
However, there are still issues remaining that were not adequately addressed:
- Assumption 5.2 does not appear to be necessary; it is only needed when the stable part is not identified.
- I still think the presentation of the parameter choices in the theorem is not good.
- The work does not adequately address issues of non-zero initial conditions and process noise.
Based on the limitations above, I don't really want to raise beyond a weak accept.
Formatting Issues
Nothing major, but some equations go into the margins.
We thank the reviewer for the careful and thorough review and provide the explanation below:
- Regarding the probability statement: We thank the reviewer for pointing out the typo in the statement of Theorem 5.3. The probability bound should indeed read as the reviewer suggests, and we have corrected it.
- Assumption 5.2 and no marginally stable eigenvalues: Both assumptions are necessary for this unstable + stable decomposition to work. As discussed at the end of Section 5, Assumption 5.2 comes from the Small Gain Theorem, which is an if-and-only-if condition. If Assumption 5.2 does not hold, it can be shown that no matter what control law the user picks, there always exists some stable component that would make the system unstable [53]. Regarding marginally stable eigenvalues, as our results depend on the separation of the unstable and stable components, we do not allow marginal eigenvalues. That being said, we do allow the unstable eigenvalues to be arbitrarily close to the unit circle, and their impact on the sample complexity is characterized in our bounds, discussed in the "Dependence on system theoretic parameters" paragraph after Theorem 5.3.
Regarding your point that "the work doesn't give a full solution to the stabilization problem": we wish to point out that the learn-to-stabilize problem has been shown to be challenging, with various exponential lower bounds proven in [9], [48], and further in the paper "Learning to Control Linear Systems can be Hard". Our result essentially carves out a significant subset of problems (those satisfying the assumptions in the paper) for which learning to stabilize can be easy. In fact, our assumptions are minimal in order to break the hardness of stabilization shown in [9, 48] if one only stabilizes the unstable subspace of the system. While we do acknowledge there is more to do towards a "full solution to the stabilization problem", our results already make a significant step, as all prior results actually show that learning to stabilize is hard.
- Process noise: The paper's theoretical guarantee works when there is no process noise, but observation noise is allowed (Equations (1), (14), and (26) all include the observation noise). Process noise is only included in Equation (26) to show that the proposed method works under process noise in simulation. As the reviewer points out, introducing process noise would enlarge the Hankel estimation error with an additional error term in Equation (31).
We clarify that the main technical novelty of the paper is the stable + unstable decomposition (including techniques to separate the unstable component from the lifted Hankel), whereas the system ID part used to obtain the Hankel is not the main focus. We chose to assume no process noise to keep the system ID part cleaner, avoiding complicating the paper with unnecessary details. If process noise were to be included, then as long as the Hankel estimation error can be bounded, the rest of the proof would work in a similar way. Here is the outline of what the proof would look like:
If there is process noise, an additional matrix involving powers of A appears in the proof, so the estimation error of the Hankel matrix will contain a term that increases exponentially with the trajectory length T. This is to be expected, as process noise is amplified by the unstable part of A. Once we bound the Hankel estimation error, the rest of the proof will be exactly the same as in this paper. To finish the proof in this direction, a fine balance for T needs to be selected, so that it minimizes the current terms in (109) without letting the exponential term blow up; more weight then needs to be placed on the number of trajectories M instead of T. However, this fine balance seems only to be present in the theoretical analysis: in simulation, where we do include process noise, we see that (a) the number of samples needed for stabilization does not increase in the presence of process noise, confirming our approach works under process noise; and (b) a longer trajectory still helps with stabilization even with process noise, meaning the fine balance regarding the choice of T is not present in our simulation. We leave extending the theoretical analysis to incorporate process noise as a future direction for this paper.
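The exponential amplification can be sketched with a one-line calculation. This is our own back-of-envelope illustration (a scalar unstable mode with a hypothetical eigenvalue 1.4 and noise standard deviation 0.01, not the paper's analysis): the standard deviation of the accumulated process-noise term in the state grows exponentially with the horizon T.

```python
import numpy as np

# For x_{t+1} = a x_t + w_t with iid noise of std sigma, the noise part
# of x_T is sum_s a^(T-1-s) w_s, whose std is sigma * sqrt(sum_s a^(2s)).
# For an unstable mode (|a| > 1) this grows exponentially in T.
def noise_std(T, a=1.4, sigma=0.01):
    return sigma * np.sqrt(sum(a ** (2 * s) for s in range(T)))

short_horizon = noise_std(10)
long_horizon = noise_std(40)
# long_horizon is several orders of magnitude larger than short_horizon.
```

This is why the balance described above shifts weight toward more trajectories M rather than longer trajectories T once process noise is present.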
- Initial condition: The paper does assume the system is reset to 0 at the start of each trajectory, but this is merely for simplicity of the proof. For simplicity, we will again set the noise to zero in this part of the discussion. If the initial state is non-zero, the outputs contain an additional term known as the free-response term. We can pick a suitable past horizon, stack the corresponding past data into a matrix, and project the future outputs orthogonally to the row space of the past data. We can then use the projected observations in the proposed algorithm. This is a quite popular method in conventional subspace identification, such as N4SID. For details on how and why it works, please refer to Section 3 of "N4SID: Subspace Algorithms for the Identification of Combined Deterministic-Stochastic Systems" by Peter Van Overschee and Bart De Moor. This is indeed an important issue; we have added more discussion on it in the paper.
- Inconsistency between theory and simulation: Theorem 5.3 gives the order of the parameters, not the exact values we should pick (the order notation ignores numerical constants, as defined at the start of Section 2). These expressions provide theoretical insight, showing how the parameters should depend on the system size n, the unstable dimension k, and the observability and controllability of the system. Exactly computing the omitted numerical constants might yield conservative values: many terms throughout the proof are relaxed and merged with the leading-order term, making the final order-wise dependence clean at the cost of conservative constants. For this reason, in experiments we do not follow the exact theoretical values but rely on trial and error instead. This is common practice in much of the theoretical literature in control and optimization, where the parameters chosen in the theorems are conservative and simulations use much more relaxed parameters.
- Usefulness of the bounds for m: We disagree that "the choice of ... does not appear to be very useful". The choice of m eventually translates into the sample complexity requirements, which answer fundamental questions about how the sample complexity of stabilization depends on system-theoretic parameters like the unstable dimension k, the eigenvalue gap, the controllability Gramian, etc. These bounds provide insights such as that some systems are inherently easier to stabilize than others. For example, we see it is harder to separate the stable system from the unstable system if there are marginally stable eigenvalues; this is why the bound grows as the eigenvalues approach the unit circle. Moreover, the less observable or controllable a system is, the smaller the corresponding Gramian constants are and the larger the required sample sizes are. Therefore, the less observable or controllable a system is, the harder it is to stabilize.
We acknowledge that in practice the theoretical value of m cannot be exactly computed, as it depends on the parameters of the unknown system. However, we point out that as long as we plug conservative estimates of those parameters into the bound (e.g., lower bounds for the Gramian constants), the theorem still works. Further, as in many theoretical papers, the selection of those parameters in experiments may differ from theory. In fact, in our experiments, the choice of m is very forgiving: unless m is too small (so we did not converge to the unstable subspace well enough) or too close to T (so we did not leave enough samples for system estimation), we generally get quite good system estimates. Throughout the simulation, we did not need to do much hand-picking of m. We have added more discussion of this issue in the paper.
We hope the above have answered your questions and provided some more context. Please let us know if you have further questions or suggestions.
The rebuttal has convinced me that the paper is solving an important sub-class of the general learning to stabilize problem. Based on this, I will raise my score to a weak accept.
But I would like to push back and clarify a few points:
- Assumption 5.2 is only necessary in the case that just the unstable component of the transfer function is being learned. This is in order to apply the small gain theorem for the stable component. But if the stable component were also learned, Assumption 5.2 should not actually be necessary.
- This is what I meant by not having Assumption 5.2. The writing in the paper is fairly careful to note that the necessity arises in the specific case that only the unstable part is used.
- While I agree that knowing the requirements on m is useful, the authors seem to have somewhat missed the point of my comment about it. Of course, understanding what m should satisfy is valuable. My point was about how it is written. I was encouraging the authors to rewrite the theorem in a way that doesn't require making a choice of an algorithm parameter based on unknown quantities.
- My original suggestion was to reframe the statement so that for a particular choice of m, the algorithm guarantees stability for all systems in a particular class (found based on the calculations for m).
- The suggestion from the rebuttal would be to approximate/bound the parameters used to calculate m. This seems like more work if you actually wanted to do it rigorously, but it would be perfectly valid if it were done.
- I'm not convinced that the argument from Van Overschee and De Moor actually covers the non-zero initial condition problem. Namely, as discussed in the referenced paper, the signals are required to be quasi-stationary, which fails in the unstable case.
My issues with the mismatch between the theory and the parameters used in numerics is more an issue of preference. I recognize that the practices in the paper are common. But it does point to gaps between what is observed in practice and what can be proved theoretically, and I do think it is valuable to be up front about those gaps.
We thank the reviewer for the timely response and suggestions and provide some clarifications below:
- The authors agree that Assumption 5.2 is only necessary if we only learn the unstable component of the transfer function. As pointed out by the reviewer, the submission is ''fairly careful to note that the necessity arises in the specific case that only the unstable part is used''. In the revised paper, we will make it even clearer.
- The authors thank the reviewer for the suggestion on the statement of Theorem 5.3. We are working on formulating a corollary that states, for a particular choice of m, the conditions on the system parameters (e.g., the unstable dimension k, the control input and output dimensions, controllability/observability Gramian parameters, etc.) under which the algorithm can assure stabilization of the system with high probability. We will add this corollary to the revised version.
- The authors acknowledge that the theoretical guarantee for a non-zero initial state in Van Overschee and De Moor currently applies only to quasi-stationary systems, and the estimation of the Hankel matrix will contain a new error term that is again exponential in the trajectory length, as in the case of process noise, precisely for the reason the reviewer pointed out. The theoretical analysis of non-zero initial states is indeed non-trivial, and we leave this as a future direction of this work. We will acknowledge this point (along with the points about process noise) in the revised manuscript.
This paper presents a novel method for obtaining a stabilizing controller for a partially observed discrete-time LTI system, and analyzes the sample complexity of the resulting algorithm. The method is based on estimating the parameters of a lifted Hankel matrix for the system using multiple trajectories with iid control inputs. The estimated Hankel matrix allows for estimating the open-loop transfer function for only the unstable part of the system, which is then used to construct a stabilizing controller for the entire system. In the case where the number of unstable eigenvalues k is much smaller than the state dimension n, this decomposition into unstable and stable parts yields better sample complexity than applying previous results on system identification to this problem. Experiments on randomly generated systems validate that the proposed method learns a stabilizing controller with less data in practice.
Strengths and Weaknesses
Strengths: This is a strong paper with a good motivation and clear development of the technical ideas underlying the proposed method. It tackles a problem of practical interest that has not been targeted in prior works to the best of my knowledge. It is also very well-organized and relatively easy to follow. I especially appreciated the notation index that not only defines each term but cites the equation / lemma where each appears! Providing a code repository will make it easy for other researchers to use the method in their own applications.
Weaknesses: One important limitation of the paper is that the method assumes the number of unstable modes k is known. A procedure for estimating it is presented in the paper, but this procedure only works when the lifting parameters m, p, and q are sufficiently large with respect to k, which is problematic when k is truly unknown.
Minor weaknesses:
- Above equation (24), I believe the quantities written there should be their estimated counterparts.
- It would be helpful if the captions of Figure 1 and Figure 2 were more detailed, including phrases like "at different noise levels" and "with rollouts of length T" (Figure 1) or "with rollouts of length T" (Figure 2).
- Line 498: trails -> trials
Questions
- In the case where k is unknown, the paper describes a procedure for finding k based on the singular values of the estimated Hankel matrix, which assumes that the lifting parameters m, p, and q are sufficiently large with respect to k. What happens when this is not the case? Is it possible to mistakenly believe that the true k has been identified (i.e., to find a large spectral gap in the singular values that does not correspond to k)? I would be inclined to raise my score if I have misunderstood this limitation.
- Do the simulations in Appendix A use constant A, B, and C matrices while randomizing the initial state?
- If the simulation results instead plotted the fraction of randomized systems that are successfully stabilized for a given amount of data, could we recover the probability bound in Theorem 5.3?
Limitations
Yes
Justification for Final Rating
The authors did a nice job of rebutting my questions/concerns as well as those of the other reviewers. This is a strong paper and I recommend that it should be accepted.
Formatting Issues
No issues.
We thank the reviewer for the careful and thorough review and provide the explanation below:
- Estimates above Equation (24): The notations above Equation (24) are defined in lines 202 and 203; they are our estimates of the target quantities in the idealized case where the Hankel matrix is exactly known. As the Hankel matrix itself is estimated in the actual algorithm, Eq. (24) gives the estimated versions of those quantities.
- Unknown k: What we can prove is that (a) the difference between the estimated lifted Hankel matrix and the true one is exponentially small (Lemma C.3); (b) the unstable component of the lifted Hankel is formed as the product of a tall matrix and a wide matrix, each involving only the k unstable modes, so its rank is at most k; and (c) all non-zero singular values of the unstable component are exponentially large in the lifting power. Therefore, when m, p, q are large, from (b) and (c) we know the unstable lifted Hankel has k exponentially large singular values, with all other singular values zero. Then from (a) we know the estimated lifted Hankel has k exponentially large singular values, with all other singular values exponentially small. This is the singular-value gap our paper was referring to, and it is how k can be identified. For your question about m, p, q not being large enough, there are two cases. Case I: if p and q are not large enough, so that the rank of the unstable lifted Hankel is smaller than k, then by (a) and (c) all singular values of the estimated Hankel will be exponentially large, and the singular-value gap will not be observed. Case II: if m is not large enough, then the distinction between the "exponentially large" and "exponentially small" singular values will not be pronounced, so a large gap is again not apparent. To conclude: when m, p, q are not large enough, the singular-value gap will not be apparent, so one won't mistakenly believe the true k has been identified. Actually, the above discussion also sheds light on how to tune m, p, and q to identify the gap.
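The singular-value-gap argument can be illustrated with a toy diagonal system. These are our own hypothetical numbers (not the paper's code): with k = 2 unstable modes, a lifted Hankel of high-power Markov parameters shows exactly k dominant singular values.

```python
import numpy as np

# State dim n = 6, k = 2 unstable modes, lifting power m, block size p = q.
n, k, m, p = 6, 2, 16, 4
A = np.diag([1.3, 1.2, 0.5, 0.6, 0.3, 0.4])  # diagonal for transparency
B = np.ones((n, 1))
C = np.ones((1, n))

# Lifted Hankel of high-power Markov parameters C A^(m + i + j) B.
H = np.array([[(C @ np.linalg.matrix_power(A, m + i + j) @ B).item()
               for j in range(p)] for i in range(p)])
s = np.linalg.svd(H, compute_uv=False)
# s[0], s[1] are exponentially large in m; s[2], s[3] are tiny, so the
# ratio s[k-1] / s[k] exposes the rank of the unstable part, i.e. k.
```

Shrinking m in this sketch shrinks the gap, matching Case II above; shrinking p or q below k removes it entirely, matching Case I.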
- Simulation setup: We thank the reviewer for pointing out the issue with the setup in the simulation. In the simulation, A and B are both initialized randomly, with each entry sampled from an independent uniform distribution between 0 and 1. We still assumed a zero initial state, as discussed in Section 2. Although we discussed how to tackle a non-zero initial state in the appendix, we did not use one in the simulation.
- Regarding the probability guarantee: Yes, if we run the simulation a sufficient number of times for fixed parameters, we can recover the probability bound. In Theorem 5.3 and Theorem C.2, we use this bound to show that longer trajectories have a better chance of producing an accurate Hankel estimate.
We hope the above have answered your questions and provided some more context. Please let us know if you have further questions or suggestions.
Thank you for the nice rebuttal, including addressing my main concern (the unknown k). Since I already voted to accept and it seems like I'm in line with the other reviewers, I'm not inclined to change my score, and I don't have any additional comments/questions.
The authors give a provable algorithm for stabilizing an unknown partially observable linear time-invariant system, scaling only with the dimension of the unstable subspace.
Reviewers agreed that this is a technically sound and novel contribution to an important problem, and after discussion, all were in favor of acceptance. As pointed out by H1gh, the result could potentially be technically improved in some ways (more general setting with initial conditions and process noise); though the paper is still significant in carving out a first family of problems where "learn-to-stabilize" is provably efficient.
I remind the authors to incorporate the suggestions and clarifications into the final paper.