4.5

/10

withdrawn4 位审稿人

最低3最高6标准差1.5

3.5

置信度

正确性2.5

贡献度2.3

表达2.5

ICLR 2025

Koopman Embedded Equivariant Control

Xiaoyuan Cheng,Yiming Yang,Wei Jiang,Xiaohang Tang,Yukun Hu

OpenReview PDF

提交: 2024-09-28更新: 2025-02-01

TL;DR

Learning comprehensively embedding to control with Koopman operator.

摘要

关键词

Koopman operatorsOptimal ControlEquivariant RepresentationNonlinear Dynamical System

评审与讨论

审稿意见

评分: 3置信度: 42024-10-29

The paper proposed a data-driven modeling and control framework that consists of (1) dynamics model based on the Koopman formalism, and (2) reinforcement-learning-based control using the Koopman model. The framework is demonstrated on three examples, with benchmark against several methods in the literature. The claimed novelties include: (1) the introduction of equivariance and consistency requirements in the learning of Koopman dynamics, and (2) simplified RL control policy leveraging the linearity in Koopman dynamics.

优点

In Koopman-based modeling, the introduction of metric consistency, in the form of isometry loss, seems a novel contribution, and the ablation study on isometry loss shows some seemingly favorable effects of this loss. A semi-analytical optimal policy is derived based on the Koopman model, which leverages the relatively simple form of the latter. (This reviewer calls it semi-analytical, as the value function still needs to be learned from data.)

缺点

In Koopman-based modeling, the notion of equivariance, and the corresponding loss, is claimed as a novel contribution. However, this reviewer considers the so-called equivariance requirement as the basic requirement that the community of data-driven modeling of dynamical systems practices on a daily basis. The equivariance loss thus derived is also a standard loss used in Koopman community, see e.g. [1].
The use of Koopman formalism and the derivation of the (bi)linear model, Eq. (6) is not new at all. See the comprehensive work in [2], which also covers the optimal control based on Koopman model. The authors seem unaware of this work. Furthermore, in the derivation of the Koopman dynamics, the authors directly replaced the linear operators P and U by matrices. However, operators may admit point, continuous and residue spectra (which is the case for pendulum and Lorenz-63), but matrices only admit point spectrum. There is no rigorous treatment on when such replacement is possible. In fact, the treatment of continuous spectrum is one of the current bottlenecks in Koopman community (as this reviewer's opinion).
Four "baselines" are chosen, but this reviewer is unsure whether these are fair comparisons. The baselines are all different versions of "novel" learning-based control methods, but the first question to ask is whether the proposed method can out-perform standard methods, such as Koopman model + optimization-based MPC (not MPPI), which has been demonstrated to be effective on hardware in real-time (see [1]). Furthermore, in the third example, the wave equation is linear, so it admits a linear state-space model, which one can identify from data using standard system identification methods; such linear model can be controlled by LQR method. Can the proposed method out-perform such baseline?
Some portions of the manuscript are unclear, and some are even erroneous. See Questions below.
Some portions of the manuscript only provide standard or well-known results, which this reviewer is not sure whether these add any information to the paper. Particularly, these include (1) Algorithm 1 that shows standard procedures for training models and learning value functions, and (2) Appendices B, D, F.1, and G.

[1] Folkestad, Carl, Skylar X. Wei, and Joel W. Burdick. "Koopnet: Joint learning of koopman bilinear models and function dictionaries with application to quadrotor trajectory tracking." 2022 International Conference on Robotics and Automation (ICRA). IEEE, 2022. [2] Goswami, Debdipta, and Derek A. Paley. "Bilinearization, reachability, and optimal control of control-affine nonlinear systems: A Koopman spectral approach." IEEE Transactions on Automatic Control 67.6 (2021): 2715-2728.

问题

It appears Fig. 3d and 3e are swapped. Also, in the ablation for isometry loss, please provide the reward for lambda_met=0, so the effect of isometric loss is clearer.
Line 511, it is unclear what "optimal state became unobservable" means. It needs clearer definition and better quantification.
There are plenty typos in Appendix F leading to the doubt of this reviewer that whether the proofs of the "main theorems" have been carefully constructed. In particular, Line 1163, is there an extra \gamma in front of \nabla? Line 1172 missing bracket "("? Lines 1172-1176, unclear where Eq. (34) is used.
Line 1472, What do authors mean by "two strange attractors"?
How is the wave equation solved?
Table 2 shows that noise is added to the wave equation, but not others. Why is so? How sensitive is KEEC to noise in the other cases?
MPPI and PCC use significantly shorter horizons than KEEC. What if the former two use the same longer horizon, or have KEEC using the shorter horizon?
Line 155, what do the authors mean by "didn't comprehensively map the vector field ..."?

评论- Response to Weakness

2024-11-22

We thank the reviewer for their time and effort in reviewing our work. We appreciate the constructive feedback and suggestions, as well as the recognition of the strengths of our work. Below, we provide detailed responses to your comments, weaknesses, and questions:

(1) Clarification on the Novelty of Equivariance in Koopman-Based Modeling

Thanks for your questions. To clarify, our goal is to address what properties the embedding $g$ should satisfy in order to preserve the control effect in the latent space. We formally answer this question with two key properties: equivariance and isometry.

Equivariance can be expressed as $F^{\text{latent}} \circ g = g \circ F$ (which preserves the properties of flow $F$ in embedding space), and it is a more general concept compared to the Koopman operator. We use the Koopman operator because it naturally satisfies equivariant representation. We do not claim that the derivations in equations (6) and (7) are our core contributions. Instead, our use of the Koopman operator is motivated by its equivariant properties, and to the best of our knowledge, no paper has formally stated why Koopman is equivariant (see Appendix D). It naturally enables the equivariant embedding of both flow and vector fields.

(2) Novelty of Koopman Formalism and Optimal Control + Spectrum in Koopman Dynamics

We thank the reviewer for providing the reference, and we will include this paper [2] in the our paper. As our answer in weakness 1, we do not claim the equations (6) and (7) are our novleties. And, in general, we have two major differences from [2]:
- We obtain an analytical control policy from our derived invariant value function, whereas [2] relies on Model Predictive Control (MPC) for optimal control.
- We integrate deep learning model, greatly improving the scalability of solving control problems compared to [2]. Our method even effectively addresses image-based control problems, which are far more challenging than the low-dimensional systems considered in [2].
Thanks for your question on the specturm.

Our method can capture the mixed spectrum in deep learning setting. The generator $\mathcal{P}$ of the Koopman operator is a densely defined, unbounded operator with domain $\mathcal{D}(\mathcal{P}) \subset L^2(M)$ . In our approach, we approximate $\mathcal{P}$ by constructing a compactified version, $\hat{\mathcal{P}}$ , following the compactification procedure described in [1]. Specifically, it is shown in [1] that the operator $\hat{\mathcal{P}} = \Pi \mathcal{P} \Pi$ is a compact operator with a purely atomic spectrum, providing an approximation to the original unbounded generator $\mathcal{P}$ . Here, $\Pi$ is a projection operator that maps $L^2(M)$ to the feature function space spanned by $g$ , which is dense and countable in $L^2(M)$ . The approximated operator $\hat{\mathcal{P}}$ can be expressed as $\hat{\mathcal{P}} = \lim_{t \to 0^+} \frac{\Pi \mathcal{K}_{t} \Pi - I}{t},$ consistent with our learning process as described in Equations (8) and (9) of our work. Moreover, $\hat{\mathcal{P}}$ achieves strong convergence with operator norm to $\mathcal{P}$ as $t \to 0^+$ , implying that the spectral properties of $\hat{\mathcal{P}}$ approximate those of $\mathcal{P}$ . This convergence also ensures that the spectral measures of $\hat{\mathcal{P}}$ approximate those of $\mathcal{P}$ , effectively capturing both the atomic and continuous components of the Koopman spectrum. Consequently, the approximated Koopman evolution operator $\exp(\hat{\mathcal{P}} t)$ achieves strong convergence to $\mathcal{K}_t$ , even when the Koopman operator has a mixed spectrum. This result is supported rigorously by Corollary 4 in [1], highlighting the quality of the approximation.

Hope the answer effectively address the reviewer's concern and we look forward to further comments.

References

[1] Das, Suddhasattwa, Dimitrios Giannakis, and Joanna Slawinska. "Reproducing kernel Hilbert space compactification of unitary evolution groups." Applied and Computational Harmonic Analysis 54 (2021): 75-136.

[2] Goswami, Debdipta, and Derek A. Paley. "Bilinearization, reachability, and optimal control of control-affine nonlinear systems: A Koopman spectral approach." IEEE Transactions on Automatic Control 67.6 (2021): 2715-2728.

评论- Response to Weakness (Continue)

2024-11-22

(3) Clarification on Baseline Selection and Fair Comparisons

Koopman-based models combined with optimization-based MPC can achieve real-time control with sufficient effort, such as code optimization. While our method draws inspiration from Koopman theory, it introduces a key difference: control is performed directly in the latent space, avoiding the need to decode back to the original state space. This approach reduces computational overhead and uses a compact representation for more efficient optimisation.

To provide a more comprehensive evaluation, we also integrated our dynamical learning framework with MPC in the latent space and tested it on the wave equation experiment. The settings were consistent with those in the manuscript, and the results are summarized below:

Method	Episodic reward	Evaluation time (s)
KEEC	-277.6±29.2	5.79±0.24
Koopman MPC	-463.45±55.91.	28.24±0.61

The MPC planning horizon is set to 5. Full implementation details are available in the example notebook provided at https://anonymous.4open.science/r/Koopman-Embed-Equivariant-Control-70D1.

(4) Some portions of the manuscript are unclear, and some are even erroneous. See Questions below.

We thank the reviewer for carefully reading our manuscripts. See our responses and corrections below:

There is an extra $γ$ . We have removed it, see Line 1270.
We have complete the missing bracket, see Line 1280.
We have revised Line 1172-1176, and the Eq. (34) is wrongly refered. See our updates in Line 1277-1285
We redo a proofread to correct existing typos and address any inconsistencies, ensuring clarity and accuracy throughout the document.

(5) Some portions of the manuscript only provide standard or well-known results, which this reviewer is not sure whether these add any information to the paper. Particularly, these include (1) Algorithm 1 that shows standard procedures for training models and learning value functions, and (2) Appendices B, D, F.1, and G.

As a machine learning paper, our algorithms need to be elaborated thoroughly via such form with implementation details. Our method compared to other model learning approaches, is similar since the general goal is to learn dynamics and embeddings.

Appendix B: Provides important definitions used in our derivations
Appendix D: Offers a broad overview of the Koopman operator to provide a comprehensive context, giving readers a better grasp of its relevance within the global picture of our work.
Appendix F.1: Contains the proofs for the main theoretical results presented in the paper, supporting the rigor of our contributions.
Algorithm 1 & Appendix G: While it may appear standard, it is necessary to explicitly document our approach for training models and learning value functions to ensure clarity and reproducibility. Appendix G includes the pseudo-code for optimal control of KEEC. As per ICLR standards, all algorithms must be clearly and explicitly stated for reproducibility and transparency.

评论- About spectrum

2024-11-25

If I understand correctly, Das2021 focuses on ergotic autonomous systems. How is the conclusion there applicable to your case (not necessarily ergotic and/or with inputs)?

Even if the paper's conclusion applies, what projection did you employ for compactification? Das et al. appears to have used RKHS, which is rigorously founded.

评论- Response to Question

2024-11-22

(1) It appears Fig. 3d and 3e are swapped. Also, in the ablation for isometry loss, please provide the reward for lambda_met=0, so the effect of isometric loss is clearer.

Thank you for pointing out the swapped captions; we have corrected this in the revised manuscript. Additionally, we have included $\lambda_{met}=0$ in Table 1, labeled as KEEC (w/o $\mathcal{E}_{met}$ ).

(2) Line 511, it is unclear what "optimal state became unobservable" means. It needs clearer definition and better quantification.

Thanks for your question. The upright position of the pendulum represents an optimal state, which corresponds to a saddle point in the state space. When the state space is embedded into a latent space without preserving the underlying metric, this optimal state cannot be observed, and the characteristic properties of the saddle point may no longer hold. We refined the statement about the 'optimal state not being observed' to clarify this.

(3) typos in Appendix F

Thanks for carefully reading our manuscripts. See our response in weakness (4).

(4) Line 1472, What do authors mean by "two strange attractors"?

Thank you for raising this question. We have corrected this in our updated manuscript to refer to "one of the saddle points."

(5) How is the wave equation solved?

The wave equation is integrated using the 4th-order exponential Runge-Kutta method, with the action term as the source term.

(6) Table 2 shows that noise is added to the wave equation, but not others. Why is so? How sensitive is KEEC to noise in the other cases?

We maintained the default experimental settings for each implementation as specified in OpenAI Gym [1]. In the wave equation control problem, our approach demonstrated strong robustness despite the complexity and sensitivity to noise, as shown in Table 1. This suggests that our method would show similar robustness in the other lower-dimensional cases as well.

References

[1] Brockman, G. "OpenAI Gym." arXiv preprint arXiv:1606.01540 (2016).

(7) MPPI and PCC use significantly shorter horizons than KEEC. What if the former two use the same longer horizon, or have KEEC using the shorter horizon?

We appreciate the reviewer’s comment. In our experiments, we followed the default settings from the original papers for MPPI and PCC to ensure fair comparisons. Adjusting horizon lengths would require re-tuning and could deviate from their intended use. The shorter horizons for KEEC may hinder its performance. The chosen horizon lengths align with those used in extended dynamic mode decomposition (eDMD) [1], and we will include this discussion in the revised manuscript.

References

[1] Kutz, J. Nathan, et al. Dynamic mode decomposition: data-driven modeling of complex systems. Society for Industrial and Applied Mathematics, 2016.

(8) Line 155, what do the authors mean by "didn't comprehensively map the vector field ..."?

Thanks for rasing this questions. We have corrected to comprehensively map the flows and vector field, whereas the "comprehensively" refers to both flows and vector fields of the original dynamics.

We would like to thank the reviewer once again for the valuable time and thoughtful feedback. We look forward to any further comments and will address your further questions if you have.

评论- On the questions

2024-11-25

(2) "optimal" and "(un)observable" have their specific definitions in control theory. In other words, the state is optimal in what sense? And are you talking about observability? If so, what metric do you use to quantify observability?

(4) Perhaps take a look here: https://en.wikipedia.org/wiki/Attractor#Strange_attractor. I am not sure if you are using the right terminology.

(8) The claim of comprehensive embedding is very strong. This also implies embedding the topological structure of the dynamics, e.g., limit cycles and homo/hetero-clinic orbits. In fact, if I understand correctly, embedding such structures in a linear latent space of Koopman is impossible.

审稿意见

评分: 6置信度: 22024-11-01

The paper proposes learning Koopman embedding for the vector field while preserving the consistency of the control effect.

优点

The paper provides strong theoretical results and numerical analysis.

缺点

What is the drawback of traditional Koopman-based analysis? Why do we need to introduce the learning framework?
Please do a proofreading for all notations and equations. For example, in Line 187-188, there is no z_t. Why do you need to define it? What is $\mathcal{U}$ in Eq. (6)?
In Section 2 and 3, you may also mention what the past method did. For example, how did it learn $\mathcal{K}$ ? This helps the reader to understand your contribution.
The defined equivariance/isometry losses are quite similar to some existing work, such as “DeepMDP: Learning Continuous Latent Space Models for Representation Learning” . Please do a comparison.
Please clearly state your assumptions and scopes. For example, the analytical framework is based on the control-affine system in Eq. (1). So, the author must state the application domains.
In Fig. 3 (d) and (2), the x-axis doesn’t match the caption. Try to check all figures.
Is the computation time in Fig. 3 training or testing time? You may need to compare both.

问题

Q1. Please proofread for notations and figures. See my Weakness points 2, 6.

Q2. Give better motivations for readers to know your contributions. See my Weakness points 1, 3, 4.

Q3. Please clearly state your assumptions and scopes. See my Weakness point 5.

Q4. Is the computation time in Fig. 3 training or testing time? You may need to compare both.

评论- Response to Questions

2024-11-22

We would like to thank the reviewer for raising these important questions.

Please see our responses in the corresponding weaknesses.

Thank you for your thoughtful review and valuable questions. We greatly appreciate your time and effort in providing feedback to improve our work. Hope our answers address your questions and concerns effectively. We look forward to your further comments and insights.

评论- Response to Weakness (Continue)

2024-11-22

(4) The defined equivariance/isometry losses are quite similar to some existing work, such as “DeepMDP: Learning Continuous Latent Space Models for Representation Learning”. Please do a comparison.

Thank you for your question. Our work fundamentally differs from [1] in both research scope and embedding methods.

continuous setting V.S. probabilistic setting. Our approach is defined in a continuous space with a differential structure, whereas [1] is based on the Markov Decision Process (MDP).
Embedding methods. Our embedding method first seeks an equivariant representation of the dynamics and then enforces a consistent metric between the original space and the latent space. In contrast, [1] ensures that the latent MDP maintains consistent performance by imposing the Wasserstein-1 metric.
References:

[1] Gelada, Carles, et al. "Deepmdp: Learning continuous latent space models for representation learning." International conference on machine learning. PMLR, 2019.

(5) Please clearly state your assumptions and scopes. For example, the analytical framework is based on the control-affine system in Eq. (1). So, the author must state the application domains.

Thank you for the suggestion. We agree and have explicitly stated our assumptions in Section 2 of the updated manuscript (see Lines 106-107), including that our framework is based on the control-affine system in Eq. (1).

(6) In Fig. 3 (d) and (2), the x-axis doesn’t match the caption. Try to check all figures.

Thank you for carefully reading our manuscript and pointing out the mismatch. We have corrected it (see Figure 3) and reviewed all the figures in the updated manuscript.

(7) Is the computation time in Fig. 3 training or testing time? You may need to compare both.

Thank you for pointing this out. The computation time in Fig. 3 is the testing time. Our focus is on the 'off-line training and online play' scenario, where test-time efficiency is crucial, making training time less critical in this scenario.

评论- Responses to Weakness

2024-11-22

We thank the reviewer for the precious feedback, as well as your recognition of the strengths in our work. Below, we provide detailed responses to your comments, weaknesses, and questions:

(1) What is the drawback of traditional Koopman-based analysis? Why do we need to introduce the learning framework?

Thanks for your question. Rather than solely enhancing the traditional Koopman operator, our primary objective is to elucidate what properties the embedding function $g$ should satisfy to effectively model the underlying dynamical system for optimal control.

In the previous embedding methods for learning dynamics [1, 2, 3, 4, 5], it is insufficient to discuss how to embed with consistent dynamics and control policy. To the best of our knowledge, this work is the first to formally and mathematically investigate the essential properties for learning an optimal deep learning embedding tailored for control applications. Our analysis identifies equivariance and isometry as the two most critical properties for preserving control effects. We utilize the Koopman operator because it naturally serves as an equivariant representation of dynamics (see Appendix D) and is compatible with analytical solutions.

References:

[1] Watter, Manuel, et al. "Embed to control: A locally linear latent dynamics model for control from raw images." Advances in neural information processing systems 28 (2015).

[2] Banijamali, Ershad, et al. "Robust locally-linear controllable embedding." International Conference on Artificial Intelligence and Statistics. PMLR, 2018.

[3] Matsuo, Yutaka, et al. "Deep learning, reinforcement learning, and world models." Neural Networks 152 (2022): 267-275.

[4] Hafner, Danijar, et al. "Learning latent dynamics for planning from pixels." International conference on machine learning. PMLR, 2019.

[5] Levine, Nir, et al. "Prediction, consistency, curvature: Representation learning for locally-linear control." arXiv preprint arXiv:1909.01506 (2019).

(2) Please do a proofreading for all notations and equations. For example, in Lines 187-188, there is no z_t. Why do you need to define it? What is in Eq. (6)?

Thank you for pointing this out. We have carefully reviewed all notations and equations in the updated manuscript to ensure consistency and clarity. Specifically, we have addressed the issue in Lines 187–188, and $z_t=g(s_t)$ is the latent state, the same variable in Eq. (6).

$\mathcal{U}$ is the state-dependent (acutation) operator that maps the latent state $z_t$ to a linear operator acting on the control input $a_t$ . The operator $\mathcal{U}$ represents how the control input $a_t$ influences the time evolution of $z_t$ .

(3) How did the past methods learn $\mathcal{K}$ ?

Thank you for the suggestion. Existing methods [1,2,3] typically learned the Koopman operator by using a parameterized fully connected (FC) layer. Instead, in [4,5,6], using a Dynamic Mode Decomposition (DMD) -based approach that adaptively fits the $\mathcal{K}$ non-parametrically. Notably, in our work, we take a different approach by learning the operator $\mathcal{P}$ , the generator of $\mathcal{K}$ , instead of directly learning the $\mathcal{K}$ . This key difference enables us to derive an analytical control policy, avoiding the need for numerical optimisation as required in methods like MPC, as in [7,8].

References

[1] Lusch, Bethany, J. Nathan Kutz, and Steven L. Brunton. "Deep learning for universal linear embeddings of nonlinear dynamics." Nature Communications 9.1 (2018): 4950.

[2] Yeung, Enoch, Soumya Kundu, and Nathan Hodas. "Learning deep neural network representations for Koopman operators of nonlinear dynamical systems." 2019 American Control Conference (ACC). IEEE, 2019.

[3] Weissenbacher, Matthias, et al. "Koopman q-learning: Offline reinforcement learning via symmetries of dynamics." International conference on machine learning. PMLR, 2022.

[4] J. Morton, A. Jameson, M. J. Kochenderfer, F. Witherden: Deep dynamical modeling and control of unsteady fluid flows, Advances in Neural Information Processing Systems 31, 2018, pp. 9258–9268

[5] J. Morton, F. D. Witherden, M. J. Kochenderfer: Deep variational Koopman models: Inferring Koopman observations for uncertainty-aware dynamics modeling and control, Proceedings of the 28th International Joint Conference on Artificial Intelligence, 2019, pp. 3173–3179

[6] Y. Guo, M. Korda, I. G. Kevrekidis, Q. Li: Learning parametric Koopman decompositions for prediction and control. arXiv:2310.01124

[7] Li, Yunzhu, et al. "Learning Compositional Koopman Operators for Model-Based Control." International Conference on Learning Representations.

[8] Goswami, Debdipta, and Derek A. Paley. "Bilinearization, reachability, and optimal control of control-affine nonlinear systems: A Koopman spectral approach." IEEE Transactions on Automatic Control 67.6 (2021): 2715-2728.

2024-11-23

For my first question, I want to ask about the drawbacks of traditional Koopman-based analysis, not learning-based methods. You didn't give a review for this part.

评论- Response to the drawbacks of traditional koopman analysis

2024-11-23

Thanks for your question. If we understand correctly, you are asking the traditional Koopman analysis instead of deep learning methods.

Infinite-Dimensional Nature. The Koopman operator is fundamentally infinite-dimensional. Approximating this operator with finite-dimensional models can introduce inaccuracies and limitations, hindering the ability to fully capture the system's dynamics [1, 2].

Feature Function Selection. Selecting an appropriate feature basis is a challenging task for nonlinear dynamics. Traditional approaches rely on fixed feature functions, such as polynomials and Gaussian kernels. However, inadequate or poorly chosen feature functions may lead to incomplete or misleading representations of the system, thereby diminishing the effectiveness of the analysis [3, 4].

High-Dimensional Systems. Even when employing finite-dimensional approximations like Dynamic Mode Decomposition (DMD) [5, 6], the computational resources required can be substantial, particularly for high-dimensional or complex systems. This restricts the scalability of traditional Koopman methods for larger or more intricate systems, making real-time or large-scale applications difficult.

Thanks again. We have included it in our main text (see Page 5)

[1] Korda, M., & Mezić, I. (2018). Linear predictors for nonlinear dynamics: Extended dynamic mode decomposition. Proceedings of the National Academy of Sciences, 115(11), 2700–2705. DOI:10.1073/pnas.1706943114

[2] Budišić, Marko, Ryan Mohr, and Igor Mezić. "Applied koopmanism." Chaos: An Interdisciplinary Journal of Nonlinear Science 22.4 (2012).

[3] Brunton, Steven L., et al. "Modern Koopman theory for dynamical systems." arXiv preprint arXiv:2102.12086 (2021).

[4] Lusch, Bethany, J. Nathan Kutz, and Steven L. Brunton. "Deep learning for universal linear embeddings of nonlinear dynamics." Nature communications 9.1 (2018): 4950.

[5] Tu, J. H., Rowley, C. W., Luchtenburg, D. M., Brunton, S. L., & Kutz, J. N. (2014). On dynamic mode decomposition: Theory and applications. Journal of Applied Mechanics, 81(8).

[6] Korda, M., & Mezić, I. (2018). Linear predictors for nonlinear dynamics: Extended dynamic mode decomposition. Proceedings of the National Academy of Sciences, 115(11), 2700–2705. DOI:10.1073/pnas.1706943114

审稿意见

评分: 3置信度: 42024-11-03

A framework for controlling (via learning value function) nonlinear dynamics is proposed. It is based on embedding the state into a latent space where the dynamics are linear and represented by the Koopman generator. To embed the states a pair of an encoder and a decoder is learned, for which the loss function is based not only on the reconstruction/prediction error (which the authors refer to as the equivariance loss) but also on the regularizer to preserve the metric between the original and the latent spaces. The utility of the proposed method is shown with multiple control problems.

优点

The method looks technically reasonable. Embedding the state into a space where the dynamics can be linearized is indeed useful sometimes and can be explained using the notion of the Koopman operator. The experiment is done with multiple baseline methods, multiple systems, and some ablation studies.

缺点

(1) It is unclear which aspects of the method should be evaluated in terms of novelty. Using the Koopman generator instead of the discrete-time Koopman operator for learning is not certainly the most common setting, but the difference between the continuous- and discrete-time settings here does not seem to bring significant technical difficulty. The "isometry loss" looks somewhat new (though I feel I saw something similar in the same context which I can't remember), but I am not sure if this regularizer solely makes a notable contribution as an ICLR paper. As for the optimal control (or value function learning) part, it is unclear which part should be considered as a particular contribution of the paper.

(2) As mentioned above, learning neural network observables for embedding dynamics state has been widely studied, not only by Li et al. ICLR 2020 (which the authors have cited), but also by many other researchers. Even limiting the scope to the problems with control inputs, I can raise examples as follows:

J. Morton, A. Jameson, M. J. Kochenderfer, F. Witherden: Deep dynamical modeling and control of unsteady fluid flows, Advances in Neural Information Processing Systems 31, 2018, pp. 9258–9268
J. Morton, F. D. Witherden, M. J. Kochenderfer: Deep variational Koopman models: Inferring Koopman observations for uncertainty-aware dynamics modeling and control, Proceedings of the 28th International Joint Conference on Artificial Intelligence, 2019, pp. 3173–3179
M. Bonnert, U. Konigorski: Estimating Koopman invariant subspaces of excited systems using artificial neural networks, IFAC-PapersOnLine, vol. 53, no. 2, pp. 1156–1162, 2020
M. Han, J. Euler-Rolle, R. K. Katzschmann: DeSKO: Stability-assured robust control with a deep stochastic Koopman operator, Proceedings of the 10th International Conference on Learning Representations, 2022
Y. Guo, M. Korda, I. G. Kevrekidis, Q. Li: Learning parametric Koopman decompositions for prediction and control. arXiv:2310.01124
D. Uchida, K. Duraisamy: Extracting Koopman operators for prediction and control of non-linear dynamics using two-stage learning and oblique projections. arXiv:2308.13051
M. Wang, X. Lou, B. Cui: Deep bilinear Koopman realization for dynamics modeling and predictive control, International Journal of Machine Learning and Cybernetics, 2024

Making the relation to, not necessarily all, but at least some of the most relevant ones would be beneficial for making the context of the research clearer.

(3) Although the authors claim that the proposed method is different from previous methods in terms of the treatment of the vector field (Lines 90-91), there seems to be no direct empirical comparison from this perspective. E2C may be the most relevant of the examined baselines but is not necessarily a valid reference to investigate the particular advantage of the proposed method. Elaborating more on this point would be helpful.

问题

Point (3) in the Weaknesses section is the most meaningful to me as a question --- how would you justify the advantage of using the Koopman generator (instead of the discrete-time operator)? For example, comparison to a variant of the proposed method constructed with a discrete-time setting would be helpful if any.

评论- Response to Question

2024-11-22

(1) Point (3) in the Weaknesses section. How would you justify the advantage of using the Koopman generator (instead of the discrete-time operator)? For example, a comparison to a variant of the proposed method constructed with a discrete-time setting would be helpful if any.

Please see our responses in Weakness points 3.

Thanks for your thoughtful review and valuable questions. We greatly appreciate your time and effort in providing feedback to improve our work. Hope our answers address your questions and concerns effectively. We look forward to your further comments and insights.

评论- Response to Weakness (Continue)

2024-11-22

(2) As mentioned above, learning neural network observables for embedding dynamics states has been widely studied, not only by Li et al. ICLR 2020 (which the authors have cited), but also by many other researchers.

Thanks for providing these references. The learning-based Koopman operator has been used to solve dynamical systems. Most of them directly learn a next-step prediction (discrete-time) similar to E2C and PCC, as discussed in the second paragraph in the Introduction, such as [1, 2, 3, 4, 5]. However, our method aims to leverage the Koopman operator to preserve the ODE form (continuous-time) of the dynamics rather than the direct next-step prediction. Directly leveraging the ODE form can lead an analytical-from to improve the control performance, which other methods cannot achieve.

References

[1] J. Morton, A. Jameson, M. J. Kochenderfer, F. Witherden: Deep dynamical modeling and control of unsteady fluid flows, Advances in Neural Information Processing Systems 31, 2018, pp. 9258–9268

[2] J. Morton, F. D. Witherden, M. J. Kochenderfer: Deep variational Koopman models: Inferring Koopman observations for uncertainty-aware dynamics modeling and control, Proceedings of the 28th International Joint Conference on Artificial Intelligence, 2019, pp. 3173–3179

[3] M. Han, J. Euler-Rolle, R. K. Katzschmann: DeSKO: Stability-assured robust control with a deep stochastic Koopman operator, Proceedings of the 10th International Conference on Learning Representations, 2022

[4] Y. Guo, M. Korda, I. G. Kevrekidis, Q. Li: Learning parametric Koopman decompositions for prediction and control. arXiv:2310.01124

[5] D. Uchida, K. Duraisamy: Extracting Koopman operators for prediction and control of non-linear dynamics using two-stage learning and oblique projections. arXiv:2308.13051

(3) Although the authors claim that the proposed method is different from previous methods in terms of the treatment of the vector field (Lines 90-91), there seems to be no direct empirical comparison from this perspective. E2C may be the most relevant of the examined baselines but is not necessarily a valid reference to investigate the particular advantage of the proposed method. Elaborating more on this point would be helpful.

Empirical comparison with the discrete-time operator is beyond the scope of this paper, as our primary aim is to propose a formal and theoretically robust framework for comprehensive deep learning embedding. Embedding the vector field is crucial because the original dynamics of an affine-control system are represented as an ODE, which makes our framework highly generalizable. Another advantage of our approach is that there is no need to explicitly compute the actuation operator $\mathcal{U}$ in Eq. 16 after embedding the ODE, as it is learned during model training. As noted in our abstract, this result leads to an analytical control policy.

Thanks for your valuable comments. We have elaborated on the discussion in the introduction to better emphasize the advantages of our approach and clarify its distinction from existing methods.

评论- About continuous time

2024-11-25

If I may add a follow-up question to the authors:

The authors say they use continuous-time form of Koopman bilinear form (KBF) to derive the control law in analytical form. However, it appears to me that, in the paper, the KBF is time-discretized before deriving the control law. Hence the question becomes, why not directly learn the discrete-time version of KBF? In fact, even the equivariance loss is written in discrete-time form...

评论- Response to Weakness

2024-11-22

We thank the reviewer for the precious feedback and comments. Below, we provide detailed responses to your comments, weaknesses, and questions:

(1) It is unclear which aspects should be evaluated as novelty. Continuous settings (vector fields) and isometry loss seem trivial. The optimal control part seems unclear.

Thank you for your insightful question.

Our contributions and novelties are as the following points:

We are the first work to study what properties the embedding function $g$ should satisfy.

Our primary objective is to elucidate what properties the embedding function $g$ should satisfy to effectively model the underlying dynamical system for optimal control. To the best of our knowledge, this work is the first to formally and mathematically investigate the essential criteria for learning an optimal deep learning embedding tailored for control applications. We identify two pivotal properties that the embedding $g$ must satisfy: equivariance and isometry (see the description in Section 2.2).
We propose an embedding to satisfy the properties: Koopman-Operator-Based Auto-Encoder, and a value-based method leveraging this embedding.

Guided by the principles of equivariance and isometry, we propose a Koopman-operator-based auto-encoder designed to satisfy these critical properties. This approach is comprehensively summarized in our abstract and elaborated upon in Sections 2.2 to 3.2 of the manuscript. To demonstrate the non-trivial nature of our contributions, we highlight the following key aspects:
- Equivariance (Flow and Vector Fields):
  
  For continuous dynamical systems, the flow describes the system's evolution over time, while the vector field defines the instantaneous rate of change at each point in the state space. These two components are intrinsically linked, as the flow is generated by the vector field. The equivariant embedding approach is $F^{latent} \circ = g \circ F$ , and Koopman naturally satisfies this property. On the other hand, the infinitesimal generator $\mathcal{P}$ of the Koopman operator automatically embeds vector fields into the latent space. By leveraging the exponential map, we estimate the embedding flow map as $\mathcal{K}_t = \exp(\mathcal{P}t)$ , thereby ensuring that the latent representation accurately captures both the flow and vector field dynamics. From a theoretical perspective, a comprehensive embedding must contain the two components.
- Isometry (Control Effect):
  
  Introducing an isometry loss in the learning process is a novel aspect of our approach for optimal control embeddings. In many control systems, control costs are defined using a quadratic form. Without preserving the metric information through isometric embeddings, the integral costs or value functions become distorted, leading to suboptimal control policies. By enforcing isometry, we ensure that the value function remains invariant under the embedding $g$ , thereby preserving the integrity of control costs and enabling effective policy optimization.
- Optimal Control:
  
  Our framework allows for the direct learning of a parametric quadratic latent value function. Utilizing this latent value function, we can derive an analytical solution for the control policy. It should be noted the benefits of the analytical solution are attributed to learned dynamics with its vector fields. Crucially, implementing the policy within the latent space produces effects equivalent to executing it in the original state space, ensuring consistency and reliability in control actions.
Experimental Contributions

Beyond the theoretical advancements, our work makes significant experimental contributions. We demonstrate that our deep learning-based embedding approach outperforms existing methods, particularly in handling image-input problems. By leveraging the properties of equivariance and isometry, our embedding facilitates more accurate and efficient state representations, leading to superior performance in tasks involving high-dimensional sensory inputs.

2024-11-25

Thank you for the rebuttal. I maintain my score for the following reasons:

(1) Novelty

To the best of our knowledge, this work is the first to formally and mathematically investigate the essential criteria for learning an optimal deep learning embedding tailored for control applications. We identify two pivotal properties that the embedding must satisfy: equivariance and isometry (see the description in Section 2.2).

I do not think the equivariance loss can be claimed to be a part of the novelty. It has been used in most NN-based Koopman operator learning studies, for example in the papers I listed in my initial review and in many others. The authors might claim that the novelty lies in the treatment of continuous time, Koopman generators, but the definition in Eq. (10) based on $\exp(\mathcal{P} \Delta t)$ is a straightforward variant of the discrete-time version. In this sense I agree with Reviewer jz5b's comment in this thread.

As commented in my initial review, the isometry loss may comprise some sort of novelty.

I am still not sure what kind of novelty is claimed in the control part. However this is probably because my expertise is slightly off, I have not been extensively following studies involving OC/RL. I would withhold specific judgement here.

Beyond the theoretical advancements, our work makes significant experimental contributions. We demonstrate that our deep learning-based embedding approach outperforms existing methods, particularly in handling image-input problems.

In my understanding, image-input problems have already been addressed with DNN-Koopman-based control methods, for example as early as Morton et al. (2018):

J. Morton, A. Jameson, M. J. Kochenderfer, F. Witherden: Deep dynamical modeling and control of unsteady fluid flows, Advances in Neural Information Processing Systems 31, 2018, pp. 9258–9268.

(2) Continuous time

However, our method aims to leverage the Koopman operator to preserve the ODE form (continuous-time) of the dynamics rather than the direct next-step prediction.

As mentioned above and pointed out by Reviewer jz5b, the training process of the method uses the Koopman generator only through the exponential map $\exp(\mathcal{P}\Delta t)$ , which is almost the same as the next-step prediction; only the difference is whether $\Delta t$ is fixed or variable.

For example, Bevanda+ (2021) (picked up in terms of relevance, i.e., NN-based observable learning) deals with the continuous-time setting, assuming the time derivative of the state is available as data. I do not think such a setting is notably different from the common discrete-time setting (particularly in this context) either. Still, your model-training method is even closer to the discrete-time.

Bevanda et al., Diffeomorphically Learning Stable Koopman Operators, arXiv:2112.04085

Directly leveraging the ODE form can lead an analytical-from to improve the control performance, which other methods cannot achieve.

So my point (3) in the initial review is about this thing. To support this claim ("to improve the control performance"), it seems important to compare the proposed method with a discrete-time variant, which would be a kind of ablation study. That is, to do a variant of the proposed method where only the continuous-time consideration is dropped.

(3) Experiment

Empirical comparison with the discrete-time operator is beyond the scope of this paper, as our primary aim is to propose a formal and theoretically robust framework for comprehensive deep learning embedding. Embedding the vector field is crucial because the original dynamics of an affine-control system are represented as an ODE, which makes our framework highly generalizable.

I see the control method is based on the continuous-time, vector-field-based formulation, but I do not think such a fact makes the comparison to the discrete-time variant of the proposed method out of the scope. Moreover, in my understanding, the examined baselines are based on discrete-time setting.

2024-11-25

Thank you for your insightful question.

We employ a continuous-time Koopman bilinear form to theoretically derive and generalize the control law using the operators $\mathcal{P}$ and $\mathcal{U}$ . For practical implementation in discrete-time environments, we calculate the flow using the exponential form, ensuring consistency with our continuous-time framework. Since truly continuous control laws are not feasible in reinforcement learning settings, we implement the control policy in a discretized manner.

审稿意见

评分: 6置信度: 42024-11-04

This paper proposes a method for solving control problems called Koopman embedded equivariant control (KEEC). The key idea of the paper is that the state of the dynamics in mapped into a latent space via an embedding. In KEEC, the embedding is a learned function, trained to satisfy equivariance and isometry properties. The optimal policy is then learned in latent space using Hamiltonian-Jacobi-Bellman. KEEC is compared to other methods in numerical experiments.

优点

As far as I am aware, the paper is novel in its way of learning a latent embedding and applying Koopman operator theory for control systems.
The paper has detailed descriptions of the theory, with more information available in the appendix.
The high-dimensional control problem is reduced to a minimization problem with an analytic solution in equations (8-9).
The paper considers enforcing contraints to satisfy equivariance and isometry properties.
The method is tested on multiple control systems against multiple methods are shows superior performence.
The experiments are detailed, including how each problem is setup, and comparisons of rewards, computation time and stability.

缺点

There is some confusion regarding the theory, particularly regarding the operator $\mathcal{P}$ . Please see questions.
There is no justification for Lemma 3.3.
While KEEC is faster than MPC and MPPI, it is slower than standard RL methods such as SAC and CQL.
The page limit was exceeded.

问题

In equation (3), what are the values taken by $\tau$ in the sum? Is it at discrete time points?
In figure 3(d-e) the magnitude of $\lambda_{met}$ is between 32 and 256 and the latent dimension is between 0.1 and 1.0. Should this be switched?
Is the infinitesimal generator $\mathcal{P}$ an infinite or finite dimensional operator? The Koopman operator $\mathcal{K}$ is an infinite-dimensional operator and $\mathcal{K} = exp(\mathcal{P})$ , so the generator should be infinite-dimensional. However, in equations (6-8), $\mathcal{P}$ seems to be a finite-dimensional matrix.
The KEEC model architecture is given in Table 3. What architectures are used for the comparison methods (SAC, CQL, etc.)? It would be good to compare the number of parameters needed for each.

评论- Response to Question

2024-11-22

(1) In equation (3), what are the values taken by $\tau$ in the sum? Is it at discrete time points?

Thank you for your question. In Equation (3), $\tau$ represents discrete time steps starting from the current step $t$ , summing future rewards, as is the standard equation for value functions in reinforcement learning.

(2) In Figure 3(d-e), the magnitude of $λ_{met}$ is between 32 and 256, and the latent dimension is between 0.1 and 1.0. Should this be switched?

Thank you for pointing out the swapped captions; we have corrected this in the revised manuscript, see Figure 3.

(3) Question about operators $\mathcal{K}$ and $\mathcal{P}$

See our responses in weakness 1.

(4) The KEEC model architecture is given in Table 3. What architectures are used for the comparison methods (SAC, CQL, etc.)? It would be good to compare the number of parameters needed for each.

We appreciate the reviewer’s suggestion. The architectures for SAC, CQL, and other methods follow their official implementations with default parameter counts, as detailed in Appendix H.2 (lines 1690–1698). Table 3 outlines the architecture and parameters of KEEC. We will include a parameter comparison in the updated manuscript for greater clarity.

Thank you for your thoughtful review and valuable questions. We greatly appreciate your time and effort in providing feedback to improve our work. We hope our answers address your questions and concerns effectively. We look forward to your further comments and insights.

2024-12-03

Thank you for your responses and clarifications. My apologies for saying the page limit was exceeded; I think I misremember the limit. After reading your responses and other reviews and responses, I still think the paper is marginally above the acceptance threshold, so I maintain my score.

2024-12-03

Thanks for your response. If your concerns have been addressed to a certain extent, could you please consider rescore your confidence?

2024-12-03

Your responses did clarify confusion I had. I have increased my confidence score to 4.

评论- Response to Weakness

2024-11-22

We thank the reviewer for the valuable feedback and your recognition of the strengths of our work. Below, we provide detailed responses to your comments, weaknesses, and questions:

(1) There is some confusion regarding the theory, particularly regarding the operator $\mathcal{P}$ . Is the infinitesimal generator $\mathcal{P}$ an infinite or finite dimensional operator? The Koopman operator $\mathcal{K}$ is an infinite-dimensional operator and $\mathcal{K}= \exp(\mathcal{P})$ , so the generator should be infinite-dimensional. However, in equations (6-8), $\mathcal{P}$ seems to be a finite-dimensional matrix.

(Along with the Question 3: Is the infinitesimal generator $\mathcal{P}$ an infinite or finite dimensional operator? The Koopman operator $\mathcal{K}$ is an infinite-dimensional operator and $\mathcal{K}=\exp{(\mathcal{P})}$ , so the generator should be infinite-dimensional. However, in equations (6-8), $\mathcal{P}$ seems to be a finite-dimensional matrix.)

We thank the reviewers for raising these critical questions regarding the operator $\mathcal{P}$ and $\mathcal{K}$ :

In equations (6) and (7), $\mathcal{P}$ and $\mathcal{U}$ are still infinite-dimensional. We corrected the identity matrix $I$ in Line 238 to identify the operator. The infinitesimal generator $\mathcal{P}$ is inherently an infinite-dimensional operator. As detailed in Appendix D, both the Koopman operator $\mathcal{K}$ and the generator $\mathcal{P}$ operate within an infinite-dimensional function space.
In practical applications, it is infeasible to represent true infinite-dimensional operators. Consequently, we approximate $\mathcal{P}$ and $\mathcal{K}$ with finite-dimensional operators, denoted as $\hat{\mathcal{P}}$ and $\hat{\mathcal{K}}$ , respectively, as seen in the loss function in Equation (8). This finite-dimensional approximation is a common and effective strategy for handling infinite-dimensional operators [1]. By choosing a sufficiently large dimension for the approximated operator $\hat{\mathcal{P}}$ , we ensure that the resulting Koopman operator $\hat{\mathcal{K}}$ achieves good convergence properties and that the approximation error remains controlled.
References:

[1] Schmüdgen, Konrad. Unbounded self-adjoint operators on Hilbert space. Vol. 265. Springer Science & Business Media, 2012.

(2) There is no justification for Lemma 3.3.

Thanks for your question. Lemma 3.3 is derived based on the findings from references [1] and [2], which we have cited in our paper appropriately. Specifically:

Reference [1]: establishes the foundation when optimal control problems associated with the reward are equivalent;
Reference [2]: supports this by demonstrating that the preservation of the metric ensures that the integral of rewards along trajectories in both the original and latent spaces are identical.

The derivation is based on the two references [1,2]

Invariant Value Function: Since the rewards are the same in both spaces, the value function remains invariant under the embedding function $g$ ;
Consistent Policy Execution: Consequently, executing the policy in the latent space yields the same control effect as executing it in the original space.

This invariance ensures that the optimal control policy derived in the latent space is equally effective when applied to the original space, thereby validating the consistency and reliability of our approach.

References:

[1] Jean, Frédéric, Sofya Maslovskaya, and Igor Zelenko. "On the projective and affine equivalence of sub-Riemannian metrics." Geometriae Dedicata 203.1 (2019): 279-319.

[2] Maslovskaya, Sofya. Inverse Optimal Control: theoretical study. Diss. Université Paris Saclay (COmUE), 2018.

(3) The page limit was exceeded.

The page limit is ten this year, and our submission is within this limit. See ICLR call for paper: https://iclr.cc/Conferences/2025/CallForPapers.

(4) While KEEC is faster than MPC and MPPI, it is slower than standard RL methods such as SAC and CQL.

The speed of KEEC control can be easily addressed with engineering tricks. KEEC is also a model-based method in practice, which learns the value function and derives the greedy policy analytically. The cause of the slower control is Auto-Differentiation (auto-diff) $\nabla_{z} V_g$ in Eq. 16. However, according to the previous work [1] Eq. 8 and 9, the auto-diff can be avoided by learning this particular derivative directly.

References

[1] Levine, Nir, et al. "Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control." International Conference on Learning Representations (2020).

撤稿通知

2025-02-01

I have read and agree with the venue's withdrawal policy on behalf of myself and my co-authors.

Koopman Embedded Equivariant Control

摘要

评审与讨论

优点

缺点

问题

(1) Clarification on the Novelty of Equivariance in Koopman-Based Modeling

(2) Novelty of Koopman Formalism and Optimal Control + Spectrum in Koopman Dynamics

(3) Clarification on Baseline Selection and Fair Comparisons

(4) Some portions of the manuscript are unclear, and some are even erroneous. See Questions below.

(1) It appears Fig. 3d and 3e are swapped. Also, in the ablation for isometry loss, please provide the reward for lambda_met=0, so the effect of isometric loss is clearer.

(2) Line 511, it is unclear what "optimal state became unobservable" means. It needs clearer definition and better quantification.

(3) typos in Appendix F

(4) Line 1472, What do authors mean by "two strange attractors"?

(5) How is the wave equation solved?

(6) Table 2 shows that noise is added to the wave equation, but not others. Why is so? How sensitive is KEEC to noise in the other cases?

(7) MPPI and PCC use significantly shorter horizons than KEEC. What if the former two use the same longer horizon, or have KEEC using the shorter horizon?

(8) Line 155, what do the authors mean by "didn't comprehensively map the vector field ..."?

优点

缺点

问题

(4) The defined equivariance/isometry losses are quite similar to some existing work, such as “DeepMDP: Learning Continuous Latent Space Models for Representation Learning”. Please do a comparison.

(5) Please clearly state your assumptions and scopes. For example, the analytical framework is based on the control-affine system in Eq. (1). So, the author must state the application domains.

(6) In Fig. 3 (d) and (2), the x-axis doesn’t match the caption. Try to check all figures.

(7) Is the computation time in Fig. 3 training or testing time? You may need to compare both.

(1) What is the drawback of traditional Koopman-based analysis? Why do we need to introduce the learning framework?

(2) Please do a proofreading for all notations and equations. For example, in Lines 187-188, there is no z_t. Why do you need to define it? What is in Eq. (6)?

(3) How did the past methods learn K\mathcal{K}K?

优点

缺点

问题

(1) Point (3) in the Weaknesses section. How would you justify the advantage of using the Koopman generator (instead of the discrete-time operator)? For example, a comparison to a variant of the proposed method constructed with a discrete-time setting would be helpful if any.

(2) As mentioned above, learning neural network observables for embedding dynamics states has been widely studied, not only by Li et al. ICLR 2020 (which the authors have cited), but also by many other researchers.

(1) It is unclear which aspects should be evaluated as novelty. Continuous settings (vector fields) and isometry loss seem trivial. The optimal control part seems unclear.

(1) Novelty

(2) Continuous time

(3) Experiment

优点

缺点

问题

(1) In equation (3), what are the values taken by τ\tauτ in the sum? Is it at discrete time points?

(2) In Figure 3(d-e), the magnitude of λmetλ_{met}λmet​ is between 32 and 256, and the latent dimension is between 0.1 and 1.0. Should this be switched?

(3) Question about operators K\mathcal{K}K and P\mathcal{P}P

(4) The KEEC model architecture is given in Table 3. What architectures are used for the comparison methods (SAC, CQL, etc.)? It would be good to compare the number of parameters needed for each.

(2) There is no justification for Lemma 3.3.

(3) The page limit was exceeded.

(4) While KEEC is faster than MPC and MPPI, it is slower than standard RL methods such as SAC and CQL.

(3) How did the past methods learn $\mathcal{K}$ ?

(1) In equation (3), what are the values taken by $\tau$ in the sum? Is it at discrete time points?

(2) In Figure 3(d-e), the magnitude of $λ_{met}$ is between 32 and 256, and the latent dimension is between 0.1 and 1.0. Should this be switched?

(3) Question about operators $\mathcal{K}$ and $\mathcal{P}$