PaperHub
Overall rating: 7.0/10 (Poster, 4 reviewers; lowest 6, highest 8, standard deviation 1.0)
Individual ratings: 6, 8, 6, 8
Confidence: 2.8, Correctness: 3.3, Contribution: 2.8, Presentation: 2.5
ICLR 2025

Efficiently Parameterized Neural Metriplectic Systems

OpenReview · PDF
Submitted: 2024-09-27 · Updated: 2025-02-11
TL;DR

General metriplectic systems are parameterized in a way which is universally approximating, bounded in error, and scales quadratically with the size of the system.


Keywords

metriplectic systems, structure preservation, energy conservation, entropy stability, neural ODEs

Reviews & Discussion

Review (Rating: 6)

This paper introduces a novel framework for learning metriplectic dynamics, which describe systems that conserve energy while generating entropy. The key contribution of the proposed approach lies in its design, which ensures both energy conservation and entropy stability. Mathematically, the paper presents a parameterization method that employs neural networks to model four functions, $L$, $E$, $M$, and $S$, such that $L\nabla S = M\nabla E = 0$. Here, $L$ is an antisymmetric matrix-valued function, $M$ is a symmetric matrix-valued function, and $E$ and $S$ are scalar functions.

Under the assumption that $\nabla E, \nabla S \neq 0$, the proposed parameterization of metriplectic systems requires fewer learnable scalar functions than existing methods. Numerical results show that the proposed approach exhibits superior accuracy to existing methods for learning metriplectic dynamics from data.
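For orientation, the dynamics in question take the form $\dot{x} = L(x)\nabla E(x) + M(x)\nabla S(x)$, where the degeneracy conditions $L\nabla S = M\nabla E = 0$ force energy conservation ($\dot{x}\cdot\nabla E = 0$) and entropy generation ($\dot{x}\cdot\nabla S \geq 0$). The following is a minimal NumPy sketch of this structure built from orthogonal projections; it illustrates the constraints only and is not the paper's NMS parameterization (all names here are ours).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Illustrative (nonzero) gradients of energy E and entropy S at some state x.
gE = rng.normal(size=n)
gS = rng.normal(size=n)

def proj_perp(v):
    """Orthogonal projector onto the complement of span{v}."""
    return np.eye(len(v)) - np.outer(v, v) / (v @ v)

# Antisymmetric L annihilating grad S: project an arbitrary antisymmetric A.
A = rng.normal(size=(n, n))
A = A - A.T
L = proj_perp(gS) @ A @ proj_perp(gS)

# Symmetric positive semidefinite M annihilating grad E.
B = rng.normal(size=(n, n))
M = proj_perp(gE) @ (B @ B.T) @ proj_perp(gE)

xdot = L @ gE + M @ gS                                 # metriplectic vector field
print(np.allclose(L @ gS, 0), np.allclose(M @ gE, 0))  # degeneracy: True True
print(np.isclose(xdot @ gE, 0.0))                      # energy conserved: dE/dt = 0
print(xdot @ gS >= 0.0)                                # entropy generated: dS/dt >= 0
```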

Strengths

  1. The emphasis on energy conservation and entropy stability is crucial for modeling realistic physical systems. The authors demonstrate that their approach maintains these properties, which is essential for ensuring that the learned models are physically meaningful and applicable to real-world scenarios.

  2. The proposed parameterization of metriplectic systems requires fewer learnable scalar functions than existing methods.

  3. The paper is well-structured and clearly written, making complex concepts accessible to a broader audience. The examples and comparisons with previous methods help to illustrate the advantages of the proposed approach effectively.

Weaknesses

  1. Although the paper presents empirical results demonstrating the model's performance, the range of examples may be limited. Specifically, in both examples, the entropy is given by $S = s_1 + s_2$, resulting in a constant gradient $\nabla S = 0$. This contradicts the core assumption in the paper that $\nabla S \neq 0$.

  2. The requirement that the metriplectic system being approximated is nondegenerate—i.e., the gradients of energy and entropy must not vanish ($\nabla E, \nabla S \neq 0$)—may limit the applicability of the method. Although this is claimed to be a mild condition, both of the examples considered in the paper contradict this core assumption. In addition to the entropy $S$ having a zero gradient ($\nabla S = 0$), the energy in the investigated examples is also degenerate. From the reviewers' understanding, steady state is a fundamental concept in physical systems, characterized by the energy reaching an extremum, meaning the energy gradient is zero ($\nabla E = 0$). The reviewers are not aware of any physical systems with non-zero energy gradients. To better demonstrate the applicability of the proposed method, the reviewers recommend that the paper provide a detailed discussion of systems that satisfy the key non-degeneracy assumption. Additionally, it would be beneficial to include examples that adhere to this assumption to validate the effectiveness of the algorithm.

Questions

  1. While it is evident that the proposed parameterization requires fewer learnable scalar functions, this alone does not directly imply superior performance. Could the paper provide an explanation of why the proposed approach achieves higher accuracy?

  2. The paper claims that the core advantage of the proposed method is its efficiency. Could the paper provide evidence demonstrating that the method requires less computational time or other resources, such as memory?

  3. Could the paper provide the specific formula of the metric used to compare performance?

  4. As the true governing functions (the right-hand side of the ODE) are known, could the paper show the results of learning these governing functions?

Comment

Thank you for your review. We are glad to hear that you found our paper well-structured and clearly written. Below you will find responses to your "Weaknesses" and "Questions" sections.

Response to weaknesses:

We believe there has been a significant misunderstanding which can be easily clarified. The gradients $\nabla E, \nabla S$ do not vanish anywhere in any of our examples, and indeed they should not. To guarantee that energy conservation and entropy generation hold along trajectories, it is only necessary that $\dot{x}\cdot\nabla E = 0$ and $\dot{x}\cdot\nabla S \geq 0$, which is proven to occur for NMS in the paper. It is certainly not the case that $\nabla E = 0$ is necessary for energy conservation, since this would imply that the energy function is globally constant over the state space, which is un-physical. Moreover, $S_1, S_2$ are state variables in the TGC example, meaning $S = S_1 + S_2$ implies that $\nabla S = (0,0,1,1)^\intercal$ and the entropy is not constant (note that $x = (q,p,S_1,S_2)^\intercal$). We reiterate that the nondegeneracy assumption of $\nabla E, \nabla S \neq 0$ is quite mild and the minimum necessary for a metriplectic system to include both conservative and dissipative effects.

Response to questions:

  1. It is true that a smaller learning problem does not directly imply superior performance. On the other hand, the optimization problems underpinning metriplectic methods such as GNODE, GFINN, and NMS are highly non-convex with formidable loss landscapes and (in the case of the former two methods) built-in redundancy. This creates a more challenging learning problem for GNODE and GFINN as their optimization takes place in a larger parameter space than is necessary. For instance, there is nothing stopping gradient descent on these architectures from moving their weights along fibers in the direction of this redundancy, thereby making no progress and hindering learning. Conversely, NMS eliminates (most of) this redundancy, leading to an easier learning problem which is empirically shown to produce solutions which are more accurate and generalize better than previous methods.

  2. Please see Figure 4 in Appendix E for a computational study in this direction. Additionally, please see the new Appendix F which demonstrates the performance of NMS on a larger system with 201 degrees of freedom.

  3. The metrics used are mean squared error (MSE) and mean absolute error (MAE) evaluated over the test dataset. As requested, we have included formulae for these in the revised Appendix (standard definitions are sketched after this list).

  4. While this is an interesting question, we believe it is out of scope for the present work. Note that there is inherent freedom in the representations for $E, S$, since only their gradients enter the metriplectic equations of motion. Therefore, there is no reason to expect that the energy and entropy learned by NMS directly approximate the known counterparts unless additional supervision is provided. This is no issue in practice, as it is guaranteed by Theorem 3.9 that the error will not exceed a threshold which is computable from the trained model.
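As referenced in item 3 above, the revised appendix is not reproduced on this page; evaluated at test snapshots $x(t_i)$ with model predictions $\hat{x}(t_i)$, the standard definitions of these metrics presumably take the form

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\big\|\hat{x}(t_i) - x(t_i)\big\|_2^2, \qquad \mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\big\|\hat{x}(t_i) - x(t_i)\big\|_1.$$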

Comment

Thank you for your reply. I realize now that I misunderstood the set of state variables mentioned in the paper. However, I feel the response regarding energy still does not address the issue of energy having steady states, where $\nabla E = 0$. It might also be helpful to include a calculation to confirm whether the examples presented satisfy this assumption (I verified that the energy in the second example does).

It is not entirely clear to me why optimization in a larger parameter space than necessary would yield worse results. In addition, from the reply, it seems the claim is that the proposed NMS has fewer parameters—am I understanding this correctly? It is evident that the proposed parameterization requires fewer learnable scalar functions, which I think does imply that the NMS involves fewer parameters. According to the hyperparameter section on pages 17 and 18, the number of parameters across the different models appears to be similar.

Additionally, regarding Figure 4, it is described as showing "the ground-truth and predicted trajectories..." but there seems to be no results presented about computational time or resource usage. Is this a typo?

Furthermore, it is apparent that the learned energy and entropy do not need to approximate their counterparts well. However, if the goal is to learn the correct differential equation, the learned governing functions (i.e., the right-hand side of the ODE) should approximate the true ones closely.

Comment

Thanks again for your continued interest. We are happy to provide further clarification.

Thank you for your reply. I realize now that I misunderstood the set of state variables mentioned in the paper. However, I feel the response regarding energy still does not address the issue of energy having steady states, where $\nabla E = 0$. It might also be helpful to include a calculation to confirm whether the examples presented satisfy this assumption (I verified that the energy in the second example does).

We reiterate that this nondegeneracy is a fundamental property of metriplectic physics, as the total energy $E$ is always in flux between its "kinetic" and "potential" components. We have calculated the gradients of energy and entropy for all examples and left them in the Appendix, where you can verify that they are never zero.

It is not entirely clear to me why optimization in a larger parameter space than necessary would yield worse results. In addition, from the reply, it seems the claim is that the proposed NMS has fewer parameters—am I understanding this correctly? It is evident that the proposed parameterization requires fewer learnable scalar functions, which I think does imply that the NMS involves fewer parameters. According to the hyperparameter section on pages 17 and 18, the number of parameters across the different models appears to be similar.

To clarify, there are two uses of the word "parameter" here which should be distinguished (as we have attempted to do in the paper). We have proven that NMS scales optimally in the number of learnable scalar functions required to express general metriplectic dynamics, which is a notable improvement over previous methods, e.g., GFINNs. Conversely, each scalar function can have as many "learnable parameters" (in the machine learning sense) as desired, and therefore one can build NMS architectures with a larger parameter count than GFINNs. It is the first notion of parameter (i.e., the number of learnable functions) which causes NMS to perform better, since this is where we have ensured that all such functions contribute nontrivially to the learned metriplectic operators. This cannot be said for GFINNs, which has inherent redundancy in its parameterization: many configurations of learnable functions lead to the same metriplectic system.

Additionally, regarding Figure 4, it is described as showing "the ground-truth and predicted trajectories..." but there seems to be no results presented about computational time or resource usage. Is this a typo?

Yes, we apologize. Please see Appendix G and Figure 7, as well as the new Appendix F.

Furthermore, it is apparent that the learned energy and entropy do not need to approximate their counterparts well. However, if the goal is to learn the correct differential equation, the learned governing functions (i.e., the right-hand side of the ODE) should approximate the true ones closely.

This is certainly true. We believe this approximation is evident from the numerical results, which show correct long-time generalization behavior of NMS. This could not occur if the RHS is not approximated well, as the error in the solution is directly bounded by the error in the RHS through the fundamental theorem of calculus in concert with the Cauchy-Schwarz inequality. In fact, the solution error (which is seen to be small) is precisely the integral of the error in the RHS. However, at the reviewer's request, we have conducted an additional experiment to this effect and included it in Appendix C.
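For concreteness, one reading of the bound invoked here: with $f$ the true RHS, $\hat{f}$ the learned one, and trajectories sharing the initial condition $x(0) = \hat{x}(0)$,

$$\|x(t) - \hat{x}(t)\| = \Big\|\int_0^t \big(f(x(s)) - \hat{f}(\hat{x}(s))\big)\,ds\Big\| \leq \int_0^t \big\|f(x(s)) - \hat{f}(\hat{x}(s))\big\|\,ds \leq \sqrt{t}\,\Big(\int_0^t \big\|f(x(s)) - \hat{f}(\hat{x}(s))\big\|^2\,ds\Big)^{1/2},$$

where the equality is the fundamental theorem of calculus and the last step is Cauchy-Schwarz.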

Comment

Dear reviewer QHfC,

Thank you again for your feedback. Based on your comments, we have included analytic expressions for the metriplectic data $L, M, \nabla E, \nabla S$ in Appendix B (along with formulas for the MSE/MAE metrics), as well as a new experiment in Appendix C showing the error in the approximation of the metriplectic vector field by our NMS method. Besides the scaling study in Appendix G, we have also added a new experiment in Appendix F demonstrating the performance of NMS on a larger-scale example problem.

As the submission deadline for the revised paper is November 27, we kindly remind you to review our responses. We have made every effort to address your concerns thoroughly. We would greatly appreciate it if you could confirm whether our revisions have sufficiently resolved your questions and consider revisiting your score, or let us know if there are any further issues which we can help with.

Best regards,
authors

Comment

Thanks for your reply and the additional results. As you mentioned, the required nondegeneracy is a fundamental property of metriplectic physics. However, a simple counterexample is the most basic Hamiltonian system with the Hamiltonian $H = p^2 + q^2$. The energy $H$ is also in flux between its "kinetic" and "potential" components. However, nondegeneracy is defined on the phase space rather than on a single trajectory, correct?

In addition, the response does not rigorously explain why redundancy in its parameterization might lead to worse results. While having fewer learnable scalar functions is a significant advantage, it is theoretically unclear whether the performance improvement is due to this advantage. I think this may not be fully explained by current theoretical work in machine learning.

Since other concerns were well addressed, I would like to increase the rating of the paper.

Comment

Thanks for your responses and for raising the score! We are happy to continue this discussion.

First, you are absolutely right that "kinetic" and "potential" are not the correct terms to use here, as seen by your counterexample of the SHO. Metriplectic systems indeed conserve a generalized total energy $E$ and generate a generalized entropy $S$, so that the fundamental nondegeneracy of $E$ can be seen trajectory-wise as energy being exchanged between its "free" and "thermal" components. This is visible in the paper, e.g., Figure 2, where you can see that the position and momentum equilibrate while the energy $E$ remains constant; all the free energy is lost as generalized heat which is captured through the generation of the entropy $S$. This is in vivid contrast to the Hamiltonian bias compared in Appendix E, where you can see that ignoring the thermal component leads to a completely incorrect solution.

You are also correct that the nondegeneracy of $E, S$ is a property of the functions themselves (and hence the entire phase space of states) and not any particular trajectory. In fact, the gradients of these functions along any metriplectic solution trajectory are always prescribed to satisfy the first and second laws of thermodynamics.

We are sorry that we have not been able to sufficiently address your concern about the issue of redundancy in parameterization. A rigorous analogy which may aid in understanding what is happening is that of machine learning in the group-invariant setting. When attempting to learn a function $f: X \to \mathbb{R}$ which is invariant to the action of some group $G$ of transformations, it is advantageous to first "mod out" by this group of transformations, meaning to consider the reduced representation $\hat{f}([x])$ acting on equivalence classes $[x] = \{g\cdot x \mid g \in G\}$. The representation $\hat{f}$ captures all the variability in the function $f$, but automatically identifies any two points $x, \tilde{x} \in [x]$ in the same equivalence class, which would map to the same value in the range. Therefore, when learning group-invariant functions, there is no need to consider the larger space $X$, but only the smaller one $X/G$ which is the domain of $\hat{f}$. This has been shown to make learning easier (e.g., [4]), as there is less for the learner to distinguish on its own.
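A tiny concrete instance of this analogy, with $G = SO(2)$ acting on $X = \mathbb{R}^2$ by rotation (our own illustration, not taken from the paper or from [4]):

```python
import numpy as np

# "Modding out" a group action: G = SO(2) acts on X = R^2 by rotation,
# so the orbit [x] is the circle of radius ||x||.
rng = np.random.default_rng(0)

def f(x):
    """A G-invariant function on X (hypothetical example)."""
    return np.sin(np.linalg.norm(x))

x = rng.normal(size=2)
theta = rng.uniform(0.0, 2.0 * np.pi)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
x_tilde = R @ x                             # another representative of the orbit [x]

print(np.isclose(f(x), f(x_tilde)))         # True: f cannot tell orbit members apart

# Hence f descends to fhat on X/G = [0, inf), i.e., f = fhat o quotient.
quotient = np.linalg.norm                   # quotient map x -> [x], here r = ||x||
fhat = np.sin                               # reduced representation on the smaller space
print(np.isclose(fhat(quotient(x)), f(x)))  # True
```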

In the case of our NMS parameterization, the situation is remarkably similar. There are "equivalence classes" $[L], [M]$ of valid metriplectic data for a given set of physics, and GNODE, GFINN, etc. do not distinguish between them. Conversely, NMS does, and this leads to better results in learning.

Please let us know if this alleviates your concern, or if there is anything further we can do to help!

Best,
authors

[4] Villar, S. et al., "Scalars are universal: equivariant machine learning, structured like classical physics", NeurIPS 2021.

Comment

Thank you for the explanation. Taking into account the authors' responses and the feedback from other reviewers, I have decided to maintain my positive score.

Review (Rating: 8)

The paper presents a new parameterization scheme for learning metriplectic systems. The parameterization is constructed based on computations using properties of the exterior algebra. The constraints can be seen as hard constraints instead of soft constraints. Proofs of the construction and of universal approximation are provided, as well as a proof of the growth rate of the error with respect to time. An algorithm has been proposed, and numerical experiments are carried out to justify the effectiveness of the proposed scheme.

Strengths

  1. The subject being investigated is of significant importance in AI for Science. The topic is closely related to the so-called structure-preserving machine learning. The paper is in general well written.

  2. The idea of the proposed parameterization is novel. The construction of the parameterizations that satisfy the hard constraints is obtained through exterior algebra computations. The construction can be seen as obtained by orthogonal projection, which is quite natural. In Theorem 3.4, the statement is "if and only if", which is a nice result for machine learning purposes. There are universality analysis and growth rate of error analysis, which are theoretically convincing.

  3. The numerical experiments are carried out quite extensively, together with comparisons with other machine learning schemes in the literature.

Weaknesses

The main weakness of the paper is that some of the remarks and claims are not clearly explained. The reviewer will state the details in the "Questions" section.

Questions

  1. How original is the approach to using exterior algebra to parameterize hard-constrained structures? The reviewer believes some papers that deal with structure preservation using exterior algebra exist in the literature. If the authors also know such papers, the authors could have remarked on that.

  2. In Remark 3.5, it is claimed that "the proposed parameterizations for L, M are not one-to-one but properly contain the set of valid nondegenerate metriplectic systems". Are the authors trying to say that the proposed parameterizations are not injective but surjective onto the set of possible nondegenerate metriplectic systems? I think so from the proof. The authors could clarify what they mean by "properly contain".

  3. Again in Remark 3.5, the authors said that the Jacobi identity is not enforced in the algorithm, which causes the parameterization to be not one-to-one. It is not clear to the reviewer how these are related. From the reviewer's viewpoint, the parameterization is not one-to-one because the construction is via orthogonal projection, so of course there will be more than one parameterization that gives the same metriplectic system. However, it is unclear how this is linked to the Jacobi identity not being enforced. The authors should clarify this point. Besides, in the last sentence of Remark 3.5, it is said that the structure and energy conservation cannot be simultaneously preserved, which, as far as the reviewer knows, applies in the case of symplectic integrators, but here the context is not symplectic integrators. The authors could clarify this point.

  4. In page 6, it is claimed that "the exterior algebraic expressions in Lemma 3.2 require less redundant operations than the corresponding metricized expressions from Theorem 3.4, and therefore the expressions from Lemma 3.2 are used when implementing NMS". The author should clarify how the construction in Lemma 3.2 requires less redundant operations.

  5. On top of page 8 (lines 381-382), the first strategy to deal with unobserved states is to assume a line from the all-0 vector to the all-1 vector. Can this be justified?

  6. In Algorithm 1, the input is $x_s = x(t_s, \mu_s)$, but what is $\mu_s$? It seems that it is not introduced.

  7. There is a typo "Lemmata" in line 641.

  8. The algorithm in the end still needs structure-preserving integration, which means for each system under consideration, a structure-preserving integrator needs to be applied. In the numerical experiment, how are the integrators chosen?

Comment

Thank you for your review. We are glad to hear that you found our work theoretically convincing and generally well written. Below are responses to your questions.

Response to questions:

  1. This approach is novel in the field of metriplectic learning. However, the reviewer is correct that exterior algebra has been used in other numerical methods in order to enforce hard structural constraints. The work Gruber et al. (2023a) mentioned in the paper has used similar ideas to construct metriplectic reduced-order models, and the mentioned textbook Dorst et al. (2007) contains numerous examples of using the so-called geometric algebra (which extends the exterior algebra) in computer science applications. In a related but different direction, there is a long line of work applying ideas from exterior calculus to numerical methods, including the finite element exterior calculus (e.g., [2]) and the discrete exterior calculus (e.g., [3]). We have added a bit more discussion of this in the revised manuscript.
  2+3) Yes, the reviewer has understood correctly. By "properly contain", we mean in the set-theoretic sense: the set of matrix fields $L, M$ produced by NMS is strictly larger than the set of $L, M$ which define valid nondegenerate metriplectic systems. This discrepancy is precisely due to the failure to enforce the Jacobi identity for $L$ in NMS (which technically violates the metriplectic formalism), as well as the reviewer's observation that these fields are defined in terms of orthogonal projection. Since this was a point of confusion, we have clarified the language in the revised manuscript.
  4. The reason for this is visible from the proof of Lemma 3.2, where it is shown that the exterior algebraic expressions derived are equivalent to the claimed matrix expressions after "adding zero". We have clarified this in the revision.

  5. The reviewer is correct that this choice is not fully general. In the absence of other information, we have chosen to assume a normalized linear increase in entropy (or entropy density) over time, as consistent with the governing thermodynamics.

  6. We thank the reviewer for catching this typo, which has since been corrected.

  7. The use of "Lemmata" here is intentional, as this is the plural of "lemma".

  8. The reviewer brings up a good point. As the construction of general metriplectic time integrators is an area of active development, we have focused mostly on standard, structure-agnostic integrators for the purposes of this work. In particular, the mentioned experiments are carried out using either the implicit midpoint method or the fourth-order explicit Runge-Kutta method, both of which have demonstrated reasonable performance in practice. More sophisticated integration methods based on splitting the conservative and dissipative dynamics are potentially possible, although this becomes quite delicate in the present case due to the full coupling of state variables in $L$ and $M$. Due to this, the issue of time integration is left for future work.
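To make item 8 concrete, here is a minimal sketch of the implicit midpoint rule mentioned there, applied to a toy conservative system; in practice `f` would be the learned metriplectic vector field (this is our illustration, not the paper's code).

```python
import numpy as np

def implicit_midpoint_step(f, x, h, iters=100, tol=1e-12):
    """One step of the implicit midpoint rule,
    x_next = x + h * f((x + x_next) / 2),
    solved by fixed-point iteration (adequate for small step sizes h)."""
    x_next = x + h * f(x)                 # explicit Euler predictor
    for _ in range(iters):
        x_new = x + h * f(0.5 * (x + x_next))
        if np.linalg.norm(x_new - x_next) < tol:
            return x_new
        x_next = x_new
    return x_next

# Usage on the harmonic oscillator, H = (q^2 + p^2) / 2:
f = lambda x: np.array([x[1], -x[0]])     # (dq/dt, dp/dt) = (p, -q)
x = np.array([1.0, 0.0])
for _ in range(1000):
    x = implicit_midpoint_step(f, x, h=0.01)
print(0.5 * (x @ x))                      # stays ~0.5: midpoint preserves quadratic H
```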

[2] Arnold, Douglas et al. Finite element exterior calculus: from Hodge theory to numerical stability. Bulletin of the AMS. 2010

[3] Hirani, Anil Nirmal. Discrete exterior calculus. California Institute of Technology, 2003.

Comment

Thanks for your response.

Still, for questions 2+3, the reviewer wonders what is meant by "enforcing the Jacobi identity violates the metriplectic formalism", and how that leads to the nonuniqueness. The reviewer believes that the preservation of the structure is supposed to be reflected both in the differential equation and in the algebraic structure, unless, as the authors claimed, they could not be simultaneously achieved. As far as the reviewer knows, in the design of structure-preserving integrators, there is a technical issue as to why energy preservation and algebraic structure preservation could not be simultaneously achieved; however, the authors should clarify what the technical issue is here in the setting of this paper.

The reviewer agrees with the rest of the response.

Thanks again

Comment

Thanks for your continued interest. We are happy to provide further clarification.

What we mean is the following: in the definition of a metriplectic system, $L$ must generate a Poisson bracket, and therefore $L$ should satisfy the Jacobi identity given in Remark 3.5. However, the parameterizations leading to our NMS method do not directly enforce this condition, which contributes to the model class optimized by NMS being larger than the space of "admissible" metriplectic systems under this definition. As the reviewer has noticed, this issue is somewhat distinct from the nonuniqueness caused by the use of orthogonal projection. We will clarify this further in the paper to avoid confusion.
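For reference (Remark 3.5 itself is not reproduced on this page), the Jacobi identity for the bracket $\{F, G\} = \nabla F \cdot L \nabla G$ generated by an antisymmetric $L$ is the standard one,

$$\{F,\{G,H\}\} + \{G,\{H,F\}\} + \{H,\{F,G\}\} = 0, \quad\text{equivalently}\quad \sum_{l}\big(L_{il}\,\partial_l L_{jk} + L_{jl}\,\partial_l L_{ki} + L_{kl}\,\partial_l L_{ij}\big) = 0.$$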

The technical issue preventing both energy conservation and the Jacobi identity in the present case is similar to the "usual" case of symplectic integrators outlined in the cited paper Zhong and Marsden (1988). Precisely, the chosen NMS parameterizations are constructed to exactly conserve energy pointwise in time, and also after the application of certain special integrators (say, conservative-dissipative splitting). Therefore, the $L$ parameterized by NMS cannot be symplectic, and hence cannot satisfy Jacobi, or else the discrete integration performed would correspond to the exact evolution of the dynamical system. Since hard enforcement of energy conservation is just a particular choice that has been made for this work, it could also be interesting to consider future learning methods which place symplecticity of the conservative dynamics at the forefront.

Comment

Thanks for your prompt reply. I understand it now. As you correctly pointed out, please clearly outline that there are two types of "nonuniqueness." The first one is by construction, because of the orthogonal projection. The second one is by not exactly enforcing the algebraic structure. These two types of "nonuniqueness" do not really talk to each other. At first, the reviewer thought there was a certain relationship here that the authors were trying to make clear.

All that being said, the reviewer thinks there is a relative lack of literature that deals with preserving the algebraic structure more intrinsically, and that should be an important and interesting topic to explore as a future work.

Comment

We completely agree. Please see the new Remark 3.5 and let us know if it does not address your concerns. Thanks again for your interest in our work!

Review (Rating: 6)

In this work, the authors present a neural network model (NMS) for learning the dynamics of metriplectic systems from trajectory data. It is based on a parameterization of certain scalar- and matrix-valued fields using neural networks that differs from prior works that enforce hard constraints on the degeneracy condition of metriplectic systems, and results in a model size that scales quadratically in the system dimension. The proposed method is demonstrated empirically to perform well in the learning of two low-dimensional metriplectic systems compared to prior methods.

Strengths

The proposed neural network model for parameterizing the metriplectic system is novel to my knowledge, and it is well-motivated by the theoretical results on metriplectic operators and approximation errors presented in Section 3.2 and 3.4. Compared to several methods from prior literature, the proposed NMS method holds an advantage in terms of model size as well as empirical performances. The writing of the paper is clear overall.

Weaknesses

Physics background and relation to Hamiltonian systems: Having relatively little prior knowledge about metriplectic systems, I would appreciate an expanded introduction of its physical motivation and some concrete examples for illustrating the general governing equation at the top of Page 2. For example, what do L and M look like in the two examples used for the numerical experiments, or in general Hamiltonian systems -- will L be the "J" matrix in Hamiltonian systems (padded with zeros for the extra dimensions) and lose its dependence on the state x?

On the decoupled block-wise structure: The authors mentioned prior works such as Ruiz et al. (2021) and Xu et al. (2022 & 2023) which proposed to parameterize metriplectic systems assuming a decoupled block-wise structure and discussed their inability to express general metriplectic dynamics. But I would appreciate some examples of these more general metriplectic systems encountered in practice. In particular, do the two systems studied in Section 5 admit the decoupled block-wise structure? If so, it would be reasonable to expect that those more restrictive methods are also tested in the experiments as baselines.

Comparison with Hamiltonian learning: If one ignores the entropy states and focuses only on the observable states (positions and momenta), it looks like the two examples in Section 5 can just be learned as Hamiltonian systems. In that case, does the NMS method reduce to e.g. the Hamiltonian Neural Network (HNN) from [1]? If not, perhaps HNN should also be added as a baseline method to compare against in Table 2.

Choices of the initial unobserved states: Regarding the initial unobserved states in batch-wise training, the authors mentioned two interesting strategies to handle the missing initial unobserved entropy states (first question: which one was used in the experiments whose results are reported in the main text?), one of which is to assume that the entropy values increase linearly in time. Is there a justification behind this? (A further question seems to be: are the entropy states and their dynamics uniquely determined by the observable states?) Besides the two strategies considered by the authors, the initial state optimization proposed in [2] for learning Hamiltonian systems might also be an alternative to consider.

Test systems are low-dimensional: Another limitation, as acknowledged by the authors, is that the empirical performance of NMS on larger-scale, realistic metriplectic systems has not yet been demonstrated.

A minor issue: misplaced parentheses for citations on Page 1 (probably due to mixing up \citep with \citet).

References:

[1] Greydanus et al., "Hamiltonian Neural Networks", NeurIPS 2019.

[2] Chen et al., "Symplectic Recurrent Neural Networks.", ICLR 2020.

Questions

See questions in the "Weaknesses" section above regarding physics background, training strategy, and alternative methods.

Comment

Thank you for your review. We are glad to hear that you found our work well-motivated and clear overall. Below you will find responses to your "Weaknesses" section.

Response to weaknesses:

Physics background and relation to Hamiltonian systems. We appreciate the criticism that the background on metriplectic systems is brief. However, we are quite limited by space constraints. Note that we have made efforts to "signpost" these ideas by including numerous citations in Sections 1 and 2 where more exposition can be found.

The reviewer is correct that, in canonical Hamiltonian systems, $L(x) = J$ is the canonical symplectic matrix and there is no state dependence. At your request, we have left descriptions of the $L, M$ matrix fields corresponding to our examples in the Appendix.
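For reference, the canonical Hamiltonian case mentioned above reads

$$\dot{x} = J\,\nabla H(x), \qquad J = \begin{pmatrix} 0 & I_n \\ -I_n & 0 \end{pmatrix}, \qquad x = (q, p) \in \mathbb{R}^{2n},$$

so a metriplectic system with constant $L = J$ and $M = 0$ reduces to Hamiltonian dynamics, with $E$ playing the role of $H$.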

On the decoupled block-wise structure. No, the considered examples are not trivializable in a way that admits their comparison with the previous methods mentioned in Section 2. Other examples include Navier-Stokes-Fourier, collisional kinetic plasmas, and resistive magnetohydrodynamics, to name a few. See, e.g., [1], for more examples.

Comparison with Hamiltonian learning. While Hamiltonian learning methods could conceivably be applied to the examples in Section 5 if entropy is ignored, this would be the wrong inductive bias for these systems. In particular, it is impossible for a Hamiltonian system to equilibrate in the way that is observed for the TGC example, as this requires a decrease in the Hamiltonian along the solution trajectory. To illustrate this point, please see the new Appendix E where we have compared NMS with a standard Hamiltonian neural network.

Choices of the initial unobserved states. All experiments use the 0-to-1 strategy unless otherwise specified (e.g., Appendix D). The justification for this is as follows: in the absence of other information, we have assumed a normalized linear increase in entropy density over time. Of course, this is not true in general, and there may exist other strategies which lead to better model performance. Whether entropy states and their dynamics are uniquely determined by the observable states is a good question. We suspect that this is not the case, although we do not have a proof of this. We thank the reviewer for pointing out the strategy in Chen et al. 2020, which would be interesting to consider in the future.
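A sketch of how we read the 0-to-1 strategy (our reconstruction from the description above, not the paper's code): the unobserved entropy states are initialized to increase linearly in normalized time from the all-0 to the all-1 vector.

```python
import numpy as np

ts = np.linspace(0.0, 1.0, 101)             # normalized time grid of one trajectory
n_entropy = 2                               # e.g., (S_1, S_2) in the TGC example
S_guess = np.outer(ts, np.ones(n_entropy))  # linear 0-to-1 ramp for each state
print(S_guess[0], S_guess[-1])              # [0. 0.] -> [1. 1.]
```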

Test systems are low-dimensional. The reviewer is correct that the examples considered are relatively low-dimensional. On the other hand, it is clear from the numerical results that this is enough to expose noteworthy differences between NMS and previous state-of-the-art methods. Moreover, we believe the provided theoretical results, which are more extensive than those included in previous work, are of value to the field of structure-preserving machine learning. To further strengthen this argument, please see the new Appendix F, where we have included an additional example of state dimension 201.

A minor issue: misplaced parentheses for citations on Page 1 (probably due to mixing up \citep with \citet). Thank you for pointing this typo out to us. It has been addressed in the revision.

[1]  P. J. Morrison and M. H. Updike, “Inclusive curvaturelike framework for describing dissipation: Metriplectic 4-bracket dynamics,” Physical Review E, vol. 109, no. 4, p. 045 202, 2024.

Comment

Dear reviewer gp5d,

Thank you again for your feedback. Based on your comments, we have included new examples in Appendices E and F demonstrating both the necessity of metriplectic methods over Hamiltonian ones, as well as the performance of the proposed NMS method on a higher-dimensional example.

As the submission deadline for the revised paper is November 27, we kindly remind you to review our response. We have made every effort to address your concerns thoroughly. We would greatly appreciate it if you could confirm whether our revisions have sufficiently resolved your questions and consider revisiting your score, or let us know if there are any further issues which we can help with.

Best regards,
authors

Comment

I really appreciate the response of the authors and the updated manuscript. Regarding the TGC example, I had thought incorrectly that $E_1$ and $E_2$ depend only on the position and momentum variables, in which case the entropy variables would look like mere auxiliaries to the system. But with their dependence on the entropy variables, it makes sense now why the Hamiltonian would not remain constant and why all variables should be modeled together. Apologies for the confusion.

My concerns have been addressed and I am raising my score.

Comment

Thanks for your acknowledgement of our rebuttal and for raising the score! We are happy that we could address your concerns.

Best, authors

Review (Rating: 8)

Metriplectic systems model systems that satisfy energy conservation as well as entropy constraints, and can be used to model thermodynamic and other systems that require such constraints. In the context of machine learning, it is of interest to learn such systems from data. Prior methods like GNODE exchange the problem of enforcing degeneracy constraints for the problem of enforcing symmetry constraints, which underdetermines the problem, and at the same time have a redundant parameterization of the problem which leads to high (cubic) complexity. The proposed method exploits structure in the tensor fields to reduce the number of parameters. The paper demonstrates structure in the degeneracy constraints beyond what can be captured by symmetry constraints, leading to a lower-parameter parameterization. A further result shows that the proposed formulation universally approximates metriplectic systems that are nondegenerate, and shows a generalization result.

Strengths

The proposed approach appears to be novel and more general than prior work and allows for all metriplectic data to be approximated simultaneously. Prior work assumes special forms to satisfy the constraints on entropy and energy. This allows the method to model a greater class of systems. The modeling assumptions are also quite mild and only require non-zero gradients for energy and entropy.

The claims and objectives of the paper are clear. The literature review is quite extensive, and the experiments treat a number of prior baselines and validate the claims of the paper of better generalization at lower complexity.

Weaknesses

The paper is difficult to read and there is not a lot of intuition to understand the source of the complexity reduction in Lemma 3.2 and Theorem 3.4 for a non-expert.

Questions

Besides lower complexity, the method achieves better loss values compared with all the other methods. Why is it that, even on the simple two-gas problem, the other methods, say GNODE, do not achieve the same accuracy even with full state information? Is it due to a restrictive parameterization assumption? Some intuition would be useful.

Comment

Thank you for your review. We are glad that you have appreciated the comprehensiveness of our work and the mildness of the required assumptions. Below you will find responses to your "Weaknesses" and "Questions" sections.

Response to weaknesses:

We appreciate the criticism that parts of the paper may be difficult to read for non-experts. Due to space constraints, we are unable to give a complete introduction to each topic, but we have stressed intuition throughout and made efforts to "signpost" the necessary ideas by including citations in Sections 1 and 2 where more exposition can be found.

To be clear, the key piece of intuition necessary for understanding the complexity reduction inherent in NMS is the following: while previous methods lift the problem of enforcing degeneracy in matrix fields $L, M$ to a problem of enforcing symmetry in tensor fields $\zeta, \xi$, this creates redundancy where multiple tensor fields can correspond to the same matrix field. Conversely, our NMS method makes use of the fact that the degeneracy conditions $L\nabla S = M\nabla E = 0$ factor the tensors, removing this unnecessary redundancy and producing parameterizations which exhibit superior scaling with no loss in representative power.

Response to questions:

This is an important point. You are correct that GNODE in particular does not work as well because of its restriction to constant tensors $\zeta, \xi$, but this is not the only reason for the lower performance of other metriplectic methods. The primary issue is that the optimization problems underpinning metriplectic methods such as GNODE, GFINN, and NMS are highly non-convex with formidable loss landscapes and (in the case of the former two methods) built-in redundancy. This creates a more challenging learning problem for GNODE and GFINN as their optimization takes place in a larger parameter space than is necessary. For instance, there is nothing stopping gradient descent on these architectures from moving their weights along fibers in the direction of this redundancy, thereby making no progress and hindering learning. Conversely, NMS eliminates (most of) this redundancy, leading to an easier learning problem which is empirically shown to produce solutions which are more accurate and generalize better than previous methods.
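A toy caricature of this fiber picture (entirely our own, under the assumption that the loss depends only on a reduced combination of the weights):

```python
import numpy as np

# The model depends only on w = a - b, so the fiber {(a + c, b + c)} maps to
# one and the same model. With noisy gradients, the weights random-walk along
# the fiber while only the transverse coordinate makes progress.
rng = np.random.default_rng(0)
target, lr = 1.0, 0.1
a, b = 0.3, -0.2
for _ in range(500):
    g = 2.0 * ((a - b) - target)      # exact gradient of ((a - b) - target)**2 in w
    grad_a = g + 0.5 * rng.normal()   # independent per-weight noise, as in SGD
    grad_b = -g + 0.5 * rng.normal()
    a -= lr * grad_a
    b -= lr * grad_b
print(a - b)   # functional coordinate: converges near the target 1.0
print(a + b)   # fiber coordinate: drifts randomly, making no progress
```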

Comment

Dear reviewer ZdKG,

Thank you again for your feedback. Based on your comments, we have clarified the language in the exposition of the paper and left more details regarding the superior performance of the proposed NMS method.

As the submission deadline for the revised paper is November 27, we kindly remind you to review our response. We have made every effort to address your concerns thoroughly. We would greatly appreciate it if you could confirm whether our revisions have sufficiently resolved your questions and consider revisiting your score, or let us know if there are any further issues which we can help with.

Best regards,
authors

Comment

Thank you, I am satisfied with the responses and will raise my score.

Comment

Thanks for your acknowledgement of our rebuttal and for raising the score! We are glad that we could address your concerns.

Best,
authors

Comment

To all reviewers:

Thank you for your thoughtful comments and interest in our work. We are glad to hear that you have found our paper clearly written, numerically thorough, and theoretically convincing.

We have incorporated your suggestions into the revised submission which is now visible. All changes are colored in blue.

Please let us know if you have further questions about our work, and thanks again for your time,

authors

AC Meta-Review

This paper proposes a method for learning the dynamics of metriplectic systems from trajectory data. It hardwires structural information in the learning model using geometric techniques, so that the physical constraint is strictly satisfied. Reviewers and I found the method to be novel. Although reviewers (e.g., gp5d and QHfC) initially expressed some concerns about the motivation, presentation and applicability, many of these concerns were alleviated during the discussions. Overall, I feel the pros outweigh the cons and therefore recommend acceptance.

Additional Comments from Reviewer Discussion

Although reviewers initially expressed minor concerns about motivation, presentation and applicability, many of them were further clarified by the discussions.

Final Decision

Accept (Poster)