MetaPhysiCa: Improving OOD Robustness in Physics-informed Machine Learning
Abstract
Reviews and Discussion
This paper adopts a meta-learning technique to improve OOD robustness. The experimental results demonstrate that the proposed method generalizes better.
Strengths
-
Develop a meta-learning technique for identifying the underlying governing equations of dynamical systems.
-
Formulate it as a bi-level optimization problem.
-
The evaluation results demonstrate that the proposed method outperforms the baselines.
Weaknesses
-
How many basis functions are used in the experiments? Without prior knowledge, the number of basis functions could be very large.
-
The proposed method still adopts the SINDy-like approach in which all key terms should be included in a set of basis functions. It may not work on complex equations like
-
Conduct experiments on the more complex systems from the CoDA paper, such as the reaction-diffusion and Navier-Stokes systems.
-
I am curious about the causal structure in Fig. 4. Which two terms in each of the three dynamical systems have causal relationships? Please give an example to explain it. It seems that the authors did not discuss this in the Evaluation section.
-
Please conduct experiments on noisy data.
Questions
Please see the comments above.
Q4: Which two terms in each of the three dynamical systems have causal relationships? Please give an example to explain it.
A4: Thank you for the question. First, a note that is not directly related to the reviewer’s question but helps avoid misunderstandings: the causal model is a model of the dynamical system. That is, under interventions on the initial state and environment changes ( parameters), the output of our model is equivalent to what would happen in the system (up to noise). This is enough to make the model robust OOD, but it is not a complete causal model of the real system (interventions on variables that are not and parameters are not considered).
Now to the reviewer’s question: all terms on the RHS of the true dynamical system affect the future dynamics. For example, in the damped pendulum system, the change in angular velocity is causally affected by both and the damping term. To be OOD robust in our scenarios, we must be robust to changes in initial conditions and to interventions on the parameters.
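To make the causal reading concrete, here is a minimal sketch assuming the textbook damped pendulum parameterization dθ/dt = ω, dω/dt = −(g/L)·sin θ − β·ω. The symbol names (`w0_sq` for g/L, `beta` for damping) and all numeric values are illustrative assumptions, not taken from the paper. The parameters act as environment variables: intervening on them, or on the initial state, changes the trajectory while the causal structure (which terms feed which derivative) stays fixed.

```python
import math

def pendulum_rhs(theta, omega, w0_sq, beta):
    """Damped pendulum: returns (dtheta/dt, domega/dt).

    w0_sq (= g/L) and beta (damping) are environment parameters; the causal
    parents of domega/dt are sin(theta) and omega.
    """
    return omega, -w0_sq * math.sin(theta) - beta * omega

def simulate(theta0, omega0, w0_sq, beta, dt=0.01, steps=1000):
    """Integrate with classic RK4 and return the list of (theta, omega) states."""
    state = (theta0, omega0)
    traj = [state]
    for _ in range(steps):
        th, om = state
        k1 = pendulum_rhs(th, om, w0_sq, beta)
        k2 = pendulum_rhs(th + 0.5 * dt * k1[0], om + 0.5 * dt * k1[1], w0_sq, beta)
        k3 = pendulum_rhs(th + 0.5 * dt * k2[0], om + 0.5 * dt * k2[1], w0_sq, beta)
        k4 = pendulum_rhs(th + dt * k3[0], om + dt * k3[1], w0_sq, beta)
        state = (
            th + dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
            om + dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]),
        )
        traj.append(state)
    return traj

# Same causal structure, two different interventions on (initial state, parameters).
in_dist = simulate(theta0=0.5, omega0=0.0, w0_sq=4.0, beta=0.2)
ood = simulate(theta0=2.5, omega0=1.0, w0_sq=9.0, beta=0.5)
```

An OOD-robust model only has to recover `pendulum_rhs`'s structure once; new environments then differ only in the parameter values passed to it.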
Q5: Please conduct experiments on noisy data.
A5: Thank you for the suggestion. We had evaluated the effect of noise on MetaPhysiCa’s ability to find the true dynamical system in Appendix D.3. We have now added a pointer to this section in the main text.
In Appendix D.3, we repeat the Damped pendulum and Predator-prey experiments with increasing amounts of noise. Specifically, we add 1%, 5% and 10% Gaussian noise to all the trajectories, both in training and in test. Figure 8 reports the normalized RMSE (NRMSE) of the different models trained on the noisy versions of the data.
In both tasks, the proposed method is relatively robust to small amounts of noise and outperforms the baselines. With 10% noise, MetaPhysiCa is unable to identify the dynamical system accurately, but performs comparably to the baselines. This is consistent with the impossibility results in Fajardo-Fontiveros et al. [1], who show that there is a fundamental limit to learning the true model from noisy data: beyond a certain noise level, the true model is unlearnable by any method.
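A minimal sketch of the noise protocol described above. It assumes that "p% Gaussian noise" means zero-mean Gaussian noise whose standard deviation is p% of each state dimension's standard deviation over the trajectory; the exact normalization used in Appendix D.3 is not stated here, so treat this as an assumption. The helper name `add_relative_noise` is hypothetical.

```python
import random
import statistics

def add_relative_noise(traj, level, seed=0):
    """Add zero-mean Gaussian noise to a trajectory of state tuples.

    level is a fraction (0.01, 0.05, 0.10). Assumption: the noise std for each
    state dimension is level * (population std of that dimension over traj).
    """
    rng = random.Random(seed)
    dims = list(zip(*traj))  # one sequence per state dimension
    scales = [level * statistics.pstdev(d) for d in dims]
    return [
        tuple(x + rng.gauss(0.0, s) for x, s in zip(state, scales))
        for state in traj
    ]

# Toy 2-D trajectory and its 1%, 5% and 10% noisy versions.
clean = [(0.1 * t, 1.0 - 0.05 * t) for t in range(100)]
noisy = {p: add_relative_noise(clean, p) for p in (0.01, 0.05, 0.10)}
```

Because the noise is scaled per dimension, a 10% level perturbs a large-amplitude state variable more in absolute terms than a small-amplitude one.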
References
[1] Fajardo-Fontiveros, O., Reichardt, I., De Los Ríos, H.R., Duch, J., Sales-Pardo, M. and Guimerà, R., 2023. Fundamental limits to learning closed-form mathematical models from data. Nature Communications, 14(1), p.1043.
We thank the reviewer for their insightful feedback. We have addressed the reviewer questions below.
Q1: How many basis functions are used in the experiments? If there is no prior knowledge, the number of basis functions will be very large.
A1: That is a good question. In our experiments, we use the same set of 14 basis functions for all 3 dynamical systems, comprising sine, cosine and polynomial terms up to the 3rd power. At the reviewer’s suggestion, we repeated our damped pendulum experiments with increasing numbers of basis functions, ranging from 7 basis terms (sinusoidal terms, polynomial terms up to power 1) to 32 basis terms (sinusoidal terms, polynomial terms up to power 6) per output dimension in the damped pendulum system ( and ).
Figure 9 in Appendix D.7 shows the training loss and over epochs. MetaPhysiCa converges to the true dynamics of the damped pendulum system with 3 basis terms for all the different values of tested. However, as increases, the model requires more epochs to converge.
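One library consistent with these counts (our assumption, not the authors' code) takes every monomial of the two pendulum state variables up to a given total degree, including the constant, plus sin and cos of each variable. A 2-variable polynomial basis of degree p has (p+1)(p+2)/2 terms, so degrees 1, 3 and 6 give 3, 10 and 28 monomials, and adding the 4 sinusoids yields exactly 7, 14 and 32 terms. The names `build_library`, `evaluate`, `x0` and `x1` are hypothetical.

```python
import math
from itertools import combinations_with_replacement

def build_library(degree, var_names=("x0", "x1")):
    """Return a list of (name, fn) candidate basis functions of a 2-D state.

    Includes every monomial of total degree <= `degree` (the constant term
    included) plus sin/cos of each state variable.
    """
    terms = []
    n = len(var_names)
    for k in range(degree + 1):
        for idx in combinations_with_replacement(range(n), k):
            name = "*".join(var_names[i] for i in idx) or "1"
            # Default argument binds idx now; math.prod of an empty product is 1.
            terms.append((name, lambda s, idx=idx: math.prod(s[i] for i in idx)))
    for i, v in enumerate(var_names):
        terms.append((f"sin({v})", lambda s, i=i: math.sin(s[i])))
        terms.append((f"cos({v})", lambda s, i=i: math.cos(s[i])))
    return terms

def evaluate(library, state):
    """Evaluate every basis function on one state; returns a length-m list."""
    return [fn(state) for _, fn in library]

print([len(build_library(p)) for p in (1, 3, 6)])  # [7, 14, 32]
```

The library grows quadratically in the polynomial degree for a 2-D state, which is one way to read the slower convergence reported for larger libraries.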
Q2: The proposed method still adopts the SINDy-like approach in which all key terms should be included in a set of basis functions. It may not work on complex equations.
A2: Thank you for the question. In Section 3.1 we discuss the importance of having appropriate basis functions in the architecture for OOD extrapolation. Without the algorithmic alignment with appropriate basis functions, the in-domain approximations of the true model may not hold when the inputs are out-of-domain (as shown in Figure 2a).
At the reviewer’s suggestion, we show two cases where algorithmic alignment is more challenging to achieve:
-
Appropriate basis functions required to learn the ground truth dynamics are not present. We repeat the damped pendulum experiment without sine/cosine basis functions in Appendix D.6.
In the absence of term to learn the true dynamics of the damped pendulum system, MetaPhysiCa learns a truncated Taylor series approximation of this term via and terms:
.
Table 11 in Appendix D.6 shows that after test-time adaptation, this learnt model achieves to better OOD test NRMSE than the best baseline, but is worse than MetaPhysiCa with sinusoidal basis functions included.
-
More expressive SCM. In Appendix D.4, we extend MetaPhysiCa to a more expressive SCM by composing the given basis functions (e.g., to obtain terms such as , etc.). MetaPhysiCa with such an expressive SCM shows OOD performance gains on a complex ODE task (Appendix D.4), but sometimes suffers from learning stiff ODEs during optimization due to the complexity of such a 2-layer composition procedure. Better optimization techniques may help alleviate this problem. We discuss this as potential future work in the Conclusions section.
Q3: Conduct experiments on more complex systems in the CoDA paper, such as reaction-diffusion system and Navier-Stokes system.
A3: That is a good suggestion. We have started exploring fluid dynamics applications as a future direction. For now, we can say that this extension is non-trivial but also very interesting. A PDE needs an expanded set of basis functions, including differential operators, in order to model behaviors such as shock waves and turbulence OOD, which pose unique OOD challenges.
Furthermore, given the nature of the PDEs, the boundary conditions, especially in OOD scenarios, become critically important. Ensuring that the model accurately represents the physics at the boundaries while generalizing well in OOD scenarios is a non-trivial task. This seems to entail developing new techniques for imposing and learning boundary conditions within the MetaPhysiCa framework. Moreover, for fluid simulations, it is important to balance the trade-offs between physical accuracy and computational efficiency. We discuss including PDEs as future work in the Conclusions section.
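As a rough illustration of what "basis functions including differential operators" could mean (a hypothetical construction, not part of MetaPhysiCa), a candidate library for a 1-D field u(x) on a periodic grid can include spatial derivatives computed by second-order central finite differences, along with products such as u·u_x, the advection term of Burgers' equation. The function names and the choice of terms are assumptions. Periodic boundaries are used only to sidestep the boundary-condition question raised above.

```python
import math

def periodic_derivatives(u, dx):
    """Second-order central differences on a periodic 1-D grid.

    Returns (u_x, u_xx) as lists with the same length as u; index -1
    wraps around, which implements the periodic boundary.
    """
    n = len(u)
    u_x = [(u[(i + 1) % n] - u[i - 1]) / (2 * dx) for i in range(n)]
    u_xx = [(u[(i + 1) % n] - 2 * u[i] + u[i - 1]) / dx**2 for i in range(n)]
    return u_x, u_xx

def pde_library(u, dx):
    """Candidate terms for du/dt: u, u_x, u_xx, u*u_x, u^2 (illustrative set)."""
    u_x, u_xx = periodic_derivatives(u, dx)
    return {
        "u": list(u),
        "u_x": u_x,
        "u_xx": u_xx,
        "u*u_x": [a * b for a, b in zip(u, u_x)],
        "u^2": [a * a for a in u],
    }

# u(x) = sin(x) on [0, 2*pi): u_x should be close to cos(x), u_xx to -sin(x).
n = 256
dx = 2 * math.pi / n
grid = [i * dx for i in range(n)]
lib = pde_library([math.sin(x) for x in grid], dx)
```

With non-periodic domains, the stencils at the edges would need one-sided differences or ghost cells, which is exactly where OOD boundary conditions enter.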
Dear authors,
Thanks for addressing some of my concerns. I have a follow-up question on A2: you mentioned that you may learn a truncated Taylor series for intricate equations. What is the difference between your work and TaylorNet [1], which tries to discover governing equations based on the Taylor series? Please discuss this paper in your related work.
[1] TaylorNet: A Taylor-Driven Generic Neural Architecture. 2022. https://openreview.net/forum?id=tDNGHd0QmzO
Source code. By the way, I am wondering if you could upload the source code to the supplementary material so that we can reproduce the experimental results.
Thanks a lot!
Dear Reviewer,
Thank you for reading our replies. We are happy to hear the reviewer found our answers helpful. Please let us know if we should expand any of our answers.
Thank you for pointing us to TaylorNet (Zhao et al., 2023)! Very interesting work. We will add it to our related work section. Regarding the difference between our work and TaylorNet: TaylorNet considers the task of fitting a Taylor polynomial to the data (in-distribution), while our work aims to uncover an underlying causal model that generalizes across different environments OOD. The approach used in TaylorNet is, as far as we can tell, unsuitable for our OOD task:
-
In the context of dynamical system forecasting, TaylorNet is a transductive model like SINDy, EQL, and the other PIML methods discussed in Section 3.2. TaylorNet can learn the ODE when the ground-truth parameter is the same for all training trajectories, but it is not designed to learn the ODE from training trajectories from different environments (different ) and cannot adapt to an OOD at test time.
-
Appendix D.6, Table 11 (details in response A2 above) also suggests that a truncated Taylor expansion is not enough to achieve OOD robustness; appropriate basis functions such as sine/cosine terms, which are not present in TaylorNet, are also needed.
Source code: We are currently disentangling the source code from some custom libraries for publication. We will anonymize the code and try to add it to the supplement by tomorrow, with a short README on how to rerun the experiments.
References:
Hongjue Zhao and Yizhuo Chen and Dachun Sun and Yingdong Hu and Kaizhao Liang and Yanbing Mao and Lui Sha and Huajie Shao, TaylorNet: A Taylor-Driven Generic Neural Architecture, ICLR 2023 Submission (https://openreview.net/forum?id=tDNGHd0QmzO).
Dear authors,
Thanks a lot for your detailed answers. I will raise my score if you can upload the source code. Thanks a lot!
Dear Reviewer,
We have attached the source code in the Supplementary Material. Please see the README.txt file for instructions on installing the dependencies and running the model. Due to time constraints, we are only able to provide code to reproduce the experiments in the main paper. We will upload the code for the experiments in the Appendix with the final version.
Thank you for taking the time to engage with us.
In the paper, the authors propose an approach to provide more robust dynamical system forecasting for physics-informed algorithms. The paper first describes the out-of-distribution (OOD) setting in the transductive and inductive settings, where algorithmic alignment is needed for accurate forecasting. For the proposed model, MetaPhysiCa, the authors first describe the deterministic underlying structural model, where at each time step the hidden state can pass through different basis functions. The derivative is a combination of the task-specific coefficients and the selected basis functions. Given the training data, the authors extract the underlying causal structure by solving a causal structure discovery problem (the minimal causal structure with the fewest edges). The structure parameters, global parameters and task-specific parameters are optimized using a binarization trick as an approximation. The authors provide a proof that MetaPhysiCa can correctly identify the causal structure. At test time, the task-specific parameters can be adjusted. In the experiments, the authors show that MetaPhysiCa performs better than existing PIML methods on the tasks considered.
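The structure described in this summary can be sketched as follows, assuming the derivative of each state dimension k is Σ_j mask[j][k]·coef[j][k]·φ_j(x), where the binary mask (the causal structure) is shared across tasks and the coefficients are task-specific. Thresholding a sigmoid of real-valued structure logits at 0.5 is one common binarization trick; the paper's exact scheme, and all names here, are assumptions.

```python
import math

def binarize(logits, threshold=0.5):
    """Hard 0/1 mask from real-valued structure logits via a sigmoid threshold.

    During training, a straight-through estimator would typically pass the
    sigmoid's gradient through this non-differentiable step (not shown).
    """
    return [[1 if 1 / (1 + math.exp(-z)) > threshold else 0 for z in row] for row in logits]

def predict_derivative(phi, mask, coef):
    """dx_k/dt = sum_j mask[j][k] * coef[j][k] * phi[j]  for k = 0..d-1.

    phi: m basis values at the current state; mask, coef: m x d matrices.
    Cost is O(m * d) per state.
    """
    m, d = len(mask), len(mask[0])
    return [sum(mask[j][k] * coef[j][k] * phi[j] for j in range(m)) for k in range(d)]

# Hypothetical pendulum example: basis phi = [theta, omega, sin(theta)], d = 2.
logits = [[-3.0, -3.0], [3.0, 3.0], [-3.0, 3.0]]  # shared structure (m x d)
coef = [[0.0, 0.0], [1.0, -0.2], [0.0, -4.0]]     # task-specific coefficients
mask = binarize(logits)
theta, omega = 0.5, 1.0
dtheta, domega = predict_derivative([theta, omega, math.sin(theta)], mask, coef)
```

Here the mask keeps ω for dθ/dt and {ω, sin θ} for dω/dt, i.e. 3 active edges; a different environment would change only `coef`.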
Strengths
(S1) The paper introduces a novel concept, MetaPhysiCa, which focuses on out-of-distribution learning in the PIML setting. This approach is distinct from state-of-the-art methods and addresses an important robustness problem that is yet to be solved in the research community.
(S2) The paper is clearly written. The authors give a clear problem definition (inductive and transductive settings) with compelling examples (Figures 1-2). There are several components in the methodology, yet each component is clearly addressed. The authors have also provided a theoretical proof showing that the method can correctly identify the underlying causal structure.
(S3) The empirical results presented in the paper show that the proposed method outperforms standard PIML methods in the OOD setting, showcasing the quality and robustness of MetaPhysiCa. Given the ubiquity of and increasing reliance on PIML in real-world applications, the capability to adapt to real-time data shifts and to provide robust estimates under underspecified physics priors is important. MetaPhysiCa’s methodology has potential impact for ensuring PIML’s reliability.
Weaknesses
W1. While MetaPhysiCa showcases success on the datasets considered, it is usually the case that the correctly specified basis function is included in the search space of f. MetaPhysiCa’s performance with an incorrectly specified set of basis functions is not known.
W2. There should be an ablation study that shows MetaPhysiCa’s performance with a relatively small pool and with a relatively large pool of basis functions. For example, does a large pool of basis functions make learning the causal structure more difficult?
W3. The paper mentions that jointly optimizing all parameters gives results comparable to bi-level optimization, but the experiments in the main body of the paper do not show this. The authors should consider adding experimental results for both bi-level and joint optimization.
Questions
Q1. The authors have described MetaPhysiCa primarily for ODE tasks. However, could MetaPhysiCa be adapted to the PDE setting, for example, to find the parameters of Burgers’ equation? How could MetaPhysiCa be adapted to methods such as PINNs?
Q2. Why are only the task-specific parameters updated at test time? Why does the causal structure not need to be updated? In a fraud detection system, for example, the ODE itself may shift at test time.
Q3. Theorem 1 only guarantees that MetaPhysiCa identifies the true causal structure. What theoretical guarantee is there that MetaPhysiCa finds good task-specific parameters, especially with few-shot updates at test time?
Q4. Could the authors provide more insight into the computational complexity introduced by MetaPhysiCa, especially for larger pools of basis functions and an increasing number of observations?
Q5. Could MetaPhysiCa be adapted to or integrated into PIML methods that use different loss functions? Is it as simple as adding the additional loss terms to Equation 4?
Details of Ethics Concerns
n/a
Q5: Why are only the task-specific parameters updated at test time? Why does the causal structure not need to be updated? In a fraud detection system, for example, the ODE itself may shift at test time.
A5: Thank you for the interesting question. In our OOD settings, we assume we will not see enough of the test-time series to learn a new basis (or a modification of a basis). SINDy, for instance, tries to learn the basis in this setting and generally fails. Hence, our OOD shift is with respect to interventions on the initial conditions and parameters, while the underlying structure of the dynamical system remains the same. Thus, it is enough to update the task-specific parameters. Small OOD changes to the causal structure are an interesting direction for future work, possibly requiring additional regularization terms to restrict the amount of change to the learnt structure.
Q6: What is the theoretical guarantee that MetaPhysiCa discovers good task-specific parameters, especially with few-shot updates at test time?
A6: This is another interesting challenge. Unfortunately, establishing theoretical bounds for task-specific parameters under arbitrary basis assignments seems quite difficult. These bounds appear intrinsically linked to the basis used in the causal-equivalent model, leading to a scenario where each application demands its unique set of parameter bounds. Although there might be potential methods to circumvent these task-dependent bounds, we are yet to find a universally applicable solution that could extend across all tasks.
Q7: What is the computational complexity introduced by MetaPhysiCa, especially for larger pools of basis functions and an increasing number of observations?
A7: For a single task and time , computing the predicted derivatives in Equation 3 has complexity to evaluate the Hadamard product and the matrix-vector product, where is the number of basis functions and is the state dimension of the dynamical system. Since there are tasks/trajectories of maximum length , the overall complexity per epoch is . However, as Figure 9 shows, MetaPhysiCa could require a higher number of epochs to converge for higher values of . We have added a paragraph in Appendix C.1 discussing the time complexity.
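The count above can be written out as a short worked derivation. The symbols are our assumptions, chosen for illustration: m basis functions, state dimension d, a binary structure mask Φ, task-specific coefficients W^{(i)}, N training trajectories of maximum length T, and each basis function costing O(1) to evaluate.

```latex
\begin{aligned}
\dot{\hat{x}}_t &= \big(\Phi \odot W^{(i)}\big)^{\top} \varphi(x_t),
  \qquad \Phi,\, W^{(i)} \in \mathbb{R}^{m \times d},\quad \varphi(x_t) \in \mathbb{R}^{m}\\
\text{cost per step} &= \underbrace{md}_{\text{Hadamard product}}
  + \underbrace{md}_{\text{matrix-vector product}} = O(md)\\
\text{cost per epoch} &= N \cdot T \cdot O(md) = O(NTmd)
\end{aligned}
```

For example, with the 14-term library and a 2-dimensional state, one step needs 28 multiplications for the mask and 28 multiply-adds for the product; growing the library to 32 terms roughly doubles the per-epoch cost, before accounting for the extra epochs needed to converge.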
Q8: Could MetaPhysiCa be adapted to or integrated into PIML methods that use different loss functions? Is it as simple as adding the additional loss terms to Equation 4?
A8: Depending on the application, other PIML loss functions could be further incorporated (e.g., energy conservation, monotonicity, etc.) while preserving the basic principles of MetaPhysiCa: Orthogonal basis functions, identifiability of the equivalent causal model for OOD robustness. However, these PIML constraints should also be enforced during test-time adaptation of task-specific parameters by adding these loss terms in Equation 5.
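A minimal sketch of adding a physics term to a data-fitting loss, assuming an energy-conservation penalty for an undamped pendulum. This is an illustrative choice: the function names and the weight `lam` are hypothetical and do not reproduce the paper's Equations 4 or 5. As the answer notes, the same combined loss would be used both in training and during test-time adaptation.

```python
import math

def mse(pred, target):
    """Mean squared error over a trajectory of state tuples."""
    sq = [(p - t) ** 2 for ps, ts in zip(pred, target) for p, t in zip(ps, ts)]
    return sum(sq) / len(sq)

def energy(theta, omega, w0_sq):
    """Mechanical energy per unit m*L^2 of an undamped pendulum (w0_sq = g/L)."""
    return 0.5 * omega**2 - w0_sq * math.cos(theta)

def energy_drift_penalty(pred, w0_sq):
    """Mean squared deviation of the predicted energy from its initial value."""
    e0 = energy(*pred[0], w0_sq)
    return sum((energy(th, om, w0_sq) - e0) ** 2 for th, om in pred) / len(pred)

def total_loss(pred, target, w0_sq, lam=0.1):
    """Data-fit loss plus a weighted physics-consistency penalty."""
    return mse(pred, target) + lam * energy_drift_penalty(pred, w0_sq)
```

For a damped system such as the damped pendulum this particular penalty would be wrong, since energy is dissipated; the constraint has to match the physics of the application.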
Thank you for your response. It addressed all my concerns and I would recommend this paper for acceptance.
Thanks for reading our response, we are happy it answered all your concerns.
We thank the reviewer for their support and insightful feedback. We give detailed answers below.
Q1: It is usually the case that the correctly specified basis function is included in the search space of f. MetaPhysiCa’s performance with an incorrectly specified set of basis functions is not known.
A1: That is a good point that we now address in the revised submission. We have added clarifying statements in the updated submission. Specifically, we emphasize in Section 3.1 that appropriate basis functions must be present in the architecture for OOD extrapolation. Without the algorithmic alignment with appropriate basis functions, the in-domain approximations of the true model may not hold when the inputs are out-of-domain (as shown in Figure 2a).
In the updated submission we perform an interesting experiment (thank you for the suggestion!): we repeat the damped pendulum experiment without algorithmic alignment, i.e., without the sine/cosine basis functions required to learn the ground-truth dynamics (Appendix D.6). In the absence of the term needed to learn the true dynamics of the damped pendulum system, MetaPhysiCa learns a truncated Taylor series approximation of this term via and terms: . Table 11 in Appendix D.6 shows that after test-time adaptation, this learnt model achieves to better OOD test NRMSE than the best baseline, but is worse than MetaPhysiCa with sinusoidal basis functions included.
Q2: There should be an ablation study for a relatively small pool of basis functions and a relatively large pool of basis functions.
A2: Thank you for the great suggestion! In Appendix D.7 of the revised submission, we repeat our damped pendulum experiments with increasing numbers of basis functions ranging from 7 basis terms (sinusoidal terms, polynomial terms up to power 1) to 32 basis terms (sinusoidal terms, polynomial terms up to power 6) per output dimension in the damped pendulum system ( and ).
Figure 9 in Appendix D.7 shows the training loss and over epochs. MetaPhysiCa converges to the true dynamics of the damped pendulum system with 3 basis terms for all the different values of tested. However, as increases, the model requires more epochs to converge.
Q3: The authors should consider adding experiment results that shows bi-level optimization results as well as the joint optimization.
A3: We have now added a section D.2.3 in the Appendix discussing bi-level optimization vs joint optimization in more detail. We observe that bi-level optimization and joint optimization result in learning the true dynamics for all 3 tested dynamical systems. However, joint optimization is faster per epoch than the bi-level optimization (taking 90ms vs 744ms on average on one Intel(R) Xeon(R) CPU core). The final NRMSE values are the same for both as test-time adaptation is an independent step and learns the same task-specific parameters once the true structure is learnt by either optimization method. We have added a pointer to this section in the main text.
Q4. The authors have described MetaPhysiCa primarily for ODE tasks. However, could MetaPhysiCa be adapted to the PDE setting?
A4: That is an interesting question. We have started exploring fluid dynamics applications as a future direction. For now, we can say that this extension is non-trivial but also very interesting. A PDE needs an expanded set of basis functions, including differential operators, in order to model behaviors such as shock waves and turbulence OOD, which pose unique OOD challenges.
Furthermore, given the nature of the PDEs, the boundary conditions, especially in OOD scenarios, become critically important. Ensuring that the model accurately represents the physics at the boundaries while generalizing well in OOD scenarios is a non-trivial task. This seems to entail developing new techniques for imposing and learning boundary conditions within the MetaPhysiCa framework. Moreover, for fluid simulations, it is important to balance the trade-offs between physical accuracy and computational efficiency. We discuss including PDEs as future work in the Conclusions section.
The authors propose a meta-learning based method for physics-informed out-of-distribution (OOD) generalization. Specifically, the proposed method uses a set of basis functions that are assumed to be given, where each basis function is governed by its own (unknown) set of parameters. The system is trained on a set of related training instances (from some dynamical system, e.g., a pendulum) to learn the parameters of the basis functions (specifically, the subset of basis functions that govern the task) as well as the parameters of the linear combination of these basis functions. These basis functions (with appropriately learned parameters) and their linear combination constitute the causal structure discovery mechanism (CSM) proposed in the paper. Additionally, the authors adopt a meta-learning approach in which the CSM model (trained on multiple trajectories with different initial conditions and PDE parameters) is also trained with an invariant risk minimization (IRM) type of loss (specifically, minimizing the variance of the loss across all training tasks). Once the model is trained with the CSM + IRM losses, test time consists of few-shot adaptation (of only a subset of the model's parameters) to a related but new condition (initial state, PDE parameters) of the dynamical system.
The authors evaluate the proposed model in multiple contexts, (i) epidemic modeling, (ii) predator-prey systems and (iii) a pendulum system, and show better OOD adaptation than traditional physics-informed approaches as well as other approaches such as Neural ODEs.
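The variance-penalized objective mentioned in this summary can be sketched in the spirit of V-REx: the mean of the per-task risks plus a weighted variance of those risks. The weight `lam` and the use of the population variance are assumptions for illustration, not the paper's exact formulation.

```python
import statistics

def vrex_objective(task_losses, lam=1.0):
    """Mean of per-task losses plus lam times their (population) variance.

    A structure that fits every training environment equally well drives the
    variance term to zero; structures that fit some environments much better
    than others are penalized even when their mean loss is the same.
    """
    return statistics.fmean(task_losses) + lam * statistics.pvariance(task_losses)

# Two hypothetical candidate structures with similar mean loss over three tasks.
uniform = vrex_objective([0.5, 0.5, 0.5], lam=10.0)
uneven = vrex_objective([0.1, 0.5, 0.9], lam=10.0)
```

With a large `lam`, the uneven candidate scores worse than the uniform one, which is the intended pressure toward structures that are invariant across environments.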
Strengths
-
The problem of OOD generalization is important and is a significant challenge (as highlighted in the paper) for physics-informed neural networks (transductive or inductive) as well as traditional neural network models to accomplish. However, any useful neural network model applied to scientific domains needs to have good OOD generalization capabilities. Hence, the authors develop an effective solution to an important problem.
-
Overall, the paper is well written, and the related work and methodology as well as results are well organized and clear.
Weaknesses
-
Testing is required on more challenging settings (e.g., 1D convection, convection diffusion other “stiff” PDE / ODE settings where physics informed approaches are known to fail).
-
The assumption that the collection of m possible (appropriate) basis functions are always available seems too strong to the reviewer. It would be helpful if more clarity about this can be provided by the authors. Basically, this strong assumption might significantly reduce the impact of the proposed method as the full set of basis functions might not always be available to select from.
Questions
-
Have the authors tested on more challenging settings (stiff PDE/ODE settings, and settings such as 1D convection and convection-diffusion where traditional physics-informed models fail)?
-
Could you please expound on the assumption of the m possible basis functions? Are there contexts where all sub-parts of the full applicable basis might not be present in the pre-trained network? How can the proposed method adapt to this OOD scenario? Has this been tested?
We thank the reviewer for their support and insightful feedback. We give detailed answers below.
Q1: Testing is required on more challenging settings e.g., 1D convection, convection diffusion, other “stiff” PDE settings?
A1: We agree that extending MetaPhysiCa to forecasting PDEs (instead of ODEs) such as 1D convection, diffusion, etc under OOD scenarios is an important future research direction; this would include an expanded set of basis functions including differential operators. Furthermore, given the nature of the PDEs, the boundary conditions, especially in OOD scenarios, become critically important. Ensuring that the model accurately represents the physics at the boundaries while generalizing well in OOD scenarios is a non-trivial task. This entails developing new techniques for imposing and learning boundary conditions within the MetaPhysiCa framework.
We discuss including PDEs as future work in the Conclusions section.
Q2: Have authors tested on stiff ODE settings where traditional physics-informed models fail?
A2: Traditional PIML methods, including SINDy, typically struggle with stiff ODEs because the dynamical system contains both slowly and rapidly varying states [2,3]. Unfortunately, MetaPhysiCa does not solve this PIML issue with stiff ODEs, but rather inherits this weakness. For instance, in our experiments, it is unable to identify the stiff ODE describing Robertson's chemical reaction [1], which has fast and slow dynamics. One solution proposed in the literature is to learn the ODE only for the slowly varying dynamics and to use a neural network for the fast dynamics [2]; however, this is not guaranteed to be OOD robust due to the neural network component (as discussed in Section 3.1). We believe solving this issue while preserving OOD robustness is important future work.
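For reference, Robertson's reaction system [1] in its standard benchmark form is sketched below (this is the well-known textbook formulation, not code from the paper). Its rate constants span about nine orders of magnitude, from 0.04 to 3·10⁷, which is what creates the fast/slow split that makes the system stiff. The three rates cancel in the sum, so y1 + y2 + y3 is conserved.

```python
def robertson_rhs(y1, y2, y3, k1=0.04, k2=3.0e7, k3=1.0e4):
    """Robertson's chemical kinetics (standard stiff ODE benchmark).

    dy1/dt = -k1*y1 + k3*y2*y3
    dy2/dt =  k1*y1 - k3*y2*y3 - k2*y2**2
    dy3/dt =  k2*y2**2
    """
    r1 = k1 * y1        # slow reaction: A -> B
    r2 = k2 * y2 * y2   # very fast reaction: B + B -> C + B
    r3 = k3 * y2 * y3   # fast reaction: B + C -> A + C
    return (-r1 + r3, r1 - r3 - r2, r2)
```

A sparse-regression method has to recover coefficients that differ by roughly nine orders of magnitude from the same data, which is one way to see why such systems are hard for SINDy-style identification.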
Q3: The assumption that the collection of m possible (appropriate) basis functions are always available seems too strong to the reviewer. Are there contexts where all sub-parts of the full applicable basis might not be present in the pre-trained network? Has this been tested?
A3: That is a good point that we now address in the revised submission. We have added clarifying statements in the updated submission. Specifically, we emphasize in Section 3.1 that appropriate basis functions must be present in the architecture for OOD extrapolation. Without the algorithmic alignment with appropriate basis functions, the in-domain approximations of the true model may not hold when the inputs are out-of-domain (as shown in Figure 2a).
In the updated submission we perform an interesting experiment (thank you for the suggestion!): we repeat the damped pendulum experiment without algorithmic alignment, i.e., without the sine/cosine basis functions required to learn the ground-truth dynamics (Appendix D.6).
In the absence of the term needed to learn the true dynamics of the damped pendulum system, MetaPhysiCa learns a truncated Taylor series approximation of this term via and terms:
.
Table 11 in Appendix D.6 shows that after test-time adaptation, this learnt model achieves to better OOD test NRMSE than the best baseline, but is worse than MetaPhysiCa with sinusoidal basis functions included.
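Why a truncated Taylor series can fit in-distribution yet fail OOD can be checked numerically. This minimal sketch assumes the missing sinusoidal term is sin θ and that the surrogate is the cubic truncation θ − θ³/6; the coefficients MetaPhysiCa actually learns in Appendix D.6 are not reproduced here.

```python
import math

def taylor_sin(theta):
    """Cubic truncation of the Maclaurin series of sin(theta)."""
    return theta - theta**3 / 6

def approx_error(theta):
    """Absolute error of the cubic surrogate at angle theta (radians)."""
    return abs(math.sin(theta) - taylor_sin(theta))

# Small angles (typical of a narrow training range) vs. large OOD angles.
for theta in (0.1, 0.5, 1.0, 2.0, 3.0):
    print(f"theta={theta:.1f}  |sin - cubic| = {approx_error(theta):.2e}")
```

The error grows roughly like θ⁵/120, so it is below 10⁻³ at θ = 0.5 but exceeds 1 at θ = 3, where the surrogate even has the wrong sign. Test-time adaptation of the coefficients can reduce, but not remove, this mismatch.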
References:
[1] H. H. Robertson, The solution of a set of reaction rate equations, in J. Walsh (Ed.), Numerical Analysis: An Introduction, pp. 178–182, Academic Press, London (1966).
[2] Abdullah, Fahim, Zhe Wu, and Panagiotis D. Christofides. "Data-based reduced-order modeling of nonlinear two-time-scale processes." Chemical Engineering Research and Design 166 (2021): 1-9.
[3] Abdullah, Fahim, and Panagiotis D. Christofides. "Data-based modeling and control of nonlinear process systems using sparse identification: An overview of recent results." Computers & Chemical Engineering (2023): 108247.
The paper focuses on the significant challenge of forecasting tasks within dynamical systems where the underlying ordinary differential equation (ODE) parameters may vary. The key contribution is the application of a meta-learning procedure for causal structure discovery, which aims to improve model performance even when faced with initial conditions and ODE parameters that lie outside the training distribution. The proposed method is tested across three different OOD tasks, and the results indicate a substantial performance improvement over existing state-of-the-art physics-informed machine learning and deep learning methods.
Strengths
-
The paper's approach to modeling dynamical systems by interpreting them as structural causal models within a meta-learning framework is both novel and significant. This perspective opens new avenues for understanding and forecasting complex systems.
-
The framework's construction is innovative, and the incorporation of the V-REx penalty to uncover causal structures appears effective.
Weaknesses
-
My main concern with this paper is the robustness and completeness of the empirical results. The authors developed two OOD scenarios and picked one for each dynamical system. The rationale for selecting the OOD scenario for each dynamical system is not provided and requires clarification. In addition, the authors use a limited set of in- and out-of-distribution "pairs": a single distribution for training and another single distribution for testing. This limits the assessment of the method's generalizability. Expanding the experimental design to include both OOD scenarios for all dynamical systems and a broader range of distribution pairs would likely strengthen the findings.
-
The methodological scope is limited by not accounting for interactions between basis functions, potentially restricting the model's capability to learn more intricate dynamics. Additionally, by confining the approach to ODE-style dynamics, it may not accurately reflect the complexity of real-world systems, such as real-world epidemics.
-
While the method can integrate prior knowledge of dynamical systems into the creation of basis functions, it appears that this is the extent to which such insights can be utilized. Expanding the method to incorporate prior knowledge more deeply could further improve model performance and applicability.
Questions
-
Is it possible for you to run extra experiments for multiple in- and out-distribution "pairs" and show that the results are consistent? I'd be happy to increase my score after seeing a more robust evaluation of the method.
-
A real epidemic does not follow the SIR model, but some characteristics of the SIR model are useful. How could your work be adapted to handle such real-world dynamical systems?
We thank the reviewer for their support and insightful feedback. We give detailed answers below.
Q1: My main concern with this paper is the robustness and completeness of the empirical results. Authors developed two OOD scenarios and picked one for each dynamical system.
A1: Thank you for the question. We wish to clarify that both OOD scenarios are tested on all three dynamical systems; due to space constraints, some of these results are presented in the Appendix. Table 1 (main text), Table 2 and Table 3 (Appendix, Pg 17) report both OOD scenarios for epidemic modeling, the damped pendulum and predator-prey, respectively. In both OOD scenarios and for all three dynamical systems, MetaPhysiCa obtains significantly more robust predictions than the baselines.
Q2: authors use limited in- and out-distribution "pairs": only one single distribution for training, and another single distribution for testing.
A2: Thanks for the interesting suggestion! In the revised submission, we have added results for multiple in- and out-of-distribution pairs in Appendix D.5. Table 9 shows that, under a variety of training ranges for the initial conditions and parameters, MetaPhysiCa is able to learn the true structure of the damped pendulum system. Given the learned structure, Table 10 reports the OOD NRMSE for different OOD ranges of initial conditions and parameters.
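To make the notion of an in-/out-of-distribution "pair" concrete, the sketch below simulates a damped pendulum and draws a training set and an OOD test set from non-overlapping ranges of the initial angle and the parameters. The dynamics form, the parameter names (`w0` for the natural frequency, `alpha` for damping), the RK4 integrator and all numeric ranges are assumptions for illustration; they are not the ranges used in Appendix D.5.

```python
import math
import random


def simulate_pendulum(theta0, dtheta0, w0, alpha, dt=0.01, steps=500):
    """Integrate a damped pendulum with fixed-step RK4 (assumed dynamics):
        d(theta)/dt  = dtheta
        d(dtheta)/dt = -w0**2 * sin(theta) - alpha * dtheta
    Returns a list of (theta, dtheta) states, including the initial one."""

    def f(state):
        theta, dtheta = state
        return (dtheta, -w0 ** 2 * math.sin(theta) - alpha * dtheta)

    def shift(state, k, h):
        # Componentwise state + h * k.
        return tuple(s + h * ki for s, ki in zip(state, k))

    state = (theta0, dtheta0)
    traj = [state]
    for _ in range(steps):
        k1 = f(state)
        k2 = f(shift(state, k1, 0.5 * dt))
        k3 = f(shift(state, k2, 0.5 * dt))
        k4 = f(shift(state, k3, dt))
        state = tuple(
            s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)
        )
        traj.append(state)
    return traj


def sample_split(n, theta_range, w0_range, alpha_range, seed):
    """Draw n trajectories whose initial angle and parameters are uniform
    over the given ranges; every pendulum starts at rest."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        theta0 = rng.uniform(*theta_range)
        w0 = rng.uniform(*w0_range)
        alpha = rng.uniform(*alpha_range)
        data.append(((theta0, w0, alpha), simulate_pendulum(theta0, 0.0, w0, alpha)))
    return data


# One hypothetical pair: the OOD ranges do not overlap the training ranges.
train = sample_split(8, theta_range=(0.1, 0.5), w0_range=(1.0, 2.0), alpha_range=(0.1, 0.2), seed=0)
test_ood = sample_split(8, theta_range=(0.8, 1.2), w0_range=(2.5, 3.0), alpha_range=(0.3, 0.4), seed=1)
```

Repeating this with several different non-overlapping range choices yields multiple pairs, which is the kind of variation the reviewer asked for.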
Q3: Scope is limited by not accounting for interactions between basis functions… Additionally, by confining the approach to ODE-style dynamics, it may not accurately reflect the complexity of real-world systems.
A3: We thank the reviewer for the comment. The reviewer may find Appendix D.4 interesting: there we extend MetaPhysiCa to a more expressive SCM by composing the given basis functions (e.g., to obtain composite terms built from them). With such an expressive SCM, MetaPhysiCa shows OOD performance gains on a complex ODE task (Appendix D.4), but it sometimes learns stiff ODEs during optimization due to the complexity of this 2-layer composition procedure. Better optimization techniques may help alleviate this problem.
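A minimal sketch of what composing basis functions can look like, assuming a small hypothetical univariate library; the actual library and 2-layer scheme of Appendix D.4 are not reproduced here. Each composed term f(g(x)) simply becomes one more candidate term.

```python
import math

# Hypothetical first-level library of univariate basis functions.
BASIS = {
    "x": lambda x: x,
    "x^2": lambda x: x * x,
    "sin(x)": math.sin,
    "cos(x)": math.cos,
}


def compose_library(basis):
    """Return every composition f(g(x)) of two basis functions, keyed by
    the (outer, inner) pair of names."""
    composed = {}
    for f_name, f in basis.items():
        for g_name, g in basis.items():
            # Default arguments bind the current f and g; a plain closure
            # would only see the last pair once the loops finish.
            composed[(f_name, g_name)] = lambda x, f=f, g=g: f(g(x))
    return composed


library = compose_library(BASIS)
sin_of_square = library[("sin(x)", "x^2")]  # the term sin(x^2)
print(len(library))  # 16 composed terms from 4 basis functions
```

The composed library grows quadratically with the number of base functions, so the search space expands quickly.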
We agree that extending MetaPhysiCa to forecasting PDEs (instead of ODEs) under OOD scenarios is an important future research direction; this would require an expanded set of basis functions that includes differential operators, as well as careful consideration of OOD boundary conditions. In our revised submission, we discuss both of these limitations in the Conclusions section.
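As an illustration of what "basis functions including differential operators" might mean in practice, the sketch below computes spatial derivatives of a 1D field with central finite differences and collects a few candidate PDE terms. Periodic boundaries are assumed purely to keep the example short; as the response notes, an actual extension would have to handle OOD boundary conditions carefully. All names here are hypothetical.

```python
import math


def spatial_derivatives(u, dx):
    """Second-order central differences on a periodic 1D grid.
    Python's negative indexing (u[i - 1] at i = 0) supplies the wrap-around."""
    n = len(u)
    u_x = [(u[(i + 1) % n] - u[i - 1]) / (2.0 * dx) for i in range(n)]
    u_xx = [(u[(i + 1) % n] - 2.0 * u[i] + u[i - 1]) / dx ** 2 for i in range(n)]
    return u_x, u_xx


def pde_library(u, dx):
    """Candidate right-hand-side terms for u_t = F(u, u_x, u_xx)."""
    u_x, u_xx = spatial_derivatives(u, dx)
    return {
        "u": list(u),
        "u_x": u_x,
        "u_xx": u_xx,  # diffusion-like term
        "u*u_x": [a * b for a, b in zip(u, u_x)],  # advection-like term
    }


# Example: u(x) = sin(x) on [0, 2*pi) with 200 grid points.
n = 200
dx = 2.0 * math.pi / n
xs = [i * dx for i in range(n)]
terms = pde_library([math.sin(x) for x in xs], dx)
```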
Q4: Expanding the method to incorporate prior knowledge more deeply could further improve model performance and applicability.
A4: We agree with the reviewer. In application-specific scenarios, prior physics knowledge could be incorporated further (e.g., known invariances, dimensionless variables, mesh structures for fluid simulations, etc.). The basic principles of MetaPhysiCa would still apply: orthogonal basis functions and identifiability of the equivalent causal model for OOD robustness. We envision MetaPhysiCa as a starting point for OOD robustness in various application-specific scenarios.
Q5: A real epidemic does not follow the SIR model but there are some characteristics of the SIR model that are useful. How can we adapt your work to handle such real-world dynamical system?
A5: That is an interesting suggestion. To verify MetaPhysiCa, we focused on models for which we were sure there would be a unique causal structure for the train and test data (i.e., the basis functions were nearly orthogonal under the support of the training data). For more complex models, we recommend adding more complex basis functions. An interesting line of future work could design task-specific basis functions and, potentially, associated regularization objectives to better learn the equivalent causal model when the basis functions are not orthogonal over the support of the training data.
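One way to probe whether candidate basis functions are "nearly orthogonal under the support of the training data" is to evaluate each of them on states drawn from the training range and inspect their pairwise cosine similarities (a normalized Gram matrix). This is a minimal sketch on a scalar state with a hypothetical three-function library; it is a diagnostic assumed for illustration, not a procedure described by the authors.

```python
import math

# Hypothetical candidate library on a scalar state x.
BASIS = {"x": lambda x: x, "sin(x)": math.sin, "cos(x)": math.cos}


def uniform_grid(lo, hi, n):
    """n evenly spaced sample states covering the training range [lo, hi]."""
    return [lo + (hi - lo) * k / (n - 1) for k in range(n)]


def cosine_gram(basis, samples):
    """Pairwise cosine similarity of the basis functions evaluated on the
    samples; off-diagonal entries near +/-1 mean two terms are nearly
    collinear on this support and therefore hard to tell apart."""
    cols = [[f(x) for x in samples] for f in basis.values()]
    norms = [math.sqrt(sum(v * v for v in col)) for col in cols]
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (ni * nj) for cj, nj in zip(cols, norms)]
        for ci, ni in zip(cols, norms)
    ]


# On a narrow training range sin(x) is almost x, so the two terms are nearly
# collinear; a wider range separates them more.
narrow = cosine_gram(BASIS, uniform_grid(-0.2, 0.2, 101))
wide = cosine_gram(BASIS, uniform_grid(-3.0, 3.0, 101))
print(round(narrow[0][1], 4), round(wide[0][1], 4))
```

For an SIR-style model, the same check could be run on terms such as S, I and S·I evaluated on states sampled from the training trajectories.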
Thank you for your response. It addressed all my concerns and I hope those insights can be added to the camera-ready version of the paper. I have increased my score.
Thank you for reading our response and increasing the score. We will make sure to add these insights in the revised submission.
This paper addresses the problem of out-of-distribution generalization in physics-informed machine learning for forecasting tasks governed by ODE-based dynamical systems with potentially different parameters. The contribution is a meta-learning procedure based on causal structure discovery. The experimental evaluation shows an improvement over SOTA methods.
The contribution has been evaluated as very interesting and novel, opening new perspectives for studying dynamical systems, the empirical evaluation is convincing and the paper is well written.
On the negative side, reviewers raised some issues about the significance and scope of the experiments and about the limitations of the approach.
During the rebuttal, the authors provided specific answers to the reviewers. Three reviewers agreed that their concerns had been resolved and increased their scores.
Overall the paper appears to be of good quality, I recommend acceptance.
Why not a higher score
The paper is interesting, but some limitations in the experiments have been raised.
Why not a lower score
This paper opens new perspectives for OOD generalization in physics-informed machine learning, an area that still requires some fundamental work.
Accept (spotlight)