FUSE: Fast Unified Simulation and Estimation for PDEs
This work presents a framework to unify forward and inverse problems in scientific computing by optimizing a joint objective derived from operator learning.
Abstract
Reviews and Discussion
The authors propose a new approach, FUSE, for the simultaneous learning of an emulator and the statistical estimation of underlying "discrete" parameters in a joint training step.
The approach splits the problem into two parts: (i) the forward problem, modelled through an FNO neural operator approach, effectively learns a map from a space of finite-dimensional parameters to an output function; (ii) the inverse step then seeks to learn a conditional distribution for the parameters based on measurements of the continuous input using a flow-matching approach. This yields two loss functions which are simultaneously optimised.
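To make this two-component structure concrete, below is a minimal, hypothetical sketch in PyTorch of the two decoupled objectives; the plain MLPs stand in for the FNO and FMPE architectures described in the paper, and all names and dimensions (forward_model, velocity_net, xi, u, s) are illustrative assumptions rather than the authors' code.

```python
# Hypothetical sketch: two decoupled losses, one for the forward surrogate and one for
# flow-matching posterior estimation of the parameters. Plain MLPs stand in for the
# FNO / FMPE architectures of the paper; shapes and names are assumptions.
import torch
import torch.nn as nn

xi_dim, u_dim, s_dim = 8, 64, 64   # parameters, discretized input function, discretized output function

forward_model = nn.Sequential(nn.Linear(xi_dim, 128), nn.GELU(), nn.Linear(128, s_dim))
velocity_net = nn.Sequential(nn.Linear(xi_dim + u_dim + 1, 128), nn.GELU(), nn.Linear(128, xi_dim))
opt = torch.optim.Adam(list(forward_model.parameters()) + list(velocity_net.parameters()), lr=1e-3)

def training_step(xi, u, s):
    """xi: (B, xi_dim) parameters, u: (B, u_dim) input functions, s: (B, s_dim) output functions."""
    # (i) forward loss: deterministic map from parameters to output functions
    loss_fwd = (forward_model(xi) - s).pow(2).mean()

    # (ii) inverse loss: flow matching, regressing the velocity of a linear (OT) conditional path
    t = torch.rand(xi.shape[0], 1)
    x0 = torch.randn_like(xi)                    # sample from the base distribution
    xt = (1 - t) * x0 + t * xi                   # point on the conditional path
    v_target = xi - x0                           # velocity of the linear path
    v_pred = velocity_net(torch.cat([xt, u, t], dim=1))
    loss_inv = (v_pred - v_target).pow(2).mean()

    # the two losses share no parameters, so they are effectively optimized in parallel
    opt.zero_grad(); (loss_fwd + loss_inv).backward(); opt.step()
    return loss_fwd.item(), loss_inv.item()
```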
The authors present two challenging problems as test cases.
Strengths
The authors have identified an important problem which is genuinely challenging and have proposed a potential solution to this problem. They have demonstrated the effectiveness of their approach on two challenging PDE-based problems. The components of the proposed approach are not novel, but their combination seems to be a new approach, demonstrating originality.
Weaknesses
I struggled at times to understand the actual methodology in the paper. The two strategies for forward and inverse are relatively clear, but their combination is far from clear to me, which leads me to question the broad applicability of this method.
My understanding of the methodology is that the authors require data in the form of triples consisting of a continuous input, a continuous output, and an intermediate set of parameters. The forward model learns the deterministic map between the parameters and the output, and the inverse model learns the probabilistic relation between the input and the parameters. If this is the case, then the need to propose a combined methodology seems unclear to me. The fact that the two optimisation problems in (4) appear to be decoupled suggests that this is the case.
The challenge that the authors describe is most relevant in settings where data on the parameters are not available, in which case we must rely on the inputs and outputs to identify the parameters while training the model, but this seems not to be the case? Otherwise, I am struggling to see the need for this combination of methods.
I also think the authors could be more comprehensive in their literature review, which has been very narrowly focused on a handful of methods. There are both more recent methods, and quite a history of older approaches which seek to address this problem in different contexts around science and engineering.
Questions
The limitations indicated above need to be clarified -- some clarity is needed in the introduction to spell out the applicability of this methodology. If my understanding is correct, then stronger motivation is needed for why a combined approach is even necessary.
"Discrete parameters" is not really the right terminology -- it seems to suggest your data is ordinal valued (1, 2, 3). I suspect you mean that your parameter is a finite-dimensional vector rather than a function?
Finally, the literature review seems very focused on neural network approaches, yet there are other approaches based on Gaussian processes etc. Could the authors provide some relevant background that is more encompassing?
Limitations
Limitations have been well addressed.
W1. Concerns about the novelty/applicability: We thank the reviewer for this important question. We will exemplify the applicability of the methodology with an example from PWP. The goal in this setting is to predict the output function given a finite-dimensional vector of parameters. In problems such as personalized medicine, this parameter vector is not known a priori, as it represents quantities such as vascular resistance that cannot be measured non-invasively, and it needs to be inferred from available measurements, e.g. PPG. This is a strongly ill-posed problem, so a probabilistic approach needs to be considered. This is modelled by approximating a posterior probability over the parameters given the measurements. Once the posterior probability is approximated, posterior samples are drawn and used to run a flow solver to estimate continuous quantities such as local vascular pressure, which cannot be measured non-invasively, together with uncertainty estimates. More generally, FUSE targets the problem of calibrating vector-valued parameters of PDEs, which is applicable to many areas in the sciences such as climate modeling, computational fluid dynamics, material science, wave scattering, etc.
W2. Reasons for unification under a neural operator framework: We understand the reviewer's confusion and thank them for asking for clarification. There are two strategies that can be considered for the inverse problem, which for the sake of brevity we are going to call "classical" and "ML-based". Given measurements, the "classical" strategies (e.g. MCMC) guess the value of the parameters and iteratively correct their guess. For complex problems such as the ones presented here, these approaches are computationally infeasible due to the high cost of the numerical solver needed to compute the likelihood, e.g. see lines 709-710 for ACB.
The "ML-based" approaches are built in two stages. They first consider a training stage where, given pairs of parameters and simulated measurements produced by a numerical solver, the model learns an approximate posterior distribution using variational inference or normalizing flows. During the evaluation stage a new measurement is given and the ML model provides samples from the posterior over the parameters. To get an ensemble of continuous quantities and uncertainty estimates, following both the "classical" and the "ML-based" inversion strategies, a numerical solver still needs to be run, which is very expensive as well.
For this reason, we propose a methodology that uses the same ML architecture both to infer the parameters given the measurements and to act as a solver surrogate that makes predictions of the output functions given the parameters, jointly during evaluation. During training we consider the whole triplet of inputs, parameters, and outputs as known, but during evaluation we consider only the input measurements as known and predict the parameters and outputs. Using a neural operator surrogate for the solver, the predicted output functions need not coincide with the measured inputs, so we can e.g. get predictions at only specified locations of interest, such as the aorta, without needing to run the simulation through the whole cardiovascular system.
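Continuing the hypothetical sketch given earlier in this discussion (same stand-in velocity_net and forward_model), the evaluation-time composition might look as follows; a simple Euler integrator is used purely for illustration and need not match the exact sampling procedure in the paper.

```python
# Hypothetical sketch of evaluation: draw posterior samples of the parameters given a
# single measured input function u, then push every sample through the forward surrogate.
@torch.no_grad()
def predict_ensemble(u, n_samples=100, n_steps=50):
    """u: (1, u_dim) measured input; returns parameter samples and an output ensemble."""
    u_rep = u.expand(n_samples, -1)
    x = torch.randn(n_samples, xi_dim)                      # start from the base distribution
    for k in range(n_steps):                                # Euler steps along the learned flow
        t = torch.full((n_samples, 1), k / n_steps)
        x = x + velocity_net(torch.cat([x, u_rep, t], dim=1)) / n_steps
    s_samples = forward_model(x)                            # push-forward of the parameter samples
    return x, s_samples.mean(dim=0), s_samples.std(dim=0)   # ensemble mean and spread
```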
The joint analysis of the inverse and forward problems with FUSE results in the following important benefits:
- Propagated parametric uncertainty: The joint formulation of both problems under a rigorous mathematical framework (Eqn. 1) allows assessing, in a very efficient manner, the influence that parametric uncertainty in the physical model has on the predicted continuous quantities.
- Unified validation: Since both the forward and inverse models are nonlinear, it is hard to track the error accumulation in the different components. The joint evaluation is a solution to assessing forward and inverse errors separately. Furthermore, this mathematical framework facilitates the comparison of different combinations of inverse/forward models and training objectives. A priori, the errors of an inverse and a forward model trained and validated separately may amplify nonlinearly at concatenation, increasing the chance that samples generated by the former are OOD for the latter.
- Function space simulation-based inference: Parametric PDEs as described above require communication between the infinite-dimensional function spaces of the inputs and outputs and the finite-dimensional space of the parameters. In order to operate on function spaces, we combine a finite-dimensional flow matching model with neural operators, which is (to the best of our knowledge) an entirely new concept. This approach can generalize to any flow-based or diffusion approach.
Answers to the questions:
Q1. We kindly refer to the answer provided above.
Q2. We thank the reviewer for pointing out this ambiguous nomenclature and will adopt the proposed change. The term "discrete" emerged in opposition to "continuous" spatially varying parameters, which is of course not the right wording.
Q3. We are happy to incorporate any literature the reviewer may be pointing to that broadens our literature review in the proposed direction. Following the reviewer's suggestion, we would like to add several references based on Gaussian processes for forward and inverse problems (arXiv:2204.02583), kriging (arXiv:2012.11857), and GP-based models that explore complex high-dimensional spaces via low-dimensional representations (arXiv:2101.00057), as well as others.
I thank the authors for their detailed comments.
My understanding from their responses is the following:
- Data is available in triples of the form (parameter, functional input, functional output), so there is no latent-variable inference, etc.
- The training of the inverse and forward losses is performed in parallel and there are no shared parameters / common components etc.
- The first time the models speak to each other is during inference / prediction, where the models are simply composed in the obvious way.
Based on this, I feel my original challenge about novelty still holds. My understanding remains that this consists of two separate methods:
(i) learning an FNO model for the parameter-to-output problem;
(ii) learning a conditional density via a flow-based model for the inverse problem, from the input to the parameter distribution. Each approach is generally well established and each is trained in isolation (in parallel). These approaches are then combined at inference time to provide probabilistic inference for the associated inverse problem.
The assumption that you would have a data set of the form (parameter, functional input, functional output) would only be reasonable when you're working with purely synthetic data from a PDE -- so this provides utility mostly as a probabilistic surrogate / emulator for the inverse problem. This is different from, say, InVAErt, which does not prescribe intermediate values at training time. It seems quite unfair to me that it would be used as a baseline for FUSE, as it has to solve a substantially harder problem.
In terms of the benefits:
- Propagated parametric uncertainty: Yes, this is true.
- Unified validation: This is good, but there's really no way of back-propagating that unified validation to adjust the individual models.
- Function space simulation-based inference: This is good and a clear feature of FNO layers within the models.
I will slightly bump up my score to reflect the helpfulness of the clarification, and because the general paper offers useful insight.
We would first like to thank the reviewer for their response and increasing their score. We take this opportunity to clarify the remaining concerns that the reviewer has regarding our work and request the reviewer's patience in reading our very detailed reply below.
1. In response to the reviewer's point on "InVAErt which does not prescribe intermediate values at training time. It seems quite unfair ...".
The reviewer appears to be mistaken in their assessment of the inVAErt methodology. Training separate objectives is not uncommon in methods that combine forward and inverse problems. The inVAErt framework specifically considers three different models that are trained separately and then combined at evaluation: a deterministic encoder, a normalizing flow, and a deterministic decoder with a VAE-based sampler. The inVAErt framework considers both the parameters and the measured functions (under its own notation in the original paper), so it also assumes that the triplet is known during training. However, the VAE formulation imposes its own limitations on their setup. In other words, inVAErt attempts to solve the same problems we address in our work, contrary to the reviewer's statement that it solves more difficult problems. First, the deterministic encoder is trained to solve the forward problem, predicting an input function from the parameters. inVAErt also employs a normalizing flow; however, this learns a distribution of the overall data, as opposed to a posterior distribution conditioned on the input data. Finally, a third training procedure is used to learn a probabilistic latent representation of the parameters and a deterministic map from inputs and latent samples to the parameters. The VAE component of this model simply learns a set of latent variables which account for a lack of bijectivity in ill-posed problems. This does not provide any unification of the uncertainties which relate the inverse and forward problems presented in their work, nor does it provide any advantage in training a model to solve these problems through a single loss function or unified training procedure. Instead, three model components are trained separately using a total of 5 training objectives.
In contrast, FUSE learns the uncertainty within the parameters themselves, clearly unifies the propagated uncertainty, and simplifies the objectives into two model components with only two loss functions.
2. To address the reviewer's comment that "the training of the inverse and forward losses is performed in parallel and there are no shared parameters / common components etc.", we would like to point out that this is a choice, not a restriction.
Although each model component is trained separately, the sampling procedure of the FMPE model is differentiable. Therefore, it is completely possible to backpropagate from the output functions to the input functions through the sampled parameters. This would unify the model components during training as well as at inference. Essentially, this would condition the uncertainties over the output functions on the input functions, as opposed to the parameters. However, this does not lead to significant performance gains in practice because the coupled uncertainty may be disentangled, precisely as shown in the mathematical foundations of our model described in Eq. 1. We would add this discussion in the camera-ready version (CRV), if accepted.
Furthermore, this approach may be used to fine-tune the individual model components when the parameters are not available. This is what we understand the reviewer to imply with their second point, on backpropagating the unified validation to adjust the individual models. Because of differentiable sampling, the inverse component of the model could be fine-tuned by backpropagating the loss from the continuous output functions through the entire model while freezing the forward-problem model. This constrains the parameter predictions to lie close to their true values. Likewise, the forward component may be fine-tuned by freezing the inverse-problem model and backpropagating the loss on the output functions.
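As an illustration of this fine-tuning idea, one could freeze the forward surrogate and backpropagate an output-function loss through the differentiable sampler; the snippet below continues the hypothetical stand-in modules from the earlier sketches and is not the authors' implementation.

```python
# Hypothetical sketch: fine-tune the inverse (flow-matching) component on data where the
# parameters are unavailable, by freezing the forward surrogate and backpropagating the
# output-function loss through the differentiable sampling of the parameters.
for p in forward_model.parameters():
    p.requires_grad_(False)

ft_opt = torch.optim.Adam(velocity_net.parameters(), lr=1e-4)

def finetune_step(u, s_true, n_steps=20):
    """u: (B, u_dim) inputs, s_true: (B, s_dim) measured outputs; parameters are unknown."""
    x = torch.randn(u.shape[0], xi_dim)
    for k in range(n_steps):                    # keep the graph: gradients flow through the sampler
        t = torch.full((u.shape[0], 1), k / n_steps)
        x = x + velocity_net(torch.cat([x, u, t], dim=1)) / n_steps
    loss = (forward_model(x) - s_true).pow(2).mean()   # supervision only on output functions
    ft_opt.zero_grad(); loss.backward(); ft_opt.step()
    return loss.item()
```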
We realize that these points are not sufficiently clear in our work and would like to include them in a CRV, if accepted.
3. To respond to the comment that "the models are simply composed in the obvious way," we would like to point out that FUSE is not trivially composed of two separate methods, but rather is one method from which two separate objectives result by considering Eq. (1). That means that FUSE is a rigorous mathematical framework under which we can formulate different loss functions for forward and inverse problems by considering different metrics over measures. The choice of the neural operator, e.g. FNO, is only a model choice, and any neural operator would work. Nonetheless, the choices for the forward and the inverse models are entangled, meaning that one affects the other. For example, if an FNO neural operator is considered for the inverse problem, the finite-dimensional parameters need to be lifted to the space of band-limited functions. Therefore, this is a general framework that allows for different choices for the forward and the inverse problem, but these choices cannot be arbitrary. So, even though the two models do not share parameters, they do share architectural choices. Similarly, flow matching is a way to evaluate a metric over measures that is difficult to evaluate otherwise, but it can be substituted by some other choice of measure matching or some other metric, as we show with conditional DDPM.
4. Additionally, the reviewer's comment that learning an FNO for a parameter-to-output problem is well established is incorrect. The FNO is constructed to learn maps between infinite-dimensional function spaces related by parameterized PDEs; finite-dimensional parameters are not included in this formulation or as input/output data. To date, there is a critical lack of research on neural operators which relate infinite- and finite-dimensional spaces. We believe we present one of the first approaches to accomplish this task via a novel lifting operator from a finite-dimensional space to a space of band-limited functions. Likewise, we present a transformation from infinite-dimensional spaces to a finite-dimensional space via a projection and a measure-matching objective. We believe this is a significant contribution beyond inference based on two finite-dimensional spaces, as accomplished by FMPE alone. In case our assertion of novelty in this context is incorrect, we kindly request the reviewer to provide references where similar approaches are well established within the community.
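As an illustration of what such a finite-to-infinite-dimensional lifting could look like, the toy snippet below interprets a (random, stand-in) linear combination of the parameter entries as low-frequency Fourier coefficients of a band-limited function on a 1D grid; the actual lifting operator in the paper is learned and may differ in detail.

```python
# Toy illustration (not the paper's operator): lift a finite-dimensional parameter vector
# to a band-limited function by treating linear combinations of its entries as
# low-frequency Fourier coefficients. The random matrix W stands in for a learned map.
import numpy as np

def lift_to_bandlimited(xi, n_grid=128, n_modes=8, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_modes, xi.shape[0]))        # stand-in for a learned linear map
    coeffs = W @ xi                                        # finite-dimensional vector -> Fourier modes
    x = np.linspace(0.0, 1.0, n_grid, endpoint=False)
    return sum(c * np.cos(2 * np.pi * k * x) for k, c in enumerate(coeffs))

f = lift_to_bandlimited(np.array([0.3, -1.2, 0.7]))        # a function sampled on 128 grid points
```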
5. The statement that a data set of the form (parameter, functional input, functional output) would only be reasonable for purely synthetic data is not necessarily true. It is very common in bioengineering, and more specifically in real datasets involving PWP, for both time-series data and vectors of parameters to be available for different patients; see MIMIC-III (https://doi.org/10.1038/sdata.2016.35) as an example, and data from biobanks (e.g. the UK Biobank). FUSE is applicable to these real datasets to perform the following tasks: inferring parameters from real data (see arXiv:2307.13918 for an example setup), precision medicine or solver calibration (see arXiv:2404.14187 for an example setup), and fingerprinting to discover parameter-disease correlations (see https://doi.org/10.1101/2024.04.19.590260 for an example setup).
Even when considering purely synthetic data, it is very important to study parametric PDEs, and more specifically the relations between different parameters and the system output. In tandem with numerical solvers, FUSE may be used for sensitivity analysis, calibration to specific conditions or to find parameters that lead to extreme events. All of these processes are very useful for improving our understanding of complex systems governed by PDEs. The study of such systems is critical for areas of science and engineering, including climate modeling, mechanics, fluid dynamics, and wave scattering amongst others. Even though neural operators often work with synthetic data, it is reductive to imply there is no connection to real world systems.
5. (continued) Moreover, we consider unifying forward and inverse problems, which is by definition a setup where the whole triplet is needed. This happens because both the forward and the inverse problem are supervised. This problem setting is the same as the cases considered in InVAErt and cVANO. It is an interesting direction to generalize this approach to perform manifold discovery as well, when the manifold coordinates are not available as labels for each data sample.
Essentially, the PDE parameters act as a coordinate system for the underlying manifold and are used to enforce the latent space to conform to this manifold. When the manifold coordinates are not known, we need to rely on the model to disentangle the coordinates of the latent space, which is a hard problem in representation learning and generative modeling in general. In practice, disentanglement means that moving along one coordinate axis of the latent vector has an individual effect on the output. For example, if we vary the latent dimension that represents MAP in the PWP example, this will result in increased pressure, whereas varying the dimension corresponding to Age will have a very small effect on the output; please see our sensitivity analysis.
One way to tackle this would be to consider a manifold flow formulation (arXiv:2003.13913), both to discover the latent manifold coordinates, solving the inverse problem, and to learn a chart from the latent space to the space of output functions, solving the forward problem. FUSE can be extended to manifold flows by considering specific architecture choices, such as invertible neural operators. In this case, knowing the parameters during training would be optional, but when they are not known the latent space will not be interpretable. We leave this direction to future research.
We would like to thank the reviewer for their diligence and patience with our detailed reply. We hope this has addressed the remaining concerns to the reviewer's satisfaction and kindly request the reviewer to update their assessment accordingly.
I thank the authors for clarifying a misunderstanding I had about inVAErt. Where FUSE sits in the existing methodology is far more clear to me. I hope the authors can provide similar clarifications about their method in the main paper. I have bumped up the score to acceptance.
We sincerely appreciate the reviewer's commitment to this discussion, and we will gladly incorporate these clarifications in the updated manuscript. We also thank the reviewer for further increasing their score to acceptance.
This paper proposes a framework to tackle simultaneously the forward problem (simulation of the system) and the inverse problem (estimation of key parameters of the system) for PDEs. Namely, the authors suppose the existence of an underlying parameter vector that characterizes the input functions of the PDE, and therefore formulate a probabilistic framework where the output solution is sampled according to the input function and the inferred parameters. The authors employ a two-loss objective for training, where the first loss trains the forward operator and the second approximates the true posterior distribution of the parameters. The method is tested on complex PDE systems that depend on multiple parameters, namely atmospheric cold bubble (ACB) and pulse wave propagation (PWP). It obtains better performance than existing methods and convincing uncertainty propagation results.
Strengths
- The paper proposes FUSE, a unifying approach for both inverse and forward problems for PDEs, with a relevant theoretical framework.
- The method obtains SOTA results on complex PDEs on the forward metrics, and is very competitive in the inverse task. It outperforms all the baselines in the OOD regime, showcasing the robustness of the method.
- The uncertainty propagation property is impressive, as shown in Figure 4 and 5.
- They propose the first application of flow matching for PDE problems, and integrate Fourier layers in the architecture.
Weaknesses
- The datasets used are quite complex as they include many parameters and several equations. It could be best to provide additional visualizations of the data and further describe the data format used by each block of the architecture to better illustrate the different components.
- The paper does not justify the use of flow matching compared to a different probabilistic approach. Would there be a difference between DDPM and flow matching in this case?
- There is a lack of details on flow matching, particularly on the parametrization of the flow and its inverse.
Questions
- Do you have access to the true parameters during training?
- Do you train the neural operators with parameters sampled from the flow matching?
- Is the training done sequentially or in parallel? According to the minimization objective, each loss only depends on a single set of parameters.
- How many steps do you take at inference?
- Did you try generative models other than flow matching?
- Did you try a fully probabilistic framework? For instance, assuming that the observations of the outputs were not deterministic given the parameters?
- What is the training and inference time of the method?
Limitations
There is a limitation section that discusses the assumptions of the method.
W1. We thank the reviewer for pointing out possible difficulties in understanding our model and data. An updated version of the model illustration is provided in the 1-page pdf, including the requested details. We are happy to incorporate any further specific suggestions, should we have missed a crucial part. In terms of data visualization, we point to the extensive figures in the appendix, in particular the sensitivity studies in Figs. A.9 and A.16, as well as A.11 showing the data-generating 2D fields for the ACB experiment.
W2. The reviewer raises an excellent point. Flow matching and diffusion are competitive methods, each with their own advantages and disadvantages. As explained in "Flow Matching for Generative Modeling" (ArXiv 2210.02747), flow matching may use a diffusion defined probability path, but an optimal-transport path may also be selected, as in FMPE. The result is that FMPE is able to train on less data, converge faster, and provide faster sampling at inference time. Given that PDE training data often come from expensive numerical simulations, data efficiency is a key component for machine learning approaches in scientific computing. To showcase this, we have performed some initial experiments using a conditional DDPM model, and present the results in the 1-page pdf. Keeping in mind that these are only preliminary results, we observe that the experimental findings support the arguments stated above.
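For reference, the optimal-transport conditional probability path and vector field from the cited flow-matching paper (arXiv:2210.02747) are given below, with the minimal noise scale denoted by sigma_min; these formulas are quoted from that reference, not from the FUSE paper, and a diffusion-defined path would instead use the VP-SDE mean/scale schedules.

```latex
% Optimal-transport conditional path and vector field from Lipman et al. (arXiv:2210.02747).
\begin{aligned}
p_t(x \mid x_1) &= \mathcal{N}\!\bigl(x;\; t\,x_1,\ \bigl(1-(1-\sigma_{\min})\,t\bigr)^2 I\bigr), \\
u_t(x \mid x_1) &= \frac{x_1 - (1-\sigma_{\min})\,x}{1 - (1-\sigma_{\min})\,t}.
\end{aligned}
```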
W3. We understand the reviewer's concern and agree that more details regarding FMPE models would benefit this paper. However, given the amount of space available, fully describing these details was not possible within the main text. To clarify, we have followed [23] in this regard (lines 135-144), and we would kindly refer interested readers to their work for a full derivation and explanation of flow matching for simulation-based inference. We would also include details on FMPE in the appendix of a CRV, if accepted.
Q1, Q2. The true parameters are used during training, with training samples being tuples of the form (input function, parameters, output function). The inverse part of the model is trained to learn a probability path to the parameters conditioned on the input function, while the forward map is learned from the parameters to the output function. Only at evaluation time, when merely the input function is available, are the parameters sampled from the flow matching fed into the forward neural operator to obtain the output function.
Q3. Training is done in parallel. As the reviewer correctly points out, each loss function only depends on a single set of parameters during training. At evaluation time, we combine these two models to unify the inverse and forward problems, enabling us to study the relationships within the system, such as uncertainty propagation and sensitivity analysis.
Q4. We assume the reviewer is asking about the number of artificial time steps known from diffusion models. Although FMPE is defined along a time-dependent probability path, this pseudo-time variable is not discretized as in a diffusion model, but computed in a single step. Should the reviewer be referring to the number of samples drawn by the FMPE, our results use 100 posterior samples at inference time.
Q5. After considering other generative approaches in the initial phase of our project, such as discrete normalizing flows and generative-adversarial models, the flow matching approach presented in this work proved the most competitive in our research. Please refer to W2 for a comparison to diffusion models.
Q6. In our work, we did also investigate a fully probabilistic approach. However, given that these data sets come from numerical simulators of PDEs whose continuous outputs are fully deterministic given the input parameters, fully probabilistic models in the way proposed by the reviewer offer no advantage in capturing the parametric uncertainty. As the reviewer suggests, this fully probabilistic approach would be interesting to investigate for instances where the observations are not deterministic given the parameters, such as with SDEs. We leave this interesting line of research to future work.
Q7. We provide the training and inference times in the 1-page pdf, and thank the reviewer for requesting this important information.
Thank you for the additional details. I am quite satisfied with the answers so I will increase my score to 7.
We sincerely thank the reviewer for appreciating our rebuttal and for increasing the score.
The authors study the problem of joint prediction of continuous fields and statistical estimation of parameters for physical systems governed by PDEs. Prior work had focused on operator learning followed by inference to determine the statistical parameters. Here, they propose to solve for both jointly in their method FUSE, which combines neural operators with Flow Matching Posterior Estimation (FMPE). The authors then test their method on important applied problems in haemodynamics and large-eddy simulations, showing advantages in both the inverse and surrogate problems.
Strengths
- Good application domains studied, including haemodynamics and atmospheric large-eddy simulation of bubbles.
- Interesting problem to study parametric PDEs
- Identified proper limitation of numerical methods when the PDE parameters are not known exactly and calibration techniques must be used to learn the parameter from data in inverse problems.
- Nice overview of Neural Operators
- Nice to also mention ROMs and recent work in deep learning.
- Good that the paper tackles both forward problems with UQ and inverse problems
- CRPS is a good uncertainty metric (a small sketch of the empirical ensemble CRPS follows this list)
- Also good that OOD case is tested
- Thorough and diverse evaluation
- Results show that the proposed FUSE method is performing strongly
- FUSE maintains the discretization invariance property of NOs
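Since CRPS is highlighted above as the uncertainty metric, here is the standard empirical (ensemble) estimator for reference, CRPS(F, y) = E|X - y| - 0.5 E|X - X'|; this is the textbook formula, not necessarily the exact implementation used in the paper.

```python
# Standard empirical CRPS for an ensemble forecast (textbook formula, for reference only).
import numpy as np

def crps_ensemble(samples, y):
    """CRPS(F, y) = E|X - y| - 0.5 * E|X - X'| estimated from ensemble members X ~ F."""
    samples = np.asarray(samples, dtype=float)
    term1 = np.abs(samples - y).mean()
    term2 = 0.5 * np.abs(samples[:, None] - samples[None, :]).mean()
    return term1 - term2

ens = np.random.default_rng(0).normal(loc=0.1, scale=1.0, size=200)
print(crps_ensemble(ens, 0.3))   # lower is better
```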
Weaknesses
- Add references to the classical numerical methods, such as LeVeque in the introduction
- Can add reference to the Multiwavelet Neural Operator, Gupta et al., NeurIPS 2021 in the introduction.
- Missing literature references: Neural Operator methods with UQ for the Forward Model Simulation including Bayesian Neural Operator, Magnani et al., and see the overview and detailed comparison in Mouli et al., "Using Uncertainty Quantification to Characterize and Improve Out-of-Domain Learning for PDEs", ICML 2024, which studies sensitivity analysis and UQ for Neural Operators. I think these works should be cited and the methods compared to as baselines. Also see Ma et al., "Calibrated Uncertainty Quantification for Operator Learning via Conformal Prediction", https://arxiv.org/abs/2402.01960, 2024 for conformal prediction techniques.
- The main limitation is lack of benchmarking against the above FNO + UQ baselines
- More recent state-of-the-art baselines, e.g., diffusion models (DDPM, DDIM) and variants could also be compared.
Questions
1. In parametric PDEs, are the authors discussing the PDE parameters or the BC/IC as mentioned in the introduction, and why are they discrete? The authors should clarify this.
2. How are the two loss functions weighted and counter-balanced? Is there a hyper-parameter to tune?
Limitations
Yes
W1. The reviewer rightly points out that a reference and comparison to classical numerical methods is indispensable when justifying the use of ML-based methods. Since the application field "parametric PDEs" for our method is rather broad, we are happy to adopt the suggestion to reference a standard text book on the numerical solution of PDEs, as provided by LeVeque in 1992.
W2. We thank the reviewer for highlighting the multiwavelet neural operator (MWT), which is an interesting addition to our extensive review of neural operator approaches in lines 43/44. Since it shows superior performance to FNO in the given reference, it would be an interesting future study to replace the FNO by MWT within FUSE, in particular for test cases with larger scale separation.
W3, W4. In order to avoid any confusion on terminology up front, we would like to point out that when Magnani et al. 2022 refer to operator learning for "parametric PDEs", the parameters taken as input to the learning problem are functions in space. Our setting, in contrast, assumes the parameters to be vector-valued constants, acting as an intermediate step between the input and output functions. We assume that when suggesting to use uncertainty quantification (UQ) for neural operators (NO), the reviewer refers to the forward-model part only.
We thank the reviewer for the suggestion to add the references on UQ for NOs. However, we respectfully disagree on the applicability of UQ for NOs to the problems considered in our paper. Let us explain why: UQ for NOs, in the sense indicated by the given references, quantifies the approximation error of the NO model, instead of the parametric uncertainty in the physical model targeted by FUSE. Thus, benchmarking against the proposed methods, which is listed as the main limitation of our paper, is not possible. As Mouli et al. 2024 point out in their review in the appendix, it is common practice, e.g. in weather forecasting, to perturb a physical model's inputs and parameters to quantify the uncertainty related to the physical state and the model itself, respectively. We deem it important to distinguish these types of uncertainty from the additional approximation error incurred when fitting an ML model. They are also easily distinguished by the probabilistic mappings they model: while an ensemble approach maps an ensemble of parameters (as provided by the FMPE) onto an ensemble of predictions (pushforward), the UQ approaches for NOs equip a prediction on a single input with an uncertainty range. In order to clearly interpret the uncertainties given by FUSE as parametric physical-model uncertainty, and given the good performance of the model in our ID and OOD validation, we consider the PDE parameters as the main source of uncertainty.
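A tiny numerical illustration of this distinction, with stand-in functions rather than the paper's models: the spread of the push-forward ensemble below comes purely from the parameter ensemble, whereas a UQ-for-NO method would attach an uncertainty band to the prediction for one fixed input, quantifying the surrogate's own approximation error.

```python
# Stand-in illustration: parametric (push-forward) uncertainty vs. surrogate-error UQ.
import numpy as np

rng = np.random.default_rng(0)
surrogate = lambda xi: np.sin(xi.sum())          # stand-in deterministic forward surrogate
xi_samples = rng.normal(size=(100, 4))           # posterior ensemble of parameters (e.g., from FMPE)

s_samples = np.array([surrogate(xi) for xi in xi_samples])
print(s_samples.mean(), s_samples.std())         # spread reflects parametric uncertainty only
# A UQ-for-NO approach would instead return (mean, std) for a single fixed input, where the
# std reflects the neural operator's approximation error, not the parameter uncertainty.
```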
W5. The reviewer raises an excellent point. Flow matching and diffusion are competitive methods each with their own advantages and disadvantages. As explained in "Flow Matching for Generative Modeling" (ArXiv 2210.02747), flow matching may use a diffusion defined probability path, but an optimal-transport path may also be selected, as in FMPE. The result is that FMPE is able to train on less data, converge faster, as well as provide faster sampling at inference time. Given that PDE training data often come from expensive numerical simulations, data efficiency is a key component for machine learning approaches in scientific computing. To showcase this, we have run experiments using a conditional DDPM model and present the results in the 1-page pdf. Even though this is initial work, we observe that the preliminary experimental findings support the above arguments. We will include this discussion and results in future versions of the paper.
Q1. We thank the reviewer for the question, which is crucial to the applicability of our method. Indeed, the "parameters" can stem from model properties (some of the ACB parameters, Table 7), from parameterizations of the initial conditions (the remaining ACB parameters) or of the boundary conditions (PWP: pulse wave from the heart, lines 644-646 and the original reference), or from any other model component that is parameterized.
Concerning the nomenclature "discrete", we acknowledge that it is misleading, and we will change it to "finite-dimensional" parameters, as opposed to the infinite-dimensional input and output functions. We mean to point out that the inferred parameters are vector-valued constants that are not space- or time-dependent.
Q2. The two loss functions are fully decoupled, see line 153 of the experiments section. So, there is no need for hyperparameter tuning.
Dear reviewer, please read the above rebuttal and evaluate whether it answers your concerns. If your evaluation remains unchanged, please at least acknowledge that you have read the author's response.
The authors propose "FUSE", a combination of multiple neural operator models, which are trained to jointly solve PDE forward problems and perform parameter inference for a given parametric PDE. The main idea is to start from a range of PDE solutions obtained from various parameter values, and then train neural operators that can interpolate both in parameter space and in solution space. The approach is evaluated on two systems of parametric PDEs and compared to a range of similar neural operator and sampling approaches.
Strengths
The results of the computational experiments seem impressive, especially given that only a relatively small number of samples is used (O(1000)). The main idea of jointly learning the parameter inference and the forward PDE solution is valid and interesting. The comparison to a range of other approaches shows that the proposed approach works well.
Weaknesses
- Figure 1 is not very clear. The caption does not explain what is shown in the figure; the parameters, models, inputs and outputs are not mentioned.
- Many references are only referred to by their pre-print. The proper journal / conference should be given instead.
- Inference times are not provided, and no comparison to classical solvers is performed (even though they are being used to generate the training data). Also see question 4.
- The paper (main and appendix) does not contain a lot of details regarding implementation and applicability (Questions 5 and 6).
Minor:
- l88: "observatbles" -> observables
- l110: G is now not an operator as defined in l87, but a function on a finite-dimensional space (with a range in a function space). The redefinition is confusing.
Questions
- l64: why is the framework called "FUSE"? Is it an acronym for something?
- l93: should it not be the full measure instead of its approximation on the right argument of d as well? Otherwise, how can we minimize the distance to the full measure, not just its approximation?
- l142: what does it mean that "... is a map that lifts the channels of the dimensions of the input function"? What does "lifting channels of dimensions" mean?
- The authors state that (l31) "iterative and thus expensive calibration procedures" are required for classical solvers as a drawback, but then 180 GPU hours (l604) are required for training the proposed model. The authors do not comment on this, so my question: why is it beneficial to train, or "calibrate", the neural operator model for such a long time, as opposed to using the "iterative and expensive" calibration procedures for classical solvers?
- There is very little detail on the "learnable maps" mentioned throughout the explanation of the approach. Which maps are used? Are all of these MLPs, how deep, which nonlinearities? These details do not need to be mentioned in the main paper, but even the appendix does not contain them.
- There is very little detail on the applicability of the approach. For which PDEs is it useful, when will it not work, how much data is needed for which types of problems, etc.? These questions do not need to be answered completely, but there is not even a simple example where this is being discussed or studied.
Limitations
The authors discuss limitations in terms of future work, which is fine. Negative societal impact is not discussed; the authors argue that only PDE problems are being solved, without immediate concerns -- which is questionable, especially because the authors use pulse propagation in the human cardiovascular system as an example where the approach could be used (and where humans may be harmed if it does not work in practice in a hospital).
W1. We thank the reviewer for pointing this out. We include an updated Figure in the one page document, which we believe is much more explanatory.
W2. Even though referencing pre-prints is a common practice in ML, the reviewer is right in that, if available, the respective journal or conference of publication should be provided. We will update all references accordingly.
W3. The inference times for FUSE are on the order of milliseconds per sample; details are provided in the one-page document. As for the classical solvers, the computational cost for the ACB case is about half an hour on eight cores per sample (lines 709-710). As cited in line 192, the PWP data set was not created by us, but by Ref. [35]. Those authors do not report the computational time, but simulating one sample of the system using the openBF open-source Julia code [openbf-hub, A. Melis, 2018; Melis 2018, EthosID uk.bl.ethos.731549] takes about 80 seconds on one core. As the reviewer points out, this comparison is crucial to justify the use of ML models, and it will be emphasized in future versions of the paper.
W4. See Q5 and Q6.
Minor weaknesses. We thank the reviewer for pointing out the typo, which we will of course correct. Regarding the notation, however, we would like to emphasize that l110 defines the parametric forward model, while line 87 defines the unified operator. We chose distinct symbols precisely to distinguish the unified operator from the parametric forward model.
Q1. FUSE stands for Fast Unified Simulation and Estimation, as shown in the title. To facilitate the reading, we will repeat the full name where pointed out by the reviewer.
Q2. We thank the reviewer for asking for clarification on this notation, which we are convinced is correct. The mathematical problem formulation merely presents the distance that corresponds to the problem on a theoretical level, involving the true, unknown quantities. Following common practice in ML research, this distance is subsequently upper bounded by loss functions involving the model approximations, which are in turn approximated by finite sampling when it comes to implementation, as suggested by the reviewer.
Q3. The lifting function is a commonly used term in ML research, also heavily used in the operator learning literature [3, 9, 10, 31], describing an affine (linear) transform that increases the channel dimension of a function, i.e. a pointwise map that sends each function value to a higher-dimensional vector. In practice, a lifting function is implemented by a one-layer fully connected neural network.
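For concreteness, a minimal example of such a channel lifting is shown below; the shapes and dimensions are illustrative assumptions, with the one-layer fully connected network as stated above.

```python
# Minimal example of a channel "lifting": a pointwise affine map that raises the channel
# dimension of a discretized function (shapes are illustrative assumptions).
import torch
import torch.nn as nn

lift = nn.Linear(in_features=1, out_features=32)   # one-layer fully connected network
u = torch.randn(16, 100, 1)                        # batch of 16 functions on 100 grid points, 1 channel
v = lift(u)                                        # same grid, lifted to 32 channels: (16, 100, 32)
```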
Q4. As discussed in lines 601-604, 180 GPU hours is the total computational time for a 64-run hyperparameter sweep across all models and all experiments contained in the paper, while training a single FUSE model only takes about 1 GPU hour. Accounting for the wall-clock times of one simulation of the ACB and PWP problems (W3), it is infeasible to repeatedly calibrate the numerical solvers using traditional methods for varying function inputs (e.g., MCMC requires about 1,000 to 10,000 simulations for each input function). However, once trained, the low inference times of the FUSE model (W3) and its operator properties make this calibration possible.
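As a rough back-of-the-envelope comparison using only the figures quoted above and in W3: one ACB simulation takes about 0.5 wall-clock hours on 8 cores, so an MCMC calibration requiring 1,000 to 10,000 simulations amounts to roughly 500 to 5,000 wall-clock hours (about 4,000 to 40,000 core-hours) per observed input function, whereas FUSE requires about 1 GPU hour of training (plus the one-time cost of generating the training simulations, which is amortized over all subsequent calibrations) and milliseconds per sample at inference.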
Q5. "Learnable map" denotes a parametric function (for instance a neural network or an affine map) whose parameters are learnt during training. We followed standard practice in ML and did not define this terminology, believing that an interested reader can check the code for the exact architecture of our maps. However, we agree with the reviewer that defining these maps is more reader-friendly and will do so in a camera-ready version, if accepted.
Q6. As explained in the introduction, the method proposed in this paper is applicable to any parametric PDE. It will not work for PDEs that are not a priori parametric, e.g. for a stochastic PDE with a Brownian motion forcing, as explained in the limitations section. An estimate of the sample complexity for different PDEs would require a statistical analysis on a range of problems, which is outside of the scope of this paper. However, we provide an analysis of error scaling with the number of samples for the ACB case in the 1-page pdf. We observe that even if we consider 4096 training samples, half of what we used originally, FUSE would still be more accurate than any of the baselines reported in the main text.
L. The reviewer raises a point about the reliability of the FUSE model if it were applied in clinical practice for the Pulse Wave Propagation case. We believe it is very commendable of the reviewer to think about potential negative societal impact of this method.
Negative Societal Impact, as per NeurIPS author guidelines, is considered for cases where the presented methodology can be immediately used without a prior evaluation protocol, e.g., deep fakes. We present a general purpose framework and do not think that FUSE can be used as a clinical decision making tool as is. For this to happen, a very detailed protocol issued by a public safety organization needs to be considered that contains multiple steps of in-silico, in-vitro and in-vivo validation.
The authors have adequately addressed my concerns. I will raise my score to 5.
Regarding Q3: it seems my comment was unclear; I am aware of the concept of "lifting" / changing the input dimension with a linear map, but the sentence in the manuscript was confusing. The explanation by the authors in the rebuttal is great, I recommend to modify the sentence in the manuscript.
We sincerely thank the reviewer for adjusting their score, and we will gladly incorporate the revisions into an updated manuscript.
The paper introduces a novel framework called FUSE that unifies surrogate modeling and parameter identification for parametric partial differential equations (PDEs). Traditionally, field prediction and statistical estimation of discrete parameters have been separately handled by using operator learning surrogates and simulation-based inference, respectively. FUSE proposes a combined approach that aims to enhance the accuracy and robustness of both tasks.
FUSE is designed to jointly predict continuous fields and infer the distributions of discrete parameters, leveraging a unified pre-training step to amortize the computational costs associated with both the inverse and surrogate models. It employs a probabilistic extension of operator learning through Flow Matching Posterior Estimation (FMPE) to effectively handle parametric uncertainties. This unified approach facilitates in-depth model calibration and supports sensitivity analysis and uncertainty quantification (UQ) at a significantly reduced computational cost.
The authors demonstrate the proposed methodology on two applications: pulse wave propagation (PWP) in the human arterial network and an atmospheric cold bubble (ACB) simulation. They emphasize the advantages of FUSE in both the inverse (parameter estimation) and forward (continuous field prediction) tasks.
Strengths
- Problem Definition: The authors have mathematically defined the statistical estimation of discrete parameters, a significant problem encountered in practical applications.
- Integrated Approach: FUSE provides a holistic and efficient solution for addressing parametric uncertainties in PDEs by integrating surrogate modeling and parameter inference tasks within a single framework.
- Robustness: The use of FMPE in a probabilistic setting enhances the robustness of model predictions under varying conditions and uncertainties.
Weaknesses
- Lack of Novelty: This paper appears to be an application of FMPE (Flow Matching Posterior Estimation) to the inverse problem, lacking significant novel developments.
- Uncommon Problems: The problems addressed in this paper have not been widely discussed within the machine learning community (though it might be due to my lack of familiarity).
Questions
- In the supplementary section (Fig. A.1, A.2), it seems that the identified parameters can cover a very wide range. However, these parameters do not significantly impact the PDE solution (Fig. A.3, A.4). Is it correct to interpret these variables as having low sensitivity?
- In the worst case (Fig. A.3, A.4), even when considering uncertainty (i.e., various possible parameter values), the obtained solution seems quite different from the true one. Should this be interpreted as indicating the presence of unknown physics?
These questions might arise from my lack of familiarity with the specific applications discussed, as mentioned in the weaknesses above.
- Can the methodology in this paper be applied if the boundary conditions are set as parameters?
Limitations
The authors have effectively explained the limitations of their approach in the paper.
We would first like to thank the reviewer for acknowledging the novel framework, the rigorous problem definition, and the integrated approach we take constructing a robust framework.
W1. Regarding the concern with respect to the lack of novelty in the FMPE application, we would like to highlight the key differences and contributions of this work.
- This is not merely an application of FMPE: The metric in the first term of Eq. (1) is bounded by another metric, as shown in line 127. Because this metric is hard to compute in practice, our implementation matches the true and approximate posteriors with FMPE; however, it may be substituted by other choices such as NPE or diffusion models (see 1-page pdf). In other words, FMPE is not essential to the methodology and any suitable model for matching posterior distributions would suffice.
- Inverse Problem: Additionally, this paper introduces function-space simulation-based inference, to infer parameters from functions as opposed to vectors. Therefore, we provide an inverse model that is grid-resolution invariant, bridging a gap between neural operator architectures and simulation-based inference. This is a critical concept for many practical applications and, to the best of our knowledge, has not yet appeared in the literature.
- Forward Problem: We present a formulation of the supervised operator learning problem for a map between a finite-dimensional parameter space and an infinite-dimensional space of output functions. This is accomplished by a novel lifting operator, transforming finite-dimensional vectors into the space of band-limited functions, as presented in lines 115-116.
- Unification: We propose a novel framework in the operator learning literature that unifies forward model simulation and statistical parameter estimation under the same rigorous mathematical framework in Eq. (1). The joint formulation of both problems using the triangle inequality allows us to assess what we refer to as the propagated uncertainty. This approach enables a more comprehensive understanding of many scientific problems by introducing explainable model uncertainties, which tie uncertainties in complex, infinite-dimensional spaces to uncertainties in simple parameters which characterize the system.
W2. Uncommon problems: We recognize that the two problems presented may not be widely known to the broader ML community; however, ML-based model calibration is of vital interest in many applied communities. In particular, cardiovascular models have been subject to both classical and ML-based inversion due to their relevance [arXiv:2307.13918]. Climate science is also a major topic within the field of operator learning (arXiv:2208.05419, 2111.13587, 2306.03838). Due to their high complexity, there is less work on the calibration of small-scale atmospheric LES models, even though their parametric uncertainties are well known [doi 10.1007/s10546-020-00556-3]. However, the combination of numerical methods and ML presents an opportunity to greatly improve the state of the art [paper ref. 48].
Q. To answer the questions, let us first clarify that for the Pulse Wave Propagation problem, we present three scenarios with different levels of input information. In cases 2 and 3, most of the measurement locations on the human body and quantities of interest are not available, and hence are masked (lines 196-200). When the reviewer refers to the "worst case", we assume they mean case 3, with the least information provided.
Q1-2. First, there are indeed some variables with low sensitivity to the parameters (e.g. Fig. A.9: pressure is insensitive to age), which generally have wider posterior distributions. The wider distributions in Fig. A.1-2.b-c are a result of larger uncertainties as the number of input channels is reduced from 39 (case 1) to 3 (case 2) and 1 (case 3). It would be reasonable to interpret parameters with large uncertainty as a result of low sensitivity to the limited set of inputs in these cases. The uncertainty arising from the ill-posed nature of these cases, as opposed to unknown physics, results in pressure predictions which are quite different from the true values. Essentially, the parameters with a strong sensitivity to pressure may not be determined because they have a weak sensitivity to PPG at the fingertip (case 3). The result is that a more "average" pressure is predicted, with large uncertainty ranges.
Q3. Yes, in fact, the Pulse Wave Propagation experiment has boundary conditions encoded as parameters. The left boundary condition (i.e. the pulse wave from the heart) is parameterized by Heart Rate, Stroke Volume, Pulse Transit Time, Residual Filling Volume, and Left Ventricular Ejection Time. We kindly refer the reviewer to the original reference [35] for more detailed information.
I have reviewed the authors' responses and the discussion among the other reviewers and the authors. As a result, all my concerns have been addressed, and I have raised my score. Thank you to the authors for their responses.
We express our sincere thanks to the reviewer for appreciating our rebuttal and for increasing their score.
Dear reviewer, please read the above rebuttal and evaluate whether it answers your concerns. If your evaluation remains unchanged, please at least acknowledge that you have read the author's response.
At the outset, we would like to thank all reviewers for their valuable time and feedback. We believe this discussion will lead to meaningful improvements in the quality and presentation of our work, improving its accessibility to practitioners of scientific machine learning. Furthermore, we would like to express our gratitude to the reviewers' recognition of our novel contributions to this field and the rigorous analysis of our approach.
As per the guidelines, we are uploading a 1-page pdf with the following contents:
- To address the comments of several reviewers, we have conducted initial experiments using a conditional denoising diffusion probabilistic model (DDPM) within the FUSE framework, in place of FMPE. The results for both approaches, comparing inference time and accuracy for different numbers of posterior samples, are presented in the attached 1-page pdf.
- We provide details on the scaling and training times of the FUSE model employed in our experiments with respect to the number of training samples.
- We provide an updated main figure which we believe will add clarity to the types of data, the general model framework, and the training approach.
In the following, we will address the detailed comments of each reviewer in their respective rebuttal fields. We hope to address the concerns of all the reviewers with our detailed rebuttal and request them to kindly update their assessment.
Yours sincerely
Authors of FUSE: Fast Unified Simulation and Estimation for PDEs
Dear authors and reviewers,
The authors-reviewers discussion period has now started.
@Reviewers: Please read the authors' response, ask any further questions you may have or at least acknowledge that you have read the response. Consider updating your review and your score when appropriate. Please try to limit borderline cases (scores 4 or 5) to a minimum. Ponder whether the community would benefit from the paper being published, in which case you should lean towards accepting it. If you believe the paper is not ready in its current form or won't be ready after the minor revisions proposed by the authors, then lean towards rejection.
@Authors: Please keep your answers as clear and concise as possible.
The AC
Dear all,
Thank you for your efforts so far in reviewing the paper and in answering reviewers' questions.
The evaluation is currently borderline, with an average score of 5.2 and a spread from 4 to 7.
The authors-reviewers discussion will close today on August 13. Please take this last opportunity to discuss the paper and the reviews and to reach a consensus, or at least to gather all the necessary information to make a fair and informed decision (in favor of acceptance or rejection).
Thank you for your attention and cooperation. The AC
The reviewers unanimously recommend acceptance (6-5-6-7-6). The author-reviewer discussion has been constructive and has led to a number of clarifications and potential improvements to the paper. The authors are asked to implement the clarifications discussed with the reviewers in the final version of the paper.