5.7

/10

Rejected6 位审稿人

最低3最高8标准差1.5

3.7

置信度

正确性2.5

贡献度2.5

表达2.7

ICLR 2025

PDE-constrained Learning with Multi-time-stepping for Accelerated Fluid Simulation

Qi Wang,Yuan Mi,Wang Haoyun,Yang Liu,Yi Zhang,Ruizhi Chengze,Hongsheng Liu,Ji-Rong Wen,Hao Sun

OpenReview PDF

提交: 2024-09-27更新: 2025-02-05

TL;DR

We propose a PDE-constrained network with multiscale time stepping (MultiPDENet), which fuses the scheme of numerical methods and machine learning, for accelerated simulation of fluid flows.

摘要

Solving partial differential equations (PDEs) by numerical methods meet computational cost challenge for getting the accurate solution since fine grids and small time steps are required. Machine learning can accelerate this process, but struggle with weak generalizability, interpretability, and data dependency, as well as suffer in long-term prediction. To this end, we propose a PDE-embedded network with multiscale time stepping (MultiPDENet), which fuses the scheme of numerical methods and machine learning, for accelerated simulation of fluid flows. In particular, we design a convolutional filter based on the structure of finite difference stencils with a small number of parameters to optimize, which estimates the equivalent form of spatial derivative on a coarse grid to minimize the equation's residual. A physics block with a 4th-order Runge-Kutta integrator at the fine time scale is established that embeds the structure of PDEs to guide the prediction. To alleviate the curse of temporal error accumulation in long-term prediction, we introduce a multiscale time integration approach, where a neural network is used to correct the prediction error at a coarse time scale. Experiments across various PDE systems, including the Navier-Stokes equations, demonstrate that MultiPDENet can accurately predict long-term spatiotemporal dynamics, even given small and incomplete training data, e.g., spatiotemporally down-sampled datasets. MultiPDENet achieves the state-of-the-art performance compared with other baseline models, with over 5$\times$ speedup compared to classical numerical methods.

关键词

physics-informed learningmultiscale time steppingspatiotemporal dynamics prediction

评审与讨论

审稿意见

评分: 6置信度: 32024-11-01

The authors introduce MultiPDENet. The goal is to accelerate the solution of time-dependent PDEs using deep learning. MultiPDENet has a learnable multiscale timestepper, inspired by Runge-Kutta updates. It also has some interesting inductive biases, such as symmetry constraints on layers that mimic the operation of finite difference stencils. The results show impressive gains over standard neural PDE surrogates from the literature in terms of long term rollout accuracy and stability.

优点

Novelty

There are several novel aspects of MultiPDENet

The schematic diagram of how everything fits together is unique
While learning a timestepper is not unique, I have not since it in the context of PDEs, only ODEs.
While placing constraints on learnable derivative filters is not unique e.g. Learning data driven discretizations for partial differential equations Bar Sinai et al. (2018), the symmetry constraints of the MultiPDENet is unique

Quality

The results are really good! This is probably because of the sensible use of inductive bias in the learned solver structure. I think this is the highlight of the paper. In particular:

The high correlation time plots in Figure 3 b/e/h are highly motivating
The results in Table 2 show some massive performance improvements over baselines
The results in Section 4.2 show large improvements over baselines even outside the ICs of the training range, demonstrating robustness to distribution shift
I found the ablation in Table 3 very informative. This will also be important for anyone wishing to build on this work

Significance

For the field of ML4PDEs, any model that can push the envelope of performance forward is significant, and in this sense, MultiPDENet is significant.

缺点

Clarity

I found it hard to follow the mathematical description of the model, since many of the terms are not laid out in the main text. I had to refer to the diagrams to understand the general gist of how everything fits together. Please see my questions below for further requests for clarification. I found I was able to understand, at a gut-feeling level, why main design choices were made, but because of the ambiguity in the exact model design I am not sure whether my reading is correct. The clarity is the part of the paper I would implore the authors to focus on most.

Quality

One subtle, but very important weakness of the paper is the very small number of training data used. The number of snapshots per equation are as follows:

KdV: 3000 snapshots
Burgers: 700 snapshots
Gray-Scott: 540 snapshots
Navier-Stokes: 24000 snapshots For deep learning applications this is extremely small and could explain, to a large degree, just why the performance of MultiPDENet is just so good when compared with the other models, which exhibit much less inductive bias. As is well known, inductive bias helps in the low data regime, such as this. Indeed this could become the angle that this paper takes, but unless I missed it, there is no data ablation in the paper, which would have been a useful comparison

问题

Line 55: In what way are PiNNs data-drive? I think the claim that PiNNs are “data-driven”, at least in their vanilla form, is contentious. If you consider collocation points as data then this is somewhat true, but there are no ground-truth targets fed to PiNNs, which would render them data-free.
Line 103: One the citations reads “Transformers Cao (2021)”. I think this should read “Cao (2021)”.
Equation 1: Please explain all the mathematical terms in full.
- What is $\tilde{**x**}$ ?
- Why is $\tau$ a dummy variable in the integral, but also found outside the integral?
- Is $\Delta$ a difference operator or a Laplacian?
- What is $\mu$
- What do $i$ and $j$ correspond to? If timestep indices, how are they different from $k$ ?
- Referring back to Figure 1.b), please explain what $k+\delta$ means? Is $\delta$ a fractional time index? What values does $\delta$ take?
Equation 3:
- What is $\lambda$ ?
- What values can $m$ take on? Does this imply that Equation 3 is an $m$ -step block? How does it correspond to Figure 1a)? Is Figure 1a) the Runge-Kutta style discretization of the integral?
- In Equation 1 $\mathcal{P}$ accepts argument $**u**$ , but in Equation 3 it also includes $\Delta **u**$ and $\lambda$ .
- What is the significance of square brackets?
Figure 2: Please expand on the symmetry constraints for the central difference stencils. We see even symmetry in the vertical direction and odd/even symmetry in the horizontal direction. If you are working in 2 dimensions, what does the mixed derivative operator $\partial_{xy}$ looks like?
Line 233: e -> We
Line 269: What is $T$ ? Is it time? How is $T$ different from $t$ ?
Figure 3: b)e)h).
- Between each plot the colors of the curves per model change, please fix this.
- In e) you write "Ours"; whereas, in b) and h) you write "MultiPDENet". Please choose one.

评论- Reply to Reviewer yrNc (Part 1)

2024-11-23

Thanks for your constructive comments and suggestions! We have carefully addressed them, and the following responses have been incorporated into the revised paper.

In addition, we have thoroughly proofread our paper, corrected typos and grammar mistakes, and re-organized the contents to improve the clarity of the paper. The majority of the paper has been re-written (marked in red color). We believe the presentation has been substaintially improved. Please refer to the updated .pdf file.

Weaknesses

W1. Insufficient clarity.

Reply: Thanks for your feedback. Following your comment, we have thoroughly proofread our paper, corrected typos and grammar mistakes, and re-organized the contents to improve the clarity of the paper. The majority of the paper has been re-written (marked in red color). We believe the presentation has been substaintially improved. Please refer to the updated .pdf file.

W2. One of the shortcomings of is the limited amount of data used.

Reply: Thanks for your comment. In fact, a central highlight of our work is to reduce heavy relying to large datasets in common deep learning models. With an integration of known physical knowledges, MultiPDENet shows a significant generalization capabilities on both limited and rich data.

An additional results of data scaling test (data size vs. prediction error) are presentd in Table A to further support our claim of low data dependency. These tests were performed on the GS example using 3, 6, and 15 trajectories as training data. The results demonstrate that our model achieves equal or superior performance compared to other methods (e.g., UNet and DeepONet) along with the increase of training datasets number. Since this finding simply matches the scalability trend of typical neural network models (including ours) over the training data volume, we decide not to include such results in our paper.

Table A. RMSE Performance of Different Models with Varying Numbers of Training Trajectories.

Model	3 trajectories	6 trajectories	15 trajectories
UNet	NaN	NaN	NaN
DeepONet	0.4113	0.4071	0.3965
MultiPDENet	0.0573	0.0426	0.0307

Questions

Q1. Is PINN a pure data-driven approach?

Reply: Great comment! We completely agree with your underlying arguement that PINN should be a physics-guided data-driven approach. We have removed such a statement in the revised paper to clarify and correct any inappropriate statement.

Q2. Explanation of mathematical symbols.

Reply: Great suggestion! We have reformulated all the math equations and symbols in the revised paper to improve clarity (please see Section 3 on Pages 3-5 for details). A list of key math symbols are explained as follows:

$\tilde{\mathbf{x}}$ denotes the coordinates of the coarse grid;
$\nabla$ represents the Nabla operator;
$\boldsymbol{\nabla}^2$ represents the Laplacian operator;
$dt$ represents the time step used in DNS, $\delta t$ the time interval of a micro step, and $\Delta t$ the time interval of a macro step.
$T$ denotes the duration of the trajectory.
$\boldsymbol{\lambda}$ represents parameters in the PDE, such as the Reynolds number in the NS dataset. When these parameters are unknown, they can be set as trainable.
The solution update for each micro-scale time step can be describe as $\bar{\mathbf{u}} _{m+1}^k = \bar{\mathbf{u}} _m^k + \delta\bar{\mathbf{u}} _m^k$ , where

$\delta\bar{\mathbf{u}} _m^k = \int _{t_k+(m-1)\delta t}^{t _k+m\delta t}\left[ \mathcal{B}\left(\tilde{\mathbf{u}}(\tau), \boldsymbol{\nabla}\tilde{\mathbf{u}}(\tau), \cdots; \boldsymbol{\lambda}\right) + \mathbf{f}(\tau) \right] d\tau + \text{M} _{\text{i}}\text{NN}\left(\bar{\mathbf{u}}{ _m^k, \Xi _m^k(p,\hat{\boldsymbol{\nabla}} \hat{\mathbf{u}},\hat{\boldsymbol{\nabla}}^2 \hat{\mathbf{u}}, \hat{\boldsymbol{\nabla}} p,\mathbf{f},Re)}\right)$

Definitions of other math symbols can be found in the revised paper. Hope the above revisions clarify your concern.

Q3. Question About Figure 1.a

Reply: Thanks for this comment. In Figure 1(a) on Page 3, the time marching of the Physics block relies on micro-steps, rolling out for $M$ steps (set as 4 in this paper). Within the Physics block, the temporal integration in the PDE block is performed using the RK4 method (see Figure 1(b) on Page 3).

评论- Request more details on Table A

2024-11-26

Thank you authors for your detailed and attentive rebuttal. I would like to request a little more information on Table A.

How large is a trajectory? Training on 3, 6 or even 15 trajectories could be very little or in fact a lot of data, depending on the length of the trajectories. I do accept that you were interested in building models with little training data. In this case, it would therefore also be important to know how far those limits can be pushed.

评论- Details on Table A

2024-11-26

Thanks for your addtional comment.

Question: More details about Table A.

Reply: Good question! We would like to clarify that a different initial conditions (IC) may lead to completely different trajectory patterns of a PDE system after nonlinear evolution of the spatiotemporal dynamics, resulting in an OOD problem. Hence, generalizing the model to new ICs remains a challenging problem, especially in limited data regimes (e.g., a few trajectories).

Taking the GS as an example in Table A, with 3, 6, and 15 training trajectories, respectively, the datasets are prepared as follows: as shown in Table 1 on Pages 6 in the revised paper, each trajectory is temporally downsampled by a factor of 20 (4000 → 200) and spatially downsampled by a factor of 16 ( $128^2$ → $32^2$ ). The test set consists of 10 trajectories with new ICs, each with a length of 140. After training, our model performs inference for 140 steps on these new ICs, with the results recorded in Table A.

Given that the number of training trajectories lies in the range of a few to a dozen, this still falls in small data regimes. This can be further illustrated by the poor performance of UNet and DeepONet, where both models require large training datasets (e.g., typically $>100$ trajectories to achieve reasonable performance for UNet [1]).

Concluding remark: Once again, we sincerely appreciate your constructive comments, which greatly help improve the quality of our paper. Please feel free to let us know if you have any further questions. Looking forward to your feedback!

Reference:

[1] Lippe et al. PDE-refiner: Achieving accurate long rollouts with neural PDE solvers. NeurIPS, 2024, 36.

评论- If generalizing to new ICs is not the goal, what is?

2024-11-27

Thanks for expanding on my question.

As the title of this comment reads, "If generalizing to new ICs is not the goal, what is?". I'm somewhat confused as to what the purpose of a PDE solver is, if not to generalize to a new scenario, whether that be new equation coefficients, new boundary conditions, new initial conditions, etc. And I do agree with your statement "generalizing the model to new ICs remains a challenging problem"; it is somewhat of an Achilles' Heel for ML based PDE surrogates in my opinion, and thus it is a really important challenge for our community.

If you are aiming at demonstrating that your inductive biases really do aid learning, I think one of the areas they should ideally tackle is this tough challenge of generalization over ICs. Classical solvers manage to do this in a data-free fashion because they are 100% inductive bias. Perhaps a way forward would have been to demonstrate a scaling curve, such as a neural power law. This is shown in that Lie Point Symmetry Data Augmentation paper. It's too late for this sort of analysis at this stage, but I think this is an important point to make.

Anyway, thank you once again for going into such detail in the rebuttal. I am impressed by the hard work you as authors have put in.

评论- Reply to Reviewer yrNc (Part 2)

2024-11-23

Q4. How to calculate the mixed derivative $\frac{\partial^2}{\partial x \partial y}$ ?

Reply: In this paper, the considered problem does not involve any mixed partial derivatives. However, if mixed derivatives, such as $\frac{\partial^2}{\partial x \partial y}$ , need to be calculated, the procedure is to first compute the derivative wrt $x$ using the proposed trainable symmetric convolution filter, and then take the derivative wrt $y$ with the transposition of that filter.

Q5. The colors of the subgraphs b, e, h in Figure 3 are not uniform.

Reply: Thank you for your careful reading. The issue has been fiexed (please see Figure 3 on Page 6 in the revised paper).

Concluding remark: Once again, we sincerely appreciate your constructive comments. Please feel free to let us know if you have any further questions. Looking forward to your feedback!

评论- Request your feedback before the end of the discussion period

2024-11-25

Dear Reviewer yrNc:

As the author-reviewer discussion period will end soon, we would appreciate it if you could review our responses at your earliest convenience. If there are any further questions or comments, we will do our best to address them before the discussion period ends.

Thank you very much for your time and efforts. Looking forward to your response!

Sincerely,

The Authors

评论- Reply to additional comment

2024-11-28

Thanks for your addtional comment.

Question: Scaling law behavior with respect to data size.

Reply: Thoughtful comment! We fully agree with the reviewer’s observation that developing such models is critically important. This aligns precisely with our objective: to establish a general model that can generalize across initial conditions (ICs), Reynolds numbers, external forces, and larger domains, even under limited training data regimes. Regarding the reviewer’s suggestion to include a scaling curve, we think this is an excellent point. In response, we have worked tirelessly to test our model (in comparison with the baseline PeRCNN) on different numbers of trajectory datasets. And we also produce the Figure S3 in Appendix on page 20 with disscussion. We indeed observed this phenomenon: Our testing results demonstrate that the model exhibits a scaling law behavior, with the RMSE gradually decreasing as the amount of training data increases. Our model adheres to this scaling law as well. Moreover, even in scenarios with limited data regimes (e.g., only five trajectories), the model achieves a low level of error. (see Appendix Section C.3 on page 20). The corresponding results have also been listed in the following Table A.

Table A. Comparison of PeRCNN and MultiPDENet across various training set sizes on the Burgers equation.

Data volume	$2^9$	$2^{10}$	$2^{11}$	$2^{12}$	$2^{13}$
PeRCNN (Testing RMSE)	0.0967± 0.0395	0.0903± 0.0263	0.0701± 0.0152	0.0509±0.0165	0.0383± 0.0189
MultiPDENet (Testing RMSE)	0.0057± 0.0018	0.0049±0.0015	0.0046±0.0019	0.0034± 0.0016	0.0029±0.0007

Concluding Remark: We sincerely appreciate your thoughtful comments and suggestions, which have been helpful in improving our paper. Your consideration of updating the score is highly appreciated! We look forward to your reply. Thank you!

2024-11-29

Thanks for the additional information. I acknowledge I have read it.

审稿意见

评分: 8置信度: 42024-11-03

The paper introduces MultiPDENet, a novel neural network architecture designed to accelerate simulations of fluid dynamics by integrating machine learning with partial differential equation (PDE) constraints. The architecture uses a multi-scale time-stepping approach and a learnable convolutional filter that embeds physical equations directly into the network, enhancing both accuracy and computational efficiency. The numerical findings of MultiPDENet demonstrate the accuracy and generalizability across different initial conditions, Reynolds numbers, and external forces. The inclusion of correction blocks enhances its long-term stability and robustness. An ablation study validates the effectiveness of the model’s components, highlighting its potential as a powerful tool for fluid simulations and complex PDE-constrained problems. Overall, this paper is well-written and could be a significant algorithmic contribution. I am willing to recommend accept. However, there are still some comments and questions need to be clarified or further discussion, see the weakness and questions.

优点

Originality: the paper introduces an innovative approach by integrating a physics-informed neural network (PINN) with a multi-scale time-stepping mechanism and a learnable filter, making it distinct from traditional numerical methods and existing data-driven models. This combination of deep learning and PDE constraints is a novel contribution that enhances the model's interpretability and applicability in fluid dynamics simulations.
Quality: this work is thorough, with comprehensive experiments demonstrating MultiPDENet’s performance across various PDE systems. The detailed ablation study provides strong evidence that each component of the model contributes meaningfully to its success. The results are rigorously compared against baseline methods, showcasing significant improvements in accuracy and computational efficiency.
Clarity: the paper is well-structured, providing clear explanations of the methodology, architecture, and experimental setup. The figures and tables are informative and support the textual content effectively. The presentation of the multi-scale time-stepping framework and the incorporation of physics into the network are clear.
Significance: the findings of the paper have potential implications in the field of computational fluid dynamics and PDE-constrained modeling.

缺点

The model is limited to regular grids due to the use of convolutional operations. This restricts its applicability in cases involving irregular or adaptive grids.
The current study only addresses 1D and 2D problems, which limits its applicability to more complex 3D simulations. Examples of 3D fluid simulations are more convincing.
Although it is mentioned that the method works with sparse data, the paper could elaborate more on the data requirements for practical cases with very noisy or incomplete data.
The inclusion of multi-scale architectures, learnable filters, and correction blocks adds complexity to the model. This could impact the ease of implementation and adoption by other researchers or practitioners, especially when compared to more straightforward neural network models. The author should address this and provide open-source code for reproducibility.

问题

On page 5 line 233, "e divide the neural network..." what is statement?
On page 9 line 478, "This demonstrates our model..." this claim is questionable. The current test has Re = 4000. To demonstrate the capability of this approach for high Re turbulence, a 3D flow with Re of at least 10^4 would be necessary to substantiate this claim.
The training data collected are from uniform timestep (\Delta t). However, many fluid simulations have non-uniform timestep. How does the model react to these scenarios?
The paper mentions future work involving graph neural networks for irregular grids. How feasible is this adaptation, and what challenges might arise in integrating GNNs with the current architecture?
Are there foreseeable issues when extending the current approach to higher dimensions?
How sensitive is the model’s performance to hyperparameters, such as the choice of kernel size for the learnable filter or the learning rate for training?

评论- Reply to Reviewer PGra (Part 2)

2024-11-23

Questions

Q1. Typos in the paper.

Reply: Thanks for your careful reading. we have thoroughly proofread our paper, corrected typos and grammar mistakes, and re-organized the contents to improve the clarity of the paper (marked in red color).

Q2. Claim of the model's effectiveness under high Reynolds numbers.

Reply: We agree with you on your comment "testing the model to deal with much higher Reynolds numbers, e.g., $Re=1\times 10^4$ , is essential to substantiate the claim the model's effectiveness under Reynolds numbers". To be precise, we have changed the stement to be "These results demonstrate the effectiveness of MultiPDENet for higher Reynolds number, e.g., $Re=4000$ , within domain $[0, 2\pi]^2$ " (Section 4.4, Page 8) in the revised paper.

Q3. Dealing with non-uniform timestep data.

Reply: Insightful comment! As you correctly pointed out, our model primarily focuses on datasets with uniform time steps. In most surrogate modeling tasks, uniform timestep data is commonly used. Like many other models in the literature, our model currently does not support non-uniform timestep data.

Q4. Dealing with irrugular mesh grids.

Reply: Please see our reply to W1.

Q5. Extension to dal with high-dimensional problems.

Reply: Please see our reply to W2.

Q6. Sensitivity of the model’s performance to hyperparameters?

reply: In fact, our model architecture is well-designed and not overly sensitive to the choice of hyperparameters. This is primarily evident in the fact that when applying the model to different equations, we don’t make significant changes to the hyperparameters but only perform fine-tuning, such as adjusting the number of FNO modes. However, there are certain guidelines, such as setting the modes to be $N/2 - 2$ , where $N$ denotes the number of grid nodes along a certain direction (e.g., vertical or horizontal). The learning rate for training decays over time, with an initial value of 5e-3 being acceptable. More details of the hyperparameter settings are provided in Appendix Section A.1 (Page 15), Appendix Table S5 (Page 22), and Appendix Table S6 (Page 23) in the revised paper.

Additionally, the size of the trainable filter has a more substantial impact on model performance. Based on our parametric tests (Table B below), a 5x5 symmetric filter yields the best results.

Table B. Performance metrics for filters of varying sizes.

Filter	RMSE	MAE	MNAD	HCT(s)
$3\times3$	0.1310	0.0952	0.0788	0.2102
$5\times5$	0.0057	0.0037	0.0031	1.4
$7\times7$	NaN	NaN	NaN	0.1895

Concluding remark: Once again, we sincerely appreciate your constructive comments. Please feel free to let us know if you have any further questions!

评论- Reply to Reviewer PGra (Part 1)

2024-11-23

Thanks for your constructive comments and suggestions! We have carefully addressed them, and the following responses have been incorporated into the revised paper.

In addition, we have thoroughly proofread our paper, corrected typos and grammar mistakes, and re-organized the contents to improve the clarity of the paper (marked in red color). We believe the presentation has been substantially improved. Please refer to the updated .pdf file.

Weaknesses

W1. Limitation of the model to rugular mesh grids.

Reply: As discussed in the Conclusion section (Section 5 on Page 10) in the revised paper: "The model currently only handles regular grids, due to the limitation of convolution operation used in the model. In the future, we aim to address this issue by incorporating graph neural networks to manage irregular grids."

Our preliminary study shows that graph neural networks coupled with finite volume have the potential to replace the Physics block in our model. However, the development of a learnable graph filter to estimate on-the-fly derivative quantities on irregular mesh graphs for the PDE block becomes a key challenge to address. Thank you for this thoughtful comment, which sets forward our future work!

W2. Extension to 3D (high-dimensional) applications.

Reply: Great comment. Extending our model architecture to 3D space is feasible, but involves two primary challenges:

Symmetry Filter Design: This requires careful design of the 3D symmetric convolution filter in the PDE block to ensure its satisfaction with the Order of Sum Rules [1]. Achieving this demands both theoretical derivations/proof and experimental evaluations to assess its effectiveness.
Computational Efficiency: During rollout training for 3D cases, large memory requirements may arise, necessitating parallelization of the model. However, implementing the model's parallelization across multi-GPUs is challenging and requires some engineering work.

Addressing these issues will be a key focus of our future work. Thank you for this excellent comment!

Reference:

[1] Long et al. PDE-Net: Learning PDEs from data. ICML, 3208–3216, 2018.

W3. Adaptability to incomplete and noisy data.

Reply: Insightful comment! Our model's performance under conditions of reduced data and added noise for the NS dataset is illustrated in Table A below (also see Appendix Table S3 on Page 19 in the revised paper), demonstrating its robustness.

Table A. Performance metrics under different noise levels during training.

Training	RMSE	MAE	MNAD	HCT (s)
- 20% data	0.1935	0.0958	0.0113	8.1392
+ 0.1% noise	0.2083	0.1014	0.0123	8.0431
Normal	0.1379	0.0648	0.0077	8.3566

W4. Reproducibility of the model.

Reply: The key details of our model, including its architecture, hyperparameters for the MiNN and MaNN blocks, the training process, as well as the baseline model implementation, have been provided in the revised paper (in particular, the Appendix Sections, Pages 15-23). Additionally, we will make the source codes publicly available after the peer review for convenient reproducibility.

审稿意见

评分: 3置信度: 42024-11-03

The authors propose a hybrid method for solving PDEs, combining elements of classical numerical methods with learnable parameters. Elements of classical numerical methods include an RK4 time integrator, finite-difference derivatives, and a Poisson solver for pressure. The learned correction block uses a Fourier Neural operator. The problem is tested on 1D and 2D PDEs with periodic boundary conditions. The authors claim that using a learned correction at a coarse time scale improves the prediction error and provides "53x speedup" compared to classical numerical methods.

优点

This paper proposes an architecture to solve PDEs. Overall I cannot rate this paper strongly and am unable to offer a view of strengths because I find much of the presentation unclear.

缺点

The approach here is similar to other hybrid classical+learned approaches. The presentation of the work is rather unclear in my view. It is not explained in detail what the micro (MiNN) and macro (MaNN) blocks do exactly and what the difference is between them. The authors refer to "sparse grids" on occasion, and it is unclear what is meant because the authors also use the separate term "coarse grid".

The claim that this method offers a 53x speedup compared to traditional numerical methods is not backed up by evidence, nor details of how it is measured. Overall, what is here is not sufficiently convincing or compelling.

问题

Please provide details of how the speedup compared to traditional numerical methods is obtained. Are the same resolutions being used for both? Is the same type of device being used, or does this a comparison where one simulation is on CPU and another on GPU? Is accuracy being checked at all for this comparison? Please see https://www.nature.com/articles/s42256-024-00897-5.

评论- Reply to Reviewer NnLJ (Part 1)

2024-11-22

Thanks for your constructive comments and suggestions! We have carefully addressed them, and the following responses have been incorporated into the revised paper.

Clarity

Comment: Unclear presentation.

Reply: Thanks for your feedback. Following your comment, we have thoroughly proofread our paper, corrected typos and grammar mistakes, and re-organized the contents to improve the clarity of the paper. The majority of the paper has been re-written (marked in red color). We believe the presentation has been substantially improved. Please refer to the updated .pdf file.

Weaknesses

W1. The approach here is similar to other hybrid classical+learned approaches.

Reply: Thanks for the comment. Our approach differs fundamentally from the simple combination of classical methods and learning-based techniques, for the following reasons:

The first key distinction is that the numerical solver (aka, the PDEBlock) in our architecture is also learnable. In particular, we employ a neural network to correct the coarse solution on the fly (namely, solution on coarse mesh grids) and design a symmetric convolution kernel to estimate equivalent derivatives, which are able to significantly reduce the PDE residual on coarse grids.
Secondly, we design a two-scale stepping process to mitigate error accumulation for long-term rollout prediction. The synergy between the micro- (a trainable physics block) and macro-scale (a neural network) turns out effective in long-term prediction of spatiotemporal dynamics.
Thirdly, our model has demosntrated excellent generalizability over initial conditions, Reynolds numbers, force terms, and domain sizes. It is capable of dealing with small training data (e.g., only 5 trajectories for the NS example). With extensive experiments, the results show that our model outperforms a number of popular baseline models including neural methods (FNO, DeepONet, U-Net), physics-informed learning methods (PhyFNO, PeRCNN), and hybrid classical-learning approaches (LI, TSM) with notible margins.

Hope this helps clarify your concern.

W2. Clarification of the Terms "Sparse Grid" and "Coarse Grid".

Reply: Excellent comment! The term sparse grid (which is meant to be coarse mesh grid) used in our original paper is indeed inaccurate. We have made coarse grid consistently used throughout the revised paper.

For instance, when simulating Kolmogorov flow at $Re$ = 1000, a high-resolution fine grid (e.g., 2048 $\times$ 2048) is necessary for finite-volume-based DNS to accurately capture the flow physics. In contrast, a coarse grid (e.g., 64 $\times$ 64) is obtained by downsampling the high-resolution grid, which leads to much fewer data points and increased grid spacing. As the grid size increases, the accuracy of derivative approximations, which rely on neighboring points, diminishes. Therefore, classical numerical methods, e.g., those based on finite difference approximations, struggle to produce accurate results on coarse grids. To address this issue, we propose a PDE-embedded network with multiscale time stepping (namely, MultiPDENet), which systematically integrates trainable numerical solver and neural network schemes. It requires only a small number of snapshot data to achieve satisfactory generalizabilities.

评论- Reply to Reviewer NnLJ (Part 2)

2024-11-22

W3. Clarification on MiNN blocks and MaNN blocks.

Reply: Great remark! Since the mesh grid is too coarse, even with micro time stepping, the prediction becomes unstable due to the accumulation of errors, which becomes more pronounced as the number of rollout time steps increases. The MiNN block is used to correct the coarse solution of the PDE block during the micro-scale time stepping. This block is designed as a module with relatively small parameters, such as FNO or DenseCNN [1].

It is worth noting that the Physics block, containing the MiNN block, can operate independently for prediction (see Model C in Table 3, Page 10). However, since the bottleneck of the Physics block remains significant in long-term predictions, we introduce the concept of multi-scale time stepping and employ the Physics block for predictions on a micro time scale. The residuals accumulated from multiple micro steps are combined to form the residual of the macro step.

This same approach is applied to the macro step, where a UNet [2], known for its excellent performance in large time intervals for one-step prediction, is used to correct the errors of the Physics block, thereby enabling the model to maximize its performance in long-term predictions. Tables S5 and S6 (Appendix Section C.2, Page 19) in our revised paper present detailed descriptions of the models used in the MiNN and MaNN blocks, respectively, along with their corresponding hyperparameters.

Although our model involves a micro time scale learnable module, the training data is only supplied at the macro time scale. That is said, the MiNN block prediction is unsupervised, where the predicted features are latent and used as intermediate increments to update the total solution at the macro time scale.

The above contents have been reflected in Section 3.2.4 (Page 5) and Appendix Section C.2 (Page 19) in the revised paper.

References:

[1] Liu, et al. Multi-resolution partial differential equations preserved learning framework for spatiotemporal dynamics. Communications Physics, 7(1):31, 2024.

[2] Jayesh and Brandstetter. Towards multi-spatiotemporal-scale generalized PDE modeling. Transactions on Machine Learning Research, 2023.

评论- We eagerly await your response

2024-11-27

Dear Reviewer NnLJ,

As the rebuttal period has undergone over two weeks, your silence made us anxious. We would like to follow up on our rebuttal to ensure that all your concerns have been adequately addressed. If there are any further questions or points that need discussion, we will be happy to address them. We eagerly await your response.

Thank you very much for your time and consideration.

Best regards,

The Authors

评论- Reply to Reviewer NnLJ (Part 3)

2024-11-22

Questions

Q1. A Quantitative Analysis of Inference Efficiency and Precision in MultiPDENet and Traditional Methods

Reply: Excellent comment! All inference experiments were conducted on a single Nvidia A100 80G GPU. The DNS method was implemented based on the open-source framework JAX-CFD [3], with GPU acceleration. The specific settings for DNS are provided in Table S2 (Page 18). The duration of the inference or simulated trajectory selected is $T$ = 8.4s. The additional comparative results, shown in Table A below, show the inference time, RMSE, and HCT of both our model and DNS under identical computational conditions. It's important to note that our time step is obtained through a 128x downsampling, resulting in a larger time interval and smaller number of time steps compared to the DNS. We have added a detailed description of the relevant content in Appendix Section F.2 (Page 22) in the revised paper.

Table A. Performance Evaluation on the NS Dataset: Inference Time and Accuracy

Case	Method	Timestep	Infer cost(s)	RMSE	HCT(s)
$Re=1000$	DNS 2048	38400	260	0	8.4
$Re=1000$	DNS 1024	19200	135	0.1267	8.4
$Re=1000$	DNS 512	9600	52	0.2674	6.5
$Re=1000$	DNS 64	1200	18	0.7818	2.7
$Re=1000$	MultiPDENet	300	26	0.1379	8.4
$Re=4000$	DNS 4096	76800	1400	0	8.4
$Re=4000$	DNS 1024	19200	136	0.1463	6.8
$Re=4000$	DNS 512	9600	52	0.2860	5.8
$Re=4000$	DNS 128	2400	31	0.8658	3.6
$Re=4000$	MultiPDENet	300	26	0.1685	6.4
$x ∈ [0,4\pi]^2$	DNS 4096	76800	1280	0	8.4
$x ∈ [0,4\pi]^2$	DNS 1024	19200	129	0.4638	6.6
$x ∈ [0,4\pi]^2$	DNS 512	9600	50	0.6166	5.2
$x ∈ [0,4\pi]^2$	DNS 128	2400	30	0.8835	2.3
$x ∈ [0,4\pi]^2$	MultiPDENet	300	26	0.4577	6.7

In addition, we further show the computational efficiency of trained MultiPDENet (vs. DNS 1024) for accelerated flow prediction. For a certain given accuracy (e.g., correlation $\geq 0.8$ ), MultiPDENet achieves $\geq 5\times$ speedup compared with GPU-accelerated DNS (JAX-CFD) as shown in Table B below, where all the tests were performed on a single Nvidia A100 80G GPU. The computational time was recorded for prediction of fluid flows which evolve for a certain duration (aka, the smaller high correlation time between MultiPDENet and DNS 1024, when the prediction correlation reaches 0.8). This has been discussed in the Conclusion section (Page 10) in the revised paper.

Nevertheless, we also would like to clarify that the DNS code used above was implemented in JAX, while our model was programmed in PyTorch. These two platforms have distinct efficiencies even for the same model. Typically, the codes under JAX environment runs much faster compared with PyTorch (up to $6\times$ ) [4]. We anticipate to achieve much higher speedup of our model if also implemented and optimized in JAX, which is, however, out of the scope of this study.

Table B. Computational time for a given accuracy (e.g., correlation $\geq 0.8$ ) on the NS dataset.

Iterm	$Re$ = 1000	$Re$ = 4000	$\mathbf{x} \in$ [0, 4 $\pi$ ] $^2$
DNS 1024	135 s	130 s	133 s
MultiPDENet	26 s	19 s	21 s
Speed up	5 $\times$	7 $\times$	6 $\times$

Reference:

[3] Kochkov, et al. Machine learning–accelerated computational fluid dynamics. PNAS, 2021.

[4] Takamoto, et al. PDEBench: An extensive benchmark for scientific machine learning. NeurIPS, 2022, 35: 1596-1611.

Concluding remark: Once again, we sincerely appreciate your constructive comments. We have thoroughly revised the manuscript according to your suggestions. Looking forward to your feedback!

评论- Request your feedback before the end of the discussion period

2024-11-25

Dear Reviewer NnLJ:

Thank you very much for your time and efforts. Looking forward to your response!

Sincerely,

The Authors

评论- Sincerely looking forward to your feedback

2024-11-26

Dear Reviewer NnLJ,

Again, thanks for your constructive comments. We would like to follow up on our rebuttal to ensure that all concerns have been adequately addressed. If there are any further questions or points that need discussion, we will be happy to address them. Your feedback is invaluable in helping us improve our work, and we eagerly await your response.

Thank you very much for your time and consideration.

Best regards,

The Authors

评论- Your feedback will be much appreciated

2024-11-29

Dear Reviewer NnLJ,

We are sending our fourth reminder to request your feedback. We really don't understand what difficulties are preventing you from responding. This made us feel confused and upset because we highly believe it is the fundamental responsibility for qualified reviewers, who agreed to take the role, to reply to and possibly interact with the authors.

We have spent extensive time carefully addressing each of your comments and suggestions, tirelessly performing additional experiments, and standing guard to wait for your response (because we want to address any single question or concern you might have in a timely manner). We hope the entire process remains rewarding, regardless of whether our paper is finally accepted or not.

Hence, if there are any further questions or points that need discussion, we will be happy to address them. We eagerly await your response!

Thank you very much for your time and consideration.

Best regards,

The Authors

评论- Last call for your feedback

2024-12-03

Dear Reviewer NnLJ,

Since the discussion period is ending very soon, we kindly call for your feedback. We believe your comments and concerns have been fully addressed (please refer to our point-to-point replies with further clarification and additional experimental results). Please let us know if you have any additional questions.

Thank you very much!

Best regards,

The Authors

评论- Request your final feedback

2024-12-03

Dear Reviewer NnLJ,

With only a few hours left until the rebuttal period ends, we would like to ask if you are satisfied with our response? We would greatly appreciate it if you could re-evaluate our paper and consider raising your score. Thank you!

Best regards,

The Authors

审稿意见

评分: 6置信度: 32024-11-03

This paper presents MultiPDENet, a physics-constrained neural network framework developed to accelerate the simulation of complex spatiotemporal dynamical systems, particularly for fluid dynamics applications. The paper addresses two key challenges: long-horizon prediction and generalization. To tackle these issues, MultiPDENet combines numerical schemes with deep learning through a multiscale time-stepping approach, which includes both fine-scale and coarse-scale predictions, effectively mitigating temporal error accumulation in long-term forecasting.

[-] Claims on Spatial Derivative Accuracy (Abstract, line 019): The authors state that MultiPDENet "enables accurate estimation of spatial derivatives on coarse grids." However, there is no theoretical analysis or empirical evidence provided in the main text or experiments to support this claim. Additional analysis or experimental validation is needed to substantiate this assertion. Actually, this claim appears many times in the paper.

[-] Oversimplification of Method Section (Section 3.2): The method description in Section 3.2 is oversimplified, with essential details, including the Poisson and correction blocks, relegated to the appendix. These blocks are central to the model's architecture and should be introduced in the main text to improve readability and clarity. Readers currently need to constantly refer to the appendix for critical information about the methodology.

[-] Computational and Memory Cost Analysis: Since the network architecture includes the computation of partial derivatives (especially second-order derivatives), this likely increases computational complexity and memory requirements. To provide a more comprehensive evaluation, the authors should compare the time and memory costs of MultiPDENet with those of other baseline models, during both training and testing. Additionally, reporting the parameter count for each method would help control variables, clarifying whether the performance gains result from the proposed architecture or from increased computational resources and model size.

[-] Framework Independence and Network Architecture Variability: Given that the proposed framework is architecture-agnostic, it would be beneficial for the authors to demonstrate that MultiPDENet performs well with different backbone architectures (such as FNO, UNet, or Transformers). This would further validate the framework's flexibility and effectiveness across architectures. As operator learning progresses rapidly, incorporating more recent models, such as transformer-based operators, as baselines would strengthen the paper's relevance and rigor.

优点

See above

缺点

See above

问题

See above

评论- Reply to Reviewer 6Waj (Part 1)

2024-11-22

Thanks for your constructive comments and suggestions! We have carefully addressed them, and the following responses have been incorporated into the revised paper.

In addition, we have thoroughly proofread our paper, corrected typos and grammar mistakes, and re-organized the contents to improve the clarity of the paper. The majority of the paper has been re-written (marked in red color). We believe the presentation has been substantially improved. Please refer to the updated .pdf file.

Questions

Q1. Claims on spatial derivative accuracy.

Reply: Great question! In fact, it does not learn the traditional finite difference (FD) filter, as the entire study is based on coarse grids, aiming to derive an equivalent expression for the derivative on these grids. Consequently, the learned FD filter differs from the traditional FD filter. This filter is designed to approximate an equivalent derivative, focusing on minimizing the overall PDE residual rather than matching each derivative precisely to its ground truth on coarse grids. Therefore, the FD filter learned by our model differs from the traditional FD filter. By satisfying the Order of Sum Rules [1], this filter can achieve up to fourth-order accuracy in approximating the derivatives through the optimization of trainable parameters.

We have include such a discussion in Section 3.2.3 (Page 5) in the revised paper.

Reference:

[1] Long et al. Pde-net: Learning pdes from data. ICML, pp. 3208–3216, 2018

Q2. Oversimplification of Method Section 3.2.

Reply: Thank you for this great comment. We have almost completely re-written the Method Section 3.2 (Pages 4-5) in the revised paper. The equations have been re-formulated and the statements have been rephrased. Please refer to the updated paper for details.

Q3. Analysis of computational and memory costs of MultiPDENet and baselines based on concerns about derivative computation complexity.

Reply: Great suggestion! In fact, the cost of derivative calculations within the network is minimal. All derivative calculations are batch-parallel, and the operation utilizes a CNN filter with a 5x5 convolution kernel, where each first-order and second-order derivative requires only a single convolution. Additionally, the weights are shared during the operation, ensuring that convolution does not become a bottleneck due to memory constraints in the network architecture. During training, a baseline model with comparable settings (e.g., the number of parameters, memory usage) was selected for comparison, as detailed in Table A below.

Table A. Comparison of Model Parameters, Training, Inference, and Memory Usage.

Model	# of Parameters ↓	Train. time ↓	Train. epochs	Infer. time ↓	Memory ↓
UNet	$6.1309×10^8$	644 s/epoch	1000	7 s	68.9G
FNO	$2.3501×10^8$	122 s/epoch	1000	5 s	72.1G
LI	$5.9652×10^7$	266 s/epoch	1000	9 s	71.5G
TSM	$7.3361×10^7$	346 s/epoch	1000	9 s	72.6G
DeepONet	$2.6371×10^8$	0.8s/epoch	20000	1 s	65.7G
MultiPDENet	$1.6934×10^8$	200 s/epoch	1000	26 s	72.3G

评论- Reply to Reviewer 6Waj (Part 2)

2024-11-22

Q4. Variability analysis of the model architecture.

Reply: Great comment! To investigate the role of the NN blocks in our model, we followed your suggestion and conducted additional comparative experiments with the following configurations:

Model-a: the MiNN block was set to UNet and the MaNN block to FNO;
Model-b: both the MiNN block and the MaNN block were set to FNO;
Model-c: both blocks were set to FNO with roll-out training applied at the macro step (with an unrolled step size of 8);
Model-d: the MiNN block was set to DenseCNN and the MaNN block to UNet;
Model-e: the MiNN block was set to FNO while the MaNN block was set to Swin Transformer [2].

All other experimental settings were kept consistent, and the results are presented in Table A. Model-a and Model-b encountered NaN values, which can be attributed to the MaNN block’s requirement for a robust model capable of making accurate predictions at the macro step. Without such a model, multi-step roll-out training (as in Model-c) is necessary to enhance the model’s stability for long-term predictions. When a strong predictive module is employed at the macro step (e.g., UNet), the MiNN block can be replaced with a more parameter-efficient model, such as DenseCNN (Model-d). Setting the MaNN block to Swin Transformer resulted in a slight decrease in accuracy, likely due to the relatively small size of our dataset, as Swin Transformer typically excels on larger datasets.

Table A. Performance metrics for the scalability of the NN blocks.

Model	RMSE	MAE	MNAD	HCT
Model-a	NaN	NaN	NaN	0.8846
Model-b	NaN	NaN	NaN	5.2930
Model-c	0.2575	0.1507	0.0191	7.2930
Model-d	0.1564	0.0703	0.0083	8.0525
Model-e	0.2479	0.1242	0.0197	7.6346
MultiPDENet	0.1379	0.0648	0.0077	8.3566

The new experiments and results have also been included in Appendix Section C.2 on Page 19 in the revised paper.

Concluding remark: Once again, we sincerely appreciate your constructive comments. Please feel free to let us know if you have any further questions. Looking forward to your feedback!

Reference:

[2] Liu, et al. Swin transformer: Hierarchical vision transformer using shifted windows. CVPR, 10012–10022, 2021.

评论- Request your feedback before the end of the discussion period

2024-11-25

Dear Reviewer 6Waj:

Thank you very much for your time and efforts. Looking forward to your response!

Sincerely,

The Authors

评论- Sincerely looking forward to your feedback

2024-11-26

Dear Reviewer 6Waj,

Thank you very much for your time and consideration.

Best regards,

The Authors

评论- We eagerly await your response

2024-11-27

Dear Reviewer 6Waj,

Thank you very much for your time and consideration.

Best regards,

The Authors

评论- Your feedback will be much appreciated

2024-11-29

Dear Reviewer 6Waj,

We are sending our fourth reminder to request your feedback. We really don't understand what difficulties are preventing you from responding. This made us feel confused and upset because we highly believe it is the fundamental responsibility for qualified reviewers, who agreed to take the role, to reply to and possibly interact with the authors.

Hence, if there are any further questions or points that need discussion, we will be happy to address them. We eagerly await your response!

Thank you very much for your time and consideration.

Best regards,

The Authors

评论- Last call for your feedback

2024-12-03

Dear Reviewer 6Waj,

Thank you very much!

Best regards,

The Authors

评论- Request your final feedback

2024-12-03

Dear Reviewer 6Waj,

Best regards,

The Authors

审稿意见

评分: 6置信度: 52024-11-04

This paper proposes an interesting hybrid neural learner for fluid problems. The neural network builds the PDE in the pipeline, instead of using it as a loss function. A time-stepping scheme is applied explicitly instead of applying the NN directly in an autoregressive fashion.

优点

The method proposed is pretty straightforward to implement and uses the physics (PDE) in a good way, not just simply throwing the PDE residual into a loss function and calling it for a day. The numerics are competitive among end-to-end models.

缺点

The writings are pretty bad, there are many typos, e.g., "spatialemporal". The word choices and phrasing definitely read pretty weird at places, for example, the term "promotion" in Table 2. I felt that this paper could really use proof-reading from a native speaker.
As someone from a background of many years of training in numerical analysis, I would not name the NN in consideration a "PDE-constrained" NN. The reason is that in traditional numerical methods, when talking about "constraint", it refers to the fact that the method of interest imposes the constraint exactly up to the machine eps. For example, the divergence-free constraint in NS.
These micro and macro blocks $M_iNN$ and $M_aNN$ are not properly defined. After digging into the figures (S1 to be specific) and Section 3.2.4, I still do not know what their exact architectures are. For example, what hyperparams are used in the FNO and UNet? Why not use UNet as $M_iNN$ and FNO as $M_aNN$ ?
$\mathcal{B}$ is not clearly defined either. Given the form of (S8), one would guess it is $\mathcal{F}$ plus the external forcing but (S6) states otherwise.
The Poisson block is essentially a pressure projection scheme, which is nothing new and has been introduced back by Chorin in 1967 and by Shen later in 1980s.
The ablation lacks some in-depth studies, examples include:
- in theory, making the FD operator as learnable filters sounds pretty good, but do the learned parameters actually replicate the FD operator and have a truncation error of four-order accuracy?
- instead of simply removing different components of the model in 4.6, I think it is better to compare the RK4 used with another time stepping scheme (with a less truncation error order) to test whether it is the RK4 adopted really works, or the pipeline is really robust with respect to any time-stepping schemes.
Some missing references on making the FD stencil learnable as a convolution filter, e.g., a paper by Kossaczká-Ehrhardt-Günther and arXiv:2201.01854. There is also arXiv:2003.09573 on using NN as correction to a time stepping scheme.

问题

I don't understand "FD method can yield inaccurate derivatives on sparse grids." What does "sparse grids" mean here?

评论- Reply to Reviewer XzFb (Part 1)

2024-11-22

Thanks for your constructive comments and suggestions! We have carefully addressed them, and the following responses have been incorporated into the revised paper.

Weaknesses

W1. There are some typographical errors and incorrect word usage.

Reply: Thank you for your valuable feedback. We have thoroughly proofread our paper, corrected typos and grammar mistakes, and re-organized the contents to improve the clarity of the paper. The majority of the paper has been re-written (marked in red color). We believe the presentation has been substantially improved. Please refer to the updated .pdf file.

W2. The term "PDE-constrained" in the title is inappropriate.

Reply: Great remark! After careful consideration, we decided to change it to "PDE-embedded". In fact, we integrate the PDE structure directly into the network as a learnable module, enabling it to actively participate in and guide the learning process.

W3a. Explanation of the micro and macro blocks.

The above contents have been reflected in Section 3.2.4 (Page 5) and Appendix Section C.2 (Page 19) in the revised paper.

W3b. Switching U-Net and FNO in micro and macro blocks.

Reply: Excellent suggestion! To investigate the role of the NN blocks in our model, we followed your suggestion and conducted additional comparative experiments with the following configurations:

Model-a: the MiNN block was set to UNet and the MaNN block to FNO;
Model-b: both the MiNN block and the MaNN block were set to FNO;
Model-c: both blocks were set to FNO with roll-out training applied at the macro step (with an unrolled step size of 8);
Model-d: the MiNN block was set to DenseCNN and the MaNN block to UNet;
Model-e: the MiNN block was set to FNO while the MaNN block was set to Swin Transformer [3].

Table A. Performance metrics for the scalability of the NN blocks.

Model	RMSE	MAE	MNAD	HCT
Model-a	NaN	NaN	NaN	0.8846
Model-b	NaN	NaN	NaN	5.2930
Model-c	0.2575	0.1507	0.0191	7.2930
Model-d	0.1564	0.0703	0.0083	8.0525
Model-e	0.2479	0.1242	0.0197	7.6346
MultiPDENet	0.1379	0.0648	0.0077	8.3566

The new experiments and results have also been included in Appendix Section C.2 on Page 19 in the revised paper.

评论- Reply to Reviewer XzFb (Part 2)

2024-11-22

W4. A detailed explanation of $\mathcal{B}$ .

Reply: Great suggestion! $\mathcal{B}$ represents the PDE block that computes the residual of the governing PDEs. It incorporates a learnable filter bank with symmetry constraints, which calculates derivative terms based on the corrected solution produced by a correction block. These terms are then combined into the governing PDEs, a learnable form of $\mathcal{F}$ in the governing PDEs. This process is incorporated into the RK4 integrator (see Appendix Section A.3 on Page 16) for solution update which can be expressed as:

$\mathcal{B}\big(\bar{\mathbf{u}}{_m^k}, \cdots, \boldsymbol{\nabla}{\bar{\mathbf{u}}{_m^k}}, \boldsymbol{\nabla}^2{\bar{\mathbf{u}}{_m^k}}, \cdots;\boldsymbol{\lambda}\big) \leftarrow \mathcal{F}\Big( \bar{\mathbf{u}}{_m^k}, \cdots, \hat{\boldsymbol{\nabla}}{\hat{\bar{\mathbf{u}}}{_m^k}}, \hat{\boldsymbol{\nabla}^2}{\hat{\bar{\mathbf{u}}}{_m^k}}, \cdots;\boldsymbol{\lambda} \Big)$

where $\mathcal{B}$ denotes the PDE block, and $\bar{\mathbf{u}}{_m^k}$ the coarse solution (aka, solution on coarse grids) at micro-scale time $t_k+m\delta t$ . Here, $\hat{\bar{\mathbf{u}}} _m^k$ refers to the neural-corrected state of the coarse solution, which is obtained through the Correction block (see Appendix Section A.1 on Page 15 for details). This corrected state $\hat{\bar{\mathbf{u}}} _m^k$ is used to estimate spatial derivatives, namely, $\hat{\bar{\mathbf{u}}}{_m^k} = `NN`(\bar{\mathbf{u}} _m^k)$ . Note that $\hat{\boldsymbol{\nabla}}$ and $\hat{\boldsymbol{\nabla}}^2$ represent trainable Nabla and Laplace operators, respectively, each consisting of a symmetrically constrained convolution filter, e.g., an enhanced FD kernel to approximate spatial equivalent derivatives. By utilizing the RK4 integrator, we can project the coarse solution to the subsequent micro-scale time step. Despite the reduced resolution causing some information loss, this learnable PDE block enables a closer approximation of the equivalent form of the derivatives on coarse grids. This addition serves as a fully interpretable "white box" element within the overall network structure.

The above contents have been added and reflected in Section 3.2.2 (Page 4) in the revised paper.

W5. Nothing new about the Poisson block (essentially a pressure projection scheme).

Reply: That is true. However, we did not claim the Poisson block as a contribution. We just simply used it in the network since the pressure term is needed when calculating the residue of the NS equations. However, we designed a new trainable symmetric convolution kernel, inspired by the structure of finite difference stencils, to estimate the derivative quantities as required in the Poisson solver. In so doing, the projected pressure field can be more accurately estimated.

W6. Does the learnable FD filter replicate the FD operator or not?

W7. Whether the RK4 adopted really works.

Reply: Great comment! Since the predictions are operated on coarse mesh grids, the temporal marching requires a high accuracy so as to mitigate error accumulation. This is the rational behind why we choose RK4 (with 4th order accuracy) as the time integrator in the Physics block.

Using a residual network is equivalent to applying the forward Euler method. To validate the necessity of RK4, we replaced it with the forward Euler method. As shown in Table C below, this substitution led to a significant performance decline across on the NS dataset. The performance degradation stems from the first-order accuracy of the forward Euler method ( $O(\delta t)$ ) and its susceptibility to instability from error accumulation, making it unsuitable for multi-step rollout prediction. In contrast, we employ the RK4 scheme, which provides a much higher temporal accuracy ( $O(\delta t^4)$ ). Implementing RK4 with a micro time step on coarse mesh grid adds minimal computational overhead. These results are also presented in Table 3 (Page 10), labeled as Model I, in the revised paper.

Table C. Additional ablation studies on NS dataset.

Model	RMSE	MAE	MNAD	HCT
MultiPDENet-Euler	0.4357	0.2321	0.0278	6.2481
MultiPDENet-RK4	0.1379	0.0648	0.0077	8.3566

评论- Reply to Reviewer XzFb (Part 3)

2024-11-22

W8. More reference for learnable FD stencils as a convolution filter.

Reply: Thanks for recommending these papers! We have included them in the Related Works section (Section 2 on Page 3). Notably, the Deep Euler method is designed to solve ODEs, which differs from the problem we are addressing. Moreover, such a method does not explore multi-scale time-stepping in prediction.

Questions

Q1. Detailed explanation of the reasons why finite difference methods can produce inaccurate derivative approximations when applied to sparse grids.

Reply: We would like to first clarify that the term sparse grid means coarse mesh grid. For instance, when simulating Kolmogorov flow at $Re$ = 1000, a high-resolution fine grid (e.g., 2048 $\times$ 2048) is necessary for finite-volume-based DNS to accurately capture the flow physics. In contrast, a coarse grid (e.g., 64 $\times$ 64) is obtained by downsampling the high-resolution grid, which leads to much fewer data points and increased grid spacing. As the grid size increases, the accuracy of derivative approximations, which rely on neighboring points, diminishes. Therefore, classical numerical methods, e.g., those based on finite difference approximations, sruggle to produce accurate results on coarse grids. To address this issue, we propose a PDE-embedded network with multiscale time stepping (namely, MultiPDENet), which systematically integrates trainable numerical solver and neural network schemes. It requires only a small number of snapshot data to achieve satisfactory generalizabilities.

Concluding remark: Once again, we sincerely appreciate your constructive comments. We have thoroughly revised the manuscript according to your suggestions. Looking forward to your feedback!

References:

[1] Liu, et al. Multi-resolution partial differential equations preserved learning framework for spatiotemporal dynamics. Communications Physics, 7(1):31, 2024.

[2] Jayesh and Brandstetter. Towards multi-spatiotemporal-scale generalized PDE modeling. Transactions on Machine Learning Research, 2023.

[3] Liu, et al. Swin transformer: Hierarchical vision transformer using shifted windows. CVPR, 10012–10022, 2021.

评论- Request your feedback before the end of the discussion period

2024-11-25

Dear Reviewer XzFb:

Thank you very much for your time and efforts. Looking forward to your response!

Sincerely,

The Authors

评论- Sincerely looking forward to your feedback

2024-11-26

Dear Reviewer XzFb,

Thank you very much for your time and consideration.

Best regards,

The Authors

评论- We eagerly await your response

2024-11-27

Dear Reviewer XzFb,

Thank you very much for your time and consideration.

Best regards,

The Authors

评论- Your feedback will be much appreciated

2024-11-29

Dear Reviewer XzFb,

We are sending our fourth reminder to request your feedback. We really don't understand what difficulties are preventing you from responding. This made us feel confused and upset because we highly believe it is the fundamental responsibility for qualified reviewers, who agreed to take the role, to reply to and possibly interact with the authors.

Hence, if there are any further questions or points that need discussion, we will be happy to address them. We eagerly await your response!

Thank you very much for your time and consideration.

Best regards,

The Authors

2024-12-02

Given the effort the authors tried to address my questions in the revision, I have raised my score from 5 to 6. However, there are still some errors or misleading claims presented in the submission, and in the rebuttal. To name a few:

Notation-wise, it is still kinda bad after the revision.
- " $Re = \\{500, 1600, 2000\\}$ ", a scalar should not be a set.
- The reaction-diffusion system in consideration is not "highly" nonlinear, its nonlinearity is "semi-linear", which is THE easiest nonlinearity to be studied.
- The acronym for Navier-Stokes equation is usually NSE not NS.
- A spatial "domain" should be open in all contexts of PDEs (because derivatives are taken). All the notations are wrong here.
- If "sparsity" refers to the grid being coarse, then every term "sparse training data" should be changed to "coarse-grid training data".
The comment in the rebuttal, regarding the usage of finer grid vs coarser grid in simulating NSE, is utterly wrong. The grid size is problem-dependent. For example, it is totally okay to use 64x64 grid to resolve the vortices (e.g., see Benjamin and Denny's 1979 JCP paper, the $Re=10^4$ case can be resolved on a 151x151 grid).

BTW: it was Thanksgiving break starting last Wednesday (and some institutes have a whole week break) here, sorry for the delay in response.

评论- Thank you for your positive feedback

2024-12-03

Thanks for your positive feedback and raising the score.

Comment 1: The notations used in the article require further revision.

Reply: Thanks for your careful reading. Following your suggestion, we have made the following revisions to the paper (which will be included in the final version):

Removed the set notation for $Re$ , e.g., $Re = 500, 800, 1600, 2000$ .
Replaced "highly nonlinear" with "nonlinear".
Updated the acronym for the Navier-Stokes Equation to "NSE" throughout the paper.
Used the open notation for spatial domain, e.g., $(0, 2\pi)^2$ .
Replaced all instances of "sparse" with "coarse" and defined it as "spatiotemporal low-resolution".

Comment 2: The usage of finer grid vs coarser grid in simulating NSE.

Reply: Insightful comment! We agree with you. However, we would like to clarify that the finer grid mesh is required for the specific problem we are considering.

Concluding Remark: Again, Thank you for your valuable feedback, which helps improve the clarity of our work. Thank you!

审稿意见

评分: 5置信度: 32024-11-04

The authors propose to learn fluid simulation with multi-step predictions.

优点

Using numerical integrator with neural networks is an interesting approach.
The reported experimental results show improvement over the reported baselines.

缺点

The multi-step prediction naturally makes the network to be much deeper. For example, the computation will go through at least 4x layers with 4 micro step prediction, albeit the weight sharing. However, the benchmarked baselines are only single networks which are not as deep. It has been shown that using more stacks of shared-weight networks can greatly increase the performance [1, 2]. Given the big improvement in the reported metrics, the authors need to compare their methods with stronger baselines with more layers.
Another related point is regarding the numerical integration. The numerical integration is naturally similar to the simple residual network. The authors need to benchmark the proposed solution with a network with the same amount of network blocks but without using the RK4 integration, e.g., by stacking them or using the simple residual connections.
Since the paper focuses on flow problem, it can benefit from testing larger flow benchmarks, such as PDEArena [3].

[1] Tran, Alasdair, et al. "Factorized fourier neural operators." ICLR 2023.

[2] Zhang, Xuan, et al. "SineNet: Learning Temporal Dynamics in Time-Dependent Partial Differential Equations." ICLR 2024.

[3] Gupta, Jayesh K., and Johannes Brandstetter. "Towards Multi-spatiotemporal-scale Generalized PDE Modeling." Transactions on Machine Learning Research.

问题

For Eq. 3, does it mean that inside each PDE block there will be multiple network evaluation to perform the RK4 integration? E.g., there will be multiple steps for each micro step.
For Eq. 2, how are the derivative quantities also predicted by the network at the coarser grid? Where is the poisson block defined in Eq. 2?

评论- Reply to Reviewer 925D (Part 2)

2024-11-22

Questions

Q1. Inside each PDE block, will multiple network evaluations be performed to carry out the RK4 integration?

Reply: Good question! Once the model is trained, the inference of the network takes four RK4 roullout steps for the PDE block to perform calculations at the micro-steps. However, the PDE block shares weights, similar to the temporal unfolding of a recurrent neural network. It is important to highlight that no labeled data is involved in the calculations at the micro steps when training the model (e.g., unsupervised). Labeled data is only used for supervising the macro-step predictions. To validate the necessity of RK4, we have conducted an experiment where RK4 was replaced with the forward Euler method. This has been discussed in detail in our response to Weakness 2.

Q2. How are the derivative quantities also predicted by the network at the coarser grid.

Reply: Great question! First, the Correction block is designed to mitigate information loss caused by resolution reduction during derivative computation, allowing the model to effectively operate on the coarse grid. During training, it acts as a scaling factor for the derivative term, which is estimated by a shallow network with only two layers (as explained in Appendix Section A.1 on Page 17).

Next, for the corrected flow field, we designed a trainable symmetric convolution kernel (depicted in Figure 2 on Page 5), which leverages the symmetric properties of the finite difference (FD) kernel structure, to compute equivalent derivatives. Rather than matching each derivative exactly to its ground truth on coarse grids, this approach aims to minimize the overall PDE residual.

In summary, the derivative quantities are predicted using the trainable symmetric convolution kernel based on corrected coarse solution.

Q3. Where is the poisson block defined in Eq. 2?

Reply: Thanks for your question. To improve the clarity, we have moved the description of the Poisson block to Section 3.2.2 on Page 5 in the main text (also shown below). Its architecture is shown in Appendix Figure S1(a) on Page 15.

"In solving incompressible NS equations, the pressure term, $p$ , is obtained by solving an associated Poisson equation. To compute the pressure field, we implemented a specialized pressure-solving module shown in Figure 1(a). This module solves the Poisson equation, $\Delta p = \psi(\mathbf{u})$ , where $\psi(\mathbf{u}) = 2 \left(u_xv_y - u_yv_x\right)$ for 2D problems (the subscripts indicate the spatial derivatives along $x or$ y directions). To compute the pressure, we employ a spectral method (Poisson solver) based on $\psi(\bar{\mathbf{u}} _m^k)$ to calculate $\bar{p}{_m^k}$ . As shown in Figure S1(b), this approach dynamically estimates the pressure field from the velocity inputs, removing the need for labeled pressure data."

Concluding remark: Once again, we sincerely appreciate your constructive comments. We have thoroughly revised the manuscript according to your suggestions. Looking forward to your feedback!

评论- Reply to Reviewer 925D (Part 1)

2024-11-22

Thanks for your constructive comments and suggestions! The following responses have been incorporated into the revised paper.

In addition, we have thoroughly proofread our paper, corrected typos and grammar mistakes, and re-organized the contents to improve the clarity of the paper. The majority of the paper has been re-written (marked in red color). We believe the presentation has been substantially improved. Please refer to the updated .pdf file.

Weaknesses

W1. MultiPDENet uses 4 micro steps for prediction. The benchmarked baselines are too simplistic and require more complexity.

Reply: Interesting comment! Firstly, we would like to clarify that many modules in our MultiPDENet model, such as the Physics block, share weights and remain at a fixed depth. Our network expands temporally, similar to a recurrent neural network, without increasing the depth or layers. As shown in Table A, our model maintains a relatively low number of parameters compared to the baselines. It's important to note that the micro-step calculations, which do not utilize labeled data (unsupervised), produce intermediate latent states as their outputs that are then used as increments to update the total solution at the macro time scale.

Table A. Comparison of Model Parameters, Training, Inference, and Memory Usage.

Model	# of Parameters	Train. time	Train. epochs	Infer. time	Memory ↓
UNet	$6.1309×10^8$	644 s/epoch	1000	7 s	68.9G
FNO	$2.3501×10^8$	122 s/epoch	1000	5 s	72.1G
LI	$5.9652×10^7$	266 s/epoch	1000	9 s	71.5G
TSM	$7.3361×10^7$	346 s/epoch	1000	9 s	72.6G
DeepONet	$2.6371×10^8$	0.8s/epoch	20000	1 s	65.7G
MultiPDENet	$1.6934×10^8$	200 s/epoch	1000	26 s	72.3G

Following your comment, we designed an autoregressive FNO (ARFNO) for comparison, where the rollout of $m$ steps is denoted as ARFNO-m. As shown in Table B, increasing the number of rollout steps can improve performance to a certain extent, but beyond a certain point, the positive effect diminishes. Despite optimization efforts, the performance of ARFNO remains significantly inferior to that of MultiPDENet.

Table B. Additional experimental comparisons between baseline and MultiPDENet

Model	RMSE	MAE	MNAD	HCT
FNO	1.0100	0.7319	0.0887	2.5749
ARFNO-10	0.9505	0.6331	0.0779	3.2509
ARFNO-15	0.9479	0.6113	0.0745	3.1416
ARFNO-50	0.9913	0.6899	0.0843	2.6984
MultiPDENet	0.1379	0.0648	0.0077	8.3566

W2. The same number of Physics Blocks are stacked using residual connections, and compared with RK4.

Reply: Great suggestion! Since the predictions are operated on coarse mesh grids, the temporal marching requires a high accuracy so as to mitigate error accumulation. This is the rational behind why we chose RK4 (with 4th-order accuracy) as the time integrator in the Physics block.

Using a residual network is equivalent to applying the forward Euler method. To validate the necessity of RK4, we replaced it with the forward Euler method. As shown in Table C, this substitution led to a significant performance decline across on the NS dataset. The performance degradation stems from the first-order accuracy of the forward Euler method ( $O(\delta t)$ ) and its susceptibility to instability from error accumulation, making it unsuitable for multi-step rollout prediction. In contrast, we employ the RK4 scheme, which provides a much higher temporal accuracy ( $O(\delta t^4)$ ). Implementing RK4 with a micro time step on a coarse mesh grid adds minimal computational overhead. These results are also presented in Table 3 (Page 10), labeled as Model I, in the revised paper.

Table C. Additional ablation studies on NS dataset.

Model	RMSE	MAE	MNAD	HCT
MultiPDENet-Euler	0.4357	0.2321	0.0278	6.2481
MultiPDENet-RK4	0.1379	0.0648	0.0077	8.3566

W3. Since the paper focuses on flow problem, it can benefit from testing larger flow benchmarks, such as PDEArena.

Reply: Excellent remark! In fact, the Kolmogorov flow with $Re = 1000$ in the domain of $[0, 2\pi]^2$ is a key example in the PDE arena. Additionally, we extended our experiments to a much higher Reynolds number of $Re$ =4000 (see Section 4.4 on Page 8) and explored flow prediction within larger domains (see Section 4.5 on Page 9) to evaluate our model's generalization capabilities. These cases already represent very challenging fluid dynamics scenarios. Notably, our model exhibited strong generalization performance across different Reynolds numbers and external force terms (see Section 4.3 on Page 8). However, your suggestion sets forward our future work on extending our model to predict 3D compressive fluid flows in PDEArena. We really appreciate it!

2024-11-27

Thank you for the rebuttal and additional results. Although the custom ARFNO baseline is a deep model, it only slightly improves upon FNO. However, as demonstrated in the factorized-FNO paper mentioned in the review, a deep stack of FNOs can significantly improve the performance even the weights are shared among blocks. As a result, I feel the baseline results are still rather weak. In fact, by the depth of the network, I am referring to the number of neural network layers the computation goes through, regardless of whether the weights are shared or not (I am not sure whether the this quantity is kept the same for the newly added Euler baseline). Although I feel the paper may have potential, I think stronger baselines and more detailed ablations could be important. I will keep my rating.

评论- Reply to additional comments from Reviewer 925D (part 1)

2024-11-27

Thanks for your addtional comments.

Q1: Modify the FNO to be stacked or more layers, making the Baseline model deeper.

Reply: We would like to clarify that our primary objective is to build a model achieving various generalization capabilities (e.g., to initial conditions (IC), external forces, and Reynolds numbers) using very limited and sparse training data in both spatial and temporal domains. To this end, we conducted extensive experiments on different problems (e.g., KdV, Burgers, and NS) in comprison with multiple representative baselines, and found that the data-driven baseline models struggle to achieve good generalizability in small training data regimes (e.g., a few down-sampled trajectories). Hence, we would like to draw your attention that, regardless of how complex the FNO is (e.g., with many stacked layers), it will never succeed due to its high demand of rich and diverse training data.

To further demonstrate the above arguement for you, we worked tirelessly and conducted additional experiments by stacking 4 and 6 blocks of FNO, where the weights are not shared among the blocks. These configurations are referred to as Stack-FNO-4 and Stack-FNO-6, respectively. The results are summarized in Table A below. We acknowledge that FNO or a deep stack of FNOs may significantly improve their representation capacities, even when weights are shared among the blocks. However, it is vital to note that this improvement is contingent on the availability of sufficient training data. Without adequate data, deepening the network can lead to negative effects like over-fitting. This point, we believe, is critical to understanding the limitation of such models.

We hope this helps clarify your concern.

Table A. Results of different FNO models.

Ablated Model	RMSE (↓)	MAE (↓)	MNAD (↓)	HCT (s)
Stack-FNO-4	1.2022	0.9256	0.1122	0.6227
Stack-FFNO-4	1.0136	0.8873	0.1068	0.8234
Stack-FNO-6	1.2568	0.9388	0.1231	0.6107
ARFNO-10	0.9505	0.6331	0.0779	3.2509
ARFNO-15	0.9479	0.6113	0.0745	3.1416
ARFNO-50	0.9913	0.6899	0.0843	2.6984
MultiPDENet	0.1379	0.0648	0.0077	8.3566

Q2: Clarification on the Euler time integration method.

Reply: In a single update step ( $u_k$ to $u_{k+1}$ ), the PDE block in our model performs 4 rollouts with the RK4 integration, with each rollout comprising 4 micro-steps (with time interval $\delta t$ ), resulting in a total of 16 steps. To ensure a fair comparison, we applied the same 16 rollouts using the Euler method (with time interval $\delta t /4$ ). The results, presented in Table B below, demonstrate that the Euler method continues to perform poorly.

Table B. Results on different integrators.

Model	RMSE (↓)	MAE (↓)	MNAD (↓)	HCT (s)
MultiPDENet-Euler	0.4357	0.2321	0.0278	6.2481
MultiPDENet-RK4	0.1379	0.0648	0.0077	8.3566

Q3: Weak baseline results.

Reply: After thorough investigation, we have selected 7 baseline models for comparison, considered as representative ones for modeling PDE systems which have appeared in top-tier journals and conferences. These baselines include FNO (ICLR, 2021), PhyFNO, DeepONet (Nature Machine Intelligence, 2021), LI (PNAS, 2021), PeRCNN (Nature Machine Intelligence, 2023), UNet (TMLR, 2023), and TSM (ICLR, 2023). While these models demonstrate excellent performance particularly when the training dataset is rich, they generally perform poorly in our sparse and limited data scenarios. It is clear that our baseline results are comprehensive. Hope this summary clarifies your concern.

评论- Reply to additional comments from Reviewer 925D (part 2)

2024-11-27

Q4: Need more detailed ablation studies.

Reply: We would like to clarify that we have conducted extensive ablation experiments (14 ablation cases). For example, Section 4.6 (page 10) in the paper presents 9 ablated models including (with the results listed in Table C below)：

Model A (no Poisson block);
Model B (no filter structure constraint);
Model C (only Physics block for prediction);
Model D (FD convolution instead of symmetric filter);
Model E (no correction block); (6) Model F (no MiNN block);
Model G (no MaNN block); (8) Model H (no Physics block);
Model I (forward Euler);
the full model.

Table C. Results of the ablation study.

Ablated Model	RMSE (↓)	MAE (↓)	MNAD (↓)	HCT (s)
Model A	0.1601	0.0711	0.0085	7.904
Model B	0.2432	0.1156	0.0137	7.8633
Model C	0.2632	0.1402	0.0186	7.3146
Model D	0.2503	0.1410	0.0145	7.0783
Model E	0.2958	0.1453	0.0180	6.6856
Model F	0.4338	0.2401	0.0285	5.9768
Model G	NaN	NaN	NaN	1.4193
Model H	1.2023	0.9256	0.1122	0.6227
Model I	0.4357	0.2321	0.0278	6.2481
MultiPDENet (full model)	0.1379	0.0648	0.0077	8.3566

To further investigate the role of the NN blocks in our model, we conducted additional ablation experiments with the following 5 ablated model configurations:

Model-a: the MiNN block was set to UNet and the MaNN block to FNO;
Model-b: both the MiNN block and the MaNN block were set to FNO;
Model-c: both blocks were set to FNO with roll-out training applied at the macro step (with an unrolled step size of 8);
Model-d: the MiNN block was set to DenseCNN and the MaNN block to UNet;
Model-e: the MiNN block was set to FNO while the MaNN block was set to Swin Transformer.

The results are presented in Table D and are also included in Section C.2 on page 19 in the paper.

Table D. Performance metrics for different NN blocks.

Model	RMSE	MAE	MNAD	HCT
Model-a	NaN	NaN	NaN	0.8846
Model-b	NaN	NaN	NaN	5.2930
Model-c	0.2575	0.1507	0.0191	7.2930
Model-d	0.1564	0.0703	0.0083	8.0525
Model-e	0.2479	0.1242	0.0197	7.6346
MultiPDENet	0.1379	0.0648	0.0077	8.3566

In addition, we have also conducted 3 ablation experiments on the data size, with the results presented in Table E below.

Table E. RMSE Performance of Different Models with Varying Numbers of Training Trajectories.

Model	3 trajectories	6 trajectories	15 trajectories
UNet	NaN	NaN	NaN
DeepONet	0.4113	0.4071	0.3965
MultiPDENet	0.0573	0.0426	0.0307

Hence, we believe our abalation studies are comprehensive and thorough. Hope the above results help clarify your concern on our ablation studies.

Concluding Remark: We appreciate the reviewer’s additional comments. We sincerely hope to have your re-evaluation of our paper in light of our clarifications above. Your possible consideration of updating the score is highly appreciated!

We look forward to your feedback. Thank you!

评论- Request your feedback on our reply to your additional comments

2024-12-01

Dear Reviewer 925D,

We would like to remind you that our responses to your additional comments have been posted. Your feedback is invaluable in helping us improve our work.

We eagerly await your reply. Thank you!

Best regards,

The Authors

评论- Request your feedback before the end of the discussion period

2024-12-02

Dear Reviewer 925D:

Thank you very much for your time and efforts. Looking forward to your response!

Sincerely,

The Authors

评论- Looking forward to your feedback on our reply to your further comments

2024-11-29

Dear Reviewer 925D,

Again, thanks for your constructive comments, which are very much helpful for improving our paper. Our responses to your additional comments have been posted. If there are any further questions or points that need discussion, we will be happy to address them. Your feedback is invaluable in helping us improve our work, and we eagerly await your feedback.

Your consideration of updating your score will be much appreciated! Thank you.

Best regards,

The Authors

评论- Last call for your feedback

2024-12-03

Dear Reviewer 925D,

Since the discussion period is ending very soon, we kindly call for your feedback on our reply to your additional comment. We believe your comments and concerns have been fully addressed (please refer to our point-to-point replies with further clarification and additional experimental results). Please let us know if you have any additional questions.

Your possible consideration of updating the score is highly appreciated. Thank you very much!

Best regards,

The Authors

评论- Request your final feedback

2024-12-03

Dear Reviewer 925D,

Best regards,

The Authors

评论- Request your feedback before the end of the discussion period

2024-11-25

Dear Reviewer 925D:

Thank you very much for your time and efforts. Looking forward to your response!

Sincerely,

The Authors

评论- Sincerely looking forward to your feedback

2024-11-26

Dear Reviewer 925D,

Thank you very much for your time and consideration.

Best regards,

The Authors

评论- General response

2024-11-22

Dear Reviewers,

We deeply appreciate the insightful and constructive comments from you, which are helpful in improving our paper. We are pleased that all the reviewers recognized the novelty and excellent generalizability of our work. In particular, we thank the reviewers for recognizing the interesting hybrid approach (925D, XzFb), improved results (925D, PGra, YrNc), and novelty (yrNC, PGra) of our method.

In addition, we have thoroughly proofread our paper, corrected typos and grammatical errors, and reorganized the content to enhance its clarity. Substantial revisions, highlighted in red color, have been made in the updated version of the paper (please see the updated .pdf file). A list of the main revisions include:

Re-wrote and re-orgnized the majority of the paper (all sections).
Added additional references to the related work section (Section 2, Page 3).
Provided a clearer description of the model architecture (Section 3.2.1, Page 4).
Refined the explanation of the Physics block (Section 3.2.2, Page 4).
Clarified the description of the NN block (Section 3.2.4, Page 5).
Improved the statements regarding the experiments (Section 4, Page 6).
Analyzed the scalability of the MiNN block and MaNN block (Appendix C.2, Page 18).
Clarified the inference cost of our approach (Appendix F.2, Page 22).
Added additional experimental results, e.g., ablation study, scalability tests, computational cost, etc. (Section 4.6, Page 10; Appendix Section C.2, Page 19; Appendix Section F.2, Pages 22-23).

Thank you once again for your thoughtful reviews. Please do feel free to let us know if you have any further questions.

Best regards,

The Authors of the Paper

评论- Reviewers' Response

2024-11-25

Dear Reviewers,

As the author-reviewer discussion period is approaching its end, I would strongly encourage you to read the authors' responses and acknowledge them, while also checking if your questions/concerns have been appropriately addressed.

This is a crucial step, as it ensures that both reviewers and authors are on the same page, and it also helps us to put your recommendation in perspective.

Thank you again for your time and expertise.

Best,

评论- Thanks to the reviewers

2024-12-03

Dear Reviewers,

We would like to thank your engagement in the author-reviewer discussion period. Special thanks are given to Reviewers PGra, XzFb and yrNc for your positive feedback. Your constructive comments and suggestions are greatly helpful in improving the quality of our paper.

In the meantime, Reviewers 6Waj and NnLJ kept silent during the discussion period, despite our repeated reminders. However, we are confident that your concerns have been fully addressed through detailed clarifications and additional experimental results.

Overall, we appreciate your time and effort placed on reviewing our paper. Thank you very much!

Best regards,

The Authors

AC 元评审

2024-12-21

This paper introduces MultiPDENet, a neural network architecture designed to accelerate fluid dynamic simulations by combining classical numerical methods, such as a multi-scale time-stepping scheme inspired by Runge-Kutta methods, finite-difference derivatives, and a Fourier Neural Operator for learned corrections. This approach embeds physical constraints from partial differential equations (PDEs) directly into the network. The model reportedly demonstrates significant speedups (53x) and improved long-term stability and accuracy compared to both traditional numerical solvers and other neural PDE surrogates. The method is validated on several one- and two-dimensional problems.

The original paper contained several inaccuracies, making it difficult to read. Regarding the content, some reviewers agreed that the paper was not clearly articulated. The authors introduced many different concepts, making it unclear what the individual enhancements actually are (see comment by reviewer 6Waj). Additionally, as mentioned by 6Waj, some parts of the paper seem to lack a logical thread, making the innovations difficult to assess. Using convolutional networks to learn discretizations of differential operators is not a new concept. In addition, the pressure correction step the authors used (as mentioned by reviewer XzFb) is far from new.

One of the paper's main claims, a "53x speedup" over classical methods, appears to have been revised down to 5x. However, further examination of the baselines, raised by many reviewers, suggests that they are still quite weak (see [1]). Solving the Navier-Stokes equations on a torus can be performed extremely fast with spectral methods using a vorticity formulation, as demonstrated in [2], which the authors cite. Extrapolating from Figure 7b to Figure 1 in [2], it appears that similar performance to the authors' 1024x1024 finite volume (FV) solver can be achieved with a 128x128 grid. Therefore, the comparison is misleading, as one needs 64x fewer degrees of freedom (with an extra factor of 2 from the vorticity formulation) and 8x fewer time steps (so we can "expect" the solver to be much faster at that resolution). Additionally, given that the authors also used a spectral Poisson solver for their implementation of the Navier-Stokes emulator, the comparison with an FV-based method seems unfair.

Regarding the claim of long-term stability, the authors primarily present trajectory-wise errors. However, they provide no metrics demonstrating that the trajectories follow the correct statistics for very long rollouts (see [3, 4] for examples of such metrics). As such it is not clear how to assess that claim, given that several new formulation have been develop to tackle the long-term statistics of chaotic/fluid systems.

Furthermore, the overall solver remains subject to the CFL condition, which limits the time steps. Thus, the solver can likely only remain competitive in the coarse-scale regime. This limitation should be clearly acknowledged in the paper's claims unless the authors can provide evidence to the contrary.

Given the numerous issues raised by the reviewers, which the updated version failed to fully address, I recommend rejection.

[1] McGreivy, Nick, and Ammar Hakim. "Weak baselines and reporting biases lead to overoptimism in machine learning for fluid-related partial differential equations." Nature Machine Intelligence (2024): 1-14.

[2] Dresdner, Gideon, et al. "Learning to correct spectral methods for simulating turbulent flows." arXiv preprint arXiv:2207.00556 (2022).

[3] Jiang, Ruoxi, et al. "Training neural operators to preserve invariant measures of chaotic attractors." Advances in Neural Information Processing Systems 36 (2024).

[4] Schiff, Yair, et al. "DySLIM: Dynamics Stable Learning by Invariant Measure for Chaotic Systems." Forty-first International Conference on Machine Learning (2024).

审稿人讨论附加意见

In general the view from the reviewers was fairly pessimistic. The rebuttal and the updated manuscript made the paper stronger, but several issues with respect to the main claim of beating traditional methods still remain. From my read of the updated paper, the criticism of reviewer NnLJ, was not completely addressed. As the authors still claim to beat by a reasonable factor traditional methods, but as mentioned by reviewer NnLJ, the baseline is rather weak, which puts in question the validity of the claim (which was also raised by other reviewers).

最终决定Reject

2025-01-22

Reject