HyPINO: Multi-Physics Neural Operators via HyperPINNs and the Method of Manufactured Solutions
Summary
Reviews and Discussion
The manuscript provides a hybrid neural operator approach combining synthetic supervised learning data generated by automatic differentiation with a physics-informed loss. The method is called HyPINO and aims to predict the parameters of a network approximating the solution of a PDE based on a discretization of the data. The method is tested on a variety of two-dimensional PDEs for zero-shot accuracy. Further, it is investigated whether the output of HyPINO can be used as an initialization strategy for physics-informed training. The results compare favorably to several baselines.
Strengths and Weaknesses
Strengths
- The manuscript addresses an important and timely problem in scientific machine learning.
- The proposed method performs well compared to the considered baselines.
Weaknesses
- The manuscript proposes a whole pipeline, which makes it hard to distill the conceptual and algorithmic innovations. In particular, the combination of supervised and physics-informed losses for neural operators has been around for a couple of years, the method of manufactured solutions is classic, and the hypernetwork HyPINN was used before. As there are many components, an ablation study would provide much improved insight to the contribution of the individual components of the overall pipeline.
- When fine-tuning the PINN's parameters, Adam is used. However, there is broad consensus that second-order optimizers are far superior in physics-informed learning, in particular Gauss-Newton-type methods.
References
Physics-informed neural operator for learning partial differential equations, Z Li, H Zheng, N Kovachki, D Jin, H Chen, B Liu, K Azizzadenesheli, A Anandkumar, ACM/IMS Journal of Data Science, 2024
Achieving High Accuracy with PINNs via Energy Natural Gradients, J Müller, M Zeinhofer, ICML 2023
Challenges in training PINNs: A loss landscape perspective, P Rathore, W Lei, Z Frangella, L Lu, M Udell, ICML 2024
Questions
- Can you point out the key algorithmic innovation(s)? Is this the combination of the existing ideas on neural operators to achieve improved performance?
- Can you incorporate more state-of-the-art optimizers for PINNs when fine-tuning?
- In the subsection regarding the loss function, the supervised loss is not mentioned. However, from my understanding, you are using both a supervised and physics-informed loss during optimization. Can you clarify this?
Limitations
yes
Final Justification
I have raised my score based on the reply, but I am still recommending a rejection.
I believe that the manuscript has potential with many different ideas, but the technical elaboration and scientific testing of the proposed methodologies should be improved. In particular, the two main methodologies proposed in the manuscript (the use of MMS for NOs and iterative refinement) lack a thorough evaluation, including ablation studies, and are only demonstrated for a specific neural operator.
Formatting Issues
none
We thank the reviewer for taking the time to carefully evaluate our work and for the feedback. We appreciate your insights and address your concerns below.
[Q1:] While our approach builds on prior work, the contributions of HyPINO go significantly beyond a simple combination thereof. Neural operators such as FNOs and CNOs typically require large datasets of high-fidelity simulation data for training. The introduction of the physics-informed loss in PINO reduced this dependency significantly. However, some supervised data is still necessary, as training solely with physics-informed objectives is often unstable due to spectral bias and optimization challenges [1]. This remaining dependency on simulation data is problematic when designing multi-physics solvers, as generating and combining data for diverse physical problems in a single dataset is challenging. This is why we propose to leverage the method of manufactured solutions (MMS) to obtain a (potentially) infinite number of diverse training samples. To the best of our knowledge, HyPINO is the first neural operator trained on a dataset spanning such a diverse set of linear, 2D PDEs with mixed Dirichlet and Neumann boundary conditions on complex domains.
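To illustrate the MMS principle for a generic second-order linear operator of the form a·u_xx + b·u_yy + c·u (a hypothetical stand-in for the operator family in the paper), a supervised sample can be derived from a chosen analytical solution via automatic differentiation, with no numerical solver involved:

```python
import torch

def manufactured_source(u_fn, pts, a=1.0, b=1.0, c=0.5):
    """Sketch of MMS data generation (illustrative names and operator):
    given an analytical ansatz u_fn, derive the source f such that u_fn
    solves a*u_xx + b*u_yy + c*u = f exactly at the collocation points."""
    pts = pts.clone().requires_grad_(True)
    x, y = pts[:, 0:1], pts[:, 1:2]
    u = u_fn(x, y)
    # First and second derivatives via automatic differentiation
    grad = lambda out, inp: torch.autograd.grad(
        out, inp, torch.ones_like(out), create_graph=True)[0]
    ux, uy = grad(u, x), grad(u, y)
    uxx, uyy = grad(ux, x), grad(uy, y)
    return (a * uxx + b * uyy + c * u).detach()
```

The pair (f, restriction of u_fn to the boundary) then forms an exact, label-complete training sample, since u_fn itself is the ground-truth solution.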
This infinite dataset also allows us to revisit the previously underexplored HyperPINN design, which is a more data-intensive neural operator than CNOs or FNOs because it generates a continuous target function parametrized by a PINN as opposed to the grid-based outputs of CNOs and FNOs. The benefits of HyperPINNs are that (i) the output PINN is continuous and can be evaluated at arbitrary points on the domain, (ii) exact analytical derivatives can be obtained directly via automatic differentiation, and (iii) the computation of these derivatives is computationally inexpensive because the gradients for computing the residual only need to be backpropagated through the lightweight PINN rather than the large Swin Transformer. While previous work has recognized these benefits and used HyperPINNs as backbones of their neural operators, their scope was typically restricted to minor variations of a single equation or physical system (e.g. different source terms for the same PDE, or the same equation on multiple geometries). By combining HyperPINNs with the diverse, infinite MMS dataset, we are able to unlock their potential and scale them into true multi-physics neural operators.
Furthermore, while HyPINO shows strong zero-shot performance for a diverse set of benchmark PDEs, consistently outperforming other baseline operators, it also naturally supports task-specific fine-tuning. This is done by taking the weights generated by HyPINO as initialization and training the resulting PINN with standard physics-informed objectives using analytical derivatives to compute and minimize the residual. This task-specific fine-tuning is impossible with most other neural operator designs because they do not produce a parameterized solution that is separate from the model.
Finally, we introduce the iterative refinement process, a training-free method that leverages superposition to improve the predictions of physics-informed neural operators for linear PDEs. Given an initial PINN solution from HyPINO, we can compute its PDE residual (i.e. how much it violates the PDE). This residual function can be fed back into the hypernetwork to generate a “delta-PINN” representing a correction to the original solution. Adding the delta-PINN’s predictions to the original PINN’s predictions yields an updated solution that cancels out much of the previous residual. This procedure can be iterated multiple times, thus creating an ensemble of self-correcting networks. This refinement does not require any retraining of the hypernetwork. It is a post-processing technique that uses the trained neural operator model repeatedly in inference mode. To the best of our knowledge, this iterative residual refinement idea with delta-PINNs is also entirely novel and possibly even applicable to other physics-informed neural operator designs, such as PINO, to improve their predictions on linear PDEs.
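For linear problems, the mechanism can be illustrated with a toy linear operator, where an inexact solver plays the role of HyPINO and the corrections are summed by superposition (the names and the Jacobi-style stand-in solver are illustrative, not our implementation):

```python
import torch

def refine(apply_op, approx_solver, f, steps=3):
    """Toy sketch of residual-driven refinement for a linear operator:
    the inexact solver is re-applied to the residual of the current
    solution, and each 'delta' correction is added by superposition."""
    u = approx_solver(f)            # initial prediction
    for _ in range(steps):
        r = f - apply_op(u)         # how much the current solution violates A u = f
        u = u + approx_solver(r)    # corrective 'delta' prediction, summed in
    return u
```

In HyPINO's case, `approx_solver` is the frozen hypernetwork evaluated in forward mode and `apply_op` is the differential operator applied via automatic differentiation, so no retraining is required.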
[1] Goswami, Somdatta, et al. "Physics-informed deep neural operator networks." Machine learning in modeling and simulation: methods and applications. Cham: Springer International Publishing, 2023. 219-254.
[Q2:] We agree that second-order optimizers such as L-BFGS or Gauss-Newton methods were shown to improve convergence in physics-informed training. In fact, many state-of-the-art PINN setups adopt a two-stage scheme with Adam followed by a second-order optimizer. However, in our experiments, the goal was not to reach state-of-the-art performance on each benchmark PDE, but to isolate and assess the impact of HyPINO’s initialization quality. To do so, we deliberately chose a minimalistic and widely used training configuration so that improvements could be directly attributed to the initialization. As shown in Figure 4, HyPINO-initialized PINNs (a) start with lower error, (b) have well-behaved convergence curves, and (c) reach lower errors faster than random and Reptile initializations. While we recognize that techniques such as Sobolev losses, dynamic loss weighting, adaptive sampling, and second-order optimizers can improve PINN performance, our aim was to evaluate HyPINO initializations in a clean and controlled setting. Including such enhancements would have shifted the focus from the quality of the initialization itself and introduced confounding factors that would have made a direct comparison difficult. That said, our method is fully compatible with more advanced PINN training strategies. If desired, these can easily be added on top during downstream adaptation. While we did not run experiments with second-order optimizers or other advanced PINN techniques, the benefits of HyPINO initializations are likely to carry over, as these strategies are complementary and largely independent of the initialization scheme.
[Q3:] Our training objective indeed combines both physics-informed and supervised terms when analytical solutions are available via MMS. Specifically, the training loss is $\mathcal{L} = \mathcal{L}_{\mathrm{res}} + \mathcal{L}_{\mathrm{D}} + \mathcal{L}_{\mathrm{N}} + \mathcal{L}_{\mathrm{Sob}}$ (up to the weighting factors detailed in Eq. 8), where $\mathcal{L}_{\mathrm{res}}$ is the residual loss over interior collocation points, $\mathcal{L}_{\mathrm{D}}$ and $\mathcal{L}_{\mathrm{N}}$ are the Dirichlet and Neumann boundary losses, and $\mathcal{L}_{\mathrm{Sob}}$ is a Sobolev supervised objective computing a Huber loss on function values, first derivatives, and second derivatives between the generated PINN and the analytical solution at sampled collocation points. This is detailed in Section 3.4 (Eq. 8). The Sobolev loss [2] was shown to significantly improve the accuracy and generalization of neural networks, including PINNs [3]. However, computing higher-order derivatives required for the Sobolev loss is typically computationally expensive and thus impractical in many applications. In our setting, the necessary derivatives are already obtained as part of the residual evaluation for the physics-informed loss, which allows us to incorporate the Sobolev loss without additional computational overhead.
[2] Czarnecki, Wojciech M., et al. "Sobolev training for neural networks." Advances in neural information processing systems 30 (2017).
[3] Son, Hwijae, et al. "Sobolev training for physics informed neural networks." arXiv preprint arXiv:2101.08932 (2021).
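A one-dimensional sketch of such a Sobolev-style supervised term, with all derivatives obtained via automatic differentiation, could look as follows (the weighting of Eq. 8 is omitted; names are illustrative):

```python
import torch
import torch.nn.functional as F

def sobolev_huber(pred_fn, exact_fn, pts):
    """Sketch of a Sobolev supervised objective: Huber loss on function
    values, first derivatives, and second derivatives (1D illustration)."""
    def derivs(f):
        x = pts.clone().requires_grad_(True)
        u = f(x)
        du = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
        d2u = torch.autograd.grad(du, x, torch.ones_like(du), create_graph=True)[0]
        return u, du, d2u

    loss = 0.0
    for p, e in zip(derivs(pred_fn), derivs(exact_fn)):
        loss = loss + F.huber_loss(p, e)
    return loss
```

The key point is that the derivative terms reuse exactly the quantities already computed for the residual loss, so the Sobolev term comes at essentially no extra cost.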
We thank the reviewer again for their comments and hope to have addressed their concerns satisfactorily. If so, we kindly ask the reviewer to upgrade their assessment accordingly; we remain at their disposal for any further questions about our paper.
Thank you for taking the time to address my questions. While your reply resolves some of my concerns, there are multiple points that remain unclear:
- Q1: I agree that the manuscript makes multiple contributions, ranging from an overall pipeline to data generation to iterative optimization. While I appreciate the diversity of these contributions, I believe that for a proper scientific evaluation of the individual components, ablation studies should be conducted. These include (but are not limited to):
- Use of MMS data: One contribution is the suggestion of the (classic) method of manufactured solutions for neural operators. However, an evaluation of other common neural operator approaches with such synthetic data is missing.
- Iterative refinement: This is an idea applicable to general neural operators (as you even explain in your reply). Hence, an evaluation of this approach for other types of neural operators would be very natural and is missing so far. Note that the current work on multi-stage networks follows a very similar paradigm of successive approximation. In these works, however, the networks at the individual stages are trained and not produced with a neural operator. Hence, I do not regard this as a competing line of work. However, as it is very close in spirit and received a considerable amount of attention within the scientific machine learning community, I believe it should be referenced.
Multi-stage neural networks: Function approximator of machine precision, Y Wang, CY Lai, Journal of Computational Physics, 2024
- Q2: I agree that the results regarding fine-tuning of HyPINOs look promising. However, for many PDEs, appropriate optimizers (natural gradients and Gauss-Newton methods seem to be the preferred choice) yield higher accuracy compared to the fine-tuned HyPINOs. To judge the potential of HyPINOs as an initialization strategy for PINNs, they have to be compared using the optimizers suitable for PINNs. Note that second-order optimizers have a vastly different optimization behavior, including their implicit bias; hence, it is unclear how the findings for first-order optimizers generalize. I believe that comparing fine-tuning with direct PINN optimization when both are done with a Gauss-Newton method would improve the overall quality of the manuscript. Note that Gauss-Newton methods are classic and easily implemented for networks of this size, with multiple implementations available online.
- Q3: Thank you for the clarification. I had initially overlooked that the Sobolev loss includes the ground truth data. Just a minor comment regarding your reply: The Sobolev loss used in your manuscript differs from the Sobolev loss for PINNs in [3].
Overall, I believe that the manuscript has potential with many different ideas, but the technical elaboration and scientific testing of the proposed methodologies should be improved. In particular, the two main methodologies proposed in the manuscript (the use of MMS for NOs and iterative refinement) lack a thorough evaluation and are only demonstrated for a specific neural operator.
[Q2, P1] Our initial choice to evaluate fine-tuning performance using the Adam optimizer was motivated by the fact that it is the most widely used optimizer in the PINN literature. To test whether HyPINO-initializations also benefit second-order optimization, we conducted additional fine-tuning experiments using L-BFGS, which we selected because of its broad adoption and ease of use within PyTorch. All runs used standard L-BFGS hyperparameters without tuning.
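For reference, the L-BFGS setup can be sketched with PyTorch's built-in optimizer and its closure interface. The objective below is a hypothetical toy regression stand-in, not our physics-informed loss, and the strong-Wolfe line search is added here only for stability of the sketch:

```python
import torch

torch.manual_seed(0)

# Toy stand-in for the fine-tuning stage: a small PINN-sized MLP is fit to a
# smooth target with L-BFGS; only the optimizer plumbing is the point here.
pinn = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
pts = torch.rand(256, 2)
target = torch.sin(pts.sum(dim=1, keepdim=True))

def mse():
    return torch.mean((pinn(pts) - target) ** 2)

opt = torch.optim.LBFGS(pinn.parameters(), line_search_fn="strong_wolfe")

def closure():
    # L-BFGS re-evaluates the loss several times per step via this closure
    opt.zero_grad()
    loss = mse()
    loss.backward()
    return loss

initial = mse().item()
for _ in range(5):   # five outer L-BFGS steps
    opt.step(closure)
final = mse().item()
```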
Table 1: Iterations required to reach HyPINO MSE (lower is better)
$$\begin{array}{|l|c|c|c|c|c|c|c|} \hline & \text{HT} & \text{HZ} & \text{HZ-G} & \text{PS-C} & \text{PS-L} & \text{PS-G} & \text{WV} \\ \hline \text{Random Init} & 4 & 20 & \text{N/A} & 36 & 34 & 11 & 35 \\ \text{Reptile Init} & 4 & 22 & 211 & 22 & 65 & 9 & 27 \\ \hline \end{array}$$

On PS-C and PS-L, Reptile requires 22 and 65 L-BFGS steps, respectively, to match HyPINO's starting error, while random initialization needs 36 and 34. On HZ-G, Reptile takes 211 steps, and random initialization never reaches HyPINO's initial accuracy.
Table 2: Final MSE after L-BFGS fine-tuning (lower is better)
$$\begin{array}{|l|c|c|c|c|c|c|c|} \hline & \text{HT} & \text{HZ} & \text{HZ-G} & \text{PS-C} & \text{PS-L} & \text{PS-G} & \text{WV} \\ \hline \text{Random Init} & 2.93\mathrm{e}{-9} & \mathbf{1.15\mathrm{e}{-7}} & 2.89\mathrm{e}{-1} & 3.18\mathrm{e}{-4} & 7.05\mathrm{e}{-5} & 5.69\mathrm{e}{-4} & 2.68\mathrm{e}{-2} \\ \text{Reptile Init} & 2.69\mathrm{e}{-9} & 2.18\mathrm{e}{-7} & 3.55\mathrm{e}{-2} & 9.34\mathrm{e}{-4} & 8.66\mathrm{e}{-5} & \mathbf{5.68\mathrm{e}{-4}} & \mathbf{3.80\mathrm{e}{-4}} \\ \text{HyPINO Init} & \mathbf{1.62\mathrm{e}{-9}} & 1.52\mathrm{e}{-7} & \mathbf{1.74\mathrm{e}{-2}} & \mathbf{8.19\mathrm{e}{-5}} & \mathbf{6.87\mathrm{e}{-5}} & 5.69\mathrm{e}{-4} & 1.94\mathrm{e}{-2} \\ \hline \end{array}$$

The second table shows that HyPINO also performs well after full L-BFGS fine-tuning. It achieves the lowest final MSE on four benchmarks (HT, PS-C, PS-L, and HZ-G), and is competitive on PS-G. Only on WV, Reptile achieves the best result, and on HZ, random initialization slightly outperforms HyPINO. These differences are especially meaningful given the high cost of L-BFGS iterations and show that HyPINO offers an effective initialization. Should the manuscript be accepted, we would be happy to include these tables in the appendix, accompanied by a brief discussion and the corresponding convergence plots. Due to NeurIPS' new rebuttal format, we are unfortunately unable to provide them directly in this response.
We believe that, with the reviewer’s constructive feedback, the manuscript is substantially strengthened. Specifically, we have conducted experiments on iterative PINO refinement and L-BFGS fine-tuning, included a reference to multi-stage networks, pointed to comparisons with baselines trained on our MMS dataset, and clarified the role of the Sobolev loss. We hope this effectively addresses the reviewer’s concerns, demonstrates the scientific relevance of our work (in line with the positive evaluations from the other reviewers), and merits a higher evaluation.
Thank you for your extensive answers, including the clarification that all methods were trained with synthetic data. I have adjusted my score slightly, but am still not recommending acceptance. I believe that in the field of physics-informed learning, a careful, principled development and testing of methodologies is much more valuable than reporting good performance for a very specific pipeline. The manuscript makes changes to many parts of the methodology without carefully investigating the effect of the individual parts, thereby limiting the insight for the development of other methods.
We sincerely thank the reviewer for their time and thoughtful feedback throughout the process, and we value the score increase.
We regret the statement that “the manuscript makes changes to many parts of the methodology without carefully investigating the effect of the individual parts.” Unfortunately, the reviewer does not specify which parts they are referring to. Had these parts been explicitly identified, we might have been able to clarify potential misunderstandings, such as the initial misconceptions regarding the Sobolev loss and whether baselines were trained on our synthetic dataset, or to provide additional results and ablations, such as the iterative refinement and L-BFGS fine-tuning experiments, which were conducted specifically in response to the reviewer’s comments.
In the greater scheme, we appreciate critical yet constructive reviews, as they help make manuscripts stronger. We believe this has been the case here, with the feedback directly leading to clarifications, additional experiments, and a further improvement of our work.
[Q1, P1] We thank the reviewer for raising this point and would like to clarify that all baseline models were trained or fine-tuned on our MMS-generated dataset. As detailed in Section 4.2:
- U-Net, which uses the same encoder as HyPINO but replaces the hypernetwork with a convolutional decoder, was trained from scratch using MMS-labeled data and supervised loss.
- Poseidon, a pretrained neural operator, was fine-tuned on MMS-labeled data, also with a purely supervised objective.
- PINO was trained from scratch using the same hybrid objective (supervised + physics-informed) and curriculum as HyPINO.
Model performances are reported in Table 1 (Section 4.3). We hope this clarifies that the evaluation of established neural operators on our MMS-generated data is not only included, but a core part of our experimental design.
We would also like to comment on the "(classic) method of manufactured solutions" remark, which implicitly conveys that the MMS-based part of our work is not novel or not a significant contribution. While it is true that MMS is a well-established technique for verifying the correct implementation of numerical solvers and their convergence behavior, our contribution is to show that MMS can be used not just for validation, but for training neural operators capable of generalizing across a broad range of 2D PDEs, including linear elliptic, parabolic, and hyperbolic equations with mixed boundary conditions and complex geometries, while also being able to quantify the solution error. This required a substantial effort in building a scalable and diverse data-generation pipeline that produces realistic samples and ensures that the resulting neural operator is not only effective on synthetic data, but also capable of transferring to real-world PDE benchmarks, as demonstrated in our evaluations. The data-generation pipeline will be made public so that other researchers can generate datasets for training and evaluating their physics-informed neural operators.
[Q1, P2] We agree that applying the residual-based refinement strategy to other physics-informed neural network designs is an interesting exercise. Therefore, we ran an additional experiment where we applied the same refinement strategy to PINO. The table below shows the performance across benchmarks, where superscripts indicate the number of refinement steps:
$$\begin{array}{|c|c|c|c|c|c|c|c|} \hline \text{Model} & \text{HT} & \text{HZ} & \text{HZ-G} & \text{PS-C} & \text{PS-L} & \text{PS-G} & \text{WV} \\ \hline \text{PINO} & 1.38\times 10^{-2} & 2.02\times 10^{-2} & 6.09\times 10^{-2} & 1.70\times 10^{-1} & \mathbf{3.31\times 10^{-3}} & 3.05\times 10^{-1} & \mathbf{2.95\times 10^{-1}} \\ \text{PINO}^{3} & \mathbf{1.31\times 10^{-2}} & 7.22\times 10^{-3} & \mathbf{4.61\times 10^{-2}} & 2.83\times 10^{-2} & 4.64\times 10^{-3} & 2.33\times 10^{-2} & 3.07\times 10^{-1} \\ \text{PINO}^{10} & 3.90\times 10^{-2} & \mathbf{5.09\times 10^{-3}} & 1.44\times 10^{-1} & \mathbf{1.05\times 10^{-2}} & 1.01\times 10^{-1} & \mathbf{1.76\times 10^{-2}} & 8.53\times 10^{-1} \\ \hline \end{array}$$

We observe consistent improvements with refinement, especially on HZ, PS-C, and PS-G. The drop in performance on PS-L is similar to what we saw with HyPINO and is likely related to the small target values. We plan to explore these findings further in future work.
Overall, these results support the idea that residual-based refinement is applicable to physics-informed neural operators in general and not limited to our specific architecture. If accepted, we would be happy to include this table and a short discussion in the appendix.
We also thank the reviewer for pointing out the connection to multi-stage networks. While, as mentioned by the reviewer, those methods train each stage independently, both approaches use iterative, residual-based improvements. We will briefly discuss this line of work and add a citation in Section 3.5.
This paper proposes a novel neural operator architecture that integrates PDE residual loss into the training process, enabling self-supervised learning using only initial and boundary conditions. Inspired by HyperNetworks and Swin Transformers, it introduces a modular and scalable design with enhanced generalization and reduced reliance on labeled PDE solutions. A differentiable sampling strategy allows the model to backpropagate through PDE solvers to generate supervision signals. The method achieves strong performance across several multi-physics PDE benchmarks, outperforming baseline neural operators with significantly fewer training samples.
Strengths and Weaknesses
The paper presents a novel neural operator framework that combines architectural innovations with a self-supervised PDE-informed training scheme. Its use of PDE residual loss to generate supervision signals from initial and boundary data significantly reduces dependence on labeled PDE solutions, a major practical advancement. The integration of ideas from HyperNetworks and Swin Transformers leads to a scalable and modular design with strong empirical generalization. The method is validated on a diverse set of benchmark PDE problems, demonstrating superior performance with limited data. The paper is clearly written, technically sound, and makes a strong case for its significance and originality.
While the contributions are substantial, several concerns merit further clarification and improvement:
- Resolution Equivariance Unclear: A core property of neural operators is resolution equivariance, the ability to generalize across discretizations. However, the architecture incorporates Swin Transformers, which operate over fixed local windows. It is unclear whether NEMO can generalize across resolutions, and no experiments directly evaluate this. If the model is not equivariant, it would be more accurate to frame it as a physics-informed transformer rather than a neural operator.
- Training Efficiency and Complexity: The differentiable sampling approach involves backpropagation through PDE solvers, which could be computationally intensive. A more detailed analysis of training time and computational requirements would help assess the method’s practicality.
- Ablation and Component Justification: While the architectural design is rich, the paper lacks deeper ablation studies. For instance, what is the contribution of HyperNetwork-style modulation versus Swin-style self-attention? How critical is the differentiable sampling step versus simply adding residual supervision at predefined grid points?
- Missing Citations: The paper overlooks relevant prior work in PDE pretraining and transformer-based operators. Notable omissions include MPP, DPOT, and PDE-Former (for physics-based pretraining), as well as OFormer, GNOT, UPT, and Transolver (for transformer-based operators). These omissions weaken the related work discussion and make it difficult to fully assess the novelty in context.
Questions
- Resolution Equivariance: Can the method generalize to different spatial resolutions at inference time? Since Swin Transformers rely on fixed-window self-attention, resolution equivariance (a core property of neural operators) may not hold. Please clarify and, if possible, include supporting experiments or theoretical discussion.
- Ablation Studies: The architecture includes both HyperNetwork modulation and Swin-style attention. Could the authors provide ablation results isolating the effect of each component on performance and generalization?
- Computational Overhead: Backpropagating through PDE solvers may introduce significant training costs. Please quantify the runtime and memory overhead compared to standard supervised neural operators.
- Related Work: Consider citing and discussing relevant works such as MPP, DPOT, PDE-Former, OFormer, and Transolver to better contextualize the novelty and contribution of the method.
Limitations
Yes
Final Justification
The authors have addressed my concerns, and I will keep my score.
Formatting Issues
No.
We thank the reviewer for taking the time to carefully evaluate our work and for the constructive feedback. We appreciate your insights and address your points below.
[Q1:] We agree that resolution-equivariance is a key property for neural operators. While the output of HyPINO is a continuous PINN that can be evaluated at arbitrary spatial coordinates, the input PDE parametrization (source function and boundary masks/values) is discretized on a fixed-size grid (224×224) to match the Swin Transformer’s input resolution. Following prior work [1], this limitation can be mitigated by demonstrating test-time resolution invariance: we vary the input grid resolution and resize the input to 224×224. We performed this ablation on the Helmholtz benchmark (HZ) by changing the source function resolution between 28 and 448:
$$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|} \hline \text{Grid Size} & 28 & 56 & 96 & 112 & 140 & 168 & 196 & 224 & 280 & 336 & 392 & 448 \\ \hline \text{SMAPE} & 38.04 & 35.78 & 35.91 & 36.00 & 36.05 & 36.05 & 36.05 & 36.04 & 36.05 & 36.03 & 36.04 & 36.04 \\ \hline \end{array}$$

Between resolutions of 56 and 448, SMAPE varied by less than 0.3, which shows approximate invariance. Only at very coarse resolutions (28×28) does the performance start to deteriorate. This experiment can be added to the appendix.
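The resizing protocol can be sketched as follows (bilinear interpolation and the function name are assumptions for illustration, not our stated implementation):

```python
import torch
import torch.nn.functional as F

def to_encoder_resolution(field, size=224):
    """Sketch of the resolution-invariance protocol: a source field sampled
    on an arbitrary grid is resized to the Swin encoder's fixed 224x224
    input resolution before inference."""
    x = field[None, None]  # (H, W) -> (1, 1, H, W) for F.interpolate
    out = F.interpolate(x, size=(size, size), mode="bilinear", align_corners=False)
    return out[0, 0]
```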
[1] Herde, Maximilian, et al. "Poseidon: Efficient foundation models for pdes." Advances in Neural Information Processing Systems 37 (2024): 72525-72624.
[Q2:] In HyPINO, Swin blocks are used to encode the discretized inputs (i.e., the source fields, domain geometry, and boundary conditions) into a latent representation. To condition this representation on the PDE operator, we apply FiLM layers after each Swin block. These layers modulate the intermediate latent features conditioned on the PDE coefficients. This is a lightweight and common mechanism for global conditioning, consistent with prior work in neural operators [1], and better suited than cross-attention in our setting, because the conditioning signal is a 1D vector rather than a sequence of tokens.
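A minimal sketch of such a FiLM layer (dimensions and the class interface are illustrative, not the paper's exact design):

```python
import torch

class FiLM(torch.nn.Module):
    """Sketch of FiLM conditioning: a linear head maps the 1D PDE-coefficient
    vector to per-channel scale and shift applied to the latent features."""
    def __init__(self, cond_dim, channels):
        super().__init__()
        self.proj = torch.nn.Linear(cond_dim, 2 * channels)

    def forward(self, feats, cond):
        # feats: (B, C, H, W); cond: (B, cond_dim)
        gamma, beta = self.proj(cond).chunk(2, dim=-1)
        return feats * (1 + gamma[:, :, None, None]) + beta[:, :, None, None]
```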
In the decoder, we adopt a hypernetwork-style design: pooled latent features are passed through MLPs to generate the weights and biases of the target PINN.
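The decoder's output format can be sketched as follows: a flat parameter vector is sliced into the weights and biases of the small target PINN and evaluated functionally. The plain-MLP structure here is a simplification (the actual target network uses a Fourier-feature embedding and multiplicative skip connections), and the helper name is hypothetical:

```python
import torch

def make_pinn_forward(theta, widths=(2, 32, 32, 32, 1)):
    """Sketch of a hypernetwork-generated PINN: slice the flat parameter
    vector theta into layer weights/biases of a small tanh MLP (here 3
    hidden layers of width 32) and return a functional forward pass."""
    layers = list(zip(widths[:-1], widths[1:]))

    def pinn(x):
        h, i = x, 0
        for j, (d_in, d_out) in enumerate(layers):
            W = theta[i:i + d_in * d_out].reshape(d_out, d_in)
            i += d_in * d_out
            b = theta[i:i + d_out]
            i += d_out
            h = h @ W.T + b
            if j < len(layers) - 1:  # tanh on hidden layers only
                h = torch.tanh(h)
        return h

    return pinn
```

Because `theta` is produced by the hypernetwork, gradients of the residual flow through this lightweight forward pass and only reach the Swin encoder once per batch.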
The Swin encoder, the FiLM-conditioning, and the hypernetwork decoder serve complementary roles, which makes isolated ablation of individual components less straightforward. However, while designing the architecture, we experimented with different Transformer backbones, MLP widths, and activation functions. We found these architectural choices to have relatively minor influence on overall performance compared to the quality of the dataset and the choice of loss functions, such as the addition of the physics-informed loss, whose effectiveness is evidenced by the superior performance of HyPINO over the UNet baseline that shares the same encoder. We therefore mostly adopted standard architectural settings from prior work.
[Q3:] We would like to emphasize that HyPINO does not backpropagate through any external numerical PDE solver. Our framework is entirely neural: HyPINO is a hypernetwork that maps a PDE specification to the weights of a small target PINN. This PINN approximates the PDE solution as a continuous function over the domain. Residuals are computed by applying the differential operator to the PINN via automatic differentiation, and the training objective is to minimize these residuals.
At inference time, our residual-based refinement strategy also avoids solver-level differentiation. We compute the residual of the current PINN and pass it through the hypernetwork in forward mode only to obtain a corrective delta-PINN. The updated prediction is then formed by summing the PINN outputs. Only the small PINNs are differentiated to compute residuals whereas the hypernetwork remains frozen.
Regarding computational costs, to obtain the loss of a predicted PINN, we evaluate it on a set of collocation points and compute the necessary first- and second-order derivatives to obtain the residuals. These higher-order derivatives are computed efficiently via automatic differentiation of the small PINN only (3 layers, width 32). The resulting residual loss is then backpropagated to update the hypernetwork. Importantly, this gradient only flows through the Swin-based encoder at the final step after residuals have been computed. Thus, while residuals require evaluating derivatives up to second order, the computationally expensive hypernetwork is only differentiated once per batch.
[Q4:] Thank you for pointing out these relevant works. We will include and discuss them in the related works section of the final version.
This paper proposes a transformer-based hypernetwork architecture for predicting the solutions of PDEs with a target main network, the weights of which are generated by a hypernetwork that takes as input the PDE coefficients, source function, and boundary functions, coded as matrices. The training data contains only manufactured solutions and physics information, and in particular contains no expensive simulations. The target main network is an MLP with Fourier-feature embedding and multiplicative skip connections. An important limitation of the method is that currently it can only handle simple linear PDEs (up to second order).
Strengths and Weaknesses
Strengths:
- The training data contains no expensive simulations but exploits manufactured solutions and physics-regularization.
- The construction of the training data set is very general, allowing source terms, Dirichlet and Neumann boundary conditions, and complex geometries.
- The main network is a small PINN, which is therefore efficient and meshless.
- The resulting main network can be further fine-tuned.
- The use of analytical solutions allows the inclusion of a Sobolev loss term, which as far as I know, is novel in the physics-informed machine learning area.
Weaknesses:
- The main obvious limitation is the low-order linear PDE constraint.
- It is difficult to set up a fair comparison against the Poseidon and PINO approaches, as they use different training data, and model size and training time need to be taken into account. Therefore, the results in Table 1 need to be considered with care.
- The iterative refinement process relies on superposition, so it can't be extended to nonlinear PDEs.
- The novelty in the architecture itself is low, as both hyperPINNs and Swin transformers are known. But this is a minor weakness.
Questions
The method appears extensible to nonlinear PDEs, except perhaps for the residual-driven refinement process. Was this attempted with unsatisfactory results? The restriction to low-order linear PDEs is the reason I didn't give 4 to quality.
The way that the binary mask for the boundary conditions is generated needs to be better explained; e.g., I couldn't understand what this means: "a value of 1 to the four grid cells closest to each boundary point and zero elsewhere".
What constitutes a "neural operator" is a bit controversial. Theoretically, they act on infinite-dimensional function spaces, but in practice, they need to be discretized. Approaches in the literature say that a true neural operator must be "discretization-invariant," or "discretization-convergent." Could the authors comment on this issue and how this affects their approach? Have different discretizations (grid sizes) at the input of the hypernetwork been tried at training and testing times?
Limitations
Yes.
Final Justification
I am overall satisfied with the authors' responses and will keep the score as is.
Formatting Issues
None noted.
We thank the reviewer for taking the time to carefully evaluate our work and for the constructive feedback. We appreciate your insights and address your points below.
[W2:] All baselines were fine-tuned on our synthetic dataset using a similar training budget (same number of steps, batch size, and optimizer settings). While the UNet and Poseidon are trained without the physics-informed objective, PINO was trained with the same settings as HyPINO. We agree that the FNO architecture in PINO was never intended for the type of PDE parametrization used in this work (i.e., binary indicator grids for the boundaries), which might make it less effective than a Swin Transformer. Conversely, HyPINO must predict a full set of PINN weights, which is more challenging than predicting a solution grid. We acknowledge the difficulty of direct comparison in the evaluation Section 4.3 on line 250.
[W3:] It is correct that the iterative refinement process relies on linearity. That said, some non-linear PDEs can be linearized, making the method also applicable to those.
[Q1:] Our framework can be extended to 3D, non-linear, and higher-order PDEs with only minor modifications, such as adding non-linear terms to the set of selectable terms when sampling the differential operator, or adding a third variable and changing the dimensionality of the inputs to the Swin Transformer and target PINNs. However, more computational resources would be necessary to train such a model. We want to highlight that, to the best of our knowledge, HyPINO is the first neural operator trained on such a diverse set of linear, 2D PDEs with mixed Dirichlet and Neumann boundary conditions on complex domains.
While our current framework focuses on linear PDEs, this still captures a broad and impactful class of PDE problems in engineering and the natural sciences, often sufficient for many modelling applications or as first steps into non-linear modelling approaches. Moreover, we have experimented with linearization techniques (e.g. the Cole–Hopf transformation for the 1D Burgers equation) as a practical extension strategy, which enables HyPINO to handle certain nonlinear PDEs. However, we agree this work should trigger novel research directions, such as including non-linear examples directly in the training pipeline.
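For reference, the Cole–Hopf substitution mentioned above maps the viscous Burgers equation onto the linear heat equation, which falls within the class of PDEs the method handles:

```latex
u_t + u\,u_x = \nu\,u_{xx}
\quad\xrightarrow{\;u \,=\, -2\nu\,\phi_x/\phi\;}\quad
\phi_t = \nu\,\phi_{xx}
```

Solving the linear problem for $\phi$ and transforming back recovers the nonlinear solution $u$.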
[Q2:] Indeed, the part about constructing the binary mask for the boundary conditions could use additional information. We sample continuous boundary points directly on the CSG-defined shapes and need to convert them into a grid-based discretization for the Swin Transformer. For each point on the boundary, we find the grid cell it lies in and set the four corner points of that cell to 1 in the corresponding binary mask (indicating that the boundary passes through this region); the prescribed boundary values are stored at the same locations in the associated value grids, separately for Dirichlet and Neumann conditions. This can be clarified in the final version.
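A minimal rasterization sketch of this scheme (hypothetical names; the grid-indexing conventions are assumptions, not the paper's code):

```python
import numpy as np

def boundary_masks(points, values, n=224):
    # Rasterize continuous boundary points (coordinates in [0, 1]^2) onto an
    # n x n grid: each point activates the four corner nodes of the cell it
    # falls into, and the boundary value is stored at the same nodes.
    mask, vals = np.zeros((n, n)), np.zeros((n, n))
    for (x, y), v in zip(points, values):
        i = min(int(x * (n - 1)), n - 2)  # cell index along x
        j = min(int(y * (n - 1)), n - 2)  # cell index along y
        mask[i:i + 2, j:j + 2] = 1.0      # the cell's four corner nodes
        vals[i:i + 2, j:j + 2] = v
    return mask, vals

# A point strictly inside one cell activates exactly four grid nodes:
mask, vals = boundary_masks([(0.3001, 0.7001)], [5.0])
```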
[Q3:] We agree that discretization-invariance is a key property for neural operators. While the output of HyPINO is a continuous PINN that can be evaluated at arbitrary spatial coordinates, the input PDE parametrization (source function and boundary masks/values) is discretized on a fixed-size grid (224×224) to match the Swin Transformer's input resolution. Following prior work [1], we can nevertheless demonstrate approximate test-time resolution invariance by varying the input grid resolution and resizing it to 224×224. We performed this ablation on the Helmholtz benchmark (HZ) by varying the source-function resolution between 28 and 448:
| Grid Size | 28 | 56 | 96 | 112 | 140 | 168 | 196 | 224 | 280 | 336 | 392 | 448 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SMAPE | 38.04 | 35.78 | 35.91 | 36.00 | 36.05 | 36.05 | 36.05 | 36.04 | 36.05 | 36.03 | 36.04 | 36.04 |

Between resolutions of 56 and 448, SMAPE varied by less than 0.3, which shows approximate invariance. Only at the very coarse resolution (28×28) does performance start to deteriorate. This experiment can be added to the appendix.
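The resizing step in this ablation can be sketched as follows (a minimal hand-rolled bilinear resize; the actual experiment may use a library routine):

```python
import numpy as np

rng = np.random.default_rng(0)

def resize_bilinear(grid, out=224):
    # Map a square source-function grid of any resolution to the fixed
    # 224 x 224 hypernetwork input via separable bilinear interpolation.
    n = grid.shape[0]
    src = np.linspace(0.0, n - 1.0, out)
    i0 = np.clip(src.astype(int), 0, n - 2)
    w = src - i0
    rows = grid[i0] * (1.0 - w)[:, None] + grid[i0 + 1] * w[:, None]
    return rows[:, i0] * (1.0 - w)[None, :] + rows[:, i0 + 1] * w[None, :]

coarse = rng.standard_normal((28, 28))   # a very coarse source function
fine = resize_bilinear(coarse)           # shape (224, 224)
```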
[1] Herde, Maximilian, et al. "Poseidon: Efficient foundation models for pdes." Advances in Neural Information Processing Systems 37 (2024): 72525-72624.
I am overall satisfied with the rebuttal by the authors, but I am disappointed that the reference below, which came to my attention after I had written my review, was not cited or mentioned in the related works section:
Jae Yong Lee, SungWoong Cho, Hyung Ju Hwang, "HyperDeepONet: learning operator with complex target function space using the limited resources via hypernetwork", ICLR 2023.
I think the authors should include this reference and position their contribution in reference to that. I would like to keep the score as is.
We thank the reviewer for their time, careful review, and constructive feedback. We also appreciate the pointer to the relevant work, which we will include in the related works section.
This paper uses a hyper-network (based on the SwinTransformer Architecture) that predicts the parameter of a PINN architecture for different PDEs. This is a new way to combine operator learning and PINNs where the operator learning part is really a hyper-network that predicts the parameters of PINN for a given PDE.
The authors show that their methodology of first predicting the parameters of a PINN, rather than directly training a PINN from scratch, outperforms PINN-based baselines. The authors further provide a delta trick (where the error between the proposed solution and the true solution is defined as another PDE whose solution is again predicted as a PINN) that iteratively reduces the error of HyPINO over datasets, often achieving ~100x lower L2 error on the 1D heat equation and Helmholtz equation compared to other operator-based baselines.
Most of the experiments performed by the authors are on time-independent PDEs, and hence the authors are also able to use the method of manufactured solutions to generate synthetic training data for the model.
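The method of manufactured solutions can be sketched for the Poisson equation: choose a solution analytically, then read off the matching source term by differentiation (the paper automates this step with automatic differentiation; the concrete solution here is an illustrative choice, not one from the paper). The exact derivatives also make a Sobolev-type loss possible:

```python
import numpy as np

# Manufactured solution for the Poisson problem -lap(u) = f on [0, 1]^2:
def u_true(x, y):
    return np.sin(np.pi * x) * np.sin(np.pi * y)

def f_source(x, y):
    # f = -lap(u) = 2 pi^2 sin(pi x) sin(pi y)
    return 2.0 * np.pi**2 * u_true(x, y)

def grad_u_true(x, y):
    # Exact gradient, available "for free" and usable in a Sobolev-type loss.
    return (np.pi * np.cos(np.pi * x) * np.sin(np.pi * y),
            np.pi * np.sin(np.pi * x) * np.cos(np.pi * y))

# One supervised training pair: discretized source (input), exact solution (target).
xs = np.linspace(0.0, 1.0, 224)
X, Y = np.meshgrid(xs, xs, indexing="ij")
source_grid, target_grid = f_source(X, Y), u_true(X, Y)
```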
Strengths and Weaknesses
I think that this is quite an interesting work, where the authors are able to compete with operator-based baselines such as PINO, and at the same time retain a key property of a PINN, i.e., the ability to evaluate the solution at any point in the domain.
Furthermore, the delta PINN trick of progressively improving the solution of the PDE is quite interesting.
A potential negative here is that the experiments are mostly shown for steady-state PDEs, and it would be interesting to see how the method performs for time-dependent dynamics.
Furthermore, it will be interesting to see how well the work does on 3D PDEs, since, if the methodology scales, it could substantially bring down compute costs. However, I think that the paper introduces an interesting methodology.
A few baselines and works to consider (that at least should be cited, since they are relevant) are:
- DISCO: learning to DISCover an evolution Operator for multi-physics-agnostic prediction
- Deep Equilibrium Based Neural Operators for Steady-State PDEs
I think the former is very similar in approach in that it tries to use a hyper-network to compress the dynamics of the model, and the latter is a neural operator for steady-state PDEs with a similar data-generation methodology.
Questions
From the way I understand it, the hyper-network takes as input a grid-based representation (since it is a Swin Transformer). However, is the PINN also evaluated on the same grid points, or on randomly sampled points in the domain? I think this is a bit unclear, and it should be made more explicit in the methods section.
Limitations
Yes
Final Justification
I will keep my score since the authors have addressed most of my concerns in the rebuttal. I think that this is an interesting paper and methodology that can be useful for the community.
Formatting Issues
No
We thank the reviewer for taking the time to carefully evaluate our work and for the constructive feedback. We appreciate your insights and address your points below.
[W1:] While it is true that most of the experiments focus on steady-state PDEs, our evaluation also includes the 1D wave equation (WV), a time-dependent hyperbolic PDE. HyPINO demonstrates competitive zero-shot performance on this benchmark (MSE: 2.9e-1), outperforming UNet and Poseidon, and matching PINO.
[W2:] The reviewer is correct in stating that the current work is limited to PDEs in (at most) two space dimensions. However, our framework is very general and can be readily extended to 3D PDEs, which we aim to consider in future work.
[W3:] Thank you for highlighting these relevant works. We will include and discuss them in the related works section of the final version.
[Q1:] Indeed, the choice and sampling procedure of the collocation points used to evaluate the generated PINNs could be explained further. The target PINNs are evaluated on 4,096 randomly sampled points, where 2,048 lie within the domain and 2,048 are sampled on the boundaries. The reasons for this are twofold: first, sampling on the entire grid (over domain and boundary) would yield 224×224 collocation points, which would not fit into memory. Second, by sampling explicitly on the boundary, we ensure better coverage and a stronger training signal for Dirichlet and Neumann boundary conditions. If we relied only on points from the sampled grid, very few would lie exactly on the boundary, especially for complex shapes, which could lead to weak or noisy training signals for the boundary conditions. This sampling process will be clarified and detailed further in a camera-ready version of the paper, if accepted.
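A minimal sketch of such a sampling scheme on the unit square (illustrative only; in the paper, boundary points are drawn from CSG-defined shapes rather than a square):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_collocation(n_interior=2048, n_boundary=2048):
    # Interior points: uniform in the unit square.
    interior = rng.uniform(0.0, 1.0, size=(n_interior, 2))
    # Boundary points: pick one of the four edges, then a position along it,
    # so the boundary loss always gets points exactly on the boundary.
    t = rng.uniform(0.0, 1.0, size=n_boundary)
    edge = rng.integers(0, 4, size=n_boundary)
    x = np.select([edge == 0, edge == 1, edge == 2], [t, t, 0.0], default=1.0)
    y = np.select([edge == 0, edge == 1, edge == 2], [0.0, 1.0, t], default=t)
    return interior, np.stack([x, y], axis=1)

interior, boundary = sample_collocation()
```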
(1) Summary of this work: This paper presents HyPINO, a hyper-network (based on the Swin Transformer architecture) that predicts the parameters of a PINN architecture for different PDEs. To enhance generalizability, HyPINO is optimized with data generated by MMS together with unlabeled data trained through a physics-informed loss. Besides, an iterative refinement procedure is presented to fine-tune the generated PINN parameters for new tasks. HyPINO demonstrates strong performance on diverse PDEs.
(2) Strengths and weaknesses: Although the HyPINN framework itself is not new, this paper presents a good unification of MMS-generated data and PINNs. The iterative refinement procedure is interesting. I think the whole framework can be inspiring for future exploration of generalizable PINNs. One main concern, as pointed out by Reviewer 7tHo, is that this paper lacks a detailed ablation study, since there are manual and unexplained design choices, such as the Swin Transformer.
(3) Summary of rebuttal: During the rebuttal, the authors have provided a detailed analysis of optimizers, as well as of the resolution-invariance property. Although the novelty of using MMS for PINO is questioned by one reviewer, I think introducing MMS to this community is valuable, and the authors have provided a reasonable way to utilize the generated data.
Final decision: This paper presents a promising and effective framework for enabling generalizable PINNs, which has been highly acknowledged by 3 of 4 reviewers. After carefully checking all the reviews and discussions, I think the authors have provided sufficient evidence to address the reviewers' concerns. Thus, I recommend acceptance (spotlight).