PINNs with Learnable Quadrature
Abstract
Reviews and Discussion
This paper proposes an enhancement to physics-informed neural networks (PINNs) by introducing learnable quadrature rules. Specifically, the weight function in the quadrature rule is parameterized by a neural network. The loss function includes regularization to discourage small weights and enforces that the learned quadrature weights sum to the integral of the weight function. The method shows improved empirical performance on several 1D benchmark problems. The authors also extend their method to train neural operators without using numerical solutions.
Strengths and Weaknesses
Strengths
- The method is compared against many existing methods and shows empirical success.
- The approach is well motivated and takes a principled view of adaptive quadrature.
Weaknesses
- All examples are 1D in space.
- The presentation lacks clarity in several respects. The notation is confusing and used inconsistently: for example, in Equations (2) and (10), in Algorithm 1, and for the solution function in Figure 4. In addition, the corresponding definition in Algorithm 1 is unclear.
Questions
- How does the learned weight function compare to classical choices such as Gauss-Jacobi weight functions? Please provide plots or examples of the learned weight functions for illustration.
- Equation (30) is unclear: it is not evident which parameters are trainable. Are the quadrature weights themselves learnable? It seems that, for any given weight function, the weights and nodes are computed using a numerical procedure described in Section 5.
- The paragraph on the interlacing of roots of orthogonal polynomials is confusing. What exactly does "overfitting" mean in this context? A plausible failure mode would be the learning of a trivial weight function, which seems to motivate the penalty term in Equation (20). However, the paper states: "By utilizing quadrature nodes and weights stemming from varying degrees of orthogonal polynomials (all of which correspond to the same weight function being learned), we introduce the desired stochasticity to prevent over-fitting." This seems to be a key component of the algorithm, but it is unclear how it is implemented or integrated into the training process.
Limitations
Yes.
Justification for Final Rating
The rebuttal addressed the major issues on clarity, including inconsistent notation and the explanation of the interlacing of roots. The planned revision, especially the visualization of the learned weight function, will help readers better understand the method. The method remains evaluated only on 1D problems. The approach is technically sound, well motivated, and shows consistent empirical improvement over baselines.
Formatting Issues
N/A
We thank the reviewer for the constructive and insightful feedback. We address each point below and will incorporate the relevant clarifications and improvements into the final version of the paper.
Q1. Scope of Experiments: Why are all examples 1D in space?
Thanks for bringing this up. Our primary intent was to show the effectiveness of LearnQuad on benchmark problems used in prior PINN literature. As mentioned in line 291, we evaluated our method on standard PDEs from several baselines to enable direct comparison. That said, our formulation (Remark 4.1) can naturally be extended to higher-dimensional settings via tensor-product or sparse-grid constructions. We have checked that, in principle, the implementation works for simple cases, but these extensions need to be developed in a separate follow-up work. We are happy to include additional results in the final version if the reviewer recommends a specific benchmark PDE.
Q2. Notation clarity: confusion around the symbols and their roles in Algorithm 1 and Figure 4.
Many thanks for pointing this out. Yes, there was some notational inconsistency in the original draft. We have revised the text and figures to consistently denote:
: parameters of the weight function
: parameters of the solution function
In Algorithm 1, the line in question contained a typo and was intended to show that the weight function may optionally depend on the PDE parameters. However, this caused confusion since the dependence was not carried forward consistently, so we have removed the line. Figure 4 has been revised to indicate more clearly which parameters govern the quadrature module and which govern the solution function.
Q3. How does the learned weight function compare to classical ones like Gauss-Jacobi? Can you show plots or examples?
This is a nice suggestion. We observe that the learned weight functions differ substantially from classical choices. While Gauss-Jacobi weights have fixed singularities at the endpoints, our learned modifier dynamically adapts the weight distribution based on the characteristics of the PDE solution. Empirically, we find that the collocation points tend to cluster in high-density regions and are re-distributed as training proceeds, as desired. We will include representative plots comparing learned versus classical weights and their induced node distributions in the final version to illustrate these differences clearly.
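For concreteness, the sketch below (illustrative only: the tiny network, its parameters, and the α = β = 1 values are placeholders, not our actual implementation) contrasts a classical Gauss-Jacobi weight with a version multiplied by a learnable positive modifier, which is the kind of learned weight we will plot.

```python
import numpy as np

# Illustrative sketch (not the paper's implementation): a classical Gauss-Jacobi
# weight on (-1, 1) versus a "modulated" weight in which a small network rescales
# it pointwise. The network shape and parameters below are placeholders.

def jacobi_weight(x, alpha=1.0, beta=1.0):
    """Classical Gauss-Jacobi weight w(x) = (1 - x)^alpha * (1 + x)^beta."""
    return (1.0 - x) ** alpha * (1.0 + x) ** beta

def modulated_weight(x, params, alpha=1.0, beta=1.0):
    """Jacobi weight times a positive, learnable modifier (a one-hidden-layer MLP)."""
    W1, b1, W2, b2 = params
    h = np.tanh(np.outer(x, W1) + b1)         # hidden layer with tanh activation
    modifier = np.exp(h @ W2 + b2).ravel()    # exp keeps the modifier strictly positive
    return jacobi_weight(x, alpha, beta) * modifier

rng = np.random.default_rng(0)
params = (rng.normal(size=8), rng.normal(size=8),
          rng.normal(size=(8, 1)), np.zeros(1))
x = np.linspace(-0.99, 0.99, 5)
print(jacobi_weight(x))             # fixed classical weight
print(modulated_weight(x, params))  # weight reshaped by the (here random) modifier
```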
Q4. Equation (20) and clarity about trainable parameters: Are the quadrature weights themselves learnable?
We agree that Equation (20) could be better explained. Yes, your interpretation is correct: only the weight function (parameterized via a neural network) is learnable. The quadrature weights and nodes are computed numerically from the learned weight function using asymptotic expansions (the procedure described in Section 5); they are not directly learned. We have revised the text to clearly emphasize that the model learns the weight-function modulator, from which the quadrature rules are derived numerically.
Q5. Clarification on “interlacing of roots” and how it mitigates overfitting. What does “overfitting” mean here?
We appreciate the question and will clarify this point. A given weight function induces a family of orthogonal polynomials, one for each degree. LearnQuad uses the roots of these polynomials, computing them via asymptotic expansions. While the weight function is being learnt, we observed that sampling only the roots of a fixed degree led to local minima, and the solution function did not generalize well to unseen points in the domain. To mitigate this, we leverage a classical property of orthogonal polynomials: the roots of polynomials of consecutive degrees interlace, as shown in Figure 3. Hence, sampling roots from different degrees in different epochs leads to drastically better generalization of the learnt solution. We thank the reviewer for pointing this out and have updated the manuscript to clearly describe this mechanism, its theoretical motivation, and its empirical benefit to generalization.
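The interlacing property itself can be checked directly; below is a small self-contained sketch (illustrative only: it uses SciPy's classical Gauss-Jacobi routine with placeholder α, β, and degree, not our asymptotic-expansion code).

```python
import numpy as np
from scipy.special import roots_jacobi

# For a fixed weight function, the roots of the degree-n orthogonal polynomial fall
# strictly between consecutive roots of the degree-(n+1) polynomial.
alpha, beta, n = 1.0, 1.0, 6
x_n, _ = roots_jacobi(n, alpha, beta)        # nodes of the degree-n Jacobi polynomial
x_np1, _ = roots_jacobi(n + 1, alpha, beta)  # nodes of the degree-(n + 1) polynomial

# Each root of the degree-n polynomial lies between neighbouring roots of degree n + 1.
interlaced = np.all((x_np1[:-1] < x_n) & (x_n < x_np1[1:]))
print("interlacing holds:", interlaced)  # expected: True
```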
To Summarize:
We will improve clarity throughout the manuscript, particularly in Section 5 and Algorithm 1. We will revise Equation (20) to explain more precisely how it prevents degenerate quadrature weights and enforces normalization. As suggested, we will add visualizations of the learned weights and node distributions. Code will be released to enhance reproducibility and clarity.
We thank the reviewer again for their thoughtful evaluation and helpful suggestions, which we believe will greatly improve the final version of the paper.
Thank you for the clarifications. The responses addressed my concerns, especially regarding notation, the training process, and the role of root interlacing. I look forward to the updated figures and improvements in clarity in the final version.
Dear Reviewer,
We are glad that our responses have addressed your concerns. We will incorporate the discussions in the final version.
Thank you
This paper proposes a principled method for learning adaptive quadrature points and weights, which improves the accuracy of PINNs. The quadrature points are parameterized as the roots of a parameterized Jacobi polynomial induced from a modified Gauss-Jacobi-type weight function, where a learnable multiplicative factor is introduced. To compute the roots efficiently, the authors utilize recent advances in the asymptotic expansion of orthogonal polynomials described in Section 5.1. Since the quadrature points are implicit functions of the weight-function parameters, the authors use implicit differentiation over the simplified root equation (eq. 17-19). Learning is done by backpropagating the PINN loss over the parameters of the solution function and the parameters of the quadrature points. The authors also propose an additional loss for the quadrature points in Equation 20. The method was tested on various common PDEs and demonstrated superior accuracy (measured in L2 relative error) against other methods. I believe that this work makes a great contribution to improving PINNs and bringing them closer to being comparable with existing PDE solvers.
Strengths and Weaknesses
Strengths:
- The method is very principled and demonstrates strong empirical performance
- The writing was very clear, and the paper is easy to follow despite the heavy mathematical content
- To the best of my knowledge, the method is novel, as previous methods for learning quadrature points do not generate the quadrature points as roots of OP.
Weaknesses:
- There should be a comprehensive description of the parameterization. Also, Algorithm 1 provides only a sketch of the training process; the bulk of the work in this paper is in how the quadrature points are computed. In its current form, I had to gather various pieces from the paper to understand how exactly the quadrature points are computed. I also could not find a description of the network architecture.
- The method's accuracy should also be benchmarked against non-PINN PDE solvers. It is good to show that the method outperforms other PINN variants, but it is also good to see how much of a gap remains.
- The method is only tested on low-dimensional standard PDEs. Applicability to real-world problems could be challenging, especially for irregular domains
Questions
- α and β in the weight function are not learned. How are they chosen? The same question applies to the expansion coefficients. What kind of information about the equation do we need to derive these?
- To derive equation 18, you assume the roots are real and there are only simple poles. When does this condition hold, and could this assumption be violated in some PDEs?
- It appears that the first term in the quadrature loss in equation 20 is a relaxation of a hard constraint. How much is this error? And what effects does this error have on the solution?
- You mentioned that the quadrature point could be learned adversarially, but in practice, no significant difference was observed. What could be the reason?
- In line 230, it was mentioned that "we use linear layers...", but from the appendix, it seems that the authors are actually using a multi-layer perceptron with 3~4 layers. I fail to understand this paragraph.
Limitations
Yes
Justification for Final Rating
I am maintaining my initial assessment of the paper and hence keeping my initial score of 5. The learned quadrature method clearly improves upon previous works in PINNs, but the limitation is still apparent. For example, the method cannot be applied to high-dimensional problems yet, and there is still a significant gap in accuracy compared to traditional solvers.
Formatting Issues
None
We thank the reviewer for the thoughtful and encouraging evaluation. We deeply appreciate the recognition of the novelty and principled nature of our approach, as well as the constructive suggestions. Below, we address each of the reviewer’s comments and questions in detail.
Q1. Clarification on parameterization and network architecture
We appreciate the reviewer's request for a more cohesive description of the parameterization strategy and network architecture. The architecture details are currently included in Appendix §10.1.6. To summarize: for both the solution function and the weight function (i.e., the LearnQuad module), we use fully connected multilayer perceptrons whose depth and width depend on the PDE (exact values are listed there). We use tanh as the activation function throughout. We acknowledge that the wording in line 230 ("linear layers") is misleading and will revise it to explicitly state the use of MLPs. Additionally, we will expand Algorithm 1 in the final version to provide a more comprehensive description of how quadrature points and weights are computed and how they are integrated into the end-to-end training loop.
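For illustration, a minimal NumPy sketch of the kind of network described above (fully connected layers with tanh activations); the depth and width below are placeholders rather than the exact values in Appendix §10.1.6.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Initialize weights and biases for a fully connected network with the given layer sizes."""
    return [(rng.normal(scale=1.0 / np.sqrt(d_in), size=(d_in, d_out)), np.zeros(d_out))
            for d_in, d_out in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Forward pass: tanh on hidden layers, linear output layer."""
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

rng = np.random.default_rng(0)
net = init_mlp(rng, sizes=[2, 64, 64, 64, 1])  # e.g. (x, t) -> u(x, t); sizes are illustrative
xt = rng.uniform(size=(5, 2))
print(mlp(net, xt).shape)  # (5, 1)
```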
Q2. Benchmarking against non-PINN PDE solvers
Thank you for this suggestion. In response, we conducted a quick experiment comparing the performance of existing numerical solvers and PINN with LearnQuad on three representative PDEs with known analytic solutions (the details of which are in Appendix 10). We find that LearnQuad consistently closes the gap between existing (best adaptive) PINNs and classical solvers in terms of relative error, while maintaining computational efficiency. These results will be included in the final version of the paper, and we welcome further recommendations for such comparisons.
Relative Error Comparison: Numerical Solvers vs. Adaptive PINNs
| PDE | Numerical solvers (relative error) | Adaptive PINNs (relative error) |
|---|---|---|
| Conv. (β = 30) | Upwind: 0.0460; L-W: 0.00015; BW: 0.00015 | R3: 0.0078; LQ: 0.0068 |
| Conv. (β = 50) | Upwind: 0.0755; L-W: 0.00025; BW: 0.00025 | R3: 0.0228; LQ: 0.0076 |
| Diffusion | FE: 0.00043; BE: 0.00044; CN: 0.00044 | RAR-G: 0.0009; LQ: 0.0005 |
| Wave | LF: 0.00208; CFD: 0.00208; NB: 0.00151 | RAD: 0.0900; LQ: 0.0050 |
Legend:
L-W = Lax-Wendroff; BW = Beam-Warming; FE = Forward Euler; BE = Backward Euler; CN = Crank-Nicolson; LF = Leapfrog; CFD = Centered FDM; NB = Newmark-beta; R3, RAR-G, RAD = existing adaptive PINNs; LQ = LearnQuad (ours)
Q3. Applicability beyond low-dimensional problems and standard PDEs
We agree with the reviewer that evaluating LearnQuad on more complex and real-world PDEs, including irregular domains and higher dimensions, is a valuable direction ripe for future work. As noted in line 291, our current PDE choices follow established baselines to enable direct empirical comparison. We also note that the extension to higher dimensions can involve tensor products or sparse grids to keep it computationally efficient. If there is a specific PDE the reviewer recommends, we are happy to include it in the final version.
Q4. Choice of α, β, and the expansion coefficients:
Thank you for raising this point. In Appendix §10.1.8 (Table 6), we performed an ablation study on various choices of α and β in the modified Gauss-Jacobi weight function. We observed that LearnQuad's performance shows only minor variation with respect to these hyperparameters. Based on this, we chose values of 1 and 2 for all experiments reported in the paper. We emphasize that the choice was not arbitrary, but rather empirically sufficient to achieve strong performance across PDEs without requiring exhaustive hyper-parameter sweeps. We will revise the text to reflect this clearly. As for the expansion coefficients, these are not fixed but are learned from data as part of the weight-function network, as described in lines 232–236. We will make this explicit in the main text to improve clarity.
Q5. Assumption of real roots in asymptotic expansion (Eq. 18)
This is an important technical point. Our derivation assumes that the domain of the PDE is real-valued, which ensures that the roots of the orthogonal polynomials are real and simple. This assumption holds for all PDEs considered in the paper and for most physical PDEs encountered in practice. However, as the reviewer rightly points out, it could break down in special cases where PDEs are defined over complex domains. We will include this clarification and limitation explicitly in the final version.
Q6. Impact of quadrature loss term in Equation (20)
The first term in Equation (20) enforces that the sum of quadrature weights approximates the integral of the learned weight function; this is indeed a relaxation of the exact constraint. Empirically, we observe that this term typically converges to a small value. Omitting this term degrades the final performance; its inclusion also ensures that the quadrature weights remain valid.
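As an illustrative sanity check (using a plain Jacobi weight and standard SciPy routines, not our learned weight function or implementation), the quantity that such a soft constraint drives toward zero can be computed as follows.

```python
import numpy as np
from scipy.special import roots_jacobi
from scipy.integrate import quad

# The quadrature weights should sum (approximately) to the integral of the weight
# function over the domain. For a plain Jacobi weight this holds essentially exactly;
# the loss term penalizes the discrepancy for the learned weight function instead.
alpha, beta = 1.0, 2.0
nodes, weights = roots_jacobi(20, alpha, beta)

w = lambda x: (1.0 - x) ** alpha * (1.0 + x) ** beta
integral, _ = quad(w, -1.0, 1.0)

penalty = abs(weights.sum() - integral)  # what a soft normalization constraint would minimize
print(f"sum of weights = {weights.sum():.6f}, integral = {integral:.6f}, penalty = {penalty:.2e}")
```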
Q7. Adversarial training of the quadrature module
We explored the idea of learning quadrature points adversarially — where one module proposes points and the other learns under this challenging distribution. However, in our experiments, training both modules jointly (end-to-end) or in an alternating min-max fashion resulted in comparable performance. Adversarial training is known to be more unstable and harder to tune. Without careful balancing (e.g., learning rates, update schedules), it may converge to a similar fixed point as joint training, effectively reducing to the same solution. This could have been an issue specific to our implementation, but we were unable to obtain a clear benefit from adversarial training. If there are specific papers we can use as guidance, we welcome suggestions.
Q8. Clarification on “linear layers” in line 230
We apologize for the inconsistent wording. As detailed in the appendix, we use multi-layer perceptrons (MLPs) with hidden layers for both the solution and quadrature modules. Each hidden layer consists of fully connected (linear) operations followed by tanh non-linearities. We will correct the text in line 230 to reflect this architecture accurately. Thank you for pointing this out.
We sincerely thank the reviewer for their thoughtful engagement and strong endorsement. We believe that the clarifications and additions outlined above further strengthen the paper and reinforce the rigor and applicability of LearnQuad.
Thanks for your detailed reply and for conducting the additional experiments. All my questions and concerns are well addressed, and I look forward to seeing them incorporated in your final draft.
Dear Reviewer,
We sincerely thank you for the positive assessment and thoughtful feedback. We’re glad our responses addressed your questions, and we’ll incorporate your suggestions into the final version of the paper.
The authors propose "LearnQuad", an approach to sample quadrature points for physics-informed neural network training (PINN training). They demonstrate the performance of the approach in comparison to other sampling strategies in a simple setting (Diffusion equation), and separately demonstrate that quadrature points can also be found for families of PDEs.
Strengths and Weaknesses
Strengths:
- Several numerical experiments for low-dimensional PDEs, and comparison to other sampling techniques.
- Appendix with description of all PDEs and settings.
Weaknesses: The paper introduces a method to sample quadrature points to improve PINN training. In general, improving methods that are still sub-par compared to the state of the art (here: compared to all PDE solvers) is fine, but the comparison should still make it clear how far away the current method still is from the SOTA. This has been discussed extensively in the PINN literature at this point, see for example:
- McGreivy, Nick, and Ammar Hakim. "Weak Baselines and Reporting Biases Lead to Overoptimism in Machine Learning for Fluid-Related Partial Differential Equations." Nature Machine Intelligence 6, no. 10 (September 25, 2024): 1256–69. https://doi.org/10.1038/s42256-024-00897-5.
- Wang, Sifan, Yujun Teng, and Paris Perdikaris. "Understanding and Mitigating Gradient Flow Pathologies in Physics-Informed Neural Networks." SIAM Journal on Scientific Computing 43, no. 5 (January 2021): A3055–81. https://doi.org/10.1137/20M1318043.
- Jorge F. Urbán. "Unveiling the Optimization Process of Physics Informed Neural Networks: How Accurate and Competitive Can PINNs Be?" Journal of Computational Physics 523 (February 15, 2025): 113656. https://doi.org/10.1016/j.jcp.2024.113656.
The first citation includes important suggestions to developing new methods for PINNs, the second and third are important contributions to understanding and improving PINNs. Especially the third demonstrates that the accuracy can be drastically improved by choosing a different optimizer. These developments have to be taken into account when developing new methods for PINNs, otherwise it is impossible to understand how well the new method performs in practice.
- Weak baselines.
- Comparison to classical solvers / solvers in latest PINN literature is missing; points in appendix 11.3 are not enough to address this appropriately.
- No theoretical analysis of the approach. Stating that "no additional explicit error estimation is required" is not enough to convince the reader why the proposed method provides "better" samples than Monte Carlo approximation.
- A lot of formatting concerns (see below).
- The only comparison to other methods is done on a very simple benchmark (diffusion equation 1d) with only 30 sample points. It is impossible to judge if the method performs better than others (especially classical methods) with just this experiment.
Questions
Also see "weaknesses". Q1) how would the method perform when using the best PINN optimizers instead of Adam? Q2) how accurate are classical methods for the given PDEs, with the same number of sample points? Q3) how do the other sampling approaches perform when the number of points increases, or when the PDE is different?
Limitations
The limitations section does not describe a limitation of LearnQuad, but of PINNs in general. No limitations of the present work are discussed.
Justification for Final Rating
The authors now compare to classical solvers, which all outperform the presented method, as expected. The new method (LearnQuad) however outperforms all related PINN training point sampling approaches for these low-dimensional PDEs. My points in the rebuttal were all addressed, but it is hard to judge the new version of the PDF without looking at it - thus I only update to 4.
Formatting Issues
- Typesetting mistakes (e.g. period missing after eq. 1, 2, 6; no commas before "where");
- Ambiguous statements (l101: "is parameterized/learned" - which one?);
- The section titles are not on the same level of abstraction (1 Introduction, 2 Preliminaries, 3 Strong and Weak forms, 4 How to learn Quadrature Rules? -> 1 and 2 follow the classical structure, which means 3 should be something like the "main part").
- Mix of italics, bold, and both for emphasis, without a clear reason to switch from one to the other.
- Use of color in the text (l199, 203) is very uncommon.
- Table formatting is not using NeurIPS style (e.g. table 7,8, wrong caption placement and vertical bars).
We thank the reviewer for their thoughtful comments. We begin by summarizing our understanding of the core concern and then offer detailed responses.
Core Concern: We appreciate the pointer to McGreivy/Hakim’s important critique of weak baselines in ML-for-PDEs. We agree with the call for transparent comparisons and support benchmarking against classical solvers at equal accuracy or runtime. We already align with many of these principles—comparing all sampling/PINN methods under identical setups (same network, optimizer, and compute budget). We also clearly acknowledge that PINNs are typically more expensive and less accurate than classical solvers for forward problems. What we had not done is explicitly quantify this gap. Since this is a key factor in your assessment, we now address it directly. We have benchmarked strong non-PINN solvers on multiple problems and report these results prominently. Rather than undermining our work, we believe these comparisons reinforce our contribution: LearnQuad advances PINN performance over existing sampling methods, though we do not claim it closes the gap with classical solvers.
Q1. Other References:
We find the developments in the second and third papers to be complementary to ours. The second introduces (a) an annealed learning rate and (b) a refined network architecture; the third proposes (a) a modified optimizer and (b) an adjusted loss. LearnQuad is orthogonal to these techniques—our focus is adaptive sampling via a principled quadrature framework using orthogonal polynomials, which precedes the solution network. In contrast, the cited methods enhance optimization and modeling. Thus, LearnQuad can be seamlessly integrated with them. To validate this, we ran quick experiments combining LearnQuad with the proposed setups and observed consistently improved or comparable results. We will include these findings in the final version. This reinforces our message: LearnQuad is a modular, drop-in enhancement that boosts PINN performance across architectures and optimizers.
Q2. Comparison (Choice of PDE and baselines):
We urge the reviewer to look at Tables 1 and 2 in the main paper, which include the following PDEs: Burgers', Allen-Cahn, Wave, and Convection, with 1000-200 points. The Diffusion equation with 30 points is indeed a toy setting and is hence included in the Appendix. The baselines are chosen following major works that introduce adaptive sampling methods, as mentioned in l291.
Q3. Comparison to classical solvers:
We fully agree with the reviewer that PINNs, in their current form, often fall short of classical numerical solvers in accuracy, efficiency, and robustness—a well-known limitation in the field. Our work does not claim that this is not the case. In fact, we explicitly acknowledge that classical solvers (e.g., finite difference, finite element, spectral methods) generally outperform PINNs on well-posed PDEs.
However, our goal is not to argue for PINN superiority, but to contribute to ongoing efforts to improve PINN performance and understanding. The gap between PINNs and traditional solvers is real, and our work aims to help narrow it—not by surpassing classical methods, but by enhancing generalization and accuracy through adaptive, learnable quadrature schemes. Importantly, our contribution is orthogonal to the broader debate about whether PINNs can eventually match classical solvers. We focus on one key aspect: sampling efficiency, which directly impacts the quadrature error term in generalization bounds (Mishra & Molinaro, 2023). By improving training-time sampling distributions, LearnQuad yields significant empirical gains. In short, LearnQuad is not meant to outperform standard numerical methods like finite elements or finite differences—but to improve over standard PINNs. We believe this is a necessary step if PINNs are to become viable PDE solvers.
Relative Error Comparison: Numerical Solvers vs. Adaptive PINNs
| PDE | Numerical solvers (relative error) | Adaptive PINNs (relative error) |
|---|---|---|
| Conv. (β = 30) | Upwind: 0.0460; L-W: 0.00015; BW: 0.00015 | R3: 0.0078; LQ: 0.0068 |
| Conv. (β = 50) | Upwind: 0.0755; L-W: 0.00025; BW: 0.00025 | R3: 0.0228; LQ: 0.0076 |
| Diffusion | FE: 0.00043; BE: 0.00044; CN: 0.00044 | RAR-G: 0.0009; LQ: 0.0005 |
| Wave | LF: 0.00208; CFD: 0.00208; NB: 0.00151 | RAD: 0.0900; LQ: 0.0050 |
Legend:
L-W = Lax-Wendroff; BW = Beam-Warming; FE = Forward Euler; BE = Backward Euler; CN = Crank-Nicolson; LF = Leapfrog; CFD = Centered FDM; NB = Newmark-beta; R3, RAR-G, RAD = existing adaptive PINNs; LQ = LearnQuad (ours)
In response to the reviewer's suggestion, we benchmarked LearnQuad against classical solvers on three PDEs with known analytic solutions (details in Appendix §10). As expected, standard methods (e.g., Lax-Wendroff, Crank-Nicolson) still achieve the lowest relative errors. However, LearnQuad consistently improves PINN performance across all cases and notably outperforms the strongest adaptive PINN baselines. These results are not meant to suggest superiority over classical solvers, but to contextualize the scale of gains LearnQuad brings within the PINN framework. We hope this clarifies the scope of our contribution. We will highlight these results in the final version.
Q4. Theoretical Analysis:
We thank the reviewer for pushing for theoretical justification. While our main contribution is algorithmic, LearnQuad is grounded in recent theory on PINN generalization—most notably Mishra & Molinaro (2023). Their Theorem 2.6 decomposes the generalization error into the training error and a quadrature error term that depends on the sampling distribution. This shows that sampling is a central factor in solution quality—not a secondary detail.
LearnQuad directly targets the quadrature term by learning a sampling distribution that concentrates collocation points where the residual is high. This lowers the quadrature error without modifying the PINN architecture or the PDE model, and it offers a practical means to realize the theoretical insights of Mishra & Molinaro. We will make this connection explicit in the final version. The link is direct: Theorem 2.6 gives an interpretable bound whose sampling-dependent term LearnQuad reduces in a scalable, theory-aligned way. In short, LearnQuad is a principled, theoretically grounded adaptive sampling method that minimizes one of the two main sources of PINN error.
Mishra, Siddhartha, and Roberto Molinaro. "Estimates on the generalization error of physics-informed neural networks for approximating PDEs." IMA Journal of Numerical Analysis 43.1 (2023):1-43
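Schematically, the decomposition we refer to has the following shape (indicative form only; the constant and the exponent below are placeholders rather than quantities quoted from the theorem):

```latex
% Schematic only: C_quad and the exponent alpha are illustrative placeholders.
\[
  \mathcal{E}_{\mathrm{gen}}
  \;\lesssim\;
  \underbrace{\mathcal{E}_{\mathrm{train}}}_{\text{optimization error}}
  \;+\;
  \underbrace{C_{\mathrm{quad}}\, N^{-\alpha}}_{\text{quadrature error of the sampling rule}}
\]
% N is the number of collocation points; C_quad depends on the sampling
% distribution, which is the factor LearnQuad learns to improve.
```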
Q5. Why are the sampled points better than Monte Carlo?
The case for adaptive sampling outperforming uniform Monte Carlo in PINNs has been well established by prior works cited as baselines. These show that allocating points in regions of high residual or solution complexity improves generalization. Our results align with this: LearnQuad consistently outperforms vanilla Monte Carlo PINNs and other adaptive methods across benchmarks. We introduce a principled approach to adaptive sampling via a learnable quadrature scheme. During training, the sampling distribution is not fixed—it evolves by learning a weight function informed by the PDE residual. This function induces a probability distribution that biases collocation points toward informative regions. A key challenge in adaptive sampling is that drawing from such tailored distributions is often costly or complex. We address this by using asymptotic expansions to generate quadrature points quickly and stably from the learned weights, making our method both scalable and easy to integrate with existing PINN models.
We believe the reviewer will agree that adaptive sampling from problem-specific distributions generally outperforms uniform sampling, especially when residuals or features are spatially non-uniform. By formulating sampling via asymptotic expansions and learnable weights, we enhance performance and offer a general-purpose tool that integrates smoothly into broader learning pipelines. We hope this is seen not just as an empirical gain, but as a step toward scalable, theory-informed adaptive sampling in physics-informed learning.
Q6. Limitations:
The reviewer is correct in pointing this out. Since PINNs are the vessel for LearnQuad, we had outlined the limitations of this setting, even though LearnQuad improves the performance of PINNs, as demonstrated empirically. We are happy to expand the section to include a discussion of the challenges in extending to higher dimensions, which we believe is a ripe direction for future research.
Q7. Paper formatting Concern:
We have fixed the typesetting errors and inserted commas before "where". Regarding l101: we meant that it is parameterized using a neural network whose weights are learnt. We have adjusted the abstraction level of the section titles and made the emphasis style consistent throughout the paper. We are happy to remove the color. We have updated the tables in the Appendix to a more standard format.
We thank the reviewer for their thoughtful critique, which helped refine our scope and positioning. We hope our clarifications—on benchmarking, theoretical grounding, and compatibility with recent PINN advances—highlight that LearnQuad is a principled, modular improvement that strengthens PINNs and moves them closer to bridging the gap with classical solvers.
I am positively surprised by the amount of work the authors have put into this rebuttal. In particular, the comparison to classical solvers is - in my opinion - crucial to highlight the current state of the art for the given PDEs, and now also allows one to place LearnQuad / PINNs in general for those PDEs. It is not a problem at all that the classical solvers outperform PINNs here; as the authors also acknowledge, this is expected. I think the new table is absolutely crucial for the presentation in the manuscript and should be placed prominently in the main part.
I also thank the author for addressing my other concerns, and also referring to the new literature. It is hard to assess the new version of the paper without looking at the PDF, but I will update my score regardless.
My only concern left is related to the comment to the other reviewer, to "Q5. Applicability to high-dimensional PDEs". The example given by the authors is not a reasonable test of performance or comparison between methods. The example is "too high-dimensional": a simple constant as "approximation" already performs equally well (0.09).
Dear Reviewer,
We are delighted that you appreciated our effort to respond to your detailed and constructive suggestions. Thank you for pushing us to clarify two important aspects that improved the paper, and for adjusting your recommendation.
Regarding the high-dimensional setting: yes, we concede it remains open for now. Despite some initial positive signal, a non-trivial amount of technical and development work still needs to happen to make tensor product or sparse-grid based ideas effective for the high-dimensional setting.
We re-affirm that we will include the comparison and other discussions arising from the reviewer's feedback in the main paper's final version where an additional page will be helpful.
Thank you
This paper combines numerical approaches with machine learning to learn solutions of PDEs. It proposes LearnQuad, a method that enhances physics-informed neural networks (PINNs) with learnable quadrature rules, which are learned in a data-driven manner and parameterized by neural networks. By using asymptotic expansions, the method efficiently obtains quadrature nodes and weights from the learned weight function. This method can be applied to both the strong and weak forms of PDEs. Additionally, compared to operator learning, this method is able to generalize to solving a family of PDEs, and it does not need pairs of solutions and PDE parameters. For solving a single PDE, this method yields improved performance over multiple baselines including PINN, CPINN, RAD, etc. For solving a family of PDEs, this method generalizes well to the test set.
Strengths and Weaknesses
Strengths:
- For choosing collocation points and their weights, several prior works have been proposed. This paper proposes a data-driven method to dynamically learn collocation points and weights. Unlike fixed quadrature or random sampling, the learned quadrature rule is tailored to the structure of the PDE, enabling a more effective use of limited collocation points.
- This paper bridges the gap between PINNs and neural operators for solving a family of PDEs. Moreover, the proposed method does not need labelled pairs of solutions and PDE parameters.
Weaknesses:
- The biggest weakness is that the experiments use 1000 collocation points. Though the paper claims the method is efficient in obtaining a large number of collocation points and weights by leveraging asymptotic expansions, it is unclear whether the performance gap would persist as the number of collocation points is gradually increased. It is also unclear whether the asymptotic expansions would be numerically stable at larger scales.
- Second, the PDE solutions used in the experiments are smooth. It concerns me that a method based on orthogonal polynomials might be less effective for problems whose solutions are more complex, e.g., multi-scale or chaotic. It is therefore unclear whether the proposed method would keep its accuracy advantage in those settings. It would be interesting to see how the proposed method performs compared to the baselines with 1000 collocation points, or as the number of points is gradually increased.
Questions
- Is it possible for the authors to show the total training time for all methods? Since LearnQuad introduces additional components, it is important to quantify the computational cost compared to PINNs or other baselines. I am also curious how the training time scales with the number of collocation points.
- It is unclear how α and β in equation (10) are chosen. How much effort did the authors put into tuning these two parameters? How do different choices of α and β affect the results? An ablation study and analysis would be appreciated.
- How does the method work for high-dimensional PDE problems in terms of accuracy and computational cost? Is there any optimization challenge or training instability issue for the learnable quadrature rules?
- For learning a family of PDEs, the proposed method can be used without labelled data, which is needed for neural operators. I am curious how neural operators would perform on the test set. I think the authors could train neural operators either with the solutions learned by the proposed model or with the ground-truth solutions.
Limitations
Yes.
Justification for Final Rating
Previously, I had concerns about the number of collocation points and the solutions being too smooth. Those concerns have been resolved. I increased my score to borderline accept.
Formatting Issues
No.
We thank the reviewer for the thoughtful and constructive feedback. We address each point in detail below and will incorporate clarifications and additional results into the final version of the paper.
Q1. Performance scaling with number of collocation points: Is the proposed method still advantageous when more points are used? Is asymptotic expansion numerically stable at large scale?
Our experimental setup (Tables 1 and 2) uses 1000 collocation points only to maintain consistency with the baseline works cited in line 291. However, LearnQuad, by design, scales efficiently to a much larger number of points, enabled by the use of asymptotic expansions. To empirically validate this, we reported runtime and scaling behavior in Table 7 (Appendix Section 11.1) and request the reviewer to take a look. To summarize, the wall-clock time to generate quadrature nodes and weights remains nearly flat up to a large number of points; beyond that, we observe modest sub-linear growth. This can be attributed to GPU memory-bandwidth bottlenecks, not to any instability or complexity of the core method. As the number of nodes reaches millions, transferring the tensors associated with the nodes between compute units and memory increases runtime. This memory-bandwidth issue, specific to our GPU, can be mitigated (although not eliminated) by higher-end hardware and/or an optimized implementation; the core algorithm, by design, remains fully parallel. In Table 5 and Section 10.1.7 of the Appendix, we include numerical results obtained when increasing the number of collocation points. As demonstrated, the results generally improve as more points are used. We are happy to expand on this section and include larger-scale experiments in the main paper.
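As a rough stand-alone illustration of why node generation is cheap (this uses SciPy's classical Gauss-Jacobi routine as an analogy only; it does not reproduce our asymptotic-expansion implementation or the Table 7 numbers):

```python
import time
from scipy.special import roots_jacobi

# Time the generation of increasingly large Gauss-Jacobi node/weight sets.
# The sizes and alpha/beta values are illustrative placeholders.
for n in (1_000, 5_000, 20_000):
    t0 = time.perf_counter()
    nodes, weights = roots_jacobi(n, 1.0, 1.0)
    t1 = time.perf_counter()
    print(f"n = {n:>6d}: {t1 - t0:.3f} s, nodes span [{nodes.min():.4f}, {nodes.max():.4f}]")
```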
Q2. Smoothness of PDE solutions: Does LearnQuad retain its advantage for more complex or chaotic PDEs?
We appreciate the reviewer's concern regarding generalization to more complex PDEs. Our current benchmarks (Tables 1 and 2) do include non-trivial settings such as the Burgers' and Allen-Cahn equations. As mentioned in Appendix 10.1.3, the chosen parameter value results in a non-smooth solution with steep gradients. Furthermore, the solution to the wave equation used in the experiments, as outlined in Appendix 10.1.5, exhibits multi-scale behavior in space and time. We mention in l291 how existing baseline papers informed our choice of PDEs. That said, we agree that applying PINNs (with or without LearnQuad) to much more complex PDEs remains open. We note that the adaptivity of LearnQuad, driven by a learnable weight function, allows collocation points to cluster dynamically in regions of high solution complexity. As the number of collocation points increases, the method redistributes them adaptively; we show this in Appendix Table 5. We would be happy to include additional PDEs with more challenging behavior; we have covered a nearly exhaustive set used in adaptive PINN papers (see line 291) but welcome specific suggestions from the reviewer.
Q3. Computational overhead of LearnQuad: How does the training time compare to baselines? How does it scale with the number of points?
Thanks for bringing up this point. We compare wall-clock training time across all methods in Table 8 (Appendix §11.2). We find that our JAX implementation of LearnQuad is significantly faster than the other adaptive methods. As reported, compared to vanilla PINNs, LearnQuad is slightly more expensive due to the additional compute operations, but it converges faster in addition to improving the accuracy of the solution. Additionally, we analyze the scaling behavior of LearnQuad with increasing collocation points in Table 7: the time to generate quadrature nodes and weights remains nearly flat up to a large number of points, beyond which we observe modest sub-linear growth. Quadrature generation and solution training are fully parallelized and GPU-compatible.
Q4. Hyper-parameter tuning for α and β in Eq. (10): How sensitive is performance to these parameters?
We reported an ablation study in Table 6 (Appendix §10.1.8), which explores variations of α and β in the modified Gauss-Jacobi weight function. We observe only minor variations in performance. Based on this, we fixed α and β to moderate values (typically in the range 1–2) for all experiments in the main paper. This choice turned out to be sufficient to achieve the desired performance across tasks, without requiring extensive hyper-parameter sweeps. We are happy to expand this section in the final version of the paper.
Q5. Applicability to high-dimensional PDEs: Does LearnQuad remain accurate and efficient? Any training challenges?
This is a very good point. An extension of LearnQuad to higher dimensions is indeed possible and incurs increased memory (as would be the case for PINNs in general) due to the need to sample more points in multiple dimensions to reach a desired error. In higher dimensions, techniques such as tensor products or sparse grids can help reduce the computational complexity. The time complexity of LearnQuad does not explode with increasing dimension thanks to parallel compute; however, as mentioned earlier (Table 7), one can expect a moderate increase when sampling an extremely large number of collocation points. We performed a simple experiment on a 100-dimensional Poisson equation with a closed-form ground-truth solution.
We observe that LearnQuad achieves a relative error of 0.085, which is similar to naive Monte Carlo sampling (with the same number of collocation points) in this setting, at a relative error of 0.09. This illustrates the viability of LearnQuad for high-dimensional PDEs, although this is a preliminary finding. Since the solution is smooth in this particular case, there is no substantial benefit from using a data-driven adaptive method. For this problem, training took 18 seconds to converge in 300 epochs. After this, evaluating the trained model at any given resolution takes 0.0065 seconds. During the testing phase, errors were computed with respect to the true analytical solution, which is readily available in this case.
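For clarity, the evaluation protocol above is the standard relative L2 error against the analytic solution, estimated on random test points. A generic sketch with toy stand-ins (the model and u_exact functions below are hypothetical placeholders, not the paper's 100-dimensional Poisson solution or trained network):

```python
import numpy as np

def relative_l2_error(model, u_exact, dim, n_test=10_000, seed=0):
    """Relative L2 error of a surrogate against a known solution on random test points."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=(n_test, dim))  # test points in the unit cube
    pred, true = model(x), u_exact(x)
    return np.linalg.norm(pred - true) / np.linalg.norm(true)

# Toy stand-ins (NOT the paper's functions): a surrogate with ~1% pointwise error.
u_exact = lambda x: np.sum(x**2, axis=1)
model = lambda x: 1.01 * np.sum(x**2, axis=1)
print(relative_l2_error(model, u_exact, dim=100))  # ~0.01
```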
Q6. Contrast with Neural Operators: How does its performance compare?
We appreciate the reviewer's interest in this connection. The reviewer is right that, with the help of LearnQuad, PINNs can be extended to provide the functionality of neural operators, albeit without the use of any labelled data. We believe this is a step towards bridging the gap between the two modelling paradigms. On the problems considered in the paper, neural operators perform equally well, but, as the reviewer will agree, we needed to provide them with labelled data. We are happy to include these results in the final version.
We once again thank the reviewer for their detailed evaluation and helpful suggestions. We believe these clarifications and additions strengthen the contributions of LearnQuad and demonstrate its applicability across a broad range of PDE-solving tasks.
Dear Reviewer,
We sincerely appreciate the time and feedback you provided. If any part of our rebuttal needs further clarification or prompts additional questions, we’d be more than happy to elaborate. We also welcome any updates you might consider making to your recommendation in light of the new information provided.
Thank you again for your thoughtful review and for helping improve the quality of our work.
I thank the authors for those clarifications and I will adjust my score accordingly.
Dear Reviewer,
We thank you for the thoughtful engagement and are pleased to hear that our responses have clarified the concerns raised. We are grateful for the reconsideration of the score.
(1) Summary of this work: This paper proposes LearnQuad to enhance PINNs with learnable quadrature rules. With asymptotic expansions, LearnQuad can efficiently obtain quadrature nodes and weights from the learned weight function. In addition to experimenting with strong and weak forms of PDEs, LearnQuad can also be extended to a family of PDEs, surpassing previous methods.
(2) Strengths and weaknesses: This paper proposes a novel and reasonable method to choose quadrature nodes and weights, which differs from previous residual-based sampling methods and demonstrates strong performance. In the rebuttal, the authors also compare LearnQuad with numerical methods, where their work also performs well. There are no significant weaknesses in this work, but there is also no evidence to demonstrate that the proposed method is principled or "optimal" for the sampling problem. Besides, as mentioned by some of the reviewers, the writing of this paper needs further improvement.
(3) Summary of rebuttal: The authors made a solid rebuttal and provided extensive experiments, clarifications and discussions to the reviewers. Especially, the experiment compared to numerical methods is significant and surprising. All the reviewers thank the authors for their rebuttal and there are no serious issues remaining.
Final decision: This paper presents a novel and reasonable approach to enhance PINN's sampling process, which can fit the structure of targeted PDEs better and show solid performance. After the rebuttal, all the reviewers acknowledged the authors' clarification. Thus, I would like to recommend acceptance. The authors should include the newly added experiments in the final version.