PaperHub
5.5 / 10
Poster · 4 reviewers
Lowest: 2 · Highest: 4 · Std. dev.: 0.7
Scores: 3, 2, 4, 3
ICML 2025

HEAP: Hyper Extended A-PDHG Operator for Constrained High-dim PDEs

OpenReview · PDF
Submitted: 2025-01-22 · Updated: 2025-07-24

Abstract

Keywords
PDE · Neural Operator · Constraints · High-dimension

Reviews and Discussion

Official Review
Rating: 3

This paper focuses on solving high-dimensional time-dependent PDEs with constraints. Traditional methods suffer from the curse of dimensionality. The proposed method effectively solves the problem by combining quadratic programming (QP) with NeuralODE. The approach can be summarized as:

  1. Reformulating the PDE with constraints into a QP problem;
  2. Modifying the existing QP solver APDHG with learnable parameters to improve efficiency; the resulting method is named HEAP;
  3. Training the solution ansatz parameters and the HEAP parameters with the NeuralODE method in a differentiable way by minimizing residuals.
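Schematically, and in notation introduced here purely for illustration (the paper's own formulation and equation numbering may differ), this reformulation turns each time step into a QP in the parameter velocity $\dot\theta$:

```latex
\min_{\dot\theta}\ \tfrac{1}{2}\,\big\| J(\theta_t)\,\dot\theta - \mathcal{F}[u_{\theta_t}] \big\|_2^2
\quad \text{s.t.} \quad A(\theta_t)\,\dot\theta \le b(\theta_t),
\qquad J(\theta_t) := \nabla_\theta u_{\theta_t}.
```

Expanding the square yields standard QP data $Q = J^\top J$ and $c = -J^\top \mathcal{F}$, which is the problem the modified APDHG then solves at each step.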

Update after rebuttal

I had a specific question on how to measure performance on BS equations without an explicit solution, but the authors did not reply. This is my main concern. Hence, I changed my score to 3.

Questions for the Authors

  1. As discussed in the weaknesses, are there any linear constraints that HEAP cannot handle?

Claims and Evidence

Yes. The experiments show that the method is effective in a clear and convincing way.

Methods and Evaluation Criteria

Yes. The experiments are benchmarked on heat equations.

Theoretical Claims

I checked both Theorems 3.1 and 3.2 and found no issues.

Experimental Design and Analysis

Yes, I find no issue with the experimental design. Since there is little work on solving high-dimensional PDEs with constraints, the authors chose only one baseline for demonstration, and I understand their choice.

Supplementary Material

N.A.

Relation to Broader Scientific Literature

N.A.

Essential References Not Discussed

N.A.

Other Strengths and Weaknesses

Originality is high. There is little work on high-dimensional PDEs with constraints, for several reasons: the curse of dimensionality rules out traditional numerical methods, while constraints are not well handled by methods such as PINNs or neural operators. Therefore, this work innovatively tackles the problem by combining a traditional QP method with NeuralODE, which fills the gap to some degree.

The weakness, in my opinion, concerns the experiments. The authors chose the constraints as upper and lower bounds for $\theta_t$. However, it seems the proposed method HEAP should not be limited to such simple constraints. It would strengthen the case for HEAP's effectiveness if some other constraints were demonstrated, for example, typical linear constraints $H[u](x, t) \geq 0$ as indicated in equation (1). Also, does HEAP have any limits to its applicability? Are there any constraints that HEAP cannot handle?

Typos:

  1. Line 164: $\Omega_A$ -> $\Omega_H$?
  2. Line 127 (right): "Constraint $A$" -> "Constraint $H$"?

Other Comments or Suggestions

N.A.

Author Response

We appreciate your insightful comments and suggestions. Here are our responses to your questions:

Q1: It would strengthen the effectiveness of HEAP if some other constraints were demonstrated

A1: The additional experimental results are reported in the rebuttal supplementary material (RSM), available at https://anonymous.4open.science/r/HEAP/RSM.pdf. We provide four additional experiments: two are Black-Scholes with no-arbitrage constraints (Eq. 1-5 of RSM), and the other two are Burgers and reaction-diffusion with rate-of-change constraints. The results are shown in Tables 1, 2, 6, and 7 of RSM.


Q2: Does HEAP have limits to its applicability? Are there any constraints that HEAP cannot handle?

A2: HEAP is based on a PDHG algorithm variant designed for convex constraints. Therefore, HEAP can handle any convex constraint in principle. For more general non-convex constraints, the convergence of HEAP is not guaranteed.
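For intuition, the per-iteration work for such constraints reduces to Euclidean projections onto the feasible set, which have cheap closed forms for the constraint types discussed in this thread. A minimal sketch (our illustrative code, not the authors' implementation):

```python
import numpy as np

def project_box(x, lower, upper):
    """Projection onto box constraints {lower <= x <= upper}, the kind of
    bound constraints used in the paper's experiments."""
    return np.clip(x, lower, upper)

def project_halfspace(x, a, b):
    """Projection onto a single linear constraint {z : a @ z <= b}, the
    building block for constraints of the form H[u] >= 0."""
    violation = a @ x - b
    return x if violation <= 0 else x - (violation / (a @ a)) * a
```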

We will add this discussion and fix the typos in the final version once paper updates are allowed.

Reviewer Comment

The Black-Scholes example is interesting. By the way, what is the practical background for the high-dimensional BS equation? A basket of assets? If so, what is the exact form of the solution to the high-dimensional BS equation?

Author Comment

Thank you for this insightful question.

The high-dimensional Black-Scholes (BS) equation is indeed used for the pricing problem of a basket of financial derivatives. The BS equation in its classical form, and in some special cases, has a closed-form solution. Yet, to the best of our knowledge, in our case with default risk there is no such closed-form solution, due to the piecewise nonlinear functions, the nonclassical boundary conditions, and the high dimension.

Reference:

J. Han, A. Jentzen, & W. E. Solving high-dimensional partial differential equations using deep learning. Proc. Natl. Acad. Sci. U.S.A. 115 (34): 8505-8510, 2018.

Official Review
Rating: 2

The paper introduces HEAP (Hyper Extended Adaptive PDHG), a new neural operator designed to solve constrained high-dimensional PDEs, where solutions must meet additional constraints beyond the governing equations. HEAP learns the evolution of PDE parameters and formulates this process as a quadratic programming (QP) problem. To efficiently solve this, the method unrolls the adaptive primal-dual hybrid gradient (APDHG) algorithm into the neural network. This approach improves efficiency while ensuring theoretical guarantees for constrained optimization. Experiments on various high-dimensional PDEs show that HEAP outperforms existing neural operators in accuracy and efficiency.

Questions for the Authors

What does the constraint mean here? Why do we need it for solving a PDE rather than for optimization?

Claims and Evidence

Why are constraints so important for solving PDEs? If the PDE and proper initial and boundary conditions are given, the solution is fixed; what does a constraint mean here? Are you doing optimization? Could you make the motivation and problem setting clearer?

Methods and Evaluation Criteria

"The reference solution is obtained by either the explicit solution or the PINN-based numerical solver", since the method solves equations, we should only compare with high-order numerical solutions or exact solutions. Are the authors doing so?

Theoretical Claims

3.1 to 3.3 look fine to me.

Experimental Design and Analysis

Are there any more complex problems? Do all the test cases have exact or trusted numerical solutions?

Supplementary Material

The paper does not include any.

Relation to Broader Scientific Literature

The contribution is unclear to me; it needs to be emphasized what the constraints mean. Are they residuals? If so, aren't we solving them?

Essential References Not Discussed

'Evolutional deep neural network' needs to be cited and discussed.

Other Strengths and Weaknesses

I am not sure about the motivation of the paper: what does the constraint mean here? Why do we need it for solving a PDE rather than for optimization?

Other Comments or Suggestions

None.

Author Response

Thank you for the insightful question. The additional experiment results are reported in the rebuttal supplementary material (RSM), available at https://anonymous.4open.science/r/HEAP/RSM.pdf. Regarding your question, we provide the following explanations:

Q1: What does the constraint mean here, if the PDE has a fixed solution?

A1: The purpose of constraints, both equality and inequality, is to regularize the solution space and select meaningful PDE solutions when multiple solutions exist.

Firstly, in a general sense, the IC/BCs are also constraints on the PDE, albeit equality constraints.

Secondly, in practice, proper IC/BCs are sometimes not easy to obtain, while inequality constraints are usually more accessible. For example, under which IC/BCs the Navier-Stokes equations have a unique solution is still an open problem (one of the Millennium Prize Problems) [1]. In this case, inequality constraints are added to ensure a physically reasonable solution, e.g., an upper-bounded total energy (Eq. 7 in [1]). Constraints do affect the solution of PDEs, as we show in Fig. 2 of RSM, where the solutions of the Black-Scholes (BS) equation under different financial constraints are compared. The BS equation is a fundamental PDE in quantitative finance (Nobel Prize 1997) [2], and it does not have a unique solution in practice.

Thirdly, the constrained PDE problems have been formulated and studied in previous AI4PDE works [3, 4]. However, the existing methods are not scalable to high-dimensional problems, which is the focus of our work.


Q2: Discuss the Evolutional deep neural network (EDNN) paper

A2: The EDNN also formulates the evolution of surrogate model parameters as an optimization problem, but it 1) solves the problem with a numerical solver instead of a neural network, 2) considers only equality constraints, and 3) is not directly scalable to high-dimensional problems. We will add a comparison with EDNN in the final version.


Q3: Are there any more complex problems? Do all the test cases have exact or trusted numerical solutions?

A3: We provide four additional experiments: two are Black-Scholes with no-arbitrage constraints (Eq. 1-5 of RSM), and the other two are Burgers and reaction-diffusion with rate-of-change constraints. The results are shown in Tab. 1, 2, 6, and 7 of RSM. All the additional test cases are without exact solutions or trusted numerical solutions.

We have added all the above discussions and the RSM to the draft, but we are not able to provide the updated draft here due to this year's rebuttal rules. We hope this response addresses your concerns, and please let us know if you have any further questions.

Reference:

[1] Fefferman, Charles L. "Existence and smoothness of the Navier-Stokes equation." The millennium prize problems 57.67 (2006): 22.

[2] Hull, John C., and Sankarshan Basu. Options, futures, and other derivatives. Pearson Education India, 2016.

[3] Hoshisashi, Kentaro, Carolyn E. Phelan, and Paolo Barucca. "Physics-Informed Neural Networks for Derivative-Constrained PDEs." ICML 2024 AI for Science Workshop.

[4] Moro, Viggo, and Luiz F. O. Chamon. "Solving Differential Equations with Constrained Learning." The Thirteenth International Conference on Learning Representations (ICLR), 2025.

Official Review
Rating: 4

This work provides a principled way of handling constraints in the framework of the Control-based Solution Operator (CSO) for learning PDE solution operators under constraints, called HEAP. In the original CSO, the evolution of the network parameters $\theta$ is governed by a neural network $V$, called the neural control fields, that directly maps the network parameters $\theta$ to their time derivative $\dot{\theta}$. Constraints can be incorporated via soft penalties.

This paper instead handles the problem by performing constrained quadratic programming (QP). Assuming the constraints are linear, the CSO constrained-optimization objective can be identified as a QP problem. The paper modifies an existing QP solver, the adaptive primal-dual hybrid gradient (APDHG), for the CSO QP problem, where the matrices $\{W\}$ corresponding to the linear mappings to and from a linear latent space, the initial point of the QP solver, and the step sizes are predicted by a neural network $V$. The output of $V$ is then used in the modified APDHG, which runs for a fixed number of iterations and produces the solution, which is the time derivative $\dot{\theta}$. By identifying the modified APDHG iterations as layers, $V$ can be seen as a hyper-network that predicts $\dot{\theta}$ from $\theta$. The paper demonstrates the superior performance of HEAP over CSO in terms of accuracy, measured by L2 relative error, and constraint satisfaction on a set of constrained PDEs of dimension up to 20.
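To make the construction concrete, here is a minimal sketch of the unrolling idea, with a hypernetwork predicting per-layer step sizes and a warm-start point (the class, the simplified projected primal-dual update, and all names are our illustrative assumptions, not HEAP's actual architecture):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnrolledQPLayers(nn.Module):
    """Illustrative simplification of the unrolling idea: K primal-dual
    iterations become K differentiable layers, with step sizes and a
    warm-start point predicted by a hypernetwork V from the state theta."""

    def __init__(self, n, m, K=3, hidden=64):
        super().__init__()
        self.K, self.m = K, m
        # Hypernetwork V: theta -> (K primal steps, K dual steps, initial x).
        self.V = nn.Sequential(
            nn.Linear(n, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * K + n),
        )

    def forward(self, theta, Q, c, A, b):
        out = self.V(theta)
        tau = F.softplus(out[: self.K])               # primal step sizes > 0
        sigma = F.softplus(out[self.K: 2 * self.K])   # dual step sizes > 0
        x = out[2 * self.K:]                          # learned warm start
        y = theta.new_zeros(self.m)                   # dual variable for Ax <= b
        for k in range(self.K):                       # unrolled "layers"
            x_new = x - tau[k] * (Q @ x + c + A.T @ y)                          # primal descent
            y = torch.clamp(y + sigma[k] * (A @ (2 * x_new - x) - b), min=0.0)  # dual ascent + projection
            x = x_new
        return x  # approximates the parameter velocity theta_dot
```

Because every operation is differentiable, the PDE residual can be backpropagated through the K layers into V, which is what allows K to remain small (the rebuttal reports K = 3).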

Questions for the Authors

  1. Nonlinear Constraints: How easily can HEAP be extended to handle more general, non-quadratic, or nonlinear constraints, which might no longer yield a strict QP form in parameter space?
  2. Scalability for d>20: Have you tested or estimated memory/time usage for truly large dimensions, e.g., d=50+ or 100, to see if the cost of forming Q, A becomes limiting?
  3. Hyperparameter Ablation: How does the choice of iteration count K or extended dimension in the hypernetwork impact final PDE accuracy or constraint satisfaction?
  4. Adaptive Step Sizing: The approach includes some learnable step sizes ($\tau$, $\sigma$, etc.). Do you observe they converge to stable values, or do they exhibit large variance across training?
  5. What is the rationale for using ResNet?

Claims and Evidence

Yes

Methods and Evaluation Criteria

The paper tested their algorithm HEAP against the baseline method CSO on (1) constrained heat equations, (2) unconstrained heat, (3) Burgers, and (4) reaction-diffusion PDEs in 5, 10, 15, and 20 spatial dimensions, where the training set is generated by sampling the initial condition $\theta_0$ from a Gaussian distribution, and the test set is generated by sampling $\theta_0$ from another Gaussian distribution.

The main evaluation criteria are:

  • PDE residual error (L2RE_pde) of the learned final solution,
  • Constraint violation (L2RE_con) for physically constrained PDEs.
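(For reference, L2RE presumably denotes the standard relative $L^2$ error of a predicted quantity $v$ against its reference $v_{\mathrm{ref}}$,

```latex
\mathrm{L2RE}(v) \;=\; \frac{\|v - v_{\mathrm{ref}}\|_2}{\|v_{\mathrm{ref}}\|_2},
```

applied to the PDE residual and the constraint violation, respectively.)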

I think the PDEs chosen and the evaluation criteria are valid; however, the dimension is only up to 20, which still seems relatively low, and the authors did not report any metric of computational efficiency, such as memory usage or training/inference speed. This is crucial since the improvements in L2RE are marginal in some cases, and if the proposed method incurs heavy computational cost, then its usefulness would be discounted.

Theoretical Claims

Yes, I went through all theorems in Section 3.3 except for Proposition 3.3, which comes from another paper. They all seem to be coherent.

Experimental Design and Analysis

  • I find that too little information is provided on the experimental design. What were the parameters of the Gaussian distributions used for generating the training and test sets? What were the hyperparameters used for implementing the baseline method CSO? The authors should, at the very least, report the penalty coefficient used in CSO, which could greatly impact its performance.
  • As mentioned before, the authors did not report any metric of computational efficiency, such as memory usage or training/inference speed.

Supplementary Material

There are no supplementary materials.

Relation to Broader Scientific Literature

The paper references neural PDE-operator methods (DeepONet, PINOs, etc.) and high-dimensional PDE solvers (Han et al., Yu et al., among others). They specifically build upon the recent line of "parametric PDE operator" approaches (like CSO or Neural Control of PDE solutions). For constrained PDE training, they mention relevant works on soft/hard constraints in PDE/inverse design but highlight that those typically remain in lower dimensions.

Essential References Not Discussed

None that I can think of.

Other Strengths and Weaknesses

Strengths:

  • The QP formulation + APDHG unrolling is a more principled way to enforce constraints than naive penalty-based methods used in CSO, and the experiment shows that the proposed method does outperform the baseline method in terms of accuracy.

Weaknesses / Questions:

  • As mentioned, there is too little information provided on the experimental design, which greatly reduces the validity of the authors' claims.
  • As mentioned, the authors do not provide explicit complexity or memory scaling analysis.
  • The authors mention that a few iterations (K=3) suffice, but it remains unclear how robust this is across PDE families with more complicated constraints or stiffer PDE dynamics.
  • The method only works for linear constraints. It is unclear how to extend the QP formulation to nonlinear constraints.
  • Even for linear constraints, the proposed method uses an approximation obtained by truncating the Taylor expansion of $u_{\theta_t}$ (Eqn. 4), which modifies the original constraint. There is no discussion anywhere in the paper of how valid this approximation is.

Other Comments or Suggestions

  1. The paper STDE (https://openreview.net/forum?id=J2wI2rCG2u) from NeurIPS 2024 seems to be the current state-of-the-art method for solving high-dimensional PDEs with PINNs; you could include it in Section 2.2 for a complete reference list.

Author Response

Thank you for the insightful questions and suggestions. The additional experiment results are reported in the rebuttal supplementary material (RSM), available at https://anonymous.4open.science/r/HEAP/RSM.pdf. Here we provide brief responses to your questions point by point:

Q1: Computational efficiency: memory usage and training/inference speed for d <= 20, and scalability to higher dimensions.

A1: We provide results for d = 5, 10, 15, 20, 50, and 100 in Tab. 4 of RSM. HEAP costs 1.6x the GPU memory and about 2x the training/inference time compared to the baseline CSO. Time and memory usage increase nearly linearly with the dimension; thus HEAP is computationally scalable to higher dimensions.


Q2: Experimental design: parameters of the Gaussian distribution, hyperparameters for CSO, penalty coefficients in CSO.

A2: The parameters of the Gaussian distributions are mean $0$, std $\sqrt{0.5}$ for training and mean $0$, std $1$ for testing. The penalized terms are the PDE residual loss (with weight $w_1$), the constraint violation loss ($w_2$), and the numerical constraint loss ($w_3$). To balance their magnitudes during training, the weights are set to $w_1 = 10^{-2}$, $w_2 = 1.0$, and $w_3 = 10$. The remaining hyperparameters are displayed in Tab. 3 of RSM.
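In code, the weighting described above amounts to a simple weighted sum of the three loss terms (a hypothetical helper; the function and argument names are ours, with the weights reported above as defaults):

```python
def total_loss(l_pde, l_constraint, l_numerical, w1=1e-2, w2=1.0, w3=10.0):
    """Weighted sum of the PDE residual, constraint violation, and
    numerical constraint losses, using the rebuttal's reported weights."""
    return w1 * l_pde + w2 * l_constraint + w3 * l_numerical
```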


Q3: Hyperparameter Ablation: iteration count K, extended dimension in the hypernetwork.

A3: As shown in Tab. 5 of RSM, both the iteration count K = 1, 2, 3, 4 and the extended dimension = 5, 10, 20 in the hypernetwork have a significant impact on the final PDE accuracy and constraint satisfaction. However, K = 3 is a relatively robust choice across different PDE families.


Q4: Handling more general, non-quadratic, or nonlinear constraints

A4: The effectiveness of HEAP largely depends on its underlying APDHG algorithm and the structure of the constraints. For convex but nonlinear constraints, modifications to the constraint projections may still preserve convergence guarantees, so HEAP can be extended straightforwardly. For non-convex constraints, however, theoretical convergence becomes challenging, though heuristic adaptations like penalty methods or relaxed Lagrangian formulations might work in practice.


Q5: Convergence of the learnable step sizes ($\tau$, $\sigma$, etc.)

A5: As shown in Fig. 3 of RSM, the learnable step sizes converge to stable values after 60 training batches.


Q6: Rationale for using ResNet

A6: We follow the backbone choice of the previous work, CSO, where ResNet is chosen for its simplicity and effectiveness in training deep networks. The skip connections in ResNet help alleviate the vanishing gradient problem. Other architectures like Transformers or LSTMs are also possible for HEAP, but we leave them for future work.


Q7: Add related work STDE.

A7: STDE is a state-of-the-art method for efficiently estimating very high-dimensional gradients, which can be integrated into HEAP as a gradient estimator. We will add a discussion of and citation to STDE in the final version.


Q8: Validity of the approximation by truncating the Taylor expansion of the constraints (Eqn. 4).

A8: The original constraint is on the function (infinite-dimensional), which must be approximated by some finite-dimensional numerical scheme for computation. The truncated Taylor expansion is effectively an implicit Euler method, in the sense that the constraint is enforced at the next time step. The implicit Euler method is a widely used stable scheme for PDEs, and the error can be controlled by the square of the step size.
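Schematically, in our notation (the paper's Eqn. 4 may differ in detail), enforcing the constraint at the next time step via a first-order Taylor expansion reads:

```latex
H\big[u_{\theta_{t+\Delta t}}\big]
\;\approx\;
H\big[u_{\theta_t}\big]
+ \Delta t\,\nabla_\theta H\big[u_{\theta_t}\big]^{\top}\dot\theta_t
\;\ge\; 0,
```

which is linear in $\dot\theta_t$ and therefore fits the QP, while the neglected remainder is $O(\Delta t^2)$, matching the step-size-squared error bound invoked above.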

Official Review
Rating: 3

The paper introduces a novel method called HEAP, designed to solve high-dimensional PDEs that include additional constraints. The PDE solution is approximated with the evolution of the neural network parameters, which is formulated as a quadratic programming problem. To solve the QP efficiently, the method unrolls a fixed number of iterations of the APDHG algorithm. This makes the iterative solver a differentiable module that can be trained end-to-end. A hypernetwork is incorporated to estimate initial values, step sizes, and latent weights needed for the APDHG iterations.

The paper presents theoretical results demonstrating that HEAP can replicate the iterative sequence of the APDHG algorithm and achieves linear convergence under suitable conditions. Experiments on several PDEs (including constrained and unconstrained heat equations, the Burgers equation, and reaction-diffusion equations) demonstrate that HEAP achieves lower PDE residuals and better constraint satisfaction compared to CSO.

Questions for the Authors

I don't have any further questions.

Claims and Evidence

  1. Theoretical proofs are provided for the claims that HEAP aligns with the APDHG algorithm and for the approximation capacity of HEAP. I have not fully checked the proofs.

  2. Empirical results are provided to demonstrate that HEAP outperforms the baseline CSO in terms of lower PDE residuals and improved constraint satisfaction. The experimental evidence is clear, showing nearly consistent improvements over the baseline.

Methods and Evaluation Criteria

The proposed methods and the evaluation criteria are suited to the problem of solving high-dimensional constrained PDEs.

Theoretical Claims

I have not fully checked the proof.

Experimental Design and Analysis

The experimental design is sound for the problem at hand. However, a more detailed explanation of these problems, and possibly visualizations, would help readers understand their importance and difficulty.

Supplementary Material

No supplementary material is provided.

Relation to Broader Scientific Literature

The paper is related to AI for PDE research. No apparent relevance to broader scientific literature.

Essential References Not Discussed

I cannot think of any obviously missing references.

Other Strengths and Weaknesses

The method’s motivation of reformulating the parameter evolution as a QP problem and unrolling the APDHG algorithm is interesting, but the reasoning behind these choices is not strongly conveyed. A clearer explanation of how these design choices overcome limitations in existing approaches would help readers, especially those from the general machine learning community.

The presentation is technical and dense, which makes it difficult for readers who are not already familiar with these methods to understand. The paper does not fully utilize its available space to offer intuitive explanations or visual aids that could help clarify the concepts.

Other Comments or Suggestions

I have no additional comments.

Author Response

Thanks for your suggestions. The additional experimental results and visualizations are reported in the rebuttal supplementary material (RSM), available at https://anonymous.4open.science/r/HEAP/RSM.pdf, and will be added to the draft once allowed. Here we provide responses to your questions point by point:

Q1: A more detailed explanation of the experimental problems and possible visualizations.

A1: Here, we briefly explain two of the existing experimental problems and a newly added problem.

  1. Heat equation: The heat equation serves as a fundamental prototype PDE in physics and mathematics. As a paradigmatic parabolic PDE, it governs ubiquitous physical phenomena ranging from thermal diffusion in materials to the evolution of probability distributions in stochastic processes. However, solving high-dimensional instances presents formidable challenges: both traditional mesh-based discretization techniques and neural network methods suffer from the "curse of dimensionality", with storage and computational costs growing exponentially as the dimension increases.

  2. Heat equation with constraints: Incorporating monotonically decreasing temperature fields introduces a physically grounded regularization to the classical heat equation, elevating both its modeling fidelity and its computational complexity. This constraint encodes the thermodynamic irreversibility of cooling processes in materials with latent heat barriers. Traditional numerical time-stepping schemes may violate the monotonicity condition unless rigorously coupled with projection operators or barrier methods.

  3. Black-Scholes (newly added, with visualization): The Black-Scholes equation (Nobel Prize 1997) is the cornerstone of modern quantitative finance, providing a rigorous framework for pricing financial derivatives under idealized market conditions. In practice, the no-arbitrage principle ensures the absence of trivial risk-free profit opportunities in the financial market. The no-arbitrage constraint arises naturally when modeling structured products with embedded downside protections or regulatory circuit breakers. The visualizations are Figs. 1 and 2 of RSM.


Q2: A clearer explanation of how the QP formulation and the HEAP algorithm overcome limitations in existing approaches.

A2:

  1. QP formulation: the existing approaches to high-dimensional PDEs formulate the parameter evolution either as a least-squares problem or as black-box optimization, without explicitly considering potential constraints. The QP formulation allows linear constraints to be incorporated into the parameter evolution, extending the applicability of NN-based solvers to a broader range of high-dimensional PDEs.

  2. HEAP algorithm: the existing methods for solving large-scale QP problems are either iterative solvers like APDHG or vanilla neural operators. The former require a large number (100+) of iterations to converge, while the latter suffer from generalization issues. HEAP combines the best of both: it unrolls the APDHG algorithm into a neural network, which achieves better accuracy than a vanilla neural operator thanks to the algorithmic prior, and it converges within 3 iterations, far fewer than APDHG requires, thanks to knowledge learned from data.


Q3: Intuitive explanations for concepts.

A3: Intuitions of key components:

  1. Evolving operator: the solution of a high-dimensional PDE is surrogated by a subnetwork whose parameters evolve over time to approximate the PDE. When constraints are added, the parameter evolution is formulated as a QP problem, where the objective is to minimize the discrepancy between the left- and right-hand sides of the PDE, and the linear constraints represent the given constraints.

  2. APDHG algorithm: the APDHG algorithm is a primal-dual iterative algorithm for QP, where the primal variable x is the solution variable and the dual variable y is the Lagrange multiplier, i.e., the penalty on the constraints. The primal step updates x by a gradient descent step on the objective function and penalty term, while the dual step updates y by a gradient ascent step on the penalty term. Both updates are projected onto the feasible sets of their constraints; see the minimal sketch after this list.

  3. HEAP network: HEAP unrolls a fixed number of iterations of the APDHG algorithm into a neural network, where the primal and dual variables are extended from vectors to matrices, i.e., parallelized over a new dimension. The extension is parameterized by the output of the hypernetwork, which takes information about the problem state as input. The HEAP network can be trained end-to-end by backpropagation.

  4. Theorems: Theorem 3.1 shows that the HEAP network is truly an extended version of the APDHG algorithm, since the HEAP network reduces to the APDHG algorithm when its parameters are set to ones. Theorem 3.2 characterizes the convergence of HEAP in solving QP problems in terms of the required number of parameters. The proof naturally inherits from the convergence rate of the APDHG algorithm.
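For readers less familiar with the base algorithm, a minimal textbook PDHG loop for a QP $\min_x \tfrac{1}{2} x^\top Q x + c^\top x$ subject to $Ax \le b$ might look as follows (a plain, non-adaptive sketch under our own naming; HEAP's adaptive, matrix-extended variant differs):

```python
import numpy as np

def pdhg_qp(Q, c, A, b, tau=0.05, sigma=0.05, iters=200):
    """Textbook PDHG for min 0.5 x'Qx + c'x subject to Ax <= b.
    Primal step: proximal step on the quadratic objective.
    Dual step: gradient ascent on the multiplier, projected onto y >= 0."""
    n, m = Q.shape[0], A.shape[0]
    x, y = np.zeros(n), np.zeros(m)
    prox_mat = np.linalg.inv(Q + np.eye(n) / tau)  # prox operator of the quadratic
    for _ in range(iters):
        x_new = prox_mat @ (x / tau - c - A.T @ y)                   # primal update
        y = np.maximum(y + sigma * (A @ (2 * x_new - x) - b), 0.0)   # dual update + projection
        x = x_new
    return x, y
```

Convergence of this plain iteration requires step sizes with $\tau\sigma\|A\|^2 \le 1$ and typically hundreds of iterations; HEAP's point is precisely to learn these quantities so that about 3 unrolled iterations suffice.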

Reviewer Comment

I just realized that the authors cannot view my official comments. I am repeating my comment here in this rebuttal comment.

I thank the authors for their clarifications and have increased my score accordingly. I hope these detailed explanations can be incorporated into the final version of the paper.

Author Comment

We sincerely appreciate your constructive feedback and revised evaluation. We will ensure all detailed explanations are incorporated into the final manuscript to enhance its clarity and rigor.

Final Decision

The paper proposes HEAP, designed to solve high-dimensional PDEs in the presence of additional constraints. The proposed method delivers considerable performance along with a theoretical guarantee of the soundness of the method.