ICML 2025 · Poster · 4 reviewers
PaperHub score: 4.9/10
Reviewer ratings: 3, 3, 2, 3 (min 2, max 3, std 0.4)

Unisolver: PDE-Conditional Transformers Towards Universal Neural PDE Solvers

Submitted: 2025-01-20 · Updated: 2025-07-24
TL;DR

This paper presents the Universal neural PDE solver (Unisolver) capable of solving a wide scope of PDEs by leveraging a Transformer pre-trained on diverse data and conditioned on diverse PDEs.

Keywords

Neural PDE Solver, Deep Learning

Reviews and Discussion

Review (Rating: 3)

This paper proposes Unisolver, a universal neural PDE solver designed as a “foundation model” for solving a broad range of PDEs. Unisolver leverages Transformer architectures pre-trained on a diverse set of PDEs. The model makes use of many PDE components, incorporating information such as coefficients, boundary conditions, and notably LLM-based embeddings of PDE expressions. The model achieves better performance than existing models on a diverse dataset.

Questions for Authors

  1. Could you clarify which numerical solver was used in Table 22? According to McGreivy & Hakim (2024) [1], after controlling for accuracy and resolution, FNO is only approximately 7× faster than traditional numerical solvers. Understanding the specific solver used would help contextualize the reported speedup.

  2. For time-dependent PDEs, how does the neural solver handle extrapolation to future time steps? Given that neural solvers are typically trained on fixed time intervals, it would be insightful to discuss the model’s ability to generalize beyond the training range.

  3. For the 2D mixed PDEs, the dataset consists mostly of variants of the Navier-Stokes equations with diffusion. How does the model generalize, with and without fine-tuning, to new types of PDEs, especially PDEs with distinct behaviors, such as the inviscid Burgers' equation, the wave equation, or the Cahn–Hilliard equation?

Q2 and Q3 can be demonstrated without extensive training. While it is expected that performance will degrade for out-of-distribution samples, providing such examples would be valuable for understanding the model’s limitations and guiding future research directions.

[1] McGreivy, N., Hakim, A., 2024. Weak baselines and reporting biases lead to overoptimism in machine learning for fluid-related partial differential equations. Nat Mach Intell 6, 1256–1269.

Claims and Evidence

The claims are supported by the evidence.

Methods and Evaluation Criteria

The method and the evaluation criteria make sense.

Theoretical Claims

Experimental work. No theoretical claim.

Experimental Design and Analysis

The experiments and the analyses are sound.

Supplementary Material

The supplementary material is a Jupyter notebook that demonstrates the model using a small dataset.

Relation to Broader Literature

This work makes progress toward a "foundation model" for solving PDEs.

Essential References Not Discussed

Not to the knowledge of the reviewer.

Other Strengths and Weaknesses

Strengths:

  • Extensive experimental evaluation across diverse PDE benchmarks.
  • Detailed and thorough appendices.
  • Well-written and easy to follow.

Weaknesses:

  • The proposed techniques may be limited in their applicability, though the work is a good combination of existing techniques.
  • See Questions.

Other Comments or Suggestions

N/A

Author Response

We sincerely thank Reviewer WyYo for providing a detailed review and insightful questions.

Q1: "The proposed techniques may be limited in its applicability. Though it's a good combination of existing techniques."

Thank you for the feedback. Regarding the applicability, we would like to clarify that in the Incomplete Component Scenario section, the "incomplete" setting in Figure 7 actually means inference with no PDE components provided. Therefore, our method can be trained with partial PDE information and supports inference with no PDE components, which strengthens its applicability in real-world scenarios with incomplete PDE knowledge.

We acknowledge that our method builds upon some existing components; however, its novelty lies in the systematic integration of PDE information into neural surrogate modeling through conditional embedding, which to our knowledge has not been explored in prior works.

Q2: "Could you clarify which numerical solver was used in Table 22?"

Thank you for raising this question. In Table 22, the numerical solver used for comparison is the pseudo-spectral solver adopted by Li et al. in the original FNO paper, which is not extensively optimized for the specific task. As noted in the paper you cited, FNO is reported to be up to 1,000 times faster than a pseudo-spectral solver, which is consistent with the result in Table 22. We promise to cite the mentioned paper and discuss the efficiency improvement more rigorously.

Q3: "How does the neural solver handle extrapolation to future time steps?"

Thank you for your valuable suggestions. We have provided an extrapolation behavior comparison between our model and the baseline models in Appendix I.3. By extending the prediction horizon to time steps beyond the training range, we observe that all models' performance drops, while our method still outperforms all other baselines.

Q4: "How does the model generalize, with and without fine-tuning, to new types of PDEs, especially PDEs with distinct behaviors, such as the inviscid Burgers' equation, the wave equation, or the Cahn–Hilliard equation?"

We appreciate the reviewer's valuable suggestions on providing more PDE type generalization experiments.

Note that we have provided a new PDE generalization analysis in Appendix C.2, where Unisolver trained on equations with polynomial order up to 2 demonstrates strong generalization capabilities to equations of polynomial order 3 via fine-tuning.

As per your request, we test our model's performance on the 2D wave equation using 200 samples for training and 20 samples for evaluation. The 2D wave equation exhibits significantly different behaviors from the PDEs in the training dataset. The zero-shot and fine-tuning performance of our model and an FNO model trained from scratch is shown in the table below. Relative L2 is reported.

Unisolver (Zero-shot)    Unisolver (Fine-tuned)    Unisolver (From scratch)    FNO (From scratch)
0.774                    0.0078                    0.0406                      0.0667

The zero-shot performance of our model is not very impressive, showing that zero-shot generalization to PDEs with significantly different behavior is rather hard. However, when fine-tuned with only 200 samples, our model is able to achieve a relative error smaller than 1% on evaluation samples, which is far better than FNO trained from scratch and also better than Unisolver trained from scratch, demonstrating its strong generalization capability to different types of PDEs and its effectiveness in adapting to new PDEs with limited data.
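For reference, the relative L2 error reported here and elsewhere in this thread is, assuming the standard definition used in the neural-operator literature (the thread itself does not spell it out):

$$\text{Relative } L2 = \frac{\lVert \hat{u} - u \rVert_2}{\lVert u \rVert_2},$$

where $\hat{u}$ is the model prediction and $u$ is the ground-truth solution.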

Review (Rating: 3)

The paper proposes a method for solving various types of PDEs by leveraging a pretrained LLM alongside known parameterizations in the form of equations and values. The symbolic equations are embedded using the pretrained LLM, while numerical values and boundary/initial conditions are incorporated separately through conditioning in a Transformer. The approach is evaluated in both in-domain settings (where equations and general conditions remain similar) and out-of-domain settings (where equations and/or their coefficients and parameterizations differ).

Questions for Authors

Despite strong quantitative results, I noticed that Unisolver exhibits more artifacts in its solutions compared to other baselines, even compared with ViT (e.g., visible in Appendix E, Fig. 14). Do the authors have any insights into why this occurs and what might be causing the more frequent artifacts?

Claims and Evidence

  • Generalization ability: The proposed method appears to achieve strong performance in both in-domain and out-of-domain settings. This claim is supported by the quantitative results presented in Section 4.

  • Universality claim: While the model demonstrates some OOD generalization capabilities through extended experiments on challenging benchmarks, the claim of universality may be overstated. The equations considered remain a relatively small subset of possible PDEs. To substantiate this claim further, additional validation with a broader range of equations, such as those used in mesh-based simulations (e.g., Pfaff et al., 2021), would be necessary.

  • Theoretical analysis: Although the paper claims to provide theoretical analysis, I did not find any substantial theoretical justification in the text.

Reference:

  • Pfaff et al. (2021), Learning Mesh-Based Simulation with Graph Networks

Methods and Evaluation Criteria

  • The proposed method of incorporating prior information by encoding it within a sufficiently flexible framework is conceptually sound. I find it particularly interesting that separating the equation skeleton from the specific numerical values leads to improved performance, highlighting certain limitations of pretrained LLMs.
  • The evaluation benchmarks are adequate for demonstrating the model’s generalization capabilities within the family of equations considered, particularly in a squared domain setting.

Theoretical Claims

Not applicable. Although the abstract mentions a "theoretical analysis of the PDE-solving process," I did not find any substantive theoretical claims in the paper.

Experimental Design and Analysis

I reviewed the experimental settings and did not find any specific issues.

Supplementary Material

Yes, I reviewed the supplementary material, specifically the additional results.

Relation to Broader Literature

The key contributions of this paper align with existing research on neural solvers, particularly in the context of enhancing OOD generalization by leveraging privileged information about the equations.

Essential References Not Discussed

To the best of my knowledge, there are no essential related works missing from the discussion.

Other Strengths and Weaknesses

Strengths:

  • The results clearly demonstrate OOD generalization when leveraging privileged information about the equations.
  • The approach of using LLMs to embed symbolic equations is an interesting direction.
  • The inclusion of a CFD benchmark is valuable for showcasing the method in an applied setting, in which the model should have access to privileged information on the equations.

Weaknesses:

  • The domain remains discretized on a grid, even in CFDBench, which further weakens the claim of universality.

Other Comments or Suggestions

  • I found the PCA analysis of embedded PDE conditions interesting but in need of improvement. I suggest visualizing it in 3D, as there are three varying conditions, which would better illustrate their distinctions across additional axes. Additionally, adjusting the color shading for each condition could help assess whether the condition embeddings follow a consistent trajectory that aligns with the order of prior values.
  • I recommend that the authors soften their claim on universality, as it appears too strong given the current scope of the experiments.
Author Response

We sincerely thank Reviewer GsAC for providing the insightful review and valuable suggestions.

Q1: "The claim of universality may be overstated. The equations considered remain a relatively small subset of possible PDEs." "The domain remains discretized on a grid, even in CFDBench, which further weakens the claim of universality." "Soften their claim on universality."

We thank the reviewer for providing valuable feedback.

(1) Regarding handling irregular geometry.

We acknowledge the limitation of our current method in handling irregular geometries and have discussed this in Appendix K. We highlight that this limitation is shared by existing PDE foundation models such as DPOT, MPP, and Poseidon, all of which focus on data defined on regular grids. One fundamental reason behind this is the lack of suitable large-scale PDE datasets on irregular meshes. To extend Unisolver to irregular geometries, one possible approach is to replace the current canonical Transformer with geometry-general PDE models like Transolver.

(2) Regarding the potential overstatement of "universality".

We agree that the claim of "universal" may overstate the current scope of our model, as our method does not handle all possible kinds of PDEs but rather focuses on several diverse sets of PDEs. In our definition, a "universal" neural PDE solver should be able to incorporate all available PDE information and generalize across a wide family of PDEs; our method is one step beyond existing approaches in systematically encoding available PDE information. We promise to revise our title to "Unisolver: PDE-Conditional Transformers Towards Universal Neural PDE Solvers" to more accurately reflect the scope and goal of our work.

Q2: Although the paper claims to provide theoretical analysis, I did not find any substantial theoretical justification in the text.

We acknowledge the reviewer's concern. Our model design is inspired by theoretical insights on how PDE components influence the solutions, as shown in the motivating example in Section 3.1. However, we would like to clarify that we do not attempt to provide formal theoretical analysis or guarantees in this work.

Q3: Suggestions on improving visualization of embedded PDE conditions.

We sincerely thank the reviewer for the valuable suggestions. We provide an additional visualization of the learned PDE condition embeddings, which can be found through this anonymous link: https://anonymous.4open.science/r/rebuttal-4EC7/visualization.png. The new 3D plot provides a more insightful visualization of the learned embeddings, clearly reflecting how each coefficient impacts the embedded PDE conditions.
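As a rough illustration of how such a plot can be produced (a minimal sketch under our own assumptions; `embeddings` and `coeff` are random placeholders, not the paper's data):

```python
# Sketch of a 3D PCA visualization of PDE condition embeddings, with a
# varying coefficient used for color shading as the reviewer suggested.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

embeddings = np.random.randn(200, 256)  # (N, d) learned condition embeddings
coeff = np.linspace(0.0, 1.0, 200)      # placeholder coefficient values

pts = PCA(n_components=3).fit_transform(embeddings)  # project onto 3 axes
ax = plt.figure().add_subplot(projection="3d")
sc = ax.scatter(pts[:, 0], pts[:, 1], pts[:, 2], c=coeff, cmap="viridis")
plt.colorbar(sc, ax=ax, label="coefficient value")
plt.show()
```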

Q4: Unisolver exhibits more artifacts.

We thank the reviewer for the detailed observation. Figure 14 displays the error maps of each model, i.e., the absolute difference between the model predictions and the ground truth. Although there are visual artifacts in the error maps, it is important to note that the model predictions themselves do not display such significant artifacts, as demonstrated by the full-trajectory visualizations in Figures 17 and 18. In fact, the absolute error is much smaller than the ground-truth values, which makes the artifacts in the error maps more noticeable.

Review (Rating: 2)

The paper introduces Unisolver, a universal neural PDE solver that can handle a wide range of PDEs, unlike traditional neural solvers that are limited to specific equations and coefficients. Instead of merely scaling up data and parameters, Unisolver leverages theoretical insights into PDE structures, embedding key components (e.g., equation symbols, coefficients, and boundary conditions) into a Transformer-based model. This approach integrates physics and deep learning, achieving state-of-the-art performance and superior generalization across diverse PDEs.

Questions for Authors

None.

Claims and Evidence

'Universal' is too big to say.

Methods and Evaluation Criteria

Yes

Theoretical Claims

None.

Experimental Design and Analysis

Yes, all of them.

Supplementary Material

Yes, all of them.

Relation to Broader Literature

Foundation models are an open question for the SciML community. This paper provides a tangential solution.

Essential References Not Discussed

None.

Other Strengths and Weaknesses

'Our models were trained on servers with 32 NVIDIA A100 GPUs, each with 40GB memory.' For such simple examples... how do you justify this?

Other Comments or Suggestions

None.

Author Response

We sincerely thank Reviewer 8zJe for providing valuable feedback and insightful questions.

Q1: "'Universal' is too big to say."

Thanks for this rigorous review, which is very instructive for us.

(1) We adopt "Universal" in the context of deep learning to highlight the model's flexibility and broad applicability.

We appreciate the reviewer’s concern. Our use of the term "universal" is not meant to claim generality across all kinds of PDEs, but rather to highlight the flexibility and broad applicability of our model in incorporating diverse PDE information and handling large-scale PDE datasets. Similar usage also exists in previous work, e.g., "Universal Physics Transformers" [1].

Specifically, our method allows the deep model to incorporate all available PDE information including PDE types, coefficients, boundary conditions, domain geometries and force terms. This allows the model to flexibly handle large-scale and diverse PDE datasets, as demonstrated by our experiments on three challenging benchmarks.

In particular, we have trained a single unified model for all considered 1D PDE datasets, and likewise a single model for the 2D PDE datasets containing all kinds of PDE components, each covering a wide spectrum of PDE variations. This experimental setup demonstrates a certain degree of universality, as the models generalize well across diverse PDE families within their respective domains.

[1] Universal Physics Transformers: A Framework For Efficiently Scaling Neural Operators, NeurIPS 2024

(2) We will change the title to "Towards Universal Neural PDE Solvers" for scientific rigor.

Thanks for the reviewer's kind reminder. We acknowledge the potential overstatement of "universal" and promise to revise our title to "Unisolver: PDE-Conditional Transformers Towards Universal Neural PDE Solvers" to more accurately reflect the scope and position of our work. Similar usage also exists in previous work [2]. This revision will help clarify that the goal of our work is to make progress towards more generalizable and practical neural PDE solvers.

[2] Towards Foundation Models for Scientific Machine Learning: Characterizing Scaling and Transfer Behavior, NeurIPS 2023

Q2: "For such simple examples... How do you justify?"

(1) Not all the PDE-solving tasks require large computation resources.

We apologize for the confusion. While we conducted experiments on a 32-GPU server, we would like to clarify that not all our models were trained on 32 A100 GPUs. The HeterNS dataset is relatively small, and the corresponding models were trained on a single GPU. For the 2D mixed PDEs dataset, we use 8 A100 GPUs to train our model, aligning with the training configuration reported in DPOT. For the 1D time-dependent PDEs dataset, our model was trained on 32 GPUs, which is justified by its large scale and high diversity, comprising over 3 million training samples in total. The computational cost of Unisolver is shown below, and can also be found in Appendix I.7.

Benchmark                   GPU Hours
HeterNS                     24
1D Time-dependent PDEs      3000
2D Mixed PDEs               800

Besides, we also want to highlight that although Unisolver requires considerable time for training, once trained, it can generalize to new PDEs without retraining and efficiently generate solutions, as shown in the efficiency analysis in Appendix I.5. This allows Unisolver to serve as an efficient surrogate for numerical solvers, significantly reducing the computational overhead, which further justifies the training cost.

(2) Our PDE-solving benchmarks are among the hardest in the current research community.

Regarding the concern about simplicity, we respectfully argue that the 2D datasets are highly non-trivial, covering a wide range of complex PDE types and diverse PDE components. While the 1D dataset is relatively simpler from a numerical-solving perspective, it is specifically constructed to cover a large variety of time-dependent PDEs, posing significant challenges for training and generalization from a deep learning perspective. Specifically, the 1D PDE family contains six polynomial coefficients, various viscosity and force terms, and three types of boundary conditions, all of which can vary simultaneously, imposing intricate challenges for the model to capture the complex relationship between PDE component inputs and the corresponding solutions.

Finally, we would like to note that the datasets used in our paper are not intended as end goals, but as a foundation for systematically studying the performance of deep surrogate models across increasingly challenging PDE settings, paving the way towards more practical and powerful PDE foundation models.

Review (Rating: 3)

This paper introduces Unisolver, a framework that conditions a transformer model on various physical parameters relevant to PDEs. The framework distinguishes between domain-wise components (such as equation symbols and coefficients) and point-wise components (such as external forcing). These are incorporated via adaptive layer normalization, with domain-wise components either extracted from a large language model (LLM) or modeled using an MLP, while point-wise components are patchified for compatibility with the transformer. The method is evaluated on three benchmarks: HeterNS, 1D time-dependent PDEs, and 2D mixed PDEs, where it is compared against baselines.
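As an illustration, the adaptive-layer-normalization conditioning described above can be sketched as follows. This is our own minimal sketch, not the authors' code; module and parameter names such as `CondAdaLN` are invented for exposition:

```python
# Sketch of adaptive-layer-norm conditioning: a domain-wise condition
# vector (e.g., embedded equation symbols and coefficients) predicts the
# per-channel scale and shift applied to every spatial token.
import torch
import torch.nn as nn

class CondAdaLN(nn.Module):
    def __init__(self, hidden_dim: int, cond_dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(hidden_dim, elementwise_affine=False)
        self.to_scale_shift = nn.Linear(cond_dim, 2 * hidden_dim)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, hidden_dim); cond: (batch, cond_dim)
        scale, shift = self.to_scale_shift(cond).chunk(2, dim=-1)
        # Broadcast the domain-wise condition over all spatial tokens.
        return self.norm(x) * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)

x = torch.randn(4, 1024, 256)     # patchified field tokens
cond = torch.randn(4, 128)        # embedded domain-wise PDE components
y = CondAdaLN(256, 128)(x, cond)  # (4, 1024, 256)
```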

Questions for Authors

  1. What PDE knowledge does the LLM actually encode? Would removing the LLM affect performance?
  2. Why use repeated encoding for PDE parameters? Would a single token with cross-attention be a more efficient alternative?
  3. Did you find a difference between domain-wise and point-wise embeddings? Which ones are the most important?

Claims and Evidence

The authors claim that Unisolver achieves strong performance in both in-distribution and out-of-distribution settings. The experimental results suggest that:

  • The comparisons with baselines appear fair, with most methods receiving similar input information (except for ICON and PINO, which differ in conditioning).
  • However, the "incomplete scenario" setup is somewhat unclear. Does it refer to partial information during training or only at inference?
  • The interpretation of the learned PDE embeddings is ambiguous, making it difficult to assess the quality and significance of the extracted representations.

Methods and Evaluation Criteria

The proposed approach is conceptually interesting, as it attempts to create a unified conditioning mechanism for neural PDE solvers. However, a few concerns remain:

  • The idea of conditioning a transformer on all available PDE information is relevant, but the method assumes full knowledge of the governing equation, which may not always be realistic.
  • The use of an LLM for encoding equation symbols seems questionable. Does the LLM contribute meaningful information, or does it merely introduce additional complexity? The results in Table 7 seem to indicate that the LLM does not significantly improve performance, which raises concerns about the validity of this design choice.
  • The introduction suggests that prior approaches fail to incorporate all available information, but this framing might be misleading—previous methods likely did not attempt such exhaustive conditioning because it is not always necessary.

Theoretical Claims

This is an experimental paper.

Experimental Design and Analysis

  • The experimental setup appears sound, and the evaluation is conducted on diverse PDE scenarios.
  • However, the interpretation of learned embeddings is not particularly insightful, making it difficult to assess whether the model genuinely understands PDE structure or is simply performing pattern recognition.
  • The partially observable setting is not very convincing: the model still relies on 70% fully observed data, which is a relatively mild missing-data scenario.

Supplementary Material

I have checked the supplementary.

Relation to Broader Literature

This work aligns with research on generalizable PDE surrogate models and foundation models for PDEs.

Essential References Not Discussed

Key references are missing:

  • MPP and Poseidon should be discussed earlier in the text.
  • Generalization methods such as CODA [1], CAPE [2], and Zebra [3] should also be mentioned to provide a clearer contextualization of Unisolver’s novelty.

[1] Kirchmeyer et al. (2022), Generalizing to New Physical Systems via Context-Informed Dynamics Model.
[2] Takamoto et al. (2023), Learning Neural PDE Solvers with Parameter-Guided Channel Attention.
[3] Serrano et al. (2024), Zebra: In-Context and Generative Pretraining for Solving Parametric PDEs.

Other Strengths and Weaknesses

The distinction between neural PDE solvers and neural surrogates is not well discussed in the introduction. PDE solvers aim to directly solve the equation, whereas surrogates approximate numerical solutions efficiently.

Other Comments or Suggestions

  1. Clarify the contribution of the LLM: How much does it actually help? If it provides limited gains, should it be removed?
  2. Improve the "Neural PDE Solvers" section in Related Work: The discussion should better differentiate between neural solvers (PINNs, which enforce physics constraints) and neural surrogates (which approximate solutions efficiently).
  3. Rephrase certain claims: For instance, stating that "most methods fail to incorporate all PDE information" is an overstatement—rather, existing methods prioritize different aspects of the problem based on their intended applications.
  4. Consider an alternative conditioning approach: Instead of repeating equation information at each location, why not use a single token and cross-attention to encode the PDE domain information?
Author Response

We sincerely thank Reviewer Z6hm for providing valuable feedback and suggestions.

Q1: About the incomplete scenario setup and incomplete ratio.

(1) Clarify our setting.

Sorry for the confusion. We clarify the incomplete component scenario setup:

  • During training, each PDE component (viscosity and force in HeterNS) is independently masked with 30% probability, so 0.7 × 0.7 = 49% of the samples retain full components.
  • During evaluation, the "incomplete" setting in Figure 7 means no components are available, while "complete" means full components are provided.

The results demonstrate that our model can run inference with no PDE components, and that providing components boosts performance.

(2) New experiment with a larger incomplete ratio.

We perform an additional ablation where each component is masked with 80% probability during training, leaving only 0.2 × 0.2 = 4% of the data with full components. As shown in the results below (averaged across multiple forces), our model outperforms FNO and maintains strong performance.

No components available (80% masked):

Viscosity     1e-5      5e-5      1e-4      5e-4      1e-3
FNO           0.1039    0.0485    0.0305    0.0097    0.0047
Unisolver     0.0647    0.0237    0.0147    0.0043    0.0022

Full components (80% masked):

Viscosity     1e-5      5e-5      1e-4      5e-4      1e-3
FNO           0.1009    0.0473    0.0288    0.0093    0.0044
Unisolver     0.0644    0.0232    0.0136    0.0039    0.0021

Q2: The method assumes full knowledge of the governing equation, which may not always be realistic.

As described in the Incomplete Component Scenario section, our model does not rely on complete knowledge of the governing equations: it can be trained with partial PDE information and supports inference with no PDE components. This enables application to real-world settings with incomplete PDE knowledge.

Besides, we want to note that beyond real-world settings, simulation in CAE software is also a valuable use case, where complete information is easy to obtain and Unisolver can serve as an efficient surrogate.

Q3: About contribution of LLM embedding, interpretation of learned PDE embeddings, and whether the model is simply performing pattern recognition.

As shown in Table 6, incorporating LLM embeddings yields an average improvement of 5.76%, indicating that they provide meaningful information. Table 7 further shows that LLM embeddings outperform manually constructed symbolic embeddings on both in-distribution and downstream tasks, with over 10% improvement on the Advection equation, highlighting the better generalization of LLM embeddings and benefits beyond simple pattern recognition.

Regarding efficiency, as LLM inference has been heavily optimized, generating the embeddings incurs negligible computational overhead.
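To make this pipeline concrete, here is a minimal sketch of embedding a symbolic PDE expression with a frozen pretrained LLM. The backbone choice and mean-pooling are our assumptions for illustration, not the paper's exact recipe:

```python
# Sketch: embed an equation skeleton (numeric values handled separately)
# with a frozen LLM; the pooled hidden state conditions the solver.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed backbone
llm = AutoModel.from_pretrained("bert-base-uncased").eval()

expr = "u_t + u * u_x - nu * u_xx = f"  # symbolic PDE expression
with torch.no_grad():
    emb = llm(**tok(expr, return_tensors="pt")).last_hidden_state.mean(dim=1)
print(emb.shape)  # (1, hidden_dim) domain-wise condition embedding
```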

To improve interpretability, we provide additional visualizations: https://anonymous.4open.science/r/rebuttal-4EC7/visualization.png. We filter out equations with viscosity and force for clarity and annotate each cluster with a representative PDE formula. The visualization clearly shows the well-structured latent space of LLM embeddings.

Q4: About statement on existing methods.

We agree that "fail to incorporate all PDE information" may be too strong. We will revise the statement to "do not fully utilize all available PDE information" to soften our claim.

Q5: MPP and Poseidon should be discussed earlier; CODA , CAPE, and Zebra should be mentioned.

We will discuss MPP and Poseidon earlier in Related Works for clarity. We have already cited CAPE, and we will cite CODA and Zebra to better contextualize Unisolver’s novelty.

Q6: About distinction between neural PDE solvers and neural surrogates.

We use the term "neural PDE solver" to refer to PINNs and neural operators, following prior works such as Message Passing Neural PDE Solver and Transolver. We also acknowledge that under the narrower definition of [1], only PINNs are considered neural solvers. Given this mixed usage, we prefer to follow Message Passing Neural PDE Solver, whose topic is most closely related to ours, and retain the term "neural PDE solver".

[1] Physics-informed machine learning: A survey on problems, methods and applications.

Q7: About conditioning approach and repeating encoding.

Token repeating is an implementation trick similar to tensor broadcasting. We experimented with cross-attention on HeterNS. As shown in the table below, our design outperforms cross-attention for PDE information conditioning.

Viscosity Generalization (Relative L2)    In-Dist    Zero-shot
Unisolver (Cross Attn)                    0.01078    0.0416
Unisolver (Ours)                          0.0098     0.0374
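For clarity, the cross-attention alternative compared above can be sketched as follows (an illustrative snippet under our own assumptions; dimensions are arbitrary):

```python
# Sketch of conditioning via cross-attention: all spatial tokens attend to
# a single PDE-condition token instead of repeating it at every location.
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)
x = torch.randn(4, 1024, 256)   # spatial tokens
cond = torch.randn(4, 1, 256)   # one condition token per sample
out, _ = attn(query=x, key=cond, value=cond)
x = x + out                     # residual injection of the condition
```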

Q8: About difference between domain and point-wise embeddings.

The importance of domain-wise vs. point-wise components varies by dataset. In HeterNS, point-wise components (e.g., force terms) are more critical due to their strong influence on fluid patterns. For 1D and 2D mixed PDEs, both types contribute significantly in guiding the simulation.

Reviewer Comment

(1) Clarity of the setting

I remain a bit confused by the explanation. What exactly happens in the incomplete scenario at inference time? Is it possible to run the model without providing any conditioning vectors at all? If tokens must be supplied, how are they selected? Also, could this setup be extended to handle new dynamics?

(2) Incomplete scenario

The two tables seem to show very similar performance. What is the interpretation of this? Does it mean the conditioning has limited impact in this setting?

Answer to Q2

I agree that this remains a valuable setting to explore. As you correctly point out, Unisolver could serve as a powerful surrogate in that context.

Answer to Q3

I see that it does help, but the gain doesn’t seem particularly significant. That said, I find the core value of the paper lies in the flexibility of the proposed architecture for handling various conditioning strategies, which is already an interesting contribution on its own.

Answer to Q6

Apologies if I appear overly strict here, but I don’t think the referenced models should be referred to as "solvers"; they are better described as surrogates. I believe we should be precise in our terminology.

Answer to Q7

Thank you for providing the table. I appreciate the additional detail.

Answer to Q8

Interesting point. In particular, it is not always clear whether the increased difficulty in modeling certain dynamics comes from complex forcing terms or from specific boundary conditions. It could be interesting to see if the proposed model can help identify the most critical factors of a given dynamics.

Thanks again for the clarifications. With your responses, I now have a better understanding of the paper and will increase my score to 3.

Author Comment

We sincerely thank Reviewer Z6hm again for providing the thoughtful and constructive follow-up response to our rebuttal, as well as for raising the score. We also appreciate the time and care you have taken to provide further suggestions, which are very helpful in improving the clarity and rigor of our work.

Below, we make further clarifications for the remaining points of confusion.

(1) Clarify the incomplete component scenario at inference time.

Yes, the model can be run without providing any conditioning vectors at inference time. Note that learnable tokens are used to represent the types of unknown PDE components, rather than instance-specific values. For example, all viscosity coefficients share one learnable token, and the force terms share another. These learnable tokens serve as indicators to the model that the corresponding components are unknown. This design allows Unisolver to flexibly operate in three modes:

  • Full conditioning: All PDE components are provided.
  • Partial conditioning: A subset of PDE components is provided.
  • Zero conditioning: No PDE components are provided.

This setup also allows the model to handle new dynamics in two ways. If the PDE components of the new dynamics are known, they can be directly provided to the model, as in the "zero-shot" results in Table 3. If the PDE components are unknown, learnable tokens can be used to represent the unknown components, allowing the model to still perform effectively. A minimal sketch of this fallback mechanism follows.
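The sketch below is our own illustration of the idea with invented names, not the authors' implementation:

```python
# Sketch: a shared learnable token stands in for an unknown PDE component,
# so one model supports full, partial, and zero conditioning.
import torch
import torch.nn as nn
from typing import Optional

class ComponentEmbedder(nn.Module):
    def __init__(self, cond_dim: int):
        super().__init__()
        self.viscosity_proj = nn.Linear(1, cond_dim)
        # One learnable token per component type, shared across samples.
        self.unknown_viscosity = nn.Parameter(torch.zeros(cond_dim))

    def forward(self, viscosity: Optional[torch.Tensor], batch: int) -> torch.Tensor:
        if viscosity is None:
            # Component unknown: signal its absence with the shared token.
            return self.unknown_viscosity.expand(batch, -1)
        return self.viscosity_proj(viscosity)  # viscosity: (batch, 1)
```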

(2) Explain the results of the incomplete component scenario.

We would like to highlight that the results in our rebuttal correspond to a high masking ratio of 80% during training, meaning that only 4% of the training samples contain complete PDE information. Therefore, the performance gap between full conditioning and no conditioning is reduced, as the model mainly learns to predict with limited component guidance.

Additionally, the model input in the HeterNS dataset contains ten history timesteps, which provides dynamic information of the fluid. Therefore, under this highly incomplete supervision, the model tends to rely more on the history inputs to infer the underlying dynamics, which weakens the impact of the components.

Regarding the distinction between "surrogates" and "solvers", we appreciate your emphasis on precise terminology. We will consider our use of terminology more carefully in future revisions. We promise to conduct a more comprehensive literature review to determine whether we should replace "neural solvers" with "neural surrogates" to better reflect the nature of our approach.

Thanks again for your support and dedication to our paper and for acknowledging our contributions.

Final Decision

The paper proposes a framework for training PDE surrogates conditioned both on field state values and on prior information, including PDE coefficients, parameters, the algebraic expression of the PDE (via an LLM embedding of this expression), and boundary conditions. The prior information conditions a transformer architecture through adaptive layer-wise scale and shift operations. The method is evaluated on three benchmarks involving 1D and 2D PDEs.

The reviewers appreciate the novelty of the approach and the flexibility of the framework. They requested clarifications and additional experiments. The authors' responses were found to be relevant, and they included some complementary experiments. A key question remains regarding the practical usefulness of the framework, as pointed out by the reviewers, given that the authors make several assumptions about the availability of prior information—assumptions that are rarely met in practice. Nevertheless, the approach may inspire new ideas in the community, and I propose acceptance. The authors are encouraged to take into account the reviewers’ comments, particularly their concerns about the overstatements regarding the method’s supposed “universality,” by moderating their claims and better highlighting its limitations.