PaperHub
Overall rating: 6.0 / 10 (Poster; 4 reviewers; min 5, max 7, std 0.7; individual ratings: 7, 6, 6, 5)
Confidence: 3.3 · Correctness: 3.0 · Contribution: 2.8 · Presentation: 3.0
NeurIPS 2024

Alias-Free Mamba Neural Operator

OpenReview · PDF
Submitted: 2024-05-15 · Updated: 2024-11-06
TL;DR

A neural operator that introduces a novel Mamba integration.

Abstract

Benefiting from the booming deep learning techniques, neural operators (NOs) are considered an ideal alternative for breaking with the tradition of solving Partial Differential Equations (PDEs) at expensive cost. Yet despite the remarkable progress, current solutions pay little attention to holistic function features, both global and local information, during the process of solving PDEs. Besides, a meticulously designed kernel integration that meets desirable performance often suffers from a severe computational burden, such as GNO with $O(N(N-1))$, FNO with $O(N\log N)$, and Transformer-based NOs with $O(N^2)$. To counteract the dilemma, we propose a Mamba neural operator with $O(N)$ computational complexity, namely MambaNO. Functionally, MambaNO achieves a clever balance between global integration, facilitated by the state-space model of Mamba that scans the entire function, and local integration, handled by an alias-free architecture. We prove a property of continuous-discrete equivalence to show the capability of MambaNO in approximating operators arising from universal PDEs to desired accuracy. MambaNO is evaluated on a diverse set of benchmarks with possibly multi-scale solutions and sets new state-of-the-art scores, yet with fewer parameters and better efficiency.
Keywords
Mamba, Neural Operator

Reviews and Discussion

Review
Rating: 7

This paper introduces a novel neural operator called MambaNO (Mamba Neural Operator) for solving PDEs. The key contributions, at least to me, are:

  1. A new integral form called "mamba integration" with O(N) computational complexity that captures global function information. This is so cool! I give props to the authors for coming up with this!
  2. An alias-free architecture that combines mamba integration with convolution to capture both global and local function features.
  3. Theoretical analysis proving MambaNO is a representation-equivalent neural operator (ReNO) and can approximate continuous operators for a large class of PDEs.
  4. Extensive empirical evaluation demonstrating state-of-the-art performance across various PDE benchmarks.

Strengths

  1. Novelty: The paper introduces a new approach to neural operators by adapting the Mamba architecture, which has shown promise in other domains. The combination of global (mamba integration) and local (convolution) operators is innovative in this context.
  2. Theoretical foundation: The authors provide a solid theoretical analysis, including proofs of representation equivalence and approximation capabilities. This adds depth to the empirical results and helps understand why the proposed method works.
  3. Comprehensive experiments: The evaluation is thorough, covering a wide range of PDE types and comparing against multiple state-of-the-art baselines. The inclusion of both in-distribution and out-of-distribution tests strengthens the claims of generalization ability.
  4. Performance improvements: The reported improvements in accuracy and efficiency are substantial across different PDE types, which is impressive given the diversity of the benchmarks.
  5. Alias-free framework: By adhering to an alias-free framework, the authors address an important issue in neural operators, potentially improving the model's stability and generalization capabilities.
  6. Efficiency: The O(N) complexity of the mamba integration is a significant advantage, especially for high-dimensional problems.

Weaknesses

  1. Hyperparameter sensitivity: The paper doesn't provide a comprehensive analysis of how sensitive the model is to various hyperparameters, such as the number of layers or the dimensionality of the state space.
  2. Potential limitations in handling multi-scale phenomena: To me at least, many PDEs exhibit behavior across multiple scales. While the U-shaped architecture with up- and downsampling operations addresses this to some extent, the paper doesn't deeply analyze how effectively MambaNO can handle problems with significantly different scales of behavior.
  3. Lack of discussion on boundary condition handling: The paper doesn't provide a detailed explanation of how different types of boundary conditions are handled within the MambaNO architecture, which is a critical aspect of PDE solving.

Questions

  1. How do you envision extending the cross-scan operation to 3D or higher-dimensional PDEs? What challenges do you anticipate? Have you conducted any preliminary experiments on 3D PDEs to assess the scalability of MambaNO?
  2. Can you provide more insight into the relationship between the state space model used in Mamba and the specific requirements of PDE solving?
  3. While MambaNO achieves O(N) complexity, how does its practical runtime and memory usage compare to other methods, especially for large-scale problems?
  4. How well does MambaNO generalize to PDEs with very different characteristics from those in the training set? Are there certain types of PDEs where it struggles? Have you explored transfer learning approaches, where a model trained on one class of PDEs is fine-tuned for another?

Limitations

They have presented the limitations!

Author Response
  1. Most of the hyperparameters, including the encoder layers, upsampling factor, scanning directions, and integration depth, have been ablated, with results and practical settings given in subsection E of the supplemental material. Note that for most NO applications, the dimension of the state space is simply set to 16 or 32, hence we omitted its ablation in our previous submission. As suggested, the corresponding experiments are provided in the uploaded PDF. The practical selections depend on balancing efficacy and efficiency.

  2. Yes, handling different scales is a key issue in solving PDEs. MambaNO addresses the problem of different behavioral scales not only through the up- and down-sampling of its U-shaped architecture, but also by combining global (Mamba integration) and local (convolution integration) treatments to learn holistic features, thereby better handling multi-scale phenomena. Besides, the ablations on U-net layers and on replacing the Mamba or convolution integration with naive modules are also given in subsection E of the supplemental material. However, the text (deep analysis) on how effectively MambaNO handles problems with significantly different scales is indeed insufficient, which will be addressed in later versions. In addition, we conjecture that introducing adaptive mesh refinement or incorporating physics-guided priors would further tap the potential for multi-scale applications.

  3. Boundary conditions play a crucial role in solving PDEs because they define the behavior of the solution at the boundaries of the computational domain. We believe MambaNO's unique scanning mechanism can better handle the initial and boundary conditions of PDEs, allowing the model to utilize this information during the forward inference process. Reducing the scanning directions also decreases the model's effectiveness, as demonstrated in our ablation study in the supplemental material. In fact, techniques such as data augmentation, inputting more boundary information, adaptive boundary conditions, and multi-task learning can significantly enhance the model's ability to handle different boundary conditions. We will continue to explore the limits of MambaNO's capability to handle various boundary conditions in the near future.

  4. To extend the cross-scan operation to 3D or higher-dimensional PDEs, we need to generalize the scanning process to handle additional dimensions. This involves iterating along multiple spatial axes and potentially managing increased computational complexity. In the 3D case, this means scanning along the x, y, and z axes while ensuring the consistency and accuracy of the numerical solution (a rough sketch of this 3D scan is given after this list). The anticipated challenges are: (i) computational complexity, since the increase in dimensions significantly raises computation and memory demands; (ii) data handling, as managing large datasets in higher dimensions is complex, including efficient storage, retrieval, and manipulation; and (iii) accuracy and stability, because increased dimensions heighten the potential for numerical errors and instabilities. Unfortunately, we have not yet conducted experiments on 3D PDEs. Inspired by GINO [1], which combines GNO and FNO for 3D PDEs, we may next consider integrating GNO and MambaNO to address the 3D challenges.

  5. The state-space model is a mathematical framework that represents a physical system in terms of input, output, and state variables, which are related through first-order differential equations. PDEs typically need to be solved within specific spatial and temporal domains, requiring precise handling of boundary and initial conditions. The state-space model is particularly well-suited for managing these conditions. Specifically, the state-space model uses a scanning-like mechanism to handle information from initial and boundary conditions, integrating this information into the state equations to determine the system's state at the next time point. This approach is logical and intuitive because the state-space model systematically tracks the time evolution of the system's states, ensuring that initial and boundary conditions are correctly processed at each time step.

  6. As suggested, we have conducted experiments on a larger dataset, the 2D CFD dataset in PDEBench [2], and compared MambaNO with CNO [3] and two other O(N) competitors, GNOT [4] and OFormer [5]. The numerical and visual results are given in Table 3 and Fig. 1 of the uploaded PDF, respectively. Evidently, the performance of MambaNO is ahead of the other three competitors. In terms of memory usage, stopping epoch, and training time, our MambaNO ranks third, second, and second, respectively. Note that all four methods have O(N) complexity.

  7. We believe that generalizing to very different PDEs would be a tremendous advance in this field. In the original text, we provide both in-distribution and out-of-distribution test results for the same PDE, verifying that MambaNO has optimal generalization across different distributions. However, we have not yet conducted experiments on significantly different PDEs. We consider that the generalization capability for irregular or higher-dimensional PDEs still needs improvement. In the future, we will further explore transfer learning to enhance the ability of the algorithm to handle different PDEs.
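
As a rough sketch of the 3D generalization discussed in point 4 (illustrative only; the function name and tensor layout below are our own assumptions for this sketch, not the released implementation), the cross-scan could enumerate forward and backward traversals along each spatial axis before applying the O(N) selective scan:

```python
import torch

def cross_scan_3d(u):
    """Illustrative 3D cross-scan: flatten a gridded field into 1D sequences
    along each spatial axis, in both directions. Each sequence would then be
    processed by an O(N) selective scan and the outputs merged back.
    u: tensor of shape (B, C, D, H, W)."""
    B, C, D, H, W = u.shape
    sequences = []
    # Permutations that place the scanned axis last before flattening.
    for perm in [(0, 1, 2, 3, 4),   # scan along W
                 (0, 1, 2, 4, 3),   # scan along H
                 (0, 1, 3, 4, 2)]:  # scan along D
        seq = u.permute(*perm).reshape(B, C, -1)
        sequences.append(seq)                          # forward traversal
        sequences.append(torch.flip(seq, dims=[-1]))   # backward traversal
    return sequences  # 6 sequences of shape (B, C, D*H*W)
```

The number of scan directions grows only linearly with the dimension (2d for a d-dimensional grid), so each direction keeps O(N) cost; the main practical burden is the memory for the intermediate sequences, consistent with the challenges listed above.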

[1] Geometry-Informed Neural Operator for Large-Scale 3D PDEs. In NeurIPS 2023.

[2] PDEBench: An Extensive Benchmark for Scientific Machine Learning. In NeurIPS 2022.

[3] Convolutional Neural Operators for Robust and Accurate Learning of PDEs. In NeurIPS 2023.

[4] GNOT: A General Neural Operator Transformer for Operator Learning. In ICML 2023.

[5] Transformer for Partial Differential Equations' Operator Learning. Transactions on Machine Learning Research, 2022.

Comment

Your additional explanations and experiments have addressed many of the concerns raised. The ablation studies on hyperparameters and the new results on larger datasets like 2D CFD in PDEBench are particularly valuable, demonstrating MambaNO's competitive performance against other O(N) complexity models. Your insights on handling multi-scale phenomena and boundary conditions are appreciated, though further exploration would be beneficial. The explanation of extending the cross-scan operation to higher dimensions is clear, and the potential integration with GNO for 3D PDEs is a natural next step, and I would be excited to see your follow-up work: )! Overall, thanks for the detailed explanation! I will however keep my score: )! Thank you

Comment

Thank you very much for your acknowledgment!

Review
Rating: 6

This paper proposes a novel neural operator structure that applies the Vision Mamba architecture to neural operators. Additionally, it introduces an activation operator to mitigate the impact of standard neural network activation functions on bandlimited functions, thus reducing aliasing error. The method shows promising results when compared with several neural operator methods on both in-distribution and out-of-distribution benchmarks.

Strengths

The strengths of this paper are as follows:

  1. The writing is exceptionally clear, making it easy to follow and effectively explaining the complexity reduction benefits brought by the Mamba structure.

  2. The method achieves impressive results, demonstrating the advantages of the Mamba structure. The paper also evaluates the method on datasets with varying resolutions, showing that the proposed approach maintains consistent performance across different resolutions.

Weaknesses

The main drawback of this paper is the lack of experiments and benchmarks. The selected datasets and baselines are not sufficiently representative. The authors should consider referring to datasets from PDEBench [1], or at least citing them. Since the primary advantage of the Mamba NO is complexity reduction, the paper should also compare it with other low-complexity transformer models, such as linear-attention models (e.g., OFormer, GNOT).

References

  1. PDEBENCH: An Extensive Benchmark for Scientific Machine Learning (https://arxiv.org/abs/2210.07182)

Questions

No

Limitations

No

Author Response

In the original text, we presented results on eight representative two-dimensional partial differential equations, demonstrating that MambaNO achieves better accuracy with only O(N) complexity. Thank you for pointing us to the new dataset and the two outstanding models. As suggested, we have added several experiments on the 2D CFD dataset from PDEBench [1], comparing against the mentioned OFormer [2] and GNOT [3], as well as one original competing method, CNO [4]. The numerical and visualized results are given in Table 3 and Fig. 1 of the uploaded PDF file, respectively. They show that MambaNO still leads in performance among the four O(N) competitors, while ranking second in terms of training epochs and training time, and third in terms of GPU memory usage (close to CNO).

Note that the main contribution of MambaNO is not simply complexity reduction. In fact, we propose a novel form of kernel integration that performs holistic feature learning both locally and globally. On that basis, we also provide a theoretical analysis of discretization invariance and continuous-discrete equivalence. Moreover, this performance is again achieved with only O(N) complexity.
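
To make the continuous-discrete equivalence claim concrete (a generic formulation; the notation here may differ from the paper): let $\Psi$ denote the sampling (analysis) map onto the chosen discrete representation and $\mathcal{G}$ the underlying continuous operator. A discrete map $g$ realizes $\mathcal{G}$ without aliasing when

$$\varepsilon(\mathcal{G}, g; v) = \big\| g\big(\Psi(v)\big) - \Psi\big(\mathcal{G}(v)\big) \big\| = 0 \quad \text{for all admissible inputs } v,$$

i.e., sampling then applying the discrete map commutes with applying the continuous operator then sampling.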

Finally, we will cite the mentioned dataset and models in a later version. Experimental results on the complete PDEBench benchmark will also be added, given the chance of revision. Thanks for your valuable suggestions!

[1] PDEBench: An Extensive Benchmark for Scientific Machine Learning. In NeurIPS 2022.

[2] Transformer for Partial Differential Equations' Operator Learning. Transactions on Machine Learning Research, 2022.

[3] GNOT: A General Neural Operator Transformer for Operator Learning. In ICML 2023.

[4] Convolutional Neural Operators for Robust and Accurate Learning of PDEs. In NeurIPS 2023.

Comment

Due to the imminent closure of the discussion period, we kindly request the reviewer to provide us with their valuable feedback on our rebuttal. We are at their disposal to answer any further questions in this regard.

Review
Rating: 6

The authors present a new operator architecture based on Mamba and convolutional integration. They show theoretically and empirically that this neural operator is discretization invariant and alias-free. The proposed architecture outperforms baselines across a variety of 2D benchmarks.

Strengths

  • The theoretical justification for the architecture is strong.

  • The empirical results show good performance across a variety of PDEs.
  • The proposed computational complexity of O(N) is great.
  • There are good ablation studies and discretization invariance results.
  • There is a good description of the dataset and methods, as well as the computational complexity analysis.

Weaknesses

  • The lack of statistical significance of the results makes the benchmark comparisons somewhat less convincing.

  • I could not find hyperparameters, model details, or training details of the baseline models, which makes it difficult for me to evaluate whether those baselines are comparable to the default configuration of MambaNO.
  • The architecture may be less scalable than others; some of the ablation studies show decreased performance with increasing model capacity. If the authors could comment on whether this is caused by overfitting or model limitations, that would be great.
  • The proposed computational complexity is great; however, it would be good if it were supported by empirical timing experiments or comments on the efficiency of the model. Although faster, does the model need more epochs/training time compared to the baselines? If the authors could comment on the training details of their different models, that would help.

Overall, the proposed architecture seems to perform well and is based on theoretically sound analysis. However, the lack of reproducibility and transparency makes the results less convincing.

Questions

  • How could you see adapting this model to larger, irregular systems? Resampling to a regular grid is likely not a feasible strategy with complex geometries or 3D problems.

  • I am curious about the model’s capacity to learn across multiple PDEs or physics scenarios. Most large PDE models use some sort of attention mechanism or transformer architecture. [1, 2, 3] How do you see your model fitting into this ecosystem, where a pretrained model can be fine-tuned?
  1. Zhongkai Hao, Chang Su, Songming Liu, Julius Berner, Chengyang Ying, Hang Su, Anima Anandkumar, Jian Song, Jun Zhu, DPOT: Auto-Regressive Denoising Operator Transformer for Large-Scale PDE Pre-Training, https://arxiv.org/abs/2403.03542
  2. Michael McCabe, Bruno Régaldo-Saint Blancard, Liam Holden Parker, Ruben Ohana, Miles Cranmer, Alberto Bietti, Michael Eickenberg, Siavash Golkar, Geraud Krawezik, Francois Lanusse, Mariel Pettee, Tiberiu Tesileanu, Kyunghyun Cho, Shirley Ho, Multiple Physics Pretraining for Physical Surrogate Models, https://arxiv.org/abs/2310.02994
  3. Maximilian Herde, Bogdan Raonić, Tobias Rohner, Roger Käppeli, Roberto Molinaro, Emmanuel de Bézenac, Siddhartha Mishra, Poseidon: Efficient Foundation Models for PDEs, https://arxiv.org/abs/2405.19101

Limitations

  • The convolutional integration assumes a fixed grid, which makes this neural operator incompatible with irregularly discretized meshes. For practical physical applications, this could be a significant limitation.
Author Response
  1. As suggested, we have provided the experimental data and their p-values in the uploaded PDF, showing that the differences between the experimental results are statistically significant. In other words, these differences are not due to random fluctuations. Note that such tables are rather redundant and repetitive since most p-values are < 0.001, so we omitted the p-values in the original submission. For all competing methods, we used the best parameters suggested in their original papers. All the selected parameters and the trained models will be released on GitHub for comparison and evaluation. For the proposed MambaNO, most of the hyperparameters, including encoder depth, upsampling factor, scanning direction, and integration layers, have been ablated, with results and practical settings given in subsection E of the supplemental material. However, we omitted the training parameters in our previous submission. As suggested, they are now provided in Table 2 of the uploaded PDF. Across the different PDEs, most of these parameters are fixed in all experiments, with a few of them fine-tuned for better performance.

  2. The phenomenon of occasional performance declines with increasing model capacity can be attributed to underfitting (insufficient data samples). Thanks for the reminder; we have re-conducted comparative experiments on a larger dataset (10,000 samples, 7,000 of which are used for training) among four algorithms of the same O(N) complexity, i.e., OFormer [1], GNOT [2], CNO [3], and our MambaNO, the first two of which were suggested by another reviewer. The experimental results are provided in Table 3 of the uploaded PDF. As seen, the accuracy of our model is further boosted. Note that the efficiency metrics are also provided for a deeper comparison.

  3. As suggested, we have provided the training loss curves for FNO [4], CNO, and MambaNO in Fig. 2 of the uploaded PDF. As seen, FNO suffers from poorer performance resulting from early convergence. The convergence trends of MambaNO and CNO, both with a time complexity of O(N), are basically consistent. On closer inspection, MambaNO fluctuates less than CNO and reaches lower final errors given the same number of epochs. Therefore, MambaNO requires neither more epochs nor more training time. As suggested, more training details are given in Table 2 of the uploaded PDF file. As for reproducibility and transparency, we have indeed only provided the trained model for evaluation. At the current stage, the attachment cannot be updated, nor are we allowed to provide any link to a website. However, we promise to release the complete code on GitHub in the coming days.

  4. For irregular geometries, GNOT [2] encodes shape features with an encoder into K and V, which are then combined via cross-attention. RecFNO [5] uses an MLP to learn from irregular or sparse observations and reshapes the obtained features into a regular shape. These approaches provide insights into how our model can handle irregular problems: we would need an encoder to integrate irregular shape features and transform them into a form suitable for subsequent processing (a minimal sketch of the grid-resampling option is given after this list). GINO [6] combines GNO and FNO to solve 3D problems. We believe that MambaNO can also compensate for its shortcomings on irregular shapes and 3D problems in this way. Additionally, for 3D problems, designing an efficient integral scanning scheme remains an issue to be discussed. So far, we have not attempted to pre-train the model on multiple PDEs and then fine-tune it. However, the three papers on pre-training and fine-tuning have broadened our perspective. Thanks. Perhaps pre-training on different PDEs would be beneficial for our model's performance on downstream tasks. We believe this is an interesting and worthwhile direction to explore. Thanks again for your valuable insights!

  5. Yes, fixed-grid convolutions may not be suitable for irregularly discretized meshes. However, as mentioned in the response to point 4, there are already related works that address irregular shapes [2][5][6]. We believe that MambaNO also has the potential to handle irregular shapes, which will be a focus of our future research.
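
As a minimal sketch of the grid-resampling option mentioned in point 4 (our own illustration under the assumption of scattered 2D observations; not a component of the current MambaNO code), irregular observations could first be interpolated onto a regular grid, after which the fixed-grid operator applies unchanged:

```python
import numpy as np
from scipy.interpolate import griddata

def to_regular_grid(points, values, resolution=128):
    """Interpolate scattered 2D observations onto a regular grid.
    points: (M, 2) coordinates in [0, 1]^2; values: (M,) observed field values.
    Returns a (resolution, resolution) array usable by a fixed-grid operator."""
    xs = np.linspace(0.0, 1.0, resolution)
    grid_x, grid_y = np.meshgrid(xs, xs, indexing="ij")
    grid = griddata(points, values, (grid_x, grid_y), method="linear")
    # Outside the convex hull of the observations, fall back to nearest values.
    mask = np.isnan(grid)
    if mask.any():
        grid[mask] = griddata(points, values, (grid_x[mask], grid_y[mask]),
                              method="nearest")
    return grid
```

Learned alternatives, such as the encoder of GNOT [2] or the MLP-based reshaping of RecFNO [5], would replace this fixed interpolation with a trainable one.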

[1] Transformer for Partial Differential Equations' Operator Learning. Transactions on Machine Learning Research, 2022.

[2] GNOT: A General Neural Operator Transformer for Operator Learning. In ICML 2023.

[3] Convolutional Neural Operators for Robust and Accurate Learning of PDEs. In NeurIPS 2023.

[4] Fourier Neural Operator for Parametric Partial Differential Equations. In ICLR 2021.

[5] RecFNO: A resolution-invariant flow and heat field reconstruction method from sparse observations via Fourier neural operator. International Journal of Thermal Sciences, 2024.

[6] Geometry-Informed Neural Operator for Large-Scale 3D PDEs. In NeurIPS 2023.

Comment

Due to the imminent closure of the discussion period, we kindly request the reviewer to provide us with their valuable feedback on our rebuttal. We are at their disposal to answer any further questions in this regard.

Review
Rating: 5

The paper introduces Mamba Neural Operator (MambaNO) for solving Partial Differential Equations (PDEs) efficiently. Unlike existing methods, which are computationally expensive and often neglect global and local feature integration, MambaNO offers O(N) complexity. It balances global and local integration through a state-space model and alias-free architecture. MambaNO demonstrates accurate approximation of operators for PDEs and achieves state-of-the-art performance on benchmarks with fewer parameters and better efficiency.

Strengths

  1. The paper is in general well-motivated and provides a promising MAMBA-based alias-free framework for solving PDEs.
  2. The paper is well-written, with nice visualizations and some experimental verification.

Weaknesses

  1. Since alias-freeness is an important property of the framework, the paper should move it from the appendix to the main body and provide a succinct summary. Also, the paper should clarify the meaning of alias-free in the main body. Does it mean zero aliasing error? Can we view it as the reconstruction error for function approximation?
  2. The plots (Figures 1, 2, 3) need a better explanation of the setting. It is unclear from the plots what tasks they refer to and what the colorful patches/separations mean.
  3. Since MAMBA is a work that mainly addresses systems concerns, like memory and computation savings via kernel computation, the paper should make a very clear comparison with the original MAMBA to explain its design choices. How do the MAMBA techniques improve the neural operator setting, and why can other methods not achieve this? E.g., are there memory benefits? Furthermore, the paper makes several extensions to the original MAMBA work in Figure 1, like down/up-sampling, conv integration, and the autoencoder-style architecture. The paper should better justify why it adopts this way of stacking modules.
  4. Since the integration of MAMBA as the neural operator is not unique, the authors should perform an ablation study on alternative ways of stacking MAMBA layers. Further ablation studies, like varying the MAMBA module's inherent parameters, would be helpful.

Questions

  1. Could the authors clarify the main applications of this type of neural operator? Do they face constraints similar to those of LLMs? In addition, why are autoregressive models like MAMBA more helpful than standard ML models? What are the benefits of causal dependency in these applications?
  2. Is the main application function approximation or generic PDE solving? If it includes generic PDE solving, the authors should also provide experiments on that.

Limitations

  1. In general, for a NeurIPS submission, the authors should frame the paper toward generic computer scientists rather than specialists in this field. So, they should give a clear definition, application, and motivation of the problem. The current intro and background are written in either a very vague (related work) or very technical (Mamba operator setting) manner.
  2. I am not an expert in neural operator learning. I would leave the judgment of effectiveness (e.g. figure 3) and novelty to other reviewers.
Author Response
  1. Alias-freeness is truly an important property of the framework. However, two reasons led us to move it into the appendix. First, our innovation lies in proposing MambaNO, which follows the alias-free property; the alias-free framework itself was introduced by other work [1], not ours. Therefore, we put more effort into clarifying our own ideas, leaving most of the detailed discussion on "alias-free" in the appendix. Note that a brief introduction to this framework was already given in the main text. Second, an alias-free NO is a representation-equivalent neural operator (ReNO), enjoying discrete-continuous equivalence. Therefore, we have also provided an explicit proposition claiming that MambaNO is actually a ReNO. This implies that the discrete implementations within the neural network are equivalent to the underlying continuous operations, minimizing aliasing errors as much as possible. As you stated, it does mean zero aliasing error and can be viewed as the reconstruction error for function approximation.

  2. Most previous works [2][3][4] have used fluid-like patterns to visualize the variables of PDEs. In this work, we simply follow this typical scheme to demonstrate the performance of our method. As suggested, more explanations will be given in a later version.

  3. Note that a new Mamba variant is not our core effort. Instead, we attempt to propose a new neural operator for PDEs. Along this line, we found that the kernel integral in traditional NOs bears certain similarities to state-space models, leading to the proposal of MambaNO (the correspondence is spelled out after this list). Compared with the original NOs, such as FNO [2] and CNO [3], rather than with MAMBA itself, the proposal enjoys both local and global dependencies. The advantages of memory efficiency and linear computation are also inherited.

  4. In fact, we have provided ablation experiments on the number of stacked MAMBA layers, as shown in Table 6 of the supplementary material. Besides, ablations varying the inherent parameters through different scanning mechanisms are also given, as shown in Table 5 of the supplementary material. As suggested, we will move these contents into the main text for clarity.

  5. As in most previous works [1][2][3][4], the primary application of NOs is solving PDEs. The constraints that need to be considered during application include, but are not limited to, the initial and boundary conditions of PDEs [5], continuous-discrete equivalence in band-limited function spaces [1], and discretization invariance [2]. We believe these constraints are more theory-driven, unlike those of LLMs, which are more data-driven. To the best of our knowledge, LLMs have never been used to solve PDEs. The performance of MambaNO stems from multi-scale information and a scanning mechanism that handles PDE boundary and initial conditions. We believe that causal dependency is essential for the interaction between boundary and initial conditions.

  6. The primary application is the generalized solution of PDEs. Considering that 1D PDEs are relatively simple, we provide experimental data and visualization results for eight representative 2D PDEs in the original text to demonstrate MambaNO's potential for solving PDEs. Due to time and computational resource constraints, irregular shapes and higher-dimensional problems remain to be explored.
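
To spell out the analogy mentioned in point 3 (the standard linear SSM form; in Mamba the matrices $\bar{A}$, $\bar{B}$, $C$ are additionally input-dependent), the discretized state-space recurrence, once unrolled, acts as a structured kernel integration that a scan evaluates in $O(N)$:

$$h_k = \bar{A}\,h_{k-1} + \bar{B}\,u_k, \quad y_k = C\,h_k \;\;\Longrightarrow\;\; y_k = \sum_{j \le k} C\,\bar{A}^{\,k-j}\bar{B}\,u_j,$$

which is a discrete analogue of a kernel integration $(\mathcal{K}u)(\xi_k) = \int \kappa(\xi_k, \xi)\,u(\xi)\,\mathrm{d}\xi$ with the structured, data-dependent kernel $\kappa_{k,j} = C\,\bar{A}^{\,k-j}\bar{B}$; multi-directional cross-scanning removes the causal restriction over the spatial domain.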

[1] Representation Equivalent Neural Operators: A Framework for Alias-Free Operator Learning. In NeurIPS 2023.

[2] Fourier Neural Operator for Parametric Partial Differential Equations. In ICLR 2021.

[3] Convolutional Neural Operators for Robust and Accurate Learning of PDEs. In NeurIPS 2023.

[4] Transformer for Partial Differential Equations' Operator Learning. Transactions on Machine Learning Research, 2022.

[5] PDEBench: An Extensive Benchmark for Scientific Machine Learning. In NeurIPS 2022.

Comment

I am not an expert in Neural ODEs. The AC could downweight my opinion on this work. Since the authors sufficiently address my concerns, I will raise my score.

Comment

We sincerely thank the reviewer for acknowledging our response and for raising our score.

Author Response

At the outset, we would like to thank all four reviewers for their thorough and patient reading of our article. Overall, all four reviewers complimented some aspects of our study.

Specifically, Reviewer 1: The paper is well-motivated and presents a promising MAMBA-based alias-free framework for solving PDEs, and it is well-written with effective visualizations and experimental verifications.

Reviewer 2: The paper provides a strong theoretical justification for the architecture, with empirical results showing good performance across a variety of PDEs. The proposed computational complexity of O(N) is impressive, and the paper includes good ablation studies and discretization invariance results, as well as a detailed description of the dataset, methods, and computational complexity analysis.

Reviewer 3: The paper is exceptionally clear, making it easy to follow and effectively explaining the complexity reduction benefits brought by the Mamba structure. It achieves impressive results, demonstrating the advantages of the Mamba structure and evaluates the method on datasets with varying resolutions, showing that the proposed approach maintains consistent performance across different resolutions.

Reviewer 4: The paper introduces a new approach to neural operators by adapting the Mamba architecture, which has shown promise in other domains. The combination of global (Mamba integration) and local (Convolution integration) operators is innovative in this context. Firstly, the authors provide a solid theoretical analysis, including proofs of representation equivalence and approximation capabilities, adding depth to the empirical results and helping to understand why the proposed method works. Secondly, the evaluation is thorough, covering a wide range of PDE types and comparing against multiple state-of-the-art baselines. The inclusion of both in-distribution and out-of-distribution tests strengthens the claims of generalization ability. The reported improvements in accuracy and efficiency are substantial across different PDE types, which is impressive given the diversity of the benchmarks. Thirdly, by adhering to an alias-free framework, the authors address an important issue in neural operators, potentially improving the model's stability and generalization capabilities. Additionally, the O(N) complexity of the Mamba integration is a significant advantage, especially for high-dimensional problems.

Besides, their criticisms and constructive suggestions will enable us to improve the quality of our article. If our paper is finally accepted, we will incorporate all the changes that we outline below in the camera-ready version (CRV) of our article. As allowed by the conference, we are uploading a one-page PDF that contains figures and tables on numerical experiments supporting our arguments below. With this context, we proceed to answer the points raised by each of the reviewers individually, below.

Yours sincerely,

Authors of "Alias-Free Mamba Neural Operator".

Final Decision

This paper adapts the SSM architecture to PDE operator learning problems. It received generally favorable reviews. However, there are some missing references in terms of mathematics. The authors should acknowledge these previous works in the camera-ready version.

  • Rather than the "zero-order hold" the authors mentioned (the standard zero-order-hold formulas are recalled after this list for reference), the technique used to approximate the matrix exponential takes advantage of the Bernoulli polynomial approximation of $e^{xA}$, sometimes known as the Scharfetter–Gummel scheme in computational math.
  • The original authors of Mamba SSM did not acknowledge this, but the construction of the kernel in (3.15) to approximate a nonlinear state-space evolution is called Carleman bilinearization (or sometimes simply linearization), and the resulting form traces back to Krylov subspaces; e.g., see Y. Saad's 1992 SINUM paper, "Analysis of Some Krylov Subspace Approximations to the Matrix Exponential Operator." Krylov subspaces have been widely applied to dynamical systems in the form of state-space evolution ever since, with applications to ML especially in Koopman operators, e.g., Y. Kawahara, NIPS 2016.
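
For reference, the zero-order-hold discretization referred to above is the standard form used in the S4/Mamba line of work,

$$\bar{A} = e^{\Delta A}, \qquad \bar{B} = (\Delta A)^{-1}\big(e^{\Delta A} - I\big)\,\Delta B,$$

and the decision's point is that, in practice, the matrix exponential $e^{\Delta A}$ is approximated by a Bernoulli-polynomial (Scharfetter–Gummel) type scheme rather than computed exactly.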