6.8

/10

Spotlight4 位审稿人

最低5最高8标准差1.3

3.3

置信度

正确性3.0

贡献度3.0

表达2.5

NeurIPS 2024

Neural Krylov Iteration for Accelerating Linear System Solving

Jian Luo,Jie Wang,Hong Wang,huanshuo dong,Zijie Geng,Hanzhu Chen,Yufei Kuang

OpenReview PDF

提交: 2024-05-15更新: 2024-11-06

摘要

关键词

AI4PDENeural OperatorScientific ComputingKrylov Subspace

评审与讨论

审稿意见

评分: 5置信度: 32024-06-25

The authors use a neural operator approach to generate subspaces that are used for the acceleration of Krylov subspace methods for several partial differential equations setups.

优点

The authors are interested in a very relevant problem of computational science and engineering of solving linear systems of equations. The attempt of providing a convergence analysis for the proposed method.

缺点

The details of the method remain hidden to me and it seems difficult to grasp the details of the method and many details and even the major workings of the method are unclear.

问题

Why the smallest eigenvalues in line 163 on page 5? I assume that Y_k in Algorithm 1 is the predicted subspace from the neural network, but then explain the derivation used for equation (11)? How is this motivated? How is the method of the authors combined with preconditioning? The details are not given but the choice of the preconditioner is crucial and how would this depend on the learned subspace the method provides?

局限性

The method is limited to linear problems but this is a very broad and general class. If the method was better explained and the details clear and convincing there would be no limitations for these kinds of problems.

作者回复

2024-08-07

We thank the reviewer for the patience to read through our paper. We are pleased to re-introduce our work and respond to your comments as follows. We sincerely hope that our rebuttal could properly address your concerns. If so, we would deeply appreciate it if you could raise your score and your confidence. If not, please let us know your further concerns, and we will continue actively responding to your comments and improving our work.

Major Workings of Our Method

We propose a novel method, namely NeurKItt, for accelerating linear systems solving. NeurKItt consists of two modules, one is the subspace prediction module, while the other is the acceleration module.
The subspace prediction module employs the neural operator for predicting the invariant subspace $\hat{\mathcal{K}}$ of given linear systems $Ax=b$ , as a matrix $X$ that contains eigenvectors corresponding to the smallest eigenvalues from $A$ .
The acceleration module uses the predicted subspace for accelerating the linear system solving. It is the Krylov subspace algorithm with predicted subspace involved for iteration.

Subspace Prediction Module

We use FNO to predict the subspace of the matrix $A$ from the given linear system $Ax=b$ . We give the introduction to FNO in Appendix C.
The reason why we select the neural operator for subspace prediction is that the mapping from the matrix $A$ of linear system $Ax=b$ to its corresponding invariant subspace $\mathcal{\hat{K}}$ can be considered an operator, i.e., a mapping between two Hilbert spaces, thus solving such problems involves finding the corresponding operator.
We employ the projection loss function for training the FNO and apply thin QR decomposition to the output of the FNO. Projection loss is designed for learning the subspace. The idea behind it is that the predicted subspace should contain the basis vectors in the target subspace. We employ the thin QR decomposition for orthogonalizing the FNO output. We provide the details about thin QR decomposition in Appendix E.

Why the smallest eigenvalues in line 163 on page 5?

The FNO learns the subspace $\mathcal{S}$ that associates with the $n$ smallest eigenvalues, assuming $\mathcal{S}=\mathrm{span}\{ s_1, s_2, \ldots, s_n \}$ , where $s_i$ is the eigenvector corresponding to the $i$ -th smallest eigenvalue of the given matrix $A$ , $i = 1, 2, \ldots, n$ . We think such subspace $\mathcal{S}$ is easy to learn because Theorem 5.2 gives proof that if changes in a matrix $A$ occur in the subspace corresponding to larger eigenvalues, the subspace associated with the smallest eigenvalues is minimally affected, as long as the changes are smaller than the gap between the smallest and larger eigenvalues. Theorem 5.2 can be found in Section 5.

Acceleration Module

The acceleration module is the Krylov iteration algorithm with predicted subspace involved, inspired by Krylov recycling algorithms. The detailed pseudo-code about the acceleration module can be found in Appendix A.2. The key of the acceleration module is the Arnoldi relation $(I−C_kC^H_k )AV_{m−k} = V_{m−k+1}\underline{H}_{m−k}$ , which takes the subspace into iteration.
Here we give an intuitive idea about the acceleration module. Krylov subspace iteration solves the linear system problem by iteratively approximating the invariant subspace of matrix $A$ from the linear system $Ax=b$ . When the subspace information about $A$ is given, we don't need to start the approximation from scratch, thus the acceleration module can improve the convergence speed.

I assume that Y_k in Algorithm 1 is the predicted subspace from the neural network, but then explain the derivation used for equation (11)? How is this motivated?

$Y_k$ is the matrix of predicted subspace $\mathcal{\hat{K}}$ from the neural operator. In particular, we obtain $C_k$ and $U_k$ from the thin QR decomposition on $AY_k$ , such that $AU_k=C_k$ and $C_k^{H}C_k=I_k$ .
Equation $C_k^{H}C_k=I_k$ allows us to find the basis of $\mathcal{\hat{K}}$ , which enables us to remove the subspace $\mathcal{\hat{K}}$ in Equation (11) $(I−C_kC^H_k)AV_{m−k} = V_{m−k+1}\underline{H}_{m−k}$ by $(I-C_kC_k^H)$ . This approach makes the Krylov subspace iteration more efficient because we only need to search the rest of the invariant subspace for matrix $A$ .
Equation $AU_k=C_k$ makes sure the matrix $\underline{H}_{m-k}$ is an upper Hessenberg matrix, which speedups the solving of the linear system in the later iteration procedure.

How is the method of the authors combined with preconditioning? The details are not given but the choice of the preconditioner is crucial and how would this depend on the learned subspace the method provides?

Supposing the original Arnoldi relation $AV=VH_0$ and the preconditioner matrix $P$ , the preconditioned Arnoldi relation is $PAV=VH_1$ . So, for NeurKItt, after the combination, the Arnoldi relation is $P(I-C_kC_k^H)AV=VH_2$ . Here, $H_i$ is the upper Hessenberg matrices.
In our experiments, the learned subspace for any preconditioning method is the subspace composed of eigenvectors corresponding to the smallest eigenvalues of the given matrix $A$ .

We also provide a theoretical analysis of the acceleration module in Section 5.1, which gives the convergence analysis and explains our choice to predict the subspace rather than the solution. In Section 5.2, we further justify our choice of using the subspace composed of eigenvectors corresponding to the smallest eigenvalues for prediction.

If you are confused about any of the content in our paper, like what kind of obstacle hinders the understanding of our work, please let us know your further concerns, and we will continue to actively respond to your comments.

2024-08-08

Thank you for the comments. Regarding your last point on the use of the preconditioner, the main idea of applying a preconditioner P is to cluster the eigenvalues of the preconditioned matrix PA. One typically hopes that the n previously smallest eigenvalues would now be clustered in a tighter region, ideally the perfect preconditioner guarantees only very few distinct eigenvalues What does this mean for your method and the approximated subspace?

2024-08-09

Thank you for your question. We address it as follows:

We agree with the reviewer about how preconditioners work. However, our method is based on the predicted subspace to improve the convergence speed, which avoids approximating the subspace from scratch for Krylov subspace iteration. Thus, preconditioners that disrupt the invariant subspace, not eigenvalues, will have a negative influence on our method, like ICC and ILU.
Despite this potential influence, extensive experiments show that these disruptions do not significantly affect our improvement in practice. In particular, our method accelerates the solving across all preconditioners, especially achieving up to 2.64 $\times$ time speedup under ICC and 2.6 $\times$ time speedup under ILU.
Besides, optimizations for the approximated subspace could be our future work, which takes the negative influence of different preconditioners on subspace into consideration.

评论- Thanks for Your Positive Feedback

2024-08-13

We appreciate the positive feedback and the time you taken to evaluate our work. To improve clarity, we will include these discussions to help readers grasp the key aspects of our study.

审稿意见

评分: 6置信度: 32024-07-11

The manuscript introduces a novel method, referred to as NeurKItt, which combines neural network techniques with Krylov subspace methods to accelerate the solution of linear systems derived from partial differential equations (PDEs). The core innovation of NeurKItt lies in its capability to predict invariant subspaces associated with the matrices defining these linear systems, thereby significantly reducing the number of iterations required for convergence.

优点

Originality: The manuscript introduces a novel approach by integrating neural network techniques with Krylov subspace methods to predict invariant subspaces of linear systems, significantly enhancing the efficiency of these traditional methods.
Quality: The manuscript effectively targets the critical issues of computational inefficiency and instability in the context of high-dimensional and poorly conditioned linear systems. The choice of problem is highly relevant to both academic research and practical applications in scientific computing, making the work valuable to a wide audience.
Clarity: The manuscript is well-structured, presenting complex concepts in an understandable manner. It effectively communicates the challenges of existing methods and how NeurKItt addresses these issues, providing clear explanations and logical progression from problem statement to solution. However, improvements in typographical accuracy and some clarifications in methodological descriptions could further enhance clarity.
Significance: The significance of this work lies in its potential impact on the efficiency of solving large-scale linear systems, which are crucial in various scientific and engineering applications. By reducing computational time and resource consumption, NeurKItt could significantly benefit fields reliant on large-scale computations, making this contribution highly relevant to both academic research and industry applications.

缺点

The sentence "Research in using neural networks for accelerating linear system solving" on line 75 appears abruptly terminated and lacks a verb or continuation that would complete the thought.
On line 81, the term "precondition" is likely a typographical error. The correct term, given the context, could be "preconditioning".
There appears to be a typographical error in Equation (8), where the variable $z$ should presumably be $y$ .
Equation (13) on page 7 starting with an unnecessary 's' before the equal sign seems to be a typo.
In the manuscript, Section 4.1 discusses the use of Fourier Neural Operators (FNO) for parametric PDE problems and introduces the subsequent subspace prediction concept. It employs a subspace $S$ defined as the span of vectors $\{s_1, s_2, \cdots, s_n\}$ . The manuscript mentions that $s_i$ is the eigenvector corresponding to the $i$ -th smallest eigenvalue, but it does not specify whether these eigenvalues are of the matrix or another related matrix. This omission could lead to confusion about the origin and relevance of these eigenvectors, particularly in how they are integrated into the model and influence the outcome of the subspace prediction.
In Section 4.1, the manuscript proposes using invariant subspaces associated with the smallest eigenvalues to train the subspace prediction module. However, an oversight in the theoretical analysis (Section 5.2) arises from applying this concept to potentially non-Hermitian matrices. The section (section 5.2) is based on Hermitian positive definite matrices, whose eigenvalues are all real numbers that can be orderly compared. Non-Hermitian matrices, on the other hand, may exhibit complex eigenvalues, complicating or nullifying the notion of "smallest" eigenvalues due to their lack of a natural order. This discrepancy raises concerns about the applicability of the method to non-Hermitian matrices, which are prevalent in many practical applications.

问题

In reviewing the theoretical analysis provided between lines 237 and 248, I observed a notable similarity in both the phrasing and the mathematical formulations with those presented in reference [23]. Could the authors clarify the extent of originality in these sections? It is crucial for academic integrity to distinguish clearly between direct quotes and original analysis.

局限性

The authors discussed limitations in Section 7 Limitation and Conclusions.

作者回复

2024-08-07

We thank the reviewer for the insightful and valuable comments. We respond to your comments as follows and sincerely hope that our rebuttal could properly address your concerns. If so, we would deeply appreciate it if you could raise your score and your confidence. If not, please let us know your further concerns, and we will continue actively responding to your comments and improving our work.

Weaknesses

Typos mentioned in Weakness 1-4

Thank you for pointing out these typos. We will fix them in the future version.

The manuscript mentions that $s_i$ is the eigenvector corresponding to the $i$ -th smallest eigenvalue, but it does not specify whether these eigenvalues are of the matrix or another related matrix. This omission could lead to confusion about the origin and relevance of these eigenvectors, particularly in how they are integrated into the model and influence the outcome of the subspace prediction.

Given a linear system $Ax=b$ , these eigenvalues are derived from the same matrix $A$ . Thank you for pointing out this potentially misleading statement. We will fix it in the future version. In particular, we will replace the sentence in lines 164-165 "where $s_i$ is the eigenvector corresponding to the $i$ -th smallest eigenvalue, $i = 1, 2, \dots, n$ ." with "where $s_i$ is the eigenvector corresponding to the $i$ -th smallest eigenvalue of the given matrix $A$ , $i = 1, 2, \ldots, n$ ."

In Section 4.1, the manuscript proposes using invariant subspaces associated with the smallest eigenvalues to train the subspace prediction module. However, an oversight in the theoretical analysis (Section 5.2) arises from applying this concept to potentially non-Hermitian matrices. The section (section 5.2) is based on Hermitian positive definite matrices, whose eigenvalues are all real numbers that can be orderly compared. Non-Hermitian matrices, on the other hand, may exhibit complex eigenvalues, complicating or nullifying the notion of "smallest" eigenvalues due to their lack of a natural order. This discrepancy raises concerns about the applicability of the method to non-Hermitian matrices, which are prevalent in many practical applications.

Thank you for pointing out this problem. Analyzing perturbations for non-Hermitian matrices involves pseudo-spectral methods, which makes it difficult to investigate. To simplify the analysis, Section 5.2 explores how to select an appropriate subspace in the Hermitian positive definite case. In practice, for Hermitian positive definite matrices, the smallest eigenvalues are compared following the natural order. But for non-Hermitian matrices, the eigenvalues will be sorted by comparing their modulus, such that: $|\lambda_1|\le|\lambda_2|\le|\lambda_3|\le|\lambda_4|\le\dots\le|\lambda_n|$ where $\lambda_i\in\mathbb{C}$ , $i=1,2,\dots,n$ , is the eigenvalue of a given non-Hermitian matrix $A$ . We will add these details to Section 5.2 to avoid the discrepancy.

Questions

In reviewing the theoretical analysis provided between lines 237 and 248, I observed a notable similarity in both the phrasing and the mathematical formulations with those presented in reference [23]. Could the authors clarify the extent of originality in these sections? It is crucial for academic integrity to distinguish clearly between direct quotes and original analysis.

We are sorry for the misunderstanding caused by our improper citation. We fully adhere to academic ethics, and this issue was due to a writing oversight. Sentences in lines 237-248 provide the definitions and assumptions serving as the preliminaries for Theorem 5.2, which is mentioned in Theorem 5.2. To keep Theorem 5.2, which is cited properly, coherent with the original text, we reused the definitions and assumptions from reference [23], and this might potentially lead to disputes.
Due to NeurIPS rebuttal restrictions, we are not allowed to submit a revised paper during rebuttal. We will make the quotes clear by adding the sentence " The following definitions and assumptions for Theorem 5.2 are from the reference [23]." at line 238 to indicate that content in lines 237-248 comes from the reference [23].

2024-08-12

Dear Reviewer iSRc,

We are writing as the authors of the paper titled "Neural Krylov Iteration for Accelerating Linear System Solving" (ID: 18161). Thanks again for your valuable comments and constructive suggestions, which are of great help to improve the quality of our work. We have followed your suggestions to significantly enhance the quality of our work. As the deadline for the author-reviewer discussion period is approaching (due on Aug 13), we are looking forward to your further comments and/or questions.

We sincerely hope that our rebuttal has properly addressed your concerns. If so, we would deeply appreciate it if you could raise your scores. If not, please let us know your further concerns, and we will continue actively responding to your comments and improving our work.

Best regards,

Authors

2024-08-13

Thanks for the response of the authors.

All of my previous concerns have been addressed in the rebuttal, I think this is a good work and I would like to raise my score from 5 to 6 for this submission.

评论- Thanks for Your Positive Feedback

2024-08-13

We appreciate your positive feedback and the efforts you made in reviewing our rebuttal. Thanks for your constructive comments and valuable suggestions. We will incorporate them into our paper in the future version.

审稿意见

评分: 8置信度: 32024-07-12

This paper proposes a data-driven approach to accelerate solving linear systems. Linear Systems are widespread in scientific computing applications like solving partial differential equations (PDEs), nonlinear systems, etc. so this method has potential for major impact. The proposed method builds upon the idea of neural operators that learn mapping between function spaces.

While most prior methods have attempted to accelerate solving linear systems by predicting a better initial guess, this work instead predicts the matrix invariant subspace. It uses this subspace to accelerate the Krylov Iterations. The method is validated on linear systems originating from PDEs and achieves speedups of around $5.5\times$ in computation time.

优点

The presented method is novel, and to the best of my knowledge there doesn't exist a method that uses a similar approach
Experiments are well designed and use strong baseline methods (GMRES from PETSc). Speedup in computation time and iteration count over such a strong baseline presents a strong case for the proposed method.

缺点

The Neural Operators need to be trained on each individual problem. So while there is an improved convergence speed, there is an increased overall time when the training time is taken into account. However, this is a pretty common shortcoming across neural operator approaches and hence I haven't used this shortcoming as part of my overall scoring.

问题

Are there any cases where GMRES (with no-preconditioners) fails to converge while NeurKItt can solve it because it is data-driven?
Is it necessary to learn the NO for each system? Or is it possible to learn a single network and reuse it?
(Maybe I missed this) Is it possible to include the training times for the neural networks?

局限性

N/A

作者回复

2024-08-07

Weaknesses

The Neural Operators need to be trained on each individual problem. So while there is an improved convergence speed, there is an increased overall time when the training time is taken into account. However, this is a pretty common shortcoming across neural operator approaches and hence I haven't used this shortcoming as part of my overall scoring.

We agree with the reviewer that training the neural network involves additional costs, and it is a shortcoming of any neural operator approach. But considering the benefits of acceleration over millions of linear system solving, such training costs could be negligible due to the time saved. This is one of the reasons why we claim the feasibility of NeurKItt in practice.

Questions

Are there any cases where GMRES (with no-preconditioners) fails to converge while NeurKItt can solve it because it is data-driven?

There might be the case where GMRES fails but NeurKItt can solve.
Specifically, NeurKItt consists of the subspace prediction module and acceleration module. The subspace prediction module is data-driven, while the acceleration module uses the predicted subspace to improve the Krylov iteration convergence speed. The case mentioned is due to the acceleration module that replaces the original Arnoldi relation with $(I-C_kC_k^H)AV_{m-k}=V_{m-k+1}\underline{H}_{m-k}$ , which takes the invariant subspace into iteration. It can also be considered as a preconditioned linear system solving, which replaces the original linear system problem $Ax=b$ with $PAx=Pb$ , where $P=(I-C_kC_k^H)$ . This change directly improves the convergence speed while lowering the condition number of the given matrix $A$ , which leads to the case that GMRES fails to but NeurKItt can solve. The data-driven module (Subspace Prediction Module) makes the improvement possible, but it's not the direct answer.

Is it necessary to learn the NO for each system? Or is it possible to learn a single network and reuse it?

We'd like to give answers given different contexts.
First, given the case that systems derived from different PDEs, we did not conduct relevant experiments. But we think it is possible to learn a single network and reuse it in the given context, while the network is pre-trained. Recent works like $1$ , $2$ , and $3$ , give experimental proofs that the neural operator can be a pre-train model, which can be reused for different PDEs with only fine-tuning.
Second, given the case that systems derived from the same PDE but with different scales, like different sizes of the linear systems, we have conducted an experiment to show the potential of NeurKItt in Appendix I. In particular, we train the neural operator on linear systems with a fixed matrix size but predict linear systems with larger sizes, then use the predicted subspace to accelerate the solving for those linear systems with larger sizes. The experimental results show that NeurKItt successfully accelerates the solving, which indicates the potential to learn a single network and reuse it for the linear systems derived from the same PDE but with different scales.

(Maybe I missed this) Is it possible to include the training times for the neural networks?

Yes, we have reported the training time for each dataset in Appendix H.2. In our experiments, all the training converges within 150 minutes, and keeping training has no additional performance improvement once reaching the convergence.

[1] Yang, Liu, et al. "In-context operator learning with data prompts for differential equation problems." Proceedings of the National Academy of Sciences 120.39 (2023): e2310142120.

[2] Hao, Zhongkai, et al. "DPOT: Auto-regressive denoising operator transformer for large-scale pde pre-training." arXiv preprint arXiv:2403.03542 (2024).

[3] Zhou, Anthony, et al. "Strategies for Pretraining Neural Operators." arXiv preprint arXiv:2406.08473 (2024).

2024-08-12

Dear Reviewer XS9H,

Best regards,

Authors

2024-08-13

I would like to thank the authors for their detailed responses to my questions. Thank you for pointing out the relevant sections in the appendix which clearly answer my remaining questions.

Considering this and the response to Reviewer CSA7 that improves upon the explanation of the method, I have increased the score to 8.

评论- Thanks for Your Positive Feedback

2024-08-13

We sincerely appreciate your positive feedback and the time you’ve dedicated to reviewing our rebuttal. Thank you for appreciating the merits of our work. Your constructive comments are invaluable to us.

审稿意见

评分: 8置信度: 42024-07-13

The paper develops a new method, NeurKItt, for accelerating the solution of non-symmetric linear systems. NeurKItt constructs an approximate invariant subspace of non-symmetric matrix A using the Fourier Neural Operator. This invariant subspace is then used as the initial subspace within GMRES. The paper provides theoretical support for the proposed method and empirical results showing the benefits of NeurKItt.

优点

1.) The paper develops an interesting new method for solving non-symmetric linear systems based on operator learning. In particular, I have not seen the Fourier Neural Operator used in this way before. I believe the idea has the potential to be quite useful and could have valuable implications for other problems.

2.) The authors provide theoretical analysis supporting the method.

3.) The method performs well in practice, which is what is most important. NeurKItt seems to yield significant reductioning in the number of GMRES iterations, and also yields good wall-clock time speed-ups (which is particularly impressive as these matrices are very sparse). So I think the method has the potential to be very useful in practice.

Overall, the paper proposes an exciting new idea that appears to work quite well and has the potential to be quite useful. Moreover, the paper addresses an important hard problem: improving the solutions of non-symmetric linear systems. This setting is quite challenging, as the non-symmetry makes things difficult. Often, methods for accelerating GMRES are highly problem-dependent, so a method that can lead to generic improvements is significant. I'm happy to recommend its acceptance to NeurIPS provided an issue given below in the weaknesses section is properly addressed.

缺点

I'd say the main weakness of the paper is its presentation. In particular, a significant issue is that the paper lacks a precise description of how FNO is implemented to learn the invariant subspace. The appendix only briefly describes FNO but does not explain how the paper applies it. The current discussion in lines 148-151 is vague and unclear. In particular, the output of FNO is the output of an operator on a function, not a subspace. So, how are you getting the predictive subspace from the output of the FNO? I have ideas of how this is done, but rather than guessing, I would like the authors to explain the precise procedure adequately. This should then be added to the paper, ideally in the main body, but at least the supplement with clear pointers in the main paper of where to find the details.

I'm willing to consider raising my score if the authors address this point well. If they don't, I will lower my score, as I cannot recommend acceptance of a paper for which a significant part of the methodology is unclear.

Aside from this, the authors should do a careful spell-check, as there are many typos throughout, which is somewhat distracting.

问题

See weaknesses.

局限性

The authors have adequately addressed the current limitations of the method.

作者回复

2024-08-07

We thank the reviewer for the insightful and valuable comments. We respond to your comments as follows and sincerely hope that our rebuttal could properly address your concerns. If so, we would deeply appreciate it if you could raise your score. If not, please let us know your further concerns, and we will continue actively responding to your comments and improving our work.

Weaknesses

How are you getting the predictive subspace from output of the FNO? I have ideas of how this is done, but rather than guessing, I would like the authors to explain the precise procedure adequately. This should then be added to the paper, ideally in the main body, but at least the supplement with clear pointers in the main paper of where to find the details.

Here, we use the 2D Darcy flow problem to demonstrate how the Fourier Neural Operator (FNO) predicts the subspace given the input function.
- Let the input function be $a\in\mathbb{R}^{d_a\times d_a}$ from the 2D Darcy flow problem, where $d_a\in\mathbb{N}$ is the resolution of the input function, which yields a linear system $Ax=b$ for numerical solving, where matrix $A\in\mathbb{R}^{d_A\times d_A}$ . We aim to predict a subspace $\mathcal{\hat{K}}$ , as a matrix $X\in\mathbb{C}^{d_A\times n}$ , for matrix $A$ in the linear system. First, the FNO's lifting layer $\mathcal{R}$ transforms $a$ to a higher-dimensional representation $v_0\in\mathbb{C}^{d_a\times d_a\times c}$ , while the 3rd dimension is channel dimension. After $T$ Fourier layers forward, we have the $v_T\in\mathbb{C}^{d_a\times d_a\times c}$ . Because Fourier layers keep the shape unchanged, we apply a transformation $Q$ to map the $v_T$ to the desired space, $\mathcal{\hat{K}}=Q(v_T)$ , with $Q:\mathbb{C}^{d_a\times d_a\times c}\rightarrow\mathbb{C}^{d_A\times n}$ .
- In practice, transformation $Q$ is a stack of transformation layers. It first flattens the first and second dimension of $v_T$ , obtaining $q_0\in\mathbb{C}^{d_a^2\times c}$ . Then a fully-connected neural network (FNN) applies the mapping $\mathbb{C}^{d_a^2\times c}\rightarrow\mathbb{C}^{d_a^2\times n}$ to $q_0$ , obtaining $q_1\in\mathbb{C}^{d_a^2\times n}$ . And another FNN applies the mapping $\mathbb{C}^{d_a^2\times n}\rightarrow\mathbb{C}^{d_A\times n}$ to $q_1$ , obtaining the output $X\in\mathbb{C}^{d_A\times n}$ . Finally, we apply QR decomposition to $X$ for orthogonalizing, obtaining $\mathcal{\hat{K}}=\mathrm{span}\{X\}$ .
To address the problems identified in the reviewers' comments, we will make several changes to our paper. First, we will replace lines 148-151 with more precise content to clarify the subspace prediction task. The revisions include the problem setup and the detailed description of how to predict subspace by FNO. Second, we will add the 2D Darcy flow example mentioned above to Appendix C to give an image of how to predict the subspace by the FNO. The sentences quoted below are the revised version of lines 148-151:
- "Generally, for a linear system $Ax=b$ derived from the parametric PDE problem, to predict its invariant subspace $\mathcal{\hat{K}}$ , the input to FNO is the input function $a\in\mathbb{R}^{d_a}$ from the given PDE, where $d_a\in\mathbb{N}$ . We provide a detailed discussion in Appendix B about how to build a linear system problem from a PDE problem, and what is the input function. Our task is to learn the mapping between two Hilbert space $\mathcal{G}:\mathbb{R}^{d_a}\rightarrow \mathbb{C}^{d\times n}$ using FNO.
- For FNO, the lifting transformation $\mathcal{R}$ first lifts the input $a$ to a higher dimensional representation $v_0\in\mathbb{C}^{d_a\times c}$ , where $c$ is the number of channels. Then we feed the $v_0$ to Fourier layers. After $T$ Fourier layers forward, we have $v_T\in\mathbb{C}^{d_a\times n}$ from the last Fourier layer, which keeps the same shape as $v_0$ . The FNO's output $X=Q(v_T)$ is the projection of $v_T$ by the transformation $Q:\mathbb{C}^{d_a\times c}\rightarrow\mathbb{C}^{d_A\times n}$ . NeurKItt then uses QR decomposition to orthogonalize the matrix $X$ , obtaining the predicted subspace $\mathcal{\hat{K}}$ . We provide more details about how to predict the subspace given the 2D Darcy flow problem in Appendix C."
Due to NeurIPS rebuttal restrictions, we cannot submit a revised version during the rebuttal. We will incorporate these modifications into our paper in the future version. If these modifications do not adequately address the issues raised, please let us know your further concerns, and we will continue to actively respond to your comments.

Aside from this, the authors should do a careful spell-check, as there are many typos throughtout, which is somewhat distracting.

Thank you for bringing this to our attention. We will conduct a thorough spell-check and correct any typos in the future version.

评论- Thanks for the rebuttal

2024-08-08

I appreciate the authors' detailed reply to my concerns. This adequately addresses the issues I had with the presentation. I trust the authors to include this in the final revision, as they have promised here. Given this I will raise my score from 7 to 8, and presentation from 1 to 3.

评论- Thanks for your positive feedback

2024-08-09

Thank you for your positive feedback and for taking the time to read and respond to our rebuttal. We appreciate your constructive comments and concrete suggestions. We will incorporate them into our paper in the future version.

最终决定Accept (spotlight)

2024-09-25

Keylov subspace method is a standard choice in solving linear system, it suffers from low efficiency when the iterates is less-than-ideal, making it not practical for large-scale sparse linear system. Regarding this difficulty, the submission proposes a neural operator based method, named Neural Keylov Iteration (NeurKItt), which uses a neural operator to predict the invariant subspace of the linear system and in turn to speed up solving the problem. Theoretical analysis is provided, and extensive numerical experiments demonstrate the advantages of the method.

From the reviews, the reviewers reach a consensus that the paper contains significant contribution with two ratings as strong accept. In particular

Novel idea: solving linear systems via operator learning is new to the literature, and novel contribution to the community. Theoretical guarantees are also provided.
Strong performance: the method manages to significantly reduce both number of iteration and wall-clock time, implying promising applications.

During the author-reviewer discussions, the authors managed to addressed the reviewers' concerns, and they are easy to incorporate into the final version of the paper. Regarding Reviewer g2gN, though his rating is the lowest, his review and comment are less convincing that the others, implying that he may fail to fully follow the paper. As a result, his rating is not fully taken into my evaluation (but his comments together with Reviewer iSRc's show that the authors need to work on the presentation of the paper). Given the novel idea and strong performance of the method, I recommend for a spotlight.