Can neural operators always be continuously discretized?
Discretizing injective neural operators may cause them to lose injectivity, even with very flexible discretization methods.
Abstract
Reviews and Discussion
This paper studies the question of whether a neural operator (or a general diffeomorphism on an infinite dimensional Hilbert space) can be continuously discretized through the lens of category theory. It first proves that there does not exist a continuous approximation scheme for all diffeomorphisms. Then, it shows that neural operators with strongly monotone layers can be continuously discretized, followed by a proof that a bilipschitz neural operator can be approximated by a deep one with strongly monotone layers. Some further consequences are discussed.
Strengths
- The paper studies the discretization of a continuous operator, which is a source of error and an important issue in operator learning if one is not careful.
- The paper is fairly comprehensive, encompassing both positive and negative results. The study of positive results contains theorems of different flavors.
- Although the theory is based on category theory, the presentation and explanation of the results are relatively clear and accessible to readers who are unfamiliar with it.
Weaknesses
- Most results presented in the paper are purely theoretical and lack a quantitative or asymptotic estimate. For example,
- the notion of continuous approximation functor in Definition 8 does not care about the rate of convergence, and
- there is no estimate for the number of layers in Theorem 4.
- There are no empirical results to support the theory. While this is a theoretical paper, some toy experiments that exemplify the theory would be very helpful.
Questions
- In general, are there any assumptions on the Hilbert spaces studied in this paper? For example, are they assumed to be separable?
- What is the definition of the convergence of finite-dimensional subspaces used in this paper? Strong convergence of the projection operator? I do not think there is a standard definition for this in elementary functional analysis so it would be helpful to say it explicitly in the paper.
- Have you studied the role of the bilipschitz constant in your theorems? For example, how does the number of layers in Theorem 4 depend on it?
Limitations
None.
We would like to thank the reviewer for the detailed comments and fair criticisms. We address all of these below:
- "Most results presented in the paper are purely theoretical and lack a quantitative or asymptotic estimate."
The proof of Theorem 4 makes it possible to estimate the number of layers as a function of the desired accuracy, the Lipschitz constants of the map and its inverse, and the norm of the map in a ball having double the radius of the ball where we consider the approximation. More precisely, the bound depends on the radius of the ball in which the map is approximated and on the norm of the map in the ball of twice that radius. We will add an explicit formula for the number of layers and its proof in the final version of the paper.
The main steps of the proof are the following. First, we use spectral theory of the compact operators to find a finite dimensional subspace, and the projection onto it, such that the projected map is a diffeomorphism that is close to the original operator; we then consider the restriction of this map to the subspace. After this we deform the restricted map, along a path of bi-Lipschitz maps, to the invertible linear map given by its derivative. We consider the values of this path at finitely many intermediate parameter values and the corresponding transition operators, and we show that these transition operators are strongly monotone in a suitable ball. Here, the number of intermediate values depends on the norm of the map and on the Lipschitz constants of the map and its inverse, and the step size is chosen to be sufficiently small.
Moreover, we establish a quantitative lower bound on the monotonicity constant of the transition operators; an additional factor appears due to the multiplier in the definition of these operators, and choosing the step size accordingly produces the corresponding factor in the bound for the number of layers.
In addition to this, we consider paths on the Lie group of invertible matrices and show that there is a sequence of invertible matrices connecting the derivative to either the identity operator or a reflection operator, with an explicit bound on the number of steps. Combining these operators with the transition operators above, we see that the map can be deformed to the identity or to a reflection by composing operators of the stated form. This yields the bound for the number of layers.
- "While this is a theoretical paper, some toy experiments that exemplify the theory would be very helpful."
We much appreciate the reviewer's comment. In Appendix A.1 we had given an example where we approximate the solution operator of a nonlinear elliptic equation with Dirichlet boundary values, using a discretization based on the Finite Element Method. When the nonlinearity is convex and the source term is represented in a suitable form, the solution map is a diffeomorphism in the Sobolev space with Dirichlet boundary values. The approximation can be obtained by the Galerkin method. We will expand upon this example in the final version of the manuscript. We will also give an example on the no-go theorem using an elliptic (but not strongly elliptic) problem: for all parameter values the equations are uniquely solvable, but we show that when we use the FEM to approximate them, some of the resulting finite dimensional problems have a zero eigenvalue and are not solvable.
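For concreteness, a Galerkin formulation of this kind can be sketched for a model semilinear problem; the specific equation $-\Delta u + f(u) = g$ with $u=0$ on $\partial\Omega$ is only an illustrative stand-in, since the exact equation of Appendix A.1 is not reproduced in this reply:

$$
\text{find } u_n\in V_n\subset H_0^1(\Omega):\qquad
\int_\Omega \nabla u_n\cdot\nabla\varphi\,dx+\int_\Omega f(u_n)\,\varphi\,dx=\int_\Omega g\,\varphi\,dx
\quad\text{for all }\varphi\in V_n ,
$$

where $V_n$ is a finite element subspace. If $f$ is monotone increasing (e.g. the derivative of a convex function), the discrete problems inherit strong monotonicity and remain uniquely solvable, which is the mechanism behind the positive result.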
- "Do the Hilbert spaces need to be separable?"
The no-go theorem applies also to non-separable Hilbert spaces. Naturally, for such spaces the partially ordered set of finite dimensional linear subspaces, which is used as an index set, is huge. In most of our positive results on the existence of approximation operations we have assumed that the Hilbert space is separable, as we use finite rank neural operators as approximators and write the orthogonal projectors in terms of an enumerable orthonormal basis. However, it seems to us that our results can be generalized to non-separable Hilbert spaces, which have non-enumerable orthonormal bases. We will check carefully if this generalization is possible.
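For reference, in the separable case the orthogonal projectors mentioned above take the standard textbook form (this is not a new claim of the paper): for an orthonormal basis $(e_k)_{k=1}^\infty$ of the Hilbert space $X$,

$$
P_n x=\sum_{k=1}^{n}\langle x,e_k\rangle\,e_k,\qquad \|P_n x-x\|\xrightarrow{\ n\to\infty\ }0\quad\text{for every }x\in X .
$$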
- "What is the definition of the convergence of finite-dimensional subspaces used in this paper?"
After Definition 7, on line 223, we defined this limit. The limit can also be defined by endowing the set $S_0(X)\cup\{X\}$ with the topology associated to the partial ordering of $S_0(X)$ by inclusion, that is, the topology generated by the sets $U_V:=\{W\in S_0(X):\ W\supset V\}\cup \{X\}$, for $V\in S_0(X)$.
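Under this reading (our paraphrase; $P_W$ denotes the orthogonal projection onto $W$), convergence along the net of finite dimensional subspaces reduces to strong convergence of the projections: for every $x\in X$ and $\varepsilon>0$ there is $V\in S_0(X)$ such that

$$
W\supset V\ \Longrightarrow\ \|P_W x-x\|<\varepsilon ,
$$

which one sees by taking $V$ to contain an element within distance $\varepsilon$ of $x$. This matches the reviewer's guess of strong convergence of the projection operators.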
- "Have you studied the role of the bi-Lipschitz constant in your theorems? For example, how does the number of layers in Theorem 4 depend on it?"
The bi-Lipschitz constraint (and the form of the neural operator layers) enables us to decompose the map into strongly monotone neural operator layers (Theorem 4). The bi-Lipschitz constant enters the proof through the estimate of the number of layers discussed under point 1 above. We will mention this observation in the final version of the manuscript.
Thank you for the detailed response. Since I am not absolutely familiar with category theory and other related work, I am unable to further raise my score, but I acknowledge that I have read through the rebuttal and it appears to be a nice paper overall.
Well, I felt sorry for the authors, because a reviewer at NeurIPS, once well known as a top theory-oriented machine learning conference, cannot raise his/her score simply because he/she is not familiar with category theory. I even feel this is symbolic of the current state of theory-oriented machine learning conferences. It is not the fault of the individual reviewer, but of the conference's matching system, which mistakenly assigned a complete amateur in category theory to review category-theoretic research.
This is a suggestion to the chairs for avoiding such mismatches in the future: reviewers should be screened for fundamental knowledge/background/understanding of the field. I am an expert in expressive-power analysis, but not at all in category theory or tropical geometry. Unfortunately, this kind of mismatch happens every year, so I am usually skeptical of any mathematical "theorems" published at machine learning conferences.
This paper focuses on the continuous discretization in operator learning. This is a very important question since it involves reducing the infinite-dimensional space to a finite-dimensional space in operator learning. The authors present cases where discretization is continuous and cases where it is not. The results are interesting and can be applied to design methods in operator learning.
Strengths
The proof is solid, and the paper is well-written and organized. I appreciate the results presented in this paper.
Weaknesses
Since this paper is submitted to NeurIPS and not to a mathematical journal, I hope the authors can provide some practical examples, such as solving the Poisson equation to learn the operator relationship between the source term and the solution. Using methods like DeepONet and FNO, it would be beneficial to determine whether the discretization in these methods is continuous or not. I believe this could make the paper more accessible to a broader audience.
Questions
Mentioned in the Weaknesses.
Limitations
All right.
We thank the reviewer for the suggestion to include an example based on the discretization of simple differential equations, as it surely helps readers quickly understand the essential features of the no-go theorem on the approximation of invertible operators.
Regarding the positive results, in Appendix A of our paper we have considered nonlinear discretization of the operators; see the reply to reviewer mJde under point 3. To exemplify the negative result, we will add to the appendix of the paper the following example on the solution operators of differential equations and the non-existence of approximation by diffeomorphic maps. We consider the elliptic (but not strongly elliptic) problem (below referred to as "PDE1")
with the Dirichlet and Neumann boundary conditions
Here, the coefficient function depends on a parameter and takes the two indicated forms in the two parameter regimes. We consider the weak solutions of PDE1 in the space
We can write
where
parametrized as above, are multiplication operators that are invertible (this invertibility makes the equation PDE1 elliptic). Moreover, the remaining operators carry the Dirichlet and Neumann boundary conditions, respectively. We consider the corresponding Hilbert space; to generate an invertible operator related to PDE1, we write the source term using an auxiliary function,
Then the equation,
defines a continuous and invertible operator,
In fact, this holds when the domains of the operators are chosen in a suitable way. The Galerkin method (that is, the standard approximation based on the Finite Element Method) for approximating the equation PDE1 involves introducing a complete basis of the Hilbert space, the orthogonal projection
and approximate solutions of PDE1 through solving
This means that the operator is approximated by the corresponding finite dimensional operator, whenever the latter is invertible.
The above corresponds to the Finite Element Method where the matrix defined by the operator is , where
Since we used the mixed Dirichlet and Neumann boundary conditions in the above boundary value problem, we see that for one endpoint value of the parameter all eigenvalues of the matrix are strictly positive, and for the other endpoint value all eigenvalues are strictly negative. As the matrix is a continuous matrix-valued function of the parameter, there exists a parameter value for which the matrix has a zero eigenvalue and is not invertible. Thus, we have a situation where all operators in the family are invertible (and thus define diffeomorphisms), but for any basis and any dimension of the approximation there exists a parameter value such that the finite dimensional approximation is not invertible. This example shows that there is no FEM-based discretization method for which the finite dimensional approximations of all operators in the family are invertible. The above example also shows a key difference between finite and infinite dimensional spaces. The infinite dimensional operator has only continuous spectrum and no eigenvalues or eigenfunctions, whereas the finite dimensional matrices have only point spectrum (that is, eigenvalues). The continuous spectrum makes it possible to deform the positive operator to the negative operator in such a way that all intermediate operators are invertible, but this is not possible for finite dimensional matrices. We point out that the family of operators is not continuous in the operator norm topology but only in the strong operator topology, and the fact that the positive operator can be deformed to the negative one in the norm topology by a path that lies in the set of invertible operators is a deeper result. However, the strong operator topology is enough to make the FEM matrix depend continuously on the parameter.
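As a purely illustrative toy computation (not the construction above: the coefficients, the discretization, and all names below are hypothetical stand-ins), the finite dimensional side of this phenomenon can be seen with a 1D finite-difference analogue. A continuous family of symmetric matrices that interpolates between a positive definite and a negative definite matrix must pass through a singular matrix, so some member of the family fails to be invertible:

```python
import numpy as np

def fd_laplacian(n):
    # 1D finite-difference Laplacian with Dirichlet boundary conditions, n interior grid points
    h = 1.0 / (n + 1)
    main = 2.0 * np.ones(n)
    off = -np.ones(n - 1)
    return (np.diag(main) + np.diag(off, 1) + np.diag(off, -1)) / h**2

n = 50
L = fd_laplacian(n)                     # positive definite endpoint of the family
D = np.diag(np.linspace(1.0, 3.0, n))   # an arbitrary positive definite "potential"

# Family A(s) = (1 - s) * L - s * D: positive definite at s = 0, negative definite
# at s = 1. Since the eigenvalues depend continuously on s, some intermediate A(s)
# must have a (numerically almost) zero eigenvalue, i.e. it is not invertible.
s_grid = np.linspace(0.0, 1.0, 401)
min_abs = [np.abs(np.linalg.eigvalsh((1 - s) * L - s * D)).min() for s in s_grid]

s_star = s_grid[int(np.argmin(min_abs))]
print(f"min |eigenvalue| along the path: {min(min_abs):.3e}, attained near s = {s_star:.3f}")
```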
Thanks for the reply. I will keep my score.
This paper investigates theoretical limitations of discretizing neural operators on infinite-dimensional Hilbert spaces. The authors first prove a "no-go theorem" (Theorems 1, 2) showing that diffeomorphisms between infinite-dimensional Hilbert spaces cannot generally be continuously approximated by finite-dimensional diffeomorphisms. Then, they provide positive results for certain classes of operators such as strongly monotone (Theorem 3) and bilipschitz neural operators (Theorem 4). They finally provide a concrete example of approximation by finite residual ReLU networks (Theorem 5).
Strengths
- The universality of neural networks has been demonstrated in various settings. However, research on the approximation abilities of operators is relatively scarce. Particularly, the characterization of classes that cannot be approximated is intriguing. This study is important as it succinctly demonstrates the differences between finite-dimensional and infinite-dimensional properties in the manageable setting of Hilbert spaces.
- Moreover, the novel approach of expressing approximation sequences in terms of category theory is noteworthy.
Weaknesses
- On the other hand, the proofs are based on conventional analytical arguments rather than category-theoretic arguments. Therefore, the "category theory" framework might be somewhat exaggerated. It is expected that with refinement of notation and sentence structure, the description could become more perspicuous in the future.
- There is concern that the categorical description may have obscured the contributions typically seen in traditional approximation theory papers. As the authors likely recognize, various topologies are used in function approximation; this study focuses only on approximation in the norm topology of Hilbert spaces and does not negate "all considerable approximation sequences". So, the impossibility theorem presented here might simply be due to the norm topology being too strong. While the Hilbert structure sounds natural as a generalization of Euclidean structure, in reality, concepts like L2 convergence of Fourier series are quite technical and not necessarily an inevitable notion of convergence. It seems that in pursuit of an elegant categorical description, the diversity of function approximation may have been compromised.
Questions
In Definition 3, why is imposed besides ?
Limitations
The authors did not discuss the validity of assumptions.
We appreciate the detailed suggestions, criticisms and endorsement of the reviewer. We address all of these below:
- "On the other hand, the proofs are based on conventional analytical arguments rather than category-theoretic arguments."
The proofs are indeed based on analytical arguments. We used category theory as a formalism (similar to the formalism of object-oriented programming) to describe approximation operations on all Hilbert spaces (including non-separable ones). As the collection of all Hilbert spaces cannot be considered as a set (cf. Russell's paradox) but can be considered as a category, we chose to use the language of category theory. In the beginning of the paper, we introduced an "approximation operation" to avoid difficulties related to formal category theory.
- "The impossibility theorem presented here might simply be due to the norm topology being too strong."
We much appreciate the issue raised by the reviewer. We will include the analysis and discussion below in the revised manuscript, in an appendix on generalizations.
We formulated the approximation functor using the norm topology, as uniform convergence on compact sets is extensively studied in the theory of neural networks. The norm topology also makes it possible to consider quantitative error estimates. However, we agree with the referee that it is important to understand no-go results in weaker topologies. It turns out that our results can be generalized to a setting where the norm topology is partially replaced by the weak topology. Definition 7 is replaced by the following.
Definition [Weak Approximation Functor]. We define the weak approximation functor as the functor that maps each object to an approximating object and has the following properties:
(A') For all objects, all maps and all points, the stated convergence holds in the weak topology. Moreover, when the map is either of the two distinguished operators, its approximation is the corresponding operator, respectively.
In (A') we added conditions on the approximation of the two operators mentioned above. Similarly, the continuity of the approximation functors can be generalized to the case where convergence in the norm topology is replaced by convergence in the weak topology. The proof of the no-go theorem also generalizes to this setting; we will add to the Appendix a theorem stating that there are no weak approximation functors that are continuous in the weak topology.
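For reference (a standard fact, not specific to the paper): a sequence $x_n$ in a Hilbert space $X$ converges weakly to $x$, written $x_n\rightharpoonup x$, if

$$
\langle x_n , y\rangle \to \langle x , y\rangle \quad\text{for every } y\in X .
$$

Norm convergence implies weak convergence, but not conversely in infinite dimensions (e.g. an orthonormal sequence converges weakly to $0$ while staying at norm $1$), which is why the weak-topology versions of the definitions above are genuinely weaker requirements.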
- In Definition 3, why is imposed besides ?
This definition needs to be interpreted with care, which we will clarify in the revised manuscript. The appearance of the compact operators makes the discretization of the outer activation function and of the activation functions inside the layers in Definition 3 different, and this is one reason why we have introduced both of them. To consider invertible neural operators, we will below assume that the activation function is invertible, for example the leaky ReLU function. In the corresponding operation the nonlinear function is sandwiched between compact operators. The compact operators map weakly converging sequences to norm converging sequences. This is essential in the proofs of the positive results for approximation functors, as discussed in the paper. However, we do not have general results on how this operation can be approximated by finite dimensional operators in the norm topology, but only in the weak topology in the sense of the Weak Approximation Functor defined above. Nonetheless, one can overcome this difficulty, for example, by using the explicit form of the activation function and choosing different finite dimensional spaces in each layer of the neural operator.
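The property used above is the standard characterization of compact operators on a Hilbert space (textbook material): if $K$ is compact and $x_n\rightharpoonup x$ weakly, then

$$
\|K x_n - K x\| \to 0 ,
$$

i.e. $K$ upgrades weak convergence to norm convergence. This is what allows the compact operators surrounding the nonlinearity to control the weak limits appearing in the discretization.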
We also address the question whether the activation function is relevant in universal approximation results. If the activation function is removed, the operator becomes a sum of a (local) linear operator and a compact (nonlocal) nonlinear integral operator. Moreover, if we compose operators of the above form, the resulting operator is also a sum of a (local) linear operator and a compact (nonlocal) operator. The Fréchet derivative of the composition at any point is equal to the sum of the linear part and a compact linear operator. This means that the Fredholm index of the derivative is equal to the index of the linear part and is constant, that is, independent of the point where the derivative is computed. In particular, this means that one cannot approximate an arbitrary smooth function on compact subsets by such neural operators; indeed, for a general smooth function, the Fredholm index of the derivative may vary from point to point. Thus, the activation function appears to be relevant for obtaining universal approximation theorems for neural operators. Again, we will add this analysis to the final version of the manuscript.
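A minimal sketch of this index argument, in notation we introduce here only for illustration ($L$ a bounded Fredholm linear operator, $K$ nonlinear, compact and $C^1$, so that each derivative $DK(u)$ is a compact linear operator):

$$
G = L + K \ \Longrightarrow\ DG(u) = L + DK(u), \qquad
\operatorname{ind}\bigl(DG(u)\bigr) = \operatorname{ind}(L)\ \ \text{for all } u ,
$$

because the Fredholm index is invariant under compact perturbations. A general $C^1$ map, by contrast, may have derivatives whose index varies with the base point, so it cannot be approximated (together with its derivatives) by operators of this restricted form.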
- "The authors did not discuss the validity of assumptions."
We appreciate this criticism and will address it in the final version of the manuscript. The key assumption is that the neural operator is bilipschitz while being of the general form (4). The expressivity properties and the applicability to designing generative models are discussed in the global comments. We also point out that the strong monotonicity used as an assumption in several lemmas and theorems is an intermediate assumption that is absorbed in Theorem 4, where we consider the approximation of bi-Lipschitz neural operators.
In Theorem 5, finite-rank residual neural operators appear as explicit natural approximators of bilipschitz neural operators. Such a perspective has been studied empirically, e.g., in [Behrmann et al., PMLR 2019, pp. 573-582], although in the finite-dimensional case.
Thank you for detailed clarifications. I would like to keep my score as is.
"The proofs are indeed based on analytical arguments."
If so, I recommend the authors to reconsider the following phrases in the abstract and conclusion:
"Using category theory, we give a no-go theorem" and "We used tools from category theory to produce a no-go theorem".
It would be much impactful and significant if the authors could more directly point out any incorrectness of the proof or inappropriateness of the assumption in the previous studies.
Thank you for your response. We will address your points in the following way.
- If so, I recommend the authors to reconsider the following phrases in the abstract and conclusion: "Using category theory, we give a no-go theorem." "We used tools from category theory to produce a no-go theorem"
We appreciate the advice, and will follow it. We will replace
"Using category theory, we give a no-go theorem"
with
"Using analytical arguments, we give a no-go theorem framed with category theory."
and replace
"We used tools from category theory to produce a no-go theorem"
with
"We give a no-go theorem framed with category theory"
in the abstract and conclusion.
- It would be much impactful and significant if the authors could more directly point out any incorrectness of the proof or inappropriateness of the assumption in the previous studies.
There are several papers which use continuous functions (either as elements of infinite dimensional function spaces or of metric spaces) to model images or signals, and which apply statistical methods and invertible neural networks or maps modeling diffeomorphisms. Often in these papers one derives theoretical results in the continuous model and presents numerical results using finite dimensional approximations. In this process, errors are caused by the discretization and by the effect of changing the dimension of the approximate models. We believe that our work meaningfully addresses these questions as applied to injective/bijective neural operators, an important architecture. We hope that our paper inspires further study of these points. We can include citations to the following papers related to these issues.
The papers below combine neural networks and the approximation of diffeomorphisms, as applied to imaging:
- Elena Celledoni, Helge Glöckner, Jørgen N. Riseth, Alexander Schmeding. Deep neural networks on diffeomorphism groups for optimal shape reparametrization. BIT Numerical Mathematics (2023) 63:50.
- Lin Tian, Hastings Greer, François-Xavier Vialard, Roland Kwitt, Raúl San José Estépar, Richard Jarrett Rushmore, Nikolaos Makris, Sylvain Bouix, Marc Niethammer. GradICON: Approximate Diffeomorphisms via Gradient Inverse Consistency. CVPR 2023.
The papers below combine invertible neural networks and statistical models, especially for solving inverse problems (including imaging problems):
- Alexander Denker, Maximilian Schmidt, Johannes Leuschner, Peter Maass. Conditional Invertible Neural Networks for Medical Imaging. Journal of Imaging 2021, 7(11), 243.
- Ardizzone, L.; Kruse, J.; Rother, C.; Köthe, U. Analyzing Inverse Problems with Invertible Neural Networks. In Proceedings of the 7th International Conference on Learning Representations (ICLR 2019), New Orleans, LA, USA, 6–9 May 2019.
- Anantha Padmanabha, G.; Zabaras, N. Solving inverse problems using conditional invertible neural networks. J. Comput. Phys. 2021, 433, 110194.
- Denker, A.; Schmidt, M.; Leuschner, J.; Maass, P.; Behrmann, J. Conditional Normalizing Flows for Low-Dose Computed Tomography Image Reconstruction. In Proceedings of the ICML Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models, Vienna, Austria, 18 July 2020.
- Hagemann, P.; Hertrich, J.; Steidl, G. Stochastic Normalizing Flows for Inverse Problems: A Markov Chains Viewpoint. SIAM/ASA Journal on Uncertainty Quantification, Vol. 10, Iss. 3 (2022).
- Papamakarios, G.; Nalisnick, E.T.; Rezende, D.J.; Mohamed, S.; Lakshminarayanan, B. Normalizing Flows for Probabilistic Modeling and Inference. Journal of Machine Learning Research 22 (2021) 1-64.
The paper addresses the problem of discretizing neural operators, maps between infinite dimensional Hilbert spaces that are trained on finite-dimensional discretizations. Using tools from category theory, the authors provide a no-go theorem showing that diffeomorphisms between Hilbert spaces may not admit continuous approximations by diffeomorphisms on finite-dimensional spaces. This highlights the fundamental differences between infinite-dimensional Hilbert spaces and finite-dimensional vector spaces. Despite these challenges, the authors provide positive results, showing that strongly monotone diffeomorphism operators can be approximated in finite dimensions and that bilipschitz neural operators can be decomposed into strongly monotone operators and invertible linear maps. Finally, they observe how such operators can be locally inverted through an iteration scheme.
Strengths
- The paper provides theoretical results addressing the challenging problem of discretizing inherently infinite-dimensional objects (neural operators)
Weaknesses
- The text and presentation require significant polishing. It contains numerous typos, poorly formulated sentences, and instances of missing or repeated words
- While the paper's theoretical focus is valuable, it lacks examples of specific neural operator structures that meet the theorems or remarks
- A more detailed discussion on the practical impact of this work, accompanied by examples, would be beneficial for the audience
Please note that my review should be taken with caution, as I am not familiar with category theory and did not thoroughly check the mathematical details. My feedback primarily focuses on the presentation and potential impact of the results rather than a rigorous validation of the theoretical content.
Questions
- Neural operators are typically defined between Banach spaces. Why does your theory focus on maps between Hilbert spaces instead?
- Comment: The work in [1] might have been relevant to cite as well.
- The main neural operator paper [2] develops theoretical results on the universal approximation theory of neural operators. How do your results relate to the ones in that paper?
[1] F. Bartolucci, E. de Bézenac, B. Raonić, R. Molinaro, S. Mishra, R. Alaifari, Representation Equivalent Neural Operators: a Framework for Alias-free Operator Learning, NeurIPS 2023.
[2] Nikola B. Kovachki, Zongyi Li, Burigede Liu, Kamyar Azizzadenesheli, Kaushik Bhattacharya, Andrew M. Stuart, and Anima Anandkumar. "Neural operator: Learning maps between function spaces with applications to PDEs," J. Mach. Learn. Res., 24(89):1–97, 2023.
Limitations
The paper lacks examples of applications of the theorems to specific neural operator structures, and would benefit from further discussion of the practical impact of the results, with examples.
We appreciate the valuable comments and constructive feedback of the reviewer. We are pleased to address all of these below.
- "The text and presentation require significant polishing."
We agree and sincerely regret this, and have already made many corrections to the manuscript.
- "While the paper's theoretical focus is valuable, it lacks examples of specific neural operator structures that meet the theorems or remarks."
In our approximation results (Theorem 5 and Corollary 1), although we consider a large class of bijective neural operators, the approximators can be obtained through finite rank neural operators (see Def. 10). Finite-rank neural operators are encountered, e.g., as FNOs [37], wavelet neural operators [Tripura and Chakraborty, Wavelet Neural Operator for solving parametric partial differential equations in computational mechanics problems, Comp. Meth. Appl. Mech. 2023], and Laplace neural operators [Chen et al., arXiv:2302.08166v2, 2023].
The neural operators studied in our paper include neural operators that are close to those introduced by Kovachki-Lanthaler-Mishra (KLM) [28, 31]. This is discussed in the comments addressed to all reviewers.
- "A more detailed discussion on the practical impact of this work, accompanied by examples, would be beneficial for the audience."
We much appreciate this suggestion. In Appendix A.1 we had given an example where we approximate the solution operator of a nonlinear elliptic equation with Dirichlet boundary values, using a discretization based on the Finite Element Method. In the case when the nonlinearity is a convex function and the source term is represented in a suitable form, the solution map is a diffeomorphism in the Sobolev space with Dirichlet boundary values. The approximation can be obtained by the Galerkin method. We will expand upon this example in the final version of the manuscript.
- "Neural operators are typically defined between Banach spaces. Why does your theory focus on maps between Hilbert spaces instead?"
Via our general framework, we found that strong monotonicity is one of the key ingredients to obtain a "positive" result, that is, preserving invariant discretization. Strong monotonicity is defined by using inner products, which is why we have focused on Hilbert spaces.
However, the no-go theorem which states that diffeomorphisms of Hilbert spaces cannot be continuously approximated by finite dimensional diffeomorphisms implies directly that the same "negative" result holds for general Banach spaces.
The main challenge in using general Banach spaces for the "positive" result is that the map that sends a point to the closest point in a given subspace may be set-valued, that is, there may be several nearest points. Nonetheless, several of our results can be generalized to uniformly convex Banach spaces. In such a space, for a closed subspace and any point, there is a unique closest point in the subspace. (In fact, uniformly convex Banach spaces are strictly convex Banach spaces where the defining inequality is given in a quantitative form.) This makes, e.g., the linear discretization well defined. We will include in the revision a detailed discussion on generalizations to strictly convex spaces.
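For reference, the standard definitions alluded to above (textbook material, not specific to the paper): a normed space $X$ is strictly convex if $\|x\|=\|y\|=1$ and $x\neq y$ imply $\|\tfrac{x+y}{2}\|<1$, and it is uniformly convex if

$$
\forall\varepsilon>0\ \exists\delta>0:\qquad \|x\|=\|y\|=1,\ \|x-y\|\ge\varepsilon\ \Longrightarrow\ \Bigl\|\frac{x+y}{2}\Bigr\|\le 1-\delta .
$$

In a uniformly convex Banach space every nonempty closed convex set (in particular every closed subspace) contains a unique nearest point to any given point, so the metric projection is single-valued; this is the quantitative strengthening of strict convexity mentioned above.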
- "Comment: The work in [1] might have been relevant to cite as well.
We thank the reviewer for bringing this paper to our attention. We will add [1] to our references.
- "The main neural operator paper [2] develops theoretical results on the universal approximation theory of neural operators. How do your results relate to the ones in that paper?"
We agree with the reviewer that this is an important point. The approximation result in [2] is the universality of neural operators, i.e., the ability to approximate any continuous map by a neural operator. Diffeomorphisms are contained in this result, which holds in the function space setting. However, even though a general diffeomorphism can be approximated by a neural operator, and the neural operator can in turn be approximated by a finite dimensional operator, the proof of the no-go theorem implies that either the approximating infinite dimensional neural operators are not diffeomorphisms, or the approximation of neural operators by finite dimensional operators is not a continuous operation. Note that the present universal approximation results for neural operators have mainly analyzed the approximation of functions by neural operators in the norms of spaces of continuous functions on compact sets, but not in stronger norms.
We will add this discussion to the Introduction.
Thank you for your detailed response. As I mentioned in my initial review, my understanding of category theory is somewhat limited. My feedback has mainly focused on the presentation and potential impact of the results rather than an in-depth validation of the theoretical content. I am not in a position to increase my score.
We thank the reviewers for their valuable comments and detailed questions. We will provide replies to the individual reviewers below, but first would like to make some general statements addressing a few issues raised by all the reviewers.
Common questions: Practical impact/examples of this work?
A practical implication of our result is the description of bi-Lipschitz neural operators as a composition of discretization invariant layers of invertible finite dimensional neural operators (i.e. neural networks). Such neural operators are useful in generative models, where a probability distribution supported on a given model manifold is pushed forward by a map to a distribution that one would like to be close to an empirical target distribution supported on some submanifold of the Hilbert space. (Here the map and the target submanifold are unknown, and the map is optimized using samples from the target distribution.) Suppose that we know a priori the topology of the data manifold and that there is a diffeomorphism between the model and data manifolds. As all smooth finite dimensional submanifolds of a Hilbert space are close to some finite dimensional subspace, one can start by assuming that there exists an embedding that is close to the composition of this diffeomorphism with a finite dimensional orthoprojection onto that subspace. By considering the model manifold as a subset of the subspace, we can extend the embedding to a diffeomorphism. This can be done when the dimension of the subspace is sufficiently large [Puthawala et al., ICML 2022].
Furthermore, this map can be extended to a diffeomorphism of the Hilbert space that maps the model manifold onto the data manifold. The extended map can be written in a form involving a compact linear operator, so by definition it is a neural operator diffeomorphism. Thus, diffeomorphic neural operators can be used to obtain generative models. As the finite dimensional subspace is not a priori known, and its dimension depends on the accuracy required for the generative model, it is natural to consider infinite dimensional neural operators and study their approximation properties.
In our paper we show, in Theorem 3, that strongly monotone neural operators can be approximated continuously by finite dimensional neural operators that are diffeomorphisms (so, invertible). In Theorem 4 we show that any bi-Lipschitz neural operator (not necessarily strongly monotone) can locally be represented as a composition of strongly monotone neural operator layers. This implies that bi-Lipschitz neural operators can be approximated by a composition of invertible, finite dimensional neural networks in a continuous way. This makes invertible neural operators a class that behaves well under finite dimensional approximations. Our results can also be summarized by stating that neural operators conditionally serve as a class of diffeomorphisms of function spaces that are simple enough for well-working approximations but still sufficiently expressive (and may model a rich variety of deformations).
The neural operators we study include neural operators that are close to those introduced by Kovachki-Lanthaler-Mishra (KLM) [28, 31]. We have assumed that the two operators surrounding the nonlinearity are compact linear operators; in several cases these can be chosen to be identity embeddings between different function spaces, so that the embeddings are compact.
Consider a KLM neural operator of the standard form, defined on a bounded set. Let the relevant layer map be a nonlinear (integral) operator whose kernel is given by a neural network with sufficiently smooth activation functions, and consider the identity embedding operators mapping between the function spaces involved. The Hilbert spaces in question are isomorphic, and by writing the layer with the help of such an isomorphism, we can express it in the form studied in our paper, in which the outer operators are compact linear operators and the middle operator is a continuous nonlinear operator. In this way, the KLM operator can be written in the form studied in our paper.
Furthermore, by choosing the kernel to be a convolutional kernel and the domain to be the torus, the map takes the form of an FNO [37].
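As a purely illustrative sketch (not the construction above; the single-channel setting, the leaky ReLU, and all names and shapes below are our assumptions), a single FNO-style layer with a convolutional kernel on the 1D torus can be written as $v\mapsto \sigma\bigl(Wv + \mathcal{F}^{-1}(R\cdot\mathcal{F}v)+b\bigr)$, and truncating to the lowest Fourier modes makes the integral part a finite-rank operator:

```python
import numpy as np

def fno_layer(v, W, R, b, n_modes):
    """One FNO-style layer on the 1D torus (illustrative sketch).

    v:        (n,) real signal sampled on a uniform periodic grid
    W:        scalar weight of the local (pointwise) linear part
    R:        (n_modes,) complex weights acting on the lowest Fourier modes
    b:        scalar bias
    n_modes:  number of retained Fourier modes (finite-rank truncation)
    """
    v_hat = np.fft.rfft(v)                       # Fourier coefficients of the input
    out_hat = np.zeros_like(v_hat)
    out_hat[:n_modes] = R * v_hat[:n_modes]      # keep only the lowest modes: a finite-rank integral operator
    conv = np.fft.irfft(out_hat, n=v.shape[0])   # back to physical space

    z = W * v + conv + b
    return np.where(z > 0.0, z, 0.1 * z)         # leaky ReLU keeps the pointwise nonlinearity invertible

# Toy usage on a smooth periodic signal
rng = np.random.default_rng(0)
n, n_modes = 128, 8
v = np.sin(2.0 * np.pi * np.linspace(0.0, 1.0, n, endpoint=False))
R = rng.standard_normal(n_modes) + 1j * rng.standard_normal(n_modes)
print(fno_layer(v, W=1.0, R=R, b=0.0, n_modes=n_modes).shape)   # -> (128,)
```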
The reviewers find this work studying an important problem and encourage the authors to incorporate the suggestions and developments during the review process.