PaperHub
Rating: 4.0/10 · Withdrawn · 4 reviewers
Individual ratings: 5, 5, 3, 3 (min 3, max 5, std dev 1.0)
Confidence: 4.0 · Correctness: 2.5 · Contribution: 2.0 · Presentation: 2.5
ICLR 2025

EKAN: Equivariant Kolmogorov-Arnold Networks

OpenReview · PDF
Submitted: 2024-09-26 · Updated: 2024-11-14

Abstract

Kolmogorov-Arnold Networks (KANs) have seen great success in scientific domains thanks to spline activation functions, becoming an alternative to Multi-Layer Perceptrons (MLPs). However, spline functions may not respect symmetry in tasks, which is crucial prior knowledge in machine learning. Previously, equivariant networks embed symmetry into their architectures, achieving better performance in specific applications. Among these, Equivariant Multi-Layer Perceptrons (EMLP) introduce arbitrary matrix group equivariance into MLPs, providing a general framework for constructing equivariant networks layer by layer. In this paper, we propose Equivariant Kolmogorov-Arnold Networks (EKAN), a method for incorporating matrix group equivariance into KANs, aiming to broaden their applicability to more fields. First, we construct gated spline basis functions, which form the EKAN layer together with equivariant linear weights. We then define a lift layer to align the input space of EKAN with the feature space of the dataset, thereby building the entire EKAN architecture. Compared with baseline models, EKAN achieves higher accuracy with smaller datasets or fewer parameters on symmetry-related tasks, such as particle scattering and the three-body problem, often reducing test MSE by several orders of magnitude. Even in non-symbolic formula scenarios, such as top quark tagging with three jet constituents, EKAN achieves comparable results with EMLP using only $26\%$ of the parameters, while KANs do not outperform MLPs as expected.
Keywords
Equivariant networks · Kolmogorov-Arnold networks

Reviews & Discussion

Review
Rating: 5

This paper introduces a new class of equivariant KAN networks, utilizing gating mechanisms and constraining linear weights to satisfy group equivariance. Specifically, the authors apply singular value decomposition (SVD) to the linear weight constraints to identify their null spaces, drawing inspiration from EMLP, and use gating to scale rank-n vectors in an equivariant manner. Experimental results demonstrate that EKAN achieves superior performance over EMLP on several tasks.
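A minimal numpy sketch of the EMLP-style step described above may help; the function name, tolerance, and toy group are illustrative assumptions, not code from the paper. The equivariance constraint $\rho_{out}(g) W = W \rho_{in}(g)$ is linear in vec($W$), so stacking one block per group generator and taking the SVD null space yields a basis for the admissible weights:

```python
import numpy as np

def equivariant_weight_basis(reps_in, reps_out, tol=1e-8):
    # Stack the linear constraints rho_out(g) @ W - W @ rho_in(g) = 0,
    # one block per group generator, acting on vec(W) (column-major).
    n, m = reps_in[0].shape[0], reps_out[0].shape[0]
    blocks = [
        np.kron(np.eye(n), rho_out) - np.kron(rho_in.T, np.eye(m))
        for rho_in, rho_out in zip(reps_in, reps_out)
    ]
    C = np.concatenate(blocks, axis=0)
    _, s, Vt = np.linalg.svd(C)
    rank = int((s > tol).sum())
    # Rows of Vt beyond the rank span ker(C); each reshapes into one basis matrix.
    return [v.reshape(m, n, order="F") for v in Vt[rank:]]

# Toy check: equivariance to the S2 swap acting on a 2-dim input and output.
g = np.array([[0.0, 1.0], [1.0, 0.0]])
basis = equivariant_weight_basis([g], [g])
assert len(basis) == 2                     # commutant of the swap: a*I + b*ones
assert all(np.allclose(g @ B, B @ g) for B in basis)
```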

Strengths

  1. The paper proposes a new class of equivariant networks based on Kolmogorov-Arnold Networks that can be applied to various scientific problems.
  2. Even though neither equivariant linear weights, KANs, nor gating mechanisms are new, the authors manage to connect them and provide a new set of tools for equivariance research.
  3. Most of the experimental results presented in the paper show that EKAN clearly outperforms EMLP.

Weaknesses

  1. The technical contributions presented in the paper lack novelty, as the core techniques have been previously explored in the literature.
  2. Although EKAN is compared against MLP, KAN, and EMLP, the effectiveness of the model could be further validated by including comparisons with more recent equivariant architectures or domain-specific models.
  3. A discussion on the efficiency of the KAN-based model would be valuable, particularly for application-oriented readers. Given the current trend where researchers may prioritize data-driven approaches over constraining models with equivariant functions, such a comparison could highlight EKAN's practical utility [1].

[1] Abramson, J., Adler, J., Dunger, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024). https://doi.org/10.1038/s41586-024-07487-w

Questions

See Weaknesses

Comment

Thank you for your valuable feedback! We will take it into consideration in the revision.

Review
Rating: 5

This paper introduces Equivariant Kolmogorov-Arnold Networks (EKAN), an extension of Kolmogorov-Arnold Networks (KANs) to integrate matrix group equivariance, enhancing their effectiveness for symmetry-related tasks. While KANs have been valuable in scientific applications due to their spline activation functions, they lack built-in symmetry awareness. EKAN addresses this limitation by incorporating equivariant linear weights and gated spline basis functions to preserve symmetry within the architecture. A lift layer is introduced to align EKAN’s input with dataset feature spaces, enabling broader applicability. Experimental results demonstrate that EKAN achieves good accuracy on symmetry-sensitive tasks with fewer parameters and smaller datasets, outperforming traditional KANs and matching or exceeding the performance of Equivariant Multi-Layer Perceptrons (EMLP) on specific applications like particle scattering and top quark tagging, where EKAN reaches comparable results with significantly reduced parameters.

Strengths

  1. The proposed EKAN framework stands out as the first of its kind to embed matrix group equivariance directly into KAN architectures, expanding their potential application range and enhancing their utility in tasks where symmetry is crucial.
  2. The numerical results are promising, particularly in scientific computing tasks with symmetry-related constraints.

Weaknesses

  1. The novelty and depth of this work appear somewhat limited, as the methodology seems relatively straightforward, adapting the existing techniques that extend MLPs to EMLPs (Finzi et al., 2021) to construct gated basis functions and equivariant linear weights. It would strengthen the paper if the authors provided deeper insights into the intrinsic challenges or unique aspects of building EKAN from KAN, beyond the application of existing equivariant techniques.
  2. Certain numerical tests lack fair benchmarking. For instance, more variations in the width and depth of MLP and EMLP should be considered to ensure comparability. Additionally, the results in Table 3 suggest that the performance of EMLP or EKAN may depend on balancing model size with training set size. However, model sizes for (E)MLP and (E)KAN are not equivalent across tests. It would be helpful if the authors presented results for MLP and EMLP with model sizes under 50K, as well as for KAN and EKAN with sizes over 100K, to provide a more balanced and thorough comparison. More generally, the authors could test all models with 3-4 different parameter counts ranging from 30K to 150K.

Questions

In addition to the points noted in the "Weaknesses" section, I have the following questions:

  1. It has been noted that any KAN can be exactly represented by an MLP, as demonstrated, for example, in Wang et al. (2024), “On the expressiveness and spectral bias of KANs” (arXiv:2410.01803). Given this, can we assert that EKAN can be directly derived from EMLP? Alternatively, does the construction presented in this work offer new insights into achieving equivariant properties within KANs? (See the first sketch after this list.)

  2. In Theorem 1, does the function $f: U_{gi} \to U_m$ ensure that the entire KAN is precisely equivariant from the original input to the output, as shown in Equation (5)?

  3. Regarding the singular value decomposition (SVD) of $C$ to determine the projection operator onto the kernel space, as stated in Equation (22): is this process conducted offline, and can it be performed in parallel and a priori? Additionally, during training, how is it ensured that $W_b$ remains within the kernel space? Is a projection applied after each SGD step? (See the second sketch after this list.)
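On question 1, the reduction is usually shown via the observation that a KAN layer factors into a fixed basis featurization followed by a linear map. A minimal torch sketch of that factorization follows, under the simplifying assumption that all edges share one spline basis; the function name and shapes are illustrative, not from the paper:

```python
import torch

def kan_layer_as_two_stage(x, basis, W):
    # y_j = sum_i phi_ji(x_i) with phi_ji(t) = sum_k W[j, i, k] * B_k(t):
    # a fixed featurization followed by a linear map, i.e. an MLP-like layer
    # whose "activation" is the shared spline basis B_1..B_K.
    feats = basis(x)                      # (batch, n, K) basis values B_k(x_i)
    return torch.einsum("bik,jik->bj", feats, W)
```

On question 3, the two options the question distinguishes can be sketched as follows, assuming the constraint matrix $C$ from Equation (22) has been built offline (shown here for a toy $S_2$ swap; EMLP-style implementations take route (a), which keeps $W_b$ in $\ker(C)$ by construction, so no projection after SGD is needed):

```python
import numpy as np

# Offline, once per layer (and parallelizable across layers): SVD of C.
g = np.array([[0.0, 1.0], [1.0, 0.0]])               # toy generator: S2 swap
m = n = 2
C = np.kron(np.eye(n), g) - np.kron(g.T, np.eye(m))  # column-major vec convention
_, s, Vt = np.linalg.svd(C)
Q = Vt[int((s > 1e-8).sum()):]                       # orthonormal rows spanning ker(C)

# Option (a): reparameterize. Train coefficients c; W_b = reshape(Q.T @ c)
# lies in ker(C) for every value of c, so SGD cannot leave the kernel space.
c = np.random.randn(Q.shape[0])
W_b = (Q.T @ c).reshape(m, n, order="F")

# Option (b): train W_b freely and project vec(W_b) back onto ker(C)
# after each optimizer step with the orthogonal projector P = Q^T Q.
P = Q.T @ Q
W_b = (P @ W_b.reshape(-1, order="F")).reshape(m, n, order="F")
```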

Comment

Thank you for your valuable feedback! We will take it into consideration in the revision.

Review
Rating: 3

This paper introduces Equivariant Kolmogorov-Arnold Networks (EKAN). EKAN builds on KANs and incorporates matrix group equivariance into them. The idea for introducing symmetry is a gate architecture, which has also been adopted to make MLPs equivariant to matrix groups. The networks are evaluated on several classes of scientific discovery tasks.

Strengths

  • The motivation of this paper is fairly clear, and the contribution of the work is positioned in the right area. Incorporating symmetry into machine learning models is extensively studied in the scientific discovery domain.
  • The schematic figures, such as Figures 1 and 2, are a good idea and help readers easily grasp the idea of the proposed method.

Weaknesses

Theoretical part: The main drawback of this paper is the ambiguous and unclear description of the theoretical parts, which makes the paper very hard to follow. The main cause, I feel, is that the distinction between definitions and properties is not clear and/or some mathematical terminology is not introduced properly. The following are some of the ambiguous/unclear, but not exhaustive, parts of the paper:

Section 3

  • The sentence starting with ‘In general’ in line 139 is neither trivial nor understandable. What is the assumption on “the” vector space $U$, and how is it associated with the matrix group?

Section 4

  • Assuming the definition of (10), for example, $T(-1, -1)$ is also allowed -- what does this notation mean?

  • As I mentioned above, how the decomposition (10) is associated with the matrix group is unclear, so I do not see how this expression helps the later discussion.

  • Descriptions in lines 253-256 are very vague. For example, why does the input/output feature not lie within $U_I$/$U_o$? What is the necessity of “aligning with gated basis functions”? Why does adding a gate scalar help to obtain the actual input/output space? The expression $U_I/U_o$ is also extremely misleading, as it could also mean the quotient space of $U_I$.

  • Section 4.2 is very hard to follow. For example, lines 280-284 are very difficult to understand. ‘For the non-scalar term $v_{I, a}$, we apply the basis functions ...’ -- I do not see any mathematical formulation for this, and it is hard to tell whether this is a definition or a property derived from some equations. Line 281: “For the scalar term, ..., which is equivalent to applying basis functions element-wise.” This description also does not make sense, since the two mathematical terms that are supposed to be equivalent have not been introduced yet.

  • While the paper reads “$U_m$ can be written as ...” at line 284, I do not know the original definition of $U_m$, and I am not sure whether equations (13, 14) hold. I could not check the validity of the proof of Theorem 1.

  • Unfortunately, I cannot follow the rest of the theoretical claims due to my lack of understanding of the above points.

Another question regarding the motivation of the paper: why not use frame averaging on a KAN model, rather than building hard equivariance into an existing KAN model? (A sketch of this alternative follows below.)
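For context, a minimal sketch of the alternative raised here (frame averaging in the sense of Puny et al., 2022, shown for a finite group where it reduces to plain group averaging; the wrapper name and row-vector convention are illustrative assumptions):

```python
import torch

def group_averaged(model, reps):
    # f_avg(x) = (1/|G|) sum_g rho_out(g)^{-1} f(rho_in(g) x) is exactly
    # equivariant whenever rho_in/rho_out are representations of a finite group G.
    def f_avg(x):                                   # x: (batch, n) row vectors
        outs = [
            model(x @ rho_in.T) @ torch.linalg.inv(rho_out).T
            for rho_in, rho_out in reps
        ]
        return torch.stack(outs).mean(dim=0)
    return f_avg

# usage sketch: make a vanilla (non-equivariant) KAN S2-equivariant by averaging
swap = torch.tensor([[0.0, 1.0], [1.0, 0.0]])
reps = [(torch.eye(2), torch.eye(2)), (swap, swap)]
```

For infinite matrix groups the sum is replaced by an input-dependent frame, which is why frame averaging and hard weight constraints trade off differently in cost.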

Experiment part: I also have some concerns about the experimental settings.

  • The N-body simulation is a relatively small, though representative, physical system among other types of scientific simulations. The number of parameters for MLP and EMLP is relatively high compared to EKAN, and I suspect that this advantage might come from the difference in the number of parameters. Would it be possible to vary the number of parameters/layers of MLP/EMLP/EKAN to see how the parameter count impacts test accuracy?
  • I think the choice of baselines is not exhaustive. For example, Steerable E(3)-GNN [1] can be applied to the N-body experiments and Top Quark Tagging. Also, Clifford Group Equivariant Neural Networks [2] could be another strong baseline. While I understand that the focus is rather on the comparison to (E)MLP, I still think the authors should at least mention other baselines (and hopefully include them in the experiments), since all the scenarios in the experiments are in the scientific domain and those models have been shown to be very effective at solving tasks in (some of) those experiments.

Overall, I feel the paper needs a profound revision of the writing, so that it is more self-contained and readers can follow the main idea much more comfortably.

Typo

  • Line 814, Adan optimizer

[1] Brandstetter et al., "Geometric and Physical Quantities improve E(3) Equivariant Message Passing," ICLR 2022.

[2] Ruhe et al., "Clifford Group Equivariant Neural Networks," NeurIPS 2023.

Questions

See above.

Details of Ethics Concerns

NA

Comment

Thank you for your valuable feedback! We will take it into consideration in the revision. By the way, 'Adan optimizer' is not a typo. Reference: Xie, Xingyu, et al. "Adan: Adaptive nesterov momentum algorithm for faster optimizing deep models." IEEE Transactions on Pattern Analysis and Machine Intelligence (2024).

Review
Rating: 3

This paper extends the general framework of Finzi et al. for constructing matrix-group-equivariant MLPs to KANs.

Strengths

  • This paper is an important contribution, both theoretically and technically. The framework of Finzi et al. is relatively simple, i.e., it doesn't cover steerable or regular group convolutions, yet it is applicable across a wide range of groups. An extension to KANs makes a lot of sense in many ways.

Weaknesses

  • The paper completely neglects any literature from 2021 onwards. This is close to scientific misconduct.
  • Comparisons to steerable methods are missing. This is of (theoretical) interest since EMLP / EKAN operate by linear combination of scaled subspaces, whereas steerable methods in principle are more expressive - at the cost of being computationally slow.
  • The top quark tagging problem has been addressed in many different papers in the last 2 years with extremely strong results. All of this is completely ignored in the paper. It is thus really hard to judge the results.

The reviewer doesn't consider it their duty to list all the literature, both for general comparison and for the experiments on top tagging.

Questions

  • I would strongly advise the authors to put some effort into updating the paper with recent works and embedding it in the literature. Furthermore, the experiments need some more meat. It is scientifically not OK to report one large experiment (top tagging) while leaving out all previous efforts.
  • In the current state this paper is thus not ready for conference publication, but with mentioned changes this is a valuable contribution.

Comment

Thank you for recognizing the contributions of our work. However, we completely disagree with the weaknesses you have pointed out, for the following reasons:

  1. You may have a serious misunderstanding of "scientific misconduct." Using this term to describe our work is extremely irresponsible. First, in the experimental section, we provide a detailed comparison of EKAN and KANs. Aren't KANs themselves recent literature from 2024? Similarly, the first paragraph of the Introduction discusses related work on KANs across different fields. Moreover, both EMLP and KANs were themselves only compared experimentally with MLPs from a few years earlier. According to your reasoning, are all of these influential works considered scientific misconduct?

  2. To our knowledge, the concept of "steerable" is typically used for vector fields or function inputs. This is not the same track as ours, since our method focuses on vector inputs. If by "steerable" you are referring to the concept in relation to group convolutions, as defined in steerable CNNs, then in fact EMLP is a steerable method, because the group representations in the intermediate layers can be manually specified. The same logic applies to the work on top quark tagging. Compared to task-specific works like LorentzNet and EGNN, our method aims to provide a general framework that covers all scientific problems. Top quark tagging is just one of many scientific problems on which we chose to compare against other general models such as MLPs, EMLP, and KANs. Naturally, works that focus exclusively on this problem are not our primary comparison baselines.

Withdrawal Notice

We have decided to forgo the rebuttal because the ratings are not ideal.