Continuous Simplicial Neural Networks
Abstract
Reviews and Discussion
The authors develop a continuous-time formulation for training neural networks on simplicial complexes using discrete PDE theory. They show theoretically that the method prevents over-smoothing and is stable under perturbations of the simplicial structure. The method is compared against a wide range of baselines and improves performance across multiple benchmarks.
Strengths and Weaknesses
Strengths:
- Making use of the simplicial structure with learning-based methods is an important problem, and I am glad to see research in this direction.
- The method outperforms the previous state of the art, SCCNN, while also reducing runtime and GPU memory.
Weaknesses:
- The connection between the theorems and their practical implications is not always clearly explained (see questions).
Questions
- Theorem 4.3: the proof does not seem to take into account that B_k is an integer matrix. This would mean ||E_k|| is always at least 1, which breaks the “eps_k is small” assumption in eq 19.
- Theorem 4.3: it is not clear to me why being robust to perturbations of B_k is a useful property. Could you provide an example of a practical scenario where such perturbations might appear?
- Section 6.3: The analysis of over-smoothing is limited to randomly initialized networks. It is therefore not clear how this phenomenon impacts the performance of trained networks. In principle, the network could learn to compensate for the initial over-smoothing during the training process. To complete the analysis, it would be interesting to repeat at least one of the experiments in section 6.1 for various settings of t, to see whether the “over-smoothed” variants actually have degraded performance.
- What value of t was used for the experiments?
- Is eq 8 a definition, or does it follow directly from eq 7? Please provide a proof/justification.
Limitations
yes
Final Justification
The authors have addressed all my comments and provided additional experiments confirming the theoretical findings.
Formatting Concerns
no formatting concerns
We appreciate the reviewer for providing constructive comments on our paper. We'll include the feedback in the camera-ready paper. In the following, we have addressed the reviewer's concerns:
Q1: Firstly, our framework and its theoretical claims are also valid for, and straightforwardly transferable to, weighted structures, not just binarized ones. Therefore, the incidence matrices can be non-integer in general. Also, the proof of Theorem 4.3 does not require the error matrices to have small norm: Theorem 4.3 holds in general, and a small norm is only needed in the proof of Corollary 4.4. We will clarify these points in the camera-ready.
We'd also like to clarify the meaning of having a small norm for $E_k$. From the proof of Corollary 4.4, we observe that "being small" is relative to the scale of the maximum eigenvalue of the Hodge Laplacians. Therefore, it is not necessarily small in an absolute sense.
Finally, regarding the nature of the error matrices $E_k$, we clarify that removal, addition, and weight changes of edges can all be modeled by the additive model $\hat{B}_k = B_k + E_k$.
To simplify the discussion, we start with a simple toy example. Consider an undirected three-node graph with node set $\{v_1, v_2, v_3\}$ and an edge set connecting them, with node-to-edge incidence matrix $B_1$. One special form of an incidence error matrix $E_1$ corresponds to changes in the current edge weights, so that after structural perturbation we have $\hat{B}_1 = B_1 + E_1$. Given the scale of the weight changes relative to the entries of $B_1$, the norm $\|E_1\|$ can be small or not. As a conclusion, since our definition of simplicial perturbation is not limited to the addition or removal of edges (or higher-order connections), the norm of the error matrices can take different scales. On the other hand, for weighted graphs and in connection with the previous example, assume that an edge has a small weight $w$ (instead of 1). Then, to model only the removal of this edge, the corresponding entries of $E_1$ need magnitude at most $w$ to cancel the corresponding column of $B_1$, leading to a small norm for $E_1$. Therefore, the scale of the norm of $E_k$ heavily depends on the scale of the edge weights.
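To make the additive perturbation model concrete, here is a minimal NumPy sketch (our own illustration, not code from the paper); the three-node graph, its edge weights, and the removed edge are hypothetical choices:

```python
import numpy as np

# Hypothetical weighted three-node graph with edges e1=(v1, v2) and e2=(v2, v3);
# rows of B1 are nodes, columns are edges, and entries carry the edge weights.
w = 0.05                                  # small weight on edge e2
B1 = np.array([[ 1.0, 0.0],
               [-1.0,  w ],
               [ 0.0, -w ]])

# Removing edge e2 is modeled additively: E1 cancels the e2 column of B1.
E1 = np.zeros_like(B1)
E1[:, 1] = -B1[:, 1]
B1_hat = B1 + E1                          # perturbed incidence matrix

# The perturbation norm scales with the weight of the removed edge.
print(np.linalg.norm(E1, 2))              # approx w * sqrt(2): small for small w
print(np.linalg.norm(B1, 2))              # dominated by the unit-weight edge
```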
Q2: In connection with the response to the previous question, the perturbation of $B_k$ can model potentially erroneous structures, i.e., the addition of non-existing simplicial connections, the removal of existing simplices, or changes in simplicial connection weights, in a principled manner. For further clarification, one can extend the toy graph from the response to Q1 so that a disconnected node pair corresponds to a zero-weight entry in the incidence matrix. Then, as an example, removing one existing edge and adding a previously absent edge can be modeled by an error matrix $E_1$ that cancels the entries of the removed edge and fills in the entries of the added edge. Again, if we consider the case of changing edge weights (rather than removal or addition) on a weighted graph, the norm of the error matrix can be very small. Therefore, the stability of COSIMO against perturbations shows its robustness to such cases.
Q3: We'll clarify in the answer to Q4 that $t$ is indeed a parameter of the model, not a hyperparameter. Previous studies typically analyze the over-smoothing phenomenon by stacking several graph or simplicial layers. Therefore, we provide comprehensive results on the node classification dataset, averaged over three random seeds, in the following tables regarding the tradeoff between over-smoothing and accuracy. We observe that the proposed COSIMO maintains its performance, in terms of both accuracy and Dirichlet energy, even as the number of layers increases up to 16. We also observe that COSIMO is more robust against over-smoothing than SCCNN, for which extreme smoothing sets in at larger depths and leads to severely reduced performance. The key observation and trend here are also fairly consistent with the results in Figure 3.
Acc:
| Method | L=1 | L=2 | L=4 | L=8 | L=16 | L=32 |
|---|---|---|---|---|---|---|
| SCCNN | 57 | 81 | 43 | 16 | 12 | 10 |
| COSIMO | 79.33 | 86.33 | 91.33 | 91.33 | 88.00 | 53.67 |
Dirichlet Energy:
| Method | L=1 | L=2 | L=4 | L=8 | L=16 | L=32 |
|---|---|---|---|---|---|---|
| SCCNN | 0.88 | 0.80 | 0.75 | 0.53 | 0.42 | 0.07 |
| COSIMO | 0.94 | 0.92 | 0.92 | 0.92 | 0.88 | 0.19 |
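For reference on how such numbers can be computed, here is a minimal sketch of a normalized Dirichlet energy in NumPy (our own illustration; the paper's exact normalization may differ):

```python
import numpy as np

def dirichlet_energy(X, L):
    """Normalized Dirichlet energy tr(X^T L X) / ||X||_F^2.

    Low values indicate over-smoothed (nearly constant) features.
    """
    return float(np.trace(X.T @ L @ X) / (np.linalg.norm(X, "fro") ** 2))

# Toy usage on a 3-node path graph with random features.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
L = np.diag(A.sum(1)) - A                 # combinatorial graph Laplacian
X = np.random.default_rng(0).standard_normal((3, 4))
print(dirichlet_energy(X, L))
```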
Q4: As stated in Remark 5.6, and due to the differentiability of our framework w.r.t. the simplicial receptive fields, in all the experimental results (except the over-smoothing analysis in Section 6.2), the values of $t$ are learned during the training process, i.e., they are parameters, not hyperparameters. We considered $t$ as a hyperparameter only in the over-smoothing analysis in Section 6.2, to illustrate the effect of different values of $t$ on the over-smoothing phenomenon.
Q5: Eq. (8) is a definition. In fact, it integrates joint and independent dynamics that act as source terms; this has been shown to improve robustness to over-smoothing and stability, as outlined in [1]. The analytical solution of the set of PDEs in eqs. (5), (6), and (7) takes the form of eq. (9), which can be stated in the general form of eq. (8). Note that the operator in eq. (8) is the general symbolic form of the corresponding exponential term in eq. (9). We will clarify these points in the camera-ready version.
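As a complement, the heat-kernel filters $e^{-tL}$ underlying such solutions can be applied via a (possibly truncated) eigendecomposition of a Hodge Laplacian; a minimal sketch, assuming a symmetric PSD $L$ (function and variable names are ours, not the paper's):

```python
import numpy as np

def hodge_heat_filter(L, X, t, K=None):
    """Apply the heat kernel exp(-t L) to signals X via a (truncated) EVD of L."""
    eigvals, eigvecs = np.linalg.eigh(L)          # L assumed symmetric PSD
    if K is not None:                             # keep the K smallest eigenpairs
        eigvals, eigvecs = eigvals[:K], eigvecs[:, :K]
    return eigvecs @ (np.exp(-t * eigvals)[:, None] * (eigvecs.T @ X))

# Toy usage: diffusion of a unit impulse on a 3-node path graph.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
L0 = np.diag(A.sum(1)) - A
X = np.eye(3)[:, :1]
print(hodge_heat_filter(L0, X, t=1.0))
```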
References:
[1] A. Han, et al, “From continuous dynamics to graph neural networks: Neural diffusion and beyond,” TMLR 2025
I thank the authors for their detailed response, especially for the additional experiments for Q3, which confirm that over-smoothing is indeed also reduced in trained models.
My concerns are mostly addressed. Considering Q1: I would be interested to know how $\|E_k\|$ compares to $\lambda_{\max}$ in practice. For example, would it be possible to compare the theoretical bound from Thm 4.3 with the experimental results in Figure 4?
We again thank the reviewer for providing constructive comments. We have addressed the remaining question as follows:
As requested, following the SNR setting of Figure 4, we provide the difference between the right- and left-hand bounds in Theorem 4.3, i.e., eq. (13), in the following table. We observe that, as both SNR levels increase, the gap becomes tighter, showcasing the applicability of Theorem 4.3 for describing the stability characteristics of the proposed framework.
The practical gap between the left- and right-hand sides of eq. (13) in Theorem 4.3:
| | SNR=-5 | SNR=0 | SNR=10 | SNR=20 |
|---|---|---|---|---|
| SNR=-5 | 12.63 | 9.55 | 8.74 | 8.62 |
| SNR=0 | 5.72 | 2.71 | 1.88 | 1.79 |
| SNR=10 | 4.11 | 1.13 | 0.31 | 0.18 |
| SNR=20 | 3.86 | 0.91 | 0.11 | 0.01 |
This interpretation is also confirmed by the results in the following table, which reports the ratio of $\lambda_{\max}(L_k)$ to $\|E_k\|$ across different SNRs. From this table, we observe that the scale of $\|E_k\|$ becomes fairly negligible compared to $\lambda_{\max}(L_k)$ as the SNR increases (from SNR = 0 dB onward). Therefore, as a conclusion, apart from the experimental description of stability by the general Theorem 4.3, Corollary 4.4 can be applied in roughly moderate- or low-noise settings, i.e., SNRs of about 0 dB and above, validating its practical use cases.
The ratio of $\lambda_{\max}(L_k)$ to $\|E_k\|$ across different SNRs:
| | SNR=-5 | SNR=0 | SNR=10 | SNR=20 |
|---|---|---|---|---|
| $\lambda_{\max}(L_1)/\lVert E_1\rVert$ | 2.00 | 3.56 | 11.18 | 35.38 |
| $\lambda_{\max}(L_2)/\lVert E_2\rVert$ | 1.81 | 3.21 | 10.18 | 32.28 |
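For transparency about how such ratios can be computed, here is a minimal sketch (our own reconstruction of a plausible procedure, not the authors' exact protocol): it scales a Gaussian error matrix to a target SNR relative to an incidence matrix and compares the largest Hodge-Laplacian eigenvalue with the perturbation's spectral norm.

```python
import numpy as np

def spectrum_to_perturbation_ratio(B, snr_db, seed=0):
    """Return lambda_max(B B^T) / ||E||_2 for Gaussian E scaled to a target SNR."""
    rng = np.random.default_rng(seed)
    E = rng.standard_normal(B.shape)
    # Scale E so that 20*log10(||B||_F / ||E||_F) = snr_db.
    E *= np.linalg.norm(B, "fro") / (np.linalg.norm(E, "fro") * 10 ** (snr_db / 20))
    lam_max = np.linalg.eigvalsh(B @ B.T).max()
    return lam_max / np.linalg.norm(E, 2)

# Toy usage with a random 20-node / 40-edge incidence-like matrix.
B = np.random.default_rng(1).standard_normal((20, 40))
print([round(spectrum_to_perturbation_ratio(B, s), 2) for s in (-5, 0, 10, 20)])
```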
I thank the authors for the additional results, confirming the validity of Theorem 4.3. With this and the previous response in mind, I will raise my score to accept.
This paper presents a continuous simplicial neural network (SNN) architecture, COSIMO, derived from partial differential equations (PDEs) on simplicial complexes. Rather than relying on fixed, discrete polynomial filters of the Hodge Laplacians, COSIMO models feature propagation as coupled heat‐diffusion processes over the lower and upper Laplacians, yielding dynamic receptive fields that adapt continuously to the underlying topology. Each COSIMO layer applies exponential Hodge filters—computed via eigen‐decompositions—followed by learnable linear projections and nonlinearities. The authors prove that COSIMO is stable under simplicial perturbations and provides formal control over over-smoothing, and they demonstrate its effectiveness on trajectory prediction, partial mesh regression, and node/graph classification, where it outperforms state-of-the-art discrete SNNs and GNNs.
Strengths and Weaknesses
Strengths
- Casts simplicial complexes into a PDE framework, allowing continuous information flow over structured data.
- Brings theoretical analysis and proofs that ensure the stability of the proposed method, with an explicit bound on the over-smoothing rate.
- Different types of tasks demonstrate the effectiveness of this method.
Weakness
The eigenvalue decomposition may hinder the practical applicability of this method.
Questions
How do you choose the number of eigenpairs K for each Hodge Laplacian in practice? Have you evaluated accuracy versus cost trade‐offs as K varies?
Limitations
Yes
Final Justification
Since my only concern, that eigen-decomposition may dominate the computational cost, has been addressed, I will raise my score. Meanwhile, as I am not an expert in this field, my confidence will remain low.
Formatting Concerns
N/A
We appreciate the reviewer for providing constructive comments on our paper. We'll include the feedback in the camera-ready paper. In the following, we have addressed the reviewer's concerns:
W1 and Q1: First, we emphasize that the EVD of the Hodge Laplacians is performed only once, as a preprocessing step, so it can be precomputed. Additionally, as presented below, one can rely on a small subset of eigenvalue-eigenvector pairs, reducing the computational complexity of the EVD from $\mathcal{O}(n^3)$ to $\mathcal{O}(Kn^2)$. In our framework, one can select a proper $K$ using supervised (cross-validation) or unsupervised methodologies; we used the cross-validation approach in the paper. Regarding the unsupervised methodologies, one can exploit the following alternative strategies (a code sketch follows this list):
- Eigengap heuristic [1]: The number $K$ is often found using the eigengap heuristic $K = \arg\max_k \,(\lambda_{k+1} - \lambda_k)$. A large gap indicates a natural cutoff point.
- Energy-based criterion [2]: The Laplacian EVD can be viewed like PCA, such that one chooses enough eigenvectors to explain a certain percentage of the spectral energy, defined as $E = \sum_i \lambda_i$. One can thus choose the smallest $K$ such that $\sum_{i=1}^{K} \lambda_i / \sum_i \lambda_i \geq \theta$, where $\theta$ is a threshold like 90%.
- Spectral gap ratio or knee detection [3]: Instead of just picking the largest gap, use a normalized eigengap ratio $r_k = (\lambda_{k+1} - \lambda_k)/\lambda_k$. This method accounts for the scaling of the eigenvalues and can find relative jumps.
- Spectral entropy / information-theoretic criteria [4]: The spectral entropy can be defined as $H = -\sum_i p_i \log p_i$ with $p_i = \lambda_i / \sum_j \lambda_j$. Therefore, one can choose $K$ where the entropy contribution of the remaining eigenvalues becomes negligible.
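A minimal NumPy sketch of these four selection rules, assuming a vector of Hodge-Laplacian eigenvalues; the thresholds and the ordering convention are our own illustrative choices, not the authors' exact procedure:

```python
import numpy as np

def select_k(eigvals, method="energy", theta=0.9, eps=1e-12):
    """Pick the number K of eigenpairs to keep, given eigenvalues sorted in
    order of decreasing importance for the downstream filter (an assumption
    on our part; the ordering convention is task-dependent)."""
    lam = np.asarray(eigvals, dtype=float)
    if method == "eigengap":              # largest consecutive gap [1]
        return int(np.argmax(np.abs(np.diff(lam))) + 1)
    if method == "energy":                # smallest K covering theta of the energy [2]
        frac = np.cumsum(lam) / (lam.sum() + eps)
        return int(np.searchsorted(frac, theta) + 1)
    if method == "gap_ratio":             # normalized (relative) eigengap [3]
        ratios = np.abs(np.diff(lam)) / np.maximum(np.abs(lam[:-1]), eps)
        return int(np.argmax(ratios) + 1)
    if method == "entropy":               # keep non-negligible entropy contributions [4]
        p = lam / (lam.sum() + eps)
        contrib = -p * np.log(np.maximum(p, eps))
        return int(np.sum(contrib > 1e-3))
    raise ValueError(f"unknown method: {method}")
```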
Apart from introducing and listing the relevant methods, we have also experimentally applied these strategies on a subset of the node classification dataset in the following table, accompanied by the accuracy performance. Note that the supervised methodology picked $K = 300$ (about 5%) of the eigenvalues, which is consistent with the unsupervised methods ($K \approx 289$-$322$).
Table: Optimal numbers $K$ of eigenvalue-eigenvector pairs (out of 5800) selected by the different methods.
| | Eigengap | Energy-based | Knee detection | Spectral entropy | Cross-Validation |
|---|---|---|---|---|---|
| K | 322 | 312 | 289 | 312 | 300 |
| Acc | 91.7 | 92.0 | 91.6 | 92.0 | 91.9 |
Regarding the effect of varying $K$ on performance, we provide comprehensive experimental results on a subset of the node classification dataset in the following table. From these results, we can conclude that choosing the best $K$ heavily relies on the specific data distribution, but generally, only a small fraction of the eigenvalues is enough to reach acceptable performance. We also observe that increasing $K$ does not necessarily lead to better performance, as high-frequency components might contain more noisy or non-relevant content [5].
Table: Performance (accuracy) across an increasing number of selected eigenvalue-eigenvector pairs $K$ (out of 300).
| Acc | 88.0±5.2 | 90.6±2.1 | 93.4±3.4 | 90.2±3.2 | 90.4±1.6 |
References:
[1] J. Shi, et al, “Normalized cuts and image segmentation,” IEEE TPAMI, 2000
[2] M. Belkin, et al, “Laplacian eigenmaps for dimensionality reduction and data representation,” Neural computation, 2003
[3] I. M. Johnstone, “On the distribution of the largest eigenvalue in principal components analysis,” The Annals of statistics, 2001
[4] M. De Domenico, et al, “Spectral entropies as information-theoretic tools for complex network comparison,” Physical Review X, 2016
[5] X. Dong, et al, “Learning Laplacian matrix in smooth graph signal representations,” IEEE TSP, 2016
We sincerely thank the reviewer again for providing insightful comments about our paper. We will be grateful if the reviewer could inform us about any remaining questions.
Best regards,
Authors
I thank the authors for providing additional comments and experiments to address my concerns about the applicability of eigen-decomposition within the proposed pipeline. I will raise my score to accept.
The work proposes continuous simplicial neural networks (COSIMO). The authors use simplicial complexes to capture high-order geometrical relationships for a more sophisticated understanding of geometrical objects. In addition, they utilize the theory of differential equations to obtain a continuous formulation of the method that controls the oversmoothing effect, which is a central problem in GNNs. They provide a mathematical evaluation of robustness against geometrical noise and oversmoothing. The experiments validate the mathematical theory and demonstrate that the proposed method has competitive performance against SotA methods.
Strengths and Weaknesses
Strengths:
- The method is clearly explained and easy to follow.
- The authors provided the mathematical guarantees for stability and oversmoothing, which is essential to the community.
Weaknesses
- The benefit of using higher-order relationships is not clear. In addition, the ideas of continuous evolution are not new; for instance, the community knows Neural ODE [Chen+ 2018] and GRAND [Chamberlain+ 2021]. Therefore, the authors should compare against at least GRAND or its variants to show the effectiveness of higher-order relationships quantitatively.
- The tradeoff between oversmoothing and accuracy is not evaluated. It turns out that reducing the t (time) parameter in COSIMO suppresses oversmoothing, but it limits the range of interactions because it corresponds to short-time diffusion. Therefore, the reviewer recommends adding an evaluation of that tradeoff. Otherwise, if we do not care about accuracy, it would be sufficient to stack only one GNN layer to suppress oversmoothing.
- Stability analysis (Section 6.3) needs comparison with other methods, because the readers cannot judge the robustness of the proposed method in the current form.
Questions
What would be potential applications where higher-order relationships are essential?
Limitations
yes
Final Justification
The authors addressed all the concerns raised by the reviewer. Therefore, I would recommend acceptance.
Formatting Concerns
N/A
We appreciate the reviewer for providing constructive comments on our paper. We'll include the feedback in the camera-ready paper. In the following, we have addressed the reviewer's concerns:
W1: To complement the experimental analysis, we compare COSIMO against GRAND and Graph-coupled oscillator networks (GraphCON) [1] in the following table. We observe the superiority of COSIMO over GNNs, which only rely on the node space. This is a general observation in the comparison of SNNs vs. GNNs when higher-order information is available.
| Method | | |
|---|---|---|
| GCN | 0.40±0.04 | |
| GAT | 0.34±0.05 | 0.50±0.04 |
| GraphSage | 0.27±0.05 | 0.54±0.03 |
| GIN | 0.18±0.04 | 0.53±0.04 |
| GRAND | 0.16±0.07 | 0.52±0.10 |
| GraphCON | 0.39±0.08 | 0.58±0.04 |
| SCCNN | 0.62±0.05 | |
| SAN | 0.53±0.09 | |
| COSIMO | ± | ± |
W2: We provide comprehensive results on the node classification dataset, averaged over three random seeds, in the following tables regarding the tradeoff between over-smoothing and accuracy. We observe that the proposed COSIMO maintains its performance, in terms of both accuracy and Dirichlet energy, even as the number of layers increases up to 16. We also observe that COSIMO is more robust against over-smoothing than SCCNN, for which extreme smoothing sets in at larger depths and leads to severely reduced performance. The key observation and trend here are also fairly consistent with the results in Figure 3.
Acc:
| Method | L=1 | L=2 | L=4 | L=8 | L=16 | L=32 |
|---|---|---|---|---|---|---|
| SCCNN | 57 | 81 | 43 | 16 | 12 | 10 |
| COSIMO | 79.33 | 86.33 | 91.33 | 91.33 | 88.00 | 53.67 |
Dirichlet Energy:
| Method | L=1 | L=2 | L=4 | L=8 | L=16 | L=32 |
|---|---|---|---|---|---|---|
| SCCNN | 0.88 | 0.80 | 0.75 | 0.53 | 0.42 | 0.07 |
| COSIMO | 0.94 | 0.92 | 0.92 | 0.92 | 0.88 | 0.19 |
W3: We have repeated the experiment in Section 6.3, including comparisons with the baseline methods SCCNN and SAN, and provide the results in the following tables in terms of prediction error. We observe that the baselines are vulnerable to moderate to high amounts of noise (SNRs of -5 or 0 dB). In comparison, COSIMO shows superior and more robust performance than the baselines, even in the low-noise regime (SNRs of 10 or 20 dB).
SNR=-5:
| Method | SNR=-5 | SNR=0 | SNR=10 | SNR=20 |
|---|---|---|---|---|
| SCCNN | 625.15 | 612.814 | 639.24 | 482.98 |
| SAN | 78281.68 | 32423.05 | 8984.80 | 4246.06 |
| COSIMO | 1.30 | 1.29 | 1.10 | 1.14 |
SNR=0:
| Method | SNR=-5 | SNR=0 | SNR=10 | SNR=20 |
|---|---|---|---|---|
| SCCNN | 191.45 | 57.84 | 46.44 | 49.54 |
| SAN | 526.45 | 1.65 | 1.00 | 1.09 |
| COSIMO | 1.34 | 1.16 | 1.03 | 0.92 |
SNR=10:
| Method | SNR=-5 | SNR=0 | SNR=10 | SNR=20 |
|---|---|---|---|---|
| SCCNN | 243.41 | 28.06 | 8.07 | 10.16 |
| SAN | 128.13 | 0.81 | 0.92 | 0.84 |
| COSIMO | 1.14 | 1.13 | 0.76 | 0.69 |
Q1: Based on the provided results regarding over-smoothing, stability, and performance, we argue that relying on simplicial complexes is meaningful when higher-order information is available. This is because SNNs like COSIMO can benefit from the higher-order dynamics. One practical example is the high-school contact dataset used in our experiments, where friendship groups of multiple people (maximal cliques) are modeled as simplices. The dataset description on its official website reads as follows:
"This is a temporal higher-order network dataset, which here means a sequence of timestamped simplices where each simplex is a set of nodes. The dataset is constructed from interactions recorded by wearable sensors by people at a high school. The sensors record interactions at a resolution of 20 seconds (recording all interactions from the previous 20 seconds). Nodes are the people, and simplices are maximal cliques of interacting individuals from an interval."
It is clear from our experimental results that relying on the typical graph structure (the node space only) cannot leverage the information dynamics lying on the simplices of this dataset, so the performance of classical GNNs is not competitive against SNNs.
To explore more diverse real-world applications from computer graphics to drug discovery and biotechnology, we kindly refer the reviewer to this recent ICML position paper [2].
References:
[1] T. K. Rusch, et al, “Graph-coupled oscillator networks,” ICML 2022
[2] Papamarkou, et al. "Position: Topological deep learning is the new frontier for relational learning". ICML 2024.
Thank you for the detailed response and additional experiments. Regarding W2, if I understand correctly, the results are obtained by changing the number of layers with constant $t$, but how about the relationship between $t$ and accuracy? I am curious about this because the authors state that $t$ is the key parameter for controlling over-smoothing.
We again thank the reviewer for providing constructive comments. We have addressed the remaining question as follows:
Firstly, we should emphasize that, as stated in Remark 5.6 in our paper, and due to the differentiability of our framework w.r.t. the simplicial receptive fields, in all the experimental results (except the over-smoothing analysis in Section 6.2), the values of $t$ are learned during the training process, i.e., they are parameters, not hyperparameters. We considered $t$ as a hyperparameter only in the over-smoothing analysis in Section 6.2, to illustrate the effect of different values of $t$ on the over-smoothing phenomenon.
However, for further clarification and to fully address the reviewer's comment, we provide the node classification accuracy (Acc) and normalized Dirichlet energy (DE) across varying simplicial receptive fields ($t$) in the following table, i.e., we treat $t$ as a hyperparameter. We observe that Acc (and DE) improves as $t$ increases up to an optimum, and then degrades, most likely due to over-smoothing (especially judging by DE generally getting smaller beyond this point). This shows that, were $t$ a hyperparameter, there would be an optimal interval for tuning it, rather than simply choosing a small $t$. The table also provides the result in the case where COSIMO learns $t$ during the training process (last column); the learned value is close to the best setting of $t$ when treated as a hyperparameter. This shows that COSIMO does not need to perform computationally demanding greedy hyperparameter optimization for $t$.
The node classification accuracy (Acc) and normalized Dirichlet energy (DE) across increasing settings $t_1 < \dots < t_5$ of the simplicial receptive field $t$, plus the learned-$t$ variant:
| | $t_1$ | $t_2$ | $t_3$ | $t_4$ | $t_5$ | learned $t$ |
|---|---|---|---|---|---|---|
| Acc | 87.9 | 89.4 | 89.4 | 71.2 | 69.7 | 90.4 |
| DE | 0.842 | 0.849 | 0.849 | 0.824 | 0.831 | 0.850 |
Thank you for the response. Now everything is clear.
Accordingly, I will increase my score to accept. Good work!
The paper introduces COSIMO, a novel simplicial neural network rooted in Topological Deep Learning (TDL), designed to process data on simplicial complexes—mathematical structures that extend beyond graphs to model complex, higher-order relationships like nodes, edges, triangles, and beyond. TDL, as highlighted in the paper, leverages these topological structures to capture multi-way interactions, making it ideal for applications such as trajectory prediction and mesh processing. COSIMO advances TDL by shifting from the discrete, layer-based processing of traditional simplicial neural networks to a continuous framework, using partial differential equations based on Hodge Laplacians to model signal evolution across simplicial dimensions over time. This continuous approach enables COSIMO to dynamically handle interactions between lower-dimensional (e.g., edges) and higher-dimensional (e.g., triangles) structures, a core strength of TDL’s topological perspective.
COSIMO’s TDL framework incorporates independent diffusion processes for lower and upper simplicial structures and a coupled process to integrate them, enhancing expressivity for complex relational data. The paper provides rigorous theoretical analysis, proving COSIMO’s stability under simplicial perturbations and its ability to control over-smoothing—a challenge in TDL where deep models lose discriminative power—through adjustable receptive fields. Experimentally, COSIMO is evaluated on TDL-relevant tasks like ocean drift prediction, partial shape regression on the Shrec-16 dataset, and node/graph classification on social networks, showing superior performance compared to existing TDL methods, with balanced runtime and memory usage. The authors also explore parameter impacts, such as the number of branches, and confirm permutation equivariance, ensuring alignment with TDL’s emphasis on structural symmetries.
Strengths and Weaknesses
Quality
This paper’s got some solid stuff going for it. The theoretical backbone is impressive—they’ve worked out proofs (like Proposition 4.1 and Theorem 5.3 in the appendices) that show COSIMO is stable under simplicial perturbations and can handle over-smoothing, leaning on math like Hodge Laplacians. It’s the kind of rigor that makes you feel they’ve thought this through. Experimentally, they test COSIMO on tasks like ocean drift prediction and shape regression on the Shrec-16 dataset, and Figure 5 shows it performs well against other TDL models while keeping runtime and memory in check. I like that they’ve shared their code and datasets, which is a big plus for reproducibility. Section 4.3’s breakdown of computational complexity, using eigenvalue decomposition for Hodge filters, shows they’re thinking about practical implementation, too.
On the other side, the experimental details could use more meat. They talk about setups in Section 6 and appendices, but stuff like exact hyperparameters or how they chose the optimizer is a bit fuzzy, which might make it tricky to replicate. The datasets are decent for TDL, but it’s mostly Shrec-16 and ocean drift—adding something like a biological network could show COSIMO’s got broader chops. Also, the eigenvalue decomposition they rely on can get heavy for big simplicial complexes. They mention this, but I wish they’d tossed in a few more ideas for making it less resource-hungry.
Clarity
The paper’s laid out nicely, with a clear path from the intro to preliminaries (Section 3), model details, and experiments. For someone familiar with TDL or graph neural networks, it’s pretty approachable—they explain simplicial complexes and Hodge Laplacians well enough, and Figure 1’s visual of a simplicial complex helps a lot. The theoretical bits, like how they tackle over-smoothing (Figures 3 and 6), are straightforward, with equations like the PDE-based signal evolution in Figure 2 giving a good sense of the architecture. Figures like Figure 7, showing how tweaking the branch parameter affects results, make the practical side easy to grasp.
That said, the math-heavy parts can feel a bit thick if you’re not already familiar with TDL or algebraic topology. Terms like “Hodge Laplacians” or “permutation equivariance” get some context, but a simple example or two could make them less daunting for a wider audience, maybe not in the main text. Also, a lot of the good content, like proofs and dataset details, is stashed in the appendices, which is standard but leaves the main text a tad thin on quick takeaways. I had to switch back and forth more than I’d prefer. And while they show how parameters like the receptive field or branch count impact performance, they don’t give much guidance on picking those for new TDL problems, which could leave practitioners scratching their heads.
Significance
COSIMO feels like a meaningful step for TDL. Its continuous, PDE-based approach shakes up the usual discrete SNNs, making it easier to model dynamic interactions in simplicial complexes, which is super relevant as TDL gains traction. The applications they mention in Appendix O—like ocean navigation, social network analysis, and protein classification—hint at how TDL can tackle real-world challenges, which is exciting. Their focus on over-smoothing (Section 5) remains a standing problem in TDL and graph models, and they back it up with theory and results.
But I wonder if some might see it as a solid increment. It builds on existing ideas like graph neural diffusion and simplicial filters, so it’s not reinventing the wheel. The limitations section, covering things like computational costs or over-squashing, is there but feels a bit tacked on in the appendices. A deeper look at when COSIMO might not work well, like on sparse complexes, would make its impact clearer. Also, its focus on simplicial complexes might make it a niche player for the broader machine learning crowd, which could limit its splash outside TDL circles.
Originality
The continuous framework is where COSIMO stands out. Using PDEs to model signal evolution on simplicial complexes is a fresh angle for TDL, moving past the fixed filters of traditional SNNs. Their use of Hodge Laplacians in a neural network setup is pretty creative, pulling from topological signal processing [19] but giving it a deep learning twist. The permutation equivariance proof in Appendix J is a nice touch, too, ensuring the model plays nicely with the symmetries of simplicial complexes, which isn’t always covered in TDL papers.
That said, it’s not entirely out of left field. COSIMO leans on concepts from TDL and graph neural networks, like diffusion models and over-smoothing work, so it’s more of a smart remix than a totally new idea. The tasks they test—trajectory prediction, shape regression—are standard TDL fare, so there’s no big surprise in the applications. A novel TDL use case could’ve pushed the originality further. Also, the overlap with graph neural diffusion might make some people wonder how much is truly new versus adapting those ideas to simplicial complexes.
Questions
- Provide Detailed Experimental Specifications. Question/Suggestion: Could the authors provide a comprehensive description of the experimental setup, including specific hyperparameter values (e.g., learning rate, optimizer type, batch size), data splits, and training protocols, either in the main text or a dedicated appendix section? The current details in Section 6 and appendices (e.g., Appendix K) are incomplete, lacking specifics on how hyperparameters were chosen or tuned, which hinders reproducibility. I think, since this is one of the first TDL+PDE papers, this will be very important for reproducibility.
Guidance for Action: Include a table or subsection listing all hyperparameters for each dataset (e.g., Shrec-16, ocean drift), specifying values for learning rate, optimizer (e.g., Adam, SGD), batch size, number of epochs, and any regularization techniques. Clarify the data split ratios (e.g., train/validation/test) and describe the hyperparameter tuning process (e.g., grid search, random search). If space is limited, a supplemental table or a pointer to the open-access code with these details would suffice.
- Clarify Distinction from Graph Neural Diffusion Models Question/Suggestion: The paper draws parallels with graph neural diffusion models [16, 37], but the unique contributions of COSIMO’s continuous PDE-based framework on simplicial complexes are not fully delineated. Could the authors explicitly clarify how COSIMO differs from or improves upon graph neural diffusion approaches, particularly in terms of architecture and theoretical advantages for TDL?
Guidance for Action: Add a subsection in Section 4 or the introduction comparing COSIMO’s PDE-based architecture (e.g., use of Hodge Laplacians for lower, upper, and coupled dynamics) to graph neural diffusion models (e.g., GRAND [37]). Highlight specific architectural differences (e.g., handling higher-order simplices vs. graph nodes/edges) and theoretical benefits (e.g., stability or expressivity for TDL tasks). A table contrasting key features (e.g., input structure, dynamics, computational complexity) or a paragraph discussing how COSIMO extends diffusion to simplicial complexes would be effective.
- Expand Discussion of Limitations and Failure Cases Question/Suggestion: The discussion of limitations (e.g., computational complexity, over-squashing in Appendix O) is brief and mostly relegated to appendices. Could the authors expand this in the main text, detailing specific scenarios where COSIMO might underperform (e.g., sparse or noisy simplicial complexes) and proposing potential mitigations?
Guidance for Action: In the conclusion or a dedicated subsection, discuss at least two specific failure cases (e.g., performance on sparse simplicial complexes with few higher-order simplices, or sensitivity to noisy input signals). For each, explain why COSIMO might struggle (e.g., due to reliance on dense Hodge Laplacian computations) and suggest mitigations (e.g., sparse matrix techniques, noise-robust regularization). Reference empirical or theoretical evidence, such as sensitivity analyses (e.g., Figure 7) or related TDL work [54], to ground the discussion. Aim for 1–2 paragraphs in the main text, with additional details in appendices if needed.
- Provide Practical Guidance for Parameter Selection Question/Suggestion: The paper shows how parameters like the receptive field (tau) and branch count (M) impact performance (e.g., Figure 7), but offers little guidance on selecting these for new TDL tasks. Could the authors provide practical recommendations for choosing these parameters, supported by empirical or theoretical insights?
Guidance for Action: Add a paragraph or subsection in Section 5 or the experiments (Section 6) outlining a strategy for selecting tau and M. For example, suggest starting values based on dataset characteristics (e.g., simplicial complex size, density of higher-order simplices) and adjusting based on validation performance. Reference Figure 7’s results (e.g., M=3 as a sweet spot for ocean drift) and theoretical insights (e.g., tau’s role in controlling over-smoothing from Theorem 5.3). If possible, provide a heuristic or algorithm (e.g., cross-validation over a range of tau values) and test it on one dataset to demonstrate efficacy.
Limitations
Yes
Final Justification
All my concerns have been thoroughly addressed by the authors. I increased my score accordingly, as they ran additional experiments and justified a few more claims theoretically.
Formatting Concerns
n/a
We appreciate the reviewer for providing constructive comments on our paper. We'll include the feedback in the camera-ready paper. In the following, we have addressed the reviewer's concerns:
Q1: Firstly, we clarify that we used cross-validation for hyperparameter tuning, and for the experimental results we followed the experimental settings of the reference papers [1,2]. In addition to adding a comprehensive table of details like the following (one column per dataset, ordered as in the paper) to the camera-ready, we will publicly release the code (which has already been uploaded alongside our submission) and refer to it to make these settings as clear as possible.
| Hyperparameter | | | | | | | |
|---|---|---|---|---|---|---|---|
| Optimizer | ADAM | ADAM | ADAM | ADAM | ADAM | ADAM | ADAM |
| Batch size | 100 | 100 | 256 | 512 | 256 | 256 | 256 |
| Epochs | 1000 | 1000 | 100 | 100 | 700 | 100 | 30 |
| $M$ | 3 | 3 | 1 | 1 | 1 | 1 | 1 |
Q2: In order to make the connection between the proposed COSIMO and the reference graph-PDE counterparts [3,4,10], we first highlight the generalization aspect of simplicial complexes over graphs. For instance, let's consider a simple simplicial complex containing nodes, edges, and triangles. By relying only on the graph structure, we neglect the edge and triangle dynamics. Therefore, we only deal with the node space, and equations (6) and (7) in our framework are canceled. Moreover, since only the node space remains, equation (5) in our framework reduces to $\frac{\partial X_0(t)}{\partial t} = -L_0 X_0(t)$, with $L_0$ the graph Laplacian. Then, assuming the usage of the normalized version of this Laplacian, i.e., $L_0 = I - A_n$ with $A_n$ being the normalized adjacency matrix, this equation turns into $\frac{\partial X_0(t)}{\partial t} = (A_n - I) X_0(t)$, which is equivalent to eq. (2) in GRAND [4] and eq. (7) in CGNN [3] in the isotropic linear case. Therefore, COSIMO generalizes the CGNN and GRAND models by considering edge and triangular dynamics. Apart from these theoretical/architectural differences, we have also experimentally compared COSIMO against GRAND [4] and Graph-coupled oscillator networks (GraphCON) [10] in the following table, showcasing the superior performance obtained by relying on higher-order dynamics.
| Method | | |
|---|---|---|
| GCN | 0.40±0.04 | |
| GAT | 0.34±0.05 | 0.50±0.04 |
| GraphSage | 0.27±0.05 | 0.54±0.03 |
| GIN | 0.18±0.04 | 0.53±0.04 |
| GRAND | 0.16±0.07 | 0.52±0.10 |
| GraphCON | 0.39±0.08 | 0.58±0.04 |
| SCCNN | 0.62±0.05 | |
| SAN | 0.53±0.09 | |
| COSIMO | ± | ± |
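To illustrate the reduction discussed above, here is a small self-contained sketch (our own, with a hypothetical 4-node graph): explicit-Euler integration of the node-only dynamics $\frac{\partial X}{\partial t} = (A_n - I)X$ recovers the heat kernel $e^{-tL_n}$, computed exactly here with SciPy's `expm` for comparison.

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical 4-node graph; with no edge/triangle signals, only eq. (5) remains.
A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
d = A.sum(1)
A_norm = A / np.sqrt(np.outer(d, d))      # symmetric normalized adjacency
L_norm = np.eye(4) - A_norm               # normalized graph Laplacian

X0 = np.random.default_rng(0).standard_normal((4, 2))

# Explicit Euler on dX/dt = (A_norm - I) X  (isotropic linear GRAND-style form).
h, X = 0.005, X0.copy()
for _ in range(200):                      # integrate up to t = 1
    X = X + h * (A_norm - np.eye(4)) @ X

print(np.allclose(X, expm(-1.0 * L_norm) @ X0, atol=1e-2))  # matches heat kernel
```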
Q3: Apart from the possible limitations outlined in the conclusion section, we also point out the cases of i) eventually facing over-smoothing in Figure 3, and ii) extremely noisy regimes (for example, the lowest SNRs in Figure 4). To mitigate these limitations, it has been shown that equipping the underlying set of PDEs with source terms [5] can effectively improve stability and robustness to over-smoothing. Accordingly, the corresponding equations of our framework, i.e., the independent and joint dynamics in eqs. (5)-(7), are augmented with additive source terms. Although this modification might bring the above-mentioned advantages, the resulting solutions pose a higher computational burden, as detailed for different frameworks in [5].
Q4: As stated in Remark 5.6, and due to the differentiability of our framework w.r.t. the simplicial receptive fields, in all the experimental results (except the over-smoothing analysis in Section 6.2), the values of $t$ are learned during the training process, i.e., they are parameters, not hyperparameters. We considered $t$ as a hyperparameter only in the over-smoothing analysis in Section 6.2, to illustrate the effect of different values of $t$ on the over-smoothing phenomenon. In general, we use cross-validation over a range of possible values to tune the hyperparameters of COSIMO. Regarding the hyperparameter $M$, we exploited a multi-branch architecture in the trajectory prediction tasks, for which we have already provided the sensitivity-accuracy trade-off in the Appendix (Figure 7), using a cross-validation scheme over the corresponding range for $M$. Furthermore, one can select a proper $K$ using supervised (cross-validation) or unsupervised methodologies. We used the cross-validation approach in the paper. Regarding the unsupervised methodologies, one can exploit the following alternative strategies:
- Eigengap heuristic [6]: The number $K$ is often found using the eigengap heuristic $K = \arg\max_k \,(\lambda_{k+1} - \lambda_k)$. A large gap indicates a natural cutoff point.
- Energy-based criterion [7]: The Laplacian EVD can be viewed like PCA, such that one chooses enough eigenvectors to explain a certain percentage of the spectral energy, defined as $E = \sum_i \lambda_i$. One can thus choose the smallest $K$ such that $\sum_{i=1}^{K} \lambda_i / \sum_i \lambda_i \geq \theta$, where $\theta$ is a threshold like 90%.
- Spectral gap ratio or knee detection [8]: Instead of just picking the largest gap, use a normalized eigengap ratio $r_k = (\lambda_{k+1} - \lambda_k)/\lambda_k$. This method accounts for the scaling of the eigenvalues and can find relative jumps.
- Spectral entropy / information-theoretic criteria [9]: The spectral entropy can be defined as $H = -\sum_i p_i \log p_i$ with $p_i = \lambda_i / \sum_j \lambda_j$. Therefore, one can choose $K$ where the entropy contribution of the remaining eigenvalues becomes negligible.
Apart from introducing and listing the relevant methods, we have also experimentally applied these strategies on a subset of the node classification dataset in the following table, accompanied by the accuracy performance. Note that the supervised methodology picked $K = 300$ (about 5%) of the eigenvalues, which is consistent with the unsupervised methods ($K \approx 289$-$322$).
Table: Optimal numbers $K$ of eigenvalue-eigenvector pairs (out of 5800) selected by the different methods.
| | Eigengap | Energy-based | Knee detection | Spectral entropy | Cross-Validation |
|---|---|---|---|---|---|
| K | 322 | 312 | 289 | 312 | 300 |
| Acc | 91.7 | 92.0 | 91.6 | 92.0 | 91.9 |
In conclusion, and based on results across various datasets, choosing 5-10% of the eigenvalue-eigenvector pairs and a number of branches ($M$) from 1-3 seems to provide a fair compromise of performance on downstream tasks.
References:
[1] M. Roddenberry, et al, “Principled simplicial neural networks for trajectory prediction,” ICML 2021
[2] M. Yang, et al, “Convolutional learning on simplicial complexes,” arXiv 2025
[3] L.-P. Xhonneux, et al, “Continuous graph neural networks,” ICML 2020
[4] B. Chamberlain, et al, “Grand: Graph neural diffusion,” ICML 2021
[5] A. Han, et al, “From continuous dynamics to graph neural networks: Neural diffusion and beyond,” TMLR 2024
[6] J. Shi, et al, “Normalized cuts and image segmentation,” IEEE TPAMI, 2000
[7] M. Belkin, et al, “Laplacian eigenmaps for dimensionality reduction and data representation,” Neural computation, 2003
[8] I. M. Johnstone, “On the distribution of the largest eigenvalue in principal components analysis,” The Annals of statistics, 2001
[9] M. De Domenico, et al, “Spectral entropies as information-theoretic tools for complex network comparison,” Physical Review X, 2016
[10] T. K. Rusch, et al, “Graph-coupled oscillator networks,” ICML 2022
We sincerely thank the reviewer again for providing insightful comments about our paper. We will be grateful if the reviewer could inform us about any remaining questions.
Best regards, Authors
Thanks to the authors for taking the time to address all my comments. I have also read the responses that the authors made to the other reviewers' comments, and I find them satisfactory. I will increase my score accordingly.
We sincerely thank the reviewers for their thorough assessments and constructive feedback, which have helped us clarify and strengthen our work. We greatly appreciate your engagement with our paper and the recognition of its contributions. Your detailed comments have been invaluable during this submission process, and we are grateful for your service to the community. Please let us know if there are any remaining questions or points we can further clarify.
Best regards,
Authors
This paper proposes COSIMO, a novel simplicial neural network to process data on simplicial complexes. The method design seems effective and novel, and the paper demonstrates that their approach works well empirically and provides rigorous theoretical analysis. Following the reviewers' feedback, the authors can refine the paper to be more clear.
With positive scores from all reviewers, I vote for acceptance.