There are some notational inconsistencies in the paper: In Section 3.1, matrix L = I + A (self-loop adjacency matrix), but in Section 3.3 it is redefined as the Laplacian Matrix.
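
To make the clash concrete, here is a minimal sketch on a hypothetical 3-node path graph (not taken from the paper). It assumes the "Laplacian" in Section 3.3 means the combinatorial form D - A. The paper may intend a normalized variant, but the result would still differ from I + A.

```python
# Hypothetical 3-node path graph 0 - 1 - 2 (illustration only, not from the paper).
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
n = len(A)

# Reading 1 (Section 3.1): adjacency matrix with self-loops, I + A.
self_loop_adj = [[A[i][j] + (1 if i == j else 0) for j in range(n)]
                 for i in range(n)]

# Reading 2 (Section 3.3): combinatorial Laplacian D - A (assumed form).
degree = [sum(row) for row in A]
laplacian = [[(degree[i] if i == j else 0) - A[i][j] for j in range(n)]
             for i in range(n)]

print(self_loop_adj)
print(laplacian)
```

The two matrices differ in the sign of every off-diagonal entry and, in general, in the diagonal, so reusing the symbol L for both changes the meaning of any equation that contains it.
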
The method is referred to by multiple names, including "GCGQ" and "GQRL", and this inconsistency extends to the appendix.
The baselines used for comparison are relatively old. Newer methods that use contrastive learning achieve much higher performance on standard datasets such as Cora, Citeseer, and Pubmed. The following papers should be included as baselines: [^2], [^3], [^4], [^5].
All the graph datasets used in the paper have only a few thousand nodes. Were large graph datasets left out because of the method's computational complexity? It would be useful to add results for a few large datasets to the paper.
A few recent papers ([^3], [^4], [^5]) which offer superior performance on Cora and Citeseer have complexities lower than the GCQC method. It is important to address the advantages of GCQC over these existing methods.
The paper discusses an "over-dominating" effect separately from the "over-smoothing" effect. However, the two appear to have the same origin. The paper states: "That is, the embeddings of topology-adjacent but attribute-dissimilar nodes will be similar due to the information aggregation dominated by the graph topology." This is what over-smoothing describes.
There are multiple typos and similar mistakes in the paper:
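
The overlap can be illustrated with a minimal sketch, assuming plain mean aggregation over self-loop neighborhoods with no learned weights (a simplification, not the paper's model). On a hypothetical 3-node path graph, node 0 starts out adjacent to node 1 but dissimilar in its attribute, and repeated aggregation closes the gap:

```python
# Hypothetical 3-node path graph 0 - 1 - 2 with self-loops added.
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}

# One scalar attribute per node: nodes 0 and 1 are adjacent but dissimilar.
x = [1.0, 0.0, 0.0]

# Track the attribute gap between the adjacent pair (0, 1) across layers.
gaps = [abs(x[0] - x[1])]
for _ in range(10):
    # Mean aggregation over each node's neighborhood (no learned weights).
    x = [sum(x[j] for j in neighbors[i]) / len(neighbors[i]) for i in range(3)]
    gaps.append(abs(x[0] - x[1]))

print([round(g, 4) for g in gaps])
```

The shrinking gap is the behaviour the quoted sentence attributes to "over-dominating", and it is produced here by the same repeated aggregation that is usually blamed for over-smoothing.
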
- Typo in figure 1: convolusion -> convolution
- In line 228, the authors write , which is a typo and should be so that the Hamilton product between and can succeed.
- In Eq. 5, the method involves a "quaternion fusion" operator, which the authors clarify to be just an average of the four parts. Why was this operator chosen, and how do other operators such as min/max change the behaviour of the method? This seems analogous to a pooling operation in many ways. Can any type of pooling work here?
- In line 285, the authors write "learned embeddings of the node attributes indicated by ", which is a typo. The correct definition is given in line 291.
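
The fusion question about Eq. 5 could be tested with an ablation along the following lines. This is a minimal sketch: the function name `fuse`, the `op` argument, and the toy values are hypothetical, and only the "mean" branch corresponds to the averaging the authors describe.

```python
from statistics import mean

def fuse(parts, op="mean"):
    """Fuse the four component vectors of a quaternion embedding element-wise.

    `parts` holds four equally sized lists (real, i, j, k components).
    Hypothetical helper for illustration; not the paper's implementation.
    """
    if len(parts) != 4:
        raise ValueError("expected exactly four quaternion components")
    if len({len(p) for p in parts}) != 1:
        raise ValueError("all components must have the same length")
    reducers = {"mean": mean, "max": max, "min": min}
    if op not in reducers:
        raise ValueError(f"unknown fusion operator: {op}")
    reduce_fn = reducers[op]
    # zip(*parts) groups the four values that share an embedding dimension.
    return [reduce_fn(column) for column in zip(*parts)]

# Toy two-dimensional embedding with made-up values.
parts = [[1.0, 4.0], [3.0, 0.0], [5.0, 2.0], [7.0, 2.0]]
fused_mean = fuse(parts, "mean")
fused_max = fuse(parts, "max")
fused_min = fuse(parts, "min")
```

Reporting clustering metrics for each choice of `op` would show whether averaging is essential or whether any pooling-style reduction performs comparably.
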

[^2] N. Mrabah, M. Bouguessa, and R. Ksantini. Escaping Feature Twist: A Variational Graph Auto-Encoder for Node Clustering. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-22), July 2022, pp. 3351–3357.

[^3] Y. Liu, X. Yang, S. Zhou, X. Liu, Z. Wang, K. Liang, W. Tu, L. Li, J. Duan, and C. Chen. Hard Sample Aware Network for Contrastive Deep Graph Clustering. In Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence (AAAI'23/IAAI'23/EAAI'23), 2023, vol. 37, Article 1002, pp. 8914–8922. https://doi.org/10.1609/aaai.v37i7.26071

[^4] F. Devvrit, A. Sinha, I. Dhillon, and P. Jain. S3GC: Scalable Self-Supervised Graph Clustering. In Advances in Neural Information Processing Systems, 2022, vol. 35, pp. 3248–3261.

[^5] N. Mrabah, M. Bouguessa, M. F. Touati, and R. Ksantini. Rethinking Graph Autoencoder Models for Attributed Graph Clustering. IEEE Transactions on Knowledge and Data Engineering, 2022, pp. 1–15. doi: 10.1109/TKDE.2022.3220948