PaperHub
Average rating: 5.3 / 10 · Poster · 4 reviewers
Ratings: 6, 5, 5, 5 (min 5, max 6, std 0.4) · Average confidence: 3.5
ICLR 2024

StructComp: Substituting propagation with Structural Compression in Training Graph Contrastive Learning

OpenReview · PDF
Submitted: 2023-09-22 · Updated: 2024-04-17

Keywords: Graph Contrastive Learning, Scalable Training, Structural Compression

Reviews and Discussion

Official Review (Rating: 6)

This paper studies improving the efficiency of graph contrastive learning. The authors propose a structural compression framework, StructComp, that adopts a low-rank approximation of the diffusion matrix to obtain compressed node embeddings. They show that the original GCL loss can be approximated with the contrastive loss computed by StructComp, with the additional benefit of robustness. Experiments on seven benchmark datasets show that StructComp greatly reduces time and memory consumption while improving model performance compared to vanilla GCL models and scalable training methods.

Strengths

(+) The proposed structural compression idea is new and interesting;

(+) The presentation and organization are clear and easy to follow;

Weaknesses

(-) The applicability of StructComp seems to be limited;

(-) Several claims have not been verified;

(-) Some related works have not been compared or discussed;

Questions

  1. The applicability of StructComp seems to be limited:
  • Theoretically, StructComp has to rely on the approximation of the diffusion matrix for a specific graph and GNN model, as demonstrated in Theorem 4.1. How well can StructComp approximate more complicated graphs such as heterophilous graphs, and more complicated yet commonly used GNNs such as GraphSAGE, GIN, GAT, or even more interesting variants such as PNA?

  • Empirically, what is the exact setting of StructComp for node classification? Can StructComp be applied to both transductive and inductive node classification?

  • How well can StructComp approximate different augmentations in GCL?

  2. Several claims have not been verified:
  • The paper claims that StructComp can work for large-scale graphs, while the benchmarked datasets are rather small or medium scale. Although it's claimed that ogbn-products and arxiv are large datasets, they are in fact medium-scale datasets according to OGB (https://ogb.stanford.edu/docs/nodeprop/). To support the claim, it's expected to evaluate StructComp on large datasets such as papers100M, Reddit, or the OGB-LSC datasets.

  • The paper also claims that StructComp has better robustness and stability with the additional regularization, but no supporting evidence is provided.

  3. Some related works have not been compared or discussed:
  • Some GCL works have not been discussed in the paper, for example, [1,2,3].

  • Why not compare efficient GCL baselines such as CCA-SSG and GGD, which are discussed in the paper?

  4. How are the time and memory costs computed? Do they include the preprocessing steps?

References

[1] Calibrating and Improving Graph Contrastive Learning, TMLR’23.

[2] Single-Pass Contrastive Learning Can Work for Both Homophilic and Heterophilic Graph, TMLR’23.

[3] Scaling Up, Scaling Deep: Blockwise Graph Contrastive Learning, arXiv’23.

[4] Structure-free Graph Condensation: From Large-scale Graphs to Condensed Graph-free Data, arXiv’23.

Comment

Q7: Why not compare efficient GCL baselines such as CCA-SSG and GGD, which are discussed in the paper?

We have compared the performance of CCA-SSG with CCA-SSG trained using StructComp in the experimental section of the original submission. The specific results can be found in Tables 1 and 2 of Section 6. Clearly, the performance of CCA-SSG trained with StructComp far surpasses that of the original CCA-SSG. Moreover, we have added a comparison between StructComp and GGD in the new submission; the specific results are shown in the table below. The performance and resource consumption of various GCL models trained with StructComp are superior to those of GGD.

Table 6: The results of StructComp-trained GCLs and GGD.

| Method | Cora Acc | Cora Time (s) | Cora Mem (MB) | CiteSeer Acc | CiteSeer Time (s) | CiteSeer Mem (MB) | PubMed Acc | PubMed Time (s) | PubMed Mem (MB) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GGD | 79.9±1.7 | 0.013 | 118 | 71.3±0.7 | 0.018 | 281 | 74.0±2.4 | 0.015 | 311 |
| SCE_StructComp | 81.6±0.9 | 0.002 | 23 | 71.5±1.0 | 0.002 | 59 | 77.2±2.9 | 0.003 | 54 |
| COLES_StructComp | 81.8±0.8 | 0.002 | 24 | 71.6±0.9 | 0.003 | 60 | 75.3±3.1 | 0.003 | 61 |
| GRACE_StructComp | 79.7±0.9 | 0.009 | 37 | 70.5±1.0 | 0.009 | 72 | 77.2±1.4 | 0.009 | 194 |
| CCA-SSG_StructComp | 82.3±0.8 | 0.006 | 38 | 71.6±0.9 | 0.005 | 71 | 78.3±2.5 | 0.006 | 85 |

Q8: How are the time and memory costs computed? Do they include the preprocessing steps?

We compute the time and memory cost in the same way as previous work on scalable GNNs: the time is the training time per epoch and the memory is the peak GPU memory cost during training. The preprocessing time of our StructComp is strictly less than that of common scalable methods (e.g., ClusterGCN, GraphSAINT and GraphAutoScale), since it only performs METIS once (like ClusterGCN) and does not need to do random sampling in each epoch. METIS is highly scalable and even papers100M (with 100M nodes) can be processed within an hour on a commercial CPU. Thus, for big graphs, the preprocessing time is dominated by the training time, and more importantly, it only needs to be done once and the partition result can be shared by all experiments.
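For concreteness, per-epoch training time and peak GPU memory are typically measured as follows in PyTorch. This is a minimal sketch of standard instrumentation, not the authors' actual script; `model`, `loss_fn`, and `x_c` are placeholder names.

```python
import time
import torch

def timed_epoch(model, optimizer, loss_fn, x_c, device="cuda"):
    # x_c: compressed node features; StructComp does one full-batch update per epoch
    torch.cuda.reset_peak_memory_stats(device)
    torch.cuda.synchronize(device)
    start = time.time()

    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model, x_c)
    loss.backward()
    optimizer.step()

    torch.cuda.synchronize(device)
    epoch_time = time.time() - start                               # training time per epoch (s)
    peak_mem_mb = torch.cuda.max_memory_allocated(device) / 2**20  # peak GPU memory (MB)
    return loss.item(), epoch_time, peak_mem_mb
```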


The discussion of all the aforementioned issues will be added into the revised version of our paper. We appreciate your insightful feedback once again.

[1]H. Zhu, and P. Koniusz. Simple spectral graph convolution. ICLR 2020.

[2]H. Zhu, and P. Koniusz. Generalized Laplacian Eigenmaps. Neurips 2022.

[3]Fang, T., Xiao, Z., Wang, C., Xu, J., Yang, X., and Yang, Y. Dropmessage: Unifying random dropping for graph neural networks. AAAI 2023.

Comment

Q3: How well can StructComp approximate different augmentations in GCL?

For multi-view GCLs, we designed a new data augmentation method, Dropmember, for our StructComp. It is important to note that Dropmember is not designed to approximate other graph data augmentation methods; rather, it was introduced because previous graph data augmentation methods cannot be used under the StructComp framework. According to the analysis in DropMessage [3], common graph data augmentations can be unified under a single framework, and we have proved that our Dropmember falls within this framework as well.
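For illustration, one plausible reading of Dropmember (this is our sketch of the idea, not necessarily the authors' exact formulation) is to randomly drop cluster members before mean-pooling node features, so that each pass produces a slightly different compressed view:

```python
import torch

def dropmember_view(x, membership, num_clusters, drop_prob=0.2):
    # x: (n, d) node features; membership: (n,) long tensor of cluster ids (e.g., from METIS)
    n, d = x.shape
    keep = (torch.rand(n, device=x.device) > drop_prob).float().unsqueeze(1)   # random member mask
    sums = torch.zeros(num_clusters, d, device=x.device).index_add_(0, membership, x * keep)
    counts = torch.zeros(num_clusters, device=x.device).index_add_(0, membership, keep.squeeze(1))
    return sums / counts.clamp(min=1.0).unsqueeze(1)   # mean over the kept members of each cluster

# Two stochastic compressed views for a multi-view contrastive objective:
# view1 = dropmember_view(x, membership, k)
# view2 = dropmember_view(x, membership, k)
```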

Q4: The paper claims that StructComp can work for large-scale graphs, while the benchmarked datasets are rather small or medium scale. Although it's claimed that ogbn-products and arxiv are large datasets, they are in fact medium-scale datasets according to OGB. To support the claim, it's expected to evaluate StructComp on large datasets such as papers100M, Reddit, or the OGB-LSC datasets.

We have conducted extra experiments on the ogbn-papers100M dataset according to your suggestion. We use StructComp to train four representative GCL models. Here, we compressed ogbn-papers100M into a feature matrix $X_c \in \mathbb{R}^{5000 \times 128}$ and trained GCL using StructComp. The table below also presents the results of GGD trained with ClusterGCN. Although GGD is specifically designed for training on large graphs, when dealing with datasets of the scale of ogbn-papers100M, it still requires graph sampling to construct subgraphs and has to train GGD on a large number of subgraphs. In contrast, our StructComp only requires training a simple and small-scale MLP, resulting in significantly lower resource consumption compared to GGD+ClusterGCN.

Table 3: The accuracy, training time per epoch and memory usage on the Ogbn-papers100M dataset.

| Method | Acc | Time | Mem |
| --- | --- | --- | --- |
| GGD | 63.5±0.5 | 1.6h | 4.3GB |
| SCE_StructComp | 63.6±0.4 | 0.18s | 0.1GB |
| COLES_StructComp | 63.6±0.4 | 0.16s | 0.3GB |
| GRACE_StructComp | 64.0±0.3 | 0.44s | 0.9GB |
| CCA-SSG_StructComp | 63.5±0.2 | 0.18s | 0.1GB |
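To make "training a simple and small-scale MLP" on the compressed features concrete, a rough sketch is given below. The encoder sizes, optimizer settings, and the simplified InfoNCE loss are illustrative choices on our part, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

class MLPEncoder(torch.nn.Module):
    def __init__(self, d_in, d_hid, d_out):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(d_in, d_hid), torch.nn.ReLU(),
            torch.nn.Linear(d_hid, d_out))

    def forward(self, x):
        return self.net(x)

def info_nce(z1, z2, tau=0.5):
    # Simplified symmetric InfoNCE over compressed nodes (illustrative;
    # GRACE's full objective also uses intra-view negatives).
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau
    labels = torch.arange(z1.size(0), device=z1.device)
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))

# x_c: e.g. the (5000, 128) compressed feature matrix of ogbn-papers100M mentioned above.
# encoder = MLPEncoder(128, 256, 256)
# opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
# loss = info_nce(encoder(view1), encoder(view2)); loss.backward(); opt.step()
```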

Q5: The paper also claims that StructComp has better robustness and stability with the additional regularization, but no supporting evidence is provided.

We have conducted extra experiments to study the robustness of StructComp. We randomly add 10% noisy edges to three datasets and perform the node classification task. On the original datasets, the models trained with StructComp showed performance improvements of 0.36, 0.40, 1.30 and 1.87, respectively, compared to the models trained on the full graphs. With the noisy perturbation, the models trained with StructComp showed performance improvements of 0.80, 1.27, 2.47, and 1.87, respectively, compared to full-graph training. This indicates that GCL models trained with StructComp exhibit better robustness.

Table 4: The results over 50 random splits on the perturbed datasets.

| Method | Cora | CiteSeer | PubMed |
| --- | --- | --- | --- |
| SCE | 78.8±1.2 | 69.7±1.0 | 73.4±2.2 |
| SCE_StructComp | 79.3±0.9 | 69.3±0.9 | 75.7±2.8 |
| COLES | 78.7±1.2 | 68.0±1.0 | 66.5±1.8 |
| COLES_StructComp | 79.0±1.0 | 68.3±0.9 | 69.7±2.6 |
| GRACE | 77.6±1.1 | 64.1±1.4 | 64.5±1.7 |
| GRACE_StructComp | 78.3±0.8 | 69.1±0.9 | 66.2±2.4 |
| CCA-SSG | 75.5±1.3 | 69.1±1.2 | 73.5±2.2 |
| CCA-SSG_StructComp | 78.2±0.7 | 69.2±0.8 | 76.3±2.5 |
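For reference, the 10% noisy-edge perturbation can be generated roughly as below. This is a simplified sketch; the authors' exact sampling procedure (e.g., whether duplicates and self-loops are filtered) is not specified.

```python
import torch

def add_noisy_edges(edge_index, num_nodes, ratio=0.10, seed=0):
    # edge_index: (2, E) COO edge list; appends ratio * E uniformly random edges.
    g = torch.Generator().manual_seed(seed)
    num_new = int(ratio * edge_index.size(1))
    noise = torch.randint(0, num_nodes, (2, num_new), generator=g)
    return torch.cat([edge_index, noise], dim=1)
```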
Comment

Thank you for your detailed review. We would like to address your questions/concerns below:

Q1: Theoretically, StructComp has to rely on the approximation of the diffusion matrix for a specific graph and GNN model, as demonstrated in Theorem 4.1. How well can StructComp approximate more complicated graphs such as heterophilous graphs, and more complicated yet commonly used GNNs such as GraphSAGE, GIN, GAT, or even more interesting variants such as PNA?

In order to verify the quality of StructComp's approximation to the diffusion matrix, we test the performance on a deep GNN architecture called SSGC [1]. We transferred the trained parameters of StructComp to the SSGC encoder for inference. For full-graph training in GCL, both the training and inference stages were performed using the SSGC encoder. Table 1 shows our experimental results, indicating that even with a deeper and more complicated encoder, StructComp still achieves outstanding performance.

Table 1: The results of GCLs with SSGC encoders over 50 random splits.

| Method | Cora | Citeseer | Pubmed |
| --- | --- | --- | --- |
| SCE | 81.8±0.9 | 72.0±0.9 | 78.4±2.8 |
| SCE_StructComp | 82.0±0.8 | 71.7±0.9 | 77.8±2.9 |
| COLES | 81.8±0.9 | 71.3±1.1 | 74.8±3.4 |
| COLES_StructComp | 82.0±0.8 | 71.6±1.0 | 75.6±3.0 |
| GRACE | 80.2±0.8 | 70.7±1.0 | 77.3±2.7 |
| GRACE_StructComp | 81.1±0.8 | 71.0±1.0 | 78.2±1.3 |
| CCA-SSG | 82.1±0.9 | 71.9±0.9 | 78.2±2.8 |
| CCA-SSG_StructComp | 82.6±0.7 | 71.7±0.9 | 79.4±2.6 |

To the best of our knowledge, current mainstream GCL models, whether single-view models such as SCE, COLES, GLEN [2], or multi-view models such as DGI, GRACE, CCA-SSG, GGD, and their variants, all use SGC or GCN encoders. These unsupervised GCL models can achieve performance surpassing supervised GNNs (such as GAT, SAGE) with simple encoders on many datasets. Few GCL models have focused on whether using different encoders can achieve better performance. Moreover, GCL training is more complicated than supervised GNNs, so introducing complex encoders may make the models more difficult to train. Considering this, we did not extensively explore how using other encoders would affect StructComp. We believe this is a good question, and we will investigate it further in future work.

On the other hand, we have not implemented StructComp on heterophilous graphs, as the chosen basic GCL methods (SCE, COLES, GRACE, and CCA-SSG) are not suitable for heterophilous graphs. In our future work, we will explore applying StructComp to more complex graph data structures.

Q2: Empirically, what is the exact setting of StructComp for node classification? Can StructComp be applied to both transductive and inductive node classification?

Our StructComp can also be used to handle inductive node classification tasks. We provide additional experiments on inductive node classification in Table 2. Clearly, the GCL models trained with StructComp also perform exceptionally well on inductive node classification tasks.

Table 2: The results on two inductive datasets. OOM means Out of Memory on GPU.

| Method | Flickr Acc | Flickr Time (s) | Flickr Mem (MB) | Reddit Acc | Reddit Time (s) | Reddit Mem (MB) |
| --- | --- | --- | --- | --- | --- | --- |
| SCE | 50.6 | 0.55 | 8427 | - | - | OOM |
| SCE_StructComp | 51.6 | 0.003 | 43 | 94.4 | 0.017 | 1068 |
| COLES | 50.3 | 0.83 | 9270 | - | - | OOM |
| COLES_StructComp | 50.7 | 0.003 | 48 | 94.2 | 0.024 | 1175 |
| GRACE | - | - | OOM | - | - | OOM |
| GRACE_StructComp | 51.5 | 0.010 | 221 | 94.3 | 0.079 | 8683 |
| CCA-SSG | 51.6 | 0.125 | 1672 | 94.9 | 0.21 | 5157 |
| CCA-SSG_StructComp | 51.8 | 0.007 | 99 | 95.2 | 0.56 | 457 |
Comment

Q6: Some GCL works have not been discussed in the paper, for example, [1,2,3].

It should be noted that the goals of these studies and our work are different. The aim of SP-GCL is to handle homophilic graphs and heterophilic graphs simultaneously. BlockGCL attempts to explore the application of deep GNN encoders in the GCL field. Contrast-Reg is a novel regularization method motivated by an analysis of the expected calibration error. On the other hand, StructComp is a framework designed to scale up the training of GCL models: it aims to efficiently train common GCL models without a performance drop. It is not a new GCL model that aims to achieve SOTA performance compared to existing GCL models. So our work is orthogonal to these three previous works. In fact, StructComp can be used as the training method for SP-GCL, BlockGCL and Contrast-Reg. In future work, we will further investigate how to train these recent graph contrastive learning methods using StructComp.

In terms of scalability, which is the main goal of our work, we have conducted extra experiments to compare SP-GCL, BlockGCL, Contrast-Reg and the StructComp-trained baselines. The results confirm that the aforementioned three GCL models are not designed for reducing training costs.

Table 5: The results of StructComp-trained GCLs and some GCL baselines over 50 random splits. For SP-GCL, we are unable to get the classification accuracy on CiteSeer since it does not take isolated nodes as input.

| Method | Cora Acc | Cora Time (s) | Cora Mem (MB) | CiteSeer Acc | CiteSeer Time (s) | CiteSeer Mem (MB) | PubMed Acc | PubMed Time (s) | PubMed Mem (MB) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BlockGCL | 78.1±2.0 | 0.026 | 180 | 64.5±2.0 | 0.023 | 329 | 74.7±3.1 | 0.037 | 986 |
| SP-GCL | 81.4±1.2 | 0.016 | 247 | - | 0.021 | 319 | 74.8±3.2 | 0.041 | 1420 |
| Contrast-Reg | 79.2±1.3 | 0.048 | 355 | 69.8±1.6 | 0.097 | 602 | 72.4±3.5 | 0.334 | 11655 |
| SCE_StructComp | 81.6±0.9 | 0.002 | 23 | 71.5±1.0 | 0.002 | 59 | 77.2±2.9 | 0.003 | 54 |
| COLES_StructComp | 81.8±0.8 | 0.002 | 24 | 71.6±0.9 | 0.003 | 60 | 75.3±3.1 | 0.003 | 61 |
| GRACE_StructComp | 79.7±0.9 | 0.009 | 37 | 70.5±1.0 | 0.009 | 72 | 77.2±1.4 | 0.009 | 194 |
| CCA-SSG_StructComp | 82.3±0.8 | 0.006 | 38 | 71.6±0.9 | 0.005 | 71 | 78.3±2.5 | 0.006 | 85 |
Comment

Dear Reviewer kF3D,

We thank you again for your insightful and constructive review. We have worked hard and have thoroughly addressed your comments in the rebuttal.

As the discussion period soon comes to an end, we are looking forward to your feedback to our response and revised manuscript. Many thanks again for your time and efforts.

Best regards,

Authors of Submission 5510

Comment

Thank you for the extensive additional experiments and the comprehensive discussion. Most of my concerns are resolved, nevertheless, there remain some questions regarding the response:

  • Regarding Q1: Since the heterophilous graphs have received lots of attention from the community, it'd be better if StructComp could be implemented for both kinds of graphs, with the suitable GCL method such as SP-GCL.
  • Regarding Q8: In practice, the pre-processing time must be considered to find the best trade-off in terms of performance and efficiency, considering different scales of the graphs (the small, medium scale graphs in the original submission; and the large-scale graphs presented in the authors' response). It's important to present the efficiency results including the full pipeline of different methods, i.e., the overall time/mem cost for training the model, and the overhead for inference.
Comment

As the scale of the graph dataset increases, all GCL models need to use graph sampling or graph partitioning techniques to construct subgraphs and then train GCL on these subgraphs. Therefore, pre-processing such as METIS is necessary in any case, and it is not an extra overhead specific to our method. In our StructComp, METIS only needs to be performed once; we can then use the result of the graph partitioning to train various GCL models. Tables 3 and 4 in the original submission already show that StructComp achieves the best trade-off between performance and efficiency on large graphs. To further illustrate this point, we provide the overall time results on ogbn-papers100M.

Table 10. The overall time and memory usage on the Ogbn-papers100M dataset.

| Method | Acc | Time (Preprocess + Training) | Mem |
| --- | --- | --- | --- |
| GGD+Cluster-GCN | 63.5±0.5 | 1.4h + 1.6h × 10 = 17.4h | 4.3GB |
| SCE_StructComp | 63.6±0.4 | 1.4h + 0.18s × 10 ≈ 1.4h | 0.1GB |
| COLES_StructComp | 63.6±0.4 | 1.4h + 0.16s × 10 ≈ 1.4h | 0.3GB |
| GRACE_StructComp | 64.0±0.3 | 1.4h + 0.44s × 10 ≈ 1.4h | 0.9GB |
| CCA-SSG_StructComp | 63.5±0.2 | 1.4h + 0.18s × 10 ≈ 1.4h | 0.1GB |

Regarding the overhead for inference: firstly, the overhead for inference is the same whether or not StructComp is used; it is solely related to the specific encoder used. Secondly, our work focuses only on the training process, as it is the most resource-intensive part of the entire pipeline. In contrast, inference is much simpler than training, and there are already straightforward and effective methods available for accelerating inference with GNN encoders [1,2]. For the ogbn-papers100M dataset, we utilized these methods to speed up the inference process. In our experiments, we utilized the same encoder for each dataset. The corresponding inference overheads are displayed in Table 11.

Table 11. The overhead for inference. Due to the large size of ogbn-papers100M, we employed neighbor sampling during its inference. For all other datasets, we performed full-graph inference.

| | Cora | Citeseer | Pubmed | Computers | Photo | Flickr | Reddit | Arxiv | Products | 100M |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Mem | 24MB | 63MB | 122MB | 72MB | 40MB | 362MB | 1.1GB | 434MB | 5.9GB | 5.5GB |
| Time | 0.001s | 0.001s | 0.003s | 0.002s | 0.002s | 0.01s | 0.03s | 0.01s | 0.05s | 159s |

[1] Hu, W., Fey, M., Zitnik, M., Dong, Y., Ren, H., Liu, B., Catasta, M. and Leskovec, J. Open graph benchmark: Datasets for machine learning on graphs. Neurips 2020.

[2] Gasteiger, J., Qian, C. and Günnemann, S. Influence-based mini-batching for graph neural networks. LOG 2022.
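As an illustration of the neighbor-sampling inference mentioned above for ogbn-papers100M, a PyG-style sketch is shown below. The fan-outs, batch size, and encoder signature are placeholder assumptions, not the authors' exact setup.

```python
import torch
from torch_geometric.loader import NeighborLoader

@torch.no_grad()
def sampled_inference(encoder, data, num_neighbors=(15, 10), batch_size=4096):
    # Mini-batch inference with neighbor sampling; smaller graphs can be inferred full-graph.
    encoder.eval()
    loader = NeighborLoader(data, num_neighbors=list(num_neighbors),
                            batch_size=batch_size, shuffle=False)
    outs = []
    for batch in loader:
        z = encoder(batch.x, batch.edge_index)
        outs.append(z[:batch.batch_size].cpu())   # seed nodes come first in each sampled batch
    return torch.cat(outs, dim=0)
```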

Comment

Thank you for the follow-up experiments and discussion. I have increased my score accordingly.

Comment

We are happy to see that we could address your concern. Thank you again for the response and positive comments.

Comment

Thank you for your feedback. We would like to address your questions/concerns below:

Regarding Q1: Since the heterophilous graphs have received lots of attention from the community, it'd be better if StructComp could be implemented for both kinds of graphs, with the suitable GCL method such as SP-GCL.

We provide experiments of training SP-GCL with StructComp to verify the performance of StructComp on heterophilous graphs. The experimental results are shown in Table 7. Overall, the SP-GCL trained by StructComp is superior to full graph training. This is our initial attempt to use StructComp to handle heterophilous graphs, and it is obviously a valuable direction worth further research.

Table 7. The results on heterophilous datasets.

| Method | Chameleon Acc | Chameleon Time (s) | Chameleon Mem (MB) | Squirrel Acc | Squirrel Time (s) | Squirrel Mem (MB) | Actor Acc | Actor Time (s) | Actor Mem (MB) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SP-GCL | 65.28±0.53 | 0.038 | 739 | 52.10±0.67 | 0.080 | 3623 | 28.94±0.69 | 0.041 | 802 |
| SP-GCL_StructComp | 66.65±1.63 | 0.011 | 168 | 53.08±1.39 | 0.009 | 217 | 28.70±1.25 | 0.013 | 159 |

Regarding Q8: In practice, the pre-processing time must be considered to find the best trade-off in terms of performance and efficiency, considering different scales of the graphs (the small, medium scale graphs in the original submission; and the large-scale graphs presented in the authors' response). It's important to present the efficiency results including the full pipeline of different methods, i.e., the overall time/mem cost for training the model, and the overhead for inference.

As we said in our previous response, the pre-processing of our StructComp only needs to perform METIS once. The specific time for METIS on our CPU is shown in Table 8.

Table 8. Preprocessing time.

| | Cora | Citeseer | Pubmed | Computers | Photo | Flickr | Reddit | Arxiv | Products | 100M |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| METIS | 0.075s | 0.059s | 0.60s | 1.4s | 0.58s | 1.7s | 26.9s | 15.7s | 225s | 1.4h |

For small and medium datasets, the time required by METIS is very short. Therefore, in these basic GCL models, the overall time required for full graph training is still more than that of StructComp. The results are shown in Table 9.

Table 9. The overall time (seconds) and memory usage (MB) on small and medium datasets.

| Method | Cora Mem | Cora Time | Citeseer Mem | Citeseer Time | Pubmed Mem | Pubmed Time | Computers Mem | Computers Time | Photo Mem | Photo Time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SCE | 82 | 0.16 | 159 | 0.20 | 1831 | 1.9 | 920 | 1.5 | 329 | 0.60 |
| SCE_StructComp | 23 | 0.17 | 59 | 0.16 | 54 | 0.67 | 29 | 1.4 | 16 | 0.62 |
| COLES | 115 | 0.20 | 204 | 0.32 | 1851 | 1.7 | 1018 | 3.1 | 378 | 1.5 |
| COLES_StructComp | 24 | 0.18 | 60 | 0.21 | 61 | 0.75 | 39 | 1.5 | 21 | 0.64 |
| GRACE | 441 | 1.7 | 714 | 2.5 | 11677 | 18.9 | 5943 | 29.6 | 1996 | 21.2 |
| GRACE_StructComp | 37 | 0.24 | 72 | 0.87 | 194 | 1.3 | 54 | 2.6 | 59 | 2.2 |
| CCA-SSG | 132 | 1.0 | 225 | 1.1 | 825 | 12.3 | 2418 | 10.5 | 1197 | 11.2 |
| CCA-SSG_StructComp | 38 | 0.17 | 71 | 0.31 | 85 | 0.9 | 40 | 1.6 | 41 | 1.1 |
Official Review (Rating: 5)

This paper introduces StructComp, a scalable training framework for Graph Contrastive Learning (GCL). By replacing the message-passing operation in GCL with node-compression, StructComp achieves significant reductions in both time and memory consumption. The authors provide both theoretical analysis and empirical evaluations to underscore the effectiveness and efficiency of StructComp in training GCL models.

Strengths

  1. The storyline is relatively clear; the paper is easy to follow.
  2. The experimental results are amazing, especially the time savings.
  3. The proposed method is quite simple.

Weaknesses

  1. Lack of discussion of graph partitioning: The paper lacks a comprehensive discussion on graph partitioning. Given that the efficacy of the method hinges on graph partitioning (a classic NP-hard problem), a detailed exploration of its impact on the proposed method is warranted. A cursory introduction does not suffice.
  2. Inadequate theoretical proofs: The theoretical justifications provided are somewhat limited. The authors' attempt to establish the equivalence between the compressed loss and the original loss is based solely on the ER model, which may not be representative of real-world datasets.
  3. Lack of discussion of the limitations.

Questions

  1. Add more discussion of graph partitioning: The authors should delve deeper into the topic of graph partitioning, as highlighted in the first weakness.

  2. Consider not over-claiming the work: It's crucial to avoid overstating the contributions. While the authors assert that they have provided theoretical proof, the strong assumptions (like the ER model) limit its applicability. It might be prudent to either temper such claims in the abstract and introduction or offer more exhaustive proofs. In essence, while I acknowledge the novelty and results presented in this paper, I urge the authors to provide a more in-depth rationale behind their impressive outcomes. Without this, the paper leans more toward a technical report than a comprehensive research paper.

Comment

Q2: Inadequate theoretical proofs: The theoretical justifications provided are somewhat limited.

We understand your concern that the ER model may seem a strong assumption and possibly not entirely reflective of real-world scenarios. However, random graphs like the ER graph and the CSBM are widely adopted for GNN analysis (e.g., [3-8]). In our paper, we choose ER over CSBM since there is no significant difference between them in unsupervised GCL. We will specify the data distribution we used for the analysis in the abstract and introduction. We would also like to point out that strong assumptions are common in the analysis of neural networks, e.g., infinite width in NTK [9], shallow layers [10], removing non-linearity [11], and strong data distribution assumptions [3-8].

According to the suggestion of the reviewer, we give an extra analysis on arbitrary graphs. For non-random graphs, the approximation gap between the losses is directly bounded via Eq. 4. Suppose the loss $\mathcal{L}$ is $L$-Lipschitz continuous; then

$$
|\mathcal{L}(P^\dagger P^T X W) - \mathcal{L}(\hat{A}^k X W)| \leq L \underbrace{\Vert P^\dagger P^T - \hat{A}^k \Vert}_{\text{Eq. 4}} \Vert X \Vert \Vert W \Vert .
$$

And for a spectral contrastive loss $\mathcal{L}'$, assuming the graph partition is even, we have:

$$
\begin{aligned}
\mathcal{L}'(P^T X W) &= -\frac{2}{n}\sum_{i=1}^n e^T_{1,i} e_{2,i} + \frac{1}{n^2}\sum_{i=1}^n \sum_{j=1}^n (e^T_{1,i} e_{2,j})^2
= -\frac{2}{n}\sum_{k=1}^{n'}\sum_{i\in S_k} e^T_{1,i} e_{2,i} + \frac{1}{n^2}\sum_{i=1}^n \sum_{l=1}^{n'} \sum_{j\in S_l} (e^T_{1,i} e_{2,j})^2 \\
&= -\frac{2}{n'}\sum_{k=1}^{n'} E^T_{1,k} E_{2,k} + \frac{1}{n n'}\sum_{i=1}^n \sum_{l=1}^{n'} (e^T_{1,i} E_{2,l})^2
= -\frac{2}{n'}\sum_{k=1}^{n'} E^T_{1,k} E_{2,k} + \frac{1}{n n'}\sum_{k=1}^{n'}\sum_{i\in S_k} \sum_{l=1}^{n'} (e^T_{1,i} E_{2,l})^2 \\
&= -\frac{2}{n'}\sum_{k=1}^{n'} E^T_{1,k} E_{2,k} + \frac{1}{n'^2}\sum_{k=1}^{n'}\sum_{l=1}^{n'} (E^T_{1,k} E_{2,l})^2 = \mathcal{L}'(P^\dagger P^T X W),
\end{aligned}
$$

where $e_{1,i}$ denotes the representation of a recovered node and $E_{1,k}$ denotes the representation of a compressed node. The above analysis shows that our approximation is reasonable for fixed graphs.

Empirically, we want to highlight Figure 3 in the original submission, as it validates our approximation on real-world datasets. We trained two encoders $U$ and $W$ with the compressed loss $\mathcal{L}(X_c; U)$ and the original loss $\mathcal{L}(A, X; W)$, respectively, and we plot the trends of the original loss under both encoders (i.e., $\mathcal{L}(A, X; U)$ and $\mathcal{L}(A, X; W)$). The figure clearly shows that the vanilla training method and StructComp yield extremely similar trends, which justifies our approximation.

Q3: Lack of discussion of the limitations.

Thank you for your feedback. We will provide a detailed discussion of the limitations of StructComp in the revised version. To our knowledge, StructComp is the first training framework specifically designed for GCL models, and it currently focuses only on basic and common GCL models. Our future work will involve generalizing StructComp to more complex GCL models and graph data structures.

Comment

The discussion of all the aforementioned issues will be added into the revised version of our paper. We appreciate your insightful feedback once again.

[1]Chiang, W. L., Liu, X., Si, S., Li, Y., Bengio, S., & Hsieh, C. J. Cluster-gcn: An efficient algorithm for training deep and large graph convolutional networks. KDD 2019.

[2]Fey, M., Lenssen, J. E., Weichert, F., & Leskovec, J. Gnnautoscale: Scalable and expressive graph neural networks via historical embeddings. ICML 2021

[3]Wei, R., Yin, H., Jia, J., Benson, A. R., & Li, P. Understanding non-linearity in graph neural networks from the bayesian-inference perspective. Neurips 2022.

[4]Wu, X., Chen, Z., Wang, W. W., & Jadbabaie, A. A Non-Asymptotic Analysis of Oversmoothing in Graph Neural Networks. ICLR 2023.

[5]Su, J., Zou, D., Zhang, Z., & Wu, C. Towards Robust Graph Incremental Learning on Evolving Graphs. ICML 2023.

[6]Keriven, N., Bietti, A., & Vaiter, S. Convergence and stability of graph convolutional networks on large random graphs. Neurips 2020.

[7]Keriven, N., Bietti, A., & Vaiter, S. On the universality of graph neural networks on large random graphs. Neurips 2021.

[8]Keriven, N. Not too little, not too much: a theoretical analysis of graph (over) smoothing. Neurips 2022.

[9]Jacot, A., Gabriel, F., & Hongler, C. Neural tangent kernel: Convergence and generalization in neural networks. Neurips 2018.

[10]Hsu, D., Sanford, C. H., Servedio, R., & Vlatakis-Gkaragkounis, E. V. On the approximation power of two-layer networks of random relus. COLT 2021.

[11]Awasthi, P., Das, A., & Gollapudi, S. A convergence analysis of gradient descent on graph neural networks. Neurips 2021.

Comment

Thank you for your comments! Below are our responses.

Q1: Lack of discussion of graph partitioning.

Our main contribution is a novel scalable framework for training GCLs, and we empirically show that even with off-the-shelf partition algorithms, our framework achieves remarkable speedups. We believe that not requiring a specially designed partition algorithm is actually a big advantage.

We agree that different graph partition algorithms may have an impact on the performance of StructComp. We have therefore conducted extra experiments on graph partitioning. In Table 1, we demonstrate the effects of three algorithms (algebraic JC, variation neighborhoods, and affinity GS) on the performance of StructComp. These three graph coarsening algorithms are widely used in scalable GNNs, and from them we can obtain the specific graph partition matrix P. The experimental results suggest that different graph partition methods have little impact on StructComp on these datasets.

Table 1: The results of different graph partition methods.

| Method | Cora | CiteSeer | PubMed |
| --- | --- | --- | --- |
| VN + SCE_StructComp | 81.3±0.8 | 71.5±1.0 | 77.5±2.7 |
| JC + SCE_StructComp | 81.2±0.9 | 71.5±1.1 | 77.3±2.7 |
| GS + SCE_StructComp | 81.5±0.8 | 71.4±1.0 | 77.4±3.0 |
| METIS + SCE_StructComp | 81.6±0.9 | 71.5±1.0 | 77.2±2.9 |
| VN + COLES_StructComp | 81.4±0.9 | 71.6±0.9 | 75.5±3.0 |
| JC + COLES_StructComp | 81.4±0.9 | 71.5±1.0 | 75.3±3.0 |
| GS + COLES_StructComp | 81.8±0.8 | 71.6±1.0 | 75.5±3.2 |
| METIS + COLES_StructComp | 81.8±0.8 | 71.6±0.9 | 75.3±3.1 |

Graph partitioning is a common technique for scalable supervised GNNs (e.g., ClusterGCN [1], GAS [2]), so we did not introduce its details in our paper. To our knowledge, METIS is the mainstream graph partitioning method for scalable supervised GNNs. Previous work has not deeply investigated the impact of graph partitioning methods on scalable GNNs. We believe that this is a valuable research topic, and we notice that there is an active submission at ICLR 2024 regarding this issue for scalable supervised GNNs.
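For reference, the METIS cluster assignment that such a partition matrix P is built from can be obtained, for example, with pymetis. This is a generic sketch under our own assumptions about how P is constructed, not the authors' preprocessing code.

```python
import numpy as np
import pymetis

def metis_membership(adj_list, num_parts):
    # adj_list: list of neighbor lists (one per node); returns a cluster id per node.
    _, membership = pymetis.part_graph(num_parts, adjacency=adj_list)
    return np.asarray(membership)

# The membership vector can then be turned into the partition matrices used by StructComp,
# e.g. an assignment matrix with entry (i, membership[i]) set to 1 (or 1/|cluster|),
# so that its transpose applied to X mean-pools node features cluster-wise.
```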

Comment

Dear Reviewer iDuQ,

We thank you again for your insightful and constructive review. We have worked hard and have thoroughly addressed your comments in the rebuttal.

As the discussion period soon comes to an end, we are looking forward to your feedback to our response and revised manuscript. Many thanks again for your time and efforts.

Best regards,

Authors of Submission 5510

Comment

Dear Reviewer iDuQ,

The rebuttal phase ends today and we have not yet received feedback from you. We believe that we have addressed all of your previous concerns. We would really appreciate it if you could check our response and the updated paper.

Looking forward to hearing back from you.

Best Regards,

Authors

Official Review (Rating: 5)

This paper aims at resolving the scalability issue of graph contrastive learning training. In graph representation learning, most of the computation overhead comes from message passing, whose complexity grows exponentially with respect to the number of layers in the GNN.

To overcome this scalability issue, the authors propose StructComp, which trains the encoder with compressed nodes. StructComp allows the encoder not to perform any message passing during the training stage.

Strengths

I like the idea of using compressed nodes to replace the need for message passing.

Weaknesses

  • The theoretical results only hold for linear GNNs, which over-simplifies the problem. It is well known that deep neural networks behave differently from linear models in contrastive learning [1]. Without considering non-linearity, the problem in Eq. 4 is simply a matrix decomposition problem (e.g., [2], Section 3).

[1] Understanding the Role of Nonlinearity in Training Dynamics of Contrastive Learning. https://arxiv.org/abs/2206.01342
[2] Understanding Deep Contrastive Learning via Coordinate-wise Optimization. https://arxiv.org/pdf/2201.12680.pdf

  • The experiment datasets are too small (even the arxiv dataset is small)... please try some large-scale graph datasets (e.g., the Yelp and Reddit datasets used previously in the GraphSAINT paper) to validate the effectiveness, especially since this paper focuses on improving scalability.

  • Repeat experiments multiple times instead of just once, for example for Figure 4.

Questions

How could the theoretical results be generalized to non-linear models?

Does the proposed method work for graphs with multiple node/edge types?

Comment

Thank you for your detailed review. We would like to address your questions/concerns below:

Q1: How could the theoretical results be generalized to non-linear models?

Thanks for the insightful question. We explain how to extend the results to non-linear deep models below.

Eq. (4) provides the motivation for structural compression on a linear GNN (which can also be considered as an approximation to one layer of a multi-layer non-linear GNN). The analysis can be extended to non-linear deep GNNs. For instance, given a two-layer non-linear GCN $\sigma(\hat{A}\sigma(\hat{A}XW_1)W_2)$, we first approximate $\hat{A}$ by $P^\dagger P^T$; then the whole GCN can be approximated as

$$
\begin{aligned}
\sigma(P^\dagger P^T\sigma(P^\dagger P^T X W_1)W_2) &= \sigma(P^\dagger P^T P^\dagger\sigma(P^T X W_1)W_2)\\
&= P^\dagger\sigma(\sigma(P^T X W_1)W_2).
\end{aligned}
$$

The first equality holds because $P^\dagger$ is a partition matrix, and the last equality follows from the fact that $P^T P^\dagger = I$. Therefore, our analysis provides theoretical justification for using StructComp as a substitute for non-linear deep GNNs. We will add this extended analysis to the revision. Note that previous studies on scalable methods for training GNNs, such as GraphSAINT [1], FastGCN [2], Adapt [3], and GRADE [4], also rely on various types of approximation to the propagation matrix; however, their analyses only focus on the approximation quality of a single layer or only work for linear GNNs. On the contrary, our analysis of StructComp can be easily extended to non-linear deep GNNs, which is another advantage of our framework.
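The chain of equalities above can be checked numerically. The sketch below fixes one concrete convention, which is our assumption for illustration: $P^\dagger$ is the 0/1 cluster-assignment matrix and $P$ is its column-normalized version, so that $P^T P^\dagger = I$ and $\sigma(P^\dagger Y) = P^\dagger \sigma(Y)$ for an elementwise non-linearity.

```python
import torch

n, n_clusters, d, h = 12, 4, 8, 16
membership = torch.arange(n) % n_clusters            # every cluster gets some members

P_dag = torch.zeros(n, n_clusters)                   # 0/1 assignment matrix (plays the role of P^dagger)
P_dag[torch.arange(n), membership] = 1.0
P = P_dag / P_dag.sum(dim=0, keepdim=True)           # column-normalized, so P^T @ P_dag = I

X = torch.randn(n, d)
W1, W2 = torch.randn(d, h), torch.randn(h, h)
relu = torch.relu

lhs = relu(P_dag @ P.t() @ relu(P_dag @ P.t() @ X @ W1) @ W2)  # 2-layer GCN with A_hat ~ P_dag P^T
rhs = P_dag @ relu(relu(P.t() @ X @ W1) @ W2)                  # MLP on compressed features, then recover
print(torch.allclose(lhs, rhs, atol=1e-4))                     # True
```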

Q2: The experiment datasets are too small (even the arxiv dataset is small)... please try some large-scale graph datasets (e.g., the Yelp and Reddit datasets used previously in the GraphSAINT paper) to validate the effectiveness.

The ogbn-products dataset used in our experiments has 2,449,029 nodes, which is much larger than both Yelp and Reddit, the datasets you suggested. We have also conducted extra experiments on Flickr and Reddit to show the performance of StructComp under the inductive setting. The experimental results are as follows; StructComp shows remarkable scalability on these datasets.

Table 1: The results on two inductive datasets. OOM means Out of Memory on GPU.

| Method | Flickr Acc | Flickr Time (s) | Flickr Mem (MB) | Reddit Acc | Reddit Time (s) | Reddit Mem (MB) |
| --- | --- | --- | --- | --- | --- | --- |
| SCE | 50.6 | 0.55 | 8427 | - | - | OOM |
| SCE_StructComp | 51.6 | 0.003 | 43 | 94.4 | 0.017 | 1068 |
| COLES | 50.3 | 0.83 | 9270 | - | - | OOM |
| COLES_StructComp | 50.7 | 0.003 | 48 | 94.2 | 0.024 | 1175 |
| GRACE | - | - | OOM | - | - | OOM |
| GRACE_StructComp | 51.5 | 0.010 | 221 | 94.3 | 0.079 | 8683 |
| CCA-SSG | 51.6 | 0.125 | 1672 | 94.9 | 0.21 | 5157 |
| CCA-SSG_StructComp | 51.8 | 0.007 | 99 | 95.2 | 0.56 | 457 |

According to the suggestions of other reviewers, experiments on ogbn-papers100M (which has 111,059,956 nodes) were conducted as well. The experimental results are shown in Table 2. Here, we compressed ogbn-papers100M into a feature matrix $X_c \in \mathbb{R}^{5000 \times 128}$ and trained GCL using StructComp. Table 2 also presents the results of GGD trained with ClusterGCN. Although GGD is specifically designed for training on large graphs, when dealing with datasets of the scale of ogbn-papers100M, it still requires graph sampling to construct subgraphs and has to train GGD on a large number of subgraphs. In contrast, our StructComp only requires training a simple and small-scale MLP, resulting in significantly lower resource consumption compared to GGD+ClusterGCN.

Table 2: The accuracy, training time per epoch and memory usage on the Ogbn-papers100M dataset.

| Method | Acc | Time | Mem |
| --- | --- | --- | --- |
| GGD+ClusterGCN | 63.5±0.5 | 1.6h | 4.3GB |
| SCE_StructComp | 63.6±0.4 | 0.18s | 0.1GB |
| COLES_StructComp | 63.6±0.4 | 0.16s | 0.3GB |
| GRACE_StructComp | 64.0±0.3 | 0.44s | 0.9GB |
| CCA-SSG_StructComp | 63.5±0.2 | 0.18s | 0.1GB |
Comment

Q3: Repeat experiments multiple times instead of just once, for example for Figure 4.

We emphasize that all of the experiments in the original submission are results of multiple repetitions. On Cora, Citeseer, PubMed, Computers, and Photo, we repeated the experiments 50 times. On Arxiv and Products, we repeated the experiments 5 times. Figure 4 shows the average accuracy over 50 repetitions.

Q4: Does the proposed method work for graphs with multiple node/edge types?

Currently, we have not implemented StructComp on graphs with multiple node/edge types, as the chosen basic GCL methods (SCE, COLES, GRACE, and CCA-SSG) are not suitable for these types of graph data. In our future work, we will explore applying StructComp to more complex graph data structures.


The discussion of all the aforementioned issues will be added into the revised version of our paper. We appreciate your insightful feedback once again.

[1] H. Zeng, H. Zhou, A. Srivastava, R. Kannan, and V. K. Prasanna. Graphsaint: Graph sampling based inductive learning method. ICLR 2020.

[2] J. Chen, T. Ma, and C. Xiao. Fastgcn: fast learning with graph convolutional networks via importance sampling. ICLR 2018.

[3] W. Huang, T. Zhang, Y. Rong, and J. Huang. Adaptive sampling towards fast graph representation learning. NIPS 2018.

[4] R Wang, X Wang, C Shi, L Song. Uncovering the Structural Fairness in Graph Contrastive Learning. Neurips 2022.

Comment

Dear Reviewer v4Cz,

We thank you again for your insightful and constructive review. We have worked hard and have thoroughly addressed your comments in the rebuttal.

As the discussion period soon comes to an end, we are looking forward to your feedback to our response and revised manuscript. Many thanks again for your time and efforts.

Best regards,

Authors of Submission 5510

Comment

According to the suggestion from reviewer kF3D, we have conducted experiments to train SP-GCL with StructComp, in order to verify the performance of StructComp on heterophilous graphs. The experimental results are shown in Table 3. Overall, the SP-GCL trained by StructComp is superior to full graph training. This is our initial attempt to use StructComp to handle heterophilous graphs, and it is obviously a valuable direction worth further research.

Table 3. The results on heterophilous datasets.

| Method | Chameleon Acc | Chameleon Time (s) | Chameleon Mem (MB) | Squirrel Acc | Squirrel Time (s) | Squirrel Mem (MB) | Actor Acc | Actor Time (s) | Actor Mem (MB) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SP-GCL | 65.28±0.53 | 0.038 | 739 | 52.10±0.67 | 0.080 | 3623 | 28.94±0.69 | 0.041 | 802 |
| SP-GCL_StructComp | 66.65±1.63 | 0.011 | 168 | 53.08±1.39 | 0.009 | 217 | 28.70±1.25 | 0.013 | 159 |
Comment

Dear Reviewer v4Cz,

The rebuttal phase ends today and we have not yet received feedback from you. We believe that we have addressed all of your previous concerns. We would really appreciate it if you could check our response and the updated paper.

Looking forward to hearing back from you.

Best Regards,

Authors

Official Review (Rating: 5)

The paper proposes Structural Compression (StructComp), a new training framework that improves the scalability of graph contrastive learning (GCL) models. The key idea is to substitute propagation with a sparse, low-rank approximation of the diffusion matrix to compress the nodes. Contrastive learning is performed on these compressed nodes, reducing computation and memory costs. Theoretical analysis shows the compressed loss approximates the original loss and StructComp implicitly regularizes the model. Experiments on various single-view and multi-view GCL methods demonstrate StructComp's improvements in performance and efficiency.

Strengths

  1. The paper is well-written and easy to follow. The problem is motivated well, and the method is explained clearly.

  2. Scalability is a major bottleneck hindering wider adoption of graph neural networks. This work makes an important contribution by enabling efficient training of GCL models.

Weaknesses

  1. Additional experiments could help verify the claims on the scalability and robustness of StructComp:
  • Evaluating on larger datasets like papers100M and/or the OGB-LSC datasets would better support the scalability claims, since the experimented datasets are rather small or medium scale.

  • Would be great to verify the model stability/robustness with the proposed regularization, since it is claimed in the presentation.

  2. Would be great to discuss the approximation quality to the diffusion matrix of StructComp for more complicated graphs and other model architectures (like GAT, GraphSAGE).

  3. There is a lack of comparisons with certain related works, such as recent graph contrastive learning methods [1-2].

[1] Wang, H., Zhang, J., Zhu, Q., & Huang, W. (2022). Can Single-Pass Contrastive Learning Work for Both Homophilic and Heterophilic Graph? arXiv preprint arXiv:2211.10890.

[2] Li, J., Sun, W., Wu, R., Zhu, Y., Chen, L., & Zheng, Z. (2023). Scaling Up, Scaling Deep: Blockwise Graph Contrastive Learning. arXiv preprint arXiv:2306.02117.

Questions

See the weaknesses above.

Comment

Q4: There is a lack of comparisons with certain related works, such as recent graph contrastive learning methods [1-2].

It should be noted that the goals of these studies and our work are different. The aim of SP-GCL is to handle homophilic graphs and heterophilic graphs simultaneously. BlockGCL attempts to explore the application of deep GNN encoders in the GCL field. On the other hand, StructComp is a framework designed to scale up the training of GCL models: it aims to efficiently train common GCL models without a performance drop. It is not a new GCL model that aims to achieve SOTA performance compared to existing GCL models. So our work is orthogonal to these two previous works. In fact, StructComp can be used as the training method for both SP-GCL and BlockGCL. In future work, we will further investigate how to train these recent graph contrastive learning methods using StructComp.

In terms of scalability, which is the main goal of our work, we have conducted extra experiments to compare SP-GCL, BlockGCL and the StructComp-trained baselines. The results confirm that the aforementioned two GCL models are not designed for reducing training costs.

Table 4: The results of StructComp-trained GCLs and some GCL baselines over 50 random splits. For SP-GCL, we are unable to get the classification accuracy on CiteSeer since it does not take isolated nodes as input.

| Method | Cora Acc | Cora Time (s) | Cora Mem (MB) | CiteSeer Acc | CiteSeer Time (s) | CiteSeer Mem (MB) | PubMed Acc | PubMed Time (s) | PubMed Mem (MB) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BlockGCL | 78.1±2.0 | 0.026 | 180 | 64.5±2.0 | 0.023 | 329 | 74.7±3.1 | 0.037 | 986 |
| SP-GCL | 81.4±1.2 | 0.016 | 247 | - | 0.021 | 319 | 74.8±3.2 | 0.041 | 1420 |
| SCE_StructComp | 81.6±0.9 | 0.002 | 23 | 71.5±1.0 | 0.002 | 59 | 77.2±2.9 | 0.003 | 54 |
| COLES_StructComp | 81.8±0.8 | 0.002 | 24 | 71.6±0.9 | 0.003 | 60 | 75.3±3.1 | 0.003 | 61 |
| GRACE_StructComp | 79.7±0.9 | 0.009 | 37 | 70.5±1.0 | 0.009 | 72 | 77.2±1.4 | 0.009 | 194 |
| CCA-SSG_StructComp | 82.3±0.8 | 0.006 | 38 | 71.6±0.9 | 0.005 | 71 | 78.3±2.5 | 0.006 | 85 |

The discussion of all the aforementioned issues will be added into the revised version of our paper. We appreciate your insightful feedback once again.

[1]H. Zhu, and P. Koniusz. Simple spectral graph convolution. ICLR 2020.

[2]H. Zhu, and P. Koniusz. Generalized Laplacian Eigenmaps. Neurips 2022.

Comment

Thank you for your detailed review. We would like to address your questions/concerns below:

Q1: Evaluating on larger datasets like papers100M and/or the OGB-LSC datasets would better support the scalability claims, since the experimented datasets are rather small or medium scale.

We have conducted extra experiments on the ogbn-papers100M dataset according to your suggestion. We use StructComp to train four representative GCL models. Here, we compressed ogbn-papers100M into a feature matrix $X_c \in \mathbb{R}^{5000 \times 128}$ and trained GCL using StructComp. The table below also presents the results of GGD trained with ClusterGCN. Although GGD is specifically designed for training on large graphs, when dealing with datasets of the scale of ogbn-papers100M, it still requires graph sampling to construct subgraphs and has to train GGD on a large number of subgraphs. In contrast, our StructComp only requires training a simple and small-scale MLP, resulting in significantly lower resource consumption compared to GGD+ClusterGCN.

Table 1: The accuracy, training time per epoch and memory usage on the Ogbn-papers100M dataset.

| Method | Acc | Time | Mem |
| --- | --- | --- | --- |
| GGD | 63.5±0.5 | 1.6h | 4.3GB |
| SCE_StructComp | 63.6±0.4 | 0.18s | 0.1GB |
| COLES_StructComp | 63.6±0.4 | 0.16s | 0.3GB |
| GRACE_StructComp | 64.0±0.3 | 0.44s | 0.9GB |
| CCA-SSG_StructComp | 63.5±0.2 | 0.18s | 0.1GB |

Q2: Would be great to verify the model stability/robustness with the proposed regularization, since it is claimed in the presentation.

We have conducted extra experiments to study the robustness of StructComp. We randomly add 10% noisy edges to three datasets and perform the node classification task. On the original datasets, the models trained with StructComp showed performance improvements of 0.36, 0.40, 1.30 and 1.87, respectively, compared to the models trained on the full graphs. With the noisy perturbation, the models trained with StructComp showed performance improvements of 0.80, 1.27, 2.47, and 1.87, respectively, compared to full-graph training. This indicates that GCL models trained with StructComp exhibit better robustness.

Table 2: The results over 50 random splits on the perturbed datasets.

| Method | Cora | CiteSeer | PubMed |
| --- | --- | --- | --- |
| SCE | 78.8±1.2 | 69.7±1.0 | 73.4±2.2 |
| SCE_StructComp | 79.3±0.9 | 69.3±0.9 | 75.7±2.8 |
| COLES | 78.7±1.2 | 68.0±1.0 | 66.5±1.8 |
| COLES_StructComp | 79.0±1.0 | 68.3±0.9 | 69.7±2.6 |
| GRACE | 77.6±1.1 | 64.1±1.4 | 64.5±1.7 |
| GRACE_StructComp | 78.3±0.8 | 69.1±0.9 | 66.2±2.4 |
| CCA-SSG | 75.5±1.3 | 69.1±1.2 | 73.5±2.2 |
| CCA-SSG_StructComp | 78.2±0.7 | 69.2±0.8 | 76.3±2.5 |
Comment

Q3: Would be great to discuss the approximation quality to the diffusion matrix of StructComp for more complicated graphs and other model architectures (like GAT, GraphSAGE).

In order to verify the quality of StructComp's approximation to the diffusion matrix, we test the performance on a deep GNN architecture called SSGC [1]. We transferred the trained parameters of StructComp to the SSGC encoder for inference. For full-graph training in GCL, both the training and inference stages were performed using the SSGC encoder. Table 3 shows our experimental results, indicating that even with a deeper and more complicated encoder, StructComp still achieves outstanding performance. To the best of our knowledge, current mainstream GCL models, whether single-view models such as SCE, COLES, GLEN [2], or multi-view models such as DGI, GRACE, CCA-SSG, GGD, and their variants, all use SGC or GCN encoders. These unsupervised GCL models can achieve performance surpassing supervised GNNs (such as GAT, SAGE) with simple encoders on many datasets. Few GCL models have focused on whether using different encoders can achieve better performance. Moreover, GCL training is more complicated than that of supervised GNNs, so introducing complex encoders may make the models more difficult to train. Considering this, we did not extensively explore how using other encoders would affect StructComp. We believe this is a good question, and we will investigate it further in future work.

Table 3: The results of GCLs with SSGC encoders over 50 random splits.

| Method | Cora | Citeseer | Pubmed |
| --- | --- | --- | --- |
| SCE | 81.8±0.9 | 72.0±0.9 | 78.4±2.8 |
| SCE_StructComp | 82.0±0.8 | 71.7±0.9 | 77.8±2.9 |
| COLES | 81.8±0.9 | 71.3±1.1 | 74.8±3.4 |
| COLES_StructComp | 82.0±0.8 | 71.6±1.0 | 75.6±3.0 |
| GRACE | 80.2±0.8 | 70.7±1.0 | 77.3±2.7 |
| GRACE_StructComp | 81.1±0.8 | 71.0±1.0 | 78.2±1.3 |
| CCA-SSG | 82.1±0.9 | 71.9±0.9 | 78.2±2.8 |
| CCA-SSG_StructComp | 82.6±0.7 | 71.7±0.9 | 79.4±2.6 |
Comment

Dear Reviewer PXMh,

We thank you again for your insightful and constructive review. We have worked hard and have thoroughly addressed your comments in the rebuttal.

As the discussion period soon comes to an end, we are looking forward to your feedback to our response and revised manuscript. Many thanks again for your time and efforts.

Best regards,

Authors of Submission 5510

Comment

According to the suggestion from reviewer kF3D, we have conducted experiments to train SP-GCL with StructComp, in order to verify the performance of StructComp on heterophilous graphs. The experimental results are shown in Table 5. Overall, the SP-GCL trained by StructComp is superior to full graph training. This is our initial attempt to use StructComp to handle heterophilous graphs, and it is obviously a valuable direction worth further research.

Table 5. The results on heterophilous datasets.

| Method | Chameleon Acc | Chameleon Time (s) | Chameleon Mem (MB) | Squirrel Acc | Squirrel Time (s) | Squirrel Mem (MB) | Actor Acc | Actor Time (s) | Actor Mem (MB) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SP-GCL | 65.28±0.53 | 0.038 | 739 | 52.10±0.67 | 0.080 | 3623 | 28.94±0.69 | 0.041 | 802 |
| SP-GCL_StructComp | 66.65±1.63 | 0.011 | 168 | 53.08±1.39 | 0.009 | 217 | 28.70±1.25 | 0.013 | 159 |
Comment

Dear Reviewer PXMh,

The rebuttal phase ends today and we have not yet received feedback from you. We believe that we have addressed all of your previous concerns. We would really appreciate it if you could check our response and the updated paper.

Looking forward to hearing back from you.

Best Regards,

Authors

AC Meta-Review

The paper proposes a technique called Structural Compression to improve the compute and memory efficiency of Graph Contrastive Learning (GCL). The main idea is to merge clusters of nodes into super-nodes and perform GCL on the "super-graph" of super-nodes (done using standard min-cut algorithms). A super-node embedding is taken as the average node embedding of its cluster components. The authors provide experimental results showing efficiency and performance gains. The authors argue that the performance gain is due to the regularization effect of the compression. Additionally, they also provide a new augmentation method for multi-view GCL with Structural Compression. During the review process, the authors were able to provide mostly positive additional experiments on robustness, larger graphs, heterophilic graphs, and other baselines.

However, there is no real discussion (even after being asked during the open discussion) of the limitations of the method, and it is not clear why it should work for heterophilic graphs (note that it reduced performance on one such dataset). Nitpicking: $^\dagger$ is not good notation, since it is traditionally used for the pseudo-inverse of a matrix.

Why Not a Higher Score

Questions on limitations and heterophilic graphs.

Why Not a Lower Score

A novel, simple idea and good results.

Final Decision

Accept (poster)