Deep Graph Neural Networks via Posteriori-Sampling-based Node-Adaptive Residual Module
We propose a Posteriori-Sampling-based Node-Adaptive Residual Module to enhance the performance of deep graph neural networks.
Abstract
Reviews and Discussion
The paper proposes to address over-smoothing through the lens of overlapping neighborhoods, focusing on the residual-GCN line of work. It highlights the lack of adaptability of higher-order neighborhood information as the limitation of previous residual methods. To overcome these drawbacks, it introduces PSNR, which learns node-adaptive residual coefficients through posterior inference using a graph encoder.
Strengths
- The method of using node-level coefficients for residual connections from the neighborhood is novel.
- The paper is written clearly and is easy to follow.
Weaknesses
- The discussion of related works is limited: only the residual methods are discussed. To better position the contribution, it seems necessary to give a brief overview of other types of GNN methods for over-smoothing alleviation. For example, GPR-GNN [2] looks closely related since it also proposes adaptive “GPR-weights” for message aggregation.
- The state-of-the-art baseline methods on the heterophilic datasets, such as GPR-GNN [2] and ACM-GNN [1], are not discussed in the related works or included in the performance comparison.
- Figure 3 shows that for deeper architectures, the performance of the vanilla initial residual (init-res) module is almost identical to PSNR, suggesting that init-res already mitigates over-smoothing to a great extent. How can the more complex PSNR module be justified? Meanwhile, the baseline init-res-GCN is missing from Table 3.
- The reported performance of baselines such as GCN and GCNII does not match the original papers for the Citeseer, Cora, Pubmed, and ogbn-arxiv datasets. The paper reports much worse performance for these methods than the original papers do. I wonder if the authors could clarify this.
- Experiment section: it would be interesting to see an empirical study of the residual coefficients, e.g., how does the coefficient change with (a) node degree and (b) the layer k?
References:
[1] Luan et al. Revisiting Heterophily for Graph Neural Networks. NeurIPS, 2022.
[2] Chien et al. Adaptive Universal Generalized PageRank Graph Neural Network. ICLR, 2021.
Questions
Please refer to the weaknesses.
Limitations
There is a brief discussion on the limitations.
We thank the reviewer for the valuable feedback. We are glad that the reviewer appreciates the idea and technical contributions of our work. Below, we address the reviewer’s concerns one by one.
Q1: The discussion of related works is limited ...
A1: Thank you for the suggestion. At line 31, we briefly categorized the methods for alleviating over-smoothing into three types. Considering that residual methods are central to our approach, we focused primarily on them in our discussion. However, we acknowledge the importance of providing a comprehensive overview of related works, such as GPR-GNN. From our perspective, GPR-GNN proposes adaptive “GPR-weights” for aggregating different neighborhood subgraphs, which is indeed related to our method. The main difference, however, is that PSNR leverages node-adaptive residual coefficients, which strengthen higher-order neighborhood subgraph aggregation and thereby effectively alleviate higher-order information loss. This fundamental distinction differentiates our method from GPR-GNN, and the model performance results in General Response Table 2 support our claim. We will incorporate this broader perspective in the revised manuscript to better position our contribution within the context of existing work.
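To make the contrast concrete, GPR-GNN combines propagation steps with layer-wise scalar weights, whereas a node-adaptive residual scheme assigns each node its own coefficients; the node-adaptive side is described only schematically here and is not the exact PSNR-GCN update:

$$Z_{\text{GPR}} = \sum_{k=0}^{K} \gamma_k H^{(k)}, \qquad H^{(k)} = \hat{A}\, H^{(k-1)},\quad H^{(0)} = f_\theta(X),$$

where each $\gamma_k$ is a single scalar shared by all nodes. In a node-adaptive scheme, the weight given to the $k$-order aggregation of node $v$ instead depends on node-specific coefficients $\alpha_v^{(1)}, \dots, \alpha_v^{(k)}$, so different nodes can retain different amounts of higher-order information.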
Q2: The state-of-the-art baseline methods on the heterophilic datasets ... are not discussed in the related works or included in the performance comparison.
A2: ACM-GNN uses high-pass, low-pass, and identity channels to extract richer localized information for various heterophilic scenarios, while GPR-GNN learns GPR weights that adjust to node label patterns to address feature over-smoothing. Our primary focus is on the over-smoothing problem rather than heterophilic graphs, which is why ACM-GNN was not discussed. GPR-GNN, which also mitigates over-smoothing, has been compared with PSNR-GNN in the previous response. To further substantiate our approach, we conducted comparative experiments with both ACM-GNN and GPR-GNN. The results are summarized in General Response Table 2. As can be observed from the table, PSNR-GNN outperforms both ACM-GNN and GPR-GNN across most datasets.
Q3: Figure 3 shows for deeper architectures, the performance of the vanilla initial residual (init-res) module is almost identical to PSNR...
A3: In the paper, Figure 3 depicts the results for GCNII, as noted in line 278 of the manuscript. GCNII, a state-of-the-art init-res method, incorporates an identity mapping that allows the initial-residual architecture to be deepened effectively, so we selected it as the baseline for initial residual methods. Similarly, Table 3 represents the performance of init-res methods through GCNII. To demonstrate the advantages of our method over initial residuals and GCNII, we conducted additional comparative experiments following the setup of Experiment 1, comparing the vanilla GCN, the init-res structure, GCNII, and our PSNR method. The results, presented in General Response Figure 1, show that the vanilla init-res structure alone cannot effectively deepen GCNs. In contrast, both our method and GCNII significantly enhance the performance of init-res, and our approach outperforms GCNII overall. This justifies the more complex PSNR module.
Q4: The reported performance of the baselines, such as GCN, GCNII, do not match with the original papers for Citeseer, Cora, Pubmed, ogb-arxiv datasets. The paper reported much worse performance of these methods than what's in the original papers. I wonder if the authors could clarify it.
A4: Firstly, our experiments used DGL to implement the models, and all methods were trained, validated, and tested within the same code framework. The detailed code and dataset split files are available at the link provided in the paper. The inconsistency in results is due to our use of ten random splits of each dataset, rather than the single split used by the GCN and GCNII papers on these four datasets. Compared with a single split, the random-split protocol yields more varied training and test distributions, avoiding results that hold only for one particular split and allowing a fairer and more comprehensive comparison of the different models.
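For concreteness, a minimal sketch of the kind of per-run random split we describe is given below; the split ratios, node count, and function name are placeholders, and the exact setup is in the released code.

```python
import numpy as np

def random_split(num_nodes: int, seed: int, train_ratio: float = 0.6, val_ratio: float = 0.2):
    """One random train/val/test split; called once per run with a different seed."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_nodes)
    n_train = int(train_ratio * num_nodes)
    n_val = int(val_ratio * num_nodes)
    return (perm[:n_train],                    # training node indices
            perm[n_train:n_train + n_val],     # validation node indices
            perm[n_train + n_val:])            # test node indices

# ten independent splits, e.g. for a Cora-sized graph
splits = [random_split(num_nodes=2708, seed=s) for s in range(10)]
```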
Q5: Experiment section: It would be interesting to see the empirical study of the residual coefficients. E.g. how does the coefficient change with a) node degree, and b) the layer k?
A5: We conducted an empirical study using an 8-layer PSNR-GCN trained on the Cora dataset, taking the best-performing model. We saved the mean and standard deviation of the learned residual coefficient distribution for each layer. Nodes were evenly divided into four groups based on their degree, and the average mean and standard deviation for each group across different layers are reported in General Response Table 1. From the table, we observe the following:
- The mean residual coefficient increases with the number of layers, indicating that PSNR effectively retains high-order subgraph information.
- The variance increases in some layers, suggesting that the increased randomness helps mitigate information loss in high-order subgraphs.
- In the shallow layers, the mean does not show significant differences across node degrees. However, in deeper layers, nodes with higher degrees tend to have a lower mean, indicating that these nodes retain more initial information due to the higher likelihood of subgraph overlap, which increases their distinguishability.
All these observations align with our expectations, and demonstrate how the residual coefficients adapt with node degree and layer depth.
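As a sketch of how the statistics above can be computed, assume the per-layer coefficient means have been saved as an array `mu` of shape `[num_layers, num_nodes]` and node degrees as `deg`; the names and grouping scheme here are illustrative, not the exact script we used.

```python
import numpy as np

def mean_by_degree_group(mu: np.ndarray, deg: np.ndarray, num_groups: int = 4) -> np.ndarray:
    """Average coefficient mean for each (layer, degree-group) cell."""
    # assign each node to a degree quartile (group 0 = lowest-degree quarter)
    cut_points = np.quantile(deg, np.linspace(0.0, 1.0, num_groups + 1)[1:-1])
    group = np.digitize(deg, cut_points)              # values in {0, ..., num_groups - 1}
    out = np.zeros((mu.shape[0], num_groups))
    for g in range(num_groups):
        out[:, g] = mu[:, group == g].mean(axis=1)    # average over the nodes in group g
    return out
```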
Dear Reviewer yetc,
Thank you for your valuable feedback on our paper, such as adding more comparative baselines and exploring the variation patterns of the residual coefficients. These suggestions have undoubtedly enhanced the quality of our paper. We do not wish to add any pressure to your busy schedule; however, as we are getting closer to the end of the discussion phase, we would really appreciate it if you could let us know whether we have properly addressed your comments and questions in the rebuttal and whether anything can be further clarified.
Many thanks in advance!
Authors
Dear Esteemed Reviewer yetc,
We sincerely appreciate the time and effort you dedicated to reviewing our paper. Your thoughtful questions and insightful feedback have been extremely beneficial. The comparisons with related work you mentioned, the clarification of the initial residual method, and the exploration of the residual coefficients have undoubtedly enhanced the quality of our work.
We understand that you have numerous commitments, and we truly appreciate the time you invest in our work. As the rebuttal phase is nearing its end, should there be any further points that require clarification, we would greatly appreciate it if you could let us know at your earliest convenience. Thank you once again for your invaluable contribution to our research.
Warm regards,
Authors
Dear Esteemed Reviewer yetc,
We sincerely appreciate the time and effort you have dedicated to reviewing our paper. Your suggestions helped us add heterophilic baselines for comparison, clarify the effectiveness of initial residuals in mitigating over-smoothing, and explore the variation patterns of the residual coefficients. Undoubtedly, these improvements have significantly enhanced the quality of our paper.
We have carefully considered your feedback and provided point-by-point responses to address your concerns. We would greatly appreciate your feedback on whether our responses have satisfactorily resolved your concerns.
Once again, we genuinely thank you for your invaluable contribution to our paper. As the deadline is approaching, we eagerly await your post-rebuttal feedback.
Best regards,
The Authors
This manuscript focuses on the issue of over-smoothing in Graph Neural Networks (GNNs), which occurs when increasing the number of layers causes node representations to become indistinguishable. The authors explore this problem from the perspective of overlapping neighborhood subgraphs and propose a novel Posterior-Sampling-based, Node-Adaptive Residual module (PSNR) to mitigate it. The paper demonstrates how PSNR integrates multiple orders of neighborhood subgraphs and achieves distinguishability and adaptability, overcoming the limitations of previous residual methods. Theoretical analyses and extensive experiments confirm the effectiveness of the PSNR module across various settings.
Strengths
-
The paper is well-written, making complex concepts and methodologies easy to understand.
-
The motivations and theoretical foundations of the proposed PSNR module are compellingly presented, enhancing the credibility of the research.
-
The experimental validation is extensive, covering diverse scenarios such as fully observed node classification, large graph datasets, and missing feature cases, demonstrating the robustness and scalability of the PSNR module.
Weaknesses
- What is the difference between the neighborhood subgraphs proposed in the paper and the subgraphs covered in other works?
- The analysis of the cumulative product term in the formula seems to show a discrepancy of a -1 factor compared to the PSNR-GCN formula. Does this affect the conclusion?
- There are three recent representative methods mentioned in Section 5.4, namely DropMessage, Half-Hop, and DeProp. Why are their results not included in the missing-feature setting?
- What are the specific reasons and insights that guided the selection of SAGE for the GraphEncoder?
Questions
see weakness
Limitations
see weakness
We thank the reviewer for the valuable feedback. We are glad that the reviewer appreciates the idea and technical contributions of our work. Below, we address the reviewer’s concerns one by one.
Q1: Relation to other subgraphs: What is the difference between the neighborhood subgraphs proposed in the paper and the subgraphs covered in other works?
A1: The neighborhood subgraphs used in our paper differ significantly from the subgraphs discussed in other works. Our neighborhood subgraphs are the different-order ego networks of a given node, serving primarily as the information domain for node representations. In contrast, other subgraph-based methods, such as those used for graph classification tasks, often sample subgraphs from the graph to enhance representation capability; they address the limited expressive power of GNNs for graph classification by augmenting the representation through subgraph sampling strategies. We have detailed these differences and the context in Appendix F, and we will include a reference to this discussion in the main text of the revised manuscript.
Q2: Details about Proposition 1: The analysis of the cumulative product term in the formula seems to show a discrepancy of a -1 factor compared to the PSNR-GCN formula. Does this affect the conclusion?
A2: We appreciate your attention to this detail. The discrepancy where the cumulative product term differs by a factor of -1 from the PSNR-GCN formula does not affect our conclusions. As discussed in the paper, our analysis focuses on the asymptotic behavior as the number of layers increases. This asymptotic analysis remains valid regardless of the -1 factor and does not impact the overall conclusions regarding the smoothing behavior.
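One way to make this explicit, under the assumption that the residual coefficients lie in $(0,1)$: if the unrolled representation contains a cumulative factor of the form $\prod_{j=1}^{K} \big({\pm}(1-\alpha^{(j)})\big)$, then its magnitude is $\prod_{j=1}^{K} \lvert 1-\alpha^{(j)} \rvert$, which is independent of the sign of each factor. Hence whatever limit this magnitude approaches as $K \to \infty$ (in particular, decay to zero) is unchanged by a $-1$ factor, which is why the asymptotic conclusion is unaffected.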
Q3: Details on SSNC-MV: There are three recent representative methods mentioned in section 5.4, namely DropMessage, Half-Hop, and DeProp. Why are their results not included in the missing feature setting?
A3: Thank you for pointing this out. We have addressed this by including the results of the recent methods DropMessage, Half-Hop, and DeProp in the missing feature setting for the SSNC-MV experiment. The updated results are provided in the table below:
Table Re1: Performance of recent methods under the SSNC-MV setting.
| Method | Cora (GCN) | Citeseer (GCN) | Pubmed (GCN) | Cora (GAT) | Citeseer (GAT) | Pubmed (GAT) |
|---|---|---|---|---|---|---|
| DeProp | 71.4 / 6 | 59.4 / 2 | 76.1 / 4 | 68.04 / 2 | 48.3 / 2 | 75.88 / 4 |
| DropMessage | 75.5 / 10 | 61.0 / 6 | 76.4 / 6 | 76.5 / 6 | 61.1 / 8 | 76.6 / 6 |
| HalfHop | 73.7 / 8 | 59.48 / 6 | 76.5 / 6 | 76.0 / 20 | 59.6 / 4 | 76.9 / 6 |
| PSNR | 77.3 / 20 | 61.1 / 15 | 77.0 / 30 | 77.9 / 8 | 61.9 / 15 | 77.3 / 10 |
As shown in the table, PSNR continues to achieve superior performance compared to these recent methods, even under the missing feature setting.
Q4: Choice of GraphEncoder (SAGE): What are the specific reasons and insights that guided the selection of SAGE for the GraphEncoder?
A4: The choice of SAGE for the Graph Encoder in our study was not driven by any specific design consideration. In practice, other choices, such as GAT and GCN, could also serve as Graph Encoders. To provide more insight, we have included results for different encoders on the standard node classification task. The results are summarized in the tables below:
Table Re2: Performance of different Graph Encoders on the SSNC task (2 layers).
| Graph Encoder | Cora | Citeseer | CS | Photo | Chameleon | Squirrel |
|---|---|---|---|---|---|---|
| GCN | 80.98±1.60 | 68.46±2.28 | 90.52±0.82 | 91.56±0.74 | 72.02±1.60 | 56.14±1.51 |
| GAT | 80.89±1.63 | 68.77±1.89 | 90.61±0.89 | 91.18±0.92 | 71.97±1.28 | 56.24±1.11 |
| SAGE | 80.59±1.57 | 68.06±2.12 | 91.23±1.00 | 91.44±0.82 | 71.51±1.90 | 54.95±1.73 |
Table Re3: Performance of different Graph Encoders on the SSNC task (4 layers).
| Graph Encoder | Cora | Citeseer | CS | Photo | Chameleon | Squirrel |
|---|---|---|---|---|---|---|
| GCN | 81.65±1.70 | 68.11±1.24 | 90.66±0.70 | 91.14±0.90 | 71.58±2.07 | 56.34±1.48 |
| GAT | 82.21±1.41 | 67.96±1.20 | 90.57±0.89 | 91.17±0.81 | 71.29±1.75 | 56.50±1.45 |
| SAGE | 81.01±1.63 | 66.03±1.93 | 90.70±1.49 | 91.20±1.03 | 70.74±2.24 | 54.13±1.41 |
The results indicate that each encoder has its own strengths and performs differently across the various datasets, while all of them outperform the baseline.
Dear Reviewer Uv7R,
We do not wish to add any pressure to your busy schedule. However, as we are getting closer to the end of the discussion phase, we would really appreciate it if you could let us know whether we have properly addressed your comments and questions in the rebuttal and whether anything can be further clarified.
Many thanks in advance!
Authors
Thank you for your thorough response to my comments, which has addressed my questions.
Dear Esteemed Reviewer Uv7R,
Thank you for your thoughtful and constructive feedback. We greatly appreciate the time and effort you have invested in reviewing our paper. Your insights have been instrumental in enhancing the quality of our work, and we are pleased that we could address the concerns you raised. Thank you once again for your invaluable contribution to our research.
Many thanks in advance!
Authors
The authors revisit the problem of over-smoothing in graph neural networks from the perspective of overlapping neighbourhood subgraphs, and propose a node adaptive residual module based on a posteriori sampling to demonstrate the effectiveness of this method from both theoretical and experimental perspectives. In this paper, the adaptivity to different nodes is achieved by learning the residual coefficients of each node, which helps to capture the features and dependencies of the nodes more accurately.
Strengths
- The problem of over-smoothing has important implications for graph neural networks
- This paper is well organised in its entirety
- The authors provide some theoretical proofs and code, which supports this paper well.
Weaknesses
- The paper needs further polishing, e.g. some formulas are numbered and some are not.
- Although the authors provide a complexity analysis, I still have concerns about the efficiency of the algorithm. The authors should consider providing the computation time and number of parameters in Table 4.
Questions
- The technical description is too brief, which is not reader-friendly. For example, how is positional embedding implemented in the text? How is γ initialised?
- What is the physical meaning of the residual coefficient? That is, can we interpret the residual coefficients as significance?
- Why is this a posteriori approach to the residual coefficients used?
Limitations
The authors should consider further clarification of the dataset, e.g., whether the heterophily of the data may have an impact on this method.
We thank the reviewer for the valuable feedback. We are glad that the reviewer appreciates the idea and technical contributions of our work. Below, we address the reviewer’s concerns one by one.
Q1: The paper needs further polishing, e.g. some formulas are numbered and some are not.
A1: Thank you for pointing this out. Some formulas were not numbered as they were not referenced later in the paper. However, we understand the importance of consistency and will add numbering to all formulas, including those below L165, L171, L199, and L201.
Q2: Although the authors provide a complexity analysis, I still have concerns about the efficiency of the algorithm. The authors should consider providing the computation time and number of parameters in Table 4.
A2: We would like to thank the reviewer for this valuable suggestion. We have updated Table 4 to include the computation time and number of parameters for our algorithm. The following results were obtained on an A100 80G server, training each model for 500 epochs on the ogbn-arxiv dataset. The table provides the training time per epoch in milliseconds and the number of parameters for each method:
Table Re1: Training time and parameter count of different residual methods.
| Method | Training Time (ms/epoch) | Parameter Count |
|---|---|---|
| GCN | 30.1087 | 27,496 |
| Dense | 52.2488 | 85,096 |
| GCNII | 33.3330 | 27,496 |
| JK | 40.0566 | 48,040 |
| Res | 36.0624 | 27,496 |
| PSNR | 42.9386 | 31,723 |
As shown in the table, PSNR introduces a moderate increase in training time compared to other methods, primarily because the additional graph convolution layer effectively doubles the model's depth, while the increase in parameter count and memory consumption remains relatively modest. Despite the increased runtime, PSNR delivers significant improvements in accuracy, offering a favorable trade-off between cost and performance.
Q3: The technical description is too brief, which is not reader-friendly. For example, how is positional embedding implemented in the text? How is γ initialised?
A3: Thank you for highlighting the need for a more detailed technical description. Here is a more comprehensive explanation:
Positional Embedding Implementation: The positional embedding in our method is inspired by the approach used in Transformer models. Specifically, the positional encoding is computed as $PE(k, 2i) = \sin\!\big(k / 10000^{2i/d}\big)$ and $PE(k, 2i+1) = \cos\!\big(k / 10000^{2i/d}\big)$, where $k$ represents the layer index, $i$ is the dimension index, and $d$ is the dimension of the embedding vectors. This encoding helps incorporate layer-specific information into the model, allowing a single network layer to capture multi-layer residual coefficient distributions effectively.
Initialization of γ: The parameter γ is initialized to 0.1 and is treated as a learnable parameter throughout the training process. This initialization allows γ to adjust dynamically as the model trains, optimizing its contribution to the model's performance. We will include these details in the revised manuscript to ensure a clearer understanding of the technical aspects of our method.
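As a minimal sketch of these two points (the function and variable names are ours for illustration; the actual implementation in our code may differ):

```python
import math
import torch
import torch.nn as nn

def layer_embedding(k: int, d: int) -> torch.Tensor:
    """Transformer-style sinusoidal embedding of the layer index k (assumes even d)."""
    pe = torch.zeros(d)
    # 10000^(-2i/d) for the even dimension indices 2i = 0, 2, 4, ...
    inv_freq = torch.exp(torch.arange(0, d, 2, dtype=torch.float) * (-math.log(10000.0) / d))
    pe[0::2] = torch.sin(k * inv_freq)
    pe[1::2] = torch.cos(k * inv_freq)
    return pe

# gamma is initialised to 0.1 and learned jointly with the rest of the model
gamma = nn.Parameter(torch.tensor(0.1))
```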
Q4:What is the physical meaning of the residual coefficient? That is, can we interpret the residual coefficients as significance?
A4: The residual coefficient essentially represents the retention factor of features from previous layers, allowing for a trade-off between information from different orders of aggregation. In our work, from the perspective of subgraph aggregation, the residual coefficient can be interpreted as a summation coefficient associated with the results of aggregating features from different-order subgraphs. This means the residual coefficient modulates the influence of various subgraph aggregations, thus controlling how different levels of information are combined and preserved in the final representation.
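As a schematic illustration of this reading (not necessarily the exact PSNR-GCN update), consider a node-adaptive residual layer with per-node coefficients $\alpha_v^{(k)} \in (0,1)$:

$$H^{(k+1)} = \operatorname{diag}\!\big(\alpha^{(k)}\big)\, H^{(k)} + \Big(I - \operatorname{diag}\!\big(\alpha^{(k)}\big)\Big)\, \hat{A}\, H^{(k)} W^{(k)}.$$

Unrolling such layers expresses the final representation as a sum of terms in which features have been aggregated over different-order neighborhood subgraphs, with each term weighted by a product of the node's $\alpha$'s and $(1-\alpha)$'s; in this sense the residual coefficients act as the per-node mixing weights over different-order subgraph aggregations described above.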
Q5: Why this a posteriori approach to residual coefficients is used?
A5: As mentioned in L154, not all nodes can learn effective node-level coefficients during the training process. Therefore, we use a posteriori method to model the residual coefficients by learning the posterior distribution of these coefficients. This approach allows us to indirectly capture the effective node-level coefficients, enhancing the model's ability to handle varying levels of information and improve overall performance.
Q6: The authors should consider further clarification of the dataset, e.g., whether the heterophily of the data may have an impact on this method.
A6: Our paper primarily focuses on the issue of over-smoothing, and we did not specifically address the impact of heterophily on our method. However, we agree that it is important to consider how different task settings, including heterophilic graphs, might affect our approach. Regarding heterophilic graphs, our method enhances the model's expressive power by accumulating multi-order subgraph information. Moreover, some literature [1] suggests a connection between heterophily and over-smoothing. Therefore, we believe that our method may be effective to some extent in heterophilic scenarios as well.
[1] Cristian Bodnar, et al. 'Neural Sheaf Diffusion: A Topological Perspective on Heterophily and Oversmoothing in GNNs', NeurIPS 2022
Thank you for your response, I am glad to raise my score.
Dear Esteemed Reviewer 61ZY,
Thank you for your thoughtful and constructive feedback. We greatly appreciate the time and effort you have invested in reviewing our paper. Your insights have been instrumental in enhancing the quality of our work, and we are pleased that we could address the concerns you raised. Thank you once again for your invaluable contribution to our research.
Many thanks in advance!
Authors
Dear Reviewer 61ZY ,
We do not wish to add any pressure to your busy schedule. However, as we are getting closer to the end of the discussion phase, we would really appreciate it if you could let us know whether we have properly addressed your comments and questions in the rebuttal and whether anything can be further clarified.
Many thanks in advance!
Authors
This paper proposes a PSNR module to alleviate the over-smoothing problem faced by graph neural networks when the number of layers increases. The effectiveness of this method is demonstrated through both theoretical analysis and experimental results.
Strengths
- The motivation is reasonable and the method is well-motivated.
- The experiments conducted are comprehensive, utilizing widely adopted datasets.
- The paper show the insight of how the residual method can alleviate the over-smoothing issue, which can further promote research and application of the residual method in the field of graph neural networks.
Weaknesses
1. Regarding the conclusion that subgraph overlapping causes over-smoothing: this does not seem to hold, since Transformers can access all nodes without over-smoothing.
2. L104: "Considering nodes with high degrees tend to have larger neighborhood subgraph overlap" — this conclusion lacks a detailed explanation.
3. Some minor typos, e.g., L86 "oversmoothing".
Questions
- Why is there no report of variance in the SSNC-MV experiment?
- This method can be applied to traditional GNNs. Is it orthogonal to other methods that alleviate over-smoothing, such as DropEdge?
Limitations
The authors have addressed the limitations and potential societal impact.
We thank the reviewer for the valuable feedback. We are glad that the reviewer appreciates the idea and technical contributions of our work. Below, we address the reviewer’s concerns one by one.
Q1: The paper assumes that the posterior distribution of residual coefficients is Gaussian, but it is not clear if this assumption holds in all experimental settings. Further discussion on the choice of distribution would be beneficial.
A1: We chose the Gaussian distribution due to its commonality and widespread use in statistical modeling. The primary focus of our paper is on employing a graph posterior encoder to model the posterior distribution of node-level residual coefficients. The specific choice of distribution is not central to our main contribution. Our method is versatile and can accommodate any distribution that can be expressed in a reparameterized form, such as the beta distribution. We will include a more detailed discussion on the flexibility of our approach regarding the choice of distribution in the revised manuscript.
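As a minimal sketch of what "reparameterized form" means here, assuming a Gaussian posterior whose per-node mean and log standard deviation come from the graph posterior encoder (the names and the final squashing to (0, 1) are illustrative assumptions, not our exact code):

```python
import torch

def sample_coefficients(mu: torch.Tensor, log_sigma: torch.Tensor) -> torch.Tensor:
    """Draw node-level residual coefficients via the reparameterization trick.
    mu, log_sigma: [num_nodes, 1] tensors produced by the graph posterior encoder."""
    eps = torch.randn_like(mu)                 # eps ~ N(0, I), independent of the parameters
    alpha = mu + eps * torch.exp(log_sigma)    # differentiable Gaussian sample
    return torch.sigmoid(alpha)                # optionally squash to (0, 1)
```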
Q2: L104: "Considering nodes with high degrees tend to have larger neighborhood subgraph overlap," this conclusion lacks of detailed explanation.
A2: Thank you for pointing out the need for further clarification. Larger neighborhood subgraphs tend to have a higher degree of overlap than smaller ones. Take an extreme scenario as an example: as the number of layers increases, several large neighborhood subgraphs expand until they cover the entire graph, at which point they overlap completely, whereas smaller neighborhood subgraphs may remain dispersed over different parts of the graph, resulting in a relatively lower degree of overlap and making complete overlap less likely. In conclusion, nodes with higher degrees tend to have larger neighborhood subgraph overlap.
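This can also be checked empirically; a small illustrative sketch (not from the paper) that measures the Jaccard overlap of two nodes' k-hop neighborhood subgraphs with networkx:

```python
import networkx as nx

def khop_nodes(G: nx.Graph, v, k: int) -> set:
    """Node set of the k-order neighborhood subgraph (k-hop ego network) of v."""
    return set(nx.single_source_shortest_path_length(G, v, cutoff=k))

def khop_overlap(G: nx.Graph, u, v, k: int) -> float:
    """Jaccard overlap of the k-hop neighborhood subgraphs of u and v."""
    Nu, Nv = khop_nodes(G, u, k), khop_nodes(G, v, k)
    return len(Nu & Nv) / len(Nu | Nv)
```

On typical graphs, the k-hop sets of high-degree nodes grow toward the whole graph faster, so their pairwise overlap approaches 1 at smaller k than that of low-degree nodes.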
Q3: Some minor typo: L86 oversmoothing.
A3: Thank you for pointing this out. We will correct the typo in the revised version of the manuscript.
Q4: Why is there no report of variance in the SSNC-MV experiment?
A4: In the SSNC-MV experiment, we used the results of GCN and other baselines directly from the original paper [1], which did not report variance. To maintain consistency with the original results, we also did not report variance in our experiments.
Q5: This method can be applied to traditional GNNs. Is it orthogonal to other methods that alleviate over-smoothing, such as DropEdge?
A5: Indeed, our method is a general, plug-and-play module that does not conflict with other methods designed to alleviate over-smoothing, such as DropEdge, and it can be used in conjunction with them to enhance their performance. To validate this, we conducted experiments using DropEdge as an example and tested whether PSNR could further improve its performance. We performed tests on six datasets and report the results in the table below.
Table Re1: To verify the orthogonality of PSNR and DropEdge.
| Method | Cora | Citeseer | CS | Photo | Chameleon | Squirrel |
|---|---|---|---|---|---|---|
| DropEdge | 74.31±5.98 | 58.48±3.42 | 85.17±0.96 | 80.37±2.27 | 44.04±1.57 | 33.51±0.93 |
| PSNR+DropEdge | 78.61±1.63 | 65.27±1.42 | 90.22±1.09 | 91.22±0.77 | 62.72±3.22 | 47.22±1.57 |
The results demonstrate that PSNR can indeed further enhance the performance of DropEdge.
[1] Wei Jin, et al. "Feature overcorrelation in deep graph neural networks: A new perspective". SIGKDD, 2022
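For completeness, a minimal sketch of the edge-dropping step that such a combination relies on (DropEdge-style random sparsification, re-applied every training epoch; the function name and tensor layout are our own illustration, not the DropEdge authors' code). The residual module then simply operates on the sparsified graph, which is why the two techniques compose.

```python
import torch

def drop_edges(edge_index: torch.Tensor, p: float = 0.2) -> torch.Tensor:
    """Keep each edge independently with probability 1 - p.
    edge_index: [2, num_edges] COO edge list of the input graph."""
    keep = torch.rand(edge_index.size(1)) >= p
    return edge_index[:, keep]
```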
Dear Reviewer ZBJB,
We do not wish to add any pressure to your busy schedule. However, as we are getting closer to the end of the discussion phase, we would really appreciate it if you could let us know whether we have properly addressed your comments and questions in the rebuttal and whether anything can be further clarified.
Many thanks in advance!
Authors
Thank you for the detailed feedback from the reviewers. My concerns have been well addressed, and I remain positive about the manuscript.
Dear Esteemed Reviewer ZBJB,
Thank you for your thoughtful and constructive feedback. We greatly appreciate the time and effort you have invested in reviewing our paper. Your insights have been instrumental in enhancing the quality of our work, and we are pleased that we could address the concerns you raised. Thank you once again for your invaluable contribution to our research.
Many thanks in advance!
Authors
General Response
We thank the reviewers for their insightful and constructive reviews of our manuscript. We are encouraged to hear that the reviewers found our motivation and theoretical proofs to be reasonable (Reviewers 61ZY, ZBJB, Uv7R) and appreciated the comprehensiveness and supportiveness of our experiments, which utilized widely adopted datasets and covered diverse scenarios (Reviewers ZBJB, Uv7R, 61ZY). They highlighted the clarity and organization of the paper, making complex concepts easy to understand (Reviewers Uv7R, 61ZY, yetc), and acknowledged the insightful approach to addressing the over-smoothing problem, which has important implications for graph neural networks (Reviewers ZBJB, 61ZY). Additionally, the novelty of using node-level coefficients for residual connections and the robustness and scalability of our PSNR module were recognized (Reviewers Uv7R, yetc).
Based on the reviewers’ valuable feedback, we have conducted several additional experiments. Below, we address the issues pointed out by the reviewers and resolve any possible misunderstandings:
1. Clarifications:
- Concept of Neighborhood Subgraphs: We provide a clearer definition and distinction of neighborhood subgraphs compared to other subgraph-based methods. (Reviewer Uv7R)
- Impact of the Ignored Factor on Proposition 1: We discuss how the ignored factor affects the overall conclusions and provide a detailed explanation to ensure clarity. (Reviewer Uv7R)
- Reason for Assuming the Distribution of the Residual Coefficient to be Gaussian: We justify our choice of Gaussian distribution for the residual coefficient and its implications. (Reviewer ZBJB)
- Explanation of 'High Degrees Tend to Have Larger Neighborhood Subgraph Overlap': We elaborate on why high-degree nodes tend to have larger overlaps in their neighborhood subgraphs. (Reviewer ZBJB)
- Detailed Implementation of Layer Embedding: We describe the technical details of how layer embedding is implemented in our model. (Reviewer 61ZY)
- Possible Impact of Heterophily Graph on PSNR: We explore the potential effects of heterophilic graphs on the performance of PSNR and provide our insights. (Reviewer 61ZY)
2. New Experiment Results:
- Additional Results for Three New Methods on the SSNC-MV Task: We include new comparative results to evaluate the performance of PSNR against recent methods. (Reviewer Uv7R)
- Performance of PSNR with Different Residual Coefficient Encoders: We test PSNR with various residual coefficient encoders to demonstrate its flexibility and robustness. (Reviewer Uv7R)
- Performance of PSNR with Other Methods Like DropEdge: We evaluate how PSNR performs when combined with other techniques such as DropEdge. (Reviewer ZBJB)
- Comparison of Computation Time and Number of Parameters: We provide a detailed analysis of the computation time and parameter count for different models. (Reviewer 61ZY)
- Analysis of the Learned Coefficient Distribution: We present an empirical study of the distribution of the learned coefficients across different layers and node degrees. (Reviewers 61ZY, yetc)
- Comparison with Other State-of-the-Art Baselines, such as GPR-GNN and ACM-GNN: We benchmark PSNR against other leading methods to highlight its effectiveness and superiority. (Reviewer yetc)
3. Minor Issues: We corrected the typos and other minor flaws mentioned by all the reviewers.
Thank you for your time and effort in reviewing our manuscript. We sincerely appreciate your insightful comments, which will certainly enhance the quality of our article. If our response has effectively addressed your concerns and resolved any ambiguities, we respectfully hope that you will consider raising the score. Should you have any further questions or need additional clarification, we would be delighted to engage in further discussion.
After discussion, the authors have addressed all the concerns raised by the reviewers, particularly those of the last reviewer, who initially gave a negative score. It is suggested that, when submitting the final revision, the authors at least provide additional performance comparisons using conventional test methods. We recommend accepting this paper.