PaperHub

6.8 / 10 · Poster · 4 reviewers
Ratings: 8, 7, 7, 5 (lowest 5, highest 8, standard deviation 1.1)
Confidence: 4.0 · Correctness: 3.0 · Contribution: 3.0 · Presentation: 3.0

NeurIPS 2024

Unified Graph Augmentations for Generalized Contrastive Learning on Graphs

OpenReview · PDF
Submitted: 2024-05-12 · Updated: 2024-11-06

Abstract

Keywords
Graph Neural Networks · Graph Contrastive Learning · Graph Augmentation

Reviews and Discussion

Review
Rating: 8

This paper investigates the generality of graph augmentations across various types of graphs and tasks. Firstly, the paper presents a new interpretation that unifies the various graph augmentations into local attribute modifications of each node from a basic message-passing perspective. Then, the paper develops a novel graph augmentation module that alters node attributes to simulate the augmentation effects. Utilizing this module, the paper presents a novel framework for graph contrastive learning, where the balance between consistency and diversity across different augmented views is carefully considered. Finally, it provides a theoretical understanding of the generality of the proposed framework. Extensive evaluations demonstrate its superior performance on diverse datasets and tasks.

Strengths

  1. This paper is well-organized and well-written.
  2. The motivation of unifying graph augmentations from a message-passing view is interesting.
  3. The proposed framework is simple and efficient, with a solid theoretical guarantee.
  4. Experiments across various types of graphs and tasks demonstrate the superior efficacy and efficiency of the proposed method in comparison to existing works.

Weaknesses

  1. There are a few typos and grammar mistakes in the paper that need fixing.
  2. The theoretical analysis requires intuitive explanations. While the authors offer the proof of generality, they should elucidate how the theorem relates to the superior performance.
  3. The concepts of consistency and diversity deserve a more comprehensive explanation. Additionally, how do the definitions of consistency and diversity across augmentations in self-supervised learning scenarios differ from those in semi-supervised learning scenarios?
  4. The connection between the number of AC vectors and model performance is not clear.

Questions

Please refer to Weaknesses.

Limitations

N/A

Author Response

Q1. There are a few typos and grammar mistakes in the paper that need fixing.

R1. Thanks for your thoughtful reminder. We will polish our manuscript to elevate the quality and clarity.


Q2. While the authors offer the proof of generality, they should elucidate how the theorem relates to the superior performance.

R2. An intuitive explanation is that the generality improves the flexibility of the augmentation, which is exactly what diverse graph tasks need. Graphs and their corresponding tasks possess diverse characteristics and hence require different augmentation strategies to meet their varied and complicated requirements. However, existing graph augmentations, whether heuristic or learnable, are limited to selecting from discrete candidate sets, e.g., node or edge dropping and attribute masking; thus, flexible and continuous augmentations that would satisfy these diverse needs are NOT achievable. Fortunately, the proposed GA module is endowed with theoretical versatility, enabling it to be equivalent to any flexible graph augmentation. Therefore, equipped with this versatile module, the proposed model can be flexibly tailored to meet the diverse GA demands of different tasks. This versatility and flexibility allow our framework to achieve outstanding performance across various datasets and tasks, outperforming existing GCL baselines.


Q3. The concepts of consistency and diversity deserve a more comprehensive explanation.

R3. Consistency means that the augmented graph maintains the underlying structure and key features of the original graph. Therefore, GAs should minimally impact the similarity between the representations from different augmented graphs to preserve the intrinsic semantic integrity of samples. In contrast, diversity indicates that the augmented graphs originate from diverse distributions. Thus, another objective of GAs is to minimize the overlap between augmented graphs, ensuring the model does not become overly fixated on the specific features of a single distribution.
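
For illustration only, the following is a minimal PyTorch sketch of these two objectives; it is not the exact GOUDA loss (the paper instantiates the consistency term with InfoNCE or Barlow Twins), and the assumption that the independence penalty is placed between the AC vectors of the two views is ours.

```python
import torch
import torch.nn.functional as F

def linear_hsic(X, Y):
    """Biased HSIC estimator with linear kernels (a measure of statistical dependence)."""
    n = X.size(0)
    H = torch.eye(n) - torch.ones(n, n) / n            # centering matrix
    return torch.trace((X @ X.T) @ H @ (Y @ Y.T) @ H) / (n - 1) ** 2

def consistency_diversity_objective(z1, z2, c1, c2, lam=1.0, tau=0.5):
    """z1, z2: node embeddings of two augmented views (N x d).
    c1, c2: AC vectors parameterizing the two augmentations (K x d, assumed)."""
    # Consistency: matched nodes across views should remain similar (InfoNCE).
    z1n, z2n = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1n @ z2n.T / tau
    labels = torch.arange(z1.size(0))
    consistency = F.cross_entropy(logits, labels)
    # Diversity: the two augmentations should not overlap, enforced here as
    # (approximate) statistical independence between their AC vectors.
    diversity = linear_hsic(c1, c2)
    return consistency + lam * diversity
```

Minimizing the first term preserves the shared semantics of the views; minimizing the second keeps the two augmentations from collapsing onto the same distribution.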


Q4. How do the definitions of consistency and diversity across augmentations in self-supervised learning scenarios differ from those in semi-supervised learning scenarios?

R4. The distinction in the concepts of consistency and diversity between self-supervised and semi-supervised learning scenarios primarily stems from the availability of supervision signals.

  1. Semi-supervised learning
  • Consistency: It is often defined as the extent to which augmented graphs align with the inherent patterns of the labeled data.

  • Diversity: It is often defined as the difference in distribution between the augmented and input graphs.

  2. Self-supervised learning
  • Consistency: It is often defined as the extent to which various augmented graphs preserve the essential characteristics of the input graph.

  • Diversity: It is often defined as the distributional differences among the various augmented graphs.


Q5. The connection between the number of AC vectors and model performance.

R5. We recognize your concern about the selection of k, which is a key hyperparameter in the proposed GOUDA. The proposed framework sets k to a small number, which is independent of the size of the graph. The submitted manuscript presents an experimental assessment of the impact of varying k on model performance, as shown in Figure 7, where k is selected from the set {1, 5, 10, 20}. Our observations indicate that GOUDA is relatively insensitive to the hyperparameter k, as explained in Section 4.2. Therefore, thanks to GOUDA's robustness to changes in k, one need not be overly concerned with its precise value.

Comment

Thank you for the detailed response. I am satisfied with the response and the revised manuscript, so I would like to increase my score from 7 to 8.

Review
Rating: 7

This paper explores the characteristics of local attribute modification in current graph contrastive learning methods. It then integrates diverse augmentation strategies with attribute learning into a unified framework. This unified approach introduces a novel and straightforward graph contrastive learning framework that establishes consistency between embeddings and ensures diversity through augmentation. Specifically, diversity is enforced using the HSIC term. The proposed framework offers two main advantages: 1) high efficiency and 2) universally learnable augmentation. Its effectiveness is validated across graph-level and node-level tasks, demonstrating its superiority.

Strengths

  • The exploration of graph contrastive learning mechanisms from a spatial perspective is insightful.

  • The unified augmentation proposed and the subsequent framework are both straightforward and efficient.

  • The use of shared AC and HSIC is technically robust.

  • The experimental evaluations provide convincing evidence.

Weaknesses

  • The symbols \gets used in many equations, such as Eq. (4), should be changed to \to for convenience.

  • Some experiments require further explanation. It's unclear why the proposed framework demonstrates robustness to topology and attribute noises.

  • Some hyper-parameters remain unverified. It's uncertain whether the parameter ε in Eq. (7) significantly affects performance.

  • As far as I know, the HSIC term is computationally intensive. Does this affect efficiency?

  • Descriptions from Lines 143 to 153 are difficult to understand.

  • Reference [33] is based on adversarial training rather than adversarial attack.

  • Some commas are missing after equations, for instance, in Eq. (9).

Questions

See above.

Limitations

N/A

Author Response

Q1. The symbols \leftarrow used in many equations, such as Eq. (4), should be changed to \rightarrow for convenience.

R1. Following your advice, we will revise the equations to replace the symbols \leftarrow with the symbols \rightarrow to enhance readability.


Q2. It's unclear why the proposed framework demonstrates robustness to topology and attribute noises.

R2. Before we explain the robustness advantages of our proposed framework that employs learnable Graph Augmentation (GA) over the baseline models that utilize static and random GAs, we first describe the impact of noise attacks on graphs. Specifically, noise attacks can potentially destroy the semantic subgraphs crucial for predictions, thereby inducing distortions in the learned representations. However, random GAs struggle to preserve the information-rich subgraphs and may further destroy them, exacerbating semantic bias. In contrast, the proposed learnable GAs, trained with a relevant proxy task like contrastive loss, can preserve such subgraphs and, hence, can mitigate semantic bias.


Q3. It's uncertain whether the parameter ε in Eq. (7) significantly affects performance.

R3. Following your suggestion, we verify the impact of the hyper-parameter ε on model performance. ε stands for the threshold used to suppress elements of B to zero. To eliminate bias due to network size, ε is not tuned freely. Instead, it is set as the output of a selection function, i.e., ε = selection(B, s), which estimates this threshold. B represents the matrix of propagation weights from AC vectors to nodes, and s denotes the proportion of the largest elements retained; s is chosen from the set {0.2, 0.4, 0.6, 0.8} in the experiments. The impact of s on performance is shown in the following table. It can be observed that s does not significantly affect model performance.

| GOUDA-IF | s = 0.2 | s = 0.4 | s = 0.6 | s = 0.8 |
| --- | --- | --- | --- | --- |
| Cora | 85.29 | 84.71 | 84.19 | 86.11 |
| CiteSeer | 74.55 | 73.20 | 73.11 | 73.62 |
| PubMed | 86.96 | 87.25 | 87.55 | 87.05 |

| GOUDA-BT | s = 0.2 | s = 0.4 | s = 0.6 | s = 0.8 |
| --- | --- | --- | --- | --- |
| Cora | 85.07 | 84.56 | 85.88 | 85.99 |
| CiteSeer | 74.34 | 74.47 | 73.98 | 73.41 |
| PubMed | 87.59 | 86.94 | 86.76 | 87.11 |
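
For reference, one plausible reading of the selection function described above is a top-fraction threshold; the exact implementation is not shown in this thread, so the name, signature, and logic below are illustrative assumptions.

```python
import torch

def selection(B, s):
    """Hypothetical reading of selection(B, s): return the threshold eps such that
    only (roughly) the largest s-fraction of the propagation weights in B is kept."""
    k = max(1, int(round(s * B.numel())))
    return torch.topk(B.flatten(), k).values[-1]

# Usage (illustrative): eps = selection(B, s=0.4)
#                       B = torch.where(B >= eps, B, torch.zeros_like(B))
```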

Q4. As far as I know, the HSIC term is computationally intensive. Does this affect efficiency?

R4. There may be some misunderstanding. The HSIC term has a quadratic time complexity of O(N^2) in the number of samples N. However, this complexity is reduced to O(K^2) in the proposed framework, since HSIC is applied only to the K AC vectors instead of the entire node set. The proposed framework sets K to a small number, which is independent of the size of the graph. Therefore, the HSIC term does NOT cause high computational cost in the proposed framework.


Q5. Descriptions from Lines 143 to 153 are difficult to understand.

R5. We have carefully reviewed the content and reorganized the passage for better clarity and comprehension. The passage now reads as follows.

(1) Edge augmentation adds or removes edges, which corresponds to inserting or dropping the nodes connected by these edges in the neighborhoods of the impacted nodes. The illustrated edge removal leads to dropping certain 2-hop neighbors (nodes 1, 4, 7, and 8) of node 0 during the aggregation phase (a toy illustration follows this list).

(2) Attribute augmentation replaces the attributes of the impacted nodes with altered attributes, which affects all nodes connected to them. The illustrated attribute augmentation can be seen as modifying the attributes of the augmented neighbors (nodes 1, 3, 4, 5, 6, 7, and 8) of node 0 during the aggregation phase.

(3) Subgraph augmentation modifies specific subsets of the input graph (including its edges and attributes), which can also be seen as a perturbation of the neighborhoods of the impacted nodes. The illustrated subgraph augmentation leads to the removal of nodes 2, 4, 6, and 8 from the 2-hop neighborhood of node 0 during the aggregation phase. Furthermore, node augmentation is a specific case of subgraph augmentation, with the subset limited to a single node, so the above conclusion also applies to it.
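
As a self-contained toy illustration of point (1), the snippet below uses a made-up 9-node graph (not the figure referenced above) to show that removing one edge also removes, from a node's 2-hop aggregation neighborhood, every node that was reachable only through that edge.

```python
import torch

# Toy 9-node graph; the node and edge choices are arbitrary, for illustration only.
edges = [(0, 1), (0, 2), (0, 3), (2, 4), (2, 7), (2, 8), (1, 5), (3, 6)]
A = torch.zeros(9, 9)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

def two_hop(A, v):
    """Indices of nodes within two hops of v (excluding v itself)."""
    reach = (A + A @ A)[v] > 0
    reach[v] = False
    return set(torch.nonzero(reach).flatten().tolist())

before = two_hop(A, 0)
A[0, 2] = A[2, 0] = 0.0         # edge augmentation: drop edge (0, 2)
after = two_hop(A, 0)
print(sorted(before - after))   # [2, 4, 7, 8] vanish from node 0's 2-hop neighborhood
```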


Q6. Reference [33] is based on adversarial training rather than adversarial attack.

R6. Thank you for the meticulous correction. We will correct the description of Reference [33] to adversarial training.


Q7. Some commas are missing after equations, for instance, in Eq. (9).

R7. Thank you for your attention to detail. We will carefully check and comprehensively polish the manuscript to make sure that all formulas are followed by the correct punctuation.

Comment

Thank you for the response, which solves most of my concerns, and I will maintain my score on this paper.

Review
Rating: 7

The paper reconsiders the formulation of graph augmentations in graph contrastive learning, introducing a novel perspective on GA through message passing. The proposed UGA framework interprets graph augmentations as mechanisms for aggregation and propagation between nodes, highlighting the significance of local aggregation and propagation within GA. The effectiveness of the proposed GOUDA framework is demonstrated in experimental evaluations.

Strengths

  • The formulation of GA from the perspective of message passing is intriguing. Construing GA as aggregation and propagation among neighbors adds significant flexibility.
  • The proposed GOUDA framework demonstrates effectiveness and robustness.
  • The paper exhibits superiority across various graph-based tasks, including node classification, node clustering, and graph classification. These results indicate that GOUDA is not limited to specific tasks.
  • The computational complexity is significantly reduced across a wide range of settings.
  • Well-written.

Weaknesses

  • The paper lacks presentation on the interpretability of learned graph augmentation vectors. Given that graph augmentation can occur at nodes, edges, attributes, and sub-graphs, an explanation of how these learned vectors are interpreted is necessary.
  • The paper should include a comparison of time consumption during experiments.

Questions

  • Can the proposed GOUDA, especially the Graph Augmentation Vectors (GAVs), be integrated with other Graph Contrastive Learning (GCL) methods?
  • How to assess the contribution of the proposed GOUDA framework to Graph Contrastive Learning (GCL)?

Limitations

The paper requires interpretability and should clarify the contribution to GCL.

Author Response

Q1. The paper lacks presentation on the interpretability of learned graph augmentation vectors.

R1. To provide an interpretation for Graph Augmentation (GA), we first introduce interpretable explanations for GNNs (i.e., graph encoders) on graphs [1]. Within the encoder-relevant computation graph, i.e., the k-hop subgraph, a subgraph that is informative and most influential on the label is specified as an explanation. For a given node, the computation graph corresponds to its k-hop neighborhood; for a given graph, it is the entire graph. Based on this introduction, GAs can be interpreted as techniques that seek to preserve the information-rich subgraphs. Our GA method, which interpolates a batch of Augmentation-Centered (AC) vectors into the input graph to emulate the effect of GAs, is based on an observation: GA is equivalent to modifying node attributes within the computation graphs of nodes, i.e., local attribute perturbation. During the weight optimization guided by an error metric, these AC vectors adaptively capture task-related perturbation information from the computation graphs of nodes. Therefore, the learned AC vectors can be interpreted as representations of subgraphs within the computation graphs of nodes.

[1] GNNExplainer: Generating Explanations for Graph Neural Networks. NeurIPS 2019


Q2. The paper should include a comparison of time consumption during experiments.

R2. We understand your concerns about the complexity of the proposed framework. We have conducted a comprehensive comparison of the running time consumption of GCLs, as shown in Figure 3 of our manuscript. The accuracy and the running time for each training epoch are provided in the following table for your review.

| Accuracy (%) / Time (s) | IMDB-BINARY | IMDB-MULTI |
| --- | --- | --- |
| InfoGraph | 73.03 / 0.82 | 49.69 / 0.89 |
| GraphCL | 71.14 / 0.49 | 48.58 / 0.56 |
| JOAO | 71.60 / 1.88 | 49.20 / 1.79 |
| AD-GCL | 71.49 / 1.31 | 50.36 / 1.44 |
| MVGRL | 74.20 / 1.16 | 51.20 / 1.02 |
| GOUDA-IF (Ours) | 75.22 / 0.41 | 52.43 / 0.47 |
| GOUDA-BT (Ours) | 76.80 / 0.46 | 53.05 / 0.55 |

It is evident from the table that GOUDA not only outperforms the baselines but also consumes less time per epoch, demonstrating that GOUDA is an effective yet lightweight framework. The detailed analysis has been presented in Section 4.1 of our submitted manuscript.


Q3. Can the proposed GOUDA, especially the Graph Augmentation Vectors (GAVs), be integrated with other Graph Contrastive Learning (GCL) methods?

R3. Of course. As a general framework, GOUDA possesses high compatibility with most GCLs, enabling integration through slight modifications of the graph encoder or the loss function. In our submitted manuscript, GOUDA is implemented as two models: GOUDA-IF, which utilizes InfoNCE loss, and GOUDA-BT, which employs BarlowTwins loss. Extensive experiments (such as graph classification in the table of R2) on various datasets and tasks demonstrate the outstanding performances of GOUDA-IF and GOUDA-BT, highlighting the broad compatibility and effectiveness of the framework GOUDA.


Q4. How to assess the contribution of the proposed GOUDA framework to Graph Contrastive Learning (GCL)?

R4. We would like to illustrate the contribution of the proposed framework GOUDA to GCL from the following aspects:

  1. A novel perspective of graph augmentations. GOUDA is designed with a thorough analysis of the mechanisms of existing Graph Augmentations (GAs) in GCLs. We provide a unified interpretation of these mechanisms from a message-passing perspective, offering new insights into the GCL field.
  2. A general graph augmentation module. GOUDA presents a lightweight yet effective GA module, i.e., UGA, that achieves a theoretical unification of diverse GAs, providing an innovative GA strategy to GCLs.
  3. New SOTA. The implemented models, GOUDA-IF and GOUDA-BT, have achieved a new state-of-the-art performance on many tasks, including node/graph classification.

Comment

Thank you for your response. My concerns have been addressed, so I have decided to maintain my original score.

Review
Rating: 5

The paper introduces GOUDA, a versatile framework for Graph Contrastive Learning that addresses the limitations of existing graph augmentation techniques. GOUDA proposes a unified graph augmentation module capable of simulating various explicit graph augmentations, enhancing the generality and efficiency of GCL models. By incorporating both widely-adopted contrastive losses and a novel independence loss, GOUDA ensures consistency and diversity across different augmentations.

Strengths

  1. The paper addresses a crucial problem in graph contrastive learning: the specificity, complexity, and incompleteness of current graph augmentation techniques.
  2. The paper conducts experiments on multiple datasets and compares the results with various baseline models.

Weaknesses

  1. Some of the mathematical derivations and formulas are complex and may be difficult for readers to follow. Although the notations are defined, the paper could provide more intuitive explanations and step-by-step derivations to enhance understanding.
  2. While the authors have compared with many GCL methods, a comparison with GA techniques such as simple node augmentation is missing.

Questions

  1. How to interpolate the learned AC vectors?
  2. How is k selected in line 181?
  3. Graph augmentation is based on the assumption that the augmentation does not change the original class label; how does UGA guarantee this?

Limitations

N/A

Author Response

Q1. The paper could provide more intuitive explanations and step-by-step derivations to enhance understanding.

R1. Thanks for your valuable suggestion. We will further clarify the mathematical formulas with intuitive explanations in the revised manuscript. Besides, please refer to the Appendix for detailed derivations.


Q2. While the authors have compared with many GCL methods, a comparison with GA techniques such as simple node augmentation is missing.

R2. Simple node augmentation has been compared in our experiments since it is the augmentation strategy employed by many existing GCL methods, such as GraphCL [1]. Specifically, GraphCL employs random node dropping, a prevalent and simple node augmentation strategy, in the augmented graph construction. Following your suggestion, we will clarify this in the revised version.

Figure 3 shows these comparisons from both effectiveness and efficiency perspectives. For convenience of review, the total accuracy and the running time for each training epoch are given as follows.

| Accuracy (%) / Time (s) | IMDB-BINARY | IMDB-MULTI |
| --- | --- | --- |
| GraphCL | 71.14 / 0.49 | 48.58 / 0.56 |
| GOUDA-IF (Ours) | 75.22 / 0.41 | 52.43 / 0.47 |
| GOUDA-BT (Ours) | 76.80 / 0.46 | 53.05 / 0.55 |

It is evident that GOUDA not only achieves superior performance but also possesses efficiency similar to GraphCL. This highlights that GOUDA is a lightweight yet highly effective framework.

[1] Graph Contrastive Learning with Augmentations. NeurIPS 2020


Q3. How to interpolate the learned AC vectors?

R3. The AC vectors are interpolated via a graph-based strategy. Besides the nodes in the original graph, AC vectors are treated as another type of node in the new graph. The additional connections (i.e., edges) from AC nodes to the original nodes are constructed according to feature similarity, as formulated in Eq. 6. This interpolation strategy both reduces the number of tuning parameters for graph augmentations and improves flexibility by dynamically adjusting the varying importance of relationships.
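
A hedged sketch of how such a graph-based interpolation could look is given below; the exact form of Eq. 6 is not reproduced in this thread, so the cosine similarity, the top-fraction thresholding, and the function name are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def attach_ac_vectors(X, A, C, s=0.4):
    """Treat the K AC vectors C (K x d) as extra nodes and connect them to the
    n original nodes (features X: n x d, adjacency A: n x n) by feature similarity,
    keeping only the largest s-fraction of the similarity weights."""
    B = F.normalize(C, dim=1) @ F.normalize(X, dim=1).T            # (K, n) propagation weights
    eps = torch.topk(B.flatten(), max(1, int(s * B.numel()))).values[-1]
    B = torch.where(B >= eps, B, torch.zeros_like(B))

    n, k = X.size(0), C.size(0)
    X_aug = torch.cat([X, C], dim=0)                               # AC vectors become new nodes
    A_aug = torch.zeros(n + k, n + k)
    A_aug[:n, :n] = A
    A_aug[n:, :n] = B                                              # learned edges: AC -> original nodes
    A_aug[:n, n:] = B.T
    return X_aug, A_aug
```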


Q4. How is k selected in line 181?

R4. We recognize your concern about the selection of k, which is a key hyperparameter in the proposed GOUDA. The proposed framework sets k to a small number, which is independent of the size of the graph. The submitted manuscript presents an experimental assessment of the impact of varying k on model performance, as shown in Figure 7, where k is selected from the set {1, 5, 10, 20}. Our observations indicate that GOUDA is relatively insensitive to the hyperparameter k, as explained in Section 4.2. Therefore, thanks to GOUDA's robustness to changes in k, one need not be overly concerned with its precise value.


Q5. Graph augmentation is based on the assumption that the augmentation does not change the original class label; how does UGA guarantee this?

R5. Note that NO data augmentation can guarantee semantic invariance in self-supervised learning, since label information is unavailable. All augmentation-based graph contrastive learning methods rely on the assumption of local smoothness in the space of graphs, so slight changes between the original and augmented graphs do not change the semantics, i.e., the label, of the graphs. Thus, the proposed UGA module, which unifies the four types of graph augmentations, likewise cannot guarantee the semantic invariance of the augmentation. Fortunately, the learnable nature of UGA prevents the semantic shift caused by the random operations in previous GAs, and the invariance can be further enhanced via the consistency constraints.

Comment

Thanks for the authors' rebuttal; I have adjusted my score based on the response.

Comment

Thanks for your feedback. We hope that our response appropriately answers your questions. We are willing to provide further clarification if you have any additional concerns.

Comment

Dear Reviewers,

The authors have provided comprehensive rebuttals and tried to address the concerns raised in your reviews. Please take the time to review their responses carefully. If you have any further questions or require additional clarification, please engage in a discussion with the authors. Thank you for your continued efforts.

AC

Final Decision

This paper addresses potential limitations in existing Graph Contrastive Learning (GCL) frameworks, specifically the issues of specificity, complexity, and incompleteness of Graph Augmentation (GA) techniques. The authors introduce UGA, a method with theoretical guarantees, alongside GOUDA, a framework that can be integrated with various contrastive losses. Experimental results demonstrate the effectiveness of GOUDA in enhancing GCL performance.

The paper has received predominantly positive feedback both before and after the rebuttal period, with final scores of (8, 7, 7, 5). The authors should carefully consider the questions and concerns raised by the reviewers, particularly those from reviewer i3ih, when preparing the final version of the paper, e.g., by adding some implementation details and more comparisons.