PaperHub
4.3
/10
Rejected3 位审稿人
最低3最高5标准差0.9
5
5
3
4.7
置信度
ICLR 2024

Node Duplication Improves Cold-start Link Prediction

OpenReviewPDF
提交: 2023-09-19更新: 2024-02-11

摘要

关键词
Link PredictionGraph neural networksGraph AugmentationTail NodesCold-start Link Prediction

评审与讨论

审稿意见
5

This article proposes an augmentation technique to improve the performance of GNNs on low-degree nodes and observes that cold node LP performance typically suffers because they have too few connections resulting in underrepresentation in LP training. This article proposes the NODEDUP method, which utilizes a multi-view approach to improve the fitting of cold node nodes.

优点

  1. This method improves the performance of cold nodes without affecting warm nodes, which is a good improvement for the overall results.
  2. A multi-view method is proposed to enhance the fitting of cold nodes during model training. Through a large number of experiments, it is proved that this method can improve the performance of three kinds of nodes.
  3. NODEDUP can be easily combined with the GNNs model, which is very inspiring for the subsequent work.

缺点

  1. There are some spelling and grammar problems in the article, please pay attention to check.
  2. In the analysis of the same type of augmentation methods, the reasons for the performance improvement can be analyzed in depth.
  3. More case studies for low-degree nodes are needed. Why such augmentation is really meaningful but fitting.

问题

  1. Why such augmentation is really meaningful but fitting in this task?
  2. GNNs have a hard time achieving good results on low-degree nodes. This may not always be true. Justify it more clearly.
评论

Dear Reviewer 1iUU,

Thank you for your valuable feedback. Our detailed response is as follows:

W1: There are some spelling and grammar problems in the article, please pay attention to check.

Thank you for pointing this out. We have carefully checked the spelling and grammar in the updated submission.

W2: In the analysis of the same type of augmentation methods, the reasons for the performance improvement can be analyzed in depth.

As we demonstrated in Section 3.2, there are two perspectives contributing to the effectiveness of NodeDup.

  1. From the aggregation perspective, when the transformation weights for an anchor node and its neighbors are different, messages passed from neighbors will play different roles than those from the anchor node itself. With NodeDup, our model can leverage extra information from an additional "view." The existing view is when a node is regarded as the anchor node during message passing, whereas the additional view is when that node is regarded as one of its neighbors thanks to the duplicated node from NodeDup.
  2. From the supervision perspective, NodeDup adds edges that connect low-degree nodes to their duplicates. These additional edges serve as one of the few positive supervision signals for link prediction. More supervision signals for cold nodes lead to better quality embeddings.

From the results shown in Figure 2, we can observe that both these two perspectives play important roles in the performance improvements of NodeDup. The experimental results shown in Figure 5 further demonstrate the effectiveness of these two perspectives towards cold nodes because our methods outperform other augmentation methods in this setting.

W3: More case studies for low-degree nodes are needed. Why such augmentation is really meaningful but fitting.

Based on the results shown in Table 1, we believe our proposed methods are really meaningful but not overfitting to cold node settings because the improvements happen not only on cold nodes but also on warm nodes across all the datasets.

Q1: GNNs have a hard time achieving good results on low-degree nodes. This may not always be true. Justify it more clearly.

We acknowledge that this might not always hold true, but current research broadly supports the claim from two aspects:

  1. From the methodology, most GNNs follow a message-passing scheme, which updates each node's representations by iteratively aggregating its neighbors' information. Therefore, the performance of GNNs on each node will highly rely on whether it has enough high-quality neighbors.
  2. From the experimental results, GNNs tend to perform less effectively on cold nodes than on warm nodes, as illustrated in Figures 1 and 6. This trend is also supported by findings in previous studies [1,2,3].

To clarify any confusion, we have refined our wording in the revised submission.

Thanks again for your comments and diligence in reviewing our work. We hope our responses have addressed your concerns. If so, we hope that you will consider raising your score. If there are any notable points unaddressed, please let us know and we will be happy to respond.

[1]Towards locality-aware meta-learning of tail node embeddings on networks. CIKM 2020.
[2]Investigating and Mitigating Degree-Related Biases in Graph Convolutional Networks. CIKM 2020.
[3]Cold Brew: Distilling graph node representations with incomplete or missing neighborhoods. ICLR 2022.

评论

Dear Reviewer 1iUU,

As today is the final day of our discussion, we anticipate the opportunity to engage with you. Please let us know if our response has successfully addressed all of your concerns. If so, we would be deeply grateful if you consider raising your score. If not, we will happily answer any additional questions you may have. Thank you!

Best regards,
NodeDup authors

评论

Dear authors,

Thank you for your response and pardon my late reply. I appreciate the time and efforts you put on rebuttal. Most of my concerns have been addressed in the response.

评论

Dear Reviewer 1iUU,

Thank you for letting us know we have addressed most of your concerns. If there are any remaining concerns or questions, please do not hesitate to share them with us, and we will happily address them. Otherwise, we kindly request that you consider increasing the score, as the discussion stage is approaching its end.

Thanks again,
NodeDup authors

审稿意见
5

In this paper, the authors propose to improve cold-start link prediction performance by duplicating tail nodes in the graph. Besides, they present NodeDup (L) that reduces duplicated nodes by self-loops to speed up the learning process. The experiments on several benchmark datasets demonstrate that the proposed method can serve as a plug-and-play approach to improve conventional GNN methods in link prediction. The authors also conduct several experiments to show that the comparisons in effectiveness and efficiency with other cold-start and augmentation methods.

优点

  • S1: The proposed method is simple and easy to be applied to arbitrary methods.

  • S2: Compared to conventional cold-start methods, the proposed method achieves better performance in link prediction.

  • S3: The authors conducted several ablation studies to demonstrate the effectiveness and efficiency of the proposed method in different settings and scenarios.

缺点

  • W1: The whole idea of the paper is about self-augmentation while the addition of self-loops (i.e., NodeDup (L)) would be a nature form without increasing model parameters. However, augmentation and adding self-loops are not novel in either link prediction [a, b, c, d].

  • W2: The experimental settings are problematic. The authors randomly sampled nodes as negative references, but the setup is unrealistic while those references would almost be easy negatives. All of the remaining nodes should be considered as ranking candidates.

  • W3: Many performance metrics are inconsistent or contradict with existing studies.

  • W4: Large-scale datasets are not included in the experiments.

  • W5: Some minor writing flaws exist, such as inconsistent terms without citing each other like inductive and production settings.


[a] Cai, L., Yan, B., Mai, G., Janowicz, K., & Zhu, R. (2019, September). TransGCN: Coupling transformation assumptions with graph convolutional networks for link prediction. In Proceedings of the 10th international conference on knowledge capture (pp. 131-138).

[b] Wang, Z., Lei, Y., & Li, W. (2020). Neighborhood attention networks with adversarial learning for link prediction. IEEE Transactions on Neural Networks and Learning Systems, 32(8), 3653-3663.

[c] Yan, Z., Ma, T., Gao, L., Tang, Z., & Chen, C. (2021, July). Link prediction with persistent homology: An interactive view. In International conference on machine learning (pp. 11659-11669). PMLR.

[d] Chien, E., Chang, W. C., Hsieh, C. J., Yu, H. F., Zhang, J., Milenkovic, O., & Dhillon, I. S. Node Feature Extraction by Self-Supervised Multi-scale Neighborhood Prediction, ICLR 2022.

问题

  • Q1: The improvements reported in the paper are astonishing. I wonder if the authors conduct any significance test on the improvements and the corresponding confidence levels, especially for the isolated and low-degree cases.

  • Q2: Following Q1, it is surprising that all categories of nodes get benefited a lot after duplicating (or adding self-loops for) cold nodes.

  • Q3: Compared to Hits@10, Hits@1 could be more critical in the real-world applications, especially for tail nodes with very few neighbors. I wonder if the authors can also provide the Hits@1 performance.

  • Q4: Following W2, the authors should consider conducting a set of experiments using all remaining nodes as candidates for link prediction, thereby alleviating the bias on easy negatives and achieving more fair comparisons.

  • Q5: Following W3, I wonder why so many reported metrics are contradict with existing studies. For instance, Cold-brew even underperforms the original GSage across all metrics on Cora/Citeseer and most of metrics on other datasets in Table 1. As the authors do not use the identical setup and data partitions to previous studies, it should be better to have some elaboration on this part.

  • Q6: Following W4, in Appendix C, the authors claimed some reasons to not conduct experiments on some large-scale graphs. However, given the improvements on all node categories, I believe it is still worth to verify the performance on the large-scale datasets. Besides, IGB is not really large-scale while some datasets like ogbn-products and ogbn-papers100M have millions or handred millions of nodes.

伦理问题详情

N/A.

评论

Q1: The improvements reported in the paper are astonishing. I wonder if the authors conduct any significance test on the improvements and the corresponding confidence levels, especially for the isolated and low-degree cases.

We have conducted the significance evaluation for our results. The analysis yielded p-values less than 0.01 for all node groups, including isolated/low-degree/warm nodes, which indicates a high level of statistical significance. This low p-value suggests that the likelihood of observing such results by chance is very small, thereby reinforcing the validity of our findings. Additionally, our code is available in an anonymous GitHub repository, along with hyperparameters for reproducing all the reported results. This ensures the reproducibility of our work.

Q2: Compared to Hits@10, Hits@1 could be more critical in the real-world applications, especially for tail nodes with very few neighbors. I wonder if the authors can also provide the Hits@1 performance.

Thank you for your valuable suggestions. We have extended our experiments to include analyses on both Cora and Citeseer datasets evaluated by Hits@1, and the results are presented below. These results consistently reflect the relative performance trends observed across different methods, as shown in Table 1. Besides, we believe that evaluating performance using Hits@10 is a realistic and relevant metric, even for tail nodes, because we usually will not only recommend one item/user to each user in real-world applications [19]. Hits@10 is also a commonly-used evaluation metric for link predictions [14,15].

Hits@1Hits@10
GSageGSage+NodeDupGSageGSage+NodeDup
CoraIsolated8.41 ± 1.1713.66 ± 1.8132.20 ± 3.5844.27 ± 3.82
Warm25.87 ± 0.6328.51 ± 0.8059.45 ± 1.0961.98 ± 1.14
Low-degree22.28 ± 0.8723.21 ± 0.6161.14 ± 0.7859.07 ± 0.68
Overall22.44 ± 0.6324.29 ± 0.4758.31 ± 0.6858.92 ± 0.82
CiteseerIsolated20.99 ± 0.6729.77 ± 0.8147.13 ± 2.4357.54 ± 1.04
Low-degree34.25 ± 0.6342.34 ± 0.7061.88 ± 0.7975.50 ± 0.39
Warm34.36 ± 1.2035.90 ± 1.1371.45 ± 0.5274.68 ± 0.67
Overall31.81 ± 0.6636.84 ± 0.4263.77 ± 0.8371.73 ± 0.47

We hope that we have addressed your concerns, and that you will consider raising your score. If we have left any notable points of concern unaddressed, please let us know, and we will be happy to respond.

[1]Dropedge: Towards deep graph convolutional networks on node classification. ICLR 2020.
[2]Local augmentation for graph neural networks. ICML 2022.
[3]Tuneup: A training strategy for improving generalization of graph neural networks. arXiv 2022.
[4]Link prediction with persistent homology: An interactive view. ICML 2021.
[5]Node Feature Extraction by Self-Supervised Multi-scale Neighborhood Prediction. ICLR 2022.
[6]TransGCN: Coupling transformation assumptions with graph convolutional networks for link prediction. K-CAP 2019.
[7]Neighborhood attention networks with adversarial learning for link prediction. IEEE Transactions on Neural Networks and Learning Systems 2021.
[8]https://ogb.stanford.edu/docs/linkprop/#ogbl-citation2.
[9]Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model. KDD 2008.
[10]Performance of Recommender Algorithms on Top-N Recommendation Tasks. RecSys 2010.
[11]Leveraging Meta-Path Based Context for Top-N Recommendation with A Neural Co-Attention Model. KDD 2018.
[12]Explainable Reasoning over Knowledge Graphs for Recommendation. AAAI 2019.
[13]Efficient Training on Very Large Corpora via Gramian Estimation. ICLR 2019.
[14]Mlpinit: Embarrassingly simple gnn training acceleration with mlp initialization. ICLR 2023.
[15]Evaluating Graph Neural Networks for Link Prediction: Current Pitfalls and New Benchmarking. NeurIPS Benchmark Track 2023.
[16]Tail-GNN: Tail-Node Graph Neural Networks. KDD 2021.
[17]Cold Brew: Distilling graph node representations with incomplete or missing neighborhoods. ICLR 2022.
[18]GRAPHPATCHER: Mitigating Degree Bias for Graph Neural Networks via Test-time Augmentation. NeurIPS 2023.
[19]https://www.amazon.science/the-history-of-amazons-recommendation-algorithm.

评论

For Q1 and Q2, thanks the authors for conducting additional experiments to address the concerns.

I also notice that the citations numbers listed in the author comments cannot map to the description.

评论

I acknowledged that I have read all of the author responses. As some concerns have been addressed, especially about extra experiments, I have increased my score to 5 based on some remaining flaws in the novelty and experimental settings.

评论

W3: Many performance metrics are inconsistent or contradict with existing studies. I wonder why so many reported metrics are contradict with existing studies. For instance, Cold-brew even underperforms the original GSage across all metrics on Cora/Citeseer and most of metrics on other datasets in Table 1. As the authors do not use the identical setup and data partitions to previous studies, it should be better to have some elaboration on this part.

Sorry for the confusion. To clarify:

  1. For the evaluation metrics, link prediction tasks are also considered recommendation tasks. In this work, we employ Hits and MRR (Mean Reciprocal Rank) as our evaluation metrics, which is consistent with prior literature [14,15].
  2. For the experimental settings, we recognize that experimental setups vary across cold-start methodologies [16,17]. In pursuit of a fair comparison and to match real-world scenarios, we have adopted a random split approach for our datasets, splitting edges into training, validation, and testing sets. The degree of nodes is then defined based on this random split, which we believe could provide a realistic evaluation framework.
  3. It's important to note that the setup used in Cold Brew [17], which artificially creates isolated nodes by removing edges from the lowest 10% of nodes in the degree distribution, differs significantly from our experimental setting. We have meticulously fine-tuned Cold Brew for our experimental settings, however, it did not perform as effectively as in its original setting. [18] reports a similar observation in the context of node classification tasks while comparing with [17].

W4: Large-scale datasets are not included in the experiments. In Appendix C, the authors claimed some reasons to not conduct experiments on some large-scale graphs. However, given the improvements on all node categories, I believe it is still worth to verify the performance on the large-scale datasets.

Thank you for the suggestion and acknowledgment of the simplicity of our methods. As outlined in Section 3.1, our methods incur a minimal increase in time complexity compared to base GNNs, with the increase being linearly proportional to the number of cold nodes. This ensures scalability. Besides, the effectiveness of our method is also insensitive to dataset size. Given the time constraints, we only extended our experiments to the IGB-1M dataset, featuring 1 million nodes and 12 million edges. The findings, which we detail below, affirm the effectiveness of our methods in handling large-scale datasets, consistent with observations from smaller datasets. Thank you again for your valuable suggestion to enhance our experiments. We have incorporated this result into the revised submission.

GSageGSage+NodeDup
IGB-1MIsolated82.10 ± 0.0687.81 ± 0.40
Low-degree84.73 ± 0.0690.84 ± 0.03
Warm89.98 ± 0.0291.31 ± 0.02
Overall89.80 ± 0.0291.29 ± 0.02

W5: Some minor writing flaws exist, such as inconsistent terms without citing each other like inductive and production settings.

Thank you for pointing it out. We have changed all "production settings" to "inductive settings" to keep it consistent in the updated submission. We have also carefully checked the writing in the updated one.

评论

For W3, in this case, I would encourage the authors to compare methods in the same experimental setup, to verify there is no biases favoring the proposed method as mentioned in W2.

For W4, thanks the authors for spending the time on conducting the experiments on a larger scale dataset.

评论

Dear Reviewer pJyQ,

Thank you for your valuable feedback. Our detailed response to your concerns is as follows:

W1: The whole idea of the paper is about self-augmentation while the addition of self-loops (i.e., NodeDup (L)) would be a nature form without increasing model parameters. However, augmentation and adding self-loops are not novel in either link prediction. It is surprising that all categories of nodes get benefited a lot after duplicating (or adding self-loops for) cold nodes.

Thank you for directing our attention to these works. Although augmentation and the incorporation of self-loops are not new concepts in link prediction, our contributions to this field remain substantial:

  • In terms of augmentation methods, our methods, both NodeDup and NodeDup(L), stand out for their simplicity and efficiency, specifically tailored to enhance cold-start nodes' performance while existing augmentation methods [1,2,3,4,5] all focus on overall link prediction performance. As demonstrated in Figure 5, NodeDup and NodeDup(L) not only significantly boost performance for both Isolated and Low-degree nodes but also offer enhanced run-time efficiency compared to [1,2,3]. Unlike [4], which necessitates extracting topological features for each node pair, and [5], which requires pre-training of the text encoder to improve node features, our methods are more streamlined and less complex.
  • Regarding the addition of self-loops, our use in NodeDup(L) diverges from previous implementations [6,7] where self-loops were solely for aggregation. In our framework, these self-loops also provide critical supervision training signals for cold nodes, marking a notable differentiation.
  • Beyond methodological distinctions, our work is the first to connect the simple node duplication trick with the cold-start issue, and we have experimentally demonstrated the effectiveness of this approach. In addition, we also emphasize the importance of understanding the underlying mechanics of our approach. In Section 3.2, we conduct a comprehensive analysis to identify the core reasons driving the effectiveness of our methods. Furthermore, in Section 3.3, we investigate the interplay between NodeDup and self-distillation, offering deeper and more interesting insights to the academic community.

W2: The experimental settings are problematic. The authors randomly sampled nodes as negative references, but the setup is unrealistic, while those references would almost be easy negatives. All of the remaining nodes should be considered as ranking candidates. The authors should consider conducting a set of experiments using all remaining nodes as candidates for link prediction, thereby alleviating the bias on easy negatives and achieving more fair comparisons.

Thank you for highlighting this aspect. We agree that, ideally, utilizing all remaining nodes as candidates for link prediction would provide a comprehensive evaluation. However, this approach poses significant computational challenges, especially in the context of large-scale datasets. To address this, we follow the official task setting for OGB-Citation2 [1] to randomly sample negatives for each node. This methodology was suggested by Koren et al. [2], which advocates for the "sampling top-k" method in recommendation system evaluations. This approach has been widely adopted in numerous related studies [3,4,5,6] due to its balance between thoroughness and computational efficiency.

评论

For W1, unfortunately, almost all of the previous studies adding self-loop will intuitively use them as supervision signals, and intrinsically would solve the cold-start issues.

For W2, it would be an issue on the topic of cold-start tasks, especially for the self-loop based methods. With easy negatives and training with self-loops of cold-start nodes, it will have much higher tendency to make those negative easier. Besides the performance evaluation, the authors should also elaborate how the proposed method can be applied into real-world applications because the real-world candidates cannot be randomly-sampled.

评论

Dear Reviewer pJyQ,

Thank you for the time you spent reviewing our paper and the constructive feedback you've provided. We believe we have addressed your questions and concerns in detail. As today is the final day of our discussion, we anticipate the opportunity to engage with you. Please consider raising our score if we have addressed all of your questions/concerns. If not, we are happy to elaborate on our responses and answer any additional questions you may have!

Thanks again,
NodeDup authors

审稿意见
3

This paper presents an augmentation-based method (i.e., NodeDup) tailored for the isolated and low-degree nodes in the link prediction task. The proposed method can not only significantly improve the prediction performance on cold nodes, but also can maintain the overall prediction capability. By simply duplicating the cold nodes, the proposed method exhibits faster running speed than the previous methods based on augmentation. Comparing with the vanilla GNN models, the cold-start models, and the graph data augmentation models, the experimental results are sufficient to demonstrate the superiority of NodeDup. Additionally, NodeDup is a light plug-and-play method which is able to collaborate with almost all decoders and encoders for link prediction task.

优点

S1: The cold-start problem towards graph data is meaningful and challenging, whose difficulty mainly lies on the isolated and low-degree nodes. The proposed method NodeDup succeeds in improving the prediction capability on the cold node, without damage to the overall performance at meantime. S2: Extensive experiments are conducted. The ablation study on duplication times and nodes facilitates the understanding of when NodeDup is effective. S3: The compatibility of NodeDup is exceptional. The prediction capability consistently outperforms different types of baselines, when combines with GraphSAGE, GAT, and JKNet encoders.

缺点

W1: The major concern is the definition of cold nodes or long-tailed nodes. (1) After checking the paper several times, I am quite surprised to see that the threshold lambda is set to 2 over all datasets. Considering the difference of number of nodes/edges/communities/densities in different graphs, it would be much better if authors could carefully define and investigate the definition of code nodes. (2) I am wondering whether it is proper to define code nodes based on the 1-st order degree instead of high-order connectivity. For example, given two nodes: m has 2 connected neighbors but has 200 two-hop neighbors while n has 10 connected neighbors but only has 10 two-hop neighbors. Based on hop-wise message passing used in GNNs, I think node m would attend to more neighborhood information, while node n might be more cold. Hence, it would be much better if authors could explain more on the definition of cold nodes.
W2: Based on the claim in the introduction: they are under-represented in standard supervised LP training due to their few (if any) connections, a straightforward strategy is to change the distribution of node sampling to ensure more tail nodes could join the training process. In my humble opinion, as NODEDUP only duplicate code nodes and connected them to the original ones, it essentially only changes the distribution of batch sampling. I notice that in E.3, authors only present the performance compared with upsampling, while the discussions of experimental results are missing. It would be much better to clarify the reasons why NODEDUP is effective instead of numbers.
W3: Another concern is the baselines. First, the old-fashioned link prediction baselines are missing, including but not limited to common neighbors (CN), Jaccard, preferential attachment (PA), Adamic-Adar (AA), resource allocation (RA), Katz, PageRank (PR), and SimRank (SR). (2) Some SOTA models are also needed such as SEAL (Link Prediction Based on Graph Neural Networks). (3) The studied problem is similar to the degree-bias in GNNs. Please consider to add baselines such as `` On Generalized Degree Fairness in Graph Neural Networks’’.

问题

Q1: What is the key innovation of NodeDup? According to the manuscript, it seems that NodeDup is almost the standard node duplicate technique. More discussions about the existing node duplicate works are recommended. Q2: What is the main difference between NodeDup(L) and self-loop in GNN? During the experiment, does the GraphSAGE (and GAT, JKNet) baseline introduce self-loop process? If no, the authors may want to add more experimental explanations. If yes, why NodeDup(L) is more effective than vanilla GNN models? Q3: Why NodeDup(L) is more effective than NodeDup towards warm nodes in almost all scenarios? From my perspective, the two methods conduct the same operation on the warm nodes. The authors may want to add more discussions. Q4: In light of the similarity to self-distillation mentioned in section 3.3, the comparison between NodeDup and self-distillation technique is recommended.

评论

Q3: Why NodeDup(L) is more effective than NodeDup towards warm nodes in almost all scenarios? From my perspective, the two methods conduct the same operation on the warm nodes. The authors may want to add more discussions.

Thank you for pointing this out. We also observed the performance differences between NodeDup and NodeDup(L) upon cold/warm nodes and provided the analysis in Section 4.2. The biggest difference between NodeDup and NodeDup(L) is that NodeDup has the node duplication step. From our understanding, the impact of node duplication of NodeDup on the original graph structure likely affects the performance of warm nodes, which explains the better perfromance of NodeDup(L) in this setting. Besides, the node duplication of NodeDup also provides an extra view during the aggregation for the isolated nodes, which leads NodeDup is more effective than NodeDup(L) towards isolated nodes.

Q4: In light of the similarity to self-distillation mentioned in section 3.3, the comparison between NodeDup and self-distillation technique is recommended.

Thank you for the recommendation. We have conducted the experiments to compare the NodeDup and self-distillation on Citeseer. The results are shown in Figure 4. We can observe that NodeDup consistently outperforms self-distillation in all scenarios. As demonstrated in Section 3.3, unlike self-distillation, NodeDup offers valuable positive supervision training signals for cold nodes. Furthermore, the multi-view information aggregated through NodeDup potentially offers more valuable insights than the supplementary information from self-distillation, especially for cold nodes. This could also be a contributing factor to the observed performance difference.

In light of our answers to your concerns, we hope you consider raising your score. If you have any more concerns, please do not hesitate to ask and we'll be happy to respond.

[1]Towards locality-aware meta-learning of tail node embeddings on networks. CIKM 2020.
[2]Tail-GNN: Tail-Node Graph Neural Networks. KDD 2021.
[3]Cold Brew: Distilling graph node representations with incomplete or missing neighborhoods. ICLR 2022.
[4]On Generalized Degree Fairness in Graph Neural Networks. AAAI 2023.
[5]Link prediction based on graph neural networks. NeurIPS 2018.
[6]Graph Neural Networks for Link Prediction with Subgraph Sketching. ICLR 2023.

评论

Q1: What is the key innovation of NodeDup? According to the manuscript, it seems that NodeDup is almost the standard node duplicate technique. More discussions about the existing node duplicate works are recommended.

Our work, to the best of our knowledge, is the first to employ node duplication for link prediction tasks. This innovation manifests in two key aspects:

  1. Methodological Innovation: NodeDup is not conventional duplication - it selectively duplicates only the cold nodes, rather than all nodes. In addition to this duplication, NodeDup establishes connections between each cold node and its duplicate. This approach is crucial for two reasons: firstly, it offers additional viewpoint information during the aggregation process; secondly, it provides crucial supervision training signals for link prediction tasks.
  2. Analytical Contribution: Beyond the design of the method, our work is the first to connect the simple node duplication trick with the cold-start issue, and we have experimentally demonstrated the effectiveness of this approach. In addition, we offer an extensive analysis in Section 3.2, where we delve into two potential reasons underpinning the effectiveness of our proposed method. Moreover, we explore the relationship between NodeDup and self-distillation in Section 3.3, providing deeper insights to the research community.

Q2: What is the main difference between NodeDup(L) and self-loop in GNN? During the experiment, does the GraphSAGE (and GAT, JKNet) baseline introduce self-loop process? If no, the authors may want to add more experimental explanations. If yes, why NodeDup(L) is more effective than vanilla GNN models?

Apologies for the confusion. To clarify, GraphSAGE, GAT, and JKNet inherently incorporate self-loops as part of their foundational methodologies. Accordingly, in our experiments, we have also integrated self-loops into these baseline models. However, as we demonstrated at the end of Section 3.2, it's important to note that NodeDup(L) differs from traditional self-loops in two significant ways:

  1. For the aggregation, we take GraphSAGE as an illustrative example. In the PyG's official implementation, GraphSAGE would have separate weights W1\text{W}_1 and W2\text{W}_2 for self-representation and neighbor representations. NodeDup enhances this by adding an extra self representation into the neighbor information. This additional representation is assimilated through the weight matrix W2\text{W}_2, effectively providing an additional "view" or perspective of self representation, distinct from the conventional self-loop.
  2. For the supervision signal, different from the normal self-loops, the edges added by NodeDup(L) also serve as additional positive training samples for the cold nodes.
评论

W2: Why NODEDUP is effective instead of numbers?

As we demonstrated in Section 3.2, there are two perspectives contributing to the effectiveness of NodeDup.

  1. From the aggregation perspective, when the transformation weights for an anchor node and its neighbors are different, messages passed from neighbors will play different roles than those from the anchor node itself. With NodeDup, our model can leverage extra information from an additional "view". The existing view is when a node is regarded as the anchor node during message passing, whereas the additional view is when that node is regarded as one of its neighbors thanks to the duplicated node from NodeDup.
  2. From the supervision perspective, NodeDup adds edges that connect low-degree nodes to their duplicates. These additional edges serve as one of the few positive supervision signals for link prediction. More supervision signals for cold nodes lead to better quality embeddings.

From the results shown in Figure 2, we can observe that both these two perspectives play important roles in the performance improvements of NodeDup.

评论

Dear Reviewer exPD,

Thank you for your valuable feedback. Our detailed response to your concerns is as follows:

W1: Concerns about the definition of cold nodes or long-tailed nodes. (1) The threshold set to 2 over all datasets.

We apologize for any confusion regarding the definition of cold nodes.

  1. Our decision to set the threshold δ\delta at 2 is grounded in data-driven analysis, as illustrated in Figures 1 and 6. These figures reveal that nodes with degrees not exceeding 2 consistently perform below the average Hits@10 across all datasets, and higher than 2 will outperform the average results.
  2. Additionally, our choice aligns with methodologies in previous studies [1,2], where cold or long-tailed nodes are identified using a fixed threshold.
  3. We conduct further experiments with different threshold δ\delta on Cora and Citeseer datasets. The results are shown below. Our findings were consistent across different thresholds, with similar observations at δ\delta = 1, δ\delta = 2, and δ\delta = 3. This indicates that our method's effectiveness is not significantly impacted by changes in this threshold.

Thank you for your suggestion. We have incorporated this clarification into our paper to ensure greater clarity.

δ\delta = 1δ\delta = 2δ\delta = 3
GSageGSage+NodeDupGSageGSage+NodeDupGSageGSage+NodeDup
CoraIsolated31.34 ± 5.6042.20 ± 2.3032.20 ± 3.5844.27 ± 3.8231.95 ± 1.2643.17 ± 2.94
Low-degree53.98 ± 1.2057.99 ± 1.3459.45 ± 1.0961.98 ± 1.1459.64 ± 1.0162.68 ± 0.63
Warm61.68 ± 0.2961.17 ± 0.4361.14 ± 0.7859.07 ± 0.6861.03 ± 0.7959.91 ± 0.44
Overall58.01 ± 0.5759.16 ± 0.4458.31 ± 0.6858.92 ± 0.8258.08 ± 0.7459.99 ± 0.53
CiteseerIsolated47.25 ± 1.8256.49 ± 1.7247.13 ± 2.4357.54 ± 1.0447.31 ± 2.1756.90 ± 1.12
Low-degree54.10 ± 0.8571.09 ± 0.4761.88 ± 0.7975.50 ± 0.3962.97 ± 0.8375.45 ± 0.40
Warm72.41 ± 0.3574.57 ± 1.0471.45 ± 0.5274.68 ± 0.6773.57 ± 0.4675.02 ± 0.84
Overall64.27 ± 0.4570.53 ± 0.9163.77 ± 0.8371.73 ± 0.4764.05 ± 0.4271.80 ± 0.40

(2) Whether it is proper to define cold node based on the 1st order degree.

Thank you for asking about the definition of "cold nodes". We believe that defining cold nodes based on 1st-order degree is justified for several reasons:

  1. This approach aligns with established methodologies [1,2,3] in current research, where they define long-tail nodes based on their 1st-order degree.
  2. As evidenced in Figures 1 and 6, our experimental results highlight a performance gap among nodes when categorized based on their 1st-order degree. This observation provides a sound experimental setting for evaluation, reinforcing the validity of our approach.
  3. In the scenario provided by the reviewer, a node like mm, with few direct (1st-order) neighbors but a large number of two-hop (2nd-order) neighbors, may have its representation disproportionately influenced by the super-stars (high-degree neighbors). This could adversely affect its performance, despite it being ostensibly less 'cold' than a node like nn, which has a few 1st-order neighbors and also a few 2nd-order neighbors. This phenomenon further validates our choice to prioritize 1st-order connections in defining cold nodes.

That being said, we agree that exploring cold-start scenarios based on 2-hop neighbors can be a good followup work.

评论
CNAARADegFairGNNGsageGsage+NodeDup
CoraIsolated0.000.000.0018.70 ± 1.5332.20 ± 3.5844.27 ± 3.82
Low-degree20.3020.1420.1438.43 ± 0.1459.45 ± 1.0961.98 ± 1.14
Warm38.3338.9038.9042.49 ± 1.8261.14 ± 0.7859.07 ± 0.68
Overall25.2725.4925.4939.24 ± 1.1058.31 ± 0.6858.92 ± 0.82
CiteseerIsolated0.000.000.0015.50 ± 1.2747.13 ± 2.4357.54 ± 1.04
Low-degree26.8627.0027.0045.06 ± 0.9661.88 ± 0.7975.50 ± 0.39
Warm37.3039.0239.0255.47 ± 1.0871.45 ± 0.5274.68 ± 0.67
Overall30.8131.8531.8544.58 ± 1.0363.77 ± 0.8371.73 ± 0.47
CSIsolated0.000.000.0017.93 ± 1.3556.41 ± 1.6165.87 ± 1.70
Low-degree39.6039.6039.6049.83 ± 0.6875.95 ± 0.2581.12 ± 0.36
Warm72.7372.7472.7261.72 ± 0.3784.37 ± 0.4684.76 ± 0.41
Overall69.1069.1169.1060.20 ± 0.3783.33 ± 0.4284.23 ± 0.39
PhysicsIsolated0.000.000.0019.48 ± 2.9447.41 ± 1.3866.65 ± 0.95
Low-degree46.0846.0846.0847.63 ± 0.5279.31 ± 0.2884.04 ± 0.22
Warm85.4885.7485.7062.79 ± 0.8290.28 ± 0.2390.33 ± 0.05
Overall83.8784.1284.0962.13 ± 0.7689.76 ± 0.2290.03 ± 0.05
ComputersIsolated0.000.000.009.36 ± 1.819.32 ± 1.4419.62 ± 2.63
Low-degree28.3128.3128.3118.90 ± 0.8157.91 ± 0.9761.16 ± 0.92
Warm59.6763.5062.8431.44 ± 2.2566.87 ± 0.4768.10 ± 0.25
Overall59.2463.0362.3731.27 ± 2.2266.67 ± 0.4767.94 ± 0.25
PhotosIsolated0.000.000.0012.99 ± 1.519.25 ± 2.3117.84 ± 3.53
Low-degree28.4428.7828.7820.18 ± 0.2152.61 ± 0.8854.13 ± 1.58
Warm64.5367.2666.8842.72 ± 0.8967.64 ± 0.5568.68 ± 0.49
Overall63.9466.6466.2642.37 ± 0.8767.32 ± 0.5468.39 ± 0.48
IGB-100KIsolated0.000.000.0057.09 ± 21.0875.92 ± 0.5288.04 ± 0.20
Low-degree12.2612.2612.2659.45 ± 21.8479.38 ± 0.2388.98 ± 0.17
Warm30.6530.6530.6565.57 ± 20.4386.42 ± 0.2488.28 ± 0.20
Overall26.2226.2226.2264.16 ± 20.7084.77 ± 0.2188.39 ± 0.18
  1. Considering our methods are flexible to integrate with GNN-based link prediction structures, we conduct the experiments on top of SEAL[5] on the Cora and Citeseer datasets. The results are shown in the following table. We can observe that adding NodeDup on top of SEAL can consistently improve link prediction performance in the Isolated and Low-degree node settings on these two datasets.
SEALSEAL + NodeDup
CoraIsolated62.20 ± 1.0670.73 ± 0.61
Low-degree66.80 ± 2.8367.70 ± 4.11
Warm56.69 ± 2.3654.87 ± 1.61
Overall60.60 ± 2.3860.89 ± 2.36
CiteseerIsolated56.92 ± 5.5366.37 ± 1.01
Low-degree64.13 ± 2.5665.54 ± 1.69
Warm58.81 ± 3.2260.73 ± 2.75
Overall60.18 ± 2.9863.35 ± 1.43

Thank you again for pointing these baselines to strengthen our analysis. We have added all of these analysis in the updated submission.

评论

W3: Another concern is the baselines. First the old-fashined link prediction baselines are missing. Some SOTA models are also needed such as SEAL. The studied problem is similar to the degree-bias in GNNs. Consider "On Generalized Degree Fairness in Graph Neural Networks".

We sincerely appreciate your suggestions. We have added extra baselines to make our analysis more comprehensive.

  1. We further compare our methods with traditional link prediction baselines, such as common neighbors (CN), Adamic-Adar(AA), Resource allocation (RA). The results are shown below. We observe that GSage+NodeDup can consistently outperform these heuristic methods across all the datasets, with particularly significant improvements observed on isolated nodes.
  2. We have conducted the experiments to compare with the existing methods [2,3], which also tackle degree bias in GNNs. The results, presented in Table 1, consistently show our methods outperforming these approaches. We appreciate the suggestion of work, DegFairGNN [4], and have thoroughly investigated its application in the link prediction context. The results are shown below. Unfortunately, we've found that this approach is not well-suited for link prediction tasks for several reasons:
    • Designed for Node Classification: The method in [4] introduces a learnable debiasing function in the GNN architecture to produce fair representations for nodes, aiming for similar predictions for nodes within the same class, regardless of their degrees. However, this is designed specifically for node classification tasks. For example, the fairness loss, which ensures prediction distribution uniformity among low and high degree node groups, is not suitable for link prediction, because there is no defined node class in link prediction tasks.
    • Importance of Degree Trait in Link Prediction: The approach in [4] achieves significant performance in node classification tasks by effectively mitigating degree bias. However, in the context of link prediction, the degree trait is crucial. Applying [4] would compromise the model's ability to learn from structural information, such as isomorphism and common neighbors. This, in turn, would negatively impact link prediction performance, as evidenced by references [5,6].
评论

Dear Reviewer exPD,

As today is the final day of our discussion, we anticipate the opportunity to engage with you. If you have any remaining questions or concerns, please don't hesitate to share them with us. We will be happy to respond. If we have sufficiently alleviated your concerns, we hope you'll consider raising your score. Thank you!

Best regards,
NodeDup authors

AC 元评审

I recommend to reject this paper.

In this paper, the authors introduces a strategy to enhance cold-start link prediction by replicating tail nodes within the graph. The authors conduct multiple experiments to showcase comparative effectiveness and efficiency of the proposed method No deDup against alternative cold-start and augmentation methodologies.

All the reviewers appreciated the additional efforts made by the authors during rebuttal but still thought that there are still remaining concerns about the experimental settings such as a better way to collect negative nodes in the evaluation to mimic the real-world use cases and limited technical novelty of the proposed method. As a result, I would recommend to reject this paper and encourage the authors to take the reviewers' feedbacks to further revise the work.

为何不给更高分

N/A

为何不给更低分

N/A

最终决定

Reject