PaperHub
6.0/10
Poster | 3 reviewers (lowest 5, highest 7, standard deviation 0.8)
Individual ratings: 6, 7, 5
Confidence: 3.3 | Correctness: 2.7 | Contribution: 2.7 | Presentation: 2.7
NeurIPS 2024

Exploring Consistency in Graph Representations: from Graph Kernels to Graph Neural Networks

Submitted: 2024-05-14 | Updated: 2024-12-23

Abstract

Keywords
Graph Neural Networks, Representation Consistency, Graph Classification

Reviews and Discussion

Review (Rating: 6)

This work aims to bridge the gap between neural network methods and kernel methods by enabling GNNs to consistently capture relational structures in their learned representations. The authors propose a loss function that enforces the similarity of graph representations to remain consistent across different layers. Extensive experiments demonstrate that the proposed consistency loss can significantly enhance the graph classification performance of various GNN backbones on different datasets.

Strengths

This work explores the connection between kernels and GNNs, and proposes a consistency criterion. It has a strong theoretical foundation.

Weaknesses

  1. Does the consistency loss effectively enhance the performance of various GNN backbones in the graph clustering task?

  2. This algorithm does not seem suitable for application to large-scale datasets.

  3. There are several typos and grammatical/spelling errors that should be corrected.

Questions

Please see ‘Weaknesses’.

Limitations

None

Author Response

We appreciate the reviewer's insightful question regarding the extension of our method to graph clustering tasks and large-scale graph datasets, since complexity and scalability are significant concerns in practical applications.

1. On Graph Clustering Tasks.

While applying our consistency loss to the graph clustering task is an interesting idea, it's important to note that our method is specifically designed for graph-level tasks rather than node-level tasks. This approach is grounded in graph kernels, which are predominantly used for graph-level applications. Thus, there are no theoretical guarantees for node-level tasks. Furthermore, unlike graph classification tasks where similarities are computed only within a batch of graphs, applying this approach to clustering requires calculating similarities between all nodes within a graph, which significantly increases computational costs. Nevertheless, we manage to apply our consistency loss to the Attention-driven Graph Clustering Network (AGCN) model [1] on the ACM dataset by leveraging a sampling strategy to compute similarities among the nodes. Following previous research [1,2], we evaluate the quality of clustering using four key metrics: Accuracy (ACC), Normalized Mutual Information (NMI), Adjusted Rand Index (ARI), and macro F1-score (F1). Higher values for each metric signify better clustering performance. The results, presented in Table R11, demonstrate that applying our method to a clustering model leads to slight improvements. This suggests that our approach has the potential to enhance performance in graph clustering tasks.

Table R11: Graph clustering on ACM Dataset, evaluated in terms of accuracy/NMI/ARI/F1.

| Metric | AGCN | AGCN+$\mathcal{L}_{\text{consistency}}$ |
| :---: | :---: | :---: |
| ACC | $0.899\pm0.001$ | $0.901\pm0.004$ |
| NMI | $0.672\pm0.004$ | $0.676\pm0.010$ |
| ARI | $0.725\pm0.003$ | $0.731\pm0.010$ |
| F1 | $0.898\pm0.001$ | $0.901\pm0.004$ |
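
For reference, here is a minimal sketch of how these four clustering metrics could be computed with scikit-learn and SciPy. This is our illustration rather than the authors' code; `y_true` and `y_pred` are assumed to be integer label and cluster-assignment arrays, and clustering accuracy is taken as the best one-to-one cluster-to-label match via Hungarian assignment, as is common in deep clustering work.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score, f1_score

def clustering_accuracy(y_true, y_pred):
    """Best-match accuracy: map predicted cluster ids to labels via Hungarian matching."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n_classes = int(max(y_true.max(), y_pred.max())) + 1
    cost = np.zeros((n_classes, n_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cost[p, t] += 1                               # rows: predicted cluster, cols: true label
    row, col = linear_sum_assignment(-cost)           # maximize the number of matches
    mapping = dict(zip(row, col))
    y_mapped = np.array([mapping[p] for p in y_pred])
    return (y_mapped == y_true).mean(), y_mapped

def clustering_metrics(y_true, y_pred):
    acc, y_mapped = clustering_accuracy(y_true, y_pred)
    return {
        "ACC": acc,
        "NMI": normalized_mutual_info_score(y_true, y_pred),   # permutation-invariant
        "ARI": adjusted_rand_score(y_true, y_pred),             # permutation-invariant
        "F1": f1_score(y_true, y_mapped, average="macro"),      # macro F1 on mapped labels
    }
```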

2. On Large-Scale Datasets.

As described in our general response, the computational cost associated with our method scales linearly with the dataset size and is small compared to the overall training time. Furthermore, we conduct additional experiments on a large real-world dataset, the Reddit Threads dataset [3], which contains over 200,000 graphs. We present the results in Table R12.

Table R12. Graph classification performance on the Reddit Threads dataset, measured in accuracy.

| Reddit Threads | GIN | GMT | GCN | GraphSAGE | GTransformer | Average improvements |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| w/o $\mathcal{L}_{\text{consistency}}$ | $77.50\pm0.16$ | $72.06\pm10.15$ | $76.00\pm0.44$ | $76.67\pm0.11$ | $76.75\pm0.12$ | - |
| w $\mathcal{L}_{\text{consistency}}$ | $77.64\pm0.05$ | $77.19\pm0.14$ | $77.12\pm0.12$ | $77.57\pm0.05$ | $77.14\pm0.06$ | $1.54$ |

Table R12 demonstrates that our method remains effective on large datasets and achieves noticeable improvements across a range of backbone networks.

[1] Peng, Z., Liu, H., Jia, Y., & Hou, J. (2021). Attention-driven Graph Clustering Network. Proceedings of the 29th ACM International Conference on Multimedia.

[2] Bo, D., Wang, X., Shi, C., Zhu, M., Lu, E., & Cui, P. (2020). Structural Deep Clustering Network. Proceedings of The Web Conference 2020.

[3] Rozemberczki, B., Kiss, O., & Sarkar, R. (2020). An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs. ArXiv, abs/2003.04819.

3. Typos.

We thank the reviewer for reminding us of this issue. We will address it in the next version.

Comment

Thanks for the authors' responses. Most of my concerns are addressed; therefore, I have slightly increased my score.

Comment

Thank you for taking the time to read our rebuttal. We're glad that our response addressed most of your concerns and appreciate your updated score. Your insights have been invaluable in improving our submission. We'll be sure to incorporate your suggestions into our next version.

Review (Rating: 7)

The authors propose to improve the quality of graph embeddings in GNNs by encouraging a notion of consistency in the representations obtained at the various layers.

Strengths

The work presented here offers interesting contributions: 1) a new perspective on understanding the graph classification performance of GNNs by examining the similarity relationships captured across different layers; 2) a novel consistency principle in both kernel and GNNs with theoretical proofs explaining how this principle enhances performance; 3) empirical evaluation across diverse GNN model types.

Weaknesses

The paper could be improved by:

  • devising artificial experiments with graph datasets of increasing structural complexity and an increasingly large label alphabet, to show a link between the complexity of the task and the amount of improvement offered by the proposed technique

Questions

Suggestions could include:

Limitations

Yes.

Author Response

We appreciate the reviewer's question about artificial experiments, which provides an opportunity to explore how this method performs across different data scenarios. Should there be any remaining issues or suggestions for improvement, we welcome further feedback on how to refine our approach.

1. Increasing label alphabet.

To investigate the influence of an increasing label alphabet on the runtime and performance of our method, we generated subsets with an increasing number of classes from the REDDIT-MULTI-5K dataset, which originally comprises five classes. Specifically, we randomly sampled between 2 and 4 classes from the dataset and ran GCN and GCN+$\mathcal{L}_{\text{consistency}}$ on these subsets.

Table R7. Graph classification performance on subsets of REDDIT-MULTI-5K, measured in accuracy.

| | Subset 1 (2 classes) | Subset 2 (3 classes) | Subset 3 (4 classes) | Full set |
| :---: | :---: | :---: | :---: | :---: |
| GCN | $79.50$ | $67.13$ | $50.30$ | $53.80$ |
| GCN+$\mathcal{L}_{\text{consistency}}$ | $81.10$ | $68.00$ | $57.15$ | $57.12$ |

The results in Table R7 show that the effectiveness of our method remains unaffected by the growing number of labels.

We further measure the runtime of the models on these subsets and present the results in Table R8. As shown in Table R8, the additional time cost does not change much as the number of labels grows.

Table R8: Average training time per epoch on subsets with varying class complexity from REDDIT, measured in seconds.

| | Subset 1 (2 classes) | Subset 2 (3 classes) | Subset 3 (4 classes) | Full set |
| :---: | :---: | :---: | :---: | :---: |
| GCN | $0.203$ | $0.345$ | $0.408$ | $0.493$ |
| GCN+$\mathcal{L}_{\text{consistency}}$ | $0.227$ | $0.355$ | $0.430$ | $0.557$ |

2. Increasing structural complexity.

We evaluate the impact of structural complexity by dividing the IMDB-BINARY dataset into three subsets of increasing graph density. This density, denoted as $d=\frac{2m}{n(n-1)}$, where $n$ is the number of nodes and $m$ is the number of edges of a graph $G$, was employed as the criterion for creating the subsets. Specifically, the dataset was divided into: (small) graphs with a density below the 33rd percentile, (medium) graphs with a density between the 33rd and 67th percentiles, and (large) graphs with a density above the 67th percentile. We run GCN and GCN+$\mathcal{L}_{\text{consistency}}$ on these subsets and present the results in Table R9.
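
For concreteness, a small sketch of the density-based split described above; NetworkX and the exact percentile handling are assumptions of this illustration, not the authors' code.

```python
import numpy as np
import networkx as nx

def density(g: nx.Graph) -> float:
    # d = 2m / (n(n-1)) for a graph with n nodes and m edges
    n, m = g.number_of_nodes(), g.number_of_edges()
    return 2.0 * m / (n * (n - 1)) if n > 1 else 0.0

def split_by_density(graphs):
    """Split a list of graphs into (small, medium, large) density subsets."""
    d = np.array([density(g) for g in graphs])
    lo, hi = np.percentile(d, [33, 67])
    small = [g for g, x in zip(graphs, d) if x < lo]
    medium = [g for g, x in zip(graphs, d) if lo <= x <= hi]
    large = [g for g, x in zip(graphs, d) if x > hi]
    return small, medium, large
```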

Table R9. Graph classification performance across subsets of varying structural complexity from IMDB-B. Performance is measured in accuracy.

| | IMDB-B (small) | IMDB-B (medium) | IMDB-B (large) |
| :---: | :---: | :---: | :---: |
| GCN | $77.58\pm4.11$ | $66.25\pm5.38$ | $67.61\pm6.21$ |
| GCN+$\mathcal{L}_{\text{consistency}}$ | $84.24\pm4.85$ | $69.06\pm4.06$ | $71.43\pm4.43$ |

Based on these results, we conclude that, with the $\mathcal{L}_{\text{consistency}}$ loss, the GCN model consistently outperforms the original version across varying levels of structural complexity, just as it does across varying label complexities, underscoring the effectiveness of the proposed method.

We also conducted experiments to assess the training costs on datasets with different structural complexities when introducing the $\mathcal{L}_{\text{consistency}}$ loss. The results are presented below.

Table R10. Average training time per epoch for subsets of varying structural complexity from IMDB-B, measured in seconds.

| | IMDB-B (small) | IMDB-B (medium) | IMDB-B (large) |
| :---: | :---: | :---: | :---: |
| GCN | $0.0308$ | $0.0311$ | $0.0321$ |
| GCN+$\mathcal{L}_{\text{consistency}}$ | $0.0371$ | $0.0378$ | $0.0392$ |

Given these results, we find that the additional training cost is minimal across datasets with different structural complexities, demonstrating the broad applicability of our method.

For further analysis of the time and space complexity of our method, please refer to the general response.

3. Critical difference diagrams.

While the suggestion to include critical difference diagrams is appreciated, these diagrams may not be suitable for our context. As shown in Tables 1 and 2 (in the main paper), incorporating the consistency loss generally enhances the performance of the baseline models. However, no clear ranking emerges across the different baselines, as each model excels on distinct datasets across various domains. Since the primary goal of this paper is not to determine the relative ranking of these models, such rankings would not yield meaningful insights.

Review (Rating: 5)

In this paper, the authors describe the shortcomings of graph neural networks (GNNs) in capturing consistent similarity relationships between graphs and propose a new loss function aimed at enhancing graph representations at different layers. Through theoretical analysis and experimental verification, the authors show that the consistency loss can significantly improve the performance of GNNs on graph classification tasks.

Strengths

  1. The structure of the paper is clear and easy to follow.

  2. The proposed method seems reasonable and sound.

  3. The method is effective in comparison to base models.

Weaknesses

  1. I have some concerns about the efficiency of the method. The authors acknowledge the limitation regarding additional computational costs, which is commendable. I suggest that the authors provide a complexity analysis, specific running times, and memory usage to enable further evaluation of the paper.

  2. The authors mention "across a wide range of base models and datasets" in the Introduction section, but the datasets used in the experiments only have two classes. To further verify the effectiveness of the method on a wide range of datasets, I suggest that the authors conduct experiments on datasets with more classes.

Questions

  1. If only the consistency of the first and last layers of the graph representation is enhanced to replace the consistency constraints of all layers, how much will the experimental performance drop? Will there be an improvement compared to the base model?

  2. Contrastive learning can also capture similarity between samples. Can the author give the differences and similarities between contrastive learning and the proposed method in the paper?

Limitations

None

Author Response

We thank the reviewer for their insights on efficiency and the relation of the proposed method to contrastive learning. We hope the responses below clearly address the concerns and questions. If any concerns remain or if further improvements are deemed necessary, please let us know how we can further enhance the work.

1. Complexity Analysis.

We've included the time and space complexity analysis in the general response. The additional costs introduced by our proposed loss are minimal.

2. Concerns Regarding Using Only the First/Final Layers.

Studying the case where consistency loss is applied only between the first and final layers could be beneficial, as it may further reduce the training time. We conduct additional experiments applying our consistency loss to the first and final layers of various backbone models, with the results presented in Table R3. The second to last column in Table R3 highlights the improvement of applying consistency loss to all layers, compared to the baseline models. The last column shows the improvement when consistency loss is applied only to the first and last layers. As shown, applying consistency loss only to the first and last layers achieves similar performance to applying it to all layers. This suggests our method can be further accelerated with little to no performance sacrifice.

Table R3. Classification performance on TU and OGB datasets for models with consistency loss applied to the first and last layers. The values represent average accuracy for TU datasets and ROC-AUC for the ogbg-molhiv dataset, along with their standard deviations. $\mathcal{L}_{FL}$ and $\mathcal{L}_{ALL}$ denote the consistency loss applied to the first and last layers, and to all layers, respectively.

| | NCI1 | NCI109 | PROTEINS | DD | IMDB-B | ogbg-molhiv | Improvements of $\mathcal{L}_{ALL}$ over base models | Improvements of $\mathcal{L}_{FL}$ over base models |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| GCN+$\mathcal{L}_{FL}$ | $75.96\pm0.89$ | $74.67\pm1.11$ | $72.97\pm2.85$ | $76.27\pm1.69$ | $74.60\pm1.85$ | $74.44\pm1.42$ | $+5.49$ | $+7.08$ |
| GIN+$\mathcal{L}_{FL}$ | $79.08\pm1.21$ | $77.00\pm2.01$ | $73.15\pm2.76$ | $74.07\pm1.38$ | $74.80\pm4.66$ | $74.20\pm1.62$ | $+10.95$ | $+15.12$ |
| GraphSAGE+$\mathcal{L}_{FL}$ | $78.88\pm2.01$ | $74.24\pm1.21$ | $75.32\pm2.46$ | $73.9\pm2.03$ | $76.6\pm1.96$ | $80.06\pm1.21$ | $+9.10$ | $+9.71$ |
| GTransformer+$\mathcal{L}_{FL}$ | $76.79\pm1.24$ | $74.38\pm0.49$ | $73.69\pm2.09$ | $75.08\pm1.57$ | $76.8\pm1.60$ | $80.53\pm0.73$ | $+9.00$ | $+8.63$ |
| GMT+$\mathcal{L}_{FL}$ | $76.4\pm1.71$ | $75.64\pm0.77$ | $72.25\pm3.96$ | $73.39\pm2.18$ | $76.6\pm1.36$ | $81.05\pm1.29$ | $+6.23$ | $+5.24$ |

3. Similarities and Differences with Contrastive Learning.

This is an interesting question. While both methods assess similarity, our approach emphasizes the consistency across layers rather than merely capturing similarities, as contrastive learning does. To validate this, we applied the GraphCL [1] contrastive learning technique to the GCN model (denoted as GCN+CL) and evaluated its performance and similarity consistency across various datasets. The results, shown in Tables R4 and R5, use accuracy for classification performance and Spearman rank correlation (Section 5.3) for assessing similarity consistency across layers. The last columns present the average decrease in accuracy and rank correlation for GCN+CL compared to GCN+$\mathcal{L}_{\text{consistency}}$.

Table R4. Graph classification accuracy of GCN with contrastive learning applied across various datasets.

| | NCI1 | NCI109 | PROTEINS | DD | IMDB-B | Average decrease vs. GCN+$\mathcal{L}_{\text{consistency}}$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| GCN+CL | $74.06\pm1.91$ | $73.14\pm1.90$ | $72.50\pm2.73$ | $75.80\pm2.09$ | $75.80\pm1.90$ | $1.31$ |

Table R5. Spearman rank correlation for graph representations from consecutive layers.

| | NCI1 | NCI109 | PROTEINS | DD | IMDB-B | Average decrease vs. GCN+$\mathcal{L}_{\text{consistency}}$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| GCN+CL | $0.835$ | $0.717$ | $0.851$ | $0.717$ | $0.810$ | $0.127$ |

Our method consistently outperforms GCN+CL in both graph classification and improving similarity consistency, highlighting the key differences between the two approaches.
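
To make the consistency measurement concrete, here is a minimal sketch of computing the Spearman rank correlation between the pairwise similarities produced by graph representations of two consecutive layers. Cosine similarity and the use of upper-triangular pairs are assumptions of this illustration; the exact protocol in Section 5.3 of the paper may differ.

```python
import torch
from scipy.stats import spearmanr

def layer_rank_correlation(h_prev: torch.Tensor, h_next: torch.Tensor) -> float:
    """h_prev, h_next: [num_graphs, dim] graph representations from consecutive layers."""
    def pairwise_cosine(h):
        h = torch.nn.functional.normalize(h, dim=1)
        s = h @ h.t()
        iu = torch.triu_indices(s.size(0), s.size(0), offset=1)
        return s[iu[0], iu[1]]                       # upper-triangular pairwise similarities
    s_prev = pairwise_cosine(h_prev).detach().cpu().numpy()
    s_next = pairwise_cosine(h_next).detach().cpu().numpy()
    rho, _ = spearmanr(s_prev, s_next)               # rank agreement of pair similarities
    return float(rho)
```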

[1] Y. You et al., Graph Contrastive Learning with Augmentations, NeurIPS, 2020.

4. Multi-class Classification.

To demonstrate the effectiveness of our method across diverse datasets, we applied it to REDDIT-MULTI-5K, a 5-class classification dataset. The results, presented in Table R6, show that our method consistently achieves improvements in this task.

Table R6. Graph classification performance on REDDIT-MULTI-5K, measured in accuracy.

| REDDIT-MULTI-5K | GIN | GMT | GCN | GraphSAGE | GTransformer |
| :---: | :---: | :---: | :---: | :---: | :---: |
| Baseline | $54.06\pm2.10$ | $51.04\pm1.41$ | $53.80\pm0.78$ | $58.42\pm1.26$ | $50.84\pm2.18$ |
| +$\mathcal{L}_{\text{consistency}}$ | $55.12\pm1.17$ | $54.88\pm1.33$ | $57.12\pm1.47$ | $58.38\pm1.59$ | $52.24\pm1.23$ |
Comment

We apologize for a mistake in our rebuttal: The second-to-last column in Table R3 should be titled Improvements of LFL\mathcal{L}_{FL} over base models, showing the improvement from applying the loss to the first and last layers. The last column should be titled Improvements of LALL\mathcal{L}_{ALL} over base models, indicating the impact of applying the loss to all layers.

Comment

Thank you for taking the time to address my concerns. Just adding a multi-classification dataset does not allay my concerns. In addition, I think that the method of constraining the similarity relationship of the graph layer by layer is not novel enough. I will maintain my review score.

Comment

Dear Reviewer 2j7s:

Thank you for your valuable feedback. We would like to emphasize the novelty of our work. To the best of our knowledge, this is the first study to investigate similarity consistency within the context of graph classification. Furthermore, we provide a theoretical analysis that connects our approach to graph kernels and demonstrates the effectiveness of our proposed consistency loss. We are not aware of any prior research that addresses this topic, so if you know of any related work, we would greatly appreciate your insights.

Additionally, we have conducted as many experiments as possible during the rebuttal period, including a complexity analysis (in our general response), experiments demonstrating the effectiveness of adding the consistency loss only between the first and last layers, and a comparison with a contrastive learning method, which is fundamentally different from our approach. If you believe that adding one multi-class dataset is insufficient, would adding more multi-class datasets change your mind? If so, we will do our best to conduct additional experiments in the remaining two days.

Comment

Yes, if the authors add more multi-class datasets, I will increase my rating.

Comment

Thank you for your feedback on the multi-class datasets. In response, we have included three additional multi-class datasets, on which our proposed loss function continues to consistently outperform the baseline models.

On the COIL-RAG Dataset (100 classes)

| COIL-RAG | GCN | GIN | GraphSAGE | GTransformer | GMT |
| :---: | :---: | :---: | :---: | :---: | :---: |
| Baseline | $91.72\pm1.65$ | $93.33\pm1.48$ | $89.56\pm2.37$ | $83.74\pm3.17$ | $90.85\pm1.91$ |
| +$\mathcal{L}_{\text{consistency}}$ | $93.38\pm1.64$ | $94.03\pm1.33$ | $92.31\pm1.32$ | $91.67\pm1.88$ | $92.00\pm1.43$ |

On the COLLAB Dataset (3 classes)

| COLLAB | GIN | GMT | GCN | GraphSAGE | GTransformer |
| :---: | :---: | :---: | :---: | :---: | :---: |
| Baseline | $79.84\pm1.05$ | $80.36\pm1.15$ | $81.72\pm0.84$ | $78.92\pm1.20$ | $80.36\pm0.56$ |
| +$\mathcal{L}_{\text{consistency}}$ | $84.16\pm0.81$ | $82.80\pm0.61$ | $83.44\pm0.45$ | $82.12\pm0.78$ | $80.48\pm0.47$ |

On the IMDB-MULTI Dataset (3 classes)

| IMDB-MULTI | GIN | GMT | GCN | GraphSAGE | GTransformer |
| :---: | :---: | :---: | :---: | :---: | :---: |
| Baseline | $52.13\pm1.42$ | $54.13\pm2.90$ | $55.07\pm1.24$ | $51.33\pm2.95$ | $53.33\pm1.12$ |
| +$\mathcal{L}_{\text{consistency}}$ | $53.46\pm2.44$ | $54.80\pm1.42$ | $56.27\pm1.00$ | $54.27\pm1.24$ | $56.53\pm1.54$ |
Comment

I hope our discussions can be included in the revised version.

Comment

Thank you for raising your score. We will include additional experiments in the next version as suggested.

Author Response

We sincerely thank all reviewers for recognizing the novelty and theoretical contributions of our paper and appreciate the valuable feedback that has significantly enhanced our work. We hope our responses are informative and helpful. Further feedback on any remaining points of concern and suggestions for improvement would be gratefully received. We will begin by addressing the general question regarding our model's time and space complexity, followed by detailed responses to the specific questions raised by each reviewer.

1. Time Complexity.

We present the time complexity analysis for our proposed consistency loss. The loss computation involves computing pairwise similarities of graphs in a batch, resulting in a computational complexity of $O\left(\text{batchsize} \cdot \frac{\text{batchsize}-1}{2}\right) = O(\text{batchsize}^2)$. Given that there are $\frac{\text{datasetsize}}{\text{batchsize}}$ batches in each training epoch and that the similarities are computed between consecutive layers, the total complexity is
$$O(\text{loss}) = O\left(\text{batchsize}^2 \times (\text{layernum}-1) \times \frac{\text{datasetsize}}{\text{batchsize}}\right) = O(\text{datasetsize} \times \text{batchsize} \times \text{layernum}).$$
This analysis shows that the time required to compute the consistency loss scales linearly with dataset size, batch size, and the number of layers. It is important to note that the training time for baseline models also scales linearly with dataset size.
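
To illustrate where the $O(\text{batchsize}^2 \times \text{layernum})$ term comes from, here is a minimal PyTorch sketch of a per-batch layer-consistency penalty. The cosine similarity and the squared-difference penalty are assumptions of this sketch and may differ from the loss defined in the paper.

```python
import torch

def consistency_loss(layer_reprs):
    """layer_reprs: list of [batchsize, dim] graph representations, one per GNN layer.

    For each pair of consecutive layers, build the batchsize x batchsize cosine-similarity
    matrix (O(batchsize^2)) and penalize the discrepancy between the two matrices.
    """
    loss = layer_reprs[0].new_zeros(())
    for h_prev, h_next in zip(layer_reprs[:-1], layer_reprs[1:]):
        s_prev = torch.nn.functional.normalize(h_prev, dim=1)
        s_next = torch.nn.functional.normalize(h_next, dim=1)
        sim_prev = s_prev @ s_prev.t()               # pairwise similarities at layer l
        sim_next = s_next @ s_next.t()               # pairwise similarities at layer l+1
        loss = loss + ((sim_next - sim_prev) ** 2).mean()
    return loss / max(len(layer_reprs) - 1, 1)
```

In training, such a term would be added to the task loss with a weighting coefficient, e.g. `total_loss = task_loss + lambda_c * consistency_loss(layer_reprs)`, where `lambda_c` is a hypothetical hyperparameter of this sketch.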

Since batch size and the number of layers are generally small compared to dataset size, our experiments primarily focus on how dataset size affects training time. We evaluate the training time of several models (GCN, GIN, GraphSAGE, GTransformer, and GMT), each enhanced with our consistency loss. This evaluation is conducted on different subsets of the ogbg-molhiv dataset, with subset sizes adjusted by varying the sampling rates. The training times, measured in seconds, are presented in Figure R1 (see the attached PDF). As shown, our findings confirm that training time increases linearly with dataset size, indicating that our method maintains training efficiency comparable to the baselines without adding a significant time burden.

Furthermore, we empirically measure the training time for both the baseline models and our proposed methods. Each model comprises three layers and is trained on the ogbg-molhiv dataset (40,000+ graphs) for 100 epochs. We calculate the average training time per epoch in seconds and present the results in Table R1. Table R1 shows that while the inclusion of the consistency loss slightly increases the training time, the impact is minimal.

Table R1: Average training time per epoch for different models on the ogbg-molhiv dataset, measured in seconds.

| | GMT | GTransformer | GIN | GCN | GraphSAGE |
| :---: | :---: | :---: | :---: | :---: | :---: |
| w/o $\mathcal{L}_{\text{consistency}}$ | $8.380$ | $4.937$ | $4.318$ | $4.221$ | $3.952$ |
| w $\mathcal{L}_{\text{consistency}}$ | $8.861$ | $6.358$ | $5.529$ | $5.382$ | $5.252$ |

2. Space Complexity.

Next, we present the space complexity analysis for our consistency loss. At each iteration, the loss function requires storing two pairwise similarity matrices corresponding to two consecutive layers, which is given by:

$$O(\text{loss}) = O(\text{batchsize}^2).$$

Since we use stochastic gradient descent, similarity matrices are not retained for the next iteration. The consistency loss requires significantly less space than the node embeddings, making the additional space requirement minimal. Table R2 shows the peak memory usage in megabytes (MB) for different models when training on the ogbg-molhiv dataset; the space costs are negligible.

Table R2. Peak memory usage for different models on the ogbg-molhiv dataset, measured in megabytes.

| | GMT | GTransformer | GIN | GCN | GraphSAGE |
| :---: | :---: | :---: | :---: | :---: | :---: |
| w/o $\mathcal{L}_{\text{consistency}}$ | $1334.0$ | $1267.8$ | $1291.3$ | $1274.2$ | $1288.4$ |
| w $\mathcal{L}_{\text{consistency}}$ | $1370.0$ | $1330.6$ | $1338.9$ | $1320.1$ | $1321.3$ |
| Cost increase (%) | $2.70$ | $4.96$ | $3.68$ | $3.60$ | $2.55$ |
Comment

Dear reviewers and author(s),

I am very much looking forward to a fruitful discussion!

Dear author(s),

Thank you very much for your submission and rebuttal.

Dear reviewers,

Thank you very much for your reviews! The authors have carefully responded to your criticisms and concerns, please have a look at all reviews and the rebuttal.

Does anything there change your opinion?

Best,

your AC

Comment

Dear reviewers:

We would like to kindly remind you that the author-reviewer discussion will end in 2 days. Could you take a look at our rebuttal and let us know whether it has addressed your concerns? As our paper is currently on the borderline, your input will be greatly appreciated.

Best

Final Decision

After the rebuttal, the reviewers unanimously recommend to accept the paper. They highlight the strong theoretical foundation, a novel consistency principle, and the thorough empirical evaluation.

Public Comment

This is very nice work! Could you comment on the following observation? Theorem 3.4 gives a statement that should hold for iterative kernels that are (i) monotonically decreasing and (ii) preserve order consistency. However, it appears that the second property (order consistency) is not relevant to make this statement. The proof does not refer to this property and also does not seem to implicitly rely on it in any way. Could this assumption be removed to obtain a stronger theorem? This would slightly weaken the motivation for order consistency, in my opinion.

Public Comment

Thank you for your insightful comment. Indeed, in our initial version, we demonstrated that the monotonic decrease principle alone is sufficient to prove the increased decision margin on the training data. We have since explored the effect of combining both principles and found that, together, they can provide guarantees on the test data.

As a result, we updated our Theorem 3.4 and provided the new proof in the latest version available on arXiv. We will incorporate these updates in the final camera-ready version. Due to the publication delay on arXiv, we have attached a link to the revised document for your reference.

Best regards,

Authors