HGDL: Heterogeneous Graph Label Distribution Learning
Abstract
Reviews and Discussion
This paper studies heterogeneous graph label distribution learning, aiming to predict the label distributions of unlabeled nodes in a heterogeneous graph. It elaborates on the challenges of generalizing LDL to networked data and proposes an LDL algorithm, HGDL, to overcome them. In addition, it derives a PAC-Bayes error bound for HGDL and conducts experiments to show the superiority of the proposal.
Strengths
This paper studies a new problem, i.e., label distribution learning in heterogeneous graphs. Besides, this paper proposes an end-to-end HGDL learning approach to jointly learn the optimal meta-path graph topology and node representation. The effectiveness of the proposed method is studied theoretically and empirically.
Weaknesses
The contribution of this paper is unclear. This paper attempts to combine heterogeneous graph learning with label distribution learning. However, the paper addresses challenges about the topology, heterogeneity, and inconsistency in terms of instances, and pays little attention to the challenges of learning label distributions of instances, i.e., the proposed challenges do not arise from label distributions. Therefore, the contributions for LDL are not clear.
Questions
- Why can't we just replace the output layer of existing heterogeneous node classification model with MSE (Mean Squared Error) or KL loss to learn the label distribution? What challenges arise when replacing categorical labels with label distributions in HGDL? And which of these problems does this paper address? What is the contribution of this paper to the field of label distribution learning?
- Why use PAC-Bayes theory to analyze the error bound instead of PAC theory?
Limitations
I don't believe that the paper has a potential negative social impact.
We thank the reviewer for the constructive comments. We address the reviewer's concerns below one by one in a Q&A fashion.
Q1: Why can't we just replace the output layer of existing heterogeneous node classification model with MSE (Mean Squared Error) or KL loss to learn the label distribution?
A1: Simply replacing the output layer with an MSE or KL loss to learn the label distribution, as the reviewer suggests, will not work well due to two prominent challenges imposed by network heterogeneity. First, in a heterogeneous graph, the label distribution of each node is influenced by its neighboring nodes, which can vary along multiple factors, including node and edge types, nodal contents, and topological features. This complicates the message-passing mechanism, because aligning those varying factors results in a combinatorial issue. Second, nodes sharing similar contents are frequently positioned far apart in heterogeneous graphs, separated by nodes of other types, resulting in substantial topological distances between them. During message-passing, the influence of distantly positioned nodes is substantially diminished, steering the LDL model to prioritize individual nodal vectors and overlook graph topology.
Our LDL research aims to resolve these heterogeneous graph challenges with two building blocks. First, we propose to use multiple weighted meta-paths, in addition to the regular KL divergence loss, for class distribution learning. The weights are learned, allowing each meta-path to individualize its message-passing without being negatively impacted by the different node and edge types. Second, HGDL uses a consistency-aware graph transformer architecture to harmonize local topology and global feature information. The graph transformer aligns nodal features with the optimal topology learned through weighted meta-paths, capturing both local neighborhood information and global feature similarities. This harmonization is crucial for ensuring that nodes with similar features, even if topologically distant, are represented in a way that reflects their content-based similarities.
Q2: What is the contribution of this paper to LDL research?
A2: From the label distribution learning perspective, our research conveys four key contributions, as follows.
(1) Our research is the first to generalize LDL to heterogeneous networks. The practical implications, such as urban functionality prediction and drug function prediction (presented in Sec 6.1), and the theoretical analysis have, to our knowledge, not yet been explored by any contemporary research.
(2) Our research provides a simple yet effective way of modeling message-passing in heterogeneous networks for label distribution learning. It also offers a transparent interpretation of which meta-paths play a more important role in the final outcome.
(3) Our theoretical study not only offers assurance of the proposed model's performance, but also paves the way to enriching the theoretical understanding of label distribution learning for networked data.
(4) We have created new datasets to validate the algorithm's performance. Both our data and algorithms are published to stimulate future research in label distribution learning for networks, which has thus far been under-explored.
Q3: Why use PAC-Bayes theory to analyze the error bound instead of PAC theory?
A3: We analyzed the algorithm performance in a PAC-Bayes regime instead of PAC for three main reasons. (1) Traditional PAC learning theory focuses on a single, deterministic model/hypothesis, which often leads to worst-case analysis. In contrast, PAC-Bayes extends it by integrating a Bayesian perspective, offering probabilistic bounds over a distribution of hypotheses. PAC-Bayes gauges how well a learner can generalize from a prior (model-dependent) to a posterior (data-dependent), thereby providing insights into the learning process that are not captured by PAC theory alone. This often results in tighter and more informative generalization bounds, especially for parameter-rich models such as neural networks [1].
(2) GNNs are inherently complex, modeling graph data with non-linear relationships. The high dimensionality and complexity of GNN models make PAC-Bayes particularly suitable for their analysis, as also indicated by [2]. In particular, PAC-Bayes lends insight into the interplay of model choice (prior) and data observation (posterior), allowing specific model and data properties to be integrated into the resultant bounds. Thanks to this, our analysis establishes a correlation between the generalization error and the maximum degree of the searched meta-path graph in HGDL, as detailed in Theorem 2. Traditional PAC theory, which mainly assumes the underlying data are IID, lacks the capacity to incorporate the properties of graph data into the analysis.
(3) PAC-Bayes leverages KL-divergence as an information-theoretic metric, which is well-suited for measuring distances in continuous spaces, such as those encountered in our LDL modeling. Our approach to quantifying discrepancies between label distributions relies on KL-divergence, making PAC-Bayes a natural fit for our analysis. This alignment simplifies our derivation and ensures that our theoretical insights align with the empirical results.
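For context, a canonical McAllester-style PAC-Bayes bound takes the following form (this is the generic i.i.d. statement, not the paper's Theorem 2, which adapts the regime to graph data): for any prior $P$ fixed before seeing the data and any posterior $Q$, with probability at least $1-\delta$ over a sample of size $m$,

```latex
\mathbb{E}_{h \sim Q}\big[L(h)\big]
\;\le\;
\mathbb{E}_{h \sim Q}\big[\hat{L}(h)\big]
+ \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{m}}{\delta}}{2m}},
```

where $L$ and $\hat{L}$ denote the true and empirical risks. The $\mathrm{KL}(Q \,\|\, P)$ term is the slot through which model- and data-dependent quantities (such as the maximum degree of the searched meta-path graph in Theorem 2) enter the bound; classical PAC bounds offer no analogous mechanism.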
[1] Neyshabur, Behnam, Srinadh Bhojanapalli, and Nathan Srebro. "A pac-bayesian approach to spectrally-normalized margin bounds for neural networks." ICLR 2018.
[2] Liao, Renjie, Raquel Urtasun, and Richard Zemel. "A pac-bayesian approach to generalization bounds for graph neural networks." arXiv 2020.
This paper studies the problem of heterogeneous graph label distribution learning. To deal with it, this paper proposes an HGDL method that optimizes meta-path graph topology and aligns it with nodal features for consistent message-passing, backed by theoretical support. Experimental results on five datasets demonstrate the effectiveness of the proposed model.
Strengths
S1. This paper is the first to investigate the LDL problem in heterogeneous graphs, which seems interesting.
S2. HGDL effectively combines meta-path aggregation and transformer-based methods to ensure consistent node label distribution learning, validated by both theoretical analysis and empirical studies.
S3. The contributions are clearly written, with codes and datasets provided for reproducibility and practical utility.
Weaknesses
W1. The intuitions behind the techniques lack explanation. More details can be found in Q1 and Q2.
W2. The approach section is somewhat difficult to follow and could benefit from improved presentation.
W3. It would be beneficial to include baselines for the LDL problem, such as GLDL [1]. While these methods are not specifically designed for heterogeneous graphs, it is still important to treat heterogeneous graphs as homogeneous ones and apply these methods for comparison. This will provide a more comprehensive evaluation of the proposed approach.
W4. This paper compares only three baselines, which are relatively dated for heterogeneous graph learning. For example, GCN was published in 2017, HAN in 2019, and SeHGNN in 2023. This makes it challenging to convincingly demonstrate the advantages of the proposed HGDL. Thus, it is essential to include more recent baselines, such as HINormer [2].
References:
[1] Y. Jin, R. Gao, Y. He, and X. Zhu, "GLDL: Graph Label Distribution Learning," in Proceedings of the 38th Annual AAAI Conference on Artificial Intelligence, 2024.
[2] Q. Mao, Z. Liu, C. Liu, and J. Sun, "HINormer: Representation Learning on Heterogeneous Information Networks with Graph Transformer," in Proceedings of the ACM Web Conference, 2023.
Questions
Q1. Why only search for meta-paths connecting nodes of the target type rather than aggregating information from neighbors of different types?
Q2. Why can the design in Section 4.2 effectively capture global feature consistencies? Could you provide more detailed explanations?
Q3. Could you provide an explanation for why HGDL performs worse on the CLD metric?
Limitations
Yes.
We thank the reviewer for the constructive comments. We address the reviewer's concerns below one by one in a Q&A fashion.
Q1: Why only search for meta-paths connecting nodes of the target type rather than aggregating information from neighbors of different types?
A1: We justify our meta-path aggregation design as follows. Assume we have node types T, A, and B, with T being the target type. We take the reviewer's question to be: why do we only consider meta-path types like T-T, T-A-T, T-A-B-A-T, etc., and not information aggregation via other paths such as A...A, B...B, T..A..B, etc.? We answer this question from two perspectives.
First, in a heterogeneous network with different types of nodes, each node type may have its own nodal feature space, and aggregating information across node types is a major challenge in this setting. One may think of treating all node types equally by mapping the different feature spaces onto one shared latent subspace on which aggregation can be carried out. However, this solution would require a very high-dimensional adjacency matrix encoding the topology of all node types, with dense node connectivity, making it both computationally and memory demanding. In contrast, our proposed solution leverages meta-paths to yield homogeneous graphs over the target type only. This results in sparse meta-path graphs, alleviating computational overhead and preventing over-smoothed node embeddings during message-passing [1].
Second, in our LDL modeling, the main objective is to predict label distributions for target nodes. Meta-paths starting from and ending at target nodes provide a clear model of how information within the neighborhood substructures of those nodes is aggregated. Other paths that do not involve the target node, such as A-B-A, B-A-B, etc., do not serve this goal directly, and there are combinatorially many of them, which would make our model overly complicated. Although exhausting all such meta-path combinations is possible in general, the homogeneous graphs obtained from them would again incur inconsistent feature spaces; e.g., consider two graphs resulting from T-A-T and A-A, whose feature dimensions equal those of the T and A node types, respectively. Merging these graphs is still challenging.
[1] Kenta Oono and Taiji Suzuki. "Graph Neural Networks Exponentially Lose Expressive Power for Node Classification." ICLR 2020.
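To make the meta-path construction concrete, here is a minimal sketch under toy assumptions (the biadjacency matrix `A_TA` and all sizes are illustrative, not from the paper): composing the two hops of a T-A-T meta-path yields a homogeneous graph that is square over the target nodes only and typically sparse.

```python
import numpy as np

# Toy biadjacency matrix for node types T (4 target nodes) and
# A (3 auxiliary nodes): A_TA[i, j] = 1 iff target i links to auxiliary j.
A_TA = np.array([[1, 0, 0],
                 [1, 1, 0],
                 [0, 1, 0],
                 [0, 0, 1]])

# T-A-T meta-path graph: two target nodes are adjacent iff they share
# at least one A-type neighbor (composition of the two hops).
A_TAT = (A_TA @ A_TA.T > 0).astype(int)
np.fill_diagonal(A_TAT, 0)  # drop trivial self-loops
```

The result is a 4x4 homogeneous adjacency matrix over target nodes with a single feature space; paths such as A-B-A would instead produce graphs over non-target types, whose feature spaces differ.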
Q2: Why can the design in Section 4.2 effectively capture global feature consistencies?
A2: In our paper, global feature consistency refers to a regularization ensuring that nodes with similar original feature inputs, albeit topologically faraway, have similar embedding vectors. To model this explicitly, we draw insights from the attention mechanism in graph transformers to construct a feature adjacency matrix, as stated in Eq. (2). Intuitively, the attention scores between nodes of similar content are larger than those between nodes with unrelated contents. The Hadamard product between this feature adjacency matrix and the weighted meta-path graph balances the (global) feature consistency against the (local) topology consistency. We shall add more justification for the design in Section 4.2 in our camera-ready version.
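As a hedged numerical sketch of this design (the projection matrices `W_q`, `W_k` and all sizes are hypothetical; the paper's Eq. (2) defines the exact form), an attention-style feature adjacency matrix can be combined with a toy meta-path topology via the Hadamard product:

```python
import numpy as np

rng = np.random.default_rng(0)
N, F, d = 5, 8, 4                 # nodes, input feature dim, latent dim (toy)
X = rng.normal(size=(N, F))       # nodal content
W_q = rng.normal(size=(F, d))     # query projection (hypothetical)
W_k = rng.normal(size=(F, d))     # key projection (hypothetical)

# Attention-style feature adjacency: content-similar nodes get larger scores.
scores = (X @ W_q) @ (X @ W_k).T / np.sqrt(d)
A_feat = np.exp(scores - scores.max(axis=1, keepdims=True))
A_feat /= A_feat.sum(axis=1, keepdims=True)   # row-wise softmax

# Toy weighted meta-path graph encoding local topology over target nodes.
A_meta = (rng.random((N, N)) < 0.4).astype(float)

# Hadamard product: an edge survives only where topology exists, and its
# weight reflects global content similarity.
A = A_feat * A_meta
```

The design choice here is that each retained topological edge is re-weighted by content similarity, which is how the two consistencies are balanced in a single matrix.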
Q3: Could you provide an explanation for why HGDL performs worse on the CLD metric?
A3: This is mainly because of the inconsistency between the KL divergence and Clark distance (CLD) measurements; the same phenomenon is observed and reported in Table 2 of [2]. A concrete example demonstrates this inconsistency. Let y = [0.01, 0.01, 0.98], ŷ₁ = [0.05, 0.05, 0.9], and ŷ₂ = [0.03, 0.07, 0.9], where ŷ₁ and ŷ₂ denote the predicted results of model 1 and model 2, respectively. The (KL divergence, CLD) pairs of model 1 and model 2 are (0.4467, 0.0222) and (0.2528, 0.023), respectively, implying a tradeoff region. Generally, there exists a tradeoff region between Clark distance and KL divergence at small probability values. We visualize this tradeoff in Figure 2 of the uploaded PDF file, where the black and blue curves delineate the loss trends of KL divergence and CLD, respectively; the tradeoff region of the two loss functions falls in the range [0.019, 0.1] along the x-axis. This tradeoff explains the cases where HGDL does not attain superior CLD performance but still excels in terms of KL divergence. In general, our HGDL algorithm outperforms the competitors in both KL and CLD in most cases.
[2] X. Geng, "Label Distribution Learning," in IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 7, pp. 1734-1748, 2016.
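The metric disagreement can be reproduced directly. The sketch below evaluates the example vectors under one common convention, KL(y || ŷ) and the standard Clark distance (the rebuttal's reported constants evidently use a different convention, so the absolute numbers differ, but the two metrics still rank the two models oppositely):

```python
import numpy as np

def kl(y, y_hat):
    """KL divergence KL(y || y_hat) between two label distributions."""
    return float(np.sum(y * np.log(y / y_hat)))

def clark(y, y_hat):
    """Clark distance between two label distributions."""
    return float(np.sqrt(np.sum((y - y_hat) ** 2 / (y + y_hat) ** 2)))

y  = np.array([0.01, 0.01, 0.98])   # ground truth from the example above
y1 = np.array([0.05, 0.05, 0.90])   # model 1's prediction
y2 = np.array([0.03, 0.07, 0.90])   # model 2's prediction

# Under this convention, KL prefers model 1 while Clark prefers model 2:
# the two rankings disagree, which is exactly the tradeoff discussed above.
print(kl(y, y1) < kl(y, y2), clark(y, y1) > clark(y, y2))  # prints: True True
```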
Q4: Need a comparison with GLDL.
A4: We have completed the comparative study and presented the results in Table 1 of the uploaded PDF file. Please refer to A1 to Reviewer EYc1 for more details.
Q5: Include more recent baselines such as HINormer.
A5: We have carried out experiments on the DRUG and DBLP datasets using HINormer as an additional competitor. Results are presented in Table 1 of the uploaded PDF file. We only use target node features, applied to all baselines, to ensure a fair comparison. Due to time limitations, we searched the hidden dimension in [32, 64, 128, 256], a further hyperparameter in [0.5, 1], the temperature in [0.5, 1], and the number of GNN layers in [2, 3]. We observe that HINormer does not outperform the other baselines. One reason might be the inconsistency between its node feature aggregation and relation encoding schemes: according to the original code provided by the HINormer authors, the node vectors of all but the target nodes are set to zero. Simply replacing BCE with KL divergence does not make HINormer perform well for LDL. We will include the results in the camera-ready version.
Thanks for your rebuttal. Most of my concerns have been addressed, I keep my rating intact.
This paper introduces a novel approach to Label Distribution Learning (LDL) specifically tailored for heterogeneous graphs, addressing the inherent complexities and challenges associated with this domain. By highlighting the necessity of LDL in heterogeneous settings and outlining the unique challenges involved, the paper lays a foundation for advancing research in this emerging field. The proposed method is underpinned by a robust theoretical framework, providing a coherent rationale for its design and implementation. Experimental validation across five distinct datasets and evaluation using six metrics demonstrate the method's effectiveness over established baselines.
Strengths
1. This paper is the first to identify the necessity of Label Distribution Learning (LDL) on heterogeneous graphs, addressing the unique challenges and complexities associated with this task.
2. The proposed method is well-grounded in theory, providing a solid foundation for its design and implementation. The authors present a clear and thorough theoretical framework that supports the efficacy and rationale behind their approach, enhancing the credibility and robustness of the method.
3. The effectiveness of the proposed method is demonstrated through extensive experiments on five diverse datasets and six different metrics. The results consistently show that the method outperforms the baselines, indicating its superiority and practical applicability across various scenarios and evaluation criteria.
Weaknesses
1. The paper categorizes current LDL methods into three distinct types in the related work section. However, the experimental evaluation only includes three models (GCN, HAN, and SeHGNN) with the KL-divergence loss function as baselines. This limited selection raises questions about whether the experiments sufficiently demonstrate the proposed method's superiority over the broader range of existing LDL methods.
2. The study employs γ as a hyperparameter in the proposed model but does not provide any analysis of how variations in this parameter affect the model's performance. A thorough analysis of the hyperparameter's impact would offer valuable insights into the model's sensitivity and robustness.
3. The paper does not present a detailed comparison of the computational time required for the proposed method versus the baselines. Without this information, it is unclear whether the proposed method is more efficient in terms of runtime.
4. The motivation of this paper is not very convincing. It seems that this work was done just because LDL has not yet been applied to heterogeneous graphs. It is not clear why LDL can solve key problems in heterogeneous graphs.
Questions
I don't have any questions.
Limitations
While the authors provide a complexity analysis and mention that scalability is not the main concern of this paper, they acknowledge that the current model may not scale well with larger datasets. Although a modification to improve scalability is suggested, it is left for future work, indicating that the current version might struggle with large-scale data.
We thank the reviewer for the constructive comments. We address the reviewer's concerns below one by one in a Q&A fashion.
Q1: Limited comparison to the existing LDL methods.
A1: We would like to kindly point out that existing LDL methods were not tailored for handling graphs (networked data). In fact, GLDL is the only method specifically designed for label distribution learning on graph data, yet it has been restricted to homogeneous networks.
We have supplemented a comparative study with GLDL on the DBLP and DRUG datasets, with results presented in Table 1 of the uploaded PDF file. The results indicate that GLDL outperforms the other baselines but is still inferior to our proposed HGDL method. This is mainly because, although GLDL and HGDL both consider label distributions in their learning processes, HGDL individualizes the learning of weights associated with different meta-paths and integrates them to construct an optimal graph topology homogenization. In contrast, GLDL generates only one single meta-path for LDL by heuristics, which may end up with suboptimal solutions.
We will include these comparative results with GLDL and supplement a discussion in our camera-ready.
Q2: Sensitivity analysis on the hyperparameter γ.
A2: We present the sensitivity analysis on the hyperparameter γ, which controls the impact of the second term of Eq. (3). We illustrate the trends of the KL divergence between the predicted and true label distributions w.r.t. the value of γ on all five datasets in Figure 1. Specifically, different values of γ have been applied, including 0, 1e-5, 1e-4, 1e-3, 1e-2, and 0.1. We observe that the lowest KL divergence occurs at various choices of γ across datasets, e.g., at 1e-3 on YELP and 1e-5 on DBLP. These non-zero optimal values of γ justify the design of the second term in Eq. (3). We will include the sensitivity analysis on γ in our camera-ready version.
Q3: A comparison of the computational time to show the efficiency of the proposed approach against baselines.
A3: We have completed a runtime comparison, presented in Table 2 of the uploaded PDF file. Each method has been trained for 100 epochs. Note that, due to their different neural architecture designs, models may take varying numbers of training epochs to converge; it has thus become a norm to compare runtime performance over a fixed number of epochs (or per-epoch training time), as also done in [1]. We observe that our proposed HGDL algorithm enjoys good scalability, attaining runtime results comparable to the GCN baseline. Compared to other baselines such as SeHGNN, HGDL is in general much faster, indicating that our method is more efficient for learning on heterogeneous networks. This can mainly be attributed to the automated learning of weights that control individual meta-paths and integrate them into an optimal graph topology homogenization for label distribution learning. We will add the runtime comparison in our camera-ready version.
[1] Hao Yuan, Yajiong Liu, Yanfeng Zhang, Xin Ai, Qiange Wang, Chaoyi Chen, Yu Gu, and Ge Yu. "Comprehensive Evaluation of GNN Training Systems: A Data Management Perspective," VLDB 2024.
Q4: The motivation of the research should be further clarified. Why can LDL solve the key problems imposed by heterogeneous graphs?
A4: Indeed, our research is mainly motivated by the fact that no LDL model has been tailored for heterogeneous graphs. We note two prominent challenges imposed by network heterogeneity that motivate our research, as follows. First, in a heterogeneous graph, the label distribution of each node is influenced by its neighboring nodes, which can vary along multiple factors, including node and edge types, nodal contents, and topological features. This complicates the message-passing mechanism, because aligning those varying factors results in a combinatorial issue. Second, nodes sharing similar contents are frequently positioned far apart in heterogeneous graphs, separated by nodes of other types, resulting in substantial topological distances between them. During message-passing, the influence of distantly positioned nodes is substantially diminished, steering the LDL model to prioritize individual nodal vectors and overlook graph topology.
Our LDL research aims to resolve these heterogeneous graph challenges with two building blocks. First, we propose to use multiple weighted meta-paths, in addition to the regular KL divergence loss, for class distribution learning. The weights are learned, allowing each meta-path to individualize its message-passing without being negatively impacted by the different node and edge types. Second, HGDL uses a consistency-aware graph transformer architecture to harmonize local topology and global feature information. The graph transformer aligns nodal features with the optimal topology learned through weighted meta-paths, capturing both local neighborhood information and global feature similarities. This harmonization is crucial for ensuring that nodes with similar features, even if topologically distant, are represented in a way that reflects their content-based similarities.
In addition, from a data perspective, a label distribution is often more informative than a single class label. Consider our motivating example in the manuscript: delineating a local urban region with a distribution over multiple city functional classes leads to a richer understanding of the composition of that region, such as the number of buildings for various civil purposes, including housing, healthcare, education, etc. Examples of such applications abound, including business networks (YELP), citation networks (ACM), and protein-disease networks (DRUG), all of whose datasets are studied in our experiments.
Thanks for your response. I keep my score.
This paper advances Label Distribution Learning (LDL) into the realm of graph domains, specifically addressing the heterogeneous graph label distribution learning (HGDL) problem. The authors highlight that graph heterogeneity, reflected in node types, node attributes, and neighborhood structures, poses significant challenges for generalizing LDL to graphs. To tackle these challenges, the authors propose a new learning framework with two key components:
1. Proactive graph topology homogenization: this component focuses on learning optimal information aggregation between meta-paths to address node heterogeneity before the embedding learning phase.
2. Topology and content consistency-aware graph transformer: this component uses an attention mechanism to learn the consistency between meta-paths and node attributes, ensuring that network topology and nodal attributes are equally emphasized during label distribution learning.
Strengths
1. The introduction of a framework that proactively addresses graph heterogeneity and incorporates consistency-aware mechanisms is a novel contribution to the field.
2. The use of KL-divergence and additional constraints in an end-to-end learning process provides a robust solution for label distribution learning on graphs.
3. The theoretical and experimental validations are thorough, and the availability of code and datasets promotes transparency and reproducibility.
Weaknesses
- The paper lacks an analysis of the algorithm's complexity, and the framework diagram in Figure 2 is somewhat cluttered.
- The authors' motivation is to address the challenges posed by graph heterogeneity, but these challenges are not clearly described in the introduction.
- For optimization formula (3), the authors need to provide a detailed explanation of the advantages of this design.
Questions
See weaknesses.
Limitations
Yes
We thank the reviewer for the constructive comments. We address the reviewer's concerns below one by one in a Q&A fashion.
Q1: The paper lacks an analysis of the algorithm's complexity.
A1: We have conducted the complexity analysis, which has been deferred to Appendix H.3 due to page limits. We briefly summarize the results of our analysis for the reviewer's convenience. Specifically, our method involves learning a new adjacency matrix from all meta-paths. Without any decomposition, the graph topology homogenization stage alone would require at least O(N^2) learnable parameters, where N denotes the number of nodes. With the attention mechanism, our proposed architecture instead entails O(KdN) parameters, where K is the number of meta-paths used and d is the dimension of the latent space; omitting constants, this amounts to O(N) parameters. Preparing the feature adjacency matrix that mitigates the feature-topology inconsistency requires O(dF) learnable parameters, where d and F represent the dimensions of the node embedding and the input nodal content, respectively; this reduces to a constant by screening out the factors d and F, which are negligible compared to N.
Overall, the number of learnable parameters of our proposed model is O(N), which seemingly scales linearly with the total number of nodes. However, such linearity is not necessary if message-passing at each iteration is conducted among a subset of nodes instead of all nodes, e.g., through sampling as done by GraphSAGE. As such, we can further reduce the computational complexity without being bottlenecked by sizeable graphs with large numbers of nodes.
Q2: Figure 2 somewhat cluttered.
A2: We have improved the visual clarity of Figure 2 and included it in the uploaded PDF file as Figure 3.
Q3: The challenges posed by graph heterogeneity have not been clearly described in the introduction.
A3: We will enrich the description of the two challenges using the motivating example shown in Figure 1 of the manuscript, correlating the challenges to the URBAN dataset as follows. Challenge (1): Graph heterogeneity complicates the message-passing between nodes of a specific type, as the label distributions of those nodes are influenced by neighboring nodes that may vary in type, content, and topological features. For example, R3 and R4 are two immediately neighboring residence regions but are connected to different functional areas, i.e., R3 connects to Leisure and Service areas whereas R4 connects to a Service region only. Thus, the urban functionality (i.e., label distribution) of R3 and R4 eventually differs.
Challenge (2): Graph topology and nodal features may suggest inconsistent label distributions, where nodes sharing similar contents are positioned far apart, separated by various nodes of different types in the graph. Unlike traditional LDL, which focuses on instance vectors only, an effective LDL model on graphs requires harmonizing nodal contents with topological structures into a unified representation. For example, R2 is topologically farther away from R3 than R4 is; however, comparing their nodal contents, local neighborhood structures, and functionality distributions, R2 is more similar to R3 than R4 is.
We will add this motivating example in our camera ready.
Q4: For Eq. (3), please explain the advantages of this objective design.
A4: The rationale of our design is as follows. The optimization objective in Eq. (3) has two KL divergence terms. (1) The first term minimizes the KL divergence between the predicted and ground-truth label distributions (where the latter exist), enforcing the predictions to match the ground truth. (2) The second term maximizes the KL divergence between the weight distribution over all meta-paths and a uniform distribution. The intuition is to encourage diversity among the learned weights associated with different meta-paths: the KL-based objective otherwise tends toward a uniform solution, encouraging all weights to be learned similarly. The second term counters this: an ill-defined, noisy meta-path will see its negative impact on the model performance diminished, whereas meta-paths yielding better performance will be promoted. This ensures that the important meta-path information dominates the resultant adjacency matrix, leading to optimal meta-path topology harmonization.
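A minimal NumPy sketch of this two-term design (function and variable names are hypothetical; the paper's Eq. (3) defines the precise form): the fitting term is the KL divergence to the ground-truth distribution, and γ times the KL divergence between the meta-path weights and the uniform distribution is subtracted, so diverse weights lower the objective.

```python
import numpy as np

def kl_div(p, q):
    """KL divergence KL(p || q) for strictly positive distributions."""
    return float(np.sum(p * np.log(p / q)))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def hgdl_objective(pred, true, path_logits, gamma=1e-3):
    # Term 1: fit the predicted label distribution to the ground truth.
    fit = kl_div(true, pred)
    # Term 2 (subtracted): push the meta-path weights w away from uniform,
    # so informative meta-paths can dominate the homogenized topology.
    w = softmax(path_logits)
    uniform = np.full_like(w, 1.0 / w.size)
    diversity = kl_div(w, uniform)
    return fit - gamma * diversity
```

With γ = 0 the objective reduces to the plain KL fitting loss; with γ > 0, moving the meta-path weights away from uniform strictly decreases the objective, matching the intuition above.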
In addition to these intuitions, we present the sensitivity analysis on the hyperparameter γ as empirical evidence. γ controls the impact of the second term of Eq. (3). We illustrate the trends of the KL divergence between the predicted and true label distributions w.r.t. the value of γ on all five datasets in Figure 1. Specifically, different values of γ have been applied, including 0, 1e-5, 1e-4, 1e-3, 1e-2, and 0.1. We observe that the lowest KL divergence occurs at various choices of γ across datasets, e.g., at 1e-3 on YELP and 1e-5 on DBLP. These non-zero optimal values of γ justify the design of the second term in Eq. (3).
We will supplement the intuition behind the design of Eq. (3) and include the sensitivity analysis on γ in our camera-ready version.
Thank you for your reply, which solved most of my concerns, and I keep my score.
We thank the reviewers for their positive and constructive comments. Here, we summarize the major concerns and our responses.
(1) Add GLDL and HINormer as new rival models (suggested by Reviewers EYc1 and pXs2).
- We have added new comparative results with the two models, presented in Table 1 in the uploaded PDF file. From the results, we can observe that our HGDL model still outperforms GLDL and HINormer in our benchmark datasets.
(2) Add a sensitivity analysis on the hyperparameter γ (suggested by Reviewers FJkr and EYc1).
- We have conducted this sensitivity analysis, with results illustrated in Figure 1 of the uploaded PDF file. We note that the optimal choice of γ varies across datasets, which substantiates the impact of the regularization term in the overall objective design.
(3) Add runtime comparison to evaluate the algorithm efficiency (suggested by Reviewers FJkr and EYc1).
- We have compared the runtime performance of our algorithm against the compared methods, with results documented in Table 2 of the uploaded PDF file. The results show that the efficiency of our HGDL approach is on a par with the GCN and HAN methods and faster than graph transformer models such as SeHGNN.
Below, we provide a rebuttal to each individual reviewer's concerns in a Q&A fashion. If any points require further clarification or additional information to facilitate the decision-making process, please do not hesitate to let us know. We are more than willing to provide further explanations or details.
This paper advances label distribution learning (LDL) into the realm of graph domains, specifically addressing the heterogeneous graph label distribution learning (HGDL) problem. The authors propose an HGDL method that optimizes meta-path graph topology and aligns it with nodal features for consistent message-passing. The proposed method is underpinned by a robust theoretical framework, providing a coherent rationale for its design and implementation. Experimental validation across five distinct datasets and evaluation using six metrics demonstrate the method's effectiveness over established baselines. The novelty and contribution of this paper have been acknowledged by all the reviewers. Thus, I would recommend accepting this paper.