PaperHub
Rating: 5.8/10 · Poster · 4 reviewers (min 5, max 7, std 0.8)
Individual ratings: 5, 7, 6, 5
Confidence: 4.0 · Correctness: 2.8 · Contribution: 2.8 · Presentation: 2.3
NeurIPS 2024

Lambda: Learning Matchable Prior For Entity Alignment with Unlabeled Dangling Cases

OpenReview · PDF
Submitted: 2024-05-01 · Updated: 2024-11-06
TL;DR

Pioneering work on entity alignment with unlabeled dangling entities, using a novel PU learning algorithm.

Abstract

Keywords
Knowledge Graph · Entity Alignment · Positive-Unlabeled Learning · Dangling Cases

Reviews & Discussion

Review (Rating: 5)

The paper proposes to match entities in dangling settings, where an entity may not have a link to any other entity. A new dataset, GA16K, is proposed.

Strengths

The task is important in the knowledge graph field.

The method achieves better F1 in the experiments.

There are proofs for the correctness of the algorithm.

Weaknesses

Writing should be improved. For example, 'GA16K' is never defined in the main paper; its first clear definition appears in the appendix.

Baselines are old. Many related works are missing (https://arxiv.org/abs/2210.10436, https://aclanthology.org/2022.acl-long.405/, https://dl.acm.org/doi/abs/10.1145/3404835.3462870, https://aclanthology.org/2021.emnlp-main.226/)

The GA16K dataset seems to be multi-modal, but there are well-established datasets for multi-modal entity matching (https://dl.acm.org/doi/10.1145/3534678.3539244, https://arxiv.org/abs/2212.14454) that are not mentioned or used in the experiments.

No large-scale experiments are conducted. The paper should conduct experiments on large-scale datasets (https://arxiv.org/abs/2108.05211) to show the scalability of the proposed method.

Questions

See above

Limitations

No limitations section is provided.

Author Response

We appreciate the reviewer's comments, and the corresponding discussion will be added in the revision. However, we would still like to clarify some misunderstandings about our work:

  1. In terms of baseline comparison, we believe our comparison is fair, since the problem we focus on is alignment with unlabeled dangling nodes.

  2. Multi-modal and literal-based tasks are orthogonal to ours, so the comments on our experimental setup reflect some misunderstandings.

  3. For entity alignment on large-scale graphs, we have presented experimental results verifying the scalability of our method on large-scale entity alignment with dangling entities.

We explain them one by one in the following.

Baselines are old.

In selecting the baselines for our experiments, we carefully considered several factors. Although the GNN-based method of Sun et al. [1] is the most closely related work, a direct comparison is unfair, as we do not use any **dangling label** as they do. In our setting, we also avoid any **side information**, such as literal information, to prevent name bias [6,7]. Our goal is thus to compare against baselines relying purely on **graph structures**.

Following the above principles, we consider that none of the works mentioned by the reviewer can serve as a fair baseline. Specifically, EASY [2] and SEU [3] are literal-based methods, which use additional side information for alignment. LightEA [4] is a non-neural method based on a label propagation algorithm with no consideration of dangling entities. DATTI [5] is a plug-in that can be added onto Dual-AMN [8], RSNs [9], and TransEdge [10], whose prototypes are already used as baselines in our paper.

**References:**

[1] Sun Z, Chen M, Hu W. Knowing the No-match: Entity Alignment with Dangling Cases[C]//Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021: 3582-3593.

[2] https://dl.acm.org/doi/abs/10.1145/3404835.3462870

[3] https://aclanthology.org/2021.emnlp-main.226/

[4] https://arxiv.org/abs/2210.10436

[5] https://aclanthology.org/2022.acl-long.405/

[6] Liu X, Wu J, Li T, et al. Unsupervised entity alignment for temporal knowledge graphs[C]//Proceedings of the ACM Web Conference 2023. 2023: 2528-2538.

[7] Zhang Z, Liu H, Chen J, et al. An Industry Evaluation of Embedding-based Entity Alignment[C]//Proceedings of the 28th International Conference on Computational Linguistics: Industry Track. 2020: 179-189.

[8] Mao X, Wang W, Wu Y, et al. Boosting the speed of entity alignment 10×: Dual attention matching network with normalized hard sample mining[C]//Proceedings of the Web Conference 2021. 2021: 821-832.

[9] Guo L, Sun Z, Hu W. Learning to exploit long-term relational dependencies in knowledge graphs[C]//International conference on machine learning. PMLR, 2019: 2505-2514.

[10] Sun Z, Huang J, Hu W, et al. Transedge: Translating relation-contextualized embeddings for knowledge graphs[C]//The Semantic Web–ISWC 2019: 18th International Semantic Web Conference, Auckland, New Zealand, October 26–30, 2019, Proceedings, Part I 18. Springer International Publishing, 2019: 612-629.

The GA16K dataset seems to be multi-modal.

There are some misunderstandings to clarify. GA16K is **not** multi-modal. Although the original GAKG is a multi-modal knowledge graph, the extracted GA16K consists of pure graph structure in the form of URL links and their triples.

Our problem has nothing to do with multi-modality, and there are **no dangling** entities in the multi-modal datasets mentioned by the reviewer [1] [2]. The multi-modal problem can be studied as an independent one.

**References:**

[1] https://dl.acm.org/doi/10.1145/3534678.3539244

[2] https://arxiv.org/abs/2212.14454

No large-scale experiments are conducted.

LargeEA [1], mentioned by the reviewer, is excluded from the baselines due to its literal-based nature and its irrelevance to our dangling problem. As for large-scale datasets, DBP2.0, used in our evaluation, is one with dangling entities: it has 943,894 entities in total, more than 20,000 relations, and 3,000,000 triples, approximately the same order of magnitude as the datasets in [1].

The scalability test has been conducted on DBP2.0. More experimental results can be found in Appendix **H.4 Efficiency**, covering training time, inference time, and GPU and CPU memory consumption. We hope our response addresses the reviewer's concern.

**References:**

[1] https://arxiv.org/abs/2108.05211

Comment

Thank you for your response. I have raised my score accordingly. I have read the authors' response, and it addresses most of my concerns. However, I still believe the authors can introduce newer baselines such as LightEA, even if it's not specifically designed for dangling cases. I am not sure whether the authors have tried LightEA, but it is a very strong baseline for the problem of interest. Also, creating a dangling dataset from an existing dataset is easy (as shown in the no-match paper). The authors may also need to revise line 589, the description of GA16K: that paragraph explicitly states that GA16K is derived from a multi-modal KG, which may confuse readers. I would advise removing 'multi-modal' from the paragraph entirely, since it is not necessary to mention it.

Comment

Thanks for your acknowledgement and further suggestion.

We have carefully checked the LightEA paper and hope the following explanation helps clarify the differences between LightEA and Lambda.

However, I still believe that the authors can introduce newer baselines such as LightEA, even if it's not specifically designed for dangling cases.

a. The experimental metrics differ. Lambda is a two-stage method: in the first stage, dangling entities are removed by a classifier, and only the remaining entities are aligned by the encoder in the second stage. An alignment failure can thus be attributed to either the encoder or the classifier, so the experimental metrics must consider both perspectives, as in L617-645 of our paper. The single-stage method LightEA searches for alignments directly over all entities and hence cannot be appropriately evaluated under Lambda's metrics (see the sketch after point b).

b. The costs of alignment differ. LightEA employs a search-based method on embeddings of dimension 2048, and the larger the dimension, the higher the alignment accuracy. In contrast, Lambda uses embeddings of dimension only 128 to retrieve aligned pairs.
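To make point (a) concrete, here is a minimal sketch of a two-stage evaluation pipeline; all names are illustrative, not the released Lambda code:

```python
import numpy as np

def two_stage_alignment(emb_src, emb_tgt, dangling_prob, threshold=0.5):
    """Stage 1: a dangling classifier removes source entities predicted to
    have no counterpart. Stage 2: the survivors are aligned by nearest-
    neighbour search in the shared embedding space."""
    matchable = np.where(dangling_prob < threshold)[0]  # stage 1: filter
    sims = emb_src[matchable] @ emb_tgt.T               # dot-product similarity
    best = sims.argmax(axis=1)                          # stage 2: retrieve
    return dict(zip(matchable.tolist(), best.tolist()))
```

Under such a split, a missed alignment can stem from the classifier (stage 1) or the encoder (stage 2), which is why Lambda's metrics report both, whereas a single-stage method like LightEA only exposes the final retrieval step.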

Also, creating a dangling dataset from an existing dataset is easy (as shown in the no-match paper).

Having a proportion of dangling nodes does not make a dataset appropriate for testing alignment methods. As pointed out in the no-match paper [1], if the distribution of dangling nodes is entirely different from that of matchable ones, telling them apart is too straightforward; a dataset is only challenging if the node degree distribution of dangling entities is close to that of the matchable ones, which DBP2.0 satisfies. Hence it is not that easy to craft a dangling dataset from an existing one (a quick check of this criterion is sketched below). Nevertheless, we will construct other dangling datasets given more time in the future.
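For reference, the "close degree distribution" criterion can be checked mechanically; a hypothetical sketch (the helper and data layout are ours, not from either paper):

```python
from collections import Counter
from scipy.stats import ks_2samp

def degree_gap(triples, dangling_ids, matchable_ids):
    """Kolmogorov-Smirnov distance between the degree distributions of
    dangling and matchable entities: the smaller the statistic, the harder
    the two groups are to tell apart, i.e. the more DBP2.0-like the dataset."""
    deg = Counter()
    for head, _, tail in triples:  # (head, relation, tail) id triples
        deg[head] += 1
        deg[tail] += 1
    return ks_2samp([deg[e] for e in dangling_ids],
                    [deg[e] for e in matchable_ids]).statistic
```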

References:

[1] Sun Z, Chen M, Hu W. Knowing the No-match: Entity Alignment with Dangling Cases[C]//Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021: 3582-3593.

Comment

Thank you for your response. However, I must say there are certain issues with your justification. First, you can easily implement a classifier stage and adapt other common methods to the new setting with that classifier; this is done in the no-match paper, and I don't see why it is not possible for LightEA. Second, you cannot directly compare the speed/cost of two methods by their embedding size, especially when LightEA has no back-propagation process. In their paper, LightEA is efficient enough to run the largest dataset within 1 minute.

Regarding the dataset construction part, I am convinced that the authors have provided a detailed explanation.

Comment

We sincerely thank the reviewer for the timely follow-up. During this period, we modified LightEA's code to include dangling entities among the alignment candidates and evaluated the method on DBP2.0. Hits@1 and Hits@10 are evaluated in a similar way to the dangling-unaware methods in our paper, as listed below. We omitted the FR-EN experiments due to limited time. In comparison, Lambda still outperforms LightEA.

| Method | Hits@1 (ZH-EN) | Hits@10 (ZH-EN) | Hits@1 (JA-EN) | Hits@10 (JA-EN) | Hits@1 (FR-EN) | Hits@10 (FR-EN) |
|---|---|---|---|---|---|---|
| LightEA | 60.5% | 82.9% | 61.4% | 84.1% | -- | -- |
| Lambda | 62.6% | 84.7% | 62.1% | 84.0% | 44.1% | 69.3% |

We admit that LightEA is efficient, but we failed to reproduce its running-time overhead, probably due to configuration issues in the Anaconda environment. We will complete the experiments and include the results in the revision.

Review (Rating: 7)

The paper tries to tackle the challenge of entity alignment (EA) with unlabeled dangling cases in knowledge graphs (KGs), where some entities lack counterparts in another KG.

It presents a framework to detect dangling entities and align matchable entities using a GNN-based encoder and a positive-unlabeled learning algorithm, respectively.

It also provides theoretical guarantees for the proposed methods, including unbiasedness, uniform deviation bounds, and convergence.

Experimental results demonstrate the promising performance of the framework over baselines on real-world datasets, even without labeled dangling entities for training.

Strengths

  • The paper addresses the challenge of entity alignment in a dangling-aware context, even when labeled data for dangling entities is unavailable. This practical scenario is explored, and can inspire future work on this task.

  • In my opinion, the proposed method, selective neighborhood aggregation combined with positive-unlabeled learning, holds promise for tackling the problem under investigation.

  • Experimental results across multiple datasets demonstrate that the proposed method can outperform baselines for dangling entity detection and entity alignment in most metrics.

Weaknesses

  • The discussion of related work appears insufficient. I recommend enhancing the appendix with a detailed discussion and comparison. For instance, although the proposed GNN looks similar to Dual-AMN, no explicit discussion of this similarity is currently included.

  • In Table 2, the proposed GNN exhibits significant superiority over MTransE and AliNet. However, in Table 4, the advantage is reduced. Although this could be attributed to various factors, such as dataset characteristics, evaluation metrics, or specific scenarios, it is essential to carefully analyze the experimental setup and consider potential confounding variables to fully understand this discrepancy.

Questions

  • According to Table 2, the proposed GNN greatly outperforms MTransE and AliNet. Why does it not show such a huge advantage in Table 4?

  • (Open question) Is the proposed method applicable to dangling entity detection with labeled data? What about its performance in this case?

Limitations

NA.

Author Response

We appreciate the reviewer's valuable comments and address the reviewer's concerns as follows.

The discussion on related work appears insufficient. For instance, although the proposed GNN looks similar to Dual-AMN, no explicit discussions regarding this similarity are currently included.

Thanks for the advice and we will provide a detailed discussion in the revision.

The differences between the proposed GNN and Dual-AMN include:

**Aggregation**:

  1. We introduce an adaptive dangling indicator $r_{e_j}$ into the GNN to eliminate dangling pollution.
  2. The indicator $r_{e_j}$ is concatenated as part of the entity feature.

**Attention**:

  1. We scale the attention by $r_{e_j}$ to filter out dangling information.

  2. We link relation $r_k$'s embedding $h_{r_k}$ to the adaptive dangling indicator $r_{e_j}$ of the associated entity $e_j$, so the attention in Eq. (2) models the relationship between the relation and the entity.
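A rough PyTorch sketch of how these pieces could fit together (our illustrative reading of the response, not the paper's exact Eq. (2)):

```python
import torch
import torch.nn.functional as F

def selective_aggregation(h_nbr, h_rel, att_logits, r_nbr, r_self):
    """h_nbr [N, d]: neighbour entity features; h_rel [N, d]: embeddings of
    the connecting relations; att_logits [N]: raw attention scores tying each
    relation to its entity; r_nbr [N] and r_self []: learnable adaptive
    dangling indicators (initialized to 1, squashed by tanh)."""
    gate = torch.tanh(r_nbr)                    # indicator of each neighbour
    att = F.softmax(att_logits * gate, dim=0)   # attention scaled to filter dangling info
    agg = (att.unsqueeze(-1) * (h_nbr + h_rel)).sum(dim=0)
    # The entity's own indicator is concatenated into its output feature.
    return torch.cat([agg, torch.tanh(r_self).reshape(1)], dim=-1)
```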

According to Table 2, the proposed GNN greatly outperforms MTransE and AliNet. Why does it not show such a huge advantage in Table 4?

In Table 2, MTransE and AliNet are both dangling-unaware methods; they are trained with 30% aligned entities under the same setting as our method. In Table 4, however, MTransE and AliNet are dangling-aware, being **extended** by three techniques (NNC, MR, and BR), and they have an **additional** 30% labeled dangling data for training. In contrast, our method does not leverage any labeled dangling data for training yet performs better, showing the power of our approach.

(Open question) Is the proposed method applicable to dangling entity detection with labeled data? What about its performance in this case?

Thanks for raising this interesting question. Our PU learning gives an unbiased estimate of the loss on the labeled data (positives only) and the unlabeled data (both positives and negatives). In dangling entity detection with labeled data, both positive and negative labels are known, so the approach is adapted as follows.

The new loss function is $L = \lambda L_N + (1-\lambda) L_{PU}$, where the first term is the loss on the labeled negative data and the second is a PU-learning loss. The only extension is an additional estimate $\lambda$ that keeps the new loss unbiased. The performance depends on how well $\lambda$ is estimated; this is an interesting direction for future work.
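As a concrete (hypothetical) reading of this adaptation, here is a PyTorch sketch in which $L_{PU}$ is instantiated with the generic non-negative PU estimator of Kiryo et al. (2017) as a stand-in, and `lam` is the estimated $\lambda$; the paper's actual estimator may differ:

```python
import torch

def combined_loss(s_pos, s_neg, s_unl, prior, lam):
    """L = lam * L_N + (1 - lam) * L_PU.
    s_pos / s_neg / s_unl: classifier logits for labeled-positive,
    labeled-negative, and unlabeled entities; prior: class prior pi_p."""
    surrogate = lambda s, y: torch.sigmoid(-y * s).mean()  # sigmoid surrogate loss

    loss_n = surrogate(s_neg, -1.0)  # risk on the labeled negatives
    # Non-negative PU risk: positive part plus a clamped negative part.
    loss_pu = prior * surrogate(s_pos, 1.0) + torch.clamp(
        surrogate(s_unl, -1.0) - prior * surrogate(s_pos, -1.0), min=0.0)

    return lam * loss_n + (1.0 - lam) * loss_pu
```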

Comment

Thank you for addressing my concerns. I would like to see the paper accepted.

Comment

We sincerely appreciate your feedback and advice, which help improve our work.

Review (Rating: 6)

This paper elaborates on the unique challenges of unlabeled dangling entities in the EA task. To address them, it proposes a framework, namely Lambda, for dangling detection followed by entity alignment. The main idea is to perform selective aggregation with spectral contrastive learning and to adopt theoretically guaranteed PU learning to relieve the dependence on labeled dangling entities. Extensive experiments demonstrate the effectiveness of each module and the framework's ability to handle unlabeled dangling cases.

Strengths

  1. The paper is well-motivated.
  2. The paper proposes innovative methods to address the challenge of unlabeled dangling entities in entity alignment tasks.
  3. Extensive experiments demonstrate the model's effectiveness, and the experiments both unaware and aware of dangling entities are particularly insightful.
  4. The paper is well written and organized.

Weaknesses

  1. The KG Entity Encoder with Selective Aggregation section adds an adaptive dangling indicator on top of the Dual-AMN [1] model. The paper would be better if it provided a detailed explanation of why the adaptive dangling indicator is effective.
  2. On some datasets and evaluation metrics, Lambda's results still need improvement. For instance, H@10 and H@50 in Table 2 are lower than those of Dual-AMN [1], although H@1 is higher, and the precision in Tables 3 and 4 is slightly inferior.
  3. Comparing the proposed method with strong baseline models under different ratios of pre-aligned seeds would better demonstrate the method's superiority.

[1] Xin Mao, Wenting Wang, et al. Dual attention matching network with normalized hard sample mining. WWW 2021

Questions

  1. The paper mentions that the initialization of r_{e_j} is critical (L94-95). What are the potential consequences of poor initialization? How is a good initialization chosen? Is it based on experience or theoretical support?
  2. How is the equivalent form of infoNCE (L137-137) derived?
  3. At what level of aligned-entity sparsity does the model stop all subsequent processing and determine that the two KGs cannot be aligned? Is there experimental evidence supporting this?

Limitations

NA

Author Response

We extend our gratitude to the reviewer for the invaluable feedback and address the concerns as follows.

The paper would be better to provide a detailed explanation of why the adaptive dangling indicator is effective.

Sorry for not explaining this clearly. We introduce the adaptive dangling indicator as a global weight, instead of a local weight, so that any dangling entities with a similar structure can be assigned a high weight in the propagation. Our experiments confirm this point: as shown in Figure 5, Sec. 5.4, the results without the adaptive dangling indicator are inferior, indicating the power of the design.

Comparing the proposed method with strong baseline models under different ratios of pre-aligned seeds would better demonstrate the method's superiority.

Thanks for the advice; we have included experimental results under different ratios of pre-aligned seeds in the global rebuttal PDF. The baselines include MTransE w/ BR, the SOTA method in previous works and also the only open-source one.

The paper mentions that the initialization of r_{e_j} is critical (L94-95). What are the potential consequences of poor initialization? How is a good initialization chosen? Is it based on experience or theoretical support?

As stated in **Implementation Detail** (L213-215), the $\tanh$ function changes rapidly in the region close to $0$ but stays stable beyond $[-3, 3]$. Since all nodes are considered equally important at the start of training, we initialize all $r_{e_j}$ to $1$ to prevent gradient oscillation or near-zero gradients.

Our past experiments have shown that if we ignore the above setup and instead initialize $r_{e_j}$ within $[0, 0.5] \cup [1.5, +\infty)$, for example $\{0, 0.5, 1.5, 2, 3\}$, the alignment performance is poor.
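The gradient behaviour behind this choice is standard calculus and easy to verify numerically; a quick PyTorch sketch (ours, for illustration):

```python
import torch

# d tanh(x)/dx = 1 - tanh(x)^2: steep near 0, almost flat beyond |x| ~ 3.
x = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0], requires_grad=True)
torch.tanh(x).sum().backward()
print(x.grad)  # ~[1.000, 0.786, 0.420, 0.181, 0.071, 0.010]
# Initializing r_{e_j} = 1 lands in the moderate-gradient region (~0.42),
# away from both the oscillation-prone region near 0 and the saturated
# region beyond 3.
```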

How is the equivalent form of infoNCE (L137-137) derived?

Due to the page limit, we omitted the following derivation details:

$$\text{infoNCE}(q, p^{+}, \{p^{-}_j\}_{j=1}^{N}) = -\log \frac{\exp(\lambda\,\text{sim}(q, p^{+}))}{\sum_{j=1}^{N} \exp(\lambda\,\text{sim}(q, p^{-}_j)) + \exp(\lambda\,\text{sim}(q, p^{+}))}$$

$$= \log \frac{\exp(\lambda\,\text{sim}(q, p^{+})) + \sum_{j=1}^{N} \exp(\lambda\,\text{sim}(q, p^{-}_j))}{\exp(\lambda\,\text{sim}(q, p^{+}))}$$

$$= \log\left[1 + \frac{\sum_{j=1}^{N} \exp(\lambda\,\text{sim}(q, p^{-}_j))}{\exp(\lambda\,\text{sim}(q, p^{+}))}\right]$$

$$= \log\left[1 + \sum_{j=1}^{N} \exp\left(\lambda\,\text{sim}(q, p^{-}_j) - \lambda\,\text{sim}(q, p^{+})\right)\right]$$

Substituting the exponent with $H(\cdot)$ defined in Eq. (4) turns the loss into $L_\text{info}$ defined in Eq. (3). We will include this derivation in the revision to avoid confusion.
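The equivalence can also be checked numerically; a small NumPy sketch (ours, not part of the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 5.0                        # temperature lambda
sim_pos = rng.normal()           # sim(q, p+)
sim_neg = rng.normal(size=10)    # sim(q, p_j^-), j = 1..N

lhs = -np.log(np.exp(lam * sim_pos)
              / (np.exp(lam * sim_neg).sum() + np.exp(lam * sim_pos)))
rhs = np.log1p(np.exp(lam * (sim_neg - sim_pos)).sum())
assert np.isclose(lhs, rhs)      # both forms of infoNCE agree
```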

At what level of aligned-entity sparsity does the model stop all subsequent processing and determine that the two KGs cannot be aligned? Is there experimental evidence supporting this?

Such a termination point should be decided by the requirements of downstream tasks --- whether the downstream task considers the alignment of the two KGs worthwhile. For example, with 1% of entities remaining to be aligned, it may take a disproportionate amount of computing resources to align this negligible number of entities.

Through experiments, we consider a 1-5% sparsity threshold sufficient for most applications.

Furthermore, we have also considered a rigorous estimation of the corresponding sparsity threshold. One way is to draw the ROC curve under varying sparsity thresholds. Such statistical results, however, require a large corpus of paired and unpaired KGs, which demands heavy human annotation.

Comment

Thanks for the authors' response. My concerns are addressed. I would be glad to see the paper accepted.

Comment

We are glad that our previous responses clarified potential misunderstandings and addressed your concerns. We also sincerely thank you for your positive feedback.

Review (Rating: 5)

This paper introduces a novel entity alignment framework called Lambda for aligning entities with dangling cases. It includes a GNN encoder, KEESA, to aggregate information within and across KGs, and an iterative positive-unlabeled learning algorithm, iPULE, to detect dangling entities. The authors provide both theoretical proof and empirical evidence to demonstrate the superiority of the proposed method.

Strengths

The idea of using positive pairs to support unlabeled dangling detection is interesting and effective.

Results on dangling entity detection are promising.

Theoretical proofs are provided to further support the proposed method.

Weaknesses

The writing may be improved. Figures need more detailed captions. The methodology section introduces too many new terms; for instance, in Lemma 1, using e_i, e_+ and e_j would perhaps be better than q, p^+, p^-. The subscripts and superscripts are also inconsistent overall.

The motivation of this paper is not clear enough. There are already many methods leveraging inter-graph and cross-graph GNNs for entity alignment. In this sense, the novelty of KEESA is limited. The so-called spectral contrastive learning seems no different from the existing ones.

Then, the core contribution of this paper is iPULE. The paper could be improved if the authors delved deeper into the discussion of iPULE instead of KEESA, especially regarding applying this module to different EA methods.

Questions

Why do the authors call L_info "spectral contrastive learning"?

Limitations

N/A

Author Response

We appreciate the reviewer's valuable comments and address the reviewer's concerns as follows. The writing issues will be fixed in the revision.

The motivation of this paper is not clear enough. There are already many methods leveraging inter-graph and cross-graph GNNs for entity alignment.

Here we restate the motivation: our focus is a new setting where dangling entities exist in the entity alignment problem. Since those entities have no match and are unknown a priori, the alignment problem cannot be addressed by inter-graph and cross-graph GNNs alone.

Although KEESA leverages both intra-graph and inter-graph information, it emphasizes learning a unified matchable embedding space **in the presence of dangling entities**. The idea is to avoid the **pollution** of dangling entities on the embeddings of matchable ones during neighborhood aggregation. Conventional approaches that do not consider dangling nodes would assign a match to the dangling nodes, leading to errors spreading to the matchable nodes.

The so-called spectral contrastive learning seems no different from the existing ones.

The form of the loss function in spectral contrastive learning may already exist, but our purpose is to illustrate its role in serving both tasks, i.e., entity alignment and dangling detection, by mining high-quality negative samples. The loss $L_\text{info}$ is critical to our problem: PU learning assumes the existence of discriminative classes in the feature space, and this is offered by the equivalent spectral clustering effect of $L_\text{info}$ (L127-138).

Then, the core contribution of this paper is iPULE. This paper could be improved if the authors delve deeper into the discussion of iPULE instead of KEESA, especially regarding the application of this module on different EA methods.

We agree with the reviewer. However, we would like to point out that KEESA is important to iPULE, as the effectiveness of iPULE depends on the powerful, discriminative embeddings produced by KEESA. Discriminative features are the prerequisite for PU learning to work. Hence KEESA can be considered an essential part that complements iPULE in our framework. According to our investigation, the encoder modules of other EA methods lack consideration of dangling nodes and thus do not work with iPULE as well as KEESA does.

Why do the authors call L_info "spectral contrastive learning"?

The word 'spectral' comes from spectral clustering, as the loss is equivalent to performing spectral clustering in the embedding space (telling dangling entities apart from matchable ones). The word 'contrastive' indicates that the loss also performs contrastive learning over positive and negative sample pairs (for entity alignment).

Author Response

For reviewer HTtm.

Comparing the proposed method with strong baseline models under different ratios of pre-aligned seeds would better demonstrate the method's superiority.

Table 1 contains experimental results under different ratios of pre-aligned seeds. The baselines include MTransE w/ BR, the SOTA method in previous works and also the only open-source one.

Final Decision

The paper deals with entity alignment and the problem of entities without a counterpart in the other graph. It proposes a GNN-based encoder and an iterative learning algorithm supported by theoretical and empirical arguments. The strengths of the paper are (1) addressing an important practical problem (RNuV, HTtm, jgr6), (2) using an innovative approach (positive-unlabeled learning) (HTtm, cR2t), (3) extensive experiments (RNuV, HTtm, cR2t), and (4) theoretical analysis (RNuV, jgr6). It could be further improved in terms of writing (RNuV, jgr6), additional experiments (HTtm, cR2t, jgr6), and inclusion of the outline of differences from Dual-AMN in the paper (which was already given in the discussion). Most reviewers actively participated in the discussion, sometimes over several rounds.