Tackling Uncertain Correspondences for Multi-Modal Entity Alignment
Abstract
Reviews and Discussion
This paper aims to address the uncertainty in entity alignment within multi-modal knowledge graphs (MMKGs). The authors design an MKE module to handle relational, attribute, and visual knowledge, enhancing attribute alignment and filtering through large language models and in-context learning. To address the issue of missing modalities, the paper introduces an MMI module that uses VAEs to generate pseudo features. Additionally, an MCE module based on cross-attention and orthogonal constraints is developed to enhance semantic associations between different modalities.
Strengths
The paper considers the issue of missing modalities in MMKGs, using VAEs to generate pseudo-features, achieving modality alignment in the latent space.
It systematically considers the relationships between relations, attributes, and images in MMKGs and achieves interactive enhancement.
The model diagram is concise and easy to understand.
State-of-the-art (SOTA) experimental performance.
Weaknesses
The problem description in the introduction is unclear and the logic of the writing is muddled, failing to clearly explain the challenges faced by existing methods. The contributions are also unclear, and the writing is poor.
I understand that MMI is a relatively innovative part of the paper, but the ablation study shows its impact is minimal, indicating the limited effectiveness of the features supplemented by MMI.
The paper lacks innovation. Using large models for filtering and MMI is a rather straightforward approach. Moreover, the large model used is simply ChatGPT; why didn't you consider fine-tuning other large models such as Llama?
The experimental table settings are problematic. For example, traditional methods like IPTransE achieve only 0.04 Hits@1; is it still necessary to compare with them? The method in this paper shows a significant improvement, far surpassing all baselines. For instance, on the DB15K dataset, the Hits@1 of the earlier MoAlign method (EMNLP 2023) is 31.8 and that of the WWW 2023 method is 30.4, while this paper achieves 86.7. However, I find it hard to intuitively see where such a large performance improvement comes from in the presented method. Meanwhile, the MMI module, which I consider the innovative part, shows only a modest impact in the ablation experiments.
Experimental tables are missing. Previous methods report tables at the 50% and 80% settings, but they are absent here; I couldn't find them in the appendix either. This paper lacks these two major model performance tables.
The ablation study is not thorough or sufficient.
Questions
Why can the performance improve so significantly? What is your baseline, or is it entirely self-constructed?
Limitations
See weaknesses and questions.
Many thanks for your valuable comments. We appreciate your recognition of the soundness of our method, the clarity of the writing, and the SOTA performance. In response to your concerns, we would like to address the following points:
[W1: Problem description & Challenges]: As described in our problem definition in Section 3, we focus on a fundamental problem in the field of KGs: multi-modal entity alignment (MMEA). Specifically, we tackle uncertain correspondences between inter-modal or intra-modal cues of entities. This crucial issue has not been solved in previous works and results in insufficient utilization of multi-modal knowledge, particularly in fusion and enhancement. The challenges include three points: weak inter-modal associations, description diversity, and modality absence. Please refer to Section 1 for more details about these challenges. Our experimental results show promising performance, which proves that effectively addressing uncertain correspondences in the MMEA task can lead to significant improvement.
[W2: MMI effect]: We would like to clarify that the impact of the MMI module is not minimal; its effect depends on the extent of data missingness in the dataset. Its impact and effectiveness are demonstrated in both the ablation experiment and the modality sensitivity experiment. As shown in Table 2, MMI yields an improvement of nearly four percentage points in Hits@1 on the FB15K-YG15K dataset (w/o MMI: 77.9% vs. TMEA: 81.8%). In Figure 3, our method demonstrates greater robustness: when only 20% of entities have the visual modality, TMEA achieves an MRR exceeding 0.5.
[W1 & W3: Contributions & Innovation of LLM for filtering and MMI & Finetuned LLM]: For your concern about innovation, we would like to clarify that our approach is not simple LLM filtering or MMI. Our technical innovations/contributions are as follows:
- To handle diverse attribute knowledge descriptions for attribute knowledge learning, our contribution is the design of a novel alignment-augmented abstract representation that incorporates the LLM and in-context learning into attribute alignment and filtering for generating and embedding the attribute abstract.
- To mitigate the impact of modality absence, our contribution lies in unifying diverse modalities into a shared latent subspace and generating pseudo features via VAEs from the existing modal features in the MMI module.
In addition to these two innovations, we specially design an inter-modal commonality enhancement mechanism based on cross-attention with orthogonal constraints to address weak semantic associations in the MCE module.
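To make the MMI design above concrete, here is a minimal PyTorch sketch of the idea, written under our own simplifying assumptions (module structure, dimensions, and loss weighting are illustrative, not the paper's actual implementation): a feature from an available modality is encoded into the shared latent subspace, and a pseudo feature is decoded for a missing modality.

```python
import torch
import torch.nn as nn

class ModalityVAE(nn.Module):
    """Encode one modality into a shared latent space and decode another.

    Simplified sketch: one encoder for the available modality, one decoder
    for the modality to be imputed; all dimensions are illustrative.
    """
    def __init__(self, in_dim=300, latent_dim=128, out_dim=300):
        super().__init__()
        self.enc = nn.Linear(in_dim, 2 * latent_dim)  # outputs [mu, logvar]
        self.dec = nn.Sequential(nn.Linear(latent_dim, out_dim), nn.Tanh())

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)          # reparameterization trick
        return self.dec(z), mu, logvar

def vae_losses(recon, target, mu, logvar):
    # MSE reconstruction against the real feature (when it exists) plus a
    # KL term pulling the latent distribution toward a standard normal.
    mse = nn.functional.mse_loss(recon, target)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return mse, kl

# Usage: impute a pseudo visual feature from an attribute feature.
vae = ModalityVAE()
attr_feat = torch.randn(4, 300)                       # batch of attribute features
pseudo_visual, mu, logvar = vae(attr_feat)
```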
Indeed, we have successfully addressed the critical issue of uncertain correspondences in MMEA by proposing a novel framework named TMEA, which significantly improves the performance of MMEA, a fundamental problem that benefits numerous downstream applications.
Regarding LLM selection, we would like to emphasize that our innovation does not lie in the use of LLMs itself; it lies in addressing the problem of uncertain correspondences through the TMEA framework. This framework can accommodate any LLM, and we have achieved promising results using a relatively straightforward one.
[W4 & Q1: The reason for significant performance improvement & Baselines]: We would like to emphasize that the significant improvement in performance is attributed to our solution to the uncertain correspondence problem in the MMEA task, which enables more comprehensive feature learning of multi-modal knowledge. Our primary innovation lies in the proposal of a holistic framework to tackle this important problem, and the experiments have demonstrated the overall effectiveness of both the framework and each individual component. As evidenced by the modality and component ablation studies in Table 2, the experimental results suggest that the improvement is the result of the combined contributions of all components.
Regarding baselines, we followed the standard experimental setup of the MMEA task (as in ACK-MMEA, MCLEA, and MSNEA), using well-adopted baselines that comprehensively cover traditional and multi-modal entity alignment approaches. Our results are either directly reproduced from public code or sourced from other papers, not self-constructed.
[W5: Missing 50% and 80% tables]: These experimental results are presented in Figure 3, which shows that TMEA provides superior performance with different proportions of training data.
[W6: Ablation study]: The essential components in TMEA include the MMI module (w/o MMI), the MCE module (w/o MCE), the alignment-augmented abstract representation (w/o AP), the orthogonal constraint loss in the MCE module, the MSE loss in the MMI module (w/o L_MSE), and the iterative strategy (w/o IT). In Table 2, we have presented the ablation study including all these variants, and the results demonstrate the effectiveness of all essential components. If there are any further suggestions, we are open to supplementing the experiments.
We hope these responses effectively address your concerns. We will make revisions to further clarify these aspects in our revised paper.
Thank you to the authors for the rebuttal, which has addressed my concerns regarding the missing 50% and 80% tables as well as the contributions.
However, the following concerns remain unresolved:
W2: As you pointed out, MMI is one of your innovations, and you mentioned that in the FB15K-YG15K dataset, the ablation of MMI resulted in a performance drop from 81.8 to 77.9 (a decrease of 3.9). In contrast, other components such as MCE (81.8 to 63.6) and AP (81.8 to 59.3) showed much larger impacts. This suggests that MMI's contribution is not as significant. Moreover, as I previously mentioned, in the FB15K-DB15K dataset, MMI's performance change is only from 86.7 to 85.2 (a decrease of 1.5). While MMI does have an effect, its impact does not seem as strong as claimed in the paper.
W5: In the ablation study, the authors only conducted ablation on individual modules but did not perform combined ablations, such as removing both AP and MMI together.
W4: The authors stated that the code was built on other baselines, yet the performance improvement is exceedingly large. For example, in the ablation study, even after removing a single module, TMEA still performs well: without AP, the performance drops from 86.7 to 78.6; without MMI, from 86.7 to 85.2; without L_MSE, from 86.7 to 84.1. This raises another concern: even after removing the proposed modules, the performance might still significantly exceed previous baselines (EMNLP23's MoAlign method is 31.8, and WWW23's method is 30.4). In the rebuttal, the authors have not yet provided a reasonable explanation for this concern: how the code or mechanism achieves nearly a 50-point improvement over previous SOTA baselines (86.7 vs. 31.8, 30.4) despite the ablation effects being not very pronounced. The authors pointed out that "the significant improvement in performance is attributed to our solution to the uncertain correspondence problem in the MMEA task." However, the ablation experiments do not indicate which specific mechanism(s) are responsible for such a substantial performance boost.
Thank you for your valuable feedback. In response to your concerns, we would like to address the following points:
[W2: MMI effect]: We would like to clarify that the effectiveness of the MMI module is demonstrated by the following:
- The MMI module has shown clear improvements on both the FB15K-YG15K and FB15K-DB15K datasets. On FB15K-DB15K, the improvement appears not only on Hits@1 but also on other metrics such as MR, which notably decreased from 32.9 to 26.3. This indicates that incorporating the MMI module improves all ranking results, particularly for difficult samples, in line with our observation that the gains are larger in extreme missing-data scenarios (please refer to the next point).
- As we explained before, the impact of MMI is related to the extent of data missingness in the dataset. In our experimental analysis, we provided the explanation about this issue in Lines 328-330.
- Additionally, to further validate the effectiveness of MMI, we conducted a modality sensitivity experiment on FB15K-DB15K. In Figure 3, the results confirm that our method exhibits superior performance and greater robustness. For example, when only 20% of entities have the visual modality, TMEA achieves an MRR exceeding 0.5, while the best baseline falls below 0.4. These results have validated the effectiveness of MMI module.
Regarding your concern that other components show much larger impacts than MMI, we respectfully disagree that the greater impact of other modules can indicate the ineffectiveness of MMI. Each module is designed to address the uncertain correspondence problem from a different perspective, such as MMI addressing the issue of missing modality and MCE addressing weak inter-modal associations. These modules are not conflicting, and the experiments in Table 2 have demonstrated the effectiveness of each module.
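To illustrate the MCE mechanism, below is a minimal sketch under our own assumptions: a single modality pair with one cross-attention module and one plausible form of the orthogonality penalty (the actual module uses six multi-head cross-attention blocks, and its constraint may differ).

```python
import torch
import torch.nn as nn

class CrossModalCommonality(nn.Module):
    """Cross-attention between two modalities plus an orthogonality penalty."""
    def __init__(self, dim=300, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x, y):
        # Query modality x with keys/values from modality y; the output
        # captures what the two modalities have in common.
        common, _ = self.attn(query=x, key=y, value=y)
        specific = x - common                # modality-specific residual
        return common, specific

def orthogonal_loss(common, specific):
    # One plausible orthogonal constraint: push the Gram matrix between the
    # common and specific parts toward zero (squared Frobenius norm).
    gram = torch.bmm(common, specific.transpose(1, 2))
    return (gram ** 2).mean()

mce = CrossModalCommonality()
vis = torch.randn(4, 1, 300)                 # visual features (B, L, d)
att = torch.randn(4, 1, 300)                 # attribute features (B, L, d)
common, specific = mce(vis, att)
loss_oc = orthogonal_loss(common, specific)
```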
[W6 (Not W5): Ablation study]: We would like to emphasize that the AP and MMI modules are two separate modules with different roles, so we conducted separate ablation experiments for each, which have demonstrated the effectiveness of the modules. Regarding your mentioned experiments of combined ablations, we present the results as follows:
| Method | Hits@1 | Hits@10 | MR |
| --- | --- | --- | --- |
| w/o AP & MMI | 0.734 | 0.874 | 60.3 |
| TMEA | 0.867 | 0.944 | 26.3 |

These results further validate the effectiveness of the AP and MMI modules.
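For reference, the ranking metrics reported here (Hits@k, MRR, MR) can be computed from the rank of each gold counterpart, as in the generic sketch below (not tied to our codebase):

```python
def ranking_metrics(ranks, ks=(1, 10)):
    """ranks: 1-based rank of the correct entity for each test pair."""
    n = len(ranks)
    metrics = {f"Hits@{k}": sum(r <= k for r in ranks) / n for k in ks}
    metrics["MRR"] = sum(1.0 / r for r in ranks) / n   # mean reciprocal rank
    metrics["MR"] = sum(ranks) / n                     # mean rank
    return metrics

print(ranking_metrics([1, 3, 2, 120]))  # toy example
```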
[W4: Baseline]: There seems to be a misunderstanding regarding the meaning of "baseline" in this context. In your initial review, we understood "baseline" to mean the compared methods; our response therefore described how we reproduced these methods using open-source code or directly cited the results from the original papers. Based on your latest reply, we now understand that the "baseline" you are referring to is the base model. We would like to clarify that our model is entirely self-constructed.
[W4: Significant Improvement]: We would like to clarify that our method's improvement is compared against the SOTA method, MSNEA (65.3%), and our initial multi-modal feature extraction and contrastive learning methods are similar to MSNEA. It can be observed that the performance of ACK-MMEA (30.4%) and MoAlign (31.8%) is significantly lower than MSNEA (65.3%). Additionally, we provide the experimental results of removing all the designed modules as follows:
| Method | Hits@1 | Hits@10 | MR |
| --- | --- | --- | --- |
| w/o All Designed Components | 0.647 | 0.788 | 147.6 |
| MSNEA | 0.653 | 0.812 | 54.0 |
| TMEA | 0.867 | 0.944 | 26.3 |

The experimental results indicate that when we remove all the designed components, the results are similar to MSNEA.
In conclusion, the various components we designed can alleviate the uncertain correspondence problem from different perspectives. The combination of all designed components significantly improves the performance, and the effect of each component has been demonstrated through individual ablation studies.
We hope these responses effectively address your concerns, and we highly value your re-evaluation of our paper. Thank you very much for your continued consideration.
Dear Reviewer fuge,
We would like to express our sincere gratitude for the time and effort you spent reviewing our paper. As the author/reviewer discussion stage draws to a close, we look forward to hearing whether our detailed response has sufficiently addressed your concerns. We would be honored to address any further questions you may have, and we highly value your re-evaluation of our paper.
Best regards,
Authors of Paper 2934
This paper proposes a novel method, called TMEA, for tackling uncertain correspondences in multi-modal entity alignment. The approach addresses challenges such as weak inter-modal associations, description diversity, and modality absence that hinder effective entity similarity exploration. TMEA employs alignment-augmented abstract representation, integrating large language models and in-context learning to enhance attribute alignment and filtering. To address modality absence, it unifies all modality features into a shared latent subspace and generates pseudo features using variational autoencoders. Additionally, it introduces an inter-modal commonality enhancement mechanism based on cross-attention with orthogonal constraints to improve weak semantic associations. Extensive experiments on two real-world datasets demonstrate the effectiveness of TMEA, showing significant improvements over competitive baselines.
Strengths
- This paper effectively encodes relational, attribute, and visual knowledge into feature representations using a Multi-modal Knowledge Encoder (MKE) module, providing a holistic view of entity features.
- It uses a Large Language Model (LLM) and in-context learning for better attribute alignment and filtering, and addresses modality absence through the Missing Modality Imputation (MMI) module, which generates pseudo features using Variational AutoEncoders (VAEs).
- It develops an inter-modal commonality enhancement mechanism based on cross-attention with orthogonal constraints to improve semantic associations between modalities, ensuring coherent and aligned multi-modal features.
Weaknesses
- The design involves multiple modules, increasing the method's complexity, both in terms of reproducibility and of time/space cost, compared to previous approaches.
- The Label Dependency Analysis experiment is not described in enough detail. For example, when the aligned sample pairs reach 80%, MCLEA performs essentially the same as TMEA on some metrics on the FB15K-DB15K dataset, but the text does not explain the reason for this phenomenon.
Questions
Why should we consider modality missingness? The Missing Modality Imputation (MMI) module doesn't seem to play a big role.
Limitations
The authors have adequately addressed the limitations and potential negative societal impacts of their work in Appendix E. They underscored limitations in semantic specificity stemming from reliance on language models trained on general corpora and dependence on annotated data, which hampers practical application in knowledge graph contexts. Future research aims to refine these aspects for broader applicability in unsupervised settings within KGs.
Thank you very much for your valuable comments. We appreciate your recognition of our method's novelty, effectiveness, and the sufficiency of our experiments. In response to your concerns, we would like to address the following points:
[W1: Reproducibility & Complexity]: To ensure reproducibility and promote research in this field, we plan to release our code. Regarding complexity, we would like to clarify that our method is not more complex than many previous approaches. Mainstream methods for feature learning in MMKGs typically use different encoders to extract initial features and then design various fusion methods; because multiple modalities are involved, these approaches can incur slightly higher time and space complexity. Here, we analyze the time and space complexity of our method and one representative baseline, ACK-MMEA. We denote the batch size as B and the dimension size as d.
The time complexity of ACK-MMEA consists of the following parts: (1) attribute uniformization; (2) the merge operator, which is a GAT structure; (3) the generate operator, which performs average aggregation; (4) ConsistGNN, a 2-layer GCN structure; and (5) the joint loss calculation, comprising an entity similarity loss, an attribute similarity loss, and a neighbor dissimilarity loss. The dominant terms scale with the numbers of entities and edges in the graphs.
For our method, the time complexity consists of the following parts: (1) linear projections of the pre-trained visual and attribute features; (2) the TransE loss; (3) MMI, which first performs a relational projection, then runs the VAEs, and finally computes their loss functions; (4) MCE, which consists of six multi-head cross-attention modules plus the orthogonal constraint loss; and (5) the contrastive learning loss. The dominant terms scale with the batch size B and the dimension d.
The space complexities of TMEA and ACK-MMEA are of the same order.
Consequently, our method has a smaller time complexity than ACK-MMEA when the numbers of entities and edges are much larger than the batch size B, since TMEA's dominant terms depend on B and d rather than on the graph size. This demonstrates the scalability of our method to large-scale and dense graphs: as the graphs grow, we can keep B moderate to maintain a small complexity. Regarding effectiveness, note that our method outperforms ACK-MMEA by a large margin (0.867 vs. 0.304 in Hits@1).
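For context on parts (2) and (5) of the TMEA breakdown above, the following are generic sketches of a TransE margin loss and an in-batch contrastive alignment loss; these are standard formulations assumed for illustration, not necessarily the paper's exact variants.

```python
import torch
import torch.nn.functional as F

def transe_loss(h, r, t, h_neg, t_neg, margin=1.0):
    # Margin-based TransE: positive triples (h, r, t) should satisfy
    # h + r ~ t more closely than corrupted triples do.
    pos = torch.norm(h + r - t, p=2, dim=-1)
    neg = torch.norm(h_neg + r - t_neg, p=2, dim=-1)
    return F.relu(margin + pos - neg).mean()

def contrastive_alignment_loss(e1, e2, temperature=0.1):
    # InfoNCE-style loss over a batch of pre-aligned entity pairs: each
    # entity in KG1 should match its counterpart in KG2 against all other
    # in-batch entities; the similarity matrix costs O(B^2 d).
    sim = F.normalize(e1, dim=-1) @ F.normalize(e2, dim=-1).T / temperature
    labels = torch.arange(e1.size(0))
    return F.cross_entropy(sim, labels)
```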
[W2: Phenomenon explanation]: Thank you for your constructive advice. In Figure 4, there is a significant performance gap between our model and MCLEA, so we presume you are referring to MSNEA. When the aligned sample pairs reach 80%, even though their performance on the Hits@10 metric appears to be close, there is still a noticeable difference in MRR and Hits@1. This indicates that MSNEA performs poorly in aligning difficult samples. We will add this content into the revised paper.
[Q1: Modal missingness & MMI effect]: For your concern about the meaning of considering modality missingness, we would like to clarify that real-world KGs are generally incomplete and often have missing modality information. We consider modality missingness to handle this situation more robustly. The effect of the MMI module is related to the extent of data missingness in the dataset. To validate the effectiveness of this module, we conducted ablation experiments as well as experiments with different proportions of missing modalities. As shown in Table 2, MMI yields an improvement of nearly four percentage points in Hits@1 on the FB15K-YG15K dataset (w/o MMI: 77.9% vs. TMEA: 81.8%). Additionally, as illustrated in Figure 3, our method demonstrates greater robustness: when only 20% of entities have the visual modality, TMEA achieves an MRR exceeding 0.5, while the best baseline falls below 0.4. These results demonstrate the effectiveness of the MMI module.
We hope these responses effectively address your concerns. We will make revisions to further clarify these aspects in our revised paper.
Dear Reviewer qUrF,
Thank you very much for taking the time and making the effort to review our paper. We sincerely appreciate your valuable and constructive feedback.
The author/reviewer discussion stage will be ending soon. We look forward to your reply and hope to learn whether our detailed response has adequately addressed your concerns. If you have any further questions, we would be honored to address them.
Thank you once again for reviewing our paper.
Best regards,
Authors of Paper 2934
This paper addresses the task of multi-modal entity alignment for integrating MMKGs. Existing efforts mostly focus on capturing entity features via diverse modality encoders or fusion methods but face issues with uncertain correspondences. To overcome these challenges, the authors propose a novel method called TMEA. TMEA introduces alignment-augmented abstract representation using a large language model (LLM) and in-context learning to manage diverse attribute descriptions. It also mitigates modality absence by unifying all modality features into a shared latent subspace and generating pseudo features with variational autoencoders (VAEs). Furthermore, it addresses weak inter-modal associations through an inter-modal commonality enhancement mechanism based on cross-attention with orthogonal constraints. Extensive experiments on two real-world datasets demonstrate the effectiveness of TMEA, showing significant improvements over baseline methods.
Strengths
- This paper proposes to address the issues of uncertain correspondences for MMEA, such as weak inter-modal associations, description diversity, and modality absence. It is a novel view to improve the MMEA task by exploring the similarities of aligned entities.
- This paper proposes a novel method named TMEA. TMEA includes alignment-augmented abstract representation to handle diverse attribute knowledge descriptions, shared latent subspace for missing modality imputation, and inter-modal commonality enhancement mechanism to enhance modality associations.
- Extensive experiments on two real-world datasets are conducted to validate the effectiveness of TMEA, showing clear improvements over competitive baselines.
Weaknesses
- The contributions of the paper are not explicitly summarized. It appears that the primary contribution is the design of new structures to address uncertain correspondences, rather than focusing on multi-modal encoding.
- General models like ChatGPT can sometimes have difficulty understanding the specific language and context within knowledge graphs. Have the authors considered personalized attribute learning methods that could more effectively grasp the semantics for different knowledge graphs?
- Could the authors clarify whether they plan to release their code? This would greatly benefit the research community.
Questions
- Could the authors provide a more explicit summary of the contributions of this paper?
- How have the authors considered addressing attribute learning issues in different knowledge graphs, especially those with specialized knowledge?
- Could the authors plan to release their code?
Limitations
The authors have adequately addressed the limitations.
Many thanks for your valuable feedback on our paper. We appreciate your recognition of the novelty of our research problem and methodology, as well as the robustness of our experiments. In response to your concerns, we would like to address the following points:
[W1 & Q1: Summary of contributions]: Thank you for your insightful suggestion. Below, we provide a more explicit summary of our contributions:
- In this paper, we focus on tackling uncertain correspondences between inter-modal or intra-modal cues of entities for multi-modal entity alignment, including weak inter-modal associations, description diversity, and modality absence.
- We propose a novel TMEA model to address these three issues: we design an inter-modal commonality enhancement mechanism based on cross-attention with orthogonal constraints; we design an alignment-augmented abstract representation that incorporates the LLM and in-context learning into attribute alignment; and we unify diverse modalities into a shared latent subspace, generating pseudo features via VAEs from the existing modal features.
- We conduct extensive experiments on two real-world datasets, FB15K-DB15K and FB15K-YG15K. The experimental results clearly demonstrate the effectiveness and superiority of our proposed TMEA, showing a significant improvement over competitive baselines, with at least a 32.8% increase in Hits@1.
Furthermore, we will add this content into the revised paper.
[W2 & Q2: Specialized knowledge]: Regarding your concern about specialized knowledge, we agree that it is indeed an excellent research problem. Currently, general LLMs can handle a lot of specialized knowledge such as law and medicine, but there are still limitations. In future work, we will focus more on the alignment of specialized KGs. In Appendix E, we propose exploring the fine-tuning of open-source LLMs to enhance semantic understanding. However, the main focus of our work remains on addressing the issue of uncertain correspondences in MMEA task.
[W3 & Q3: Code release]: Thank you for your constructive suggestion. To ensure reproducibility and promote research in this field, we plan to release our code. We stated in the checklist that we would release the code upon publication.
We hope these responses effectively address your concerns. We will make revisions to further clarify these aspects in our revised paper.
Dear Reviewer Sc2m,
Thank you very much for taking the time and making the effort to review our paper. We sincerely appreciate your valuable and constructive feedback.
The author/reviewer discussion stage will be ending soon. We look forward to your reply and hope to learn whether our detailed response has adequately addressed your concerns. If you have any further questions, we would be honored to address them.
Thank you once again for reviewing our paper.
Best regards,
Authors of Paper 2934
Thanks for the authors' response, which has adequately addressed my concerns. With a more explicit summary, the contributions are clearly presented. I think addressing the uncertain correspondences is an interesting idea, and the authors’ ablation results have demonstrated the effectiveness of the technical contributions. I will raise my confidence score.
Thank you very much for acknowledging our paper! We will carefully incorporate these clarifications and further improve the quality of our paper.
This paper addresses the task of aligning entities across multi-modal knowledge graphs (MMKGs). The authors propose a novel method called TMEA to tackle the challenges of uncertain correspondences between inter-modal or intra-modal cues of entities. The TMEA method consists of several key components:
1. Multi-modal Knowledge Encoder: encodes relational, attribute, and visual knowledge into preliminary feature representations, and includes an alignment-augmented abstract representation that leverages a Large Language Model for attribute alignment and filtering.
2. Missing Modality Imputation: addresses missing modalities by unifying diverse modalities into a shared latent subspace and generating pseudo features via Variational AutoEncoders.
3. Multi-modal Commonality Enhancement: enhances the semantic associations between modalities using a cross-attention mechanism with orthogonal constraints.
The experiments validate the effectiveness of the method.
Strengths
1. The problem addressed in this paper is highly practical and represents a pivotal issue for multi-modal knowledge graph entity alignment.
2. Extensive experiments.
3. The method is clearly presented.
Weaknesses
- The proposed method seems to be a simple combination of existing modules. Please clarify the challenges and the impact of this paper.
- The authors should elaborate on the specific issues that arise when the semantic relationship between different modalities of the same entity is weak.
- The alignment-augmented abstract representation is referenced multiple times in the article "to address diverse attribute knowledge descriptions." However, the article does not delve further into this concept or provide a concrete explanation.
- The ablation was conducted with only one variable, and the essential components necessary to achieve a significant performance improvement were not identified. It is vital to explore whether each component, such as the loss function or MMI, contributes linearly and significantly enhances the model. The authors should engage in further discussions and experiments.
Questions
Please see the weaknesses above.
Limitations
YES
Thank you very much for your valuable feedback on our paper. We appreciate your recognition of the significance of our research problem, extensive experiments, and clear presentation of the methodology. In response to your concerns, we would like to address the following points:
[W1: Module combination & Challenges & Impact]: We would like to clarify that our proposed method is not a simple combination of existing modules; it contains multiple innovative designs. In this paper, we focus on tackling uncertain correspondences between inter-modal or intra-modal cues of entities for MMEA, a crucial problem that has not been solved in previous work. The challenges include three points: weak inter-modal associations, description diversity, and modality absence. More details of these challenges are described in the Introduction. Our technical novelty is summarized as follows:
- To address weak semantic associations, we design an inter-modal commonality enhancement mechanism based on cross-attention with orthogonal constraints.
- To handle diverse attribute knowledge descriptions, we design an alignment-augmented abstract representation that incorporates the LLM and in-context learning into attribute alignment and filtering for generating and embedding the attribute abstract.
- To mitigate the impact of the modality absence, we propose to unify diverse modalities into a shared latent subspace and generate pseudo features via VAEs according to existing modal features.
Our experimental results show promising performance, which proves that solving the crucial problem of uncertain correspondences can effectively improve the performance of MMEA.
Impact: Our method can address the issue of uncertain correspondences in MMEA task, which enhances the feature learning of multimodal knowledge and more effectively aligns entities for better integration of MMKGs. This provides a more comprehensive external knowledge base for downstream applications such as recommendation systems and question-answering systems, thereby improving their performance.
[W2: Specific issues about weak semantic associations]: Thank you for your valuable suggestion. We would like to clarify that in multimodal tasks, semantic associations between different modalities are often utilized to mutually enhance features across modalities [1] [2]. If the semantic association is weak, modules designed for interactions between different modalities may negatively impact overall performance, potentially resulting in performance worse than that of a single modality. We will add this content in the revised paper.
[1] Zhang, Yunhua, Hazel Doughty, and Cees Snoek. "Learning Unseen Modality Interaction." NeurIPS 2023.
[2] Qu, Leigang, et al. "Dynamic Modality Interaction Modeling for Image-Text Retrieval." SIGIR 2021.
[W3: Explanation of diverse attribute knowledge descriptions]: As we have described in lines 65-69, "The distinct descriptive manners of entities across different MMKGs complicate the matching of attributes with the same meanings between aligned entities. For instance, `Release Date` for `Twilight` in one MMKG and `Debut Date` for `Twilight (film)` in the other both mean the initial release date of the movie." "Diverse attribute knowledge descriptions" thus refers to instances where the same attribute across two KGs is described using different terms, making it challenging for word-embedding similarity computations to recognize that these terms denote the same underlying meaning. Beyond the above example, `Storage Capacity` and `Memory Size` may also describe the same concept.
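To illustrate (purely hypothetically: the prompt wording, few-shot examples, and the `llm` callable below are our assumptions, not the paper's actual prompt), in-context attribute abstraction could look like the following sketch.

```python
FEW_SHOT = '''Attribute pair -> unified abstract
"Release Date" / "Debut Date" -> initial release date of the work
"Storage Capacity" / "Memory Size" -> amount of data the device can hold
'''

def attribute_abstract(attr_a: str, attr_b: str, llm) -> str:
    """Ask an LLM for a unified abstract of two attribute names.

    `llm` is any text-completion callable (hypothetical placeholder).
    """
    prompt = FEW_SHOT + f'"{attr_a}" / "{attr_b}" -> '
    return llm(prompt).strip()
```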
[W4: The ablation for the loss functions and MMI]: The ablation results for the loss functions (the orthogonal constraint loss and the MSE loss L_MSE) and MMI (w/o MMI) are shown in Table 2. The essential components in TMEA include the MMI module (w/o MMI), the MCE module (w/o MCE), the alignment-augmented abstract representation (w/o AP), the orthogonal constraint loss in the MCE module, the MSE loss in the MMI module (w/o L_MSE), and the iterative strategy (w/o IT). In Table 2, we have presented ablation studies to demonstrate the effectiveness of all these components. If there are any further suggestions, we would be glad to supplement the experiments.
We hope these responses effectively address your concerns. We will make revisions to further clarify these aspects in our revised paper.
Dear Reviewer PXNL,
Thank you very much for taking the time and making the effort to review our paper. We sincerely appreciate your valuable and constructive feedback.
The author/reviewer discussion stage will be ending soon. We look forward to your reply and hope to learn whether our detailed response has adequately addressed your concerns. If you have any further questions, we would be honored to address them. We look forward to and highly value your re-evaluation of our paper.
Thank you once again for reviewing our paper.
Best regards,
Authors of Paper 2934
Dear Reviewer PXNL,
Thank you once again for your dedication to reviewing our paper. The author/reviewer discussion stage will be ending today. We hope our detailed response has adequately addressed your concerns. If any concerns remain, we would be glad to discuss them further. If no concerns remain, we would greatly appreciate it if you could consider raising the score. Thanks!
Best regards,
Authors of Paper 2934
Dear Reviewer fuge,
It seems that you gave the lowest score and missed some results that already existed in the appendix. Did the author's response address your concerns? Or do you have further comments to discuss with the authors?
Regards,
AC
Dear Reviewer PXNL,
Could you please have a look at the authors' responses and reply to them accordingly? The author-review discussion will end soon!
Regards,
AC
This paper examines entity alignments in multi-modal knowledge graphs (MMKGs). It introduces a method called TMEA to address challenges in aligning inter-modal and intra-modal entity cues. The paper received mixed reviews.
After discussions with the SAC, we agreed that the authors presented a simple method yielding significant empirical improvements over the state of the art. While the ablation of the proposed MMI module within the pipeline shows a 4% contribution in terms of Hits@1, other metrics also improved; for example, MR notably decreased from 32.9 to 26.3. This suggests that the proposed module is effective for handling difficult samples. Although the novelty is somewhat limited, this contribution remains valuable to the community.
Weighing the pros and cons, we recommend accepting the paper as a poster presentation. However, the current version still requires substantial improvements. The authors should reorganize the paper to highlight their contributions and the importance of the performance improvements.