Overall: 6.8/10 · Poster · 4 reviewers
Ratings: 5, 4, 4, 4 (min 4, max 5, std 0.4)
Confidence: 3.5
Originality: 2.5 · Quality: 2.8 · Clarity: 3.0 · Significance: 2.5
NeurIPS 2025

LBMKGC: Large Model-Driven Balanced Multimodal Knowledge Graph Completion

Submitted: 2025-05-11 · Updated: 2025-10-29
TL;DR

To tackle the challenges of imbalance and heterogeneity in multimodal knowledge graph completion (MMKGC), this paper proposes a novel MMKGC framework that uses a large vision-language model, cross-modal alignment, and adaptive multimodal fusion.

Abstract

Keywords
Multi-modal Knowledge Graphs, Knowledge Graph Completion, Cross-modal Interaction, Large Vision-Language Model

Reviews and Discussion

Review (Rating: 5)

This paper introduces LBMKGC, a novel framework for Multimodal Knowledge Graph Completion designed to address three key challenges: modality imbalance, intra-entity modality heterogeneity, and inter-modality information inconsistency. Through evaluations and comparisons with state-of-the-art methods, the effectiveness of LBMKGC is demonstrated.

Strengths and Weaknesses

Strengths:

The paper is well written, clearly presenting the motivation and structure of the proposed method. The key contributions include: introducing the Large model-driven Balanced MMKGC (LBMKGC) framework for multimodal knowledge graph completion; evaluating LBMKGC against state-of-the-art MMKGC methods; and providing a good level of detail on the methodology, together with the strong performance of the proposed approach.

Weaknesses:

  • The paper restricts itself to text and images. Though this is reasonable given the scope, audio/video modalities and potential future extensions are only briefly discussed.

  • Despite strong performance margins, the lack of statistical testing (e.g., standard deviations) weakens claims of consistency. Including these is crucial to allow the reader to draw meaningful conclusions from the results.

Questions

  • Is it possible to provide more detailed statistics, such as error bars?
  • Are the methods sensitive to the hyperparameters? A sensitivity analysis is lacking.

Limitations

The limitations are mentioned in the paper in the conclusion section.

Final Justification

This work is complete, well written, clear, and useful. I think it is acceptable.

Formatting Concerns

No major formatting issues.

Author Response

Dear reviewer,

Thank you for reviewing our paper and providing valuable feedback. We greatly appreciate your comments and have made detailed responses and revisions based on your suggestions. Below are our specific replies to the issues and recommendations you raised:

(1) Response concerning the further discussion on limitations:

We sincerely appreciate the reviewer’s insightful comments, which have prompted us to provide a more detailed discussion of the audio and video modalities and potential future directions. First, we summarize the primary limitations of our research: our framework currently focuses only on two modalities, i.e., text and images, without incorporating other semantically rich information such as audio and video. For instance, in cases where the entity is an animal, the audio features of its sounds or its dynamic behaviors captured in video are essential for completing related triples. Additionally, existing experiments are conducted on general domain datasets, leaving the adaptability and robustness in specialized domains such as biomedical fields and social networks unvalidated. Next, we explore directions for future research: future efforts can formally integrate audio and video modalities into the multimodal knowledge graph completion system. This includes developing scalable architectures that effectively balance high capacity with low complexity. Moreover, for vertical scenarios like biomedical knowledge graphs and social networks, constructing specialized multimodal benchmarks and evaluation protocols will be essential to advance MMKGC from general modeling to domain-specific applications. We believe that with these additions, our paper will offer a more comprehensive portrayal of the research's depth and breadth, providing readers with a clearer understanding of the study's trajectory and future development avenues. Once again, we thank the reviewer for their suggestions. We remain committed to enhancing the paper's content to meet higher academic standards.

(2) Response concerning the statistical tests (e.g., standard deviation):

We appreciate your insightful feedback. Previously, to ensure stable reproducibility, our experiments were conducted using a fixed random seed. However, based on your suggestion, we recognize that omitting quantitative indicators of result variability, such as standard deviation, can indeed impact the assessment of the method's robustness and the reliability of our conclusions. This represents a shortcoming in the experimental section of the original manuscript, and we apologize for this oversight. To address this issue comprehensively and provide readers with more statistically significant insights, we have conducted five independent experiments on each dataset (MKG-W, MKG-Y, DB15K) without fixing the random seed. The results for each dataset are as follows:

(i) MKG-W (MRR: 0.3846 Hit@1: 0.3120 Hit@3: 0.4178 Hit@10: 0.5146)

1)MRR: 0.386091 Hit@1: 0.314343 Hit@3: 0.417525 Hit@10: 0.516729

2)MRR: 0.384785 Hit@1: 0.312939 Hit@3: 0.417642 Hit@10: 0.516144

3)MRR: 0.385227 Hit@1: 0.313407 Hit@3: 0.417057 Hit@10: 0.519303

4)MRR: 0.386556 Hit@1: 0.315161 Hit@3: 0.419279 Hit@10: 0.516378

5)MRR: 0.384019 Hit@1: 0.311886 Hit@3: 0.417291 Hit@10: 0.515910

STD: MRR 0.001013, Hit@1 0.001263, Hit@3 0.000879, Hit@10 0.001380

(ii) MKG-Y (MRR: 0.4003 Hit@1: 0.3389 Hit@3: 0.4311 Hit@10: 0.5081)

1)MRR: 0.403128 Hit@1: 0.344912 Hit@3: 0.431656 Hit@10: 0.503380

2)MRR: 0.399168 Hit@1: 0.338716 Hit@3: 0.427901 Hit@10: 0.506947

3)MRR: 0.400182 Hit@1: 0.338903 Hit@3: 0.431093 Hit@10: 0.504882

4)MRR: 0.401752 Hit@1: 0.341344 Hit@3: 0.432219 Hit@10: 0.506008

5)MRR: 0.401017 Hit@1: 0.340030 Hit@3: 0.432970 Hit@10: 0.506196

STD: MRR 0.001508, Hit@1 0.002536, Hit@3 0.001953, Hit@10 0.001388

(iii) DB15K (MRR: 0.3723 Hit@1: 0.2778 Hit@3: 0.4275 Hit@10: 0.5471)

1)MRR: 0.373437 Hit@1: 0.278732 Hit@3: 0.429105 Hit@10: 0.547566

2)MRR: 0.372852 Hit@1: 0.277419 Hit@3: 0.430014 Hit@10: 0.547667

3)MRR: 0.371596 Hit@1: 0.277166 Hit@3: 0.425823 Hit@10: 0.545698

4)MRR: 0.371958 Hit@1: 0.275449 Hit@3: 0.429004 Hit@10: 0.547566

5)MRR: 0.370715 Hit@1: 0.275651 Hit@3: 0.425823 Hit@10: 0.546051

STD: MRR 0.001065, Hit@1 0.001356, Hit@3 0.001984, Hit@10 0.000954

The experimental data demonstrate that our results exhibit excellent stability. We will highlight quantitative indicators of result variability in the results section. We sincerely appreciate the reviewer’s suggestions and will continue to strive for improvements to achieve higher academic standards in our paper.
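For completeness, the reported standard deviations can be reproduced directly from the per-run numbers above; below is a minimal Python sketch, assuming the sample standard deviation (ddof=1), which matches the reported MKG-W MRR value.

```python
import numpy as np

# Per-run MRR values on MKG-W, copied from the list above.
mkgw_mrr = [0.386091, 0.384785, 0.385227, 0.386556, 0.384019]

mean = np.mean(mkgw_mrr)
std = np.std(mkgw_mrr, ddof=1)  # sample standard deviation over the 5 runs

print(f"mean MRR: {mean:.6f}, std: {std:.6f}")  # std ~ 0.001013, as reported
```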

(3) Response concerning the hyperparameter sensitivity:

Thank you for your suggestions regarding the analysis of hyperparameter sensitivity. We have conducted experiments on the core hyperparameters (including embedding dimension, number of negative samples, and margin) using the MKG-Y dataset. To ensure the fairness of these experiments, we fixed the random seed at 42 and employed a controlled variable approach for multiple independent experiments. The experimental results are shown below.

(i) Dim (embedding dimension)

When Dim=300:

MRR: 0.3887 Hit@1: 0.3191 Hit@3: 0.4271 Hit@10: 0.5084

When Dim=400:

MRR: 0.3980 Hit@1: 0.3349 Hit@3: 0.4307 Hit@10: 0.5071

When Dim=500:

MRR: 0.4003 Hit@1: 0.3389 Hit@3: 0.4311 Hit@10: 0.5081

When Dim=600:

MRR: 0.3980 Hit@1: 0.3385 Hit@3: 0.4266 Hit@10: 0.5024

When Dim=700:

MRR: 0.3961 Hit@1: 0.3368 Hit@3: 0.4253 Hit@10: 0.5023

(ii) neg_num (number of negative samples)

When neg_num = 32:

MRR: 0.3965 Hit@1: 0.3237 Hit@3: 0.4358 Hit@10: 0.5212

When neg_num = 64:

MRR: 0.3980 Hit@1: 0.3295 Hit@3: 0.4348 Hit@10: 0.5152

When neg_num = 128:

MRR: 0.4003 Hit@1: 0.3389 Hit@3: 0.4311 Hit@10: 0.5081

When neg_num=192:

MRR: 0.3960 Hit@1: 0.3342 Hit@3: 0.4296 Hit@10: 0.5008

(iii) Margin

When Margin=8:

MRR: 0.3856 Hit@1: 0.3286 Hit@3: 0.4163 Hit@10: 0.4818

When Margin=12:

MRR: 0.3912 Hit@1: 0.3321 Hit@3: 0.4223 Hit@10: 0.4869

When Margin=16:

MRR: 0.3954 Hit@1: 0.3355 Hit@3: 0.4275 Hit@10: 0.4936

When Margin=20:

MRR: 0.3981 Hit@1: 0.3376 Hit@3: 0.4294 Hit@10: 0.5032

When Margin=24:

MRR: 0.4003 Hit@1: 0.3389 Hit@3: 0.4311 Hit@10: 0.5081

The experimental results indicate that while there is some variability in the outcomes under different hyperparameter settings, this variability is minor and does not significantly impact the model's performance. Additionally, we plan to carry out further hyperparameter-sensitivity experiments on the remaining datasets and include these findings in the revised manuscript. We sincerely appreciate the reviewer's valuable comments, which help ensure that readers can thoroughly assess the stability of our method.

Comment

Dear Authors, thanks for your extended experiments. I have read them. I have no further questions.

Comment

Thank you for your response. We sincerely appreciate your valuable feedback once again!

Review (Rating: 4)

This paper presents LBMKGC, a multimodal knowledge graph completion framework targeting key challenges such as modality imbalance, heterogeneity, and semantic inconsistency. LBMKGC employs Stable Diffusion XL to augment the imbalanced information across modalities and semantically aligns the multimodal embeddings of entities using CLIP. Comprehensive experiments against 21 baselines demonstrate the effectiveness of the approach.

Strengths and Weaknesses

Strengths:

  1. This paper is well written and easy to read.
  2. The perceptual-conceptual dual-branch fusion module is innovative and contributes to improved semantic coherence.

Weaknesses:

  1. In the LLM-Based Modality Completion section, it appears that the main contributions largely rely on the existing work of SDXL (Stable Diffusion XL). The authors should more clearly articulate the innovations and improvements this paper makes upon prior work to help readers understand its original contributions.

  2. This paper lacks an analysis or empirical evaluation of time complexity, which should be included to better assess the method's efficiency.

  3. While the paper emphasizes modality modeling and fusion, it pays relatively little attention to structural modeling of the KG. The influence of graph structure on the completion task is underexplored.

  4. The authors have provided a non-anonymous GitHub link in the abstract, which appears to violate the anonymity guidelines. While I do not wish for this paper to be desk rejected because of it, it should be addressed.

Questions

See Strengths And Weaknesses.

Limitations

See Strengths And Weaknesses.

Final Justification

The authors addressed my concerns. I will maintain my score of 4.

Formatting Concerns

None

Author Response

Dear reviewer,

Thank you for reviewing our paper and for your valuable feedback. We greatly appreciate your insights and have made detailed responses and revisions based on your comments. Below are our specific responses to the issues and suggestions you raised:

(1) Response regarding the concern that the main contributions seem to heavily rely on existing SDXL work:

We appreciate the reviewer's attention to this aspect. It is important to clarify that the SDXL model itself, as a powerful open-source generative foundation model, has not had its core architecture modified in our work. We fully agree with the reviewer's point that our original contributions need to be more clearly defined. The core innovation of our work lies not in improving the internal mechanisms of SDXL, but in its systematic application and integration for the specific task of multimodal knowledge graph completion. Specifically:

(i) In the LLmMC (LLM-Based Modality Completion) module: We are the first to introduce an advanced generative vision model, i.e., Stable Diffusion XL (SDXL), into the task of multimodal knowledge graph completion (MMKGC). The core innovation of this module lies in its ability to explicitly address the imbalance of cross-entity modal information, as opposed to the conventional reliance on random initialization.

(ii) In the CMoA (Cross-Modality Alignment) module: To ensure a fair evaluation of model performance and to facilitate meaningful comparisons with baseline approaches, we utilized the pre-trained models commonly used in existing MMKGC methods for extracting visual and textual features. However, current methods tend to overlook the modality-gap issue during multimodal feature extraction, failing to construct a coupled representation space that enables effective cross-modal interaction. This limitation hinders downstream tasks from fully capturing the deep semantic associations between modalities. Within the LBMKGC framework, we innovatively apply these techniques: by adopting a contrastive-learning-based approach, BERT and ViT are jointly leveraged to extract multimodal features, ensuring that representations from different modalities are aligned within a unified semantic space.

(iii) In the CGuAF (Context-Guided Adaptive Fusion) module: Traditional MMKGC methods typically rely on simple concatenation or averaging operations to fuse the multimodal information of entities. Unlike these methods, our approach distinguishes between the perceptual and conceptual attributes of entities. In addition, we incorporate relational context information to dynamically adjust the weight distribution between structural and multimodal information during the adaptive fusion process.

Finally, experimental results demonstrate that our proposed LBMKGC framework achieves state-of-the-art performance across various datasets and scenarios when compared with 21 state-of-the-art baseline algorithms.
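For illustration, this kind of SDXL-based modality completion can be wired up with the off-the-shelf Hugging Face diffusers pipeline. The snippet below is a minimal sketch only: the checkpoint choice, prompt template, and helper function are assumptions made for illustration, not the exact pipeline used in the paper.

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Load the unmodified SDXL base checkpoint; its architecture is used as-is.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

def complete_missing_image(entity_name: str, entity_text: str):
    """Generate a surrogate image for an entity whose visual modality is missing.

    The prompt template here is hypothetical; in practice it would be built
    from the entity's textual description in the knowledge graph.
    """
    prompt = f"A realistic photo of {entity_name}. {entity_text}"
    return pipe(prompt=prompt, num_inference_steps=30).images[0]

# Example: an entity that has text but no image in the source KG.
image = complete_missing_image(
    "Snowman", "A figure made of packed snow, typically with a carrot nose."
)
image.save("snowman_generated.png")
```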

(2) Response concerning the analysis or empirical evaluation of time complexity:

We sincerely thank the reviewer for pointing out this oversight, as efficiency evaluation is indeed crucial for assessing the practicality of the method. We have conducted time-efficiency experiments on the DB15K dataset using an NVIDIA GeForce RTX 4090 GPU, covering the LBMKGC, TBKGC, QEB, RSME, and IKRL methods. The results are as follows: LBMKGC: 7h53min; TBKGC: 5h1min; QEB: 8h34min; RSME: 5h38min; IKRL: 3h51min. The results indicate that although LBMKGC trains somewhat more slowly, the slowdown is within an acceptable range and is accompanied by substantial performance improvements. The increased computational time is primarily due to the additional processing required for assigning weights to the text and image modalities, as well as to the graph structural information regulated by relational context. Overall, the LBMKGC model achieves a commendable balance between efficiency, performance, and stability. Additionally, we will carry out further runtime-efficiency experiments on other datasets and include the results in the revised manuscript. We are grateful for the reviewer's insightful comments, as we are confident that these enhancements will enable readers to thoroughly evaluate the efficiency of our method.

(3) Response concerning the exploration of the impact of graph structure on completion tasks:

Thank you for pointing this out. In this paper, we employ RotatE to learn graph structural information. The core idea of RotatE is to represent relations as rotation operations on entity embeddings in the complex vector space, capturing implicit topological structure through the interactions between entities and relations. Moreover, graph structural information is crucial for our model: during the adaptive fusion process, we innovatively incorporate relational contextual information to dynamically adjust the weight distribution between structural and multimodal information. Furthermore, the efficacy of graph structure utilization is detailed in our ablation studies. When graph structural information is omitted from LBMKGC, the Hit@1 metric drops significantly: from 27.78 to 16.96 on the DB15K dataset, from 31.20 to 23.84 on the MKG-W dataset, and from 33.89 to 29.33 on the MKG-Y dataset. This clearly demonstrates the essential role of graph structural information in our LBMKGC framework.
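For reference, here is a minimal sketch of the RotatE scoring idea mentioned above (relations as element-wise rotations of complex entity embeddings); the toy embeddings and the choice of L1 distance are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def rotate_distance(head: np.ndarray, rel_phase: np.ndarray, tail: np.ndarray) -> float:
    """RotatE-style distance ||h o r - t||, where r = exp(i * phase) has unit modulus.

    head, tail: complex-valued entity embeddings; rel_phase: real-valued relation phases.
    A smaller distance means the triple is scored as more plausible.
    """
    rel = np.exp(1j * rel_phase)                      # each |r_i| = 1, i.e., a pure rotation
    return float(np.linalg.norm(head * rel - tail, ord=1))

# Toy 4-dimensional complex embeddings (illustrative values only).
rng = np.random.default_rng(0)
h = rng.normal(size=4) + 1j * rng.normal(size=4)
phase = rng.uniform(-np.pi, np.pi, size=4)
t_good = h * np.exp(1j * phase)                       # tail that exactly matches the rotation
t_bad = rng.normal(size=4) + 1j * rng.normal(size=4)
print(rotate_distance(h, phase, t_good))              # ~0.0: highly plausible triple
print(rotate_distance(h, phase, t_bad))               # larger: implausible triple
```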

(4) Response concerning the non-anonymous GitHub link in the abstract:

Thank you very much for pointing out this oversight. The link was originally intended to share our research's code and resources openly. We have promptly removed the link in the revised manuscript to fully comply with the double-blind review requirements. We appreciate the reviewer’s understanding and patience regarding this matter.

Comment

Thank you for your thoughtful and detailed response. I will maintain my score of 4.

Comment

Thank you for your response. We sincerely appreciate your valuable feedback once again!

Review (Rating: 4)

This paper introduces LBMKGC, a benchmark and evaluation suite aimed at assessing large language models (LLMs) for knowledge graph completion (KGC) tasks. The framework covers various KGC paradigms—link prediction, triple classification, and query answering—across multiple datasets. It further dissects performance by task format (e.g., prompt-based vs. scoring-based) and by evaluation granularity (e.g., head/tail/generalized split). The authors compare LLMs such as GPT and GLM with traditional structure-based KGC models.

Strengths and Weaknesses

Strengths:
a) Evaluating LLMs for KGC is important as LLMs increasingly replace specialized symbolic systems.
b) The benchmark breaks down KGC evaluation across multiple axes, which offers useful diagnostic insights.
c) The authors run consistent experiments comparing LLMs (in zero-shot or few-shot) with structure-based methods.

Weaknesses:
a) The paper uses GLM and ChatGPT, but there is no comparison with strong open-source instruction-tuned LLMs (e.g., LLaMA-2-chat, Mistral, Claude) or modern prompting strategies (e.g., CoT, self-consistency).
b) Current tasks focus heavily on triple completion/classification. However, real-world KGs involve complex reasoning.
c) Prompt design, sampling strategy, and evaluation conditions (e.g., few-shot template consistency) can strongly affect LLM performance.

Questions

a) The paper uses GLM and ChatGPT, but there is no comparison with strong open-source instruction-tuned LLMs (e.g., LLaMA-2-chat, Mistral, Claude) or modern prompting strategies (e.g., CoT, self-consistency).
b) Current tasks focus heavily on triple completion/classification. However, real-world KGs involve complex reasoning.
c) Prompt design, sampling strategy, and evaluation conditions (e.g., few-shot template consistency) can strongly affect LLM performance.

Limitations

yes

Final Justification

I have read the other reviewers' comments, so I would like to change the score.

Formatting Concerns

No Formatting Concerns

Author Response

Dear reviewer,

We sincerely appreciate the time and effort you have dedicated to reviewing our manuscript. However, upon thorough examination, it seems that the comments you provided do not align with the content of our submission. We suspect these comments might have been inadvertently linked to another submission.

The detailed comments are as follows: (i) The framework covers various KGC paradigms—link prediction, triple classification, and query answering—across multiple datasets. It further dissects performance by task format (e.g., prompt-based vs. scoring-based) and by evaluation granularity (e.g., head/tail/generalized split); (ii) The authors compare LLMs such as GPT and GLM with traditional structure-based KGC models. (iii) The authors run consistent experiments comparing LLMs (in zero-shot or few-shot) with structure-based methods. Since these comments do not pertain to our manuscript, we would appreciate your confirmation regarding this matter.

To save your precious time, please allow us to reiterate our key contributions:

Specifically:

(i) In the LLmMC (LLM-Based Modality Completion) module: For the first time, we introduce the advanced generative vision model Stable Diffusion XL into multimodal knowledge graph completion. The core innovation lies in explicitly rectifying cross-entity modal imbalance instead of relying on conventional random initialization, thereby delivering semantically richer and more accurate completions for subsequent alignment and fusion.

(ii) In the CMoA (Cross-Modality Alignment) module: Guided by contrastive learning, we jointly leverage BERT and ViT to extract multimodal features, ensuring that representations from different modalities are aligned in a unified semantic space. This effectively alleviates intra-entity heterogeneity and markedly enhances the capture of cross-modal associations.

(iii) In the CGuAF (Context-Guided Adaptive Fusion) module: Whereas prior methods often resort to simple concatenation or averaging, we innovatively incorporate relational context and distinguish between perceptual and conceptual entity attributes to dynamically re-weight modalities.

Experimental results demonstrate that our proposed LBMKGC framework achieves state-of-the-art performance across various datasets and scenarios when compared with 21 state-of-the-art baseline algorithms.
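As a concrete illustration of the contrastive alignment described in (ii), below is a minimal sketch of a symmetric CLIP-style loss over projected BERT/ViT features; the projection dimension, batch size, and temperature are illustrative assumptions rather than the paper's settings.

```python
import torch
import torch.nn.functional as F

def cross_modal_alignment_loss(text_feats: torch.Tensor,
                               image_feats: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss: pull an entity's text and image embeddings together
    and push apart mismatched pairs (illustrative sketch only).

    text_feats, image_feats: (batch, dim) projections of BERT / ViT features,
    where row i of each tensor describes the same entity i.
    """
    text = F.normalize(text_feats, dim=-1)
    image = F.normalize(image_feats, dim=-1)
    logits = text @ image.t() / temperature            # (batch, batch) similarity matrix
    targets = torch.arange(text.size(0), device=text.device)
    loss_t2i = F.cross_entropy(logits, targets)        # text -> matching image
    loss_i2t = F.cross_entropy(logits.t(), targets)    # image -> matching text
    return (loss_t2i + loss_i2t) / 2

# Toy usage with random stand-ins for projected BERT and ViT features of 8 entities.
loss = cross_modal_alignment_loss(torch.randn(8, 256), torch.randn(8, 256))
print(loss.item())
```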

Review (Rating: 4)

In this paper, the authors propose a novel Large model-driven Balanced Multimodal Knowledge Graph Completion framework, termed LBMKGC. Initially, LBMKGC employs Stable Diffusion XL (a large generative vision model) to augment the imbalanced information across modalities. Subsequently, to bridge the semantic gap between heterogeneous modalities, LBMKGC semantically aligns the multimodal embeddings of entities using the CLIP (Contrastive Language-Image Pre-Training) model. Furthermore, by distinguishing between the perceptual and conceptual attributes of entities, LBMKGC learns entity representations through the adaptive fusion of embeddings from various modalities, guided by structural information. The authors present comprehensive experiments and provide the source code of their work for reproducibility.

Strengths and Weaknesses

Strengths: The problem is well motivated by the sections on (1) the imbalance of inter-entity information across different modalities and (2) the heterogeneity of intra-entity multimodal information, and the contributions are clearly highlighted. The problem formulation, along with knowledge graph completion, is clearly identified. Figure 2 is highly informative. Experiments are comprehensive and show solid performance.

Weaknesses: The novelty of the proposed architecture is unclear to me. It seems to be a combination of already-proposed techniques, e.g., BERT, Vision Transformer, MLP, etc. The authors need to highlight this more clearly in their work.

The Related Work is missing a recent work on KGC, which the authors should reference for completeness and recency.

[KDD 2022] Dual-Geometric Space Embedding Model for Two-View Knowledge Graphs. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '22). Association for Computing Machinery, New York, NY, USA, 676–686. https://doi.org/10.1145/3534678.3539350

Questions

The novelty of the proposed architecture is unclear to me. It seems to be a combination of already-proposed techniques, e.g., BERT, Vision Transformer, MLP, etc. The authors need to highlight this more clearly in their work.

The Related Work is missing a recent work on KGC, which the authors should reference for completeness and recency.

Limitations

The authors have provided some discussion on this as part of their Conclusions section, though this can be elaborated upon in its own section.

Final Justification

Thank you for the authors' insightful comments. Upon reviewing the new results and the explanations of novelty, I have decided to raise my score.

Formatting Concerns

N/A

Author Response

Dear reviewer,

We sincerely appreciate your thorough review and insightful comments on our paper. Your feedback is highly valued, and we have carefully considered your suggestions. As a result, we have made detailed responses and revisions accordingly. Below, we provide our specific responses to the issues and recommendations you have raised:

(1) Further elaboration on the novelty of the proposed architecture:

While components such as BERT, ViT, and MLP are widely utilized across various domains, their integration and synergy within the LBMKGC framework are unique. Specifically:

(i) In the LLmMC (LLM-Based Modality Completion) module: We are the first to introduce an advanced generative vision model, i.e., Stable Diffusion XL (SDXL), into the task of multimodal knowledge graph completion (MMKGC). The core innovation of this module lies in its ability to explicitly address the imbalance of cross-entity modal information, as opposed to the conventional reliance on random initialization.

(ii) In the CMoA (Cross-Modality Alignment) module: To ensure a fair evaluation of model performance and to facilitate meaningful comparisons with baseline approaches, we utilized mainstream pre-trained models commonly used in existing MMKGC methods for extracting visual and textual features. However, current methods tend to overlook the modality-gap issue during multimodal feature extraction. This limitation hinders downstream tasks from fully capturing the deep semantic associations between modalities. Within the LBMKGC framework, we innovatively apply these techniques: by adopting a contrastive-learning-based approach, BERT and ViT are jointly leveraged to extract multimodal features, ensuring that representations from different modalities are aligned within a unified semantic space.

(iii) In the CGuAF (Context-Guided Adaptive Fusion) module: Traditional MMKGC methods typically rely on simple concatenation or averaging operations to fuse the multimodal information of entities. Unlike these methods, our approach distinguishes between the perceptual and conceptual attributes of entities and innovatively incorporates relational context information during the fusion process. Consider the triplet <Snowman, State, Still> as an example. Here, "Snowman" is a perceptible and concrete entity, making the visual modality particularly discriminative; as a result, we assign a higher weight to the image modality during feature fusion for the entity "Snowman". Conversely, "Still" is a conceptual entity that relies more on symbolic logical reasoning from textual data, so we assign a higher weight to the text modality during its feature fusion. In addition, we incorporate relational context information to dynamically adjust the weight distribution between structural and multimodal information during the adaptive fusion process. Taking the triplet <Snowman, State, Still> as an example: since "State" is an abstract relation whose semantics hinge on logical associations between entities, we assign a higher weight to the structural information of "Snowman". This allows the model to more precisely interpret the relation "State" and thereby capture the abstract concept "Still". Conversely, in the triplet <Snowman, Color, White>, "Color" is a concrete relation; here, textual, visual, and other multimodal signals provide more discriminative cues than structural information alone. Therefore, we increase the weight given to the multimodal features of "Snowman", enabling the model to accurately grasp the relation "Color" and the specific concept "White".
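To make the weighting behaviour in the Snowman examples concrete, here is a minimal sketch of relation-context-guided adaptive fusion; this is a simplified stand-in, and the gating network, dimensions, and softmax form are assumptions for illustration rather than the exact CGuAF formulation.

```python
import torch
import torch.nn as nn

class ContextGuidedFusion(nn.Module):
    """Fuse structural, textual, and visual entity embeddings with weights
    conditioned on the relation context (illustrative sketch only)."""

    def __init__(self, dim: int):
        super().__init__()
        # The relation embedding "guides" how much each modality contributes.
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 3))

    def forward(self, struct_emb, text_emb, image_emb, rel_emb):
        weights = torch.softmax(self.gate(rel_emb), dim=-1)           # (batch, 3)
        stacked = torch.stack([struct_emb, text_emb, image_emb], 1)   # (batch, 3, dim)
        return (weights.unsqueeze(-1) * stacked).sum(dim=1)           # (batch, dim)

# Toy usage. For an abstract relation such as "State", the learned gate would be
# expected to place more mass on the structural embedding; for a concrete relation
# such as "Color", more mass on the textual/visual embeddings.
fusion = ContextGuidedFusion(dim=128)
fused = fusion(torch.randn(4, 128), torch.randn(4, 128),
               torch.randn(4, 128), torch.randn(4, 128))
print(fused.shape)  # torch.Size([4, 128])
```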

Experimental results demonstrate that our proposed LBMKGC framework achieves state-of-the-art performance across various datasets and scenarios when compared with 21 state-of-the-art baseline algorithms.

Meanwhile, LBMKGC exhibits outstanding algorithmic stability. Specifically, we have conducted five independent experiments on each dataset (MKG-W, MKG-Y, DB15K) without fixing the random seed. The results for each dataset are as follows:

(i) MKG-W (MRR: 0.3846 Hit@1: 0.312 Hit@3: 0.4178 Hit@10: 0.5146)

1)MRR: 0.386091 Hit@1: 0.314343 Hit@3: 0.417525 Hit@10: 0.516729

2)MRR: 0.384785 Hit@1: 0.312939 Hit@3: 0.417642 Hit@10: 0.516144

3)MRR: 0.385227 Hit@1: 0.313407 Hit@3: 0.417057 Hit@10: 0.519303

4)MRR: 0.386556 Hit@1: 0.315161 Hit@3: 0.419279 Hit@10: 0.516378

5)MRR: 0.384019 Hit@1: 0.311886 Hit@3: 0.417291 Hit@10: 0.515910

STD: MRR 0.001013, Hit@1 0.001263, Hit@3 0.000879, Hit@10 0.001380

(ii) MKG-Y (MRR: 0.4003 Hit@1: 0.3389 Hit@3: 0.4311 Hit@10: 0.5081)

1)MRR: 0.403128 Hit@1: 0.344912 Hit@3: 0.431656 Hit@10: 0.503380

2)MRR: 0.399168 Hit@1: 0.338716 Hit@3: 0.427901 Hit@10: 0.506947

3)MRR: 0.400182 Hit@1: 0.338903 Hit@3: 0.431093 Hit@10: 0.504882

4)MRR: 0.401752 Hit@1: 0.341344 Hit@3: 0.432219 Hit@10: 0.506008

5)MRR: 0.401017 Hit@1: 0.340030 Hit@3: 0.432970 Hit@10: 0.506196

STD: MRR 0.001508, Hit@1 0.002536, Hit@3 0.001953, Hit@10 0.001388

(iii) DB15K (MRR: 0.3723 Hit@1: 0.2778 Hit@3: 0.4275 Hit@10: 0.5471)

1)MRR: 0.373437 Hit@1: 0.278732 Hit@3: 0.429105 Hit@10: 0.547566

2)MRR: 0.372852 Hit@1: 0.277419 Hit@3: 0.430014 Hit@10: 0.547667

3)MRR: 0.371596 Hit@1: 0.277166 Hit@3: 0.425823 Hit@10: 0.545698

4)MRR: 0.371958 Hit@1: 0.275449 Hit@3: 0.429004 Hit@10: 0.547566

5)MRR: 0.370715 Hit@1: 0.275651 Hit@3: 0.425823 Hit@10: 0.546051

STD: MRR 0.001065, Hit@1 0.001356, Hit@3 0.001984, Hit@10 0.000954

(2) The addition of related work:

Thank you very much for pointing out the omission in our related work section. The "Dual-Geometric Space Embedding Model for Two-View Knowledge Graphs" is indeed a significant contribution in this field, introducing an innovative dual-geometric space embedding model (DGS) specifically designed to handle knowledge graphs with two views. We deeply appreciate this contribution, and to ensure the comprehensiveness and currency of our research, we have incorporated a citation to this work in the "Related Work" section along with a detailed discussion in the text. We believe that this comprehensive literature review and discourse will allow our paper to more effectively demonstrate its advancements and contributions relative to existing research. Thank you again for your invaluable feedback, and we remain committed to enhancing the quality and depth of our research.

(3) Further Discussion on Limitations:

We sincerely appreciate the reviewer’s insightful feedback. We have recognized the importance of discussing research limitations and future directions. Therefore, we have added a specific "Limitations and Future Work" paragraph in the Conclusion section, where we detail the current limitations of our study and potential avenues for future research. We begin by outlining the primary limitations: our study currently focuses on text and image modalities, without integrating semantically rich information such as audio and video into a unified framework. Additionally, our experiments have been conducted exclusively on general domain datasets, leaving the adaptability and robustness in specialized fields like biomedical data and social networks unverified. Following this, we propose directions for future research, including the formal integration of audio and video modalities into the multimodal knowledge graph completion framework. For specialized scenarios, such as biomedical knowledge graphs and social networks, it is crucial to establish tailored multimodal benchmarks and evaluation protocols to advance MMKGC from generic modeling to domain-specific applications. By including this dedicated section, we believe our paper more comprehensively demonstrates the research's depth and breadth, offering readers a clearer trajectory and directions for future development. We once again thank the reviewer for their valuable suggestions and remain committed to continually refining the paper to achieve higher academic standards.

Comment

Thank you for your response. We sincerely appreciate your valuable feedback once again!

We would like to know if you have any additional concerns or suggestions regarding our work. If possible, we hope to engage in further technical communication to earn your endorsement.

Comment

Dear AYg4,

We'd love to hear your thoughts on the rebuttal. If the authors have addressed your rebuttal questions, please let them know. If the authors have not addressed your rebuttal questions, please inform them accordingly.

Thanks, Your AC

Final Decision

This paper proposes LBMKGC, a novel large-model-driven framework for multimodal knowledge graph completion (MMKGC). Extensive experiments across 21 baselines and three datasets demonstrate state-of-the-art performance with good efficiency and generalizability. The paper presents a clearly defined problem and proposes the LBMKGC framework, which innovatively integrates existing techniques to systematically address modality imbalance and heterogeneity in multimodal knowledge graph completion. In their rebuttal, the authors thoroughly responded to all reviewer concerns by providing extensive additional experiments (including five independent runs, runtime analysis, hyperparameter sensitivity studies, and ablation experiments) and expanded discussions. These revisions led to consistently improved or maintained scores from three reviewers. Overall, the work demonstrates significant performance gains, reproducibility, and practical value, warranting acceptance.