Uncertain Knowledge Graph Completion via Semi-Supervised Confidence Distribution Learning
Abstract
Reviews and Discussion
The paper addresses the challenge of imbalanced confidence distributions in Uncertain Knowledge Graphs (UKGs), where high-confidence triples dominate, leading to suboptimal embeddings for UKG completion. The authors propose ssCDL, a semi-supervised method combining Confidence Distribution Learning (CDL) and Meta Self-Training. Experiments on NL27k and CN15k show ssCDL outperforms SOTA baselines in both confidence prediction and link prediction tasks. Ablation studies further confirm the effectiveness of the proposed model.
Strengths and Weaknesses
Strengths:
- Quality: The model design is technically sound. CDL effectively captures uncertainty via confidence distributions, and meta self-training robustly handles pseudo-labels. Experiments on confidence prediction and link prediction further demonstrate the effectiveness of the model.
- Clarity: The paper is well written, and its structure is clear and easy to follow.
- Significance: The paper addresses a meaningful problem, confidence imbalance in UKG completion, improving robustness for low-confidence triples.
- Originality: The paper is based on a novel integration of confidence distributions and meta self-training for UKGs.
Weaknesses:
- Quality: The evaluation metrics are relatively limited. For example, link prediction only uses Hits@1 and WMRR. It is recommended to include commonly used metrics such as Hits@3 and Hits@10.
- Significance: The proposed model still relies on initial training with existing high-confidence samples. It would be valuable to further explore its performance under zero-shot or few-shot supervision settings.
Questions
Please refer to the weaknesses above.
Limitations
yes
Formatting Issues
None
We thank the reviewer for the valuable comments, and will take them into account to improve our paper. Our responses are as follows:
W1: In this paper, we only use Hits@1 and WMRR as our evaluation metrics for link prediction. Hits@k measures whether the correct entity appears among the top k predicted entities, and Hits@1 is the strictest of the Hits@k metrics, as it checks whether the predicted entity is ranked first. Therefore, in real-world applications, Hits@1 is a more practical metric than Hits@k with larger k values. WMRR is the weighted mean reciprocal rank, i.e., the weighted average of the reciprocal ranks of all correct tail entities; it is a more comprehensive metric that places more emphasis on the impact of highly ranked entities on link prediction. These two metrics measure different aspects and are complementary and representative. In fact, we have also evaluated our method and the baselines using other Hits@k metrics, and we present the results on Hits@3 and Hits@5 in Table 1, which again show that our method outperforms all baselines. This further validates the effectiveness of our method, and we will include additional Hits@k results in the final version if space allows.
Table 1. The comparison results between ssCDL and baselines on NL27k and CN15k for link prediction, evaluated by Hits@3 and Hits@5.
| Model | NL27k Hits@3 | NL27k Hits@5 | CN15k Hits@3 | CN15k Hits@5 |
|---|---|---|---|---|
| UKGE | 0.597 | 0.651 | 0.146 | 0.182 |
| UKGE | 0.595 | 0.655 | 0.132 | 0.172 |
| BEUrRE | 0.307 | 0.366 | 0.175 | 0.223 |
| PASSLEAF | 0.676 | 0.724 | 0.178 | 0.228 |
| PASSLEAF | 0.703 | 0.746 | 0.223 | 0.268 |
| PASSLEAF | 0.706 | 0.737 | 0.167 | 0.223 |
| UKGsE | 0.062 | 0.080 | 0.006 | 0.010 |
| UPGAT | 0.654 | 0.702 | 0.168 | 0.215 |
| ssCDL | 0.760 | 0.809 | 0.246 | 0.275 |
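For readers unfamiliar with these metrics, here is a minimal sketch of how Hits@k and WMRR can be computed from the ranks of correct tail entities; the use of triple confidences as weights and the example numbers are illustrative assumptions, not the paper's exact evaluation code.

```python
import numpy as np

def hits_at_k(ranks, k):
    """Fraction of test cases whose correct tail entity is ranked within the top k."""
    ranks = np.asarray(ranks)
    return float(np.mean(ranks <= k))

def wmrr(ranks, weights):
    """Weighted mean reciprocal rank: the reciprocal ranks of correct tail entities,
    weighted (here, hypothetically, by triple confidence) and normalized by total weight."""
    ranks = np.asarray(ranks, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights / ranks) / np.sum(weights))

# Hypothetical ranks of correct tails and their confidences used as weights.
ranks = [1, 3, 2, 10]
confidences = [0.9, 0.8, 0.95, 0.6]
print(hits_at_k(ranks, 1), hits_at_k(ranks, 3), wmrr(ranks, confidences))
```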
W2: Our work focuses on UKG completion, a scenario in which a large amount of high-confidence labeled data, i.e., existing triples with high confidences, is available. We believe that leveraging these labeled data leads to better performance. It is true that few-shot or zero-shot scenarios exist at certain confidence levels. In fact, both the introduced confidence distribution learning and the pseudo confidence distribution generator can address the few-shot and zero-shot issues to some extent. Moreover, few-shot and zero-shot settings are more common in another relevant task, namely computing triple confidences for deterministic KGs, and we plan to explore few-shot and zero-shot strategies for this task in our future work.
ssCDL addresses confidence imbalance in Uncertain Knowledge Graphs by transforming triple confidences into distributions, introducing semi-supervised learning with pseudo-labels to augment training data. It combines relational embedding learning with a meta-learning-based pseudo-confidence generator to iteratively refine embeddings using both labeled and unlabeled triples. Experiments claim consistent SOTA performance on UKG benchmarks, though technical novelty and cost-benefit trade-offs require scrutiny.
Strengths and Weaknesses
Strengths:
- The paper identifies confidence distribution skew as a legitimate challenge in UKG embedding learning, an underexplored issue in prior work.
- The core idea of representing confidences as continuous distributions rather than discrete values introduces useful regularization that may enhance model robustness.
- The semi-supervised framework leverages unlabeled data resourcefully through pseudo-labeling, offering a promising approach to data augmentation.
Drawbacks:
- The claim that neighboring confidences share transferable features lacks empirical validation. Confidence scores typically reflect extraction certainty or source reliability—not semantic relatedness—making spatial correlation between confidence values conceptually questionable.
- Performance gains over baselines are minimal yet incur substantial overhead. Iterative pseudo-label refinement requires multiple training passes, and meta-learning introduces additional optimization loops.
- The strong performance of ssCDL w/o CDL suggests distributional confidence encoding—not the full architecture—drives improvements. The value of meta-learned pseudo-labels remains unproven, while a deeper analysis of distributional confidence encoding would be more insightful.
Questions
Please refer to the Weaknesses.
Limitations
Yes
Final Justification
The paper addresses the novel and underexplored problem of confidence distribution skew through a promising approach using continuous distributions. The author's response has addressed my concerns. While performance gains are marginal given the additional overhead, the cost remains acceptable in practical offline applications.
Formatting Issues
N/A
We thank the reviewer for the valuable comments, and will take them into account to improve our paper. Our responses are as follows:
W1: As the reviewer pointed out, confidence can reflect extraction certainty or source reliability, but according to the definition [1] used in UKGs, confidence represents the likelihood of the relation fact being true. Thus, similar confidences suggest that the feature representations modeling the likelihood of triples being true should also be similar. This is why we argue that neighboring confidences share transferable features. The idea is inspired by the use of label distribution learning in facial age estimation: Geng et al. [2] transformed each face image into an example associated with a label distribution over age labels, under the assumption that the facial appearances of a person at close ages (e.g., 35, 36, and 37) tend to be similar. Similarly, in UKGs, triples with close confidences should exhibit similar features that encode this "truthfulness". Conversely, if the confidences of two triples differ significantly, this suggests a difference in the features that support their likelihood of being true. This is why we transform each confidence into a confidence distribution and introduce confidence distribution learning, whose effectiveness is demonstrated by the ablation study in Sec. 5.4.
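As a concrete illustration of this transformation (a minimal sketch under our own assumptions, not necessarily the paper's exact formulation), a scalar confidence can be expanded into a discretized Gaussian over neighboring confidence levels, analogous to the label distributions of Geng et al. [2]; the number of levels and the standard deviation below are hypothetical choices.

```python
import numpy as np

def confidence_to_distribution(c, num_levels=11, sigma=0.1):
    """Turn a scalar confidence c in [0, 1] into a discrete distribution over evenly
    spaced confidence levels, with most probability mass placed around c.
    A discretized Gaussian is used here purely for illustration."""
    levels = np.linspace(0.0, 1.0, num_levels)             # confidence levels 0.0, 0.1, ..., 1.0
    weights = np.exp(-0.5 * ((levels - c) / sigma) ** 2)   # unnormalized Gaussian centered at c
    return weights / weights.sum()                         # normalize into a probability distribution

# A triple observed with confidence 0.85 also provides (down-weighted) supervision
# for the neighboring confidence levels 0.8 and 0.9.
print(np.round(confidence_to_distribution(0.85), 3))
```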
W2: According to previous studies on UKG completion, our method achieves notable improvements over the state-of-the-art baselines. For instance, on NL27k, PASSLEAF [3] reduces the MSE and MAE of UKGE [1] by 20.7% and 15.0%, respectively, while our method reduces MSE by 52.6% and MAE by 17.6% compared to the best-performing baseline. These results show that the performance improvement is not marginal in the field of UKG completion. Our method does introduce additional computational cost due to its complexity. However, since it is designed for offline rather than online UKG completion, we believe this additional cost remains acceptable in practical applications. Efficiency is not the primary focus of this work, but we acknowledge its importance and plan to explore efficiency optimizations for real-world, large-scale UKG completion in future work.
W3: We have conducted an ablation study by removing meta self-training (denoted as w/o mst), as shown in Sec. 5.4. The results indicate that the pseudo-labeled data generated by the pseudo confidence distribution generator (PCDG) help ssCDL achieve better UKG completion. However, the impact is relatively small compared to that of removing confidence distribution learning (CDL). This is because our PCDG is primarily designed to generate pseudo confidence distributions for unlabeled data obtained through negative sampling. Since such data are often associated with low confidences, PCDG mainly enhances the performance of our method on low-confidence triples in confidence prediction and link prediction. However, the test sets contain relatively few low-confidence triples, which limits the potential impact of PCDG in this setting. In contrast, CDL improves the method's ability to handle both high-confidence and low-confidence triples, thereby providing larger performance gains.
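To make the interplay between CDL and PCDG described in this response easier to follow, here is a schematic sketch of one self-training round; all names (learner, generator, meta_update, etc.) are placeholders introduced for illustration, and the meta-learning step is only indicated at a high level rather than reproducing the paper's exact procedure.

```python
def meta_self_training_round(learner, generator, labeled, unlabeled, clean_val):
    """One schematic round of meta self-training (illustrative sketch only).

    learner   -- CDL-based relational learner predicting confidence distributions
    generator -- pseudo confidence distribution generator (PCDG)
    labeled   -- triples with observed confidences, transformed into distributions
    unlabeled -- negative-sampled triples without confidence labels
    clean_val -- small labeled set assumed here to serve as the meta objective
    """
    # 1. PCDG assigns pseudo confidence distributions to the unlabeled triples.
    pseudo = [(triple, generator.predict_distribution(triple)) for triple in unlabeled]

    # 2. The learner is trained on labeled plus pseudo-labeled data.
    learner.fit(labeled + pseudo)

    # 3. The generator is updated so that the resulting learner performs well on
    #    the clean labeled set (the meta-learning step, sketched abstractly here).
    generator.meta_update(learner, clean_val)

    return learner, generator
```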
[1] Xuelu Chen, Muhao Chen, Weijia Shi, Yizhou Sun, and Carlo Zaniolo. Embedding Uncertain Knowledge Graphs. In Proc. of AAAI, volume 33, pages 3363–3370, 2019.
[2] Xin Geng, Chao Yin, and Zhi-Hua Zhou. Facial Age Estimation by Learning from Label Distributions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35:2401–2412, 2013.
[3] Zhu-Mu Chen, Mi-Yen Yeh, and Tei-Wei Kuo. PASSLEAF: A Pool-bAsed Semi-Supervised LEArning Framework for Uncertain Knowledge Graph Embedding. In Proc. of AAAI, volume 35, pages 4019–4026, 2021.
Thank you for your detailed response. I'll maintain my current score.
We appreciate your valuable feedback. Thanks again for helping us improve this paper.
This paper proposes semi-supervised confidence distribution learning for uncertain knowledge graph completion, aiming to alleviate the problem of extremely imbalanced distributions of triple confidences. The proposed method, ssCDL, has a very interesting idea that transforms a triple confidence into a confidence distribution to introduce more supervision information across diverse confidences. In addition, meta self-training is applied to generate reliable confidences for unlabeled data, so that the learning process can be reinforced with unlabeled data for UKG completion. Experiments show that ssCDL outperforms the state-of-the-art baselines in both link prediction and confidence prediction. The ablation study and the low-confidence triple analysis also provide evidence for the effectiveness of the specialized design of ssCDL.
Strengths and Weaknesses
Strengths:
- The paper is the first work to identify the problem of imbalanced distributions of triple confidences in UKG completion. It presents a novel semi-supervised method, ssCDL, which integrates meta self-training and confidence distribution learning to obtain high-quality embeddings for UKG completion under the imbalanced confidence distribution in UKGs.
- This paper extends the idea of label distribution learning to confidence distribution learning for UKG embedding learning. Such an idea skillfully leverages confidence distributions to reinforce the learning process with existing labeled data.
- This paper presents a comprehensive set of experiments, including comparison with baselines, ablation studies, analysis of low-confidence triples, and parameter sensitivity analysis, which collectively provide strong evidence for the effectiveness and superiority of the proposed ssCDL.
- This paper is clearly structured and well written, with a logical flow that makes it easy for readers to follow the motivation, methodology, and experimental results.
- This paper has released the source code and provided a corresponding README file for guidance, ensuring the reusability of the proposed method.
Weaknesses:
- The differences between the datasets used in the experiments are unclear. I understand that they are widely used benchmarks, but more details are needed to show that the experiments on these datasets are reasonable and the results are meaningful.
- In the ablation study, the improvement brought by the designed meta self-training strategy in both tasks on both datasets is not that significant. Could you please explain more on this point?
Questions
See weaknesses.
Limitations
yes
Formatting Issues
NO
We thank the reviewer for the valuable comments, and will take them into account to improve our paper. Our responses are as follows:
W1: The benchmarks CN15k and NL27k used in our experiments exhibit distinct characteristics in terms of relation diversity, graph density, and confidence distribution. Specifically, CN15k contains only 36 relations, while NL27k includes 404 relations, presenting a much broader relational space. This difference enables us to evaluate whether the proposed method can generalize across knowledge graphs with both low and high relational diversity. In terms of graph density, CN15k includes 241,158 quadruples over 15,000 entities, whereas NL27k has 175,412 quadruples over 27,221 entities. These differences indicate that CN15k is relatively denser for each entity, while NL27k is much sparser. Such structural diversity allows us to examine the performance of our method under both dense and sparse conditions. For confidence distribution, both datasets are dominated by high-confidence triples, but they differ in terms of the average confidence and the concentration range of confidences. Specifically, the average confidence of CN15k is 0.629, with most confidences concentrated in the range of [0.7, 0.8]. NL27k has a higher average confidence of 0.797, with most confidences falling within [0.9, 1.0]. Both datasets thus provide varied confidence characteristics for evaluation. Therefore, using both datasets enables a more comprehensive evaluation of the proposed method.
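For quick reference, the statistics mentioned in this response can be summarized as follows:

| Dataset | Relations | Entities | Quadruples | Avg. confidence | Main confidence range |
|---|---|---|---|---|---|
| CN15k | 36 | 15,000 | 241,158 | 0.629 | [0.7, 0.8] |
| NL27k | 404 | 27,221 | 175,412 | 0.797 | [0.9, 1.0] |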
W2: As the reviewer pointed out, the impact of meta self-training is not that significant, as shown in Sec. 5.4. In meta self-training, we design a pseudo confidence distribution generator to produce pseudo confidence labels for unlabeled data and feed the pseudo-labeled data to the confidence distribution learning (CDL) based relational learner. Since such unlabeled data (generated by negative sampling) are often associated with low confidences, meta self-training mainly improves the performance of our method on low-confidence triples. However, because the test sets contain relatively few low-confidence triples, the improvement from meta self-training is not that significant compared with CDL under this dataset setting.
The response has addressed my concern. I will keep my score.
We appreciate your valuable feedback. Thanks again for helping us improve this paper.
An uncertain knowledge graph (UKG) represents the uncertainty of knowledge, and uncertain knowledge graph completion is the task of reasoning over UKGs. This paper notices the extreme imbalance of triple-confidence scores (most stored facts have very high confidence, while low-confidence facts are rarely kept) in KGs such as NELL. The paper then proposes ssCDL (Semi-supervised Confidence Distribution Learning), a two-component framework: (1) CDL-RL, which converts a triple confidence into a confidence distribution, and (2) PCDG, which assigns pseudo confidence distributions to negative-sampled triples. Empirical results demonstrate the effectiveness of this method.
Strengths and Weaknesses
Strengths:
- The identification of the imbalance of triple-confidence scores and the design of ssCDL with its two components are novel. The motivation is reasonable and the methodology makes sense.
- The semi-supervised meta self-training is described clearly and succeeds in using unlabeled data to rebalance the data. This part of the paper is presented clearly and is easy to follow.
- The experiments show SOTA results, with considerable improvement over the baselines.
Weaknesses:
- Only two medium-sized datasets are included in the experiments; I wonder whether the evaluation could be extended to larger UKGs, especially web-scale ones.
- The computation of … and … should use an equation environment.
Questions
See weaknesses.
Limitations
Yes.
Final Justification
I am basically content with the author response. Though the paper can be further improved in aspects like complexity discussion and writing clarity, it's generally ok and I will maintain my positive rating.
Formatting Issues
No.
We thank the reviewer for the valuable comments, and will take them into account to improve our paper. Our responses are as follows:
W1: We chose NL27k and CN15k because they are widely used benchmarks in previous studies on UKG completion, such as UKGE [1], PASSLEAF [2], and BEUrRE [3]. We agree that testing on web-scale UKGs would further validate the robustness of our method. However, to the best of our knowledge, no publicly available web-scale UKG benchmark currently exists. A benchmark should support fair, repeatable, and meaningful comparison, which requires careful construction, annotation, and validation. Creating such a benchmark is a time-consuming process, typically involving manual or semi-automated labeling, so it is hard for us to build a web-scale UKG benchmark and conduct comparison experiments in a short period of time. Nevertheless, we recognize its importance for better evaluation in UKG completion research, and in future work we plan to build such a web-scale UKG benchmark to support more comprehensive evaluations.
W2: We thank the reviewer for pointing out this issue, and will revise those equations to use proper equation environments in the final version.
[1] Xuelu Chen, Muhao Chen, Weijia Shi, Yizhou Sun, and Carlo Zaniolo. Embedding Uncertain Knowledge Graphs. In Proc. of AAAI, volume 33, pages 3363–3370, 2019.
[2] Zhu-Mu Chen, Mi-Yen Yeh, and Tei-Wei Kuo. PASSLEAF: A Pool-bAsed Semi-Supervised LEArning Framework for Uncertain Knowledge Graph Embedding. In Proc. of AAAI, volume 35, pages 4019–4026, 2021.
[3] Xuelu Chen, Michael Boratko, Muhao Chen, Shib Sankar Dasgupta, Xiang Lorraine Li, and Andrew McCallum. Probabilistic Box Embeddings for Uncertain Knowledge Graph Reasoning. In Proc. of NAACL, pages 882–893, 2021.
Thanks for your response. The work is acceptable to me at the current stage, and I will keep my score, which already suggests acceptance.
We appreciate your valuable feedback. Thanks again for helping us improve this paper.
The paper first points out the extremely imbalanced distributions of triple confidences in the context of uncertain knowledge graph learning. It then proposes a semi-supervised confidence distribution learning method for UKG completion. Reviewers agree that the authors' viewpoint itself is novel and that the proposed method successfully obtains high-quality embeddings for UKG completion under the imbalanced confidence distribution in UKGs. One concern is the effectiveness of meta-learned pseudo-labels; however, this is a minor concern. The area chair nominates the paper for a spotlight, since the paper's viewpoint (imbalanced distributions of triple confidences) is itself novel and thus has the potential to attract many followers.