KGMark: A Diffusion Watermark for Knowledge Graphs
We present KGMark, the first watermarking method for knowledge graph embeddings that ensures high detectability, transparency, and robustness across various graph modifications.
Abstract
Reviews and Discussion
This paper introduces KGMark, the first watermarking method for knowledge graph embeddings. It leverages a latent diffusion model to embed watermarks in the frequency domain, using a learnable mask to ensure transparency. Experimental results show its high detectability and robustness against various attacks.
Questions for Authors
- Comparative Analysis: How does KGMark compare to existing watermarking methods from other domains (e.g., image or text)? A direct comparison would highlight its unique advantages.
- Optimal Embedding Strategy: What is the optimal strategy for selecting vertices or communities for watermark embedding, and how does it influence performance under various attacks?
- Robustness vs. Transparency Trade-off: How does the learning process of the adaptive watermark mask matrix ensure it balances robustness and transparency without overfitting?
Claims and Evidence
The claims made in the submission regarding KGMark’s efficacy in watermarking knowledge graphs are supported by clear and convincing evidence:
- Theoretical Analysis: The paper provides a detailed theoretical framework, including the use of latent diffusion models, Fourier transform-based watermark embedding, and the Learnable Adaptive Watermark Mask Matrix (LAWMM) to ensure transparency and robustness. It also introduces principles like Latent Space Equilibrium and Information-Theoretic Robustness to justify the method’s design.
- Experimental Results: The claims are empirically validated through extensive experiments on three diverse datasets (Last-FM, MIND, and Alibaba-iFashion). The results demonstrate high watermark detectability (AUC up to 0.99) and robustness against various attacks (e.g., relation alteration, triple deletion, isomorphism variation) while maintaining minimal impact on the knowledge graph’s usability.
- Comparative Analysis: The paper compares KGMark with its variants (e.g., without LAWMM, only community layer, only vertex layer) and shows that the full method outperforms these variants in terms of robustness and transparency, supporting the claim that the combined approach is superior.
Methods and Evaluation Criteria
The proposed methods, evaluation criteria, and benchmark datasets are well-aligned with the problem of watermarking knowledge graphs:
- Methods: The proposed KGMark framework leverages a latent diffusion model to embed watermarks in the frequency domain, addressing the unique challenges of spatial-temporal variations and structural complexities in dynamic knowledge graphs. The use of a Learnable Adaptive Watermark Mask Matrix and redundant embedding strategies enhances transparency and robustness while ensuring minimal disruption to the knowledge graph’s usability.
- Evaluation Criteria: The evaluation comprehensively assesses watermark detectability using AUC, transparency through cosine similarity and downstream task performance metrics (e.g., GMR, HMR, AMR, Hits@10), and robustness against various attacks (e.g., relation alteration, triple deletion, isomorphism variation). These metrics effectively capture the critical aspects of watermarking performance in knowledge graphs.
- Benchmark Datasets: The experiments are conducted on three diverse public datasets (Last-FM, MIND, and Alibaba-iFashion), which represent different real-world applications. These datasets provide a robust basis for evaluating the method’s effectiveness and generalizability across various knowledge graph structures and domains.
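To make the rank-based metrics named above concrete, here is a minimal Python sketch. It assumes that GMR, HMR, and AMR denote the geometric, harmonic, and arithmetic mean rank, as is common in link-prediction evaluation; the function name `rank_metrics` and the example ranks are hypothetical and not taken from the paper.

```python
import math


def rank_metrics(ranks, k=10):
    """Summarize the 1-based ranks of the true entity over all test triples."""
    n = len(ranks)
    amr = sum(ranks) / n                                  # arithmetic mean rank
    gmr = math.exp(sum(math.log(r) for r in ranks) / n)   # geometric mean rank
    hmr = n / sum(1.0 / r for r in ranks)                 # harmonic mean rank
    hits = sum(1 for r in ranks if r <= k) / n            # fraction ranked in the top k
    return {"AMR": amr, "GMR": gmr, "HMR": hmr, f"Hits@{k}": hits}


# Hypothetical ranks for five test triples (assumption, for illustration only).
example_ranks = [1, 2, 4, 8, 100]
metrics = rank_metrics(example_ranks)
print(metrics)
```

By the inequality of means, HMR ≤ GMR ≤ AMR for any set of ranks, so a single very poor rank (100 here) inflates AMR far more than it inflates HMR.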
Theoretical Claims
The paper presents theoretical claims supported by principles such as Latent Space Equilibrium and Information-Theoretic Robustness. These principles aim to ensure watermark transparency and robustness through mathematical formulations. The claims appear logically consistent and align with the problem context.
Experimental Design and Analyses
The paper presents a well-structured experimental design with:
- Comprehensive evaluations on three benchmarks (Last-FM, MIND, Alibaba-iFashion) across diverse KG structures.
- Rigorous testing of watermark detectability, transparency, and robustness against multiple attacks.
- Comparative analysis of KGMark variants to highlight key components and their combined benefits.
Supplementary Material
The supplementary material provides additional details that support the main paper, including Algorithm 1 and Algorithm 2. These algorithms offer clear procedures for graph alignment and redundant watermark embedding, which enhance the method’s robustness against structural variations and attacks. The case study in the supplementary material further validates the transparency of the watermarking process by demonstrating minimal impact on the original embedding distribution.
Relation to Existing Literature
The paper effectively situates its contributions within the broader scientific context by drawing on key concepts from diffusion models, watermarking techniques [1, 2], and knowledge graph [3] management. It leverages the strengths of diffusion models to embed watermarks in a way that balances transparency and robustness [4], building on prior work that has demonstrated the effectiveness of these models in generating and protecting synthetic data [5]. Additionally, the paper integrates ideas from graph neural networks and watermarking to address the unique challenges of structured data, such as graph isomorphism and dynamic updates.
[1] Wen, Y., Kirchenbauer, J., Geiping, J., and Goldstein, T. Tree-rings watermarks: Invisible fingerprints for diffusion images. In Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., and Levine, S. (eds.), Advances in Neural Information Processing Systems, pp. 58047–58063. Curran Associates, Inc., 2023.
[2] Yang, Z., Zeng, K., Chen, K., Fang, H., Zhang, W., and Yu, N. Gaussian shading: Provable performance-lossless image watermarking for diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12162–12171, 2024b.
[3] Yang, Y., Chen, J., and Xiang, Y. A review on the reliability of knowledge graph: from a knowledge representation learning perspective. World Wide Web, 28(1):4, 2024a.
[4] Barman, N. R., Sharma, K., Aziz, A., Bajpai, S., Biswas, S., Sharma, V., Jain, V., Chadha, A., Sheth, A., and Das, A. The brittleness of AI-generated image watermarking techniques: Examining their robustness against visual paraphrasing attacks. arXiv preprint arXiv:2408.10446, 2024.
[5] Bauer, A., Trapp, S., Stenger, M., Leppich, R., Kounev, S., Leznik, M., Chard, K., and Foster, I. Comprehensive exploration of synthetic data generation: A survey, 2024.
Essential References Not Discussed
N/A
Other Strengths and Weaknesses
Strengths:
- Originality and Creativity: Adapting diffusion-based watermarking to structured data like knowledge graphs is novel. Traditional methods for images or text often fail to account for spatial-temporal variations and relational complexities, which KGMark effectively addresses.
- Clarity and Presentation: The paper presents its methodology clearly, with detailed explanations of theoretical foundations and practical implementations. Diagrams and pseudocode enhance understanding.
Weaknesses:
- Scalability Concerns: The experiments are limited to relatively small datasets, raising questions about the method’s scalability to larger, more complex knowledge graphs like Wikidata. Addressing scalability is crucial for real-world applications, where knowledge graphs often contain millions of entities and relationships.
- Generalizability: The paper lacks a thorough discussion on how well KGMark would perform on different types of knowledge graphs (e.g., biomedical, social networks) or in diverse application scenarios. Demonstrating generalizability across various downstream domains would strengthen the paper’s impact.
- Comparative Analysis: While the paper compares KGMark with its variants, it lacks a direct comparison with existing watermarking methods from other domains (e.g., image or text watermarking). Such comparisons would provide a clearer picture of KGMark’s unique advantages and potential limitations.
Other Comments or Suggestions
Overall, the paper is well-written and presents a novel approach to watermarking knowledge graphs. Here are a few suggestions:
- Clarify Scalability: Include a brief discussion on how KGMark can be scaled to handle larger knowledge graphs.
- Expand Generalizability: Add a section on potential applications beyond the tested datasets to highlight broader applicability.
- Proofread: Minor typos and grammatical errors should be corrected to enhance readability. For example, the Method section inconsistently refers to equations as "Equ. (x)" in some places and "Equation (x)" in others, which should be standardized.
Dear Reviewer voAk:
Thank you for your detailed and constructive feedback.
We note that your concerns mainly focus on four aspects:
- Scalability and generalizability of KGMark
- Comparative analysis
- Embedding strategy and robustness–transparency trade-off
- Presentation and writing quality
Dataset Scale and Heterogeneity
Scale and Generalization: KGMark is evaluated on three large-scale, real-world knowledge graphs from diverse domains, ensuring both scalability and generalizability:
- Alibaba-iFashion: 1.21B user clicks from 5.54M users, 4.68M products, and 192K outfit sets. It supports both recommendation and compatibility prediction tasks, and has been deployed in real-world e-commerce systems.
- Last-FM: Combines user-tag data from Last.fm with the Million Song Dataset and playlists, covering 5,075 songs with emotion labels and 464K triples. It enables tasks like emotion classification and multimodal analysis.
- MIND: Includes ~1M users and 161K news articles with 2.4M click records and 512 relation types. It supports news recommendation, classification, and content-based cold-start modeling.
The variety in domains (fashion, music, news) and graph structures allows KGMark to be evaluated under diverse, realistic conditions, ensuring broad generalization ability.
Heterogeneous Graph Structures: All datasets naturally exhibit heterogeneous characteristics. Nodes represent different entity types (e.g., users, products, songs, articles), and edge semantics vary (e.g., click, purchase, like). For example, MIND includes 512 relation types, indicating complex, multi-relational structures. Modeling these interactions requires a heterogeneous graph framework, which KGMark is designed to support.
Baselines
We have added additional experiments on the baselines, as presented in our response to Reviewer dihZ (Performance vs. Baseline) and zE4M (Adversarial Attacks).
Transparency Strategy
To further clarify the role of the adaptive mask matrix, we describe how it contributes to both the robustness and transparency of watermark embedding.
We aim to balance watermark detectability with minimal impact on downstream tasks by jointly minimizing a reconstruction loss and a task-related loss.
In the refined objective, we introduce two key changes to improve both performance and efficiency:
- While the signature is embedded via the learned mask, the resulting latent representation may deviate from the original, potentially altering structural information. To improve alignment, we introduce a correction term that reduces the gap between the watermarked and original latent representations. This better alignment helps preserve the graph's inherent structure, indirectly supporting downstream task performance.
- Since gradients accumulate across all diffusion steps, direct optimization becomes computationally intensive. To mitigate this, we adopt a "sample-then-embed" strategy: we first sample the latent representation and then apply watermark embedding, which simplifies training and reduces complexity.
The resulting loss formulation enables efficient optimization while enhancing the transparency of the embedded watermark.
Training Process of Adaptive Mask Matrix:
Another goal of the adaptive mask matrix is to improve watermark transparency without sacrificing robustness. As shown in Figure 3(c), the density of the watermark mask matrix is the primary factor affecting transparency.
In contrast, its impact on watermark detectability (i.e., robustness) is relatively minor. Based on this observation, LAWMM is designed to learn an adaptive mask matrix that maintains a fixed density while improving transparency through training. To ensure control over the final density, we regulate the number of training epochs. We find that setting the number of epochs to 50 yields an average density of approximately 0.015, which aligns with our desired sparsity level.
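As a rough illustration of how a sparse mask gates frequency-domain embedding, the sketch below overwrites a few Fourier coefficients of a toy 1-D latent vector with a signature, then measures detectability (distance to the signature in the masked bins) and transparency (cosine similarity to the original latent). It is a toy under stated assumptions: the mask is drawn at random rather than learned as in LAWMM, no diffusion model is involved, and all names (`embed`, `detection_distance`, `density`) and values are hypothetical.

```python
import cmath
import math
import random


def dft(x):
    """Naive discrete Fourier transform (adequate for short toy vectors)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse of dft()."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def embed(latent, mask, signature):
    """Overwrite the masked frequency bins with the signature.

    The mirrored bin n-k receives the complex conjugate so that the
    watermarked latent stays real-valued.
    """
    n = len(latent)
    spec = dft(latent)
    for k, s in zip(mask, signature):
        spec[k] = s
        spec[n - k] = s.conjugate()
    return [v.real for v in idft(spec)]


def detection_distance(latent, mask, signature):
    """Mean absolute gap between the masked bins and the signature."""
    spec = dft(latent)
    return sum(abs(spec[k] - s) for k, s in zip(mask, signature)) / len(mask)


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


rng = random.Random(0)
n = 64
latent = [rng.gauss(0.0, 1.0) for _ in range(n)]
density = 0.1  # illustrative value only, not the density reported in the rebuttal
mask = rng.sample(range(1, n // 2), max(1, round(density * n)))
signature = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in mask]

watermarked = embed(latent, mask, signature)
d_marked = detection_distance(watermarked, mask, signature)  # close to zero
d_clean = detection_distance(latent, mask, signature)        # clearly larger
similarity = cosine_similarity(latent, watermarked)
print(d_marked, d_clean, similarity)
```

Raising `density` overwrites more coefficients, which tends to lower the cosine similarity; this is the transparency cost that the text attributes to mask density.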
Writing Issues
In the revised version, we will carefully review the manuscript to correct any grammatical issues and improve overall clarity and writing style.
Your insights were extremely helpful in improving the presentation of our work. We hope the revisions and clarifications reflect the value of our contribution more clearly.
I have read the authors' rebuttal and the other reviewers' comments. I am happy to increase my score.
We sincerely appreciate your thoughtful feedback and the updated score, which reflect your recognition of the improvements and clarifications we have made. If the paper is accepted, we will ensure that all additional experiments and textual revisions introduced during the rebuttal are fully integrated into the final version, along with a thorough proofreading of the entire manuscript.
This manuscript presents a novel watermarking method for knowledge graph embeddings to ensure the traceability and auditability of knowledge graphs, claiming to embed invisible signatures into diffusion-based latent representations using the Fourier transform. It addresses key challenges by incorporating multi-level redundancy, graph alignment, and a learnable mask to enhance robustness and transparency.
Questions for Authors
- Considering the large number of hyperparameters involved in the paper, can you explain the basis for configuring these parameters?
- The paper mentions using a community detection algorithm to partition the graph. Can you explain why this step is necessary? Why not directly select several vertices on the graph for redundant embedding?
Claims and Evidence
This paper emphasizes the detectability, transparency, and robustness of the proposed watermarking method, KGMark. These claims are supported by the following aspects:
- Theoretical Derivation: The paper rigorously presents the mathematical formulation of KGMark along with detailed derivations, ensuring the theoretical correctness of the proposed method.
- Experimental Evidence: Comprehensive evaluations are conducted to assess the detectability, transparency, and robustness of KGMark, with results demonstrating the effectiveness of the approach.
- Case Study: The paper provides a case study for specific downstream tasks, showcasing the practical utility of KGMark in real-world scenarios.
Methods and Evaluation Criteria
The proposed methods and evaluation criteria align well with the problem at hand. Specifically:
- The AUC is used to evaluate the detectability and robustness of KGMark.
- The transparency of KGMark is evaluated by the cosine similarity and quality metrics of knowledge graph embeddings.
- Extensive ablation experiments have been performed on multiple variants of KGMark and their associated hyperparameters.
Theoretical Claims
In this paper, the mathematical expression of KGMark and the related theoretical derivation are given, and the attacks on KGMark are modeled. The proofs are logically sound and mathematically rigorous.
Experimental Design and Analyses
The paper selects three KG datasets from different domains to evaluate KGMark.
- The detectability of KGMark on clean samples was evaluated.
- The robustness of KGMark under different kinds and intensities of attacks is evaluated.
- The transparency of KGMark is evaluated by the similarity and quality of knowledge graph embeddings.
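The three attack families named in the reviews (triple deletion, relation alteration, isomorphism variation) can be pictured on a plain list of (head, relation, tail) triples. The sketch below is hypothetical: the function names, the uniform random sampling, and the toy graph are assumptions, and the paper's actual attack implementations may differ.

```python
import random


def delete_triples(triples, ratio, rng):
    """Triple deletion: drop a random fraction of the triples."""
    keep = len(triples) - int(ratio * len(triples))
    return rng.sample(triples, keep)


def alter_relations(triples, ratio, relations, rng):
    """Relation alteration: replace the relation of a random fraction of triples."""
    attacked = list(triples)
    for i in rng.sample(range(len(attacked)), int(ratio * len(attacked))):
        h, r, t = attacked[i]
        # Always pick a relation different from the current one.
        attacked[i] = (h, rng.choice([x for x in relations if x != r]), t)
    return attacked


def relabel_entities(triples, rng):
    """Isomorphism variation: permute entity ids while keeping the structure."""
    entities = sorted({e for h, _, t in triples for e in (h, t)})
    shuffled = entities[:]
    rng.shuffle(shuffled)
    mapping = dict(zip(entities, shuffled))
    return [(mapping[h], r, mapping[t]) for h, r, t in triples]


# Toy knowledge graph: a ring of 10 entities with two relation types.
rng = random.Random(0)
relations = ["clicks", "likes", "buys"]
triples = [(f"e{i}", "likes" if i % 2 else "clicks", f"e{(i + 1) % 10}")
           for i in range(10)]

deleted = delete_triples(triples, 0.5, rng)
altered = alter_relations(triples, 0.5, relations, rng)
relabeled = relabel_entities(triples, rng)
print(len(deleted), altered[:3], relabeled[:3])
```

The `ratio` argument plays the role of attack intensity: sweeping it from small to large values gives the kind of robustness curve the evaluation describes.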
Supplementary Material
I have examined the contents of the appendix, such as relevant algorithms, additional theoretical proofs and case studies, and found no major flaws.
Relation to Existing Literature
This paper aims to introduce watermarking technology into knowledge graphs, which ensures the traceability and auditability of knowledge graphs and protects their intellectual property rights to a certain extent. I think this is a meaningful and useful exploration.
Essential References Not Discussed
N/A
Other Strengths and Weaknesses
Strength:
KGMark presents an outstanding exposition of a novel watermarking method for KGEs. Driven by an insightful motivation to protect the copyrights, the proposed KGMark framework introduces key innovations to enhance both transparency and robustness. The Learnable Adaptive Watermark Mask Matrix improves transparency, while multi-level redundancy ensures resilience against structural modifications. Additionally, the incorporation of graph alignment effectively mitigates challenges arising from isomorphism variations. Overall, this approach demonstrates significant potential for safeguarding intellectual property and maintaining data integrity in KGEs.
Weaknesses:
- The lack of comparison with relevant baselines makes it difficult to credibly verify the effectiveness of the proposed method in various aspects.
- Although the paper introduces a learnable mask matrix to improve transparency, the relationship between this matrix and the redundant embedding strategy is not explicitly clarified. Further explanation on how these two components interact and contribute to watermark robustness and transparency would enhance the paper.
- The case study demonstrates KGMark in a recommendation system. However, it remains unclear whether the authors have tested the method in other domains. Expanding the evaluation to other real-world applications would strengthen the paper’s claims of versatility.
Other Comments or Suggestions
- While KGMark is the first watermarking scheme designed for knowledge graphs, it would benefit from additional comparisons with existing watermark methods to provide a clearer context for its innovations.
- The experiments are conducted on relatively small datasets. Evaluating KGMark on larger knowledge graphs would better demonstrate its scalability and robustness in more realistic settings.
- The paper does not discuss the computational efficiency of KGMark for embedding watermarks in knowledge graphs of varying sizes. Including an analysis of the method’s performance in terms of computational resources and time complexity across different scales would be valuable.
Dear Reviewer 1f2k:
We sincerely appreciate your thoughtful comments, which primarily concern the following three aspects:
- Design choices behind KGMark
- Algorithmic rationale for the redundant embedding strategy
- Consideration of computational efficiency
KGMark's Designing
KGMark is specifically designed for knowledge graphs (KGs), which differ significantly from images or text in both structure and semantics. Traditional methods developed for images or text are difficult to apply to KGs due to their unique characteristics:
- Non-Euclidean topology: KGs lack the spatial continuity of images, making frequency or pixel-based watermarking inapplicable.
- High sensitivity to structural changes: Small perturbations can cause large semantic shifts.
- Large scale and sparsity: Real-world KGs require scalable and efficient embedding strategies.
Traditional watermarking techniques often assume dense and continuous data representations, making them ill-suited for the sparse and structured nature of KGs. These methods typically lack structural adaptability and fail to account for the non-Euclidean topology, rendering them vulnerable to perturbations such as isomorphic transformations or large-scale deletions. As a result, they often fail to ensure robustness or transparency in such settings.
For example, Gaussian-Shading is designed for image generation, where watermarks are embedded by modifying the initial latent noise during sampling. This approach relies on controlling the sampling process from the start. In our setting, however, the graph is already generated, and we recover its latent state via DDIM inversion. Applying Gaussian-Shading post hoc would overwrite the inverted latent representation and break the reconstruction process, making it incompatible with our framework.
KGMark is thus tailored to address these challenges through graph-aware mechanisms. By leveraging graph alignment and community-based redundant embedding, it ensures both robustness and transparency under structural perturbations.
We further demonstrate the effectiveness of KGMark across multiple baselines, with detailed results presented in our response to Reviewer dihZ (Performance vs. Baseline).
Robustness Strategy
To ensure robustness against adversarial perturbations, we require that the amount of information retained about the original watermark remains above a guaranteed threshold, even after the graph is modified. This is quantified by the mutual information between the original watermark and the extracted result under perturbation, which must not fall below that threshold.
Here, the perturbation is a bounded structural modification to the watermarked graph, and the extracted result is the output of the watermark extraction function. The condition ensures that even under worst-case sparse attacks (e.g., modifying up to a bounded number of edges), the extractor can still recover meaningful information about the watermark.
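For concreteness, the condition described above can be sketched in symbols. The notation is an illustrative placeholder rather than the paper's own: $w$ is the watermark, $G_w$ the watermarked graph, $\delta$ a structural perturbation changing at most $k$ edges, $\mathrm{Ext}$ the extraction function, $I(\cdot\,;\cdot)$ mutual information, and $\tau$ the guaranteed threshold.

```latex
\min_{\delta:\ \|\delta\|_0 \le k}\; I\bigl(w;\ \mathrm{Ext}(G_w \oplus \delta)\bigr) \;\ge\; \tau
```

Read this way, the bound has to hold for the worst admissible perturbation, not merely on average.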
To satisfy this robustness condition, we adopt a redundant embedding strategy that spreads the watermark across both global and local graph structures. Specifically, we partition the graph into communities, rank the vertices by their centrality, and embed the watermark at two layers.
The community layer encodes the watermark into the structural signature of each community, while the vertex layer encodes it around high-centrality vertices. This dual-layer design ensures that the watermark is preserved even when parts of the graph are modified, thereby improving robustness by maximizing the information retained across complementary substructures.
To justify our choice of high-centrality vertices for embedding, we highlight three key considerations:
- We assume attackers aim to modify the graph without harming its usability, and thus tend to avoid high-centrality nodes.
- High-centrality nodes are structurally stable due to their dense connections, making embedded watermarks more resilient to local changes.
- Although community detection may vary across runs, unstable nodes are few and weakly connected. They are excluded from embedding, so this does not affect redundancy or robustness.
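The selection step behind this strategy can be sketched as follows. This is a minimal illustration under stated assumptions: community membership is taken as given (the detection algorithm is not reproduced), degree centrality stands in for whichever centrality measure the paper uses, and the names `degree_centrality`, `select_anchor_vertices`, and `top_k` are hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Degree of each vertex, normalized by the maximum possible degree (n - 1)."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    return {node: d / (n - 1) for node, d in degree.items()}


def select_anchor_vertices(edges, community_of, top_k):
    """Return the top_k most central vertices of every community."""
    centrality = degree_centrality(edges)
    members = defaultdict(list)
    for node, score in centrality.items():
        members[community_of[node]].append((score, node))
    # Sort by descending centrality; break ties by vertex name for determinism.
    return {
        community: [node for _, node in
                    sorted(scored, key=lambda item: (-item[0], item[1]))[:top_k]]
        for community, scored in members.items()
    }


# Toy graph: two communities joined by the bridge edge (d, x).
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"),
         ("x", "y"), ("x", "z"), ("d", "x")]
community_of = {"a": "A", "b": "A", "c": "A", "d": "A",
                "x": "B", "y": "B", "z": "B"}

anchors = select_anchor_vertices(edges, community_of, top_k=2)
print(anchors)
```

Selecting anchors per community, rather than globally, keeps at least some embedding sites in every region of the graph, which is the redundancy argument made above.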
Time Complexity
The computational cost of KGMark is primarily determined by the watermark embedding and optimization processes, with the majority of the time spent on DDIM inference. The runtime is directly correlated with the size of the knowledge graph.
To provide a more accurate evaluation, we will include the average runtime of KGMark in the revised version.
We appreciate your detailed comments and hope our responses have clarified the key contributions and addressed your concerns.
Thank you to the authors for the substantial efforts made during the rebuttal phase. The additional analysis on computational overhead has improved the overall clarity of the paper, making the presentation of the method and experiments more complete. Based on this, I recommend accepting the paper.
We sincerely thank you for your positive feedback and your recommendation to accept the paper. We are encouraged by your recognition of the contributions made in this work.
KGMark is the first watermarking framework specifically designed for generated knowledge graphs. Unlike existing watermarking schemes that often overlook the unique structural and semantic properties of KGs, KGMark introduces a diffusion-based embedding mechanism to ensure robustness and transparency under structural perturbations.
If the paper is accepted, we will incorporate all experimental improvements and textual refinements introduced during the rebuttal process into the final version, and ensure the manuscript is thoroughly proofread.
This paper proposes a watermarking method for knowledge graphs (KGs) using diffusion models. The authors claim their method embeds watermarks into KGs via diffusion-based encoding, ensuring traceability, integrity, and copyright protection. The method primarily relies on diffusion encoding, subgraph preservation principles, and loss functions aimed at robustness against certain graph-based attacks. Experimental evaluations were conducted on datasets such as AliF and MIND to validate the effectiveness of watermark embedding, extraction accuracy, and impact on downstream tasks.
Questions for Authors
Please refer to the weaknesses mentioned above.
Claims and Evidence
The authors' key empirical claims lack strong evidence. The claim of robustness against graph-based attacks is weak, as the evaluations do not include advanced adversarial attacks. Similarly, the claim of preserving graph integrity and traceability is undermined by significant performance drops in downstream tasks on MIND and AliF. These issues reveal major gaps in empirical support, further compounded by the absence of baseline comparisons with diffusion watermarking methods [1, 2].
[1] Gaussian Shading: Provable Performance-Lossless Image Watermarking for Diffusion Models
[2] Tree-Ring Watermarks: Fingerprints for Diffusion Images that are Invisible and Robust
Methods and Evaluation Criteria
The proposed methods and evaluation criteria have several fundamental issues. Primarily, the choice to use diffusion models for watermark encoding on graphs is not well-justified, especially given the significant drops in Hits@10 performance on downstream tasks (MIND and AliF) observed even with reconstruction alone. Furthermore, essential evaluations against more advanced attack methods from the recent graph attack literature, such as [3, 4], are missing.
[3] Adversarial attacks on knowledge graph embeddings via instance attribution methods
[4] Adversarial attacks and defenses on graphs
Theoretical Claims
The paper contains inconsistencies and ambiguities in its theoretical formulations, particularly in Equations (4) and (9), where the definition of the variable S changes without clarification. Additionally, critical conceptual misunderstandings or mislabeling, such as referring to objectives as "principles", add to the confusion.
Experimental Design and Analyses
The experimental design has several critical flaws. The evaluation fails to compare with state-of-the-art diffusion-based watermarking methods [1, 2]. Additionally, the significant performance drop on the AliF and MIND datasets, as noted in previous sections, raises serious concerns about the suitability of the paradigm.
Supplementary Material
I reviewed the supplementary material, which includes some important algorithm details missing from the main paper. The key parts of watermark extraction and the algorithm specifications appear only in the appendix, making the main paper less clear and harder to understand on its own.
Relation to Existing Literature
The paper is mainly related to diffusion-based watermarking methods, graph watermarking, graph robustness, and graph embedding techniques.
Essential References Not Discussed
The paper does not discuss significant recent advances in diffusion-based watermarking methods [1, 2] and graph robustness [3, 4].
Other Strengths and Weaknesses
While the idea of integrating diffusion models with watermarking in graphs is promising, poor execution and writing severely limit its impact. The paper lacks a clear motivation for using diffusion models for watermarking, and inconsistent, confusing technical explanations further weaken its clarity. Sections 3.3 and 3.5 are particularly unclear, making it difficult to understand what is being optimized, how different loss terms interact, or how watermark extraction works in practice. Additionally, the challenges posed by heterogeneous nodes and relations in knowledge graphs are not addressed.
Other Comments or Suggestions
Statements like "potentially introducing harmful content that compromises analyses or even facilitates the exploitation of real-world systems" and discussions such as "unnerves large corporations, let alone individual researchers" on page 1 are vague or inaccurate. In Figure 2, community detection and alignment correspond to Section 4.4, and Section 4.3 relates to extraction. However, since these are experimental sections, shouldn't they reference Sections 3.5 and 3.3 instead?
Dear Reviewer dihZ:
We sincerely thank you for your constructive and thoughtful suggestions.
We understand that your comments mainly concern the following four aspects:
- The robustness of KGMark under stronger adversarial attacks
- Concerns about KGMark’s performance across datasets
- Explanation of the strategies in Learnable Adaptive Watermark Mask Matrix and Defending Isomorphism and Structural Variations
- Writing and presentation issues
Robustness
Guided by the reviewer's suggestions, we have added experiments with stronger adversarial attacks ([1], [2]) and included corresponding baselines ([3], [4], [5], [6]). The results are presented in our response to Reviewer zE4M (Adversarial Attacks). We hope this clarifies the robustness of KGMark.
Performance vs. Baseline
Guided by the reviewer's suggestion about KGMark's performance across datasets, we have added comparisons with four watermarking baselines (two post-processing-based, two diffusion-based).
| Dataset | Method | CosSim@50 | CosSim@65 | CosSim@75 | GMR ↓ | HMR ↓ | AMR ↓ | Hits@10 ↑ |
|---|---|---|---|---|---|---|---|---|
| AliF | Original KG | - | - | - | 1.828 | 1.162 | 135.459 | 0.8980 |
| | DwtDct | 0.7215 | 0.7928 | 0.8251 | 5.096 | 1.699 | 157.036 | 0.6933 |
| | DctQim | 0.7509 | 0.7633 | 0.7653 | 5.104 | 1.654 | 161.142 | 0.7385 |
| | TR | 0.7761 | 0.8431 | 0.9071 | 3.928 | 1.618 | 152.634 | 0.8017 |
| | GS | 0.2879 | 0.3226 | 0.3538 | 6.641 | 1.798 | 172.813 | 0.5137 |
| | KGMark | 0.7839 | 0.8309 | 0.9482 | 3.046 | 1.580 | 141.904 | 0.8296 |
| MIND | Original KG | - | - | - | 7.197 | 1.975 | 155.656 | 0.6649 |
| | DwtDct | 0.7831 | 0.8244 | 0.8312 | 11.328 | 2.328 | 188.992 | 0.5216 |
| | DctQim | 0.7549 | 0.7574 | 0.7703 | 12.037 | 2.753 | 205.483 | 0.4835 |
| | TR | 0.7976 | 0.8108 | 0.8581 | 11.102 | 2.297 | 182.925 | 0.5108 |
| | GS | 0.2843 | 0.3196 | 0.3728 | 14.523 | 3.312 | 234.064 | 0.3940 |
| | KGMark | 0.8083 | 0.8533 | 0.9397 | 10.508 | 2.226 | 169.305 | 0.5683 |
| Last-FM | Original KG | - | - | - | 3.571 | 1.202 | 1711.695 | 0.8436 |
| | DwtDct | 0.7215 | 0.7928 | 0.8433 | 4.519 | 1.502 | 1734.823 | 0.8221 |
| | DctQim | 0.7509 | 0.7633 | 0.7679 | 5.139 | 1.704 | 2043.249 | 0.7264 |
| | TR | 0.7262 | 0.7896 | 0.8364 | 4.733 | 1.652 | 1772.961 | 0.8149 |
| | GS | 0.3252 | 0.3649 | 0.4184 | 6.349 | 2.016 | 2192.492 | 0.6463 |
| | KGMark | 0.8876 | 0.9051 | 0.9161 | 4.455 | 1.452 | 1716.365 | 0.8430 |
We hope these additional experiments demonstrate that KGMark consistently outperforms Tree-Ring (TR) [4] and Gaussian Shading (GS) [3] across multiple downstream tasks; both baselines replace 5% of nodes with watermark data. DwtDct [5] and DctQim [6] are frequency-domain post-processing watermarking techniques that embed watermark signals into transformed frequency coefficients, aiming to balance robustness and imperceptibility.
[3] Gaussian Shading: Provable Performance-Lossless Image Watermarking for Diffusion Models
[4] Tree-Ring Watermarks: Fingerprints for Diffusion Images that are Invisible and Robust
[5] Digital Watermarking and Steganography
[6] A class of provably good methods for digital watermarking and information embedding
Explanation of Strategies
A detailed explanation of the rationale behind KGMARK’s design is presented in our response to Reviewer 1f2k (KGMARK’s Designing).
We have also elaborated on the core strategies and their underlying motivations in our responses to Reviewer 1f2k (Robustness Strategy) and Reviewer voAk (Transparency Strategy).
In the revised version, we will update Sections 3.3 and 3.5 of our paper to provide a clearer and more comprehensive explanation of both the learnable mask mechanism and the robustness design for handling isomorphism and structural variations.
Writing Issues
We sincerely appreciate your careful feedback and will make the following revisions in the final version:
- Correcting the section reference in Figure 2.
- Clarifying the duplicated variable usage.
- Relocating the algorithm for better coherence.
- Refining Sections 3.3 and 3.5 to improve clarity and readability.
We will thoroughly proofread the paper and clarify any unclear sentences.
We thank the reviewer for the helpful suggestions. The weaknesses pointed out by the reviewer are very insightful and have inspired us to reflect more deeply on our paper. We believe that these revisions will strengthen it.
The paper presents KGMark, a watermarking framework designed for Knowledge Graphs (KGs), which are widely used in applications like semantic search, question answering, and recommendation systems. The primary goal of KGMark is to embed robust, detectable, and transparent watermarks into dynamic KGs to protect intellectual property and ensure data integrity, especially in the context of AI-generated content.
Questions for Authors
- What are some limitations of this work?
- Are there any privacy concerns that need to be discussed when the KG contains sensitive data?
Claims and Evidence
The claims made in the submission are well supported by experimental results and case studies. However, to further strengthen the claims, the authors could consider additional experiments against more extreme or adaptive attacks, such as attacks specifically designed to remove watermarks.
Methods and Evaluation Criteria
Standard metrics such as FPR (False Positive Rate) and TPR (True Positive Rate) for detectability and cosine similarity for transparency have been used.
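For readers less familiar with these metrics, the sketch below shows one common way to compute TPR at a fixed FPR from detection scores, and cosine similarity between an original and a watermarked embedding. It is an illustration under assumptions: the function names, the synthetic scores, and the 1% target FPR are hypothetical and are not taken from the paper.

```python
import numpy as np

def tpr_at_fpr(scores_clean, scores_marked, target_fpr=0.01):
    # Pick the threshold from the clean (unwatermarked) score distribution,
    # then report the empirical FPR and TPR at that threshold.
    clean = np.asarray(scores_clean, dtype=float)
    marked = np.asarray(scores_marked, dtype=float)
    threshold = np.quantile(clean, 1.0 - target_fpr)
    fpr = float(np.mean(clean > threshold))
    tpr = float(np.mean(marked > threshold))
    return tpr, fpr

def cosine_similarity(a, b):
    # Transparency proxy: 1.0 means the two embeddings point the same way.
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
clean_scores = rng.normal(0.0, 1.0, size=1000)   # synthetic detector scores
marked_scores = rng.normal(4.0, 1.0, size=1000)  # synthetic detector scores
tpr, fpr = tpr_at_fpr(clean_scores, marked_scores)
```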
Theoretical Claims
The theoretical claims have been checked and to the best of my knowledge seem valid.
Experimental Designs or Analyses
The authors use three datasets (Last-FM, MIND, and Alibaba-iFashion) representing diverse real-world scenarios. These datasets are appropriate for evaluating the generalizability and effectiveness of KGMark across different domains.
Supplementary Material
Yes, all supplementary material has been reviewed.
Relation to Existing Literature
This work is quite innovative and to the best of my knowledge, there are no other watermarking techniques for knowledge graphs.
Essential References Not Discussed
To the best of my knowledge, there are no essential references that are not discussed.
Other Strengths and Weaknesses
Strengths:
- The paper is well structured and comprehensive.
- This is an innovative work on the watermarking of KGs.
Weaknesses:
- The paper lacks experiments against more extreme or adaptive attacks, such as attacks specifically designed to remove watermarks.
Other Comments or Suggestions
No other comments.
Dear Reviewer zE4M:
We appreciate your insightful comments, which mainly focus on:
- Empirical validation under stronger adversarial attacks
- Clarification of the limitations of KGMARK
- Discussion of KGMARK’s role in privacy protection for sensitive data
Adversarial Attacks
We have incorporated two recent and stronger adversarial attacks, NEA [1] and L2 Metric [2], into our evaluation. In the following table, we retain the original five high-intensity attacks, introduce the two additional adversarial attack types, and evaluate KGMark against four newly added baseline methods. The results show that KGMark consistently outperforms the four baseline methods across the three datasets, demonstrating superior robustness.
| Datasets | Method | Clean | Relation Alteration (50%) | Triple Deletion (50%) | Gaussian Noise (50%) | Smoothing (50%) | L2 Metric | NEA | IsoVar |
|---|---|---|---|---|---|---|---|---|---|
| AliF | DwtDct | 0.9837 | 0.8371 | 0.7724 | 0.8626 | 0.8053 | 0.9577 | 0.9638 | 0.6039 |
| | DctQim | 0.9749 | 0.8139 | 0.7073 | 0.6949 | 0.7665 | 0.9203 | 0.9278 | 0.5867 |
| | TR | 0.9814 | 0.7392 | 0.8091 | 0.8063 | 0.7823 | 0.9621 | 0.9584 | 0.6257 |
| | GS | 0.9882 | 0.7998 | 0.7850 | 0.8921 | 0.7906 | 0.9364 | 0.9512 | 0.6094 |
| | KGMark | 0.9991 | 0.9207 | 0.9320 | 0.9136 | 0.8887 | 0.9841 | 0.9809 | 0.9933 |
| MIND | DwtDct | 0.9793 | 0.8161 | 0.7610 | 0.8285 | 0.8121 | 0.9358 | 0.9291 | 0.6348 |
| | DctQim | 0.9785 | 0.8269 | 0.6993 | 0.7186 | 0.7935 | 0.9209 | 0.9198 | 0.5708 |
| | TR | 0.9862 | 0.8171 | 0.7831 | 0.7721 | 0.8296 | 0.9682 | 0.9543 | 0.5763 |
| | GS | 0.9903 | 0.7930 | 0.8284 | 0.8536 | 0.8252 | 0.9767 | 0.9681 | 0.5845 |
| | KGMark | 0.9987 | 0.9314 | 0.9576 | 0.9232 | 0.9012 | 0.9849 | 0.9883 | 0.9842 |
| Last-FM | DwtDct | 0.9801 | 0.8229 | 0.7415 | 0.8514 | 0.7740 | 0.9596 | 0.9678 | 0.6407 |
| | DctQim | 0.9842 | 0.8062 | 0.7125 | 0.7383 | 0.8174 | 0.9144 | 0.9161 | 0.5938 |
| | TR | 0.9879 | 0.7982 | 0.8519 | 0.8632 | 0.8531 | 0.9553 | 0.9487 | 0.6109 |
| | GS | 0.9795 | 0.8303 | 0.8667 | 0.8912 | 0.8575 | 0.9638 | 0.9594 | 0.6551 |
| | KGMark | 0.9976 | 0.9421 | 0.9031 | 0.9295 | 0.9131 | 0.9886 | 0.9814 | 0.9977 |
We also highlight that the robustness enhancement in KGMark allows it to retain watermark fidelity even under high-intensity perturbations (e.g., random deletion of 50% of entities), where the baseline methods degrade significantly. KGMark also outperforms the baselines under low-intensity attacks (to be detailed in the extended version), and its robustness remains relatively stable as the attack strength increases. In contrast, most baselines exhibit noticeable performance degradation under higher levels of perturbation.
[1] Node embedding attacks via graph poisoning (NEA)
[2] Knowledge graph embedding attacks via instance attribution methods
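As an illustration of what the structural attacks in the table do, the following minimal sketch applies triple deletion and relation alteration at a given ratio to a list of `(head, relation, tail)` triples. It is a simplified stand-in and not our evaluation code: the function names, the uniform random sampling, and the toy graph are assumptions made for this example.

```python
import random

def delete_triples(triples, ratio, seed=0):
    # Triple deletion: drop a random `ratio` fraction of the triples.
    rng = random.Random(seed)
    n_keep = len(triples) - int(len(triples) * ratio)
    return rng.sample(triples, n_keep)

def alter_relations(triples, relations, ratio, seed=0):
    # Relation alteration: for a random `ratio` fraction of the triples,
    # replace the relation with a different one from the vocabulary.
    rng = random.Random(seed)
    attacked = list(triples)
    n_alter = int(len(attacked) * ratio)
    for i in rng.sample(range(len(attacked)), n_alter):
        head, rel, tail = attacked[i]
        candidates = [r for r in relations if r != rel]
        attacked[i] = (head, rng.choice(candidates), tail)
    return attacked

# Toy knowledge graph with 100 triples over 3 relation types.
relations = [0, 1, 2]
triples = [(i, i % 3, i + 1) for i in range(100)]
deleted = delete_triples(triples, ratio=0.5)
altered = alter_relations(triples, relations, ratio=0.5)
```

A robustness number such as those in the table would then come from running watermark detection on the attacked graph and comparing the scores with those of unwatermarked graphs.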
Limitations
While the embedding dimensionality may influence the balance between watermark detectability and downstream performance, this can be mitigated through adaptive tuning in future work. Additionally, extending our DDIM-based framework to support newer sampling strategies is a promising direction we plan to explore.
Privacy Protection
When knowledge graphs (KGs) involve sensitive information, they face several privacy risks:
- Tampering risk: Attackers may alter the generated graph to inject harmful content, potentially leaking sensitive information.
- Disturbing sensitive subgraphs: Even with structure-aware embedding, watermarking may affect subgraphs involving sensitive entities.
- Key misuse: If the watermark key is exposed, it could be exploited for data tracing or structural inference.
KGMARK incorporates embedding-level mechanisms to mitigate the above risks and enhance privacy protection. Moreover, KGMark is model-agnostic and can be easily integrated into existing knowledge graph embedding (KGE) frameworks. This compatibility enables combination with complementary privacy-preserving techniques (such as differential privacy) to achieve multi-layered protection.
We sincerely appreciate your valuable feedback, which has helped us improve the quality and clarity of our work. We hope our responses have addressed your concerns and clarified the contributions of our paper.
The reviewers all agree to accept the paper. I recommend acceptance.