PaperHub
Overall rating: 6.3 / 10
Poster · 7 reviewers
Ratings: 7, 7, 7, 5, 6, 6, 6 (min 5, max 7, std dev 0.7)
Average confidence: 2.7
COLM 2025

Privately Learning from Graphs with Applications in Fine-tuning Large Language Models

OpenReview · PDF
Submitted: 2025-03-21 · Updated: 2025-08-26
TL;DR

We present a privacy-preserving framework for relational learning, showcased by fine-tuning LLMs on sensitive graphs with differential privacy.

Abstract

Keywords

differential privacy, relational learning, private learning, language models, fine-tuning

Reviews and Discussion

Review (Rating: 7)

This paper studies the technical difficulty of applying standard DP-SGD when training models in relational learning. The paper describes a pipeline that can provably achieve DP in learning from relational data. The main computational bottleneck of applying DP-SGD is the tuple size k introduced by relational learning and the lack of support for tracking tuple-level gradients g(E). The situation worsens when the relational data contain textual attributes and the target model is a language model. The paper provides experimental results to simulate common scenarios of applying relational learning in sensitive domains, where the graph data used to enhance target models contain personal or proprietary relations that need to be protected. The authors choose two publicly available real-world text-attributed graphs with millions of entities/relations for the simulation: the e-commerce network from Amazon and the academic network from Microsoft Academic Graph.

Reasons to Accept

The problem statement and formalization are well-defined. The proposed approach is the first for relational learning with DP. The paper explains why current DP-GNNs are not sufficient to address the problem of differentially private relational learning. The evaluation is conducted over a dataset that does seem appropriate for this model. Overall, the paper does a convincing job of demonstrating that LLMs can effectively learn from relational data to address relational learning tasks while providing DP.

Reasons to Reject

The claim that the proposed techniques for pairwise relationships can be extended easily to relational graph models is unclear, but the paper seems strong enough without this.

Questions for Authors

Section 3.3 could use some restructuring to better explain the space/memory complexity of the gradient clipping approach, ideally in the form of function pseudocode.

Comment

We sincerely thank the reviewer for the strong support in accepting our work and for recognizing the novelty and effectiveness of our proposed method for private relational learning. In Sec. 3, we list the general formulation of relational learning, Eq. (1), where multiple types of loss functions are compatible with our pipeline, including the Hinge loss that supports different relation types commonly used for learning from knowledge graphs. We acknowledge that the effort on such an extension is non-trivial, and thus this is left for future study. We thank the reviewer for the suggestion on revising Sec. 3.3. We will incorporate all the discussions and reorganize the complexity analysis of per-tuple gradient clipping, paired with appropriate pseudocode in the final version.

Review (Rating: 7)

This paper proposes a novel pipeline for differentially private relational learning using LLMs on text-attributed graphs. The core innovation lies in addressing the incompatibility of traditional DP-SGD with relational learning, where gradients inherently depend on multiple interdependent relations. The authors introduce a decoupled negative sampling strategy to isolate gradients per relation, enabling per-sample clipping and DP noise addition.

Reasons to Accept

  1. Novelty: The paper introduces a conceptually novel and practical solution for learning from relational data with differential privacy (DP), which is an area not addressed by existing DP methods.
  2. Clarity: The paper is clearly written and transparent about its assumptions and limitations.
  3. Thorough empirical study: The paper conducts a thorough empirical study across large-scale text-attributed graphs (Amazon and MAG), covering both zero-shot and few-shot settings for relation prediction and entity classification. It explores key hyperparameters, including negative sample size, batch size, and noise multiplier σ, and analyzes their effects on privacy-utility trade-offs.
  4. The inclusion of membership inference attacks and statistical significance testing to quantify privacy leakage strengthens the empirical validation of the proposed approach's privacy guarantees.

Reasons to Reject

  1. Baseline Limitations: The experimental comparison primarily relies on a randomized response (RR) baseline and non-private fine-tuning. While the authors argue that existing DP-GNN methods are not directly applicable, they do not include even approximate or relaxed alternatives such as standard per-sample DP-SGD with entity-level privacy. Including such a baseline would provide a stronger empirical foundation and better highlight the practical advantages of the proposed method.
  2. No Ablation on Core Design Choices: The contributions (decoupled sampling, low-rank gradient clipping, use of LoRA) are not individually ablated.
  3. While results are shown for ϵ = 4 and 10, a more continuous privacy-utility curve would strengthen claims about practical deployability.
Comment

We thank the reviewer for recognizing the novelty of our work in addressing relational learning with DP and the completeness of our empirical study. We address the three concerns below:

Q1 Baseline Limitations

We thank the reviewer for this suggestion. Our work is among the first to provide strict (ϵ, δ)-DP guarantees for individual relations in a relational learning context. Consequently, Randomized Response (RR) is one of the few existing mechanisms that can be adapted to offer a comparable relation-level DP guarantee, forming our primary baseline.

Regarding the suggestion of 'standard per-sample DP-SGD with entity-level privacy' as an approximate baseline, it's not immediately apparent to us how standard entity-level DP-SGD could be straightforwardly adapted to serve as an 'approximate or relaxed' baseline for our relation-level privacy during the relational supervision phase. Even if entity embeddings were produced with entity-level DP, the core challenge studied in our work would persist: the interdependent gradients arising from relational loss terms (e.g., a positive relation contrasted with several negatives in one loss computation). Our decoupled sampling method is specifically designed to handle this issue for the supervisory loss.
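As a concrete illustration of this decoupling, the following is a minimal sketch (our hypothetical pseudocode, not the paper's implementation; the helper name, entity container, and negative-sample count are made up) of forming one self-contained loss tuple per positive relation, so that adding or removing a single relation changes at most one tuple in a mini-batch:

```python
import random

def build_tuple(pos_relation, entity_list, num_negatives=8):
    """Form one training tuple (head, positive tail, sampled negatives).

    Negatives are drawn uniformly from the full, fixed entity set rather than
    from other positive relations in the mini-batch, so the tuple depends on
    no other relation in the training set.
    """
    head, tail_pos = pos_relation
    negatives = random.choices(entity_list, k=num_negatives)
    return head, tail_pos, negatives

# Hypothetical toy data: a fixed entity set and a few sensitive relations.
entities = [f"e{i}" for i in range(1000)]
relations = [("e1", "e2"), ("e3", "e4"), ("e5", "e6")]

batch = [build_tuple(r, entities) for r in relations]
# Each tuple's loss (e.g., a contrastive loss over its 2 + num_negatives
# entities) now yields a gradient that depends on exactly one positive
# relation, which is what DP-SGD's per-sample clipping argument needs.
```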

Concerning existing DP-GNN methods: As we discuss in Appendix A.1, these approaches typically focus on privatizing the GNN encoding process (i.e., message passing or node feature aggregation) primarily for tasks like node classification under node-level DP. They do not directly address the problem of ensuring DP for the relational learning objective function itself, where the supervision comes from relations. We are not aware of a simple adjustment to these DP-GNN techniques that would make them directly applicable to our problem of privacy-preserving relational supervision with relation-level DP guarantees.

Q2 No Ablation on Core Design Choices.

We thank the reviewer for pointing this out. We conducted ablation studies on critical hyperparameters and privacy parameters in Sec. 4.2 (see Figs. 2 and 4). Additionally, we address the following points below.

  • We added the ablation study of decoupled sampling in Table R1 (see the response to Reviewer SESE's Q4).
  • For efficient per-tuple gradient clipping, the proposed technique is mathematically equivalent to the conventional method of aggregating per-token gradients, which does not affect result accuracy but substantially reduces the memory cost of the privacy computation, as detailed in Sec. 3.3, lines 249-252. We added the comparison of these two methods and presented the results in Table R2 (see the response to Reviewer pMGM's Q3).
  • The comparison of non-private and private fine-tuning is conducted over models all with LoRA, due to resource constraints and the intensity of privacy computing for large pretrained models. Meanwhile, we do not observe an obvious performance gap between full parameter fine-tuning and LoRA for BERT-based models in non-private fine-tuning.

Q3 ... more continuous privacy-utility curve would strengthen claims about practical deployability.

We appreciate this suggestion. In Figure 2 (Right), we plot the model performance (MRR) against a range of noise multipliers (σ) for two datasets. Each distinct noise multiplier σ corresponds to a specific (ϵ, δ)-DP guarantee, with smaller σ values leading to larger ϵ (less privacy) and larger σ values leading to smaller ϵ (more privacy). This plot already provides 5-6 points across this spectrum for each dataset, illustrating the privacy-utility trade-off trend, including and extending beyond the specific ϵ ≈ 4 and ϵ ≈ 10 values highlighted in other tables. We believe this captures the general behavior effectively. For instance, on MAG-USA, the σ values range from approximately 0.2 to 0.5, corresponding to a wide spectrum of ϵ values. We can add the specific ϵ values for each point to the figure's caption or an appendix table for enhanced clarity if the reviewer prefers.
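For reference, the mapping from σ to ϵ at a fixed δ can be computed with a standard privacy accountant; the sketch below uses the Opacus RDP accountant with hypothetical sampling rate, step count, and δ values (not the settings used in our experiments):

```python
from opacus.accountants import RDPAccountant

# Hypothetical training configuration; actual values differ per dataset.
sample_rate = 0.01   # batch size / number of training relations
steps = 2000         # number of DP-SGD updates
delta = 1e-6

for sigma in [0.2, 0.25, 0.3, 0.4, 0.5]:
    accountant = RDPAccountant()
    for _ in range(steps):
        accountant.step(noise_multiplier=sigma, sample_rate=sample_rate)
    eps = accountant.get_epsilon(delta=delta)
    print(f"sigma={sigma:.2f} -> epsilon={eps:.2f} at delta={delta}")
```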

Comment

Thank you to the authors for the thoughtful response. I appreciate the clarifications and the added ablations, which improve the paper and address my concerns. I will increase my score accordingly.

Comment

We thank the reviewer for the thoughtful feedback and for taking the time to engage with our work. We're glad the clarifications and additional results addressed the concerns, and we appreciate the updated assessment and support.

Review (Rating: 7)

Quality: The paper presents a high-quality contribution, offering a theoretically grounded and empirically validated method for differentially private relational learning. The approach is carefully designed to overcome limitations of existing DP training methods when applied to graph-structured data. Experiments are comprehensive and support the claims.

Clarity: The paper is generally well-written and easy to follow. The problem is clearly motivated, and the proposed solution is explained with appropriate use of figures and equations.

Originality: The work introduces a novel decoupling method for negative sampling to make DP-SGD applicable in relational settings, and it proposes an efficient gradient clipping technique for large-scale models. These contributions address a clear gap in both privacy and graph learning literature.

Significance: The significance is strong. The ability to privately fine-tune LLMs on graph data has wide applications in sensitive domains. The proposed method advances the field of privacy-preserving machine learning in a meaningful way.

Reasons to Accept

  1. Novel and principled methodology: The decoupling of negative sampling for DP compatibility is elegant and effective. The custom gradient computation strategy to reduce O(KM) overhead is clever and practically necessary.

  2. Strong empirical results: The approach consistently improves upon baseline models while respecting differential privacy constraints.

  3. Scalability: Demonstrating private fine-tuning of models as large as Llama2-7B, which is non-trivial under memory and privacy constraints.

  4. Comprehensive analysis: Includes motivation to decouple the sampling process, empirical privacy leakage, and utility-privacy-computation trade-offs.

Reasons to Reject

  1. Limited relation types: The paper focuses solely on binary relations.

  2. Dependency on graph structure: The approach assumes that relational samples can be reasonably decoupled, which might not hold in densely connected or overlapping relations.

Comment

We sincerely thank the reviewer for their strong support and positive evaluation, particularly for recognizing the novelty, significance, and empirical strength of our work. We address the two minor points raised:

Limited Relation Types

We appreciate this observation. While our experiments primarily focus on pairwise relations to demonstrate the core methodology, our approach is designed to be extensible. As discussed in Section 3 (lines 155-158), the principle of decoupled negative sampling and tuple-level gradient clipping can potentially be applied to more complex relational structures, such as knowledge graph triplets, network motifs, or hyperedges. Given that our current empirical studies on pairwise relations already provide substantial support for the method's effectiveness in private relational learning, we consider the extension to other relation types a promising direction for future work.

Dependency on Graph Structure

Thank you for pointing out the potential influence of graph structure. The concern that decoupled negative sampling might inadvertently select true positive pairs as negatives, especially in dense graphs, is valid. However, several factors mitigate this: (1) Sparsity of Real-World Graphs: Many real-world graphs, including those used in our experiments (with densities ranging from 1.58×10⁻⁵ to 5.01×10⁻⁶ as derived from Table 4 data), are inherently sparse. In such cases, the probability of randomly sampling an existing positive partner as a negative is quite low. (2) Empirical Utility: Our experiments (Tables 1, 2, and 8) demonstrate strong utility, suggesting that this issue does not significantly degrade performance in practice. This aligns with our note in footnote 1 (line 204) that "no obvious harm to utility is observed in practice."
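To give a rough sense of the magnitudes involved (with made-up numbers, not the statistics of our datasets), the chance that a uniformly sampled negative collides with a true neighbor can be estimated as follows:

```python
# Back-of-the-envelope estimate with hypothetical numbers: the probability
# that a uniformly sampled "negative" tail is actually a true neighbor.
avg_degree = 15            # assumed average number of relations per entity
num_entities = 1_000_000   # assumed entity count
num_negatives = 32         # negatives sampled per positive relation

p_single = avg_degree / num_entities
p_any_in_tuple = 1 - (1 - p_single) ** num_negatives
print(f"per negative: {p_single:.1e}, per tuple: {p_any_in_tuple:.1e}")
```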

Comment

Thanks for the authors' reply. Given the positive rating, I will keep my score.

Review (Rating: 5)

Due to efficiency concerns, most graph neural networks use the edges within a mini-batch as both positive and negative examples. This leads to a larger influence on the batch gradient when a single relationship is changed. To solve this problem, the paper proposes to sample negatives at random from the whole entity set. The paper also proposes a low-rank characterization of per-sample gradients to improve computational efficiency. Many experiments are conducted to demonstrate the effectiveness of the method.

Reasons to Accept

The ablation study on the trade-off is complete. The proposed method is incremental, but the components together may provide a good solution for private learning on graphs when fine-tuning LLMs.

Reasons to Reject

Section 3.3 is really hard to read for people who are unfamiliar with the domain. How does tracking tuple-level gradients influence the computation cost? There is no standard deviation on the benchmark, which makes it hard to judge the method's performance. When the authors mention that their method is efficient, it would be useful to show a training time comparison to demonstrate its effectiveness. The paper did mention the trade-off but did not say how efficient the method is compared to other methods.

Questions for Authors

  1. Can we just simply include each relationship once in (2) to form the loss function? The potential impact of changing one relationship would then only affect one part of the loss function.
Comment

We thank the reviewer for recognizing the effectiveness of our proposed solution and the completeness of our experiments. We apologize for the confusion caused by Sec. 3.3 and will revise it for improved clarity in the final version. We address the reviewer's concerns and questions below:

Q1 How does tracking tuple-level gradient influence computation cost?

Our primary innovation for tuple-level gradients addresses the significant memory overhead typically associated with per-sample gradients in relational learning. As detailed in Sec. 3.3 (lines 252-256), a naive approach would require caching O(KM) gradient copies per loss term, leading to O(KMpd) memory. Our proposed method (lines 249-252) reduces this to O(KM(p+d) + pd). For large models where the parameter cost pd is substantial, this is a memory saving of approximately a factor of KM, making the approach feasible. Once this memory bottleneck is addressed, the computational time for processing a tuple and performing the backward pass to get the aggregated per-tuple gradient becomes efficient. The sum over tokens and entities (as described by ra^T in lines 251-252) is a batched matrix multiplication. The subsequent clipping and noise adding are standard. Thus, the primary drivers of training time per step become the model size, batch size, and sequence length, similar to standard DP training on non-relational data, rather than being prohibitively inflated by the K entities in a tuple.
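To illustrate the equivalence (not the paper's actual implementation; shapes and sizes below are toy values), the per-tuple gradient of a linear layer can be obtained as a single matrix product of the cached activations and output gradients, instead of materializing one gradient copy per token:

```python
import torch

# Toy sizes: K entities per tuple, M tokens per entity, layer of size p x d.
K, M, d, p = 4, 8, 64, 64
a = torch.randn(K * M, d)   # cached input activations for one tuple
r = torch.randn(K * M, p)   # back-propagated output gradients for one tuple

# Naive per-token route: materialize K*M gradient copies of shape (p, d),
# i.e., O(KMpd) memory, then sum them into the tuple-level gradient.
naive = torch.stack([torch.outer(r[t], a[t]) for t in range(K * M)]).sum(dim=0)

# Low-rank route: the same tuple-level gradient is one matrix product r^T a,
# so only a, r (O(KM(p + d)) memory) and a single (p, d) result are stored.
per_tuple_grad = r.T @ a

assert torch.allclose(naive, per_tuple_grad, atol=1e-3)

# The per-tuple gradient is then clipped to norm C and Gaussian noise is added
# to the batch sum, as in standard DP-SGD (not shown here).
```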

Q2 There is no standard deviation on the benchmark.

We acknowledge the reviewer's preference for standard deviations. Given the significant computational resources and time required for privately fine-tuning LLMs (up to 7B parameters in our work, as noted in lines 91-93), running multiple full training runs for each experimental setting to report error bars was unfortunately infeasible. This is a common challenge in the field, and several prior works on private LLM fine-tuning also present results from single runs (e.g., Li et al., 2021; Yu et al., 2022). We ensured our hyperparameter tuning was thorough (Appendix D, Table 6) to find stable optimal performance for the presented results.

Q3 ...show the training time comparison to demonstrate its effectiveness.

Our method's main efficiency gain is a significant reduction in GPU memory usage, which is critical for the practicality of fine-tuning large models on relational data with DP. As detailed in Sec. 3.3 (lines 252-256), a naive per-token gradient approach would incur O(bKMpd) memory per batch, whereas our per-tuple gradient computation reduces this to O(bKM(p+d) + bpd). This represents an approximate memory saving factor of KM, which is substantial. This memory efficiency makes it feasible to train large models like Llama2-7B on relational data with DP, which would otherwise be impractical due to out-of-memory (OOM) errors.

Regarding direct training time comparisons with 'other methods':

  • Baseline Infeasibility: A direct comparison to a naive DP approach for relational learning that materializes all O(KM) intermediate gradients (the 'Aggregating per-token gradients' baseline) is impossible for LLMs, as it would quickly lead to OOM errors, preventing completion of training or even single steps at reasonable batch sizes.
  • Throughput: Our method's efficiency enables practical batch sizes, leading to improved training throughput compared to what would be achievable if one were forced to use extremely small batches with a naive memory-intensive approach. For example, with Llama2-7B (parameter dimensions d = 4096, p ≈ 4096 or 32000), K ∈ [10, 34], and M = 32 (lines 254-255), the KM factor is significant.

Instead of direct time, the crucial benefit is feasibility and the ability to use batch sizes large enough for effective DP training (as larger batches improve the signal-to-noise ratio, lines 393-396). We add Table R2 below with concrete memory usage from our experiments using Llama2-7B with our method vs. the projected usage for a naive approach, and report the achieved training throughput to illustrate this point.

| Method | GPU VRAM (MB/tuple) ↓ | Throughput (tuples/s) ↑ |
| --- | --- | --- |
| Aggregating per-token gradients | 1440.31 | 7.92 |
| Per-tuple gradient computing | 164.31 | 10.08 |

Q4 Can we just simply include each relationship once in (2) to form loss function?

We assume “include each relationship once in (2)” refers to only including a positive or negative relation for a loss term, which reduces the problem to a standard case for private learning. However, without pairing positive and negative relations in a tuple, the model cannot effectively capture the relational pattern and is prone to drifting due to biased signals. We observed that the model often collapses when only one relationship is included in the loss in a non-private setting, and our study on the impact of negative samples in Fig. 2 is also consistent with such an observation.

Comment

Q1: Do you mean the computation cost is reduced because the memory usage is reduced?

Q2: I am not happy with the answer, given that you are using A100 (80GB) GPUs and PEFT for fine-tuning. You also mentioned that you have done a thorough parameter search. This makes me doubt why you cannot report standard deviations. Is there any work demonstrating that DP fine-tuned models have low standard deviation?

Q3 / Q4 makes sense to me.

Comment

We sincerely thank the reviewer for the follow-up questions and constructive engagement with our work. We address each point in detail below:

[Q1] Thank you for the opportunity to clarify. Our proposed per-tuple gradient computing primarily reduces memory cost, which in turn enables more efficient computation in practice. Specifically, by avoiding the need to store per-token gradients, we reduce memory usage per tuple from 1.44GB to 164MB, as shown in Table R2 above. This reduction is crucial for large models such as Llama-7B and allows us to use larger batch sizes, thereby improving training throughput. While the FLOPs required remain similar, the overall training efficiency improves significantly by avoiding memory bottlenecks and better utilizing available hardware.

[Q2] We appreciate the reviewer's point. While PEFT reduces the number of trainable parameters, private fine-tuning of LLMs still involves considerable overhead due to (1) repeated backward passes to compute tuple-level gradients, (2) additional steps for gradient clipping and noise addition, and (3) the need for large batch sizes to maintain reasonable utility in DP training (lines 393-396). As noted in our response to Reviewer SESE's Q1, we provided detailed memory usage per step: even with A100 (80GB) GPUs, we reach near capacity when fine-tuning Llama-7B with moderate batch sizes. All experiments were conducted using two cloud GPUs, further limiting our ability to run repeated experiments across all configurations; these are common constraints and challenges in the field of LLM fine-tuning.

Our parameter search was conducted using Bayesian hyperparameter optimization with early stopping to efficiently navigate the large search space. The reported results reflect a carefully selected configuration, consistent with prior work in this area (Li et al. 2021, Yu et al. 2022), where reporting standard deviation is often infeasible due to the high cost of private LLM fine-tuning.

That said, we agree that variance metrics can be helpful. To address this, we now include results for Bert.base with ϵ = 10 on four datasets with 5 random seeds, demonstrating that our DP fine-tuning process is stable and produces low variance:

| Model | MAG-USA | MAG-CHN | AMAZ-Cloth | AMAZ-Sports |
| --- | --- | --- | --- | --- |
| ϵ = 10 | PREC@1 / MRR | PREC@1 / MRR | PREC@1 / MRR | PREC@1 / MRR |
| Bert.base | 22.59±0.41 / 33.23±0.45 | 36.41±0.79 / 48.51±0.77 | 32.28±0.20 / 42.79±0.22 | 26.40±0.27 / 36.44±0.27 |

We will continue expanding this analysis and consolidating these results in the final version.

We thank the reviewer again for the feedback. We hope our responses address your concerns, and we welcome any further questions or suggestions.

Review (Rating: 6)

This paper presents a sampling method named decoupled sampling for private learning on graph data. The problem this paper aims to solve is that when applying the traditional DP-SGD method to graph data, the positive sample and the sampled negative sample may contain the same entities, which breaks the gradient decoupling assumptions. In the proposed decoupled sampling, the positive and negative samples involve different entities, which bypasses this problem. The experiments on a few datasets validate the effectiveness of the proposed method.

Reasons to Accept

  1. Graph data private learning is an important research topic.

  2. The proposed method seems reasonable.

  3. The experimental results are promising.

Reasons to Reject

  1. The proposed method is very simple. One concern is that, since the sampling process is manipulated, it is not clear whether it will impact the learning outcomes. More solid theoretical analysis should be included.

  2. The proposed method seems designed for relation prediction. I am not sure whether it can work for other graph learning tasks, for example, sub-graph classification.

  3. The relation of this paper with LLM is not strong.

Questions for Authors

Please refer to above comments.

Comment

We thank the reviewer for acknowledging the importance of our study and the effectiveness of our solution. We address three concerns as follows:

Q1 ... since the sampling process is manipulated, it is not clear whether it will impact the learning outcomes.

We appreciate the reviewer's concern about the potential impact of decoupled negative sampling on learning outcomes. Empirically, we address this by comparing decoupled sampling with standard in-batch negative sampling in a non-private setting. We will include these results (see Table R1 in response to Reviewer SESE's Q4) in the revised Appendix. These results show that decoupled sampling achieves comparable utility, indicating no significant adverse impact on learning outcomes (consistent with our observation in footnote 1, line 204).

From a learning perspective, sampling negative relations from the entire entity set V is a well-established strategy in representation learning to provide diverse negative examples for contrastive-style losses. While our primary motivation for this choice is its necessity for enabling relation-level DP (as it decouples dependencies), our experiments (Tables 1 & 8) demonstrate that this 'manipulation' for privacy does not unduly compromise the model's ability to learn relational patterns effectively. A deeper theoretical analysis of the utility trade-offs of different negative sampling strategies under DP could be an interesting direction for future work.

Q2 I am not sure whether it can work for other graph learning tasks, for example, sub-graph classification.

Our proposed method achieves a privacy guarantee of individual relations used for relational learning, which injects structural and contextual patterns of relational data into model weights, as formulated in Sec. 3, lines 138-158. We demonstrate the effectiveness of our pipeline in two types of tasks: relation prediction and entity classification.

Our method can potentially be used for subgraph-level tasks through the readout of entity embeddings of a given subgraph, which provides relation-level DP. To achieve DP for each individual subgraph, standard DP-SGD can be directly applied for subgraph classification.

Q3 The relation of this paper with LLM is not strong.

We respectfully disagree with the reviewer’s comment, as language models are integral to our work's motivation, methodology, and findings.

  1. Leveraging LLM Capabilities: Our approach is designed to fine-tune LLMs (like BERT and Llama2) using sensitive graph data. We specifically leverage their strong ability to understand and generalize from the entity's textual attributes to learn relational patterns, especially in challenging cross-domain scenarios (Sec. 4.1) where interpreting rich text is paramount. This can be particularly advantageous compared to models that might not capture nuanced textual semantics as effectively.
  2. Methodological Adaptations for LLMs: A core part of our technical contribution is the efficient gradient clipping method (Sec. 3.3), which is specifically designed to handle the computational complexities of applying DP-SGD to LLMs in a relational learning context (involving K entities, each with M tokens). This makes private fine-tuning of large models like Llama2-7B feasible.
  3. Extending Private LLM Research: Our work is among the first to systematically investigate privacy-preserving fine-tuning of LLMs specifically for relational learning tasks, extending the body of research on private LLM (which has largely focused on standard non-relational text data) to this new, important domain. We also provide extensive analysis on the privacy, utility, and efficiency trade-offs in this specific context.

We believe these aspects together demonstrate a strong and critical connection to LLMs throughout our research.

Comment

Thank the authors very much for responding to my concerns. I am happy with their answers.

Review (Rating: 6)

This paper introduces a framework for privacy-preserving relational learning, with a focus on fine-tuning large language models (LLMs) using graph-structured data under differential privacy (DP) guarantees.

The core challenge addressed is the incompatibility of existing DP methods like DP-SGD with relational learning, due to interdependent data samples inherent in graph structures. The authors propose a training pipeline that decouples observed and unobserved relational samples, thereby restoring the theoretical soundness of DP-SGD in this context.

The approach is evaluated on real-world datasets, demonstrating that it significantly outperforms existing DP baselines (e.g., randomized response) while maintaining strong privacy guarantees (ε ≤ 10). Experimental results also highlight favorable trade-offs among utility, privacy, and computational efficiency, making this solution practically relevant.

Reasons to Accept

  1. The paper tackles the challenge of applying differential privacy to relational learning tasks. While DP has been well studied in standard supervised settings, its integration with relational data—where dependencies between samples violate DP-SGD assumptions—remains underexplored.

  2. The proposed solution—decoupling the sampling of negative relations to ensure compatibility with DP-SGD—is well motivated. The authors also address practical challenges in computing per-sample gradients for large models by introducing an efficient low-rank approximation method, enabling scalable training on modern LLMs.

  3. The evaluation goes beyond utility metrics by also including privacy risk assessment via membership inference attacks, and a study of trade-offs among privacy, utility, and efficiency.

Reasons to Reject

While the paper addresses an important problem at the intersection of privacy and relational learning, the reviewer has several concerns.

First, the core privacy argument relies on the claim that decoupled negative sampling ensures that adding or removing a single positive relation affects at most one tuple in a mini-batch, thereby enabling the use of DP-SGD. However, sampling negative entities from the entire entity set introduces a global dependency: adding or removing a relation can change the composition of the entity set itself. If an entity is unique to a newly added or removed relation, its presence or absence would alter the distribution from which negatives are sampled, thereby potentially affecting many tuples in a mini-batch. I am not an expert in relational learning, so if there are standard practices or assumptions I am missing (such as assuming a fixed entity set independent of the relation set), I welcome clarification.

Second, the experimental design is somewhat intuitively not reasonable to the reviewer. The authors evaluate generalization by fine-tuning on one domain and testing on a completely different domain, where both the entities and their relationships are disjoint. It is unclear what meaningful relational knowledge could transfer in such a setting, especially when the core assumption of relational learning is that structure matters. From my perspective, a more intuitive and controlled experiment might involve within-domain generalization—e.g., holding out a subset of entities and their associated relations from the training set—so that the model is tested on structurally similar but unseen data. This would better isolate the model's ability to generalize relational patterns. The current design casts doubt on whether the observed performance gains are attributable to relational learning or merely to memorizing local patterns. Again, I am not an expert in relational learning. So if there is precedent for the current cross-domain setup in prior work, it would be helpful for the authors to cite and explain it.

Third, the method that sampling negatives from the full entity set is relatively straightforward and simple. One might argue that this is a more natural choice than in-batch negatives (although the latter may be typically used for efficiency). The main contribution, therefore, seems to be an analysis showing that global negative sampling is more amenable to DP-SGD, rather than providing a novel DP learning pipeline tailored specifically for relational learning.

Questions for Authors

Refer to Reasons To Reject.

Comment

We thank the reviewer for recognizing the significance of our work and for the detailed and constructive feedback. We address three concerns as follows:

[Q1] We thank the reviewer for this insightful question, which touches upon a subtle but important aspect of defining differential privacy for relational data.

The reviewer's concern is about the potential change in the entity set V if an entity is unique to an added or removed relation, and how this might affect negative sampling. Our work, as defined in Sec. 3.1, focuses on relation-level DP, where two relation sets E and E′ are considered adjacent if one can be obtained from the other by adding or removing a single relation. Critically, for this definition of adjacency, the underlying entity set V is assumed to be fixed. The privacy guarantee pertains to the indistinguishability of models trained on E vs. E′, where only one relation differs, not the entities themselves. This is a common setup when defining relation (edge)-level DP in graph settings.

Thus, when we add or remove a positive relation e ∈ E (to define adjacent datasets for DP), the overall entity set V (from which negatives are sampled) does not change. Our decoupled negative sampling, which samples negatives uniformly from the set V, is designed to be independent of the set E (beyond the specific positive relation in a tuple). This independence is key to ensuring that perturbing one relation e_i⁺ affects at most one tuple E_i in the mini-batch, allowing DP-SGD to be correctly applied. The dependency on the global entity set V for negative sampling does not violate relation-level DP because V is constant across adjacent relation sets.

We acknowledge that defining DP at the entity level is a different and more complex setting, which is beyond the scope of this work. Our experiments demonstrate that robust relational learning is achievable under our proposed relation-level DP framework. We hope this clarifies the definition of adjacency used and how our negative sampling strategy is compatible with it. We will add the above discussion in Sec. 3.1 to improve clarity.

[Q2] We appreciate the reviewer's perspective and suggestion on the experimental design. While within-domain evaluation is indeed valuable for assessing certain aspects of model performance, our cross-domain setup was a deliberate choice, designed to test a more challenging and practically relevant form of generalization, especially under privacy constraints. Our motivation, as outlined in Sec. 4 (lines 261-273), is to simulate real-world scenarios, e.g., 'cross-category co-purchase recommendation' and 'cross-regional model deployment.' In these situations:

  1. Disjoint Data is Realistic: Entities and their specific relationships are often entirely disjoint between, e.g., different product departments or different operational regions.
  2. Privacy is Paramount for Sharing: The need for DP often arises precisely because sensitive relational data from one domain cannot be directly shared or used to train models for another, even if within-domain data might have fewer sharing restrictions. Our setup directly reflects this challenge.
  3. Transfer of Abstract Knowledge: We aim for the model to learn transferable relational patterns or the underlying knowledge of forming relations from textual attributes, rather than memorizing specific entity-relation instances. For example, an LLM might learn how language describing certain types of products often links to co-purchase relations, even if the products themselves are from different categories. The disjoint nature of entities/relations in our setup rigorously tests this ability to generalize abstract relational knowledge.

[Q3] We agree with the reviewer that sampling negatives from the entity set is conceptually straightforward. However, our contribution is not this specific sampling method alone, but rather the study of this complete and theoretically sound privacy-preserving pipeline for relational learning, which has been previously unaddressed. Our key contributions are:

  1. Identifying the DP Challenge: We first pinpoint why common negative sampling methods (e.g., in-batch) are incompatible with DP-SGD in relational learning due to coupled dependencies.
  2. A DP-Compatible Solution: We propose a pipeline that uses decoupled negative sampling to enable correct DP-SGD application. This involves tuple-level gradient clipping and noising to protect individual relations.
  3. Efficient Implementation for LLMs: Crucially, we introduce a tailored, efficient gradient computation method (Sec. 3.3) that makes this approach practical for large models like LLMs by exploiting low-rank gradient structures, which is essential given the setting of multiple entities per tuple.
  4. Rigorous Validation: We provide extensive experiments demonstrating our pipeline's superior utility, assess privacy empirically via MIAs, and analyze practical trade-offs.
Comment

The reviewer appreciates the authors' responses to the concerns raised.

Regarding [Q1], the clarification about the assumption of a fixed entity set for relation-level DP is helpful and addresses the core of the concern. While this assumption may be a common setup in edge-level DP for graphs, it would be beneficial to make this explicit earlier in the paper, perhaps in the Preliminaries section, so that readers unfamiliar with this convention can follow the privacy argument more easily.

For [Q2], the reviewer understands and acknowledges the motivation for the chosen cross-domain setup, especially in light of practical applications like cross-category recommendation or cross-regional deployment. However, I still believe that a within-domain evaluation would serve as a valuable complement, since it remains somewhat unclear how transferable relational knowledge is learned across disjoint domains. Besides adding the experiment, including a concrete example or visualization could also be helpful in that regard. Furthermore, it would be helpful to explicitly describe the input/output format of the model in your setup—this was difficult to infer from the current draft.

Overall, my concerns have been partially addressed. I appreciate the authors’ clarifications, and I will increase my rating to 6.

Comment

We sincerely thank the reviewer for the thoughtful follow-up and for increasing the rating. We’re glad that our clarifications addressed the core concerns, and we appreciate the additional suggestions for improving the clarity of our paper.

[Q1] We will clarify the setting of edge-level DP, including the assumption of a fixed entity set in the Preliminaries, to make the privacy framework more accessible to broader audiences.

[Q2] We appreciate the reviewer’s recognition of our motivation for the cross-domain setup. To further illustrate how relational knowledge is transferred across disjoint domains under privacy constraints, we will include a diagram in the final version. We also acknowledge that the input/output format of our pipeline could be better specified. We will include examples and a more detailed description in the revised Appendix D.

Review (Rating: 6)

This paper addresses the open problem of differentially-private relational learning—training models with supervision coming from pairs or tuples of related entities (e.g. co-purchase edges) rather than independent samples. Empirical studies show improvements in relational learning tasks while maintaining robust privacy guarantees.

Reasons to Accept

  1. The authors conduct a lot of experiments, providing solid empirical validation.
  2. Efficient gradient clipping methods make the approach viable for large-scale applications.

Reasons to Reject

  1. The privacy guarantee is stated informally; a theorem with assumptions and a proof sketch would strengthen rigor.
  2. The evaluation primarily focuses on relational prediction tasks. Broader validation across more diverse relational learning tasks might enhance the claims.

Questions for Authors

  1. Can you provide memory and wall-clock time analysis?
  2. Can you provide theoretical analysis of privacy guarantees?
  3. Have you tried applying the pipeline to knowledge-graph completion datasets?
  4. How does decoupled negative sampling affect link-prediction recall or AUC?
Comment

We thank the reviewer for recognizing our work addressing the open problem of private relational learning with thorough experimentation and appreciating our practical solution for large-scale applications. We clarify the questions as follows:

Q1 Can you provide memory and wall-clock time analysis?

We provide an analysis of memory complexity savings in Section 3.3 (lines 253-259), where we show our method reduces memory cost by a factor of approximately O(KM) compared to a naive approach. This memory efficiency is crucial for the feasibility of training LLMs on relational data with DP. Below, we provide representative GPU VRAM usages (with batch size b = 32 relation tuples) and achieved training throughputs from our experiments.

| Base Model | VRAM (Base + Adapter) | VRAM (Backprop.) | Peak VRAM | Throughput (tuples/s) |
| --- | --- | --- | --- | --- |
| BERT.base | 1,178 MB | 6,044 MB | 12,175 MB | 67.52 |
| BERT.large | 2,006 MB | 14,284 MB | 21,428 MB | 14.04 |
| Llama2-7B | 26,419 MB | 66,562 MB | 79,020 MB | 14.41 |

Q2 Can you provide theoretical analysis of privacy guarantees?

With our decoupled negative sampling, the proposed pipeline (Alg. 3 in Appendix C) achieves (ϵ, δ)-DP for individual training relations, as formally defined in Sec. 3.1, lines 162-164. Specifically, the decoupled sampling ensures the sensitivity of the batch gradients is bounded regardless of the batch size b, i.e., at most one relation tuple is affected when one relation is added to or removed from the relation set E. The tuple-level gradient clipping further limits the sensitivity to a constant C. We will formalize this analysis into a proposition in our revised version.

Based on the Gaussian mechanism, each parameter update from the batch gradient with added Gaussian noise in Alg. 3 achieves DP. Lastly, we leverage the composition theorem (Balle & Wang, 2018) to account for the total privacy loss over T training steps, and convert it to (ϵ, δ)-DP. The privacy loss ϵ on each relational dataset used for training is reported in Table 9, Appendix E.
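For concreteness, a minimal sketch of one noisy update over per-tuple gradients is given below (toy tensors and hyperparameters of our own choosing; the actual procedure is Alg. 3 in Appendix C):

```python
import torch

C, sigma, lr = 1.0, 0.5, 1e-3
params = torch.zeros(10)                # flattened model parameters (toy size)
per_tuple_grads = torch.randn(32, 10)   # one gradient per relation tuple in the batch

# Clip each tuple's gradient to norm C. With decoupled sampling, adding or
# removing one relation changes at most one row, so the summed gradient has
# sensitivity at most C.
norms = per_tuple_grads.norm(dim=1, keepdim=True)
clipped = per_tuple_grads * torch.clamp(C / (norms + 1e-12), max=1.0)

# Gaussian mechanism: noise scaled to the sensitivity C, then average and step.
noisy_sum = clipped.sum(dim=0) + torch.normal(0.0, sigma * C, size=params.shape)
params -= lr * noisy_sum / per_tuple_grads.shape[0]
```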

Q3 Have you tried applying the pipeline to knowledge-graph completion datasets?

We haven't applied our pipeline to KG completion tasks, even though our method can be extended to KGs for relational learning as discussed in Sec. 3, lines 151-155. Our current empirical studies already provide substantial support for the effectiveness of our proposed method for private relational learning. We thank the reviewer for pointing out this direction; exploring the KG setting would be a promising avenue for future work.

Q4 How does decoupled negative sampling affect link-prediction recall or AUC?

We evaluate link prediction using PREC@1 and MRR, which are common ranking metrics for these tasks, focusing on the accuracy of top-ranked predictions. In our experiments (and as noted in footnote 1, line 204), we found that decoupled negative sampling (sampling negatives randomly from the entire entity set V) achieves comparable utility to standard in-batch negative sampling in a non-private setting, without the obvious harm to performance that might be feared from its 'simplicity.'

To provide a direct comparison as requested, the following table (Table R1) shows results using our primary metrics for these two negative sampling strategies (both in a non-private training setting to isolate the effect of the sampling strategy itself):

| Method | MAG-USA | MAG-CHN | AMAZ-Cloth | AMAZ-Sports |
| --- | --- | --- | --- | --- |
| Bert.base | PREC@1 / MRR | PREC@1 / MRR | PREC@1 / MRR | PREC@1 / MRR |
| In-batch negatives | 28.07 / 39.11 | 41.93 / 53.91 | 36.13 / 47.07 | 29.84 / 39.61 |
| Decoupled sampling | 28.44 / 39.45 | 41.90 / 54.19 | 35.78 / 46.79 | 28.33 / 38.59 |
Comment

I thank the authors for their responses and new results. I will keep my score.

Final Decision

This paper addresses the important problem of differentially-private relational learning. The authors propose a method that separates the sampling of positive and negative relations and introduces an efficient tuple-level gradient clipping mechanism, enabling practical application of DP-SGD even for large-scale (7B parameters) language models. Experiments across four real-world text-attributed graphs demonstrate clear improvements over a baseline employing randomized-response under practical privacy budgets (ε ≤ 10). Privacy leakage was analyzed via membership-inference attacks, adding practical relevance to the approach.

Pros:

Clearly motivated, addressing an under-explored and significant problem relevant to private fine-tuning of large language models.

The technical contribution of decoupling negative sampling and using a low-rank gradient clipping technique provides practical scalability.

Extensive and thorough experiments illustrate robust privacy–utility–efficiency trade-offs, supported by meaningful ablations and empirical privacy analyses.

General consensus among reviewers is positive, noting methodological soundness and strong empirical validation.

Cons:

The paper lacks a formal privacy guarantee with a rigorous proof, which reviewers suggest should be included to enhance theoretical grounding.

Section 3.3, describing the tuple-level gradient approach, needs clearer presentation, possibly via pseudocode or concrete examples.

Experiments are primarily restricted to binary relations, and extending the approach to multi-relation or hyper-edge scenarios would enhance generality.

Variance and computational details (e.g., multiple runs with reported variance) are currently limited, potentially impacting reproducibility and confidence in results.

Cross-domain evaluation alone raised some reviewer concerns; including an in-domain baseline would better contextualize performance.