PaperHub
Overall score: 4.8/10 (Rejected, 4 reviewers)
Ratings: 5, 5, 6, 3 (min 3, max 6, std 1.1)
Average confidence: 3.3
Correctness: 2.5 | Contribution: 2.3 | Presentation: 2.5
ICLR 2025

Label Privacy in Split Learning for Large Models with Parameter-Efficient Training

OpenReview | PDF
Submitted: 2024-09-27 | Updated: 2025-02-05
TL;DR

We develop a practical two-party algorithm for fine-tuning large language models over an API by taking advantage of PEFT algorithms.

Abstract

As deep learning models become larger and more expensive, many practitioners turn to fine-tuning APIs. These web services allow fine-tuning a model between two parties: the client that provides the data, and the server that hosts the model. While convenient, these APIs raise a new concern: the data of the client is at risk of privacy breach during the training procedure. This challenge presents an important practical case of vertical federated learning, where the two parties perform parameter-efficient fine-tuning (PEFT) of a large model. In this study, we systematically search for a way to fine-tune models over an API *while keeping the labels private*. We analyze the privacy of LoRA, a popular approach for parameter-efficient fine-tuning when training over an API. Using this analysis, we propose P$^3$EFT, a multi-party split learning algorithm that takes advantage of existing PEFT properties to maintain privacy at a lower performance overhead. To validate our algorithm, we fine-tune DeBERTa-v2-XXLarge, Flan-T5 Large and LLaMA-2 7B using LoRA adapters on a range of NLP tasks. We find that P$^3$EFT is competitive with existing privacy-preserving methods in multi-party and two-party setups while having higher accuracy.
Keywords
Split Learning, Vertical Federated Learning, Federated Learning, Parameter Efficient Fine-tuning, Privacy, Large Language Models

Reviews and Discussion

Review (Rating: 5)

This work analyzes privacy-preserving fine-tuning of LLMs in the context of parameter-efficient fine-tuning and the two-party split learning setting. The authors show empirically that ordinary fine-tuning reveals sensitive labels and successfully alleviate this privacy risk by obfuscating the backpropagated gradients through the forward and backward APIs. Experiments are conducted on three datasets.

Strengths

This paper focuses on a very practical scenario and attempts to address an important and realistic privacy issue. I also like the visualization of the top-2 principal components of gradients and activations from different fine-tuning steps before and after the defense is applied, which directly demonstrates the effectiveness of the proposed method.

Weaknesses

  1. The writing of the paper needs to be improved. For example, the last sentence in the abstract is hard to understand: "We find that P³EFT is competitive with existing privacy-preserving methods in multi-party and two-party setups while having higher accuracy." What does "competitive" mean when higher accuracy is achieved?
  2. Experimental datasets are inadequate. Only classification datasets are used, although generation tasks are also mentioned in the paper.
  3. Aside from label privacy, I believe sample input privacy is also important. However, this work exposes the input samples completely to the LLM. I expect that the authors could provide a proper real-world setting in which only the labels, but not the input features, are sensitive and should be protected.

Questions

  1. As far as I know, some closed-source model training APIs require uploading the whole training dataset to the server. Is the proposed method able to preserve label privacy under this setting? If not, what might be done to address this case?
Comment

We thank the reviewer for their feedback and address their concerns and question below.

For example, the last sentence in the abstract is hard to understand.

We thank the reviewer for their careful attention to our paper's writing. We agree that the original phrasing contained a logical contradiction. Our intention was to convey that P³EFT successfully competes with existing methods and outperforms them.

To address this ambiguity, we propose the following reformulation: "We find that P³EFT outperforms existing privacy-preserving methods in multi-party and two-party setups." We would welcome the reviewer's thoughts on this revised phrasing.

Additionally, we are open to any further specific suggestions the reviewer might have regarding the paper's writing quality and clarity.

Only classification datasets are used although generation task is also mentioned in the paper.

Indeed, while we mention generation in line 043, it specifically refers to image generation. But we acknowledge that the current phrasing might be ambiguous, and we will revise this statement in the updated version to provide better clarity.

Regarding text generation specifically, we agree that ensuring label privacy in such setups is of significant importance. However, this particular scenario presents its own unique challenges. For instance, text generation tasks utilize the same tokens for both inputs and labels, creating potential information leakage in both directions. Adapting our method to address these challenges would require additional approaches --- such as integration with input privacy methods --- which would extend beyond the scope of this paper. We believe these considerations require separate, focused study.

We've chosen to focus on classification as a specific, but important setup. This focus allows us to thoroughly examine key aspects of privacy-preserving techniques in split learning. We believe this provides valuable insights that can serve as a foundation for future research.

Aside from label privacy, I believe sample input privacy is also important.

We share the reviewer's emphasis on the critical nature of input privacy. In our work, however, we specifically explore label privacy as one significant component of private learning [1,2,3,4], while recognizing it is not the only one. Since our approach is orthogonal to most feature privacy methods, it can be effectively combined with these methods to provide comprehensive protection for both inputs and labels.

I expect that the authors could provide a proper real-world setting in which only label information but not input features are sensitive and should be protected.

We provide a potential real-world example of such a scenario in lines 060-064. Specifically, we consider the case of social media user pages, where the data is publicly available, while the labels --- such as user behavior and click information --- are accessible only to the social network.

As far as I know, some closed-source model training APIs require uploading the whole training dataset to the server. Is the proposed method able to preserve label privacy under this setting? If not, what might be done to address this case?

Not in the exact setting you describe. Specifically, requiring clients to provide their exact dataset to the server would result in complete exposure of private data. Traditional approaches to address this privacy concern primarily rely on anonymization techniques, which include either the systematic removal of sensitive components from the training data [5] or the generation of synthetic data that approximates the original distribution while avoiding exact private records [6]. However, these methods have notable limitations, as both approaches typically lead to diminished model accuracy and cannot guarantee complete privacy preservation.

To overcome these limitations, we explicitly design the API protocol in Section 3 in such a way that does not require the client to share their training labels with the server.

[1] Li et al., Label leakage and protection in two-party split learning. ICLR 2022.

[2] Sun et al., Label leakage and protection from forward embedding in vertical federated learning. arXiv:2203.01451, 2022.

[3] Liu and Lyu, Clustering label inference attack against practical split learning. arXiv:2203.05222, 2022.

[4] Zou et al., Defending batch-level label inference and replacement attacks in vertical federated learning. IEEE Transactions on Big Data, 2022.

[5] Gardiner et al., Data Anonymization for Privacy-Preserving Large Language Model Fine-Tuning on Call Transcripts. Proceedings of the Workshop on Computational Approaches to Language Data Pseudonymization, 2024.

[6] Xie et al., Differentially Private Synthetic Data via Foundation Model APIs 2: Text. arXiv:2403.01749, 2024.

Comment

Thanks for your response. I have no further questions.

Comment

We thank the reviewer for acknowledging our response. However, as we have provided detailed explanations addressing each concern, we would appreciate clarification on what specific issues remain that justify maintaining the original score. We would be happy to further engage in discussion if there are some unresolved concerns.

Review (Rating: 5)

This paper addresses the problem of parameter-efficient fine-tuning (PEFT) over an API, focusing on preserving the privacy of training labels. It starts by analyzing the label privacy implications of API fine-tuning with LoRA, a commonly used PEFT algorithm. Empirical results demonstrate that in a split learning setup, LoRA may leak the client’s training labels. Specifically, the unprotected transmission of gradients, model parameters, and activations in split learning can expose training labels to privacy attacks. To address this vulnerability, the paper introduces the P³EFT (privacy-preserving parameter-efficient fine-tuning) framework. P³EFT employs private backpropagation and obfuscates learned activations to protect label privacy by securing activations and gradients during communication between client and server. In their PEFT experiments, the paper shows that P³EFT achieves better label privacy with a small reduction in utility compared to previous split learning methods. Privacy leakage is measured using metrics such as Spectral attack AUC, Norm attack AUC, and k-means accuracy.

Strengths

  1. Although P³EFT is based on the analysis of LoRA, its approach can be applied to any PEFT method aiming to address label privacy in split learning during PEFT over an API.

  2. The concept of private backpropagation leverages the conditional linearity of the backpropagation operator, which, while conceptually similar to the secure aggregation protocol used in horizontal federated learning, is well-suited for the vertical FL setting of this problem. This approach can also extend to other vertical FL or split learning scenarios requiring gradient propagation through an untrusted party and is adaptable for future algorithms.

  3. P³EFT performs significantly better than existing PEFT methods in preserving label privacy, with only a minor drop in accuracy. Moreover, compared to the distance correlation defense (DC), another privacy-aware algorithm, P³EFT achieves superior label privacy, making it a strong choice for API-based PEFT scenarios.

Weaknesses

  1. While label privacy in P³EFT is evaluated using metrics such as Spectral attack AUC, Norm attack AUC, and k-means accuracy, demonstrating strong performance on these measures, it lacks a formal theoretical guarantee (e.g., differential privacy). Although differential privacy is mentioned as providing loose upper bounds on potential privacy leakage, establishing a theoretical privacy guarantee for P³EFT would enhance understanding of privacy leakage and offer a more solid theoretical basis for fair comparisons with other label privacy-focused algorithms.

  2. This paper addresses label privacy in PEFT over an API within an "honest but curious" attacker model, making it a unique but somewhat limited scenario in terms of generality. Analyzing cases where the server is not "honest" would provide a broader perspective, especially given the emphasis on worst-case privacy scenarios in the paper. Additionally, extending the framework to protect both input and label privacy would be a natural generalization, though it appears challenging to adapt P³EFT for this purpose.

  3. Some references in the paper would benefit from clarification. For instance, line 52 refers to private multi-party fine-tuning methods that support a narrow class of fine-tuning algorithms but does not specify which algorithms fall within this scope and which do not. Similarly, line 132 states that differential privacy upper bounds are loose and do not align with practical observations on real models, but it lacks a reference or supporting argument, leaving some statements unclear.

  4. There are a few minor writing issues, including a formulation error, which I have included in the question section for reference.

Questions

  1. In line 237, it is stated that leaving gradients, activations, or parameters unprotected would compromise label privacy. While P³EFT obfuscates gradients and activations, it is unclear why parameters do not require similar privacy measures in the context of this paper.

  2. In line 266, it appears that the expression $\text{backprop}(x, \theta, g_h)$ should be written as $\frac{1}{2}\bigl(\text{backprop}(x, \theta, g_h + z) + \text{backprop}(x, \theta, g_h - z)\bigr)$ instead of $\bigl(\text{backprop}(x, \theta, g_h + z) + \text{backprop}(x, \theta, g_h - z)\bigr)$. Could you confirm if this is correct?

  3. Regarding Algorithm 1, did you study the effects of different noise levels in the private backpropagation algorithm? Does this imply that P³EFT would yield the same utility across various noise levels?

  4. In line 479, it is mentioned that P³EFT achieves the same accuracy at a given privacy level. However, it is unclear what “same level of privacy” specifically means, given that the privacy metrics in Tables 1 and 2 differ for DC and P³EFT.

  5. Finally, in line 503, how is the privacy-accuracy trade-off precisely defined? It would be helpful to clarify this term—if it refers to an average measure of privacy and utility, please specify, as the term “trade-off” here is ambiguous without further explanation.

  6. Here are some minor spelling errors: line 326: "it's"; line 478: "P³FT"; line 484: "both both our algorithm".

Comment

We appreciate the reviewer's detailed feedback and address each weakness and question below.

establishing a theoretical privacy guarantee for P³EFT would enhance understanding of privacy leakage and offer a more solid theoretical basis for fair comparisons with other label privacy-focused algorithms.

We appreciate the reviewer's emphasis on the importance of theoretical privacy guarantees. While we agree that such guarantees would strengthen our work, developing them for P³EFT presents unique challenges. Traditional differential privacy approaches typically rely on introducing randomness through noise addition to gradients [1] or label flipping [2], which serves as the foundation for their theoretical analysis.

In contrast, our method preserves privacy of the activations through a deterministic regularization approach. The absence of explicit randomization makes it challenging to apply standard differential privacy analysis techniques, which typically rely on comparing outcomes between neighboring datasets through probabilistic bounds.

Nevertheless, recognizing the importance of theoretical foundations, we have provided a theoretical analysis of a key component of P³EFT --- the private_backprop algorithm --- in Appendix C. While this analysis operates under slightly different assumptions, we believe it represents a meaningful first step toward building a theoretical framework for our approach.

We welcome suggestions from the reviewer on potential approaches to establishing theoretical privacy guarantees for regularization-based privacy mechanisms, as this could be a valuable direction for future research.

This paper addresses label privacy in PEFT over an API within an "honest but curious" attacker model, making it a unique but somewhat limited scenario in terms of generality. Analyzing cases where the server is not "honest" would provide a broader perspective, especially given the emphasis on worst-case privacy scenarios in the paper.

We appreciate the reviewer's suggestion about considering scenarios beyond the "honest but curious" server model. However, in our specific Split Learning setup, where the client computes the final loss using their private labels after receiving intermediate results from the server, we believe the "honest but curious" model is both practical and sufficient. This is because any deviation from honest behavior by the server would be immediately detectable by the client through observable training dynamics (e.g., unexpected increases in training loss).

Our focus on the "honest but curious" model aligns with the established literature in this domain [3,4,5], where this threat model has proven to be both realistic and meaningful for analyzing privacy concerns in split learning scenarios. To the best of our knowledge, there are no existing works analyzing "not honest" server behavior in our specific setup, likely due to the client's ability to detect such behavior through the training process. However, if the reviewer is aware of any relevant work in this direction, we would be very interested in considering it for future research.

Additionally, extending the framework to protect both input and label privacy would be a natural generalization, though it appears challenging to adapt P³EFT for this purpose.

We fully agree with the reviewer that protecting inputs is also incredibly important. We plan to extend our framework to protect inputs in future work. Meanwhile, to protect both inputs and labels, P³EFT can be combined with an existing input protection algorithm.

For instance, line 52 refers to private multi-party fine-tuning methods that support a narrow class of fine-tuning algorithms but does not specify which algorithms fall within this scope and which do not.

We agree with the reviewer that specifying the particular class of algorithms will improve the quality of the paper. Here we are referring to prompt-tuning, and we have clarified this in the revision.

Similarly, line 132 states that differential privacy upper bounds are loose and do not align with practical observations on real models, but it lacks a reference or supporting argument, leaving some statements unclear.

We acknowledge that statements like the one on line 132 should be better supported with appropriate references. This particular statement was primarily motivated by our empirical findings presented later in Section 4.2, but we agree that this connection should have been made more explicit in the text.

We have revised the formulation of this statement to make it clearer and better connected to our experimental results. If the reviewer has identified other parts of the paper that would benefit from similar clarification or additional references, we would greatly appreciate their suggestions.

Question 2

We thank the reviewer for their attention to detail. Indeed, there should be a factor of $\frac{1}{2}$. We have corrected the formula in the revision (line 264).
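
For completeness, the $\frac{1}{2}$ factor follows directly from the linearity of backprop in the output gradient for fixed $x$ and $\theta$, in the notation of the question above:

```latex
% The noise term z cancels because backprop(x, \theta, \cdot) is linear in its last argument:
\tfrac{1}{2}\bigl(\mathrm{backprop}(x,\theta,g_h + z) + \mathrm{backprop}(x,\theta,g_h - z)\bigr)
  = \tfrac{1}{2}\cdot 2\,\mathrm{backprop}(x,\theta,g_h)
  = \mathrm{backprop}(x,\theta,g_h).
```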

Comment

In line 237, it is stated that leaving gradients, activations, or parameters unprotected would compromise label privacy. While P³EFT obfuscates gradients and activations, it is unclear why parameters do not require similar privacy measures in the context of this paper.

We appreciate this insightful observation about the role of parameters in privacy preservation. Indeed, in general settings, model parameters can leak private information, as they essentially represent compressed information about the training dataset; this is precisely why approaches like DP-SGD [1] are used in standard privacy-preserving training scenarios. This aligns with the well-established understanding from the membership inference attack literature [6], where it is known that parameter-based privacy leakage is typically obtained through model activations and predictions.

However, in our specific setup of label privacy, the situation is somewhat different. The server already has access to the input data and computes the activations through the forward pass. Intuitively, any information contained in the parameters that could be used to compromise label privacy would manifest in the activations after the forward pass. Therefore, by protecting activations, we indirectly address potential privacy leaks through parameters.

We acknowledge that this explanation, while intuitive, lacks formal theoretical foundation. For clarity, we have removed the reference to "parameters" from this sentence to maintain consistency with the rest of the paper's exposition (line 236). However, if the reviewer thinks it would be valuable, we would be happy to include this discussion about the relationship between parameters, activations, and label privacy in the paper.

Regarding Algorithm 1, did you study the effects of different noise levels in the private backpropagation algorithm? Does this imply that P³EFT would yield the same utility across various noise levels?

While different magnitudes of the obfuscated gradients in Algorithm 1 are mathematically equivalent, using values at the boundaries of the corresponding floating point data type could indeed affect the computation of the final gradient in Algorithm 1. However, within the range we investigated (up to 1e3, see line 448), everything worked correctly.

In line 479, it is mentioned that P³EFT achieves the same accuracy at a given privacy level. However, it is unclear what “same level of privacy” specifically means, given that the privacy metrics in Tables 1 and 2 differ for DC and P³EFT.

We thank the reviewer for this feedback. The statement in line 479 refers specifically to DeBERTa. Our comment was intended only for Table 1, not Table 2. The results in Table 2 were described in lines 513-516 (of the new revision), where we state that both our algorithm and DC consistently solve all three tasks.

We made a mistake in the writing; what we meant was "outperforming in terms of the privacy given the same accuracy level." By "the same accuracy level", we mean that accuracy values for both DC and P³EFT are very close to the non-private upper bound, and from a practical application perspective, the difference in accuracy is not significant. We have corrected this oversight in the paper (line 510). If the reviewer has additional questions or suggestions regarding the formal formulation, we would be happy to address them.

Finally, in line 503, how is the privacy-accuracy trade-off precisely defined? It would be helpful to clarify this term—if it refers to an average measure of privacy and utility, please specify, as the term “trade-off” here is ambiguous without further explanation.

We agree that our terminology was not entirely precise. What we meant was that our method maintains good accuracy on QNLI --- still suitable for practical applications of the model --- while achieving better privacy guarantees. We acknowledge that the term "trade-off" might not be the most appropriate here, and we have revised the paper to provide a more detailed explanation (lines 522-523), similar to what we've outlined in this response.

Here are some minor spelling errors

We thank the reviewer for their attention to detail. We have fixed these typos in the revised version.

[1] Abadi et al. Deep learning with differential privacy. Proceedings of the 2016 ACM SIGSAC conference on computer and communications security, 2016.

[2] Ghazi et al. Deep learning with label differential privacy. Advances in NeurIPS, 2021.

[3] Li et al. Label leakage and protection in two-party split learning. ICLR 2022.

[4] Sun et al. Label leakage and protection from forward embedding in vertical federated learning. arXiv:2203.01451, 2022.

[5] Wan et al. PSLF: Defending against label leakage in split learning. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2023.

[6] Shokri et al. Membership inference attacks against machine learning models. IEEE symposium on security and privacy, 2017.

Comment

Dear Reviewer SHmL,

The discussion period is about to end and we would kindly like to hear from you regarding our rebuttal. We have addressed all concerns raised in the review and have further strengthened our submission with a theoretical analysis of the private_backprop algorithm.

Given your positive comments about the flexibility and empirical results of our framework, we respectfully request that you consider revising your score.

Comment

Thanks for your response! I have no further questions.

Comment

We appreciate your acknowledgment of our response. Since we've provided comprehensive explanations for each weakness and question, we'd value understanding which specific concerns still support the initial score. We're open to discussing any outstanding issues.

Review (Rating: 6)

This paper proposes P³EFT, based on PEFT, for fine-tuning models in a 2-party setting common to fine-tuning APIs in practice. The focus is on guaranteeing a form of empirical label-privacy. Their approach utilises a private backpropagation approach inspired by secure-aggregation to prevent the server from inferring label information and further trains a mixture of multiple adapters to improve privacy.

Strengths

  • The paper focuses on an interesting and relevant privacy problem related to fine-tuning APIs which are increasingly popular.
  • Strong empirical results which show that the proposed method can maintain both accuracy and empirical label-privacy.
  • The presentation of the paper is clear and well-written.

Weaknesses

  • More could be done to compare with existing baseline methods in the experiments section (see below).
  • Some experimental results are unclear (see below).

Questions

  1. It is implied that training multiple adapters is a way to prevent label-leakage, yet in the experiments only n=2 adapters are used. Did you run any experiments varying the number of adapters and what effect did this have on utility and privacy? Further to this, how small are the adapters that are used? It would be useful to include some information about the overhead that is required for clients.
  2. I think some of the Appendix results with the PSLF method could be added into the main paper tables to show there is a clear utility loss when utilising a label DP approach and that there are advantages to using a more empirical method like P³EFT.
  3. What does it mean in Table 6 to train with an ε = 0 with PSLF?
  4. How many times are the experiments repeated? There is often a large amount of variance in the privacy leakage, e.g., on QNLI in Table 1 and Table 3, which can make DC and P³EFT seem quite comparable.
  5. It could help to move Algorithm 2 (or a more concise version of it) into the main text from the Appendix to help make the final P³EFT method clear.
  6. The abstract mentions that P³EFT is competitive with methods in both a multi-party and 2-party setting. As far as I can tell, the experiments in Section 4 are focused only on a 2-party setting? How does this process change or scale to a multi-party setting?
  7. Minor: Typo L485 algorothm → algorithm
Comment

It could help to move Algorithm 2 (or a more concise version of it) into the main text from the Appendix to help make the final P³EFT method clear.

We agree with the reviewer that including Algorithm 2 in the main paper would be beneficial for the coherence of the paper's presentation. We had originally omitted it due to space limitations. We have now moved Algorithm 2 along with its description from the Appendix to Section 3.4 (see lines 352-372 and 411-414).

The abstract mentions that P³EFT is competitive with methods in both a multi-party and 2-party setting. As far as I can tell, the experiments in Section 4 are focused only on a 2-party setting? How does this process change or scale to a multi-party setting?

We thank the reviewer for this important clarification question. When we mentioned "multi-party" in the abstract, we were referring to scenarios discussed in Section 3.3 where either:

  1. a client interacts with multiple servers at different stages of training through their APIs, or 
  2. there is a single server but with multiple trusted execution environments required.

While our experiments in Section 4 focus on the basic two-party setting, our method naturally extends to these setups. However, we acknowledge that we should be more careful with the terminology, and we welcome the reviewer's suggestions on how to better characterize these scenarios.

Minor: Typo L485 algorothm → algorithm

We thank the reviewer for their attention to detail. We have corrected this typo.

[1] Hu et al. (2022), Lora: Low-rank adaptation of large language models. ICLR 2022.

[2] Wan et al. (2023), PSLF: Defending against label leakage in split learning. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2023.

Comment

Dear Reviewer ESUj,

With the discussion deadline approaching, we are writing to kindly request your feedback on our rebuttal response. We have carefully addressed all concerns raised in your review, and we have further strengthened the empirical results the reviewer highlighted by conducting additional experiments with different numbers of adapter sets.

Given these improvements and clarifications, we would kindly request the reviewer to reconsider their score.

Comment

We are grateful for the reviewer's insightful feedback. We address their concerns and questions below.

Did you run any experiments varying the number of adapters and what effect did this have on utility and privacy?

We thank the reviewer for this question. We agree that understanding the effect of varying the number of adapters on utility and privacy is indeed important for better comprehension of the method's potential. To address this, we conducted additional experiments on SST2 and QNLI using DeBERTa. The results are presented in Table 8.

The results generally demonstrate that increasing the number of adapters has minimal influence on the resulting privacy and accuracy. We have also evaluated the efficacy of our setup when utilizing a single set of adapters. Despite slightly reduced stability with respect to α (the regularization weight hyperparameter), this setup proved highly competitive, which opens a promising direction for further research.

Further to this, how small are the adapters that are used? It would be useful to include some information about the overhead that is required for clients.

As specified in lines 504-505 for DeBERTa, we followed the setup from the original LoRA paper [1]. This setup used rank r = 8. The same value was used for T5 (see lines 513-514). For LLaMA, as indicated in line 521, we used adapters with rank 16.

Overall, the total number of these parameters is less than 1% of all model parameters, so the overhead for clients is negligible.
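
As a rough back-of-the-envelope illustration of this overhead (the layer count, hidden size, and number of adapted matrices below are assumed DeBERTa-v2-XXLarge-like values, not figures taken from the paper):

```python
# Each LoRA-adapted weight W (d_out x d_in) adds two low-rank factors B (d_out x r) and
# A (r x d_in), i.e. r * (d_in + d_out) extra trainable parameters.
def lora_param_count(num_layers: int, matrices_per_layer: int, d_model: int, rank: int) -> int:
    return num_layers * matrices_per_layer * rank * (d_model + d_model)

# Assumed shape: 48 layers, hidden size 1536, LoRA on two attention projections per layer,
# rank 8 as used for DeBERTa in the paper.
adapter = lora_param_count(num_layers=48, matrices_per_layer=2, d_model=1536, rank=8)
total = 1.5e9  # roughly 1.5B parameters in the full model
print(f"adapter params: {adapter:,} ({100 * adapter / total:.2f}% of the full model)")
```

Under these assumptions the adapters amount to roughly 2.4M parameters, i.e. well under 1% of the full model.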

To improve readability of the paper, we have added clarification about the DeBERTa adapter rank in line 505.

I think some of the Appendix results with the PSLF method could be added into the main paper

We agree with the reviewer that including PSLF results in the main paper would be beneficial for a clearer comparison between label DP approaches and P³EFT. We have incorporated some of these results — specifically DeBERTa and Flan-T5 on SST2 — into the main paper in Table 4.

What does it mean in Table 6 to train with an ε = 0 with PSLF?

While we acknowledge that from the formal definition of differential privacy, setting ε = 0 might not be entirely rigorous (as some definitions require the privacy budget ε to be a positive real number), it carries meaningful practical implications. Specifically, if we consider Randomized Response (equation (3) in the original PSLF paper [2]), which forms the foundation of the PSLF framework, setting ε = 0 means that for each training sample, the flipped label ỹ has an equal probability of belonging to any class.

Consequently, training with ε = 0 is equivalent to training with random labels, corresponding to the setup with the theoretical lower bound for privacy. We included these entries in the table to better illustrate the trend of results across the hyperparameter grid. We have added this explanation of the ε = 0 case in Appendix B. We welcome any further questions or suggestions from the reviewer regarding this matter.
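
To make the ε = 0 case concrete, the standard K-ary randomized response mechanism (shown here as a stand-in for equation (3) of the PSLF paper, whose exact form may differ) reduces to a uniform distribution over labels:

```latex
% Standard K-ary randomized response over K classes:
\Pr[\tilde{y} = c \mid y] =
  \begin{cases}
    e^{\varepsilon} / (e^{\varepsilon} + K - 1), & c = y, \\
    1 / (e^{\varepsilon} + K - 1),               & c \neq y,
  \end{cases}
\qquad \varepsilon = 0 \;\Rightarrow\; \Pr[\tilde{y} = c \mid y] = 1/K \ \text{for every class } c.
```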

How many times are the experiments repeated over?

As stated in the main paper (line 499 of the revised version), we repeated the training procedure with 3 random seeds.

There is often a large amount of variance in the privacy leakage, e.g., on QNLI in Table 1 and Table 3, which can make DC and P³EFT seem quite comparable.

We acknowledge that the standard deviation for QNLI in Table 1 is indeed high. This occurred because training DeBERTa on QNLI proved to be somewhat unstable, with one of the three seeds showing a significant spike in privacy leakage at one point during training. However, regarding QNLI in Table 3, we respectfully disagree with the reviewer's assessment, as the standard deviation there is relatively small.

We would also like to note that while P³EFT and DC might be comparable in some tasks, P³EFT significantly surpasses DC in many cases. For instance, when training DeBERTa on MNLI using DC, we were unable to find hyperparameters that would allow the model to learn effectively without complete (~100%) privacy leakage. Similarly, when training DeBERTa on SST2 with DC, the privacy leakage exceeded 90%, indicating that the adversary recovered almost all labels.

Furthermore, we would like to emphasize that, unlike DC, our P³EFT method provides protection beyond just activation-based attacks. One of our work's main contributions is the private_backprop algorithm, which preserves gradient privacy. To maintain bidirectional privacy when using DC, it would need to be combined with another algorithm to prevent privacy leakage from gradients --- such as our private_backprop.

Review (Rating: 3)

This paper proposes a privacy-preserving approach to safeguard label privacy during the fine-tuning of large language models (LLMs) using Parameter-Efficient Fine-Tuning (PEFT) techniques. It addresses a two-party learning scenario, where users can access forward and backward APIs to update the model. Experiments are conducted to evaluate the approach across various language models.

Strengths

The paper addresses a critical question of how to protect privacy in client-server fine-tuning settings. The author proposes a privacy-preserving approach to safeguard label privacy in vertical learning and evaluates this method across various models and NLP tasks.

Weaknesses

The paper claims that backpropagation is conditionally linear in output gradients and attempts to decompose gradients to enhance privacy. However, based on my understanding, if decomposed gradients are backpropagated through earlier layers, how can this linearity be maintained?

The proposed approach requires the user to decompose gradients into separate parts and make multiple API calls. What is the associated computation and communication overhead? Additionally, to my knowledge, I am not aware of any LLM companies that offer APIs for forward and backward passes in this way. Could the authors provide examples of such an application if available?

If the client is involved in parameter updates and can observe the training process, how is it ensured that the client cannot access private information during training?

Regarding the experimental results, I don’t see a clear definition of what "leak" represents in the tables. My assumption is that it refers to the client’s prediction accuracy. If this is correct, with prediction accuracy over 60%, it doesn’t seem that label privacy is effectively preserved.

Questions

Can you provide some definitions to describe the metrics used to evaluate the approach?

Comment

We thank the reviewer for their feedback and provide responses to their concerns hereafter.

The paper claims that backpropagation is conditionally linear in output gradients and attempts to decompose gradients to enhance privacy. However, based on my understanding, if decomposed gradients are backpropagated through earlier layers, how can this linearity be maintained?

We believe there has been a misunderstanding: the backprop API method does not update θ, it only computes the gradients with respect to θ and returns them to the client (see lines 180-181). Instead, the client updates θ, which is possible because θ consists of parameter-efficient adapters. Since the parameters θ remain the same, the computational graph on the server --- which represents the Jacobian --- also remains unchanged, which allows backpropagating through it again (see lines 256-268).

The detailed sequence of gradient computation and parameter updates is described in Algorithm 2. Specifically, line 14 executes the private_backprop algorithm (during which the backprop API method is called multiple times). After the gradients are computed (and only then), the client updates the adapter set.

If θ were updated during each backprop call, linearity would indeed be violated. For this reason, we deliberately design backprop to be stateless, i.e. it merely computes gradients with respect to adapter parameters based on given gradients with respect to activations, without changing either the adapter weights or the weights of the original model.

The proposed approach requires the user to decompose gradients into separate parts and make multiple API calls. What is the associated computation and communication overhead?

As specified in Algorithm 1, a single private_backprop call requires m API calls, where m is a hyperparameter chosen by the client. Each API call requires the same amount of computation (to compute the backward pass) and communication (for the client to send obfuscated gradients to the server, and for the server to send gradients w.r.t. the adapter weights to the client). To preserve label privacy, the value of m must be at least 2, meaning that one forward pass requires two backward passes. In this case, private_backprop introduces a computational overhead of approximately 1.5x compared to a regular training step (one forward and two backward passes instead of forward-backward).
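
To illustrate the decomposition concretely, below is a minimal sketch of this gradient-splitting idea with m = 2 shares (not the paper's exact Algorithm 1; `backprop_api` and the toy model shapes are hypothetical stand-ins for the stateless server endpoint):

```python
import torch

torch.manual_seed(0)

backbone = torch.nn.Linear(16, 16)          # frozen server-side backbone
for p in backbone.parameters():
    p.requires_grad_(False)

def backprop_api(x, theta, g_h):
    """Stateless backward endpoint: given a gradient g_h w.r.t. the activations, return the
    gradient w.r.t. the adapter weights theta without updating any parameters."""
    theta = theta.detach().requires_grad_(True)
    h = backbone(x) + x @ theta             # activations with a simple additive adapter
    (grad_theta,) = torch.autograd.grad(h, theta, grad_outputs=g_h)
    return grad_theta

x = torch.randn(4, 16)                      # inputs (already visible to the server)
theta = torch.randn(16, 16)                 # adapter weights held by the client
g_h = torch.randn(4, 16)                    # private, label-dependent gradient w.r.t. activations

# Client-side obfuscation with m = 2 shares: send g_h + z and g_h - z instead of g_h itself.
z = 1e3 * torch.randn_like(g_h)             # large noise, within the range mentioned above
grad_plus = backprop_api(x, theta, g_h + z)
grad_minus = backprop_api(x, theta, g_h - z)
recovered = 0.5 * (grad_plus + grad_minus)

# Because backprop is linear in g_h for fixed x and theta, the noise cancels (up to
# floating-point error) and the client recovers the true adapter gradient.
assert torch.allclose(recovered, backprop_api(x, theta, g_h), atol=1e-2)
```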

Additionally, to my knowledge, I am not aware of any LLM companies that offer APIs for forward and backward passes in this way. Could the authors provide examples of such an application if available?

It is correct that no company specifically offers this exact interface for forward and backward passes. However:

  1. companies do offer a forward-pass API (without the backward pass), e.g., see [1].
  2. some experimental open-source API frameworks [2] offer both forward and backward pass APIs, as described in line 154.

In summary, while we agree that companies currently typically don’t offer this exact API, we use a setup that is close to what they offer and they can, should they choose to, adjust their API to support our method efficiently — as demonstrated by the referenced libraries.

If the client is involved in parameter updates and can observe the training process, how is it ensured that the client cannot access private information during training?

We believe there has been some misunderstanding. In the setup discussed in the paper, we emphasize the privacy of labels, as indicated in lines 018-019, lines 060-061, etc. In this setup, the client is the only training participant who has access to private labels, as seen in lines 060-061, line 075, line 161, line 196, etc. Conversely, it is the server that does not have access to the labels.

If the reviewer has additional questions, we would be happy to provide clarification.

I don’t see a clear definition of what "leak" represents in the tables. My assumption is that it refers to the client’s prediction accuracy.

The notation "leak" represents a measure of privacy leakage, which is measured across 3 potential attacks that can be conducted by the server. A detailed description can be found in lines 472-485. We would also like to note that privacy leakage is not always measured in accuracy; for two main attacks --- Spectral attack and Norm attack --- following prior works, we measure classifier ROC AUC.

Additionally, to improve and facilitate the paper's readability, we will add a clear definition of what we label as "acc" and "leak" in the Tables.

Comment

If this is correct, with prediction accuracy over 60%, it doesn’t seem that label privacy is effectively preserved.

We kindly disagree with the reviewer on this matter. We would like to note that for an n-class classification task, a leak value of 1/n corresponds to a random guess. Thus, theoretically, the minimum leakage value for binary classification (as in SST-2 and QNLI) is 50%. Moreover, the pre-existing representations learned during pre-training contribute to an increase of this lower bound --- we measured this baseline in the "Without LoRAs" column. Therefore, we believe that a leakage value slightly above 60% represents a very minor degradation compared to the lower-bound baseline (for instance, T5 on QNLI shows only a minimal increase from 58.7 to 63.0).

We would also like to specifically emphasize that our method offers better empirical guarantees than the baselines. Specifically:

  1. As shown in Table 6, for PSLF we struggled to find a suitable hyperparameter ε that would maintain both reasonable performance and privacy.

  2. DC performs noticeably better than PSLF but operates less stably than our P³EFT method, as evidenced in Table 1, where DC exhibits significantly higher leakage.

Can you provide some definitions to describe the metrics used to evaluate the approach?

As stated in line 472, to evaluate utility we measured accuracy on the target task, as is standard. To assess privacy leakage, as described in lines 472-485, we measured vulnerability to 3 different attacks. The two main attacks --- Spectral Attack AUC and Norm Attack AUC --- were adopted from prior works [3] and [4], respectively.

To make the description of our experimental setup more comprehensive and detailed, we will add a separate section in the appendix with a thorough description of these attacks.
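
For illustration, a minimal sketch of the norm-based attack in the spirit of [4] is shown below (synthetic gradients and scikit-learn are assumptions made for the example; this is not the evaluation code used in the paper):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic binary task: gradients of positive-class examples tend to have larger norms,
# which is exactly the signal the norm attack exploits.
n, d = 1000, 32
labels = rng.integers(0, 2, size=n)
grads = rng.normal(size=(n, d)) * np.where(labels[:, None] == 1, 2.0, 1.0)

# The attacker (server) scores each example by the norm of its returned gradient; the
# attack AUC measures how well these scores separate the private label classes.
scores = np.linalg.norm(grads, axis=1)
print(f"Norm attack AUC: {roc_auc_score(labels, scores):.3f}")  # ~0.5 would mean no leakage
```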

[1] https://platform.openai.com/docs/guides/embeddings/

[2] https://petals.dev/

[3] Sun et al. (2022), Label leakage and protection from forward embedding in vertical federated learning. arXiv preprint arXiv:2203.01451, 2022.

[4] Li et al. (2022), Label leakage and protection in two-party split learning. In International Conference on Learning Representations, 2022.

Comment

Thanks for your response and I have no further questions.

Comment

We thank the reviewer for acknowledging our response. As we've provided comprehensive answers to each weakness, we'd like to understand which specific concerns still warrant the unchanged score. We're ready to address any remaining issues.

Comment

We thank the reviewers for their time and useful feedback.

We are grateful that the reviewers appreciated several key strengths of our work:

  1. Problem Importance. The paper addresses critical and practical privacy concerns in increasingly popular fine-tuning APIs and client-server settings. (by reviewers 88W1, ESUj, UNtp)
  2. Technical Performance. P³EFT shows strong empirical results with superior label privacy compared to existing methods while maintaining good accuracy. (by reviewers ESUj, SHmL)
  3. Method Generalizability. The private_backprop algorithm can be extended to other Split Learning and Vertical Federated Learning settings. Also, the full P³EFT algorithm can be used with any PEFT method. (by reviewer SHmL)
  4. Paper presentation. Well-written paper with effective visualizations and thorough empirical evaluation. (by reviewers ESUj and UNtp)

We have also enhanced the paper with additional experiments and incorporated the revisions suggested by the reviewers:

Major revisions:

  • Added experiments with n = {1, 3, 4} adapter sets (asked by reviewer ESUj).
  • Moved a part of the PSLF method results and Algorithm 2 to the main body of the paper (by reviewer ESUj).
  • Added theoretical analysis of the private_backprop algorithm (by reviewer SHmL).
  • Added a detailed explanation of what ε = 0 means in PSLF experiments (by reviewer ESUj).

Minor revisions:

  • Clarified several references in the paper (by reviewer SHmL).
  • Fixed formulation errors (by reviewer SHmL).
  • Added explicit mention of adapter ranks used in experiments with DeBERTa and T5 (by reviewer ESUj).
  • Corrected typos (by reviewers ESUj and SHmL).

For convenience, all new additions to the paper are highlighted in blue.

AC Meta-Review

The paper proposes a novel approach (P³EFT) to protect label privacy during fine-tuning of LLMs, showing strong empirical results and outperforming existing baselines. However, concerns remain regarding the scalability to multi-party settings, the theoretical analysis of privacy guarantees, and the feasibility of the required API in real-world scenarios.

The reviewers were ultimately not excited about the paper.

Additional Comments from the Reviewer Discussion

These were the additional comments.

Scalability to multi-party settings: One reviewer raised concerns about the scalability of the approach to multi-party settings, where multiple parties may be involved in the fine-tuning process. The authors clarified that the method can be extended to such settings, but did not provide detailed explanations or empirical evaluations. This aspect of the approach needs further investigation.  

Theoretical analysis of privacy guarantees: Another reviewer pointed out the lack of theoretical analysis of the privacy guarantees provided by the method. The authors responded by providing some theoretical analysis, but did not establish formal privacy guarantees, such as differential privacy. The theoretical foundations of the approach need to be further strengthened.  

Feasibility of API in real-world scenarios: One reviewer questioned the feasibility of implementing the required API in real-world scenarios, as no existing LLM provider offers such an API. The authors argued that the required API is feasible and could be implemented by LLM providers, but did not provide concrete examples or evidence to support this claim. This aspect of the approach needs further clarification.

Final Decision

Reject