Cape: Context-Aware Prompt Perturbation Mechanism with Differential Privacy
We propose Cape, a context-aware and bucketized prompt perturbation mechanism based on differential privacy, to enable efficient LLM inference with an improved privacy-utility trade-off.
Abstract
Reviews and Discussion
This paper introduces Cape, a DP prompt perturbation mechanism aimed at enhancing the privacy-utility trade-off in LLM-based inference services through a context-aware utility function and a bucketized sampling function. The proposal is well motivated and provides promising solutions to the long-tail dilemma in private selection over large NLP vocabularies. The paper provides comprehensive experimental validation showcasing Cape's effectiveness.
Questions for Authors
- See weakness above. I think on-device model acceleration is promising, especially for privacy concerns.
- Is the proposed mechanism applicable to other DP mechanisms, e.g., more recent private selection mechanisms like the permute-and-flip mechanism [1]?
- The calculation of mean(b_i) uses b_{i, k}; does it refer to u_{i, k}?
[1]: Permute-and-Flip: A new mechanism for differentially private selection
Claims and Evidence
Technically solid moderate-to-high impact paper
Methods and Evaluation Criteria
No major concerns with respect to evaluation, resources, reproducibility, or ethical considerations.
Theoretical Claims
No major concerns with respect to theoretical claims.
Experimental Design and Analysis
No major concerns with respect to the soundness/validity of any experimental designs or analyses.
Supplementary Material
The proposed solution inevitably incurs some utility loss in order to enhance prompt privacy with high efficiency. However, the paper does achieve a better privacy-utility trade-off than prior works.
Relation to Prior Literature
[1]: Permute-and-Flip: A new mechanism for differentially private selection
Missing Important References
[1]: Permute-and-Flip: A new mechanism for differentially private selection
Other Strengths and Weaknesses
Strengths:
1. The direction of enhancing prompt privacy in LLM inference services is promising.
2. The idea is neat and straightforward.
3. The writing is clear and easy to understand.
4. Extensive experiments, along with ablation studies, showcase the effectiveness of the proposed approach.
5. The utility improvements over prior works are promising in both text classification and text generation tasks.

Weakness: The proposed solution inevitably incurs some utility loss in order to enhance prompt privacy with high efficiency. However, the paper does achieve a better privacy-utility trade-off than prior works.
Other Comments or Suggestions
- It would be better if the authors could align the theoretical privacy budgets of different works in Table 3 for better understanding.
- I recommend that the authors move the ablation on model distillation into the main body. Besides, quantization could be discussed to further improve efficiency and memory consumption.
We thank the reviewer jdXq for the positive feedback that "the proposal is well motivated and provides promising solutions". We hereby answer the specific questions below and provide detailed explanations.
Paper polishing
Thanks for your suggestion. We acknowledge that on-device generation and acceleration are receiving increasing attention. In this regard, we conduct experiments on device-model distillation, which are deferred to Appendix B.6 due to the page limit. In the final version, we will add a detailed discussion on accelerating on-device computation using other SOTA techniques such as quantization and sparsity.
Regarding Table 3, as explained at the beginning of Section 5.2, prior work uses distinct formal privacy definitions, making alignment challenging. Furthermore, we believe that empirical privacy, as indicated by privacy attacks, better reflects actual privacy protection.
Extensibility to other DP mechanisms
Thanks for your suggestion. The permute-and-flip mechanism can be regarded as a variant of private selection algorithms, which we consider orthogonal to our work. We believe the permute-and-flip mechanism could be incorporated into Algorithm 2 as a replacement for the Exponential mechanism, and we plan to explore this as part of future work.
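For concreteness, below is a minimal, hypothetical sketch of how permute-and-flip could slot in as a drop-in replacement for the Exponential mechanism over a set of candidate tokens; the function name and score representation are illustrative and not part of our implementation.

```python
import math
import random

def permute_and_flip(scores, epsilon, sensitivity):
    """Differentially private selection via permute-and-flip (sketch).

    scores: dict mapping candidate token -> utility score.
    Returns one candidate token; the mechanism satisfies epsilon-DP for
    utility functions with the given sensitivity.
    """
    q_max = max(scores.values())
    candidates = list(scores.keys())
    random.shuffle(candidates)  # random permutation of the candidates
    for r in candidates:
        # flip a coin with probability exp(eps * (q(r) - q_max) / (2 * sensitivity))
        if random.random() < math.exp(epsilon * (scores[r] - q_max) / (2.0 * sensitivity)):
            return r
    # never reached: a maximizing candidate is accepted with probability 1
```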
Typos
Thanks for your careful reading. We have corrected the typos in the mean calculation. We will review the entire paper and check for any additional typos in the final version.
The paper introduces Cape, a DP mechanism designed to protect user privacy when interacting with LLM inference services. The authors identify an issue in current LLM services: users need to submit their prompts in plaintext for inference, exposing sensitive information. The paper proposes a context-aware approach that perturbs user prompts before sending them to the server while maintaining reasonable utility, including: a hybrid utility function that combines both token embedding distance and contextual information to better measure semantic similarity, and a bucketized sampling mechanism to handle the large vocabulary space in NLP tasks. The authors evaluate Cape on both text classification and text generation tasks, demonstrating improved privacy-utility trade-offs compared to prior methods like SANTEXT, CUSTEXT, and InferDPT.
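For concreteness, the hybrid utility described above could be sketched roughly as follows; the variable names and the mixing weight `lam` are illustrative guesses rather than the paper's exact formulation.

```python
import numpy as np

def hybrid_utility(token_id, embeddings, context_logits, lam=0.5):
    """Illustrative hybrid utility over the vocabulary (not the paper's exact formula).

    embeddings:     [vocab_size, dim] token embedding matrix
    context_logits: [vocab_size] logits from a local (device) model at the current position
    lam:            assumed mixing weight between the two signals
    """
    x = embeddings[token_id]
    # embedding-distance term: closer candidates get higher utility
    emb_score = -np.linalg.norm(embeddings - x, axis=1)
    # standardize both terms so neither signal dominates the mixture
    emb_score = (emb_score - emb_score.mean()) / (emb_score.std() + 1e-8)
    ctx_score = (context_logits - context_logits.mean()) / (context_logits.std() + 1e-8)
    return lam * emb_score + (1.0 - lam) * ctx_score
```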
Questions for Authors
On line 335, the authors state: "When receiving the noisy generation ŷ from S, C uses the extraction model to de-noise the response as y′ ← M_e(x, ŷ)." Could you clarify the type of server model in this setup?
The extraction model, a 1.5B model, appears to be strong enough to handle continuation tasks on the Wikitext103-v1 dataset. Have you conducted control experiments measuring the performance of text generation using unperturbed prompts directly with the extraction model? This would help distinguish whether the reported performance stems primarily from the extraction model's capabilities rather than the privacy-preserving pipeline.
Have you considered designing experiments where the task completion explicitly requires the server's involvement, such as scenarios where the server possesses privileged information unavailable to the client? This would better demonstrate the necessity and effectiveness of your privacy-preserving approach in real-world applications.
The text samples in the appendix exhibit limited length and grammatical issues that raise concerns about practical usability. Could you address the scalability of your method in real-world deployment scenarios? Specifically, what additional techniques might improve the quality of sanitized outputs while maintaining privacy guarantees?
Claims and Evidence
The authors claim that Cape achieves a better privacy-utility trade-off than existing approaches, which is supported in the experiments section through the adversarial attacks. The empirical results demonstrate that Cape provides stronger defense against privacy attacks (KNN and MTI attacks) while maintaining competitive utility compared to baselines. The ablation studies validate design choices like the hybrid utility function and bucketing strategy.
Methods and Evaluation Criteria
The authors evaluated both utility metrics (accuracy for classification, coherence and alignment for generation) and privacy metrics (attack success rates, effective mapping set size, retention ratio). The authors use multiple datasets (SST-2, QNLI, Wikitext-103) to validate their approach across different task types.
While the paper uses KNN Attack and MTI Attack as evaluation metrics for privacy leakage, these methods may not capture all potential forms of information leakage in the perturbation process. For example, there's a possibility of semantic leakage that goes beyond token-level. The composition of multiple perturbed words in context could still reveal meaningful patterns that might enable adversaries to infer sensitive information, even when individual tokens appear adequately protected. A more comprehensive evaluation framework that considers semantic composition effects and contextual meaning reconstruction would improve the privacy analysis.
Furthermore, it might be interesting to use additional utility metrics, such as LLM-as-a-judge, or to compare the perturbed response against the ground-truth response.
Theoretical Claims
I did not check the correctness of Theorem 4.1 whose proof is in the appendix. At a glance, the proof looks sound to me.
Experimental Design and Analysis
I reviewed the experiments and have questions regarding section 5.1.2, which I mention in the questions for authors section.
Supplementary Material
I reviewed the prompts used and the perturbed results. The perturbed results appear quite short and potentially unusable in practice, despite the improved method. I attribute this limitation to the fundamental constraints of the problem setup.
Relation to Prior Literature
The paper contributes to privacy-preserving LLM inference techniques. The authors reviewed cryptographic solutions, client-server hybrid execution approaches, and previous DP-based techniques (SANTEXT, CUSTEXT, InferDPT).
Missing Important References
There is additional literature that offers less formal privacy protection but potentially better utility for minimizing disclosure risks:
Dou, Yao, Isadora Krsek, Tarek Naous, Anubha Kabra, Sauvik Das, Alan Ritter, and Wei Xu. “Reducing Privacy Risks in Online Self-Disclosures with Language Models.” arXiv, June 23, 2024. http://arxiv.org/abs/2311.09538.
Staab, Robin, Mark Vero, Mislav Balunović, and Martin Vechev. “Large Language Models Are Advanced Anonymizers.” arXiv, February 21, 2024. http://arxiv.org/abs/2402.13846.
Other Strengths and Weaknesses
Strengths
The paper addresses an important area of privacy protection for LLMs. The authors improve DP token perturbation algorithms through two novel improvements: the hybrid utility function and the bucketized sampling mechanism.
Weaknesses
While the authors bridge part of the gap, the output text remains difficult to interpret and use in practice.
Other Comments or Suggestions
On line 335, the authors mention: "When receiving the noisy generation ŷ from S, C uses the extraction model to de-noise the response as y′ ← M_e(x, ŷ)." Does this step introduce potential privacy leakage? Since the extraction model has knowledge of prompt structures such as syntax or grammar, could this denoising step help adversaries recover the original prompt? It would be valuable to know if the authors performed privacy analyses (such as KNN attacks) against the extracted text, not just the perturbed prompts.
It seems that the text in the appendix, even after extraction, is still hard to use given its short length and often ungrammatical nature. Could you please comment on the scalability of your method in an actual deployment of a sanitization framework?
We thank the reviewer pKo2 for the positive feedback on "achieves a better privacy-utility trade-off than existing approaches". We hereby answer the specific questions below.
A more comprehensive evaluation for semantic privacy leakage and utility.
Thanks for your suggestions. For privacy evaluation, we follow prior works and try our best to measure empirical privacy comprehensively, including attack success rates, effective mapping set size, and retention ratio. In this work, we primarily focus on single-token-level privacy leakage, such as names, ages, etc. We clarify that more coarse-grained, semantic-level leakage such as intent recognition (involving multi-token compositions) is out of our current scope. However, it is an interesting topic that could be explored in future work.
For utility evaluation, we use commonly used metrics like accuracy and alignment for a fair comparison, which yields better interpretability. We believe it would be interesting to use LLM-as-a-judge in more complex tasks like multi-candidate ranking.
Additional literature references
These two works use LLMs to detect and anonymize sensitive attributes in an innovative way, which we perceive as complementary to our approach for improving utility. We will reference these papers and discuss their potential incorporation in Related Work.
Privacy concern on extraction model
We clarify that there is no potential privacy leakage from using the extraction model. In our setting, the extraction model is deployed locally on the client side. Therefore, the adversary only has access to the perturbed prompt and the noisy generation, and cannot launch attacks on the extraction model without additional knowledge.
Clarification on perturbed text generation pipeline
We follow InferDPT [1] and adopt the perturbation-then-extraction text generation pipeline, with Qwen2-1.5B-Instruct serving as both the server model and the extraction model. We clarify that the choice of models is orthogonal to our work. As illustrated in Figure 6, with the models fixed and only the perturbation mechanism varied, our method yields a better trade-off than the baselines.
Besides, the effectiveness of this privacy-preserving pipeline is evidenced by Table VII in the InferDPT paper [1]. The perturbation-then-extraction scheme (with the Vicuna-7b-4bit model as the extraction model) yields better coherence scores than generation with Vicuna-7b-4bit alone.
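As a high-level illustration, the client-side flow of this pipeline can be sketched as follows; the function names are placeholders rather than our actual API.

```python
def private_inference(prompt_tokens, perturb, server_generate, extract):
    """Schematic perturbation-then-extraction flow (client side).

    perturb:         local DP mechanism mapping each token to a replacement
    server_generate: black-box LLM service; it only ever sees perturbed text
    extract:         local extraction model M_e that de-noises the response
                     conditioned on the original prompt x
    """
    perturbed = [perturb(t) for t in prompt_tokens]  # local perturbation
    noisy_y = server_generate(perturbed)             # server sees the perturbed prompt only
    return extract(prompt_tokens, noisy_y)           # y' <- M_e(x, y_hat), runs locally
```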
Scenario where the server possesses privileged information.
Thanks for your suggestion. The scenario you mentioned resembles RAG-based applications, where the server possesses an external knowledge database that helps generate better responses. We will include this scenario in the Introduction to better highlight the necessity of invoking privacy-preserving ML services.
Practical scalability and usability
We first clarify that the two sample texts in the Appendix are taken from the validation split of the SST-2 dataset, providing real-world examples. Our method is also applicable to long-text scenarios, as demonstrated by the text generation experiments on the Wikitext-103-v1 dataset. As shown in Figure 6, similar trade-offs are observed in long-text generation, not just in short-text classification.
Due to the word limit, we present below a sample text from the Wikitext-103-v1 dataset. Some tokens (marked in bold), such as numerical values for length and weight, and words like "homarus" -> "hortus", "large" -> "small", "weighting" -> "measuring", are perturbed. However, the overall semantics and grammatical structure are well preserved. More fine-grained, taxonomy-level perturbation of sensitive data should greatly improve real-world usability, which is complementary to our work.
- : hortus gammarus is a small crustacean, with aak length up to 95 centimetres ( 30 in ) and measuring up to 5 – 6 kilograms ( 11 - 15 lb ), although the lobsters caught in lobster pots are usually 23 – 35 cm ( 9 – 15 in ) long and weigh 0 @.@ 7 – 2 @.@ 2 kg ( 1 @.@ 5 – 4 @.@ 9 lb ) .
In terms of usability, we acknowledge the inevitable trade-off between privacy and utility in DP-based methods: privacy comes at a cost. However, our approach manages to achieve a better balance than previous work. In real-world applications, we believe that an application-specific or user-specific taxonomy of sensitive data can guide the development of more refined privacy mechanisms. For instance, we could use separate sampling spaces for locations, names, etc., to constrain the sampling space while preserving semantics. Additionally, the perturbation-aware fine-tuning in SANTEXT [2] could further enhance utility. Although the perturbed prompts may not be human-readable, they can be effectively understood by the fine-tuned model (e.g., the extraction model).
[1]: InferDPT: Privacy-preserving Inference for Black-box Large Language Models.
[2]: Differential Privacy for Text Analytics via Natural Text Sanitization.
The paper proposes a new approach that perturbs the tokens in user prompts to preserve local differential privacy in an efficient and utility-friendly way. To this end, the authors utilize a secondary model, called the device model, to generate logits for each token and use the exponential mechanism (weighted by these logits) to randomize the token. The experimental results suggest better privacy-utility trade-offs compared to prior work in this area.
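For readers unfamiliar with this setup, a minimal sketch of such exponential-mechanism token sampling is given below; the names and the source of the utility scores are illustrative, not the paper's exact algorithm.

```python
import numpy as np

def perturb_token(utility, epsilon, sensitivity, rng=None):
    """Sample a replacement token with the Exponential mechanism (sketch).

    utility: [vocab_size] utility scores for candidate replacements,
             e.g. derived from device-model logits at the current position.
    Higher-utility candidates are exponentially more likely to be drawn;
    the draw satisfies epsilon-DP w.r.t. the utility's sensitivity.
    """
    rng = rng or np.random.default_rng()
    logits = epsilon * np.asarray(utility, dtype=float) / (2.0 * sensitivity)
    logits -= logits.max()          # subtract max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```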
Questions for Authors
Please see above.
Claims and Evidence
- In the introduction and related work sections, the paper talks about "efficiency", claiming that the proposed approach is more efficient than cryptographic solutions. But it is not clear what kind of efficiency this refers to or why the approach is more efficient, at least at a high level. I suggest adding a brief discussion on why cryptographic solutions are less efficient.
- The paper proposes the new approach as a black-box technique, but it would be better to clarify that it needs another model, called the device model, to generate the logits. It is also not clear from the experiments how the results would be affected when there is a vocabulary or tokenizer mismatch between the black-box model and the device model.
Methods and Evaluation Criteria
- What does the privacy axis in Figure 5 refer to? How was it measured and what unit is this?
Theoretical Claims
- In the definition of the exponential mechanism, the sensitivity appears to be stated incorrectly and should be corrected.
Experimental Design and Analysis
The experimental setup seems reasonable. It would be nice to try a few more pairs of pre-trained models and device models to understand the impact of vocabulary and tokenizer mismatches better.
Supplementary Material
I checked the supplementary to find out more about what "privacy" refers to in Figure 5 but couldn't find a clear explanation.
Relation to Prior Literature
The proposed approach would be useful to improve privacy preserving inference on LLMs.
Missing Important References
I can't think of a key missing reference.
Other Strengths and Weaknesses
- The clarity of Algorithm 1 could be improved. For instance, the operation on line 6 was not defined.
Minor:
- "where exists a trade-off" in Related Work
Other Comments or Suggestions
An example on how the mechanism would work step by step on a toy setup would improve the clarity significantly.
We thank the reviewer Mrs5 for the positive feedback on "proposed approach would be useful to improve privacy preserving inference on LLMs", which is quite encouraging for us. We hereby answer the specific questions below and provide detailed explanations.
Q1: Clarification on the low efficiency of cryptographic solutions.
We measure efficiency by the average inference runtime. For cryptographic solutions, taking the SOTA three-party computation work Ditto [1] as an example, generating one token with the Bert-base model requires about 30 seconds on average, which is far more expensive than our method. As shown in Appendix B.5, DP-based methods typically incur less than 1 second of perturbation overhead after an initial setup. We will add the concrete runtime numbers to the Introduction for better clarity.
[1]: Ditto: Quantization-aware secure inference of transformers upon MPC. https://arxiv.org/pdf/2405.05525
Q2: Effect of vocabulary or tokenizer mismatch.
We first clarify that by "black-box," we mean that no knowledge of or modification to the server model is required in our work. We will add this description in Section 4.1 for better clarity.
We list the vocabulary configuration in Appendix A. Notably, the device models (i.e., Bert and GPT-2) actually use different vocabularies compared to the server model (i.e., Qwen2-1.5B-Instruct, with a vocabulary size of 151,936). As shown in Table 3, while Bert (with a vocabulary size of 30,522) exhibits a larger vocabulary mismatch compared to GPT-2 (with a vocabulary size of 50,257), it still achieves higher sentence similarity.
The possible reason could be that the perturbation using the device model produces semantically close natural-language tokens. Consequently, although the tokenization granularity differs, the composition of sub-tokens still yields similar embeddings in the black-box model. For example, if the device model perturbs 'unhappy' to 'un pleasant', although separated by a space, its semantics can still be recognized by the black-box model. However, we believe that partial knowledge of the server model, such as its vocabulary, could further enhance semantic similarity, making it an interesting topic for future work.
Q3: What does the privacy axis in Figure 5 refer to? How was it measured and what unit is this?
Apologies for the confusion. "Privacy" here refers to the privacy score, which is defined at the beginning of Section 5 (at the bottom of Page 5). We will unify the notation and change the axis label to "Privacy Score" for better clarity. We first calculate the attack success rate using the KNN Attack and the Masked Token Inference Attack; the privacy score is then defined in terms of this attack success rate.
Q4: Clarity of Algorithm 1 could be improved.
Thanks for your suggestion. We will provide a more detailed description in the final version. In Algorithm 1, b_i denotes the set of tokens that map to the i-th bucket. Correspondingly, u_{i,k} denotes the scores of these tokens. Then, we can calculate the mean score mean(b_i) for this bucket.
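A small sketch of the intended computation is given below; this is illustrative code rather than the exact Algorithm 1.

```python
import numpy as np

def bucket_means(scores, bucket_of):
    """Mean utility score per bucket.

    scores:    [vocab_size] per-token scores u_{i,k}
    bucket_of: [vocab_size] bucket index assigned to each token
    Returns {i: mean(b_i)}, where b_i is the set of tokens mapping to bucket i.
    """
    means = {}
    for i in np.unique(bucket_of):
        members = scores[bucket_of == i]  # scores of the tokens in bucket b_i
        means[int(i)] = float(members.mean())
    return means
```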
Q5: Typos
Thanks for your careful reading. We have corrected the typos in Related Work and Preliminaries. We will review the entire paper and check for any additional typos in the final version.
Q6: Toy example of the overall mechanism
Thanks for your recommendation. We will supplement a section in the Appendix to give a step-by-step example to ease understanding.
Thank you for the response. I have increased my rating to 4.
Dear Reviewer Mrs5,
We'd like to thank you once again for the thoughtful feedback. We are glad to hear that our rebuttal has resolved your concerns. Best regards.
This paper proposes an algorithm to achieve (local) differential privacy at inference time by using an exponential mechanism weighted by a reference model to randomize the tokens in the user prompt. The proposed algorithm is simple and effective and achieves a good privacy-utility trade-off. The problem of protecting user privacy in LLM inference is important, and the reviewers found this paper well written and easy to follow, and the results convincing, with extensive experimental support.