CodeCipher: Learning To Obfuscate Source Code Against LLMs
A learning-based code obfuscation method to protect code privacy from LLM service providers
Abstract
Reviews and Discussion
This paper introduces CodeCipher, a method designed to transform input code sequences into an obfuscated format to mitigate the risk of data leakage when using cloud-based large language models (LLMs). Specifically, the approach modifies the embedding matrix of the LLM, ensuring that each row corresponds to a different word for the purpose of obfuscation. To optimize this transformation, a discrete gradient search algorithm is employed, leveraging the vector representation of the nearest word. Extensive experiments, including performance preservation, privacy protection, ablation studies, and transferability assessments, validate the effectiveness of the proposed technique.
Strengths
- The research problem is interesting. With the wide use of LLMs, security has become an essential concern.
- The evaluation covers multiple aspects, which is good.
Weaknesses
- The performance of the obfuscated models shows an evident decrease compared with the original models across the three tasks in Section 5.2.
- Some important technical details are missing, especially in the discrete gradient search algorithm. Why is a single gradient computation inadequate? What is the motivation behind using the nearest valid token, and how is the nearest one obtained? The motivation for the proposed algorithm could be strengthened.
- The prompt used is casual and does not have a rigorous design.
- The analysis of the transferability is weak. The ability to generalize is an interesting and important finding. However, more analysis is needed to explain why it works across closed models.
Questions
- In Section 5.2, why are only 200 samples chosen from the 2 million samples in CodeSearchNet, and how is the compilation rate obtained from code snippets?
- Will performance be affected by the number of training epochs? How are the best hyper-parameters obtained?
We appreciate your valuable comments.
The performance of the obfuscated models shows an evident decrease compared with the original models across the three tasks in Section 5.2.
As an obfuscation method, some performance reduction is expected. The goal of CodeCipher, however, is to perturb code to prevent reading, execution, and data collection for training, all while minimizing this performance drop. Our experimental results indicate that CodeCipher achieves a significantly smaller performance reduction compared to conventional obfuscation techniques.
Some important technical details are missing, especially in the discrete gradient search algorithm. Why is a single gradient computation inadequate? What is the motivation behind using the nearest valid token, and how is the nearest one obtained? The motivation for the proposed algorithm could be strengthened.
The motivation behind the discrete gradient search algorithm is detailed in Section 4.1, paragraph 2. In response to your questions: A single gradient computation is inadequate because it is difficult to pinpoint a valid token. Instead, the multi-step gradient search dynamically adjusts the gradient direction based on intermediate token projections, allowing for more accurate token replacements. Experiments in Section 5.4 demonstrate the effectiveness of this approach.
We select the nearest valid token because it represents an actual word embedding rather than an imaginary vector, enabling more precise gradient calculations. The nearest valid token is identified by calculating the dot product with the normalized embeddings of existing words. We have also strengthened the motivation behind this algorithm in the revised version (Section 4.1).
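To make the projection step concrete, below is a minimal PyTorch sketch of one sub-step of the multi-step discrete gradient search, assuming access to the model's input embedding table; the function and variable names are illustrative, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def discrete_gradient_substep(z, embed_matrix, grad, lr=0.1):
    """One sub-step of a multi-step discrete gradient search (sketch).

    z            : current embedding vector for one token, shape (d,)
    embed_matrix : the LLM's input embedding table, shape (V, d)
    grad         : gradient of the task loss w.r.t. z, shape (d,)
    """
    # Continuous gradient update on the embedding vector.
    z = z - lr * grad

    # Project onto the nearest valid token via the dot product with the
    # normalized word embeddings, as described in the response above.
    scores = F.normalize(embed_matrix, dim=-1) @ z  # (V,)
    token_id = int(scores.argmax())

    # Snap back to a real word embedding so the next gradient is
    # computed at an actual token rather than an imaginary vector.
    return embed_matrix[token_id], token_id
```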
The prompt used is casual and does not have a rigorous design.
We use a straightforward prompt style to reflect typical prompting methods employed by general users. This simple approach aligns with widely adopted practices in code-related tasks, such as the prompts used by DeepSeekCoder for code completion: https://github.com/deepseek-ai/DeepSeek-Coder/blob/main/Evaluation/HumanEval/eval_instruct.py.
The analysis of the transferability is weak. The ability to generalize is an interesting and important finding. However, more analysis is needed to explain why it works across closed models.
We agree that this is an interesting and significant finding. The observed generalizability may be due to the key patterns captured in the token cipher mapping, which are likely regarded as essential across different models. However, further experiments are necessary to validate this hypothesis, and we plan to explore this in future work.
In Section 5.2, why are only 200 samples chosen from the 2 million samples in CodeSearchNet, and how is the compilation rate obtained from code snippets?
We believe you are referring to the experiment in Section 5.3. Testing the entire dataset would be computationally expensive and time-consuming, so we chose a representative subset of 200 samples. Through several rounds of experimentation, we found that this subset size yields stable results. We consider this subset sufficient to support our key findings. The compilation rate is computed using the built-in compile() function in Python.
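For illustration, here is a minimal sketch of how such a compilation rate can be computed with Python's built-in compile(); the authors' exact evaluation harness may differ.

```python
def compilation_rate(snippets):
    """Fraction of code snippets that Python's compile() accepts.

    A snippet counts as compilable if compile() raises no error;
    compilation here checks syntax only, it does not execute code.
    """
    ok = 0
    for src in snippets:
        try:
            compile(src, "<snippet>", "exec")
            ok += 1
        except (SyntaxError, ValueError):
            pass
    return ok / len(snippets)

# Example: well-obfuscated code is expected to score near 0.
print(compilation_rate(["def f(x):\n    return x + 1", "qz |rk ((="]))
```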
Will performance be affected by the number of training epochs? How are the best hyper-parameters obtained?
Yes, the number of training epochs impacts the performance, as a higher number of epochs typically results in a greater obfuscation degree, which may lead to a performance decline. Our current parameters are chosen to balance obfuscation level with minimal performance impact. Following your comments, we conducted additional experiments to evaluate the effects of different learning rates and other hyperparameters. The new results are included in the revision.
Thanks for your responses. I chose to keep my score unchanged.
This paper proposes CodeCipher, which perturbs private information in code while preserving the original response from LLMs. CodeCipher transforms the LLM's embedding matrix so that each row corresponds to a different word in the original matrix, forming a token-to-token confusion mapping for obfuscating source code. The new embedding matrix is optimized by minimizing a task-specific loss function. CodeCipher is evaluated on three AI-assisted coding tasks: code completion, summarization, and translation. Results show that CodeCipher successfully confuses the source code while preserving the original LLM's performance.
Strengths
This paper addresses the important research problem of preserving code privacy when submitting code to LLMs. It introduces an interesting approach to achieve this by transforming the LLM’s embedding matrix into a new one, creating a token-to-token confusion mapping that obfuscates the source code.
The effectiveness of the proposed approach is demonstrated through three code analysis tasks.
Weaknesses
The feasibility of CodeCipher depends on access to the LLM's embeddings, which raises some concerns. First, if the LLM is closed-source, are the embeddings accessible? Could the authors clarify this? Second, if the LLM is open-source, I question whether the proposed approach is necessary. With access to trained LLMs for local execution, code privacy may no longer be a significant issue.
In addition, adaptive attacks are not discussed in this paper. If an attacker understands how the approach works, could they potentially generate the confusion mapping themselves and deobfuscate the obfuscated code?
Questions
- Discuss how to obtain LLM's embeddings if LLMs are closed-source.
- Clarify the motivation of CodeCipher when LLMs are open-source.
- Discuss adaptive attacks.
Thank you for your insightful comments.
How to obtain LLM's embeddings if LLMs are closed-source?
Our method is primarily built on white-box models. In cases where the LLMs are closed-source, we take the trainable white-box model as a proxy, then apply the learned cipher mapping to the closed-source model. As demonstrated in Section 5.6, code obfuscated by a local white-box model transfers effectively to a remote closed-source model.
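For concreteness, here is a minimal sketch of how a learned token-to-token cipher mapping could be applied locally before sending code to a remote closed-source model; the tokenizer checkpoint and the `cipher` dictionary are hypothetical placeholders, not released artifacts.

```python
from transformers import AutoTokenizer

# `cipher` maps source token ids to obfuscated token ids; it is assumed
# to have been learned offline on the local white-box proxy model.
def obfuscate(code: str, cipher: dict, tokenizer) -> str:
    ids = tokenizer.encode(code, add_special_tokens=False)
    obf_ids = [cipher.get(i, i) for i in ids]  # identity for unmapped tokens
    return tokenizer.decode(obf_ids)

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base")
cipher = {}  # placeholder: the learned mapping would be loaded here
obfuscated = obfuscate("def add(a, b):\n    return a + b", cipher, tokenizer)
# `obfuscated` is what gets sent to the remote (closed-source) LLM.
```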
If the LLM is open-source, I question whether the proposed approach is necessary. With access to trained LLMs for local execution, code privacy may no longer be a significant issue.
While large white-box LLMs are readily available for local deployment and execution, limited computational resources may prevent individual users from deploying them locally, necessitating a client-server architecture. We have added this point in the revised manuscript in the first paragraph of Section 1.
Adaptive attacks are not discussed in this paper. If an attacker understands how the approach works, could they potentially generate the confusion mapping themselves and deobfuscate the obfuscated code?
Though our method is built on white-box models, the trained transformation mapping is inherently black-box. Since we do not disclose the model parameters, even if an attacker understands how the approach works, they would be unable to replicate the obfuscation process or perform de-obfuscation. In the revision, we have added the discussion of this point in Appendix E.
Summary:
This paper studies how to obfuscate source code for LLMs. The proposed approach transforms the LLM's embedding matrix so that each row corresponds to a different word in the original matrix, forming a token-to-token confusion mapping for obfuscating source code. The paper introduces a discrete optimization strategy that aligns the updated vector to the nearest valid token in the vocabulary before each gradient update.
Strengths
- The paper proposes a novel approach to obfuscating code.
- The proposed approach preserves performance better than other code obfuscation methods.
Weaknesses
- Missing important metrics and SOTA baselines for privacy protection
- Missing extensive experiments on the discrete gradient search
Questions
- As line 100 describes, the applications of code obfuscation are twofold: one is privacy protection, and the other is improving the robustness of code language models. Since the method is not aimed at improving the robustness of code language models, why not use privacy protection metrics (e.g., TopK, a token-level metric that measures the percentage of correct words among the attacker's top-k predictions)? You use the obfuscation degree to measure performance. How can you prove the effectiveness of privacy protection?
- In line 272, you compared with an identifier renaming approach (Chakraborty et al., 2022). That work shows that naturalizing source code can enhance generation performance on three tasks. Why do you use this approach, which is unrelated to code obfuscation? And why does its performance decrease in your experiments? TextObfuscator is a SOTA approach for preserving inference privacy. Why don't you compare with it?
- In Figure 5(b), the obfuscated code still contains tokens like "password" that appear in the original code. How can you ensure the user's password "securePassword123" will not show up in the obfuscated code?
Minor comments:
- In Table 1, the "Origin" method is not defined.
- In Figure 6, references for the other LLMs (e.g., DeepSeekCoder) are missing.
We appreciate the reviewer's insightful comments. Below we provide point-by-point responses to each of the weaknesses you raised:
Why not use privacy protection metrics (i.e. TopK)?
- The primary goal of CodeCipher is to obfuscate source code to prevent it from being executed, read, or collected for training. Therefore, we employ commonly accepted privacy protection metrics within the code domain: 'Compilation Rate' (to measure the usability of the obfuscated code), 'Edit Distance' (to quantify the readability after obfuscation), and 'PPL' (to quantify the loss when the obfuscated code is collected for training); a sketch of the edit-distance computation is given after this list. In Section 5.3, we conducted extensive experiments on the efficacy of privacy protection of our method.
- The top-k accuracy you suggest is not applicable in our problem context because it requires a list of predictions, whereas our output consists of code segments or text. It is also crucial to note that in our approach, the obfuscated code is the input to the server model rather than part of the server's output, so we have no 'ground-truth' prediction against which to compute the Top-K metric.
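As referenced above, here is a minimal character-level Levenshtein sketch illustrating one way the edit-distance metric between original and obfuscated code can be computed; the paper's exact granularity (token- vs. character-level) may differ.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via a single-row dynamic program."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))  # distances against the empty prefix of `a`
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                      # deletion
                        dp[j - 1] + 1,                  # insertion
                        prev + (a[i - 1] != b[j - 1]))  # substitution
            prev = cur
    return dp[n]

# A larger distance between original and obfuscated code
# indicates lower readability of the obfuscated version.
print(edit_distance("return a + b", "xq##qz m ! y"))
```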
Why do you compare with identifier renaming (Chakraborty et al., 2022), which is unrelated to code obfuscation? And why does its performance decrease in your experiments?
We apologize for incorrectly citing Chakraborty et al.'s work as an "identifier renaming" approach. Identifier renaming is a traditional and widely used method for source code obfuscation; in the revised version, we have updated the citation to more relevant studies.
TextObfuscator is a SOTA approach for preserving inference privacy. Why don't you compare with it?
TextObfuscator is a relevant technique developed primarily for natural language, and we attempted to adapt it for source code to compare with CodeCipher. However, we encountered several challenges that made this comparison infeasible. First, TextObfuscator outputs vector representations rather than discrete tokens, making it difficult to accurately measure the degree of obfuscation and potentially leading to unreliable comparison results. Second, TextObfuscator depends on task-specific loss functions, which are not directly applicable to our code generation task and would require substantial modifications to model architecture and training processes. As a result, we have not included this comparison in the current version and instead leave the adaptation and evaluation of TextObfuscator for future work.
In Figure 5(b), the obfuscated code still contains tokens like "password" that appear in the original code. How can you ensure the user's password "securePassword123" will not show up in the obfuscated code?
The current approach primarily focuses on perturbing code tokens that reduce readability and executability, but it does not yet guarantee the removal of all sensitive tokens in all cases. Ensuring complete obfuscation of specific sensitive data, such as "securePassword123," is a promising direction for further optimization and refinement in future work.
However, we believe these weaknesses may not be enough to obscure the strengths of our paper. Since the embedding perturbation methodology is very different from previous papers, it offers a lightweight, practical solution that involves only minimal data perturbation on the client side. It also raises many interesting problems to be solved in future work. We believe our method could gradually become a new branch of code privacy protection.
Missing extensive experiments on the discrete gradient search
Thanks for your suggestion. We conducted experiments on the discrete gradient search in Section 5.4, wherein we ablate the discrete gradient search component and test its effect on the final performance. We also experimented with varying the learning rates of the discrete gradient search. The results are provided in Table 3.
Following your advice, we have conducted more extensive experiments on the discrete gradient search part. We vary the number of sub-steps as well as the learning rates. The results are updated in Appendix C in the new revision. The results are consistent with previous observations, demonstrating the efficacy of the discrete gradient search.
I keep my original scores unchanged and provide further evaluation of the paper’s current state.
The authors claim that CodeCipher achieves a compilation rate of 0. This is a critical concern because this indicates that the obfuscated code is completely unusable and cannot be compiled. A core goal of code obfuscation is to ensure the availability of code in the runtime environment while implementing privacy protection. If the compilation rate is 0, it means that the method is difficult to meet basic requirements in practical applications, which greatly weakens the actual utility and influence of the method.
Additionally, while the authors emphasize the privacy protection capabilities of CodeCipher, the lack of privacy protection metrics (e.g., under adaptive attacks) and the absence of comparisons with relevant baselines (e.g., cryptography-based approaches) limit the ability to fully evaluate the effectiveness of the proposed method.
The authors claim that CodeCipher achieves a compilation rate of 0. This is a critical concern because this indicates that the obfuscated code is completely unusable and cannot be compiled. A core goal of code obfuscation is to ensure the availability of code in the runtime environment while implementing privacy protection. If the compilation rate is 0, it means that the method is difficult to meet basic requirements in practical applications, which greatly weakens the actual utility and influence of the method.
We apologize for the misunderstanding caused by our writing. As highlighted in the paper's title, CodeCipher obfuscates code against LLMs, rather than against traditional platforms or runtime environments. The obfuscated code is designed for use in LLM-driven AI tasks such as code summarization, translation, and completion. In this context, a low compilation rate is not a limitation but a desirable feature: it prevents the code from being executed, read, or collected while still being interpretable by LLMs for AI coding tasks. We hope this explanation addresses your concern and clarifies the intended utility of CodeCipher.
The absence of comparisons with relevant baselines (e.g., cryptography-based approaches)
While CodeCipher falls within the broad area of security, it is fundamentally different in technical scope from cryptography-based approaches, making direct comparisons challenging.
As clarified in our response to Reviewer Co3V, cryptography-based methods, such as homomorphic encryption, often require significant modifications to both client- and server-side architectures and training methodologies. For example, implementing homomorphic encryption necessitates substantial changes to the self-attention mechanism, which lies outside the scope of this work. In contrast, CodeCipher provides a practical and lightweight solution by focusing solely on minimal data transformations on the client side. It does not require any modifications to server-side models or architectures, nor does it interfere with the underlying AI tasks. We believe this simplicity makes CodeCipher more accessible and easier to deploy in various application scenarios.
The lack of privacy protection metrics (e.g., under adaptive attacks)
CodeCipher is designed as a lightweight code obfuscation method and does not fall within a traditional "attack-and-defense" framework. In our problem setting, there is no explicit "attacker," making adaptive attack metrics inapplicable to our approach. The primary focus of this work is on obfuscating code to protect it from unauthorized use by LLMs while ensuring its utility for AI-driven tasks such as summarization and translation. While privacy protection is inherently part of the motivation for obfuscation, the paper does not aim to address adversarial attacks or defenses on models directly.
We acknowledge the importance of exploring the robustness of obfuscation methods against adaptive attacks. We plan to investigate this aspect further to broaden the applicability of CodeCipher in more adversarial contexts.
We appreciate Reviewer 7kDk for your constructive feedback. We believe there was a significant misunderstanding about the research scope and objective, which has been thoroughly clarified in our previous response. As the discussion period ends soon, we would like to ask Reviewer 7kDk if our response has clarified your concerns. If so, would you kindly increase your score?
The paper introduces CodeCipher, a new learning-based code obfuscation technique tailored for LLMs.
CodeCipher safeguards code from unauthorized training, reading, compiling, and execution, without sacrificing LLM service quality.
The main idea behind CodeCipher is to transform the LLM’s embedding matrix so that each row corresponds to a different word in the original matrix. This process creates a token-to-token confusion mapping, which the system uses to obfuscate tokens when encountering new code snippets.
Strengths
The idea of using a token-to-token confusion mapping is interesting.
The efficacy of the proposed approach was assessed across three AI-assisted coding tasks: code completion, code summarization, and code translation.
Results revealed that the proposed method surpassed a range of conventional obfuscation methods in terms of both the level of confusion and the preservation of performance for downstream tasks.
Weaknesses
Only rule-based baselines are compared.
More related techniques need to be mentioned and compared, such as:
Cryptography-based Approaches (e.g., Homomorphic Encryption, Multi-Party Computation, Functional Secret Sharing, Differential Privacy in Inference):
W.-j. Lu, Z. Huang, Z. Gu, J. Li, J. Liu, K. Ren, C. Hong, T. Wei, and W. Chen, “Bumblebee: Secure two-party inference framework for large transformers,” Cryptology ePrint Archive, 2023.
I. Zimerman, M. Baruch, N. Drucker, G. Ezov, O. Soceanu, and L. Wolf, “Converting transformers to polynomial form for secure inference over homomorphic encryption,” arXiv preprint arXiv:2311.08610, 2023.
X. Liu and Z. Liu, “Llms can understand encrypted prompt: Towards privacy-computing friendly transformers,” arXiv preprint arXiv:2305.18396, 2023.
Detection-based Approaches (e.g., Direct Detection, Contextual Inference Detection):
S. Kim, S. Yun, H. Lee, M. Gubri, S. Yoon, and S. J. Oh, "Propile: Probing privacy leakage in large language models," 2023.
N. Mireshghallah, H. Kim, X. Zhou, Y. Tsvetkov, M. Sap, R. Shokri, and Y. Choi, "Can llms keep a secret? testing privacy implications of language models via contextual integrity theory," 2023.
Hardware-based Approaches (e.g., Data Locality, Confidential Computing with Trusted Execution Environment (TEE)):
Y. Wang, Y. Lin, X. Zeng, and G. Zhang, “Privatelora for efficient privacy preserving llm,” arXiv preprint arXiv:2311.14030, 2023.
T. South, G. Zuskind, R. Mahari, and T. Hardjono, “Secure community transformers: Private pooled data for llms.” 2023.
W. Huang, Y. Wang, A. Cheng, A. Zhou, C. Yu, and L. Wang, “A fast, performant, secure distributed training framework for large language model,” arXiv preprint arXiv:2401.09796, 2024.
Questions
Can the proposed idea be applicable and scalable for state-of-the-art code generation models? How does the proposed idea work on state-of-the-art code generation models?
Jingxuan He and Martin Vechev. 2023. Large Language Models for Code: Security Hardening and Adversarial Testing. In CCS. https://arxiv.org/abs/2302.05319
Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, and Caiming Xiong. 2023. CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis. In ICLR. https://arxiv.org/abs/2203.13474
Daniel Fried, Armen Aghajanyan, Jessy Lin, Sida Wang, Eric Wallace, Freda Shi, Ruiqi Zhong, Wen-tau Yih, Luke Zettlemoyer, and Mike Lewis. 2023. InCoder: A Generative Model for Code Infilling and Synthesis. In ICLR. https://arxiv.org/abs/2204.05999
Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Muñoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, et al. 2023. SantaCoder: Don’t Reach for the Stars! CoRR abs/2301.03988 (2023). https://arxiv.org/abs/2301.03988
We appreciate the reviewer for the comprehensive and valuable comments. We have meticulously reviewed each of your points and provided corresponding clarifications as follows:
Only rule-based baselines are compared.
CodeCipher is not solely compared with rule-based methods. In addition to the conventional code obfuscation methods (rule-based), we also compare it with two LLM-based approaches, including 1) directly instructing LLMs for obfuscation and 2) obfuscating and then informing LLMs. Details of these baselines, including their setup and implementation, are provided in Section 5.1.
More related techniques need to be mentioned and compared, such as cryptography-based approaches (e.g., Homomorphic Encryption, Multi-Party Computation, Functional Secret Sharing, Differential Privacy in Inference), detection-based approaches (e.g., Direct Detection, Contextual Inference Detection), and hardware-based approaches.
We appreciate your suggestion and the comprehensive references. However, these cryptography-based, detection-based, and hardware-based approaches differ significantly in technical scope and are thus challenging to compare directly with CodeCipher. One key reason is that they often necessitate complex modifications to both client- and server-side model architectures and training methodologies to enable privacy-preserving communication. For example, homomorphic encryption requires substantial changes to the self-attention architecture, which differs significantly from our technical scope.
In contrast, CodeCipher offers a lightweight, practical solution that only involves minimal data transformations on the client side without altering the model itself or requiring server modifications. This simplicity makes CodeCipher a more accessible option for obfuscation in many application settings. Following your advice, we have revised the related work section to include a more comprehensive differentiation of our work.
Question: can the proposed idea be applicable and scalable for state-of-the-art code generation models? How does the proposed idea work on state-of-the-art code generation models?
In Section 5.6, we evaluate the transferability of our approach, wherein we feed the code obfuscated by our method into other models, including state-of-the-art models such as DeepSeekCoder and CodeLlama. The results show that our approach is applicable and scalable to state-of-the-art code generation models.
Thank you to the authors for responding to my comments and concerns. Based on the authors' responses, I keep my original score.
This submission received mixed reviews and sits at the boundary. Reviewers Co3V and 7KDk raised concerns regarding the comparison with baselines and transferability analysis. While the authors provided a rebuttal addressing these points, their arguments lacked sufficient detail, leaving some concerns unresolved. Additionally, Reviewer v6Ge, who provided the highest rating, did not actively advocate for acceptance during the discussion phase. Given the remaining concerns and the current positioning of this submission, I regretfully recommend rejection.
Additional Comments on Reviewer Discussion
During the discussion phase, Reviewer 7KDk finalized their rating at 5, with remaining concerns about the submission. Meanwhile, Reviewer v6Ge, despite providing the highest rating, did not actively advocate for acceptance. Given these factors, I do not think the paper has reached the acceptance bar.
Reject