DRSM: De-Randomized Smoothing on Malware Classifier Providing Certified Robustness
We propose a framework to make ML-based malware detection models more robust
Abstract
Reviews and Discussion
The authors propose DRSM, a custom de-randomized smoothing algorithm for MalConv, an end-to-end convolutional neural network. DRSM works by dividing the input malware into chunks, each containing a percentage of the input bytes. These are classified independently, and labels are produced through majority voting. Table 3 shows that, using different numbers of windows, DRSM is able to certify more than 40% of points, with a peak of almost 54%. To empirically support their results, the authors also use state-of-the-art adversarial attacks against DRSM, highlighting that DRSM-12 to 24 are able to stop most of the proposed attacks. Lastly, the authors also release their PACE dataset, by sharing URLs and SHAs of both malware and goodware programs.
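For concreteness, the core window-ablation-plus-voting scheme being described can be sketched as follows (an illustrative sketch, not the authors' code; the uniform split and function names are assumptions, and DRSM's real base classifiers are MalConv instances rather than the generic callables used here):

```python
import numpy as np

def window_ablate(byte_seq, n_windows):
    """Split a (padded) byte sequence into contiguous, non-overlapping windows."""
    window_size = int(np.ceil(len(byte_seq) / n_windows))
    return [byte_seq[i * window_size:(i + 1) * window_size] for i in range(n_windows)]

def drsm_predict(byte_seq, base_classifiers, n_windows):
    """Majority vote over per-window predictions (1 = malware, 0 = benign)."""
    windows = window_ablate(byte_seq, n_windows)
    votes = [clf(w) for clf, w in zip(base_classifiers, windows)]
    return int(sum(votes) > len(votes) / 2)
```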
Strengths
- This paper bridges an interesting gap between general machine learning robustness and its application to complex domains like malware detection.
- The certification approach works by splitting malware into chunks and averaging their predictions. This is doable thanks to the nature of the architecture. Also, this would have been much more difficult to do with a model relying on hand-crafted features.
- The PACE dataset is timely, since it is always difficult to get goodware samples.
Weaknesses
No white-box evaluation. The authors state that they compute attacks on the base model, thus framing them as white-box attacks (like Partial DOS, Shift, etc.). However, these are evaluated as black-box transfer attacks, and this should be clarified in the paper.
Probable bad params for attacks. The low success rates of attacks (especially GAMMA) might be due to a wrong initialisation. In the appendix, it is written that 200 is used as both population size and query budget, but the number of queries for the GAMMA attack is computed as population_size * iterations. Also, the number of used sections is missing (which is a crucial point for the attack).
Dataset concerns. While the release of a goodware dataset is for sure a great contribution, I am doubtful about the composition of such a corpus. In particular, the sources might contain malware or generic unwanted programs (Softonic is known to host plenty of installers and grayware that ask you to install other third-party programs). The authors should better clarify the origins of these data, or at least try to study the quality of the provided ground truth. Otherwise, the dataset might contain biases that reduce the fairness of the publication.
False statements. The authors state that "it is difficult to add more than 10% of content". This statement is false, since adversarial malware attacks are automated through tools. Papers like [Demetrio et al. 2021a&b / Lucas et al. 2021] can increase the file size by more than 10% (Lucas et al. bound it to 5% just to avoid enlarging the input file too much). Lastly, Header Modification is not proposed by Nisi et al. 2021, but it is contained inside the SecML Malware library, inspired by the paper (which states which fields are not used by the loader anymore).
Limitations and related work not addressed. The paper does not discuss the limitations of its methodology, beyond saying that certification is a difficult problem to solve. Also, the related work misses a preliminary (but unpublished) paper [1] that addressed the problem in the early months of 2023 (more than 6 months ago). It would be better to mention the fact that preliminary work on certification for malware detection is already there.
[1] Certified Robustness of Learning-based Static Malware Detectors - https://arxiv.org/pdf/2302.01757.pdf
Questions
- How did you conduct the adversarial attacks? Which library did you use / how did you obtain the code of the attacks? Inside the provided material, it is not possible to test the attacks from Lucas et al.
- Can the authors provide better information on the quality of the collected goodware?
"Probable bad params for attacks. The low success rates of attacks (especially GAMMA) might be due to a wrong initialisation. In the appendix, it is written that 200 as population size and query are used, but the number of queries for the GAMMA attack are computed as population_size * iterations. Also, the number of used sections is missing (which is a crucial point for the attack)."
We apologize for mentioning the query number as 200. We set the population size to 200 and ran it for 20 iterations. We have corrected it in Appendix A.4.8. Also, for further confirmation, you can go to our provided code: in line 68 of the `attack_malconv.py` file, you can find the initialization. It is:

```python
attack = CGammaSectionsEvasionProblem(section_population, CEnd2EndWrapperPhi(net), population_size=200, penalty_regularizer=1e-12, iterations=20, threshold=0.5)
```
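For context, the surrounding setup typically looks like the sketch below, which follows the public secml-malware tutorials rather than our attack_malconv.py; the module paths, the create_section_population_from_folder helper, and the goodware_folder / malware_path placeholders are assumptions that should be checked against the installed library version.

```python
from secml.array import CArray
from secml_malware.models.malconv import MalConv
from secml_malware.models.c_classifier_end2end_malware import CClassifierEnd2EndMalware, End2EndModel
from secml_malware.attack.blackbox.c_wrapper_phi import CEnd2EndWrapperPhi
from secml_malware.attack.blackbox.c_gamma_sections_evasion import CGammaSectionsEvasionProblem
from secml_malware.attack.blackbox.ga.c_base_genetic_engine import CGeneticAlgorithm

goodware_folder = "path/to/benign/exes"   # placeholder
malware_path = "path/to/malware.exe"      # placeholder

# Pretrained MalConv wrapped for secml.
net = CClassifierEnd2EndMalware(MalConv())
net.load_pretrained_model()

# Harvest '.data' sections from benign files to use as injectable payloads.
section_population, _ = CGammaSectionsEvasionProblem.create_section_population_from_folder(
    goodware_folder, how_many=10, sections_to_extract=['.data'])

attack = CGammaSectionsEvasionProblem(
    section_population, CEnd2EndWrapperPhi(net),
    population_size=200, penalty_regularizer=1e-12, iterations=20, threshold=0.5)

# Genetic-algorithm engine that drives the black-box optimization.
engine = CGeneticAlgorithm(attack)
with open(malware_path, 'rb') as f:
    code = f.read()
x = End2EndModel.bytes_to_numpy(code, net.get_input_max_length(), 256, False)
y_pred, adv_score, adv_ds, f_obj = engine.run(CArray(x), CArray([1]))
```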
Thank you so much for this clarification, but what about the goodware used to manipulate the malware? How many sections? Which sections? I agree that there is not much space, but these details are important. Also because defenses should be evaluated at the best possible to avoid the same history on vision models [1,2,3].
[1] Tramer, F., Carlini, N., Brendel, W., & Madry, A. (2020). On adaptive attacks to adversarial example defenses. Advances in Neural Information Processing Systems, 33, 1633-1645.
[2] Athalye, A., Carlini, N., & Wagner, D. (2018, July). Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In International Conference on Machine Learning (pp. 274-283). PMLR.
[3] Carlini, N., & Wagner, D. (2017, November). Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security (pp. 3-14).
"... but what about the goodware used to manipulate the malware? How many sections? Which sections?"
Thank you for the response. We also agree that "defenses should be evaluated at the best possible to avoid the same history on vision models". For payload extraction in the GAMMA attack, we used the '.data' section from benign files randomly selected from our PACE dataset. This is already mentioned in Appendix A.5.8.
"How you conducted the adversarial attacks? Which library did you use / how you obtained the code of attacks? Inside the provided material, it is not possible to test the attacks from Lucas et al."
For most of the attacks, we used the `secml-malware` Python library. Also, we built the DRSM framework on top of the secml-malware library so that it can be easily reproduced, extended, and evaluated against more attacks in the future (mentioned in the footnote of page 2). We really appreciate your effort in checking our provided material and code. About the Disp and IPR attacks of Lucas et al. – we collected the code implementation directly from the authors. Though we want to make everything in this work publicly available, we cannot do that for the Disp and IPR attacks. The white-box implementation of these attacks was kept private by the authors, and before getting access, we agreed to keep it private too.
Thank you for this reply, I would like the authors to add this statement to the paper. It should be clear what is reproducible and what is not (and why).
"False statements. The authors state that "it is difficult to add more than 10% of content". This statement is false, since adversarial malware attacks are automated through tools. Papers like [Demetrio et al. 2021a&b / Lucas et al. 2021] can increase the size more than 10% of the file size (Lucas et al. bound it to 5% just to not enlarge too much the input file)."
In the last paragraph of subsection 6.2, we mentioned that perturbing 200KB in a 2MB file (=10%) is 'challenging' in a malware file. If that is the concern, we have rephrased it and toned it down to 'considered a sizeable modification'. If you have further suggestions, let us know.
"Lastly, Header Modification is not proposed by Nisi et al. 2021, but it is contained inside the SecML Malware library, inspired by the paper (that states which fields are not used by the loader anymore)."
We are aware that Nisi et al. (2021) did not explicitly propose the header field modification attack, but we wanted to mention the paper most relevant to the attack so that interested readers can study it. We have changed 'proposed by' to 'motivation from' in our text for the Nisi et al. paper. If you have further suggestions regarding presentation, we are more than happy to incorporate them.
Thank you for re-wording that sentence. Still, I would add a citation to the paper from which you took the technique. So, if you have coded that attack by yourself, just write so. Otherwise, explicitly say where you took the attack implementation from.
"Thank you for re-wording that sentence. Still, I would add a citation to the technique you have used from which paper. So, if you have coded that attac by yourself, just write so. Otherwise, explicitly say where you took the attack implementation."
"Thank you for this reply, I would like the authors to add this statement to the paper. It should be clear what is reproducible and what is not (and why)."
We apologize for not making it very clear. In our first draft, we already mentioned in A.5.6 and A.5.7 that the Disp and IPR attack implementations were collected from the authors and that they are private. For better clarity, we have now added another subsection, named 'Implementation Sources', at the beginning of the discussion of all attacks (Appendix A.5.1). There, we mention that we used the `secml-malware` Python library for all attack implementations (except Disp and IPR), and that the Disp and IPR implementations are not reproducible. Let us know if we should make more changes. We would be happy to have your feedback.
"Dataset concerns. While the release of a goodware dataset is for sure a great contribution, I am doubtful on the composition of such corpus. In particular, the sources might contain malware or generic unwanted propgrams (Softonic is known to host plenty of installers and grayware that asks you to install other third-party programs). The authors should better clarify the origins of these data, or at least try to study the quality of the provided ground truth. Otherwise, the dataset might contain biases that reduce the fairness of the publication."
Evaluating the quality of goodware data is a very good point, and we are actively trying to address this. We are in the process of getting access to the premium API of VirusTotal, and are planning to scan all the benign files to filter out any malicious file before the official release. Meanwhile, we are working on getting some preliminary statistics before the discussion ends.
Thank you for your concise and constructive comments.
"No white-box evaluation. The authors state that they compute attacks on the base model, thus framing them as white-box attacks (like Partial DOS; Shift, etc). However, these are evaluated as black-box transfer attack, and thus should be clarified on the paper."
Thank you for pointing this out. We agree the current description might be confusing: since the attacks require white-box access to the base models, they are indeed white-box. However, we agree they are in some sense transfer attacks, since it is the base model rather than DRSM itself being attacked. We have updated Section 7 for clarity. Let us know if you have further suggestions regarding presentation. We appreciate your feedback.
Thank you for the answer. But transfer attacks are black-box attacks. You are optimizing samples against a believed-similar model, and then you hope they will be effective on the target. I read Section 7 and I think it should be clearly stated that attacks were computed on the base MalConv (through gradient-descent methods) and then later transferred to the real target, mimicking a black-box attack. Then, it should be reported in the limitations that currently no state-of-the-art gradient-based attacks have been defined.
"Thank you for the answer. But transfer attacks are black-box attacks. You are optimizing samples against a believed-similar model, and then you hope they will be effective on the target. I read Section 7 and I think it should be clearly stated that attacks where computed on base malconv (through gradient-descent methods) and then later transfered to the real target, mimicking a blackbox attack. Then, it should be reported in limitation that currently, no state of the art gradient-based attacks have been defined."
We revised the description to clearly state that attacks are computed on the base MalConv (through gradient-descent methods) and then later transferred to DRSM. However, we respectfully suggest that there is still a difference between this and typical transfer attacks, and describing such an attack as black-box can be misleading and should be avoided, since it might give the false impression that the attack is done without accessing the weights of the base models. However, we have mentioned that there are no state-of-the-art gradient-based attacks for this in the newly added 'Limitations' section (section 8).
*"Limitations and related work not addressed. The paper does not discuss limitations of their methodology, by just saying that it is certification is a difficult problem to solve.
We wanted to put a ‘Limitations’ section before the ‘Conclusion’. However, due to the page constraint, we could not do that. So, we have added a ‘Limitations’ section in our Appendix (A.6) and referred to it in our ‘Conclusion’ section.
Also, the related work misses a preliminary (but unpublished) paper [1] that addressed the problem in the early months of 2023 (more than 6 months ago). It would be better to mention the fact that preliminary work on certification for malware detection is already there. [1] Certified Robustness of Learning-based Static Malware Detectors - https://arxiv.org/pdf/2302.01757.pdf"
Although ICLR policy tolerates such omissions for non-peer-reviewed preprints published after May 28 (https://iclr.cc/Conferences/2024/ReviewerGuide), we have discussed this recent paper in the 'Related Work' of our revision.
Limitations are an important part of the paper; they help readers understand where to go next. I totally understand the problems with space, but there are many tweaks you can do. First, remove bullet lists, they consume plenty of space. Same for paragraphs: you can use \noindent to break the line without adding space. Then, all equations without numbers can go inline, etc.
"Limitations are an important part of the paper, it helps reader understand where to go next. I totally understand the problems with space, but there are many tweaks you can do. First, remove bullet lists, they consume plenty of space. Same for paragraphs: you can use \noindent to break the line without adding space. Then, all equations without numbers can go inline, etc."
Thank you for your suggestions. We really appreciate this. And we have worked on this. Eventually, we made some extra space and added the ‘Limitations’ section (section 8).
We have so far scanned 1000 randomly selected benign files from our PACE dataset. For each file, we got results from 54–72 engines on VirusTotal. So far, we have found only 1 file (MD5 hash = cfa051242ce5d13f9fb588736c4601ec) that was detected as malicious, by 19 engines out of 67. The rest of the files were found to be 'benign'. So, even if we label a file 'malware' just because it is detected by only 28% of the engines (which would be a conservative assumption), statistically, there would be 0.1% malicious files in our dataset. However, we understand that 1000 files do not represent the whole dataset, so we are still in the process of scanning all files. And we will give you another update just before the discussion phase ends.
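For reference, a minimal sketch of such a scan against the VirusTotal v3 file-report endpoint is shown below; it is illustrative only (placeholder API key, placeholder hash list, and an assumed detection threshold), not our actual scanning script.

```python
import requests

API_KEY = "YOUR_VT_API_KEY"                    # placeholder
benign_hashes = ["<sha256-of-a-pace-file>"]    # placeholder list of PACE file hashes

def malicious_count(file_hash):
    """Return how many engines flagged the file in its latest VirusTotal analysis."""
    resp = requests.get(
        f"https://www.virustotal.com/api/v3/files/{file_hash}",
        headers={"x-apikey": API_KEY},
        timeout=30,
    )
    resp.raise_for_status()
    stats = resp.json()["data"]["attributes"]["last_analysis_stats"]
    return stats["malicious"]

# Flag a file as suspicious if at least 5 engines call it malicious (an assumed threshold).
suspicious = [h for h in benign_hashes if malicious_count(h) >= 5]
print(f"{len(suspicious)} of {len(benign_hashes)} files look suspicious")
```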
Thank you for your work during this rebuttal period. I raised my score to ACCEPT (8).
Thanks for your reconsideration and constructive feedback. We believe your feedback has helped us make the paper better. On another note, we have scanned another 1000 benign files and found no malicious files among them. And we have just got the premium API for VirusTotal. We hope that we will be able to finish scanning all benign files very soon.
This paper applies the de-randomized smoothing technique from the image classification domain to the domain of malware detection, proposing a window ablation scheme. Robustness is both certified theoretically and tested empirically (the latter against a broad range of attacks). A dataset of benign executables will also be made available to support future research.
Strengths
This is a well written paper which clearly presents the core idea. The experiments are quite thorough and support the claims.
Making the dataset available for future research is a positive.
Weaknesses
The basic idea is quite simple, and this is not a contribution of major impact in the field.
It would have been good to see a description of de-randomized smoothing in the related work, as this is core to the idea.
There are a few grammatical issues throughout the paper - it needs a polish before publication, e.g.
- "We will use it as base classifiers"
- "potentially attributing to the issue"
- "convolution neural network"
- "there have been a large amount of work"
- "to some extents"
- "though it has been believed as a robust model"
- "However, it's worth highlighting.." not sure However is right word.
Minor issues:
- X \subset [0,N-1] is wrong. [0,N-1] is a real-valued continuous interval. I think it means X is the set {0, 1, ..., N-1}
Questions
Would it be possible to indicate the values of \Delta on the x-axis of figure 3? Everything has been in terms of that up to this point.
One issue would be with attacks which INSERT bytes. Section 3.2 seems to suggest this is possible ("attacker can modify or add any bytes in a contiguous portion"). Surely, if added at the start of the file, this can then change the contents of EVERY window, as all the bytes get shifted to the right. So adding bytes does not seem to be in the threat model that would give certified robustness.
In fact, the above comment may explain why the DOS Extension attack has such a big effect on the DRSM models in Figure 5.
Details Of Ethics Concerns
N/A
"One issue would be with attacks which INSERT bytes. Section 3.2 seems to suggest this is possible ("attacker can modify or add any bytes in a contiguous portion"). Surely, if added at the start of the file, this can change the contents then of EVERY window, as all the bytes get shifted to the right. So adding bytes does not seem to be in the threat model that would give certified robustness."
"In fact, the above comment may explain why the DOS Extension attack has such a big effect on the DRSM models in Figure 5."
This is a very good observation, and we are thankful for this. It might be the case that the DOS extension attack has an impact on the rest of the windows to some extent, and thus DRSM-4 and DRSM-8 are less robust to this attack. Since this attack still cannot directly perturb or alter bytes in other windows, it is partially aligned with our threat model. So, we have modified our Table 4, subsection 3.2, and Appendix A.4.3 accordingly. We have also added a subsection in the Appendix (A.4.1) discussing the probable reason for the higher ASR of the DOS Extension attack.
A future mitigation could be extracting sections from the file and training a different base classifier on each of them (discussed in the 'Limitations' in Appendix A.6). Thus, attacks like DOS extension would not be able to impact other (or later) windows.
Thank you for your responses. I acknowledge and appreciate the updates. That said, I am maintaining my original review score as I think this still represents a fair overall view of the paper.
We respect your decision, and we really appreciate your concise feedback. We believe your feedback has helped us to make the paper better.
"Would it be possible to indicate the values of \Delta on the x-axis of figure 3? Everything has been in terms of that up to this point."
We have added another figure (Figure 10 in Appendix A.4) that shows the certified accuracy in terms of Δ. From this figure, it can be seen that a smaller window size achieves higher certified accuracy. At the same time, we want to mention that a smaller window allows attackers a smaller budget for perturbation. For example, for the same Δ, in DRSM-4 the attacker can perturb up to 511K bytes, whereas in DRSM-24 it is 82K bytes. So, it might not be fair to compare these models depending on Δ alone, and hence we kept Figure 3, showing accuracy with respect to perturbed bytes, in the main text, and put Figure 10 in the Appendix so that interested readers can see them and easily interpret the results. We would also love to have your feedback on this one.
"There are a few grammatical issues throughout the paper - it needs a polish before publication,"
We really appreciate your effort in reading our paper thoroughly and pointing out the grammatical issues. We have corrected all of the mentioned ones. You can find them in blue text in the updated version of our submission. Let us know if you find anything else. We would love to have your feedback.
"It would have been good to see a description of de-randomized smoothing in the related work, as this is core to the idea."
While we understand that it would have added more clarity if we had discussed de-randomized smoothing more in the 'Related Work', we could not do so due to space constraints. So, we have added a subsection in the Appendix (A.2) discussing de-randomized smoothing in depth, and have referred to it in our 'Related Work'.
"X \subset [0,N-1] is wrong. [0,N-1] is a real-valued continuous interval. I think it means X is the set {0, 1, ..., N-1}"
Thank you for pointing this out. We have addressed this in our revised version.
This paper applies de-randomized smoothing to produce a classifier for malware detection (called DRSM) that is certifiably robust against patch attacks. The proposed classifier can be viewed as an ensemble of base classifiers, each of which operates on a distinct block of the input file. After collecting predictions for each block from the base classifiers, the prediction for the file as a whole is made by majority vote. This architecture admits a patch certificate that depends on the block size, maximum input length and the voting margin. The included experiments show that DRSM (with MalConv base classifiers) achieves a similar accuracy as vanilla MalConv, while producing patch certificates of order 100KB in size. Experiments examining empirical robustness to several attacks are also reported, which generally demonstrate improvements compared to vanilla MalConv and MalConv with non-negative weights. The paper also contributes a new dataset of benign executables, which is useful given the limited availability of publicly available benchmark datasets for malware.
Strengths
- It’s great to see a paper investigating certified robustness outside the vision domain, which has dominated the literature to date. The patch-like threat model seems well-motivated for malware, given it encompasses several existing attacks.
- Another strength of the work is its simplicity. DRSM is conceptually straightforward to implement and analyze, which could reduce barriers to adoption.
- I’m pleased the authors have found a way to share a public dataset for malware analysis. The lack of public benchmark datasets is a major impediment for academic malware research. Having this dataset available will save researchers' time, and it should allow for better comparison between papers (which tend to use different datasets currently).
Weaknesses
- The paper can be seen as applying an existing method (de-randomized smoothing) to a new domain (malware). Although the authors claim that it is “challenging” to adapt de-randomized smoothing for malware, I’m not convinced. The proposed method is a form of structured ablation, originally studied by Levine & Feizi (2020) for 2-d inputs with homogeneous base classifiers. The modification from 2-d to 1-d inputs and from homogeneous to heterogeneous base classifiers seems straightforward. Moreover, the proposed method has appeared in prior work in a more general form by Hammoudeh & Lowd (2023).
- The paper claims to be “first to offer certified robustness in the realm of static detection of malware executables”. However there is prior work on this topic by Huang et al. (2023) which appeared on arXiv in January 2023. Their work considers a different threat model for malware: edit distance robustness rather than patch robustness.
- A characteristic feature of the malware domain is that inputs vary in length. However, it’s not clear to me how the proposed classifier architecture handles this. As an example, consider a 100 KB malicious file and a classifier with a maximum input length of 2 MB. For n in the range 4–20, the malicious file fits within a single block, meaning it is passed to a single base classifier, while the remaining n−1 base classifiers receive padding as input. Assuming the base classifiers predict “benign” for padding, the votes are 1 “malicious” and n−1 “benign”, giving a prediction of “benign”. I wonder if I’m missing something here, because it seems the classifier is guaranteed to make false negative errors on small files, which are abundant according to Figure 6.
- I found the description of the threat model and certificate unclear. Section 3.2 states that the attacker is allowed to “modify or add any bytes in a contiguous portion”. I understand that “modify” means overwrite or replace, but it’s not clear what “add” means in this context. For instance, “add” could mean “insert” or “append”, or it may mean “increment or decrement by some amount”. The mathematical description x′ = x + δ implies the original sequence x is additively perturbed by δ (which is undefined), but this seems to be at odds with the earlier description. Reading between the lines, my understanding is that the certificate covers a contiguous chunk of the original file being overwritten, which may include some bytes being appended to the end of the file. This should be precisely stated somewhere. The current definition of the certificate in Section 5 is in terms of ablated sequences – it would be helpful to translate this to the input space.
- It’s great that the authors are planning to release the PACE dataset. However, the current description of the collection process is a bit light on detail. It would be helpful to describe how binaries were selected from the various sources. For instance, was there a preference for recent binaries? Are the binaries for a single platform (e.g., Windows x64) or multiple platforms? How do you know the binaries are benign?
Minor points:
- Section 2 states that it’s “surprising” MalConv is still considered state-of-the-art for malware detection on raw byte sequences, given it was released in 2018. The authors attribute this to limited availability of public data. However, I think the main reason is due to difficulties in scaling more complex models (such as transformers) to very long sequences, containing upwards of a million tokens. It’s worth pointing out that the authors of MalConv have released a follow up model known as MalConv 2 (Raff et al., 2021).
- Section 3 states that the input vector fed into the network “has to be of a fixed dimension”. This is not true in general, and I don’t believe it’s true for MalConv. I believe it is possible to support arbitrary length inputs in modern frameworks such as PyTorch by specifying `None` for the size of the dimension.
- Section 3.1 states that models like EMBER and GBDT “can work only on feature vectors”. I’m a bit puzzled by this statement. When composed with their feature extractors, these models must be able to operate on raw binaries, otherwise they would be useless as malware detectors?
- Section 5 states that vision-oriented ablation techniques such as masking and block ablations are infeasible for byte sequences. However I can’t see why these wouldn’t work on 1d sequences? It seems feasible to mask bytes or ablate 1d blocks.
- Section 7 states MalConv NonNeg “has been believed as a robust model for a long time”. It would be good to include a citation for this claim.
References
- Huang et al., “Certified robustness of learning-based static malware detectors,” arXiv:2302.01757 (2023). https://arxiv.org/abs/2302.01757
- Hammoudeh & Lowd, “Feature Partition Aggregation: A Fast Certified Defense Against a Union of Attacks,” AdvML-Frontiers 2023. https://openreview.net/forum?id=NX5Nxrz6PV
- Levine & Feizi, “(De)Randomized Smoothing for Certifiable Defense against Patch Attacks,” NeurIPS 2020. https://proceedings.neurips.cc/paper/2020/file/47ce0875420b2dbacfc5535f94e68433-Paper.pdf
- Raff et al., “Classifying Sequences of Extreme Length with Constant Memory Applied to Malware Detection,” AAAI 2021. https://ojs.aaai.org/index.php/AAAI/article/view/17131/16938
Questions
It would be great if the authors could comment on my feedback about:
- novelty of DRSM (how does it differ from Levine & Feizi (2020) and Hammoudeh & Lowd (2023)?)
- how DRSM operates on small files
- the threat model
"The paper claims to be “first to offer certified robustness in the realm of static detection of malware executables”. However there is prior work on this topic by Huang et al. (2023) which appeared on arXiv in January 2023. Their work considers a different threat model for malware: edit distance robustness rather than patch robustness."
Although ICLR policy tolerates such omissions for non-peer-reviewed preprints published after May 28 (https://iclr.cc/Conferences/2024/ReviewerGuide), we have acknowledged and discussed this recent paper in the 'Related Work' of our revision. Additionally, we want to point out that, along with the different threat model (which you already mentioned), this paper adapts the randomized smoothing scheme, which differs from ours.
Thanks for pointing out the ICLR policy which excuses authors from citing preprints. It's great that you have decided to include a citation regardless.
Thank you for appreciating our revision.
We really appreciate your effort in going into the details and for your constructive feedback.
"The paper can be seen as applying an existing method (de-randomized smoothing) to a new domain (malware). Although the authors claim that it is “challenging” to adapt de-randomized smoothing for malware, I’m not convinced. The proposed method is a form of structured ablation, originally studied by Levine & Feizi (2020) for 2-d inputs with homogeneous base classifiers. The modification from 2-d to 1-d inputs and from homogeneous to heterogeneous base classifiers seems straightforward. Moreover, the proposed method has appeared in prior work in a more general form by Hammoudeh & Lowd (2023)."
We respectfully argue that the success of a 'general' method in a specific domain should not be taken for granted. An example is the paper by Hammoudeh & Lowd (2023) mentioned in your review, which also resembles Levine & Feizi (2020): one contribution of Hammoudeh & Lowd (2023) is their study of partitioning strategies, in which they suggest that using strided input dimensions (pixels) is beneficial for vision tasks, potentially because the information contained in adjacent pixels is redundant to some extent. In contrast, in the malware domain, due to its nature, some adjacent bytes are malicious only when they are combined, i.e., not considering adjacent bytes together might end up representing a whole different (or invalid) instruction that might not be malicious.
I agree that there is value in studying a general method (de-randomized smoothing) in a new domain, and will take this into consideration when reconsidering my score.
"I agree that there is value in studying a general method (de-randomized smoothing) in a new domain, and will take this into consideration when reconsidering my score."
Thank you for understanding and reconsidering. We appreciate your constructive feedback on this paper.
Thanks, the revision has addressed most of my concerns. I have decided to increase my score from 3 to 6.
Thank you for your reconsideration. We really appreciate your constructive feedback, and we believe it has helped us to make the paper better.
On another note, we have scanned another 1000 benign files and found no malicious files among them. And we have just got the premium API for VirusTotal. We hope that we will be able to finish scanning all benign files very soon.
"Section 2 states that it’s “surprising” MalConv is still considered state-of-the-art for malware detection on raw byte sequences, given it was released in 2018. The authors attribute this to limited availability of public data. However, I think the main reason is due to difficulties in scaling more complex models (such as transformers) to very long sequences, containing upwards of a million tokens. It’s worth pointing out that the authors of MalConv have released a follow up model known as MalConv 2 (Raff et al., 2021)."
Thanks for pointing this out. Initially, we did not include this due to space constraints, but we have modified the first paragraph in the 'Related Work', and discussed the MalConv 2 model.
"Section 3 states that the input vector fed into the network “has to be of a fixed dimension”. This is not true in general, and I don’t believe it’s true for MalConv. I believe it is possible to support arbitrary length inputs in modern frameworks such as PyTorch by specifying None for the size of the dimension."
With that sentence, we were referring to the original implementation of MalConv, which was done for a 2MB file size.
"Section 3.1 states that models like EMBER and GBDT “can work only on feature vectors”. I’m a bit puzzled by this statement. When composed with their feature extractors, these models must be able to operate on raw binaries, otherwise they would be useless as malware detectors?"
We are aware that EMBER and GBDT can work as malware detectors that, unlike our work, require an extra feature extraction step, and we apologize for the misunderstanding. We have rephrased that sentence in subsection 3.1.
"Section 5 states that vision-oriented ablation techniques such as masking and block ablations are infeasible for byte sequences. However I can’t see why these wouldn’t work on 1d sequences? It seems feasible to mask bytes or ablate 1d blocks."
An instruction is converted to multiple bytes that are usually contiguous, and masking a random byte might end up changing the meaning of an instruction or, in the worst case, producing an invalid instruction. Block ablation was proposed for 2D inputs; it is not clear how it can be applied to 1D inputs. (We have also updated our manuscript with a recap of de-randomized smoothing in vision tasks in Appendix A.2.)
"Section 7 states MalConv NonNeg “has been believed as a robust model for a long time”. It would be good to include a citation for this claim."
We see how this statement might be controversial. We have removed it from section 7.
"It’s great that the authors are planning to release the PACE dataset. However, the current description of the collection process is a bit light on detail. It would be helpful to describe how binaries were selected from the various sources. For instance, was there a preference for recent binaries? Are the binaries for a single platform (e.g., Windows x64) or multiple platforms? How do you know the binaries are benign?"
Thanks for appreciating our plan to release the PACE dataset. The sources of this dataset are given in Table 2, and we crawled these websites to download the benign files. We downloaded them in August 2022, and there was no preference for recent binaries. Yes, the binaries are for a single platform (Windows). Evaluating the quality of goodware data is a very good point, and we are actively trying to address this. We are in the process of getting access to the premium API of VirusTotal, and are planning to scan all the benign files to filter out any malicious files before the official release. Meanwhile, we are working on getting some preliminary statistics before the discussion ends.
Thanks for clarifying, this sounds like a great plan.
We have so far scanned 1000 randomly selected benign files from our PACE dataset. For each file, we got results from 54–72 engines on VirusTotal. So far, we have found only 1 file (MD5 hash = cfa051242ce5d13f9fb588736c4601ec) that was detected as malicious, by 19 engines out of 67. The rest of the files were found to be 'benign'. So, even if we label a file 'malware' just because it is detected by only 28% of the engines (which would be a conservative assumption), statistically, there would be 0.1% malicious files in our dataset. However, we understand that 1000 files do not represent the whole dataset, so we are still in the process of scanning all files. And we will give you another update just before the discussion phase ends.
"I found the description of the threat model and certificate unclear. Section 3.2 states that the attacker is allowed to “modify or add any bytes in a contiguous portion”. I understand that “modify” means overwrite or replace, but it’s not clear what “add” means in this context. For instance, “add” could mean “insert” or “append”, or it may mean “increment or decrement by some amount”. The mathematical description x′=x+delta implies the original sequence x is additively perturbed by delta (which is undefined), but this seems to be at odds with the earlier description. Reading between the lines, my understanding is that the certificate covers a contiguous chunk of the original file being overwritten, which may include some bytes being appended to the end of the file. This should be precisely stated somewhere. The current definition of the certificate in Section 5 is in terms of ablated sequences – it would be helpful to translate this to the input space."
Thanks for your constructive feedback and for asking for clarification of the term 'add'. We have rephrased a few sentences in the second paragraph of our threat model (subsection 3.2). With the term 'add', we meant that the attacker can 'insert' or 'append' any bytes; not 'increment / decrement'. Additionally, we have mentioned that it has to be bounded within a contiguous portion. Also, we apologize if the mathematical description looked confusing. We have now reworded it as a simple sentence for better clarity. Additionally, we have rephrased a few sentences in the third paragraph of the threat model, and mentioned what types of attack fall within our threat model. Let us know if you have further suggestions about the presentation. We would appreciate your feedback.
"A characteristic feature of the malware domain is that inputs vary in length. However, it’s not clear to me how the proposed classifier architecture handles this. As an example, consider a 100 KB malicious file and a classifier with a maximum input length of 2 MB. For n in the range 4–20, the malicious file fits within a single block, meaning it is passed to a single base classifier, while the remaining n-1 base classifiers receive padding as input. Assuming the base classifiers predict “benign” for padding, the votes are 1 “malicious” and n−1 “benign” giving a prediction of “benign”. I wonder if I’m missing something here, because it seems the classifier is guaranteed to make false negative errors on small files, which are abundant according to Figure 6."
This is a very good observation. To tackle this issue in our DRSM framework, we consider the votes (or predictions) from the base classifiers that get any input (except padding). We could consider all votes and solve this by adding an extra learnable layer (such as logistic regression) on top of all votes, but it would have hurt the non-differentiability property (and eventually, the certified robustness) of the whole framework.
To tackle this issue in our DRSM framework, we consider the votes (or predictions) from the base classifiers that get any input (except padding)
Just to clarify, are you saying that DRSM does not incorporate votes from base classifiers that receive padding exclusively as input? Could you point out where this behavior is specified in the paper? Also, if votes from some classifiers are excluded dependent on the input, does the robustness certificate remain valid?
Yes, and we have discussed this in our newly added ‘Limitations’ section (section 8). The robustness certificate remains valid since it just depends on the difference between the number of 'malware' votes and the number of 'benign' votes.
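To make the two points above concrete, a minimal sketch is shown below (illustrative only, not our actual implementation; the padding value and the exact form of the inequality are assumptions following the standard de-randomized smoothing majority-vote argument):

```python
PAD = 0  # assumed padding byte value

def drsm_vote(windows, base_classifiers, pad_value=PAD):
    """Majority vote that ignores windows consisting solely of padding."""
    votes = [clf(w) for clf, w in zip(base_classifiers, windows)
             if any(b != pad_value for b in w)]      # drop padding-only windows
    n_malware = sum(votes)                           # each vote: 1 = malware, 0 = benign
    n_benign = len(votes) - n_malware
    return int(n_malware > n_benign), n_malware - n_benign

def is_certified(margin, delta):
    """Majority-vote certificate: the prediction cannot flip if the vote margin
    exceeds twice the number of windows (delta) an attacker can influence."""
    return margin > 2 * delta
```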
The authors employ the de-randomized smoothing technique to develop a certified defense for malware detection. Furthermore, they introduce a new dataset named PACE, comprising 15.5K recent benign raw executables from diverse sources. Experimental results validate the effectiveness of their approach.
Strengths
- The paper is well written and generally easy to follow.
- A well-structured and clear presentation.
- I appreciate the authors for providing a novel dataset.
Weaknesses
- The motivation behind the method somewhat contradicts intuition.
- There is a lack of comparison with recent defense methods against adversarial attacks.
Questions
1. The proposed "window ablation" strategy seems to be applicable only to scenarios where perturbations are clustered together (similar to patch attacks in the CV domain). However, many adversarial attack methods for malware disperse the inserted perturbations across various locations within the software. It remains unclear whether the proposed method would still be effective in such cases. Update: this has now been addressed
2. The construction of many malware samples follows a piggyback approach, where the majority of the software consists of benign code, with only a small portion exhibiting malicious behavior. That’s to say most of the ablated sequences will be given a benign label. The algorithm proposed by the author may result in false negatives for such malware samples. Update: this has now been addressed
3. It is essential to provide a comparative analysis of the author's proposed defense method against existing adversarial example defense techniques for malware. Update: this has now been addressed
"The proposed "window ablation" strategy seems to be applicable only to scenarios where perturbations are clustered together (similar to patch attacks in the CV domain). However, many adversarial attack methods for malware disperse the inserted perturbations across various locations within the software. It remains unclear whether the proposed method would still be effective in such cases."
We really appreciate your concise review.
Yes, you are right. The proposed method is 'theoretically' designed for attacks that can modify or add bytes in a contiguous region. And we also agree that there are many adversarial attacks that do not follow this and can modify/add in multiple regions. So, in this work, we actually evaluated our proposed model DRSM against such attacks, because we believe that 'theoretical robustness' alone is not enough for a security-critical application like malware detection. In section 7, 'Empirical Robustness Evaluation', we considered a total of 9 attacks, 5 of which can modify multiple regions in a file. In Table 4, we listed all these attacks with a short description and their alignment with our threat model. For example, Slack Append, Header Field Modification, and the GAMMA attack can perturb multiple regions in a malware file, and the recently proposed Disp and IPR attacks can modify a file at the instruction level and are not limited to a certain region.
We used these attacks to compare our DRSM with the baseline model ‘MalConv’ and its more robust variant ‘MalConv (NonNeg)’. The results are shown in Figures 4 and 5.
"It is essential to provide a comparative analysis of the author's proposed defense method against existing adversarial example defense techniques for malware."
The existing defenses for adversarial malware can be broadly divided into two categories: non-negative classifiers and adversarial training. We have already included the former, 'MalConv (NonNegative)', in our work and compared it in terms of standard accuracy, certified accuracy, and empirical robustness in Table 3 and Figures 4 and 5 (and Appendix A.3). We also want to emphasize that recent work by Lucas et al. [1] has already shown that, in most cases, adversarially trained models (the second defense) do not provide good robustness against other attacks. Moreover, such defenses compromise standard accuracy too; for example, Lucas et al. showed that training the model adversarially on Kreuk-0.01 degraded the true positive rate to 84.4%–90.1%. We have discussed this in the second paragraph of section 2 (Related Work).
[1] Adversarial Training for Raw-Binary Malware Classifiers. https://www.usenix.org/conference/usenixsecurity23/presentation/lucas
"The construction of many malware samples follows a piggyback approach, where the majority of the software consists of benign code, with only a small portion exhibiting malicious behavior. That’s to say most of the ablated sequences will be given a benign label. The algorithm proposed by the author may result in false negatives for such malware samples."
This is a very good observation, and it is true that most of the content in a malware file is actually benign. This argument actually aligns with our results too. We found that if we increase the number of windows (decreasing the length of the ablated sequences), the probability of an ablated sequence getting classified as 'benign' gets higher, because a smaller ablated sequence covers less content, and hence less or no malicious content. As a result, for a higher number of windows (larger n), DRSM models have lower standard accuracy; we showed this in Table 3 and discussed it in subsection 6.1. For example, DRSM-4 has 98.18% standard accuracy, whereas DRSM-24 achieves 90.24%. The goal of this paper was to find a balance between standard accuracy and robustness.
This work introduces a certified defense called DRSM (De-Randomized Smoothed MalConv) through a redesign of the de-randomized smoothing technique in the context of malware detection. More specifically, they introduce a window ablation scheme that creates a series of ablated sequences by partitioning the input sequence into non-overlapping windows. Extensive experimentation involving 9 distinct empirical attacks of various types reveals that the proposed defense demonstrates empirical robustness when faced with a diverse range of attacks.
Strengths
- The authors have gathered 15.5K recent benign raw executables from a variety of sources. These files will be released to the public as a dataset named PACE (Publicly Accessible Collection(s) of Executables). This dataset aims to address the shortage of publicly available benign datasets for research in malware detection and to provide future studies with more representative contemporary data.
Weaknesses
- Insufficient theoretical analysis of certified robustness when facing multiple malicious ablated sequences. Specifically, the authors assume that an attacker generates a byte perturbation of size p and can modify a maximum of ⌈p/w⌉+1 ablated sequences. However, if the attacker simultaneously inserts multiple adversarial code segments at different locations, how will this impact the certified robustness? The authors should offer a more in-depth theoretical analysis of this scenario. Furthermore, the authors should establish the relationship between the window size (w) and the resulting certified robustness. For example, does a smaller window size lead to improved certified robustness?
- In my view, there appears to be a contradiction between Figures 1 and 2. As discussed in Section 5, malicious ablated sequences are expected to influence their respective base classifiers, and the predicted winning class should be "benign." However, in Figure 1, the predicted winning class is labeled as "malware", which confuses me. If this work indeed presents a more robust framework that prevents attackers from generating adversarial examples, then the winning class should ideally be "benign." However, if the winning class is consistently benign, it suggests that the proposed framework might miss detecting certain malware instances, creating a contradiction.
- The absence of comparisons with a broader range of real-world antivirus engines accessible via VirusTotal is notable. It would be beneficial, for instance, to determine how many antivirus engines effectively identify the malware and test cases as malicious.
- Another deficiency is the absence of specific details regarding the process of compromising executable files. The authors should provide a more comprehensive explanation of how to generate adversarial examples within the problem-space[Ref-1], which encompasses defining a comprehensive set of constraints on available transformations, preserving semantics, ensuring robustness to preprocessing, and maintaining plausibility.
- The rationale behind the design choice is unclear. Why have the authors chosen MalConv as the baseline classifier? There are numerous alternative models that can serve as the baseline classifier, such as LGBM, RF, and SVM. The authors should consider evaluating their framework with these alternative baseline classifiers.
[Ref-1] Pierazzi, Fabio, et al. "Intriguing properties of adversarial ml attacks in the problem space." 2020 IEEE Symposium on Security and Privacy (SP). IEEE, 2020.
Questions
- Is the winning class of Fig. 2 "benign"?
- Does a smaller window size lead to improved certified robustness?
Details Of Ethics Concerns
No ethics concerns.
"Is the winning class of Fig. 2 is "benign"?"
Figure 2 is a generalized figure where the input file can be anything (malware or benign). The green check stands for a correct prediction, whereas the red cross stands for a wrong prediction. At the end, the bars stand for the number of correct vs. wrong predictions. For example, let us assume that the input file is an adversarial malware, and that the perturbed region of that malware falls into one of the ablated sequences of DRSM (shown with a small red block in the ablated sequences). So, the base classifier misclassifies that sequence (shown with a red cross) but classifies the rest of the sequences correctly (shown with green checks). Hence, the winning class is still 'malware' (shown with the green bar in 'count class prediction'). We apologize if the figure is confusing. We have added a short description in the caption. If there is any scope to make the presentation better, let us know. We would love your feedback.
"The rationale behind the design choice is unclear. Why have the authors chosen MalConv as the baseline classifier? There are numerous alternative models that can serve as the baseline classifier, such as LGBM, RF, and SVM. The authors should consider evaluating their framework with these alternative baseline classifiers."
We want to emphasize that CNN-based models like MalConv can take whole raw byte sequences as input, whereas models like LGBM, RF, or SVM require feature engineering (e.g., byte entropy) and cannot take raw bytes as input. Such features are known to be specifically targeted by adversaries to avoid detection, which makes 'being able to take the whole raw binary' attractive. However, this new capability introduces a risk of adversarial attacks too, which motivates us to position our defense for CNN-based models. As MalConv is one of the state-of-the-art CNN-based static classifiers, it was a suitable choice to evaluate our framework. However, note that our window-ablation scheme is agnostic of the base classifier, as it operates at the input level. As a result, it could be applied to any detector that consumes raw bytes.
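For readers less familiar with MalConv, a schematic of this kind of raw-byte, gated-convolution classifier is sketched below; it is a simplified stand-in with commonly cited hyper-parameters (byte embedding of size 8, 128 filters, window/stride 500), not the exact architecture or weights used in our experiments.

```python
import torch
import torch.nn as nn

class MalConvLike(nn.Module):
    """Schematic gated-convolution classifier over raw bytes (MalConv-style)."""

    def __init__(self, emb_dim=8, channels=128, window=500):
        super().__init__()
        self.embed = nn.Embedding(257, emb_dim, padding_idx=0)  # 256 byte values + 1 padding token
        self.conv = nn.Conv1d(emb_dim, channels, window, stride=window)
        self.gate = nn.Conv1d(emb_dim, channels, window, stride=window)
        self.fc = nn.Linear(channels, 1)

    def forward(self, x):                                # x: (batch, length) byte ids in [0, 256]
        z = self.embed(x).transpose(1, 2)                # (batch, emb_dim, length)
        h = self.conv(z) * torch.sigmoid(self.gate(z))   # gated convolution
        h = torch.max(h, dim=-1).values                  # temporal max pooling
        return self.fc(h)                                # malware logit
```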
"The absence of comparisons with a broader range of real-world antivirus engines accessible via VirusTotal is notable. It would be beneficial, for instance, to determine how many antivirus engines effectively identify the malware and test cases as malicious."
"Another deficiency is the absence of specific details regarding the process of compromising executable files. The authors should provide a more comprehensive explanation of how to generate adversarial examples within the problem-space[Ref-1], which encompasses defining a comprehensive set of constraints on available transformations, preserving semantics, ensuring robustness to preprocessing, and maintaining plausibility."
The main focus of this paper was to propose a better defense against adversarial attacks, not the attacks themselves. Moreover, we evaluated our models against 9 different attacks and there is a page limitation. So, we just kept the gist of the attacks in our main paper. Table 4 lists all 9 attacks with a short description and their alignment with our threat model and settings. Additionally, in Appendix A.4, we included details for all of these attacks and their implementation in this work. Also, all of these attacks have already been proposed in published prior works showing that they do not break the executable files, and we cited them so that any specific details about them can be retrieved if necessary. Some of these attacks (such as Disp, IPR, GAMMA, etc.) had already been tested against VirusTotal and were found successful. We also want to mention that Lucas et al. (2021) reported that the Disp and IPR attacks can evade VirusTotal in 49%–53% of cases, whereas against DRSM, these attacks could evade only 42% and 9.50% of cases, respectively, even for our weakest model (DRSM-4).
"In my view, there appears to be a contradiction between Figures 1 and 2. As discussed in Section 5, malicious ablated sequences are expected to influence their respective base classifiers, and the predicted winning class should be "benign." However, in Figure 1, the predicted winning class is labeled as "malware,". It confuses me. If this work indeed presents a more robust framework that prevents attackers from generating adversarial examples, then the winning class should ideally be "benign." However, if the winning class is consistently benign, it suggests that the proposed framework might miss detecting certain malware instances, creating a contradiction."
“If this work indeed presents a more robust framework that prevents attackers from generating adversarial examples, then the winning class should ideally be "benign."” We want to clarify that this work does not prevent attackers from generating adversarial examples. We proposed a framework (DRSM) on top of an already existing classifier that improves its robustness against adversarial malware; our goal was not to stop malware authors from generating adversarial examples.
In Figure 1, we tried to show the fundamental difference between the original base classifier (MalConv) and our method (DRSM) with a toy example in a simple way. Here, for DRSM, the adversarial malware gets ablated into 3 non-overlapping windows (or sequences), which generate 3 different predictions. Since the perturbation impacted only one window (the middle portion of the file, written in a red font), DRSM gave the wrong prediction (benign) only for that one. But for the rest of the windows (the first and third portions of the file, written in a black font), DRSM correctly classifies them as 'malware'. As a result, 'malware' (2) wins against 'benign' (1) in voting, and DRSM gives the final output as 'malware'.
"Furthermore, the authors should establish the relationship between the window size (w) and the resulting certified robustness. For example, does a smaller window size lead to improved certified robustness?"
Thanks for asking about the relationship between window size and certified accuracy. Yes, a smaller window size leads to better certified accuracy, and we discussed this in the last paragraph of subsection 6.2. We mentioned that “By analyzing Table 3, we can see that n has a positive and negative correlation with certified and standard accuracy, respectively. While DRSM-24 provides the highest certified accuracy (53.97%), it has the lowest standard accuracy (90.24%).” Table 3 shows the certified accuracy for the same Δ for each model variant, and notably, DRSM-24 (with the smallest window size) achieves the highest certified accuracy (53.97%) whereas DRSM-4 (with the largest window size) achieves the lowest certified accuracy (12.2%). Additionally, we have added another figure (Figure 10 in Appendix A.4) that shows the certified accuracy in terms of Δ. From this figure, it can be seen that a smaller window size achieves higher certified accuracy. At the same time, we want to mention that a smaller window allows attackers a smaller budget for perturbation. For example, for the same Δ, in DRSM-4 the attacker can perturb up to 511K bytes, whereas in DRSM-24 it is 82K bytes. So, it might not be fair to compare these models depending on Δ alone, and hence we kept Figure 3, showing accuracy with respect to perturbed bytes, in the main text, and put Figure 10 in the Appendix. We would appreciate your feedback too.
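In symbols, the trade-off above can be summarized as follows (a standard de-randomized-smoothing style bound implied by this discussion; the exact certificate in the paper may include additional tie-breaking terms):

```latex
% A contiguous perturbation of p bytes overlaps at most
% \Delta = \lceil p/w \rceil + 1 windows of size w; conversely,
% certifying against \Delta corrupted windows guarantees a byte budget of
% roughly (\Delta - 1) w, which grows with the window size w.
\[
  \Delta \;=\; \left\lceil \frac{p}{w} \right\rceil + 1
  \quad\Longrightarrow\quad
  p \;\le\; (\Delta - 1)\,w .
\]
```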
"Insufficient theoretical analysis of certified robustness when facing multiple malicious ablated sequences. Specifically, the authors assume that an attacker generates a byte perturbation of size p and can modify a maximum of ⌈p/w⌉+1 ablated sequences. However, if the attacker simultaneously inserts multiple adversarial code segments at different locations, how will this impact the certified robustness? The authors should offer a more in-depth theoretical analysis of this scenario."
Thanks for such a detailed response. We appreciate your effort in reading our paper thoroughly.
As the de-randomized smoothing approach was borrowed from the computer vision domain, we followed the same threat model, where the attacker can add a patch of a specific size (add or modify a byte sequence of size p, in our case). In our theoretical study, we wanted to keep it as similar to the CV domain as possible. At the same time, we also understand that the malware domain does not work like that, and your point on "the attacker simultaneously inserts multiple adversarial code segments at different locations" is totally valid and we agree with that. Therefore, we empirically evaluated the robustness of our DRSM approach against such attacks that can insert multiple adversarial code segments (section 7). To be specific, we have included attacks, namely Slack Append, Header Field Modification, and GAMMA, where the attacker can insert or modify at multiple places, and stronger attacks, namely Disp and IPR, where the attacker can even modify at the instruction level. Table 4 includes the list of attacks we experimented with in this paper, where the column 'Threat Model' indicates their alignment with our threat model. In Figures 4 and 5, we showed the attack success rates for these attacks. Also, we have included how all these attacks work and how they were implemented in Appendix A.4.
Five experts reviewed the paper. All but one reviewer were positive, and the only negative reviewer's questions were addressed by the rebuttal, per the AC's understanding. Hence, the decision is to recommend the paper for acceptance.
Why not a higher score
A good dataset was proposed, but the method and experiments could be improved
Why not a lower score
reviewers like the paper in general
Accept (poster)