Dissect Black Box: Interpreting for Rule-Based Explanations in Unsupervised Anomaly Detection
Abstract
Reviews and Discussion
The paper addresses the challenge of distinguishing between normal and anomalous structured data. A new method is designed to interpret and understand the structure of normal data distributions. It integrates anomaly detection model predictions into its splitting criteria to enhance the clustering process. In addition, a complementary algorithm that defines boundaries within each segmented distribution is proposed.
Strengths
- The designed Gaussian boundary delineation algorithm helps in managing high-dimensional data and ensures robustness against data drift and perturbations.
- Extensive evaluations are conducted for comparisons against five established baseline anomaly detection models across four diverse datasets.
Weaknesses
- The writing should be enhanced as some technical concepts are unclear. For example, a running example could be provided to explain the SCD tree and GBD algorithms.
- The work targets structured, tabular data, which should be made clear at the beginning of the paper. How to build such an SCD tree for unstructured data is not explored.
Questions
- How to implement the SCD tree for unstructured data?
- What is the typical running efficiency?
- Considering the data drifts and perturbations, does the naive Gaussian-based assumption work for more heterogeneous drifting data?
Limitations
The authors have broadly discussed the limitations of the work.
Most of the concerns are discussed and clarified during the rebuttal stage. I would like to update the overall score.
Thank you for your valuable feedback. We appreciate your insights and suggestions, which have helped us to improve our paper.
Comment 1: “The writing should be enhanced as some technical concepts are unclear. For example, a running example could be provided to explain the SCD tree and GBD algorithms.”
Response: We agree that a running example would greatly clarify the technical concepts of the SCD tree and GBD algorithms. We have added a detailed running example in Section 4.2, “Methodology Overview,” and pseudocode in Appendix A.2 where we walk through the entire process of constructing an SCD tree and applying the GBD algorithm using a sample dataset. This example includes:
- Initial Data Segmentation (SCD-Tree)
  - The SCD-Tree uses the outputs of the anomaly detection model to segment the data.
  - The tree structure is built by recursively splitting the data based on the model's outputs.
- Boundary Delineation (GBD)
  - Within each segment, the GBD algorithm uses Gaussian Processes to define decision boundaries.
  - For instance, in a segment with features A and B, the algorithm determines the boundary where the likelihood of being normal is highest.
  - These boundaries are then refined to capture the nuances of the data distribution.
- Rule Extraction
  - The refined boundaries are translated into interpretable rules.
  - For the example dataset, a rule might be: "If A > 5 and B < 10, then the data point is normal."
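To make the rule format concrete, here is a minimal illustrative sketch (our own, not the paper's code) of how such conjunctive threshold rules could be represented and checked against incoming points:

```python
import operator

OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}

# The rule from the example above: "If A > 5 and B < 10, then normal."
rule = [("A", ">", 5.0), ("B", "<", 10.0)]

def satisfies(point, conditions):
    """True if the point meets every threshold condition in the rule."""
    return all(OPS[op](point[feat], thr) for feat, op, thr in conditions)

print(satisfies({"A": 6.2, "B": 3.1}, rule))  # True  -> classified normal
print(satisfies({"A": 4.0, "B": 3.1}, rule))  # False -> flagged anomalous
```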
Comment 2.1: “The work targets the structured, tabular data which should be clear at the beginning of the paper.”
Response: We apologize for the lack of clarity regarding the data type our method targets. We have revised the introduction to explicitly state that our approach is designed for structured, tabular data. Additionally, while our method is designed for structured, tabular data, we acknowledge the importance of extending interpretability methods to unstructured data (e.g., text, images). Potential strategies for applying the SCD-Tree to unstructured data include:
- Feature Engineering: Transforming unstructured data into structured formats using techniques such as text vectorization or image feature extraction.
- Hierarchical Clustering: Using hierarchical clustering to segment unstructured data before applying the SCD-Tree.
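As a rough illustration of the second strategy, the following hedged sketch pre-segments extracted feature vectors with off-the-shelf hierarchical clustering before any SCD-Tree is built; the feature dimensionality and cluster count are arbitrary placeholders, not values from the paper:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 64))  # e.g., vectors from text/image encoders

# Pre-segment the unstructured-data features into coarse clusters.
labels = AgglomerativeClustering(n_clusters=4).fit_predict(features)
segments = [features[labels == k] for k in range(4)]  # one SCD-Tree per segment
```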
We will discuss these potential extensions further in future work. Thank you for your carefully detailed suggestions, which have helped us improve the comprehensibility of the article!
Comment 2.2: “How to build such an SCD tree for unstructured data is not explored.”
Response: In future work, we envisage building the SCD tree for unstructured data. To extend the SCD tree to unstructured data such as images, we propose the following steps:
- Image Data Feature Engineering. Use convolutional neural networks (CNNs) or other feature extraction techniques to convert images into feature vectors. Pretrained models (e.g., VGG16, ResNet) can be utilized to extract high-level features from images.
- Data Segmentation (SCD-Tree). Once the unstructured data is transformed into a structured format, the SCD-Tree can be applied in the same manner as for structured data. The tree uses the extracted features (at channel or patch granularity) to segment the data based on the anomaly detection model's outputs. The subsequent GBD and rule extraction steps work in the same way as in the original pipeline. A sketch of the feature extraction step follows this list.
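A hedged sketch of the feature engineering step, assuming a pretrained torchvision ResNet-18 as the extractor and an illustrative image path; this is one possible pipeline, not the paper's implementation:

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Drop the final classification layer to obtain 512-d feature vectors.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

with torch.no_grad():
    # "sample.png" is a hypothetical path standing in for any input image.
    img = preprocess(Image.open("sample.png").convert("RGB")).unsqueeze(0)
    features = backbone(img)  # shape (1, 512): one structured feature row
```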
Comment 3: “What is the typical running efficiency?”
Response: We appreciate the reviewer's interest in the running efficiency of our model. We have conducted a thorough evaluation of both the training and inference times of our method. Detailed results are presented in Appendix A.4.1, where we provide comprehensive performance metrics for various datasets.
Specifically, Figure 2c in the main manuscript illustrates the typical running efficiency. This figure shows the training and inference times for our proposed method across different data dimensions, highlighting its scalability and efficiency. The results indicate that while the training time scales with the number of features and data points, the inference time remains consistently low due to the rule-based nature of our method. These findings demonstrate that our method is both efficient and scalable, making it suitable for deployment in high-stakes environments where timely decision-making is crucial.
Comment 4: “Considering the data drifts and perturbations, does the naive Gaussian-based assumption work for more heterogeneous drifting data?”
Response: The naive Gaussian-based assumptions can be effectively applied to more heterogeneous drift data.
- The probabilistic nature of Gaussian Processes provides a measure of uncertainty in the predictions. This is particularly useful for heterogeneous drifting data, as it allows the model to quantify the confidence in its predictions.
- Gaussian Processes are inherently flexible and capable of modeling complex, non-linear relationships in data, allowing GPs to adapt to local patterns and variations. GPs can also dynamically adjust their parameters to capture the nuances of different segments, ensuring accurate representation of the underlying distributions.
- In our methodology, the SCD-Tree initially segments the data into more homogeneous regions. Within each segment, the GBD algorithm applies the Gaussian-based assumptions, which ensures that GPs are applied within localized regions where they are more likely to hold, even if the overall data distribution is heterogeneous.
The experimental results in Table 3 of Section 6.3 show that our method maintains high fidelity and robustness, indicating that the Gaussian assumptions are sufficiently flexible to handle diverse and drifting data patterns.
Thank you for your constructive review. We hope these changes meet your expectations and look forward to any further comments you may have.
Hello,
Thank you so much for your reply. One more question: from my understanding, the Gaussian distribution is a naive assumption in many machine learning models, even for mixtures of Gaussians. I do not find the argument that "the naive Gaussian-based assumptions can be effectively applied to more heterogeneous drift data" convincing. Maybe the authors could provide more thoughts on that.
Thank you for your insightful comment regarding the use of Gaussian Processes (GPs) in our framework. We understand the concern about the adequacy of Gaussian assumptions, particularly in scenarios involving heterogeneous or drifting data. Our choice of GPs, however, is grounded in their unique strengths and the specific nature of our application, which we would like to clarify further.
Gaussian Processes are favored in our method primarily because of their ability to provide not just point estimates but also a measure of uncertainty through variance predictions. This feature is especially critical in high-stakes anomaly detection, where understanding the confidence in model predictions is as important as the predictions themselves. The uncertainty estimates offered by GPs allow our model to adapt more effectively to data variations, including heterogeneous drift.
While it is true that a single Gaussian distribution might be too simplistic for complex data, GPs offer a more sophisticated approach by being able to model complex, non-linear relationships. GPs do not assume a single global Gaussian distribution for the entire dataset; instead, they create a smooth, flexible function that can adapt to the local structure of the data. This adaptability is crucial in our methodology, where the data may exhibit different behaviors in different regions of the feature space. For instance, in our application, the GBD algorithm applies GP locally within each segment identified by the SCD-Tree, allowing it to accurately model the nuances of each specific data region.
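For illustration, the following minimal sketch (our assumption of one plausible setup, using scikit-learn rather than the paper's code) fits a GP within a single segment and reads out both the mean prediction and the per-point uncertainty; the toy segment data and kernel choice are placeholders:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X_seg = rng.uniform(0, 10, size=(80, 1))                 # points in one segment
y_seg = np.sin(X_seg).ravel() + rng.normal(0, 0.1, 80)   # local target signal

# Fit a GP locally, within the segment identified by the SCD-Tree.
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X_seg, y_seg)

X_query = np.linspace(0, 10, 5).reshape(-1, 1)
mean, std = gp.predict(X_query, return_std=True)  # std quantifies confidence
```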
In our empirical studies, presented in Tables 3 and 4 of the paper, and in the noise experiments shown in Table 6 of the global response attachment, we demonstrate that the integration of GPs within our framework enhances the robustness and interpretability of the model, even in the presence of heterogeneous data or drift. The performance metrics consistently show that our method maintains high fidelity and low false positive rates across various datasets, underscoring the effectiveness of GPs in this context. The ablation studies further confirm that removing the GBD component leads to a decrease in performance, which underscores the value added by the probabilistic modeling that GPs provide.
While GPs have shown strong performance in our experiments, we acknowledge that no single method is universally optimal. As such, we are exploring other probabilistic models, such as Variational Inference techniques or non-parametric methods like Bayesian Non-Parametrics, which could potentially offer even greater flexibility for highly complex and non-stationary data. These explorations will be part of our future work.
We appreciate your constructive feedback and are open to further discussions or suggestions on how to enhance our approach.
This paper introduces the Segmentation Clustering Decision Tree (SCD-Tree) and Gaussian Boundary Delineation (GBD) algorithm to interpret black-box anomaly detection models in high-stakes domains. The method segments high-dimensional data, incorporates model predictions into decision criteria, and defines flexible boundaries to distinguish normal from anomalous data points. Evaluations across diverse datasets demonstrate superior explanation accuracy, fidelity, and robustness compared to existing methods.
Strengths
- The proposed approach of using rule-based interpretations for anomaly detection results is intriguing, addressing a critical gap in the field and offering a fresh perspective on model explainability.
- The clarity of the empirical study is commendable, effectively showcasing the proposed method's robustness and versatility.
Weaknesses
- The proposed method primarily builds upon some existing techniques. While the integration and adaptation of these techniques are innovative to some extent, the foundation lacks substantial originality. The approach leverages well-known concepts without introducing new theoretical insights or methodologies.
- The clarity of the paper is compromised by the disorganized structure of the related work section, making it challenging for readers to follow the logical flow of the discussion. To improve clarity, the sub-sections should be organized more coherently.
- While the current evaluation demonstrates the potential of the proposed method, using more advanced anomaly detection models and realistic datasets would provide a more comprehensive and convincing validation. (i) The experiments use AE, VAE, OC-SVM, and iForest as black-box models. To better illustrate the applicability and robustness of the proposed interpretation model, it would be beneficial to include the latest state-of-the-art deep anomaly detection models. (ii) The curse of dimensionality is highlighted as a challenge in the paper. However, the highest dimensionality of the datasets used in the experiments is only 80. The proposed method should be tested on datasets with much higher dimensions.
Questions
Are there any theoretical advancements or unique aspects of these methods distinguishing them from similar techniques?
Limitations
The authors have adequately addressed the limitations of their work in section B.4 of the paper. They discuss several key limitations.
Thank you! Your feedback has been invaluable in enhancing the clarity and impact of our work.
Comment 1: The proposed method primarily builds upon some existing techniques. ... While the integration and adaptation of these techniques are innovative to some extent, the foundation lacks substantial originality.
Response: While it is true that our method integrates existing techniques like decision trees and Gaussian Processes, the novelty lies in how these techniques are combined and applied to the problem of interpreting black-box anomaly detection models in high-risk fields like cybersecurity.
We observed that traditional methods struggle with rule fitting for high-dimensional data due to their reliance on direct Euclidean distance calculations in feature space. To address this, we developed the SCD-Tree, which integrates anomaly detection model outputs directly into its splitting criteria. Unlike traditional decision trees, our approach leverages the decision-making results of black-box models to capture complex data distributions effectively, and this unsupervised splitting criterion departs from traditional entropy-based approaches.
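To make this concrete, here is a hedged, minimal sketch of one plausible score-driven splitting criterion; the quantile candidate thresholds and the mean-score-gap objective are our illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def best_split(X, scores):
    """Pick the (feature, threshold) whose split maximizes the gap in mean
    anomaly score between the two children -- a score-driven criterion used
    in place of label entropy."""
    best_feat, best_thr, best_gap = None, None, -np.inf
    for j in range(X.shape[1]):
        for thr in np.quantile(X[:, j], [0.25, 0.5, 0.75]):
            left = scores[X[:, j] <= thr]
            right = scores[X[:, j] > thr]
            if len(left) == 0 or len(right) == 0:
                continue
            gap = abs(left.mean() - right.mean())
            if gap > best_gap:
                best_feat, best_thr, best_gap = j, thr, gap
    return best_feat, best_thr

X = np.random.default_rng(2).normal(size=(100, 3))
scores = (X[:, 0] > 0).astype(float)  # toy black-box anomaly scores
print(best_split(X, scores))          # likely splits on feature 0
```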
In addition, we recognized that outliers often exist within normal data in anomaly detection scenarios. Traditional methods typically have rigid decision boundaries, leading to reduced robustness and the potential for false alarms, a significant concern in the security field [1]. To mitigate this, we introduced the GBD algorithm, providing flexible boundaries that better accommodate data variability. By integrating GBD with the SCD-Tree, our method offers an interpretable and robust framework for anomaly detection that maintains high fidelity to the original black-box models while reducing the incidence of false alarms.
Comment 2: The clarity of the paper is compromised by the disorganized structure of the related work section. ...To improve clarity, the sub-sections should be organized more coherently.
Response: We appreciate the advice on the related work. We have reorganized the related work section into clearer sub-sections, each focusing on a specific aspect of related research: Unsupervised Anomaly Detection Techniques, Interpretability in Anomaly Detection, and Issues in Existing Interpretation Approaches for Anomaly Detection Models. Due to word limits in the discussion phase, we will provide the full revised related work section in the final version.
Thank you very much for your advice. This reorganization helps to show the logical flow more clearly.
Comment 3(i): The experiments use AE, VAE, OC-SVM as black-box models. ... it would be beneficial to include the latest state-of-the-art deep anomaly detection models.
Response: We initially aimed to demonstrate the fit of our model with models as classical as possible, and the experimental results bear this out. We acknowledge the importance of using state-of-the-art deep anomaly detection models. To address this, we have extended our experimental evaluation to include recent advanced models such as VRAE and DAGMM. The results are shown in Table 3 of the global response. These models represent the latest advancements in deep anomaly detection and provide a more comprehensive validation of our method's applicability.
Comment 3(ii): The curse of dimensionality is highlighted as a challenge in the paper. However, the highest dimensionality of the datasets used in the experiments is only 80.
Response: We appreciate your concern regarding the dimensionality of our datasets. In anomaly detection, the curse of dimensionality is particularly challenging due to the inherent sparsity and complexity of high-dimensional data. Although 80 features might appear moderate in other fields, they are considered high-dimensional within anomaly detection. We have carried out an exhaustive study of datasets in the field of anomaly detection and have added a summary table to the attachment of the global response. Our chosen datasets, such as CIC-IDS2017 and TON-IoT, are standard benchmarks in this domain, typically ranging from 10 to 80 features. These dimensions capture real-world complexity; each feature signifies a specific aspect of network traffic.
In practice, datasets with 80 features are sufficient for effective anomaly detection, as demonstrated by our method's high fidelity and robustness across these benchmarks. We have also added to the global response the results of our model's experiments on data of different dimensions, demonstrating its ability to handle high-dimensional data. Research has shown that more features do help in security tasks, so building datasets with a higher number of features is an emerging trend. We are also collecting higher-dimensional datasets of network packets, and in future work we will apply our method to unstructured data to test its effectiveness in higher dimensions.
Comment 4: Are there any theoretical advancements or unique aspects of these methods distinguishing them from similar techniques?
Response: Yes, our method introduces several theoretical advancements and unique aspects that distinguish it from similar techniques:
- The SCD-Tree uses the outputs of anomaly detection models directly in its splitting criteria, which is a novel unsupervised approach to enhance the tree's ability to capture complex data distributions.
- The GBD algorithm refines the decision boundaries within each segment using Gaussian Processes, providing a probabilistic framework that quantifies the uncertainty in boundary definitions, enhancing robustness against data drift.
We have also restated the necessity and innovation of our method in the global response. Thank you for your professional comments and suggestions.
Reference:
[1] Hassan, Wajih Ul, et al. "Nodoze: Combatting threat alert fatigue with automated provenance triage." network and distributed systems security symposium. 2019.
Thanks for the response. Some of my concerns are addressed. I would like to raise my score.
One minor point: references [8] and [18] are duplicated.
Thank you very much for your reply; your meticulous suggestions are very important to us. We will address this in the final version of the paper and merge references [8] and [18]. We hope all is well with you.
The paper proposes a general method to extract interpretable rules from any anomaly detection model. A decision tree is learned from a black-box anomaly detector's outputs/scores, and the decision boundaries in the learned tree are further refined using a Gaussian Process framework.
Strengths
The paper tries to address an important issue with anomaly detection: explainability. It presents results against relevant baseline methods for explainability.
Weaknesses
Not all claims are rigorously supported with evidence.
Main comments:
- Lines 106-110: "In summary ... their operational logic." -- The only concrete evidence provided in the paper is for robustness. The paper has not systematically addressed attributes such as interpretability and non-reliance on oversimplified surrogate models.
- Section 4: Rule extraction using scores from other models has been researched in earlier literature (e.g., [1, 2, 3, 4]). The current paper should discuss the differences with prior literature and whether any of those earlier techniques can be utilized here.
- Section 5: A justification for using Gaussian Processes is not presented. Why not use some other model such as KDE?
- Section 5: There should be an ablation experiment to show the benefits with boundary estimation vs. without.
- Lines 267-268: "For the calibration of our anomaly detection models' hyperparameters, only normal instances from these datasets are utilized." -- This contradicts the claim that the algorithm is unsupervised, as this statement implies that labeled normal instances are available.
- Line 282 -- Fidelity and Robustness are predictive measures, not interpretability. The more relevant interpretability measures in [49] are 'Number of rules' and 'Average rule length'. These must be shown here with respect to interpretability.
- Line 312: "...proving its resilience to data drift and..." -- 'data drift' is a different concept -- it means that over time the inherent data characteristics change permanently. What probably is being implied here is sample variance. The paper has not presented any evidence of being able to handle data drift.
Questions
A justification for using Gaussian Processes is not presented. Why not use some other model such as KDE?
Limitations
NA
Thank you for your thorough review and insightful feedback on our paper.
Comment 1: Lines 106-110: "In summary ... their operational logic." The paper has not systematically addressed attributes such as interpretability and non-reliance on oversimplified surrogate models.
Comment 6: Line 282 -- The more relevant interpretability measures in [49] are 'Number of rules' and 'Average rule length'.
Response: Thank you for pointing out this oversight. We had intended to use robustness to show that our model explains the black-box model well. We agree that we should provide concrete evidence for interpretability and non-reliance on oversimplified surrogate models.
In the revised paper, we incorporate these metrics ('Number of Rules' and 'Average Rule Length') into our evaluation. Additionally, we add ablation experiments to demonstrate the interpretability. The results are attached in Table 4 and Table 1 of the global response.
| Dataset | Number of Rules (AE) | Number of Rules (VAE) | Number of Rules (IFOREST) | Average Rule Length (AE) | Average Rule Length (VAE) | Average Rule Length (IFOREST) |
|---|---|---|---|---|---|---|
| CIC-IDS | 22 | 15 | 17 | 4.83 | 3.03 | 4.97 |
| TON-IoT | 21 | 23 | 13 | 5.00 | 5.00 | 5.00 |
| KDDCup | 17 | 19 | 21 | 5.00 | 4.70 | 5.00 |
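For concreteness, a tiny sketch (our illustration) of how these two metrics can be computed once rules are represented as lists of threshold conditions:

```python
# Rules as lists of (feature, op, threshold) conditions; values are toy examples.
rules = [
    [("A", ">", 5.0), ("B", "<", 10.0)],
    [("C", "<=", 2.5)],
]
number_of_rules = len(rules)
average_rule_length = sum(len(r) for r in rules) / len(rules)
print(number_of_rules, average_rule_length)  # 2 1.5
```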
Comment 2: Section 4: The current paper should discuss the differences with prior literature and whether any of those earlier techniques can be utilized here.
Response: We appreciate the references to earlier work. Our approach differs from prior literature primarily in the integration of Segmentation Clustering Decision Tree with Gaussian Boundary Delineation.
To clarify these distinctions, we include a more comprehensive related work section that discusses the differences between our method and previous techniques [33, 34, 35, 36], specifically highlighting the novelty and advantages of combining SCD-Tree with GBD for rule extraction. Additionally, we explore the potential applicability of earlier techniques to our methodology and discuss their comparative performance.
Comment 3 & Questions: Section 5: A justification for using Gaussian Processes is not presented. Why not use some other model such as KDE?
Response: We use Gaussian Processes (GPs) for their probabilistic framework, which quantifies uncertainty at each prediction point through variance. This capability is crucial for ensuring robustness in boundary delineation by allowing us to assess confidence levels in decision boundaries. KDE is a non-parametric way to estimate the probability density function of a random variable. While KDE is effective in density estimation, it does not inherently provide a measure of uncertainty. This lack of uncertainty quantification limits the robustness of boundary delineation, especially in high-stakes anomaly detection tasks.
The probabilistic nature of GPs allows us to define decision boundaries that take into account both the mean and variance of the predictions. By setting thresholds on the mean and variance, we can delineate boundaries that are not only accurate but also resilient to variations and perturbations in the data. KDE can estimate density contours, but without an inherent measure of uncertainty, it may not delineate boundaries as effectively in terms of robustness.
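The contrast can be sketched in a few lines; this is our illustration with scikit-learn stand-ins and arbitrary thresholds, not the paper's GBD implementation:

```python
import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(1)
X = rng.normal(0, 1, size=(100, 1))
y = (np.abs(X.ravel()) < 1).astype(float)  # toy "is normal" signal

kde = KernelDensity(bandwidth=0.5).fit(X)
log_density = kde.score_samples(np.array([[0.0]]))  # density only, no uncertainty

gp = GaussianProcessRegressor().fit(X, y)
mean, std = gp.predict(np.array([[0.0]]), return_std=True)

# Boundary rule using both moments: accept only confident, high-mean points.
is_inside = (mean > 0.8) & (std < 0.2)
```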
Comment 4: Section 5: There should be an ablation experiment to show the benefits with boundary estimation vs. without.
Response: We agree that an ablation study is necessary to demonstrate the benefits of boundary estimation. We conducted an ablation experiment comparing our method with and without the GBD step. The results are presented in the global rebuttal and in Appendix A.4.5 of the final version; they show that the rule-based GBD step improves accuracy by up to 0.12.
Comment 5: Lines 267-268: "only normal instances from these datasets are utilized." -- This contradicts the claim that the algorithm is unsupervised as this statement implies that labeled normal instances are available.
Response: We apologize for the confusion. Our approach is indeed unsupervised; for the calibration of hyperparameters, we utilize a small portion of one-class data presumed to be normal based on domain knowledge, not labeled instances. This preserves the unsupervised nature of our anomaly detection approach.
While using purely normal data is the ideal state, in real-world scenarios, some attack data may inevitably be present. We conducted experiments using the CIC-IDS and TON-IoT datasets to assess the efficacy of our method under varying percentages of "noisy" data. Our results demonstrate that the model maintains high levels of fidelity and robustness even as noise levels increase.
| Dataset | Noise Level (%) | TPR | Fidelity |
|---|---|---|---|
| CIC-IDS | 1 | 0.910 | 0.943 |
| CIC-IDS | 6 | 0.897 | 0.935 |
| CIC-IDS | 8 | 0.890 | 0.928 |
| CIC-IDS | 10 | 0.856 | 0.920 |
| TON-IoT | 1 | 0.995 | 0.991 |
| TON-IoT | 6 | 0.975 | 0.987 |
| TON-IoT | 8 | 0.971 | 0.979 |
| TON-IoT | 10 | 0.966 | 0.969 |
Comment 7: Line 312: 'data drift' is a different concept -- it means that over time the inherent data characteristics change permanently. What probably is being implied here is sample variance.
Response: Thank you for bringing this blunder to our attention. We acknowledge the misuse of the term 'data drift'. What we intended to convey is the model's ability to handle sample variance and minor perturbations. We will correct this terminology to 'Data Variability' in the final version and provide a more accurate description of our method's resilience to sample variance.
Our model's ability to handle sample variance is achieved through the dynamic nature of the Gaussian Processes used in the boundary delineation step. GPs provide a probabilistic framework that can adapt to changes in the data distribution by continuously updating the mean and variance estimates based on new data points. Ablation experiments demonstrated the effectiveness of the method in dealing with sample variance.
I thank the authors for responding to my comments. A couple of my concerns remain:
- The new results for interpretability do not compare against benchmark algorithms. Hence it is hard to say whether the proposed one is better.
- Even if a little data is being used for calibration, it is still labeled data available at the time of training. Hence 'weakly supervised' might be more appropriate.
Overall, the authors' response has satisfied most of my concerns and hence I will increase my score.
Thank you for your valuable feedback on the need to compare our interpretability results against benchmark algorithms.
Comment 1: “The new results for interpretability do not compare against benchmark algorithms. Hence it is hard to say whether the proposed one is better.”
We apologize that, due to time constraints, we cannot immediately provide comparison results across different benchmarks. In the revised manuscript, we will include comparisons of our method with well-established interpretability methods, such as LIME (Local Interpretable Model-agnostic Explanations) [1] and SHAP (SHapley Additive exPlanations) [2]. Both LIME and SHAP are widely recognized in the literature for providing interpretable models, especially in the context of black-box models. Specifically, LIME generates local linear models that approximate the decision boundaries of the original black-box model, while SHAP leverages Shapley values from cooperative game theory to attribute feature importance.
We conducted additional experiments to evaluate the interpretability of our method relative to these benchmarks. The comparison metrics include Number of Rules, Average Rule Length, and Fidelity.
Our findings, which will be presented in the revised manuscript, demonstrate that the SCD-Tree combined with GBD not only generates fewer rules with shorter average lengths but also maintains higher fidelity compared to these benchmarks. These improvements are particularly significant in high-dimensional datasets, where traditional methods like LIME and SHAP may generate overly complex or less intuitive explanations.
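For reference, fidelity in this comparison can be computed as simple agreement between the surrogate rules and the black-box decisions; the sketch below reflects the standard definition and is our illustration, not the paper's evaluation code:

```python
import numpy as np

def fidelity(rule_preds, blackbox_preds):
    """Fraction of points on which the rule-based surrogate agrees with
    the black-box model's normal/anomalous decision."""
    return float(np.mean(np.asarray(rule_preds) == np.asarray(blackbox_preds)))

print(fidelity([1, 1, 0, 1], [1, 0, 0, 1]))  # 0.75
```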
References:
[1]. Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). “Why should I trust you?” Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1135-1144).
[2]. Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems (pp. 4765-4774).
Comment 2: “Even if a little data is being used for calibration, it is still labeled data available at the time of training. Hence 'weakly supervised' might be more appropriate.”
We appreciate the reviewer’s insight regarding the use of labeled data for calibration and the terminology used to describe our method. This is indeed a crucial aspect of the methodological clarity and accuracy.
Our method is designed to operate primarily in an unsupervised manner, where the majority of the data used is unlabeled. However, as you correctly noted, a small portion of labeled data is utilized during the calibration phase to fine-tune the hyperparameters and thresholds within the model. This process ensures that the model is optimized for the specific characteristics of the dataset, enhancing its overall performance.
Given this use of labeled data, we agree that ‘weakly supervised’ is a more accurate descriptor. The term ‘weakly supervised’ is widely accepted in the literature to describe methods that rely predominantly on unlabeled data but incorporate some level of supervision, often minimal, to guide the learning process.
In light of this, we have revised the manuscript to describe our method as ‘weakly supervised’ rather than ‘unsupervised.’ This change not only accurately reflects the methodological approach but also aligns our work with the broader literature on weakly supervised learning. We believe that this adjustment will help clarify the nature of our approach and its reliance on minimal labeled data for calibration.
In high-stakes sectors like network security and IoT security, accurately distinguishing between normal and anomalous data is critical. This paper introduces a novel method to interpret decision-making processes of anomaly detection models without labeled attack data. It presents the Segmentation Clustering Decision Tree (SCD-Tree) and the Gaussian Boundary Delineation (GBD) algorithm. The SCD-Tree enhances clustering by integrating model predictions, while the GBD algorithm defines boundaries within segmented distributions, delineating normal from anomalous data. This approach addresses the challenges of dimensionality and data drift, ensuring robustness and flexibility in dynamic environments. The method transforms complex anomaly detection into interpretable rules, demonstrating superior explanation accuracy, fidelity, and robustness compared to existing methods, proving effective in high-stakes scenarios where interpretability is essential.
Strengths
- By addressing the curse of dimensionality, the approach effectively segments high-dimensional data into more manageable parts, allowing for better clustering and anomaly detection.
- The method ensures robustness against data drift and perturbations by using flexible boundary fitting, which adapts to changes in data distribution over time.
- The method is well formulated, which makes it easy to understand.
Weaknesses
- While the method shows robustness across several datasets, it might not perform equally well in all types of data or anomaly detection scenarios, especially those vastly different from the ones tested.
- The effectiveness of the interpretative rules depends heavily on the quality of the initial black-box model.
Questions
What are the limitations of the SCD-Tree and GBD algorithms when dealing with extremely large datasets or real-time data streams?
Limitations
The limitations have been discussed in the paper.
We sincerely thank you for your insightful feedback.
Question 1: While the method shows robustness across several datasets, it might not perform equally well in all types of data or anomaly detection scenarios.
Response: We acknowledge the concern regarding the generalizability of our method to vastly different types of data and anomaly detection scenarios. To address this, we have conducted additional experiments on datasets from domains not covered in our initial evaluation. The results, as shown in the attachment of global response Table 2, indicate that our method maintains high levels of robustness and interpretability across datasets in various fields.
However, we also recognize the inherent limitations in any single method's universal applicability. Therefore, we propose the following solutions to further enhance the generalizability of our approach, which we will implement in future work:
- Adaptive Thresholding. Implementing adaptive thresholding mechanisms that dynamically adjust based on the characteristics of the input data can improve performance across diverse scenarios.
- Domain-Specific Fine-Tuning. Developing domain-specific fine-tuning protocols to adjust the SCD-Tree and GBD algorithms based on the unique properties of different datasets.
Question 2: The effectiveness of the interpretative rules depends heavily on the quality of the initial black-box model.
Response: You hit the nail on the head; this is exactly what our work focuses on! We appreciate your insight regarding the dependency of our interpretative rules on the initial black-box model's quality. Indeed, the primary motivation behind our research is to enhance the interpretability of black-box models in anomaly detection. Our goal is to distill the black-box model into understandable rules using a rule-based approach, which is essential for high-risk domains. The aim is not only to explain the decision-making processes of these models but also to improve their transparency and trustworthiness.
However, we would like to highlight several key aspects of our methodology that ensure its effectiveness, even when the black-box model is relatively small or less complex:
- We have unified the output standards of the black-box models by using metrics such as Mean Squared Error (MSE) and threshold values to represent data normality. These standardized metrics serve as inputs for the SCD-Tree, ensuring consistent and reliable decision-making regardless of the black-box model's complexity (a sketch follows this list).
- The SCD-Tree effectively segments the data into smaller, more manageable clusters. This segmentation reduces the dependency on the black-box model's complexity by isolating simpler patterns within each cluster.
- We conducted extensive experiments using various black-box models (e.g., AE, VAE, Isolation Forest, One-Class SVM) of different sizes and complexities. The results in Section 6.3 demonstrate that our method maintains high interpretability and accuracy on the currently dominant neural network models in the anomaly detection field, even with smaller models. For instance, our experiments with a simple autoencoder model yielded interpretative rules with high fidelity and robustness, as evidenced by the performance metrics provided in Table 3 and Table 4(1).
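As an illustration of the first point, here is a minimal sketch (our assumption of one way such standardization could look) that turns an autoencoder's reconstruction MSE into a threshold-scaled normality score:

```python
import numpy as np

def normality_score(x, x_recon, threshold):
    """Reconstruction MSE scaled by a calibrated threshold; values below 1
    suggest normal behavior, values above 1 suggest anomalies."""
    mse = float(np.mean((np.asarray(x) - np.asarray(x_recon)) ** 2))
    return mse / threshold

print(normality_score([1.0, 2.0], [1.1, 1.9], threshold=0.05))  # 0.2 -> normal
```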
Question 3: What are the limitations of the SCD-Tree and GBD algorithms when dealing with extremely large datasets or real-time data streams?
Response: The SCD-Tree and GBD algorithms, while effective, have certain limitations when handling extremely large datasets or real-time data streams. Here are the key challenges:
- The time complexity of Gaussian Processes (GPs) is relatively high (O(n^3) in the number of training points), which can lead to longer processing times when dealing with high-dimensional data. However, we use the Segmentation Clustering Decision Tree (SCD-Tree) for spatial partitioning, so this issue only arises when there is a large amount of data within a particular subspace. By partitioning the data, we localize the high complexity to smaller regions: with S segments of roughly n/S points each, the cost drops from O(n^3) to O(S * (n/S)^3) = O(n^3 / S^2), making the overall process more manageable.
- Our hierarchical pipeline involves several steps: running the black-box model, applying the SCD-Tree, performing Gaussian Process delineation, and boundary acquisition, which makes the process lengthy. However, once the data has been partitioned by the SCD-Tree, each subspace can be processed in parallel, which reduces the training time.
However, the aforementioned challenges primarily occur during the training phase. In real-world scenarios, once the model has been trained and the rules have been inferred, anomaly detection only requires checking whether a sample satisfies these rules. The rule-based interpretation remains efficient and scalable for real-time applications and is well suited to high-performance settings (e.g., P4 [1], reaching up to 100 Gbps throughput by integrating rule-based models [2]), which allows for data plane processing with high efficiency and low latency.
Our model is also designed to handle new data continuously and to achieve incremental rule updates. Therefore, even when encountering new types of data, there is no need to re-train the entire model. Instead, we can incrementally update the rules to adapt to new data, ensuring the model remains accurate and up-to-date without extensive re-training, which is efficient in real-world use.
Thank you for your constructive feedback, which has been invaluable in guiding these enhancements.
References: [1] P. Bosshart, D. Daly, G. Gibb, M. Izzard, N. McKeown, J. Rexford, C. Schlesinger, D. Talayco, A. Vahdat, G. Varghese, and D. Walker, “P4: programming protocol-independent packet processors,” ACM SIGCOMM Computer Communication Review, 44(3):87–95, 2014
[2] R. Li, Q. Li, Y. Zhang, D. Zhao, X. Xiao, and Y. Jiang, “Genos: General in-network unsupervised intrusion detection by rule extraction,”
Thank you for the reply. It resolved my concerns. I will keep my rating positive.
We are very glad that our efforts could clarify some of your concerns, and we have learned a lot from your valuable replies! Your suggestions are very meaningful and have helped us improve our work substantially!
We appreciate the reviewers' recognition of our paper's contribution to advancing anomaly detection in high-stakes environments ("The method is well formulated" by ZxG9, "address an important issue with anomaly detection" by kdm5, "addressing a critical gap in the field and offering a fresh perspective on model explainability" by 7Q5n, "addresses the challenge of distinguishing between normal and anomalous structured data" by dX38).
We are very happy that the reviewers were interested in the innovativeness of our method and the reasons for using it! By providing explanations that are both accurate and comprehensible, our method increases the trustworthiness and reliability of anomaly detection systems in environments where precision and transparency are crucial.
Novelty. Our method uniquely combines the Segmentation Clustering Decision Tree (SCD-Tree) with Gaussian Boundary Delineation (GBD), offering a novel approach to enhancing interpretability and robustness in unsupervised anomaly detection.
- We observed that traditional methods struggle with rule fitting in high-dimensional data due to the complexity and sparsity of such datasets. To tackle this, we developed the SCD-Tree, which differs from conventional decision trees that rely on Euclidean distances for splits. Instead, our approach integrates predictions from black-box models directly into the tree's splitting criteria, enhancing its ability to identify distinct data patterns and extract meaningful rules. This innovative unsupervised method overcomes the limitations of entropy-based calculations, enabling better utilization and comprehension of black-box model insights. The SCD-Tree partitions data into meaningful segments, ensuring each segment represents a distinct data pattern, facilitating the extraction of interpretable rules. Data segmentation is also an important way in which our model is able to handle high-dimensional data.
- In anomaly detection scenarios, normal data often contain outliers, and existing rule extraction methods with rigid boundaries fail to accommodate these variations, leading to issues like "false alarm/alert fatigue," a significant concern in cybersecurity. To address this, we incorporated the Gaussian Boundary Delineation to provide a probabilistic framework that defines boundaries and quantifies the associated uncertainty, enhancing model robustness by adapting to data variability. This integration ensures that boundaries are flexible and resilient, accommodating shifts in data patterns and improving overall interpretability.
- Our method consistently achieves high fidelity, showing that the rule-based explanations closely align with the original model's predictions. The system's robustness is evident in its stable performance across diverse datasets and evolving environments. High TPR and TNR metrics further confirm the effectiveness of our algorithms, indicating accurate identification of normal and anomalous instances. These results underscore the accuracy and reliability of our method, validating its suitability for deployment in high-stakes environments where precision is paramount.
Additional Experiments. To address reviewers' questions and reinforce our claims, we conducted additional experiments using datasets from different domains (CIC-IoT from IoT, Credit Card Fraud Detection from finance, and Breast Cancer Wisconsin from healthcare) and state-of-the-art anomaly detection models (LSTM, OC-NN, VRAE, DAGMM). Our model achieves nearly the highest TPR and fidelity on datasets from different domains and consistently shows high accuracy, with VRAE achieving a TPR of 0.9643 and DAGMM a TPR of 0.9931 on the CIC-IDS dataset, illustrating its robustness across cutting-edge models. Ablation experiments showing that our method can achieve a 0.12-point lift demonstrate its applicability in interpretation. We also provide information showing the model's interpretability capabilities in the field of anomaly detection. The results of these additional experiments are presented in a detailed table in the attached PDF.
Related Work. Based on the reviewers' suggestions and requests, we have restructured the related work section and added the comparative advantages of the model over the latest work. The modified "Related Work" section provides a comprehensive overview of existing research in anomaly detection and model interpretability, specifically addressing: Overview of Anomaly Detection Models, Interpretability in Anomaly Detection, and Issues in Existing Interpretation Approaches. In the final version, we will also add to the Appendix the experimental results we promised to the reviewers, to improve the quality of our work.
Again, thank you for your comments. We have made efforts to address the concerns raised, and we hope our revisions meet your expectations. Should there be any further questions or if additional clarifications are needed, we are more than willing to engage in further discussions or conduct additional experiments as necessary during the review phase.
The paper receives four reviews, all of which are inclined to accept the paper. Prior to rebuttal, Reviewers ZxG9 and kdm5 have some major concerns over the robustness of the proposed method, its reliance on the initial black-box model, and the justification of the main claims in the paper. These concerns are addressed properly by the authors' rebuttal, resulting in an increase of their rating to weak accept. In addition, Reviewer dX38 has concerns over the ability of handling non-tabular data, computational efficiency, and the Gaussian-based assumption issue. After multi-round interaction, Reviewer dX38 agrees that these concerns are also mostly addressed and raises the overall score to weak accept. Reviewer ZxG9 has been positive toward the paper from the initial review and keeps the rating after the rebuttal.
On the other hand, there are some weaknesses that might have not been sufficiently addressed, such as the lack of more recent deep anomaly detection methods, small dimensionality size across all datasets used, and limiting to tabular data.
Overall, the work presents an interesting method that leverages Segmentation Clustering Decision Tree (SCD-Tree) and Gaussian Boundary Delineation (GBD) to provide interpretation for anomalies detected by existing methods. All reviewers give positive recommendation to the work. Thus, I recommend accepting the paper. The authors should incorporate the new discussions and empirical results into the camera ready to avoid confusions/concerns raised in the reviews. Adding discussions on the above weaknesses would enhance the paper further.