Open-Set Graph Anomaly Detection via Normal Structure Regularisation
In this work, we propose a novel open-set GAD approach, namely Normal Structure Regularisation (NSReg), to achieve generalised detection ability for unseen anomalies while maintaining effectiveness in detecting seen anomalies.
Abstract
Reviews and Discussion
The paper proposes an approach for open-set graph anomaly detection. The authors introduce a method called Normal Structure Regularisation (NSReg) to regularize the learning process by focusing on the structural relationships among normal nodes in the graph. Experiments show that the framework outperforms baselines on datasets included in this paper.
Strengths
S1. The authors introduce a method called Normal Structure Regularisation (NSReg) to regularize the learning process by focusing on the structural relationships among normal nodes in the graph.
S2. Experiments show that the framework outperforms baselines on datasets included in this paper.
Weaknesses
W1. The contribution seems to be overclaimed. As stated in this paper, open-set GAD is an under-explored problem. However, the definition of open-set GAD problem is very similar to that of out-of-distribution detection, and there are several previous works in this area, such as GNNSAFE[1], EnergyDef[2], and GNSD[3].
W2. Experiments are not comprehensive enough. As for the supervised GAD model, there are several new baselines, such as XGBGraph[4], and CONSISGAD[5]. The authors should include them as baselines as well. Besides, as mentioned above in W1, such a task is almost the same as the out-of-distribution problem, which means the authors need to compare their framework with out-of-distribution models on out-of-distribution datasets, instead of creating synthetic datasets from T-Finance.
W3. Some parts of the paper need to be further explained. For example, in Section 3.3, Normal-node-oriented Relation Generation, the authors set alpha = 0.8 by default but did not provide any intuition or theoretical analysis of such a value.
Questions
Q1. Can the authors explain the differences between open-set GAD and out-of-distribution detection?
Reference
- [1] Qitian Wu, Yiting Chen, Chenxiao Yang, Junchi Yan. Energy-based Out-of-Distribution Detection for Graph Neural Networks. ICLR 2023.
- [2] Zheng Gong, Ying Sun. An Energy-centric Framework for Category-free Out-of-distribution Node Detection in Graphs. KDD 2024.
- [3] Xixun Lin, Wenxiao Zhang, Fengzhao Shi, Chuan Zhou, Lixin Zou, Xiangyu Zhao, Dawei Yin, Shirui Pan, Yanan Cao. Graph Neural Stochastic Diffusion for Estimating Uncertainty in Node Classification. ICML 2024.
- [4] Jianheng Tang, Fengrui Hua, Ziqi Gao, Peilin Zhao, Jia Li. GADBench: Revisiting and Benchmarking Supervised Graph Anomaly Detection. NeurIPS 2023.
- [5] Nan Chen, Zemin Liu, Bryan Hooi, Bingsheng He, Rizal Fathony, Jun Hu, Jia Chen. Consistency Training with Learnable Data Augmentation for Graph Anomaly Detection with Limited Supervision. ICLR 2024.
We thank the reviewer for the constructive and affirming feedback and address the concerns of the reviewer below.
W1 and Q1. Discussion on the Difference Between Open-Set GAD and Out-of-Distribution (OOD) Detection.
We respectfully disagree that open-set GAD is very similar to OOD detection; the two tasks are fundamentally different. In OOD detection, the primary objective is to maintain classification accuracy on in-distribution (ID) data while distinguishing ID from OOD data; it involves a classifier pre-trained on ID data and test data containing both ID and OOD samples. In contrast, open-set supervised GAD aims to identify anomalous nodes among unlabelled nodes, including both seen and unseen anomalies; it involves no pre-trained ID classifier and hence no objective of maintaining ID classification accuracy. Further, the training and test distributions can remain consistent in open-set GAD, whereas OOD detection presumes a distribution shift. Unseen anomalies in open-set GAD arise not from distribution shifts but from the inherent incompleteness of prior knowledge about the anomaly class, which captures only a subset of potential anomaly patterns. These unseen anomaly patterns may already exist in the graph during training and participate in message passing, yet remain unaccounted for by the supervised GAD model's loss function.
Additionally, OOD detection focuses on identifying instances from novel distributions absent during training, typically representing novel classes outside the classes of interest. In open-set supervised GAD, all nodes—labelled or unlabelled—are present during training, though only labelled normal and anomalous nodes are used for supervision.
Nevertheless, we agree that further clarification could benefit the reader. We will include this discussion in the revised version and acknowledge classic OOD works to better position Open-set GAD.
W2. Additional GAD and OOD Baselines.
We have clarified the distinction between open-set GAD and OOD detection, and explained why their objectives cannot be used interchangeably, in "To Rev. uvQ4" W1 & Q1. Open-set GAD does not deal with out-of-distribution classes, which are not defined in our problem setting. Additionally, we want to emphasise that T-Finance is a human-annotated dataset containing real-world anomalies.

Regarding GAD baselines, we have included several recent methods from [4], as noted by the reviewer, and our baseline selection already covers 16 popular approaches. Following the reviewer's suggestions, we conducted additional experiments on the two mentioned baselines, XGBGraph and CONSISGAD. The GAD performance, in terms of AUC-ROC and AUC-PR on all test anomalies, is reported in Table 3; please refer to the revised paper for the complete results and discussion.

Consistent with our observations in the paper, NSReg achieves significantly better performance. The baseline methods primarily focus on fitting seen anomaly patterns and, being closed-set supervised methods, do not generalise effectively to unseen anomaly patterns. One exception is XGBGraph's surprisingly high performance on the Yelp dataset. We believe this is incidental: XGBGraph is a generic hybrid method combining parameter-free node feature propagation with XGBoost, without GAD-specific training, which happens to suit the Yelp dataset particularly well. A similar observation was noted in the benchmark paper mentioned by the reviewer [4], where XGBGraph outperformed all state-of-the-art methods on this dataset by a large margin. However, this strong performance is not universal, and NSReg consistently outperforms the baseline methods across the other datasets.
Table 3: AUC-ROC and AUC-PR (mean ± std) for detecting all anomalies, comparing XGBGraph and CONSISGAD with NSReg.
| Metric | Method | Photo | Computers | CS | Yelp | Avg. |
|---|---|---|---|---|---|---|
| AUC-ROC | XGBGraph | 0.792 ± 0.000 | 0.710 ± 0.000 | 0.750 ± 0.000 | 0.805 ± 0.000 | 0.764 ± 0.000 |
| | CONSISGAD | 0.706 ± 0.033 | 0.597 ± 0.040 | 0.683 ± 0.067 | 0.738 ± 0.010 | 0.681 ± 0.023 |
| | NSReg | 0.908 ± 0.016 | 0.797 ± 0.015 | 0.957 ± 0.007 | 0.734 ± 0.012 | 0.849 ± 0.013 |
| AUC-PR | XGBGraph | 0.521 ± 0.000 | 0.472 ± 0.000 | 0.553 ± 0.000 | 0.482 ± 0.000 | 0.507 ± 0.000 |
| | CONSISGAD | 0.481 ± 0.035 | 0.326 ± 0.033 | 0.530 ± 0.084 | 0.365 ± 0.025 | 0.424 ± 0.027 |
| | NSReg | 0.640 ± 0.036 | 0.559 ± 0.018 | 0.889 ± 0.016 | 0.398 ± 0.014 | 0.622 ± 0.021 |
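For readers comparing the numbers above, AUC-ROC has a simple probabilistic reading: it is the probability that a randomly chosen anomaly receives a higher anomaly score than a randomly chosen normal node, with ties counted as half. A minimal sketch of this rank-based formulation (illustrative only, not the evaluation code used in the paper):

```python
def auc_roc(anomaly_scores, normal_scores):
    """AUC-ROC as P(score of random anomaly > score of random normal node),
    counting ties as 0.5. O(n*m) pairwise version, written for clarity."""
    wins = 0.0
    for a in anomaly_scores:
        for n in normal_scores:
            wins += 1.0 if a > n else 0.5 if a == n else 0.0
    return wins / (len(anomaly_scores) * len(normal_scores))
```

A perfect detector scores 1.0 and a random scorer averages 0.5, which is why gaps such as 0.908 versus 0.706 in Table 3 are substantial.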
W3. Discussion on the choice of the hyperparameter α.
We would like to clarify that we have provided a parameter analysis of α in the paper. Please refer to our response to "To Rev VYmz" Q2 for a detailed discussion.
Thanks for the clarification. However, I still have some follow-up questions.
Q1: I appreciate that the authors include two new baselines from the supervised graph anomaly detection area. However, this raises another question. The paper that introduces XGBGraph uses 10 real-world datasets for supervised anomaly detection (Weibo, Reddit, Tolokers, Amazon, T-Finance, YelpChi, Questions, Elliptic, DGraph-Fin, and T-Social), but the authors only include YelpChi in their experiments and choose several datasets that are not usually used for graph anomaly detection. Are there specific reasons for this experimental choice? Would it be possible for the authors to provide more experimental results on the other commonly used datasets?
Q2: From Figure 7 in the paper, I can see that the performance is insensitive to the choice of α. However, I also notice that, in the figure, the range of α is [0.6, 1]. Would it be the same when α is in [0, 0.6]? Could the authors provide more experimental results and explanations about this?
Thank you for your response. Please find our responses to the follow-up questions below.
Q1 Dataset Selection Justification.
As mentioned in Section 4 under "Datasets" (starting at line 323), our dataset selection includes 7 real-world datasets to enable a comprehensive and specific evaluation in the open-set GAD setting, where we have made our best effort to ensure significant distribution discrepancies between anomaly subclasses.
It is important to note that the evaluation of open-set GAD differs from that of supervised GAD. For open-set GAD, significant distribution discrepancies between anomaly subclasses are essential to assess the model's ability to handle unseen anomalies. Without such discrepancies, the performance on unseen anomalies would be identical to that on a test set of seen anomalies. In such cases, the evaluation effectively reduces to a closed-set scenario, which primarily relies on the learning capacity of the backbone GNN models, becoming irrelevant to our problem setting.
To the best of our knowledge, there are no publicly available GAD datasets with anomaly subclass labels. Therefore, as mentioned in line 338, adapting balanced node classification datasets with multiple minor classes is a reasonable choice to ensure discrepancies between anomaly subclasses. This approach is consistent with the general definition of anomalies, characterised by their rarity and significant difference from the majority.
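The adaptation described above can be sketched as follows; the function name and the exact role assignment are our own illustrative assumptions, not the paper's precise protocol:

```python
def make_open_set_split(labels, normal_classes, seen_anomaly_classes):
    """Adapt a multi-class node-classification dataset for open-set GAD:
    majority classes act as 'normal', one (or more) minor classes act as
    the seen anomaly subclass, and the remaining minor classes act as
    unseen anomaly subclasses held out from supervision."""
    roles = {}
    for node, c in enumerate(labels):
        if c in normal_classes:
            roles[node] = "normal"
        elif c in seen_anomaly_classes:
            roles[node] = "seen_anomaly"
        else:
            roles[node] = "unseen_anomaly"
    return roles
```

Only "normal" and "seen_anomaly" nodes would provide supervision during training; "unseen_anomaly" labels are used only at evaluation time, even though all nodes participate in message passing.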
For real-world GAD datasets that only provide binary anomaly labels, we would need to perform clustering on the learned representations to identify anomaly subclasses. However, in such cases, the discrepancies between manually assigned subclasses cannot be guaranteed and may not even exist. In this case, their relevance for open-set GAD evaluation remains uncertain.
Regarding the datasets from the benchmark paper [4] that introduces XGBGraph, as mentioned by the reviewer, the two datasets we selected were chosen primarily for their comprehensiveness and because they exhibit the largest observable discrepancies between anomaly patterns. For the other datasets, we were unable to identify significant discrepancies, making them unsuitable for open-set GAD evaluation.
Additionally, we want to point out that the referenced paper is a benchmark study for closed-set supervised GAD, which has a different focus compared to our evaluation. In contrast, our paper includes 7 datasets in total and evaluates 20 recent baselines after the revision, which is arguably comparable in scale to a benchmark study. As a methodology-focused paper, this should constitute a sufficiently representative evaluation.
Q2 The valid range of α in the parameter sensitivity analysis.
As mentioned in our previous response, the choice of α should respect the following ordering of assigned normality: relations between connected normal nodes > relations between unconnected normal nodes > relations between unconnected normal and unlabelled nodes. This is a necessary condition to ensure the validity of our theoretical analysis.
In the context of the labelling function, α is a relative specification, and values within the range [0, 0.6) are considered invalid and should not be used, to avoid unintended effects. A value of α = 0.6 is already at the boundary and should be treated with care.
Thanks for the clarification. However, as for the range of α, I again checked your paper but didn't find why the range [0, 0.6) is invalid. As shown in Equation (3), there is no such limitation, so I wonder if the authors could conduct experiments to address my further concerns.
Thank you to Reviewer uvQ4.
The choice of α and the performance of NSReg when it is set within [0, 0.6).
Please note that the requirements for setting α are discussed in our theoretical analysis starting from line 234. There, we stated that the labelling function should ensure that the minimum normality score assigned to relations exclusive to normal nodes is smaller than, but close or equal to, 1 (the value typically set for relations between two connected normal nodes), while being significantly greater than any score assigned to relations involving an anomaly node. If α is set within the range [0, 0.6), this condition no longer holds: it would instead suggest that unconnected normal nodes exhibit normality more similar to that of unlabelled nodes than to connected normal nodes. Note that α is a relative specification.
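To make the ordering concrete, below is a hypothetical sketch of the relation labelling function C(r); apart from α = 0.8, the constants are illustrative placeholders rather than the paper's exact values:

```python
# Hypothetical relative normality scores for the three relation types.
# Only the ordering matters for the theoretical condition; the value for
# normal-to-unlabelled relations is an assumed placeholder.
NORMALITY = {
    "connected_normal": 1.0,    # adjacent labelled-normal node pairs
    "unconnected_normal": 0.8,  # alpha: smaller than, but close to, 1
    "normal_unlabelled": 0.0,   # significantly lower than both above
}

def label_relation(kind: str) -> float:
    """Return the relative normality score C(r) assigns to a relation."""
    return NORMALITY[kind]

# The ordering required by the theoretical analysis:
assert (NORMALITY["connected_normal"]
        >= NORMALITY["unconnected_normal"]
        > NORMALITY["normal_unlabelled"])
```

Setting α below the valid range would move "unconnected_normal" closer to "normal_unlabelled" than to "connected_normal", breaking the ordering the analysis relies on.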
To further validate this point, we conducted additional experiments by running NSReg with α set to 0, 0.2, and 0.4, and report the results in Table A and Table B. These tables include the performance of NSReg with α set to 0, 0.2, 0.4, 0.6, 0.8, and 1.0. It can be seen that, on average, the performance on both metrics is better when α is set to larger values, especially compared to α = 0. Although a general trend of performance decline is observed across all datasets as α decreases, fluctuations can occur due to the randomness involved in the training process and the heterogeneity between different graphs.
Table A: AUC-ROC of NSReg using different α values on all test anomalies.
| α | Photo | Computers | CS | Yelp | Avg |
|---|---|---|---|---|---|
| 1 | 0.894±0.022 | 0.792±0.014 | 0.960±0.006 | 0.733±0.005 | 0.845±0.012 |
| 0.8 | 0.908±0.016 | 0.797±0.015 | 0.957±0.007 | 0.734±0.012 | 0.849±0.013 |
| 0.6 | 0.894±0.022 | 0.792±0.014 | 0.960±0.006 | 0.733±0.005 | 0.845±0.012 |
| 0.4 | 0.891±0.015 | 0.782±0.012 | 0.957±0.019 | 0.732±0.003 | 0.841±0.012 |
| 0.2 | 0.872±0.017 | 0.781±0.011 | 0.953±0.019 | 0.733±0.007 | 0.839±0.014 |
| 0 | 0.866±0.017 | 0.768±0.011 | 0.897±0.026 | 0.729±0.004 | 0.815±0.014 |
Table B: AUC-PR of NSReg using different α values on all test anomalies.
| α | Photo | Computers | CS | Yelp | Avg |
|---|---|---|---|---|---|
| 1 | 0.619±0.041 | 0.559±0.017 | 0.895±0.018 | 0.397±0.009 | 0.618±0.014 |
| 0.8 | 0.640±0.036 | 0.559±0.018 | 0.889±0.016 | 0.398±0.014 | 0.622±0.010 |
| 0.6 | 0.619±0.041 | 0.559±0.017 | 0.894±0.018 | 0.397±0.009 | 0.617±0.014 |
| 0.4 | 0.613±0.019 | 0.538±0.017 | 0.907±0.017 | 0.393±0.009 | 0.613±0.004 |
| 0.2 | 0.584±0.032 | 0.531±0.018 | 0.876±0.035 | 0.396±0.009 | 0.597±0.012 |
| 0 | 0.580±0.022 | 0.511±0.013 | 0.823±0.038 | 0.386±0.007 | 0.575±0.013 |
We hope this explanation clarifies the question.
Thanks for the rebuttal.
Since the authors can utilize Yelp in GADBench, which is also a binary classification dataset, they should also be able to provide the performance on other datasets in GADBench. However, although I still have concerns about the datasets, in recognition of the effort the authors put into the additional experiments, I will increase my score to 6.
We sincerely thank the reviewer for their thoughtful feedback and for recognising our efforts in the response by adjusting the score. Regarding the use of the datasets in GADBench, we would like to clarify that the two datasets we selected, as noted in the paper, were chosen primarily to ensure the comprehensiveness of our experiments and because observable differences in anomaly patterns could be identified within the anomaly class. This is essential for defining seen and unseen anomalies. For the other datasets, we were unable to identify anomaly subclasses with non-trivial discrepancies, making them unsuitable for open-set GAD evaluation. We hope this helps address the concern. Thanks again for your support.
This paper presents a novel open-set Graph Anomaly Detection (GAD) approach called Normal Structure Regularization (NSReg). The method aims to achieve generalized detection capabilities for unseen anomalies while retaining effectiveness in identifying known anomalies. It introduces a regularization term that encourages the learning of compact, semantically rich representations of normal nodes, informed by their structural relationships with other nodes. Extensive empirical evaluations on seven real-world datasets demonstrate that NSReg significantly outperforms state-of-the-art competing methods.
Strengths
- The experiments are comprehensive and yield strong results.
- The paper provides some theoretical proofs.
- It introduces a novel open-set approach to Graph Anomaly Detection (GAD).
Weaknesses
- The paper's structure is disorganized, and the prose reads stiffly. For instance, the three types of 'Normal-node-oriented Relations' are not clearly explained. Additionally, in lines 225-228, the authors mention analyzing the effects of enforcing structural normality in the representation space and its advantages for enhancing generalization to unseen anomalies. However, it is difficult to connect this to the later explanations, and the concept of structural normality is not even defined.
- The wording in the paper is imprecise. For example, in the caption of Figure 2, the term "The green teal dashed box" does not clearly indicate which part of the figure it corresponds to. Furthermore, it is unclear whether "a discriminative graph anomaly detector" refers to "a graph anomaly detector with a discriminator," as the term "detector" itself implies a discriminative function.
- The paper does not provide information about the hardware used for running experiments, nor does it include the experimental code and datasets. This raises concerns about the reproducibility of the results.
Questions
- What does "connected normal nodes, unconnected normal nodes, and unconnected normal nodes to unlabelled nodes" mean? It would be best to illustrate these relationships with a diagram.
- What does the calculation symbol "⊙" represent in the paper?
- In line 241, what do "these two types of relations" refer to? Are they the two types described in line 232: "the relations between only the normal nodes and the relations between normal and anomaly nodes"? Why does the relationship description in line 2—“connected normal nodes, unconnected normal nodes, and unconnected normal nodes to unlabelled nodes”—not use the same phrasing? Are there actually distinctions between them?
- In line 238, what is the specific operation when it states, “If Z is shared with a discriminative graph anomaly detector”?
- The equations in lines 267-269 are not labelled and should correspond to Equation 3. How is alpha determined, and why is it set to 0.8?
- In Figure 2, is the "shared representation space" represented by the grey rectangle with dashed lines? The arrows inside have three different styles, but the authors only labelled two of them.
- In line 277, why is g considered a mapping and how is it discriminative?
- What is the relationship between the equation in line 282 and g? Is this equation equivalent to g?
- How does the paper address the prediction of unseen nodes, and how does it identify unseen nodes?
We thank the reviewer for the constructive and affirming feedback and address the concerns of the reviewer below.
W1. The concept of structural normality refers to the graph structural relationships defined by the labelled normal nodes. This concept is explained in lines 76–97 and 184–190, and is further elaborated in Section 3.3. We kindly ask the reviewer to provide more specific feedback regarding this comment if we do not address the concern well. The paper’s clarity has been appreciated by other reviewers, such as Reviewers KzVj and uvQ4. We would greatly appreciate more detailed suggestions if otherwise.
W2. We would like to point out that Figure 2 contains only one dashed box filled in teal, while all other dashed shapes are spherocubes. We have improved the clarity of this distinction in the revised version. Regarding the use of the term “discriminative,” we believe it is necessary in this context. In GAD, a detector can be either supervised or unsupervised. Unsupervised graph anomaly detectors are not inherently discriminative—for example, autoencoder-based models are not discriminative by nature but can still achieve end-to-end anomaly scoring.
W3. Our apologies for being unable to include hardware details due to the space limit; we have included them in the revised version. Our results were gathered using a single NVIDIA A100 GPU and 28 CPU cores of an AMD EPYC 7663 processor. We would also like to point out that our datasets are publicly available; we have provided references to all of them (between lines 327 and 338) and dataset details in Appendix B.2. Our code is included in the supplementary materials.
Q1. We would like to point out that these terms are defined between lines 267-268. We are more than happy to include an illustrative diagram; due to the page limit, we have placed it in the appendix of the revised version.
Q2. This denotes the element-wise product (i.e., Hadamard product), which has been clarified in the revised version.
Q3. We kindly ask the reviewer to clarify the second question, as we are unable to locate the relationship description in line 2. The “two types of relations" refers to “the relations between normal nodes and the relations between normal and anomaly nodes," as they are discussed in close proximity within the same subsection. Please note that this subsection develops the theoretical support for NSReg, where we demonstrate that only two distinct and broad types of relations—those exclusive to normal nodes and those between normal and anomaly nodes—are necessary for our derivation. The specific connectivity between normal nodes does not affect our final conclusions. In principle, multiple types of relations among normal nodes could be defined, provided the labelling function satisfies the condition in line 231, i.e., the lowest normality score for relations exclusively among normal nodes remains significantly higher than that for relations between normal and anomaly nodes. The three types of relations mentioned in line 267 represent the default configuration in NSReg, an instance of the theoretical model. This configuration aligns with our theoretical analysis and has shown strong empirical performance.
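Our reading of the default relation generation can be sketched as below; the function name, the exhaustive pair enumeration, and the per-type sample counts are illustrative assumptions (a real implementation would sample pairs directly rather than enumerate them all):

```python
import random

def sample_relations(normal_nodes, unlabelled_nodes, edges, n_per_type=2, seed=0):
    """Generate the three default relation types: connected normal pairs,
    unconnected normal pairs, and normal-to-unlabelled pairs."""
    rng = random.Random(seed)
    normal = list(normal_nodes)
    edge_set = {tuple(sorted(e)) for e in edges}

    connected = [(u, v) for i, u in enumerate(normal) for v in normal[i + 1:]
                 if tuple(sorted((u, v))) in edge_set]
    unconnected = [(u, v) for i, u in enumerate(normal) for v in normal[i + 1:]
                   if tuple(sorted((u, v))) not in edge_set]
    to_unlabelled = [(u, v) for u in normal for v in unlabelled_nodes]

    def pick(pool):
        return rng.sample(pool, min(n_per_type, len(pool)))

    return {"connected_normal": pick(connected),
            "unconnected_normal": pick(unconnected),
            "normal_unlabelled": pick(to_unlabelled)}
```

Any labelling function that keeps scores for the first two types well above the third would satisfy the condition discussed in the theoretical analysis.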
Q4. This means that Z serves as the shared representation space for the mapping g and a supervised anomaly scoring function. In this context, the two components simply receive their input from Z, with no additional operations required.
Q5. We have labelled the equation in the revised version. Please refer to "To Reviewer VYmz Q2" for a detailed discussion of the choice.
Q6. There are only two types of arrows—orange and teal. We noticed that the orange arrows may appear slightly different due to variations in sizes and colour profiles across different monitors. However, in our plotting software, they are set to the same colour code. We have double-checked and ensured visual consistency in the revised version.
Q7. We would like to clarify that g, as a discriminative mapping, is a fundamental component of our theoretical model and an intentional design choice for enforcing normal-node-oriented relations. We demonstrate that a discriminative mapping satisfying our conditions can effectively displace misplaced anomalies from the normal subspace when trained to differentiate normal-node relations jointly with the supervised GAD loss. NSReg serves as an instantiation of this theoretical model.
Q8. As mentioned in line 278, the equation represents the first sub-mapping of g for generating relation representations. The functions in that equation are instantiated from their abstract counterparts in the theoretical model in the form of neural networks.
Q9. Our paper focuses on end-to-end anomaly scoring, which is identical for all nodes, regardless of whether they are seen or unseen. We have described this transformation in line 323. Note that our proposed method is a plug-and-play module that can enhance the representation learning of supervised GAD models during the training process. During inference, only the backbone anomaly detector is used for predictions, applying the same operation to any test node.
Thank you for your detailed answer. Here are the responses.
- w2: In Figure 2, the author describes it as "The green teal dashed box" but not "one dashed box filled in teal." Moreover, I still don't understand, as the figure doesn't seem to include a dashed box completely filled in blue. One dashed box is filled in gray, and the other is filled in blue + yellow.
- w2: The classic unsupervised method CoLA (2021, TNNLS) also uses the term "discriminator."
- Q3: Apologies for the confusion—it's line 82, not line 2. What I meant to ask is if the descriptions of the relationship in line 232 and line 82 convey the same definition. I’m puzzled because, after introducing these three terms with emphasis in line 82, the authors then chose not to use them but instead adopted a different phrasing.
- Q7: What does `displace misplaced anomalies' mean? Eq.3 appears to be just learning the representation relationship between two nodes, and Eq.4 appears to be just a common binary loss function.
- Q8: There is no equation at line 278.
- Q9: Since the paper is titled "OPEN-SET GRAPH ANOMALY DETECTION," where is the uniqueness in handling unseen nodes reflected? Do unseen nodes include domain-shifted nodes?
We greatly appreciate the prompt response from the reviewer and would like to provide the following clarifications.
W2. Improving wording in Figure 2 Caption.
We have updated the wording to "The box with dashed border line filled in teal illustrates the decomposition of NSReg's overall learning objective in the shared representation space Z". For improved clarity, we have also labelled the box with Z.
W2. The use of the word "discriminative".
In our previous response, we mentioned that unsupervised graph anomaly detection methods are not inherently discriminative, as many of them do not adopt discriminative training schemes. However, we did not intend to deny the possibility of applying discriminative training schemes to unsupervised GAD, but to justify the use of the word discriminative.
Our work focuses on improving supervised GAD models in the open-set setting. These models utilise ground truth labels as primary supervision, making them inherently discriminative. By using the term "discriminative," we specifically emphasise the backbone requirement of our framework, which leverages ground truth labels for supervision in a discriminative manner. This distinction is made solely to establish the foundation of our theoretical analysis.
Regarding CoLA, its framework does include a discriminator module to distinguish between positive and negative pseudo-samples. However, this does not contradict the existence of non-discriminative GAD training approaches such as DOMINANT (Ding et al., SDM 19).
Q3. Further discussion on the connection between the two types of relations in the theoretical analysis and the default NSReg.
Thank you for the clarification. As we mentioned in our previous response, the three terms (types of relations) in line 82 are the ones considered in our default NSReg model, which is instantiated from the theoretical model we derived in Section 3.3 (around line 232). In our theoretical analysis, it is sufficient to derive the proposition using the two broader categories—relations exclusive to normal nodes and those between normal and anomaly nodes—allowing for a simpler but equivalent proof. The specific connectivity between normal nodes does not need to be explicitly handled and does not affect our final conclusions, as long as the condition in line 235 is satisfied. In NSReg, the three types of relations are a design choice based on empirical observations. Specifically, the relations between connected normal nodes and between unconnected normal nodes correspond to the first broader category, while the relations between normal and unlabelled nodes mimic the second.
Q7. Explanation of why g is a discriminative mapping.
Regarding Q7, the original question asked, "Why is g considered a mapping, and how is it discriminative?". Please note that g is an abstraction in our theoretical model, while the equations mentioned by the reviewer are the loss functions for NSReg, which is instantiated from this theoretical model.
As we explained in our previous response, it is our design choice to enforce normality, as defined by the graph structure, in a discriminative manner. The equations referenced by the reviewer serve to differentiate between the three types of relations, aligning with the theoretical framework.
Q8. The connection between Equation (4) and g.
We didn't mean equations, but rather intended to clarify that the connection between the equation and g is mentioned in the paper. In the latest revision, this is located in line 282. We have also provided the explanation in our previous response under Q8.
Q9. Further clarification on the uniqueness of NSReg in improving GAD performance on unseen anomalies, and on the relevance of 'domain shift' to the definition of unseen anomalies.
As mentioned in the abstract, in the introduction (third paragraph), and under the general rebuttal section "Contribution in Terms of Methodology", NSReg improves generalisation to unseen anomalies by enforcing strong structural normality in representation learning. This is clearly reflected in the title of our paper, which highlights that this is achieved via regularisation.
NSReg is implemented as a plug-and-play module and is integrated into the training process. It effectively enforces distinct and clean normal regions in the representation space while isolating anomalies from these regions. This leads to improved separability between the normal and anomaly classes, including unseen anomalies. Consequently, unseen nodes do not need to be explicitly considered during inference, which would be infeasible due to their lack of labels.
Regarding the definition of unseen nodes in our open-set framework, as mentioned in the paper (lines 48–53), unseen anomalies are those patterns that cannot be represented by the labelled anomalies due to limited and incomplete prior knowledge. This means that, in open-set GAD, the labelled anomalies represent only a subset of the anomaly patterns within the entire anomaly class. Any anomalies not represented by labelled nodes during training are considered unseen. Whether this is due to domain shift or other factors is not pertinent for the definition.
Thank you for the further explanations. I will maintain my score for several reasons:
I still have doubts about the novelty of the paper. Firstly, in the Normal-node-oriented Relation Modelling section, the division between normal and anomalous regions is rather simplistic, as it relies solely on the simple relational labels defined by C(r) for supervised training. I would like to discuss with the authors what impact the presence of isolated anomalous nodes connected to multiple normal nodes in C(r) would have on the results. Additionally, is it possible for the calculation of C(r) to take more factors into account, such as node degree, average features, and other relevant information? Secondly, the provided code is also incomplete: it includes neither a main.py file nor a README file.
Thank you Reviewer JRAN for your response. We assumed that our previous responses had addressed your earlier concerns, as the new issue regarding novelty was not previously mentioned. We'd appreciate it if you could let us know which specific previous concerns have not been properly addressed. Below are our responses to your new concerns.
Novelty vs Simplicity.
The novelty of our work lies in the novel application setting, the graph-structure-based regularisation, and its implementation as a plug-and-play module for supervised GAD methods. The key insight we demonstrate through this work is the usefulness of anomaly-discriminating information embedded in graph structure for supervised GAD in an open-set setting, which is firmly grounded in our theoretical analysis. No similar ideas have been explored before. Our focus is on introducing this practical yet underexplored setting and pioneering this new line of research through the proposed straightforward and effective model. Developing the most complex framework with a conceptually sophisticated architecture is not what we claim as our objective.
We believe it is fair to say that simplicity is never in conflict with the novelty of a work. A simple but technically unique solution that achieves significant improvement should be considered advantageous. In our case, the above-mentioned contributions constitute our technical novelty; more sophisticated and detailed implementations of structural regularisation are comparatively trivial. In fact, the simplicity of NSReg is what allows it to be integrated into other GAD models, introducing little overhead during training and not affecting inference time. Therefore, simplicity in model design should not be penalised in any way as long as the model is effective and provides interesting insights into the problem.
Impact of the presence of isolated anomalous nodes connected to multiple normal nodes.
To the best of our knowledge, isolated nodes are those that do not connect to any other nodes. Therefore, we believe they will not be connected to any normal nodes. Nevertheless, isolated nodes do not exist in all datasets, and even when they do, they constitute a very small percentage. Considering the small number of labelled anomalous nodes in training, most isolated anomalous nodes will remain unlabelled and therefore cannot be explicitly considered.
Is it possible for the calculation of C(r) to take more factors into account, such as node degree, average features, and other relevant information?
Please note that, as stated in Equation 4, the relation representations are generated from the representations of their nodes via a learnable transformation, which we believe is more effective than directly taking the average of their features. In addition, as mentioned above, our model is designed to emphasise the importance of anomaly-discriminating information embedded in the graph structure through a clear-cut and robust normality hierarchy, rather than relying on heuristic-based feature engineering. While sophisticated feature engineering might lead to marginal performance gains, it could also introduce additional overhead and make the training process more sensitive to capturing the finer granularity of normality. Please also note that with our current labelling function, NSReg has already significantly outperformed all the competing methods. Additional designs can often be made to an effective method, but they do not affect the significant contributions made by the already highly effective, innovative method.
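To make the contrast with feature averaging concrete, here is a minimal sketch, with hypothetical names and a placeholder random weight matrix, of how a learnable transformation (in the spirit of Equation 4, though not its exact form) maps a node pair to a relation representation:

```python
import numpy as np

# Illustrative sketch only: a relation representation produced by a learnable
# transformation of the two node representations, here a single linear layer
# over their concatenation. The weight values are random placeholders; in the
# actual model they would be trained parameters.

rng = np.random.default_rng(0)
d, d_rel = 8, 4                       # node / relation embedding sizes (toy)
W = rng.normal(size=(2 * d, d_rel))   # learnable in practice

def relation_repr(h_u: np.ndarray, h_v: np.ndarray) -> np.ndarray:
    """Map a node pair (h_u, h_v) to a relation representation."""
    return np.concatenate([h_u, h_v]) @ W

h_u, h_v = rng.normal(size=d), rng.normal(size=d)
r = relation_repr(h_u, h_v)
```

Because the transformation is trained jointly with the rest of the model, the relation space can adapt to the enforced normality hierarchy, rather than being fixed by hand-crafted features such as averages.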
Code release.
First, we'd like to clarify that the source code-related issue is not a matter of novelty but rather a reproducibility question, an optional evaluation criterion. Secondly, in our previous responses we have committed to releasing the cleaned-up and ready-to-use code on GitHub upon acceptance. This, along with the inclusion of partial code in our paper submission, should demonstrate our commitment to reproducibility. In the meantime, to address your concern straight away, we have provided an integrated version of the training and evaluation code at this link, which includes all training and experimental logic.
Thank you very much for your fruitful feedback. We sincerely hope that our clarifications can address the questions.
Thanks for the clarification. I would increase my score to 6.
Thank you very much again Reviewer JRAN for your thoughtful engagement and insightful discussions. We are delighted to hear that you are satisfied with our responses and deeply appreciate your support for our work.
This paper tackles the challenge of open-set graph anomaly detection, where the detection model is trained to detect both known and previously unseen anomalies. The authors introduce NSReg, a method that utilizes auxiliary relational information around normal nodes to regularize the model's training. NSReg works by distancing unseen anomalies from normal nodes, aiding in their detection. The paper also includes a theoretical analysis of the proposed method. Comprehensive experiments evaluate the model's effectiveness and the individual contributions of its components.
Strengths
- The paper tackles an important and impactful problem—open-set graph anomaly detection—with significant potential for real-world applications.
- The writing is clear, and the paper is well-organized, making it easy to follow.
- A theoretical analysis is provided, adding depth to the proposed approach.
- The paper includes thorough experiments that demonstrate the proposed method's effectiveness, along with ablation studies that evaluate the contribution of each component.
Weaknesses
- The core idea of using neighborhood information to enhance anomaly detection is not entirely novel, as similar approaches have been explored in prior works [1], [2], [3]. A comparison between NSReg and these previous methods is necessary to better contextualize its contributions.
- I have some concerns about the effectiveness of the Normal-node-oriented Relation Generation component. Specifically, the unconnected relation between normal nodes and other nodes may include a large number of connections to unlabeled normal nodes, given the predominance of normal nodes in the dataset, as the authors also note. If most relations are between labeled and unlabeled normal nodes, distancing these pairs may not meaningfully enhance model training. At the same time, if there are very few relations involving labeled normal nodes and unlabeled unseen anomalies, the regularization term may have minimal impact on detecting unseen anomalies. I suggest that the authors conduct an empirical analysis on the distribution of different types of relations within the sampled set and discuss how this distribution affects the regularization term's effectiveness.
- In Eq. (3), the meanings of the symbols are unclear. Also, in Eqs. (2) and (5), the supervised training loss should sum over all training nodes, not the entire node set.
- Could the authors provide justifications for updating the three components separately, as stated in the pseudo-code?
- In the Yelp dataset, NSReg displays less sensitivity to the hyper-parameter compared to other datasets (such as Photo, Computers, and CS). Could the authors provide an explanation for this? Additionally, GraphSMOTE performs better on unseen anomalies for this dataset, as shown in Table 1. An explanation for this would also be necessary.
- The source code for this work is not provided for review.
[1]: Label information enhanced fraud detection against low homophily in graphs. WWW. 2023.
[2]: Consistency training with learnable data augmentation for graph anomaly detection with limited supervision. ICLR. 2024.
[3]: Partitioning message passing for graph fraud detection. ICLR. 2024.
Questions
See Weakness.
W3. The meaning of the symbol in Eq. (3) and the use of the batch notation.
The symbol in question denotes the element-wise (Hadamard) product; we have added an explanation in the revised version. The batch notation is used correctly: as mentioned in line 307, it refers to a batch of training nodes, which is denoted separately from the set of all nodes in our paper.
W4. Clarifications on the reason why gradient updates of NSReg's components are presented separately in the pseudocode.
The pseudocode reflects the training process. We would like to clarify that this is consistent with the joint learning objective; the pseudocode is written specifically to align with the framework's architecture and code implementation. The two transformations within the relation modelling module and the anomaly scoring network operate independently, as they all take input from the graph representation space. As a result, their parameters can be updated independently, and their order in the pseudocode is interchangeable. The shared graph representation learner, on the other hand, is jointly optimised based on the losses from both objectives. It receives gradients propagated from the subsequent modules and is therefore updated last.
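The gradient flow described above can be shown with a deliberately tiny scalar example (purely illustrative, not the NSReg implementation): each head parameter receives gradients only from its own loss, while the shared representation learner receives gradients from both.

```python
# Toy scalar illustration of independent head updates with a shared encoder.
# phi: shared encoder parameter; theta_gad / theta_rel: head parameters.
phi, theta_gad, theta_rel = 1.0, 0.5, -0.5
lr, lam = 0.1, 1.0  # learning rate and regularisation weight (toy values)

# Suppose loss_gad = (phi * theta_gad)^2 and loss_rel = (phi * theta_rel)^2.
# Then d(loss_gad)/d(theta_rel) == 0 and vice versa, so each head's update
# depends only on its own loss; phi accumulates gradients from both losses.
g_theta_gad = 2 * (phi * theta_gad) * phi                       # from loss_gad only
g_theta_rel = 2 * (phi * theta_rel) * phi                       # from loss_rel only
g_phi = (2 * (phi * theta_gad) * theta_gad
         + lam * 2 * (phi * theta_rel) * theta_rel)             # from both

# The order of the two head updates is interchangeable; phi is updated last.
theta_gad -= lr * g_theta_gad
theta_rel -= lr * g_theta_rel
phi -= lr * g_phi
```

This is why presenting the updates separately in the pseudocode is equivalent to optimising the joint objective.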
W5. NSReg's improvement appears less significant on the Yelp dataset compared to the other datasets.
- NSReg is effective on all datasets. We are uncertain about the reviewer’s interpretation of "displays less sensitivity to the hyper-parameter." Our best understanding is that the reviewer is asking why NSReg demonstrates greater improvement, in terms of raw values, across the three datasets. The effectiveness of NSReg depends on several factors inherent to the datasets, such as the type of graph, connection characteristics, quality of node attributes, the discrepancy between seen and unseen anomaly classes, and the capacity of the backbone GAD model. Below are some possible explanations: First, the performance of our default backbone, as well as other baseline methods, on Yelp is generally lower than on the other datasets. NSReg still achieves significant improvement, which is more pronounced in terms of the rate of improvement. Additionally, as shown in Table 2, when other GAD models are used as backbones, their NSReg-enabled variants achieve comparable or even greater improvements on Yelp compared to the other datasets. This further supports the notion that factors beyond our design influence the degree of improvement.
- GraphSMOTE’s performance on Yelp for unseen anomalies. Regarding why GraphSMOTE performs better on the unseen classes in the Yelp dataset, we would like to emphasise that the ultimate performance metric considers all test anomalies. While GraphSMOTE may achieve better results on unseen anomalies, this improvement comes at the cost of fitting seen anomalies and the overall performance. It is important to note that GraphSMOTE is an augmentation-based method. Although augmented samples can mimic certain patterns, they may also introduce noise. From its less optimal overall performance, we infer that the augmented examples might partially align with the patterns of unseen anomalies but negatively impact the model’s ability to fit seen anomalies, ultimately doing more harm than good.
W6. Code release.
We would like to clarify that we submitted the code as an attachment in the supplementary material and will make it publicly available on GitHub after acceptance.
We thank the reviewer for the constructive and affirming feedback and address the concerns of the reviewer below.
W1 Discussion on the difference between the core idea of NSReg and closed-set GAD baselines and additional baselines
- The core idea of NSReg. We respectfully disagree with the reviewer and believe that describing our key contributions as the "core idea of using neighbourhood information" is inaccurate. Regarding the additional references provided by the reviewer and the other supervised GAD baselines included in our experiments: while they incorporate neighbourhood information to some extent, they do not enforce any type of explicit structure-wise, normality-based regularisation during GAD training. It is fair to state that almost all graph learning methods leverage neighbourhood information, either implicitly or explicitly, as this is a fundamental characteristic that distinguishes graph learning from learning on non-graph data. However, attributing our contribution and novelty solely to the use of local information disregards the unique aspect of our approach: the explicit regularisation via enforcing structural normality, specifically tailored for supervised GAD in the open-set setting. Please note that the definition of structural normality based on neighbourhood connectivity is a design choice that is both efficient and highly effective. As we mentioned in Section 5, it is entirely possible to extend the scope of relations. However, our primary goal is to emphasise the importance and usefulness of graph-structure-based regularisation for supervised GAD in the open-set setting.
- Additional baselines. We want to point out that our baseline selection already includes a broad range of recent approaches from related fields, including both supervised and unsupervised GAD methods (16 in total). To acknowledge the importance of the suggested baselines, we have incorporated them into the related work and conducted additional experiments on CONSISGAD [2] and PMP [3] (which significantly outperforms [1] in its own comparison). At a high level, these methods are closed-set supervised GAD approaches, similar to those already discussed in the paper. As shown in Table 1, NSReg consistently achieves superior overall performance in detecting both all test anomalies and unseen anomalies. Please refer to the revised paper for complete results and discussion. It is also worth noting that while CONSISGAD leverages both data augmentation and label information for supervision, NSReg relies solely on label information.
Table 1. AUC-ROC and AUC-PR (mean ± std) for detecting all anomalies, comparing NSReg with CONSISGAD and PMP.
| Metric | Method | Photo | Computers | CS | Yelp | Avg. |
|---|---|---|---|---|---|---|
| AUC-ROC | CONSISGAD | 0.706 ± 0.033 | 0.597 ± 0.040 | 0.683 ± 0.067 | 0.738 ± 0.010 | 0.681 ± 0.023 |
| | CFAD | 0.726 ± 0.003 | 0.718 ± 0.006 | 0.889 ± 0.003 | 0.712 ± 0.016 | 0.761 ± 0.006 |
| | NSReg | 0.908 ± 0.016 | 0.797 ± 0.015 | 0.957 ± 0.007 | 0.734 ± 0.012 | 0.849 ± 0.013 |
| AUC-PR | CONSISGAD | 0.481 ± 0.035 | 0.326 ± 0.033 | 0.530 ± 0.084 | 0.365 ± 0.025 | 0.424 ± 0.027 |
| | CFAD | 0.447 ± 0.007 | 0.460 ± 0.010 | 0.789 ± 0.004 | 0.245 ± 0.023 | 0.485 ± 0.008 |
| | NSReg | 0.640 ± 0.036 | 0.559 ± 0.018 | 0.889 ± 0.016 | 0.398 ± 0.014 | 0.622 ± 0.021 |
W2. Discussion on the potential implications of unconnected normal nodes in the sampled relation set.
The third case works collaboratively with the first two cases to distinguish unlabelled normal nodes from unseen anomalies. We agree with the reviewer that most relations of this type are between labelled and unlabelled normal nodes. Consequently, the unlabelled normal nodes in the third case may slightly interfere with the normal relation scores of the labelled normal nodes, which is an unavoidable but worthwhile trade-off. However, their scores remain higher than those of unseen anomalies due to their similarity and structural connections with labelled normal nodes in the first two cases. This distinction does not apply to unseen anomalies, which consequently receive low normal relation scores. Additionally, the supervised GAD loss is sufficiently powerful to produce a dense normal region, further mitigating any unwanted effects. We report the percentages of anomalous nodes in the sampled relations in the table below. It is worth noting that other factors, such as graph properties and the backbone GAD model, also influence the effectiveness of NSReg.
| | Photo | Computers | CS | Yelp |
|---|---|---|---|---|
| Anomaly ratio | 0.08 | 0.15 | 0.2 | 0.15 |
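The reported ratios come from an oracle-style count over the sampled relations; a minimal sketch of the computation, using toy node ids and labels (all hypothetical), might look like this:

```python
# Hypothetical sketch: given a sampled relation set, measure what fraction of
# the unlabelled endpoints are in fact anomalous (using held-out true labels,
# which are available only for analysis, not during training).

# Each relation pairs a labelled normal node ("n*") with an unlabelled node ("u*").
sampled_relations = [("n1", "u1"), ("n1", "u2"), ("n2", "u3"), ("n2", "u4")]
true_label = {"u1": 0, "u2": 0, "u3": 1, "u4": 0}  # 1 = anomaly (oracle view)

anomalous = sum(true_label[u] for _, u in sampled_relations)
ratio = anomalous / len(sampled_relations)
```

A low ratio supports the argument above: the third relation type mostly pairs labelled normal nodes with unlabelled normal nodes, and the few anomalous endpoints are handled by the normality hierarchy rather than dominating it.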
Thank you for your detailed response.
Overall, I am satisfied with your responses to Weaknesses 1–5. However, I share Reviewer JRAN's concern regarding the source code. Specifically, the README file and the training procedure for the model appear to be missing, with only the inference procedure provided. This level of detail is insufficient for full reproducibility.
Based on your clarifications and improvements, I am inclined to increase my rating to support this work, albeit with some reservations.
Thank you and code release.
Thank you, Reviewer KzVj, for your response. We are pleased that you are satisfied with our responses. Regarding the code release, as previously mentioned, we have committed to releasing the cleaned-up code on GitHub upon acceptance. In the meantime, we have provided an integrated version of the training and evaluation code at this link, which includes all training and experimental logic. We hope this addresses your reservations and increases your confidence in supporting this work.
The paper studies the graph anomaly detection problem under limited prior knowledge of anomalies. In particular, it proposes an approach focusing on learning normality using supervised signals to distinguish unseen anomalies from normal nodes. The proposed method, NSReg, models three types of normal-node-oriented relations as a discriminative task, enhancing representation learning with enriched normality semantics. The authors argue that this approach effectively disentangles unseen anomalous nodes from normal nodes in the representation space. The paper also discusses the plug-and-play design of the method, which allows for its application in various scenarios.
Strengths
- The problem studied in the paper is interesting. Due to the lack of prior knowledge, unsupervised graph anomaly detection often suffers from high error rates. The authors focus on learning normality to enhance the model's ability to detect unseen anomalies, potentially improving the performance of existing methods in this area.
- The Plug-and-Play design is a notable strength, as it allows the method to be easily integrated into existing frameworks and applied to various scenarios.
- The experiments conducted are comprehensive, covering the performance on all anomalies and unseen anomalies, ablation study, and the performance as a plug module, which evaluates the proposed method's effectiveness well.
Weaknesses
- There are some unsupervised graph anomaly detection works that achieve considerable inductive performance. The author did not point out why these methods cannot meet the requirements. There should be a more thorough comparison of these existing methods and an analysis of the deeper reasons behind them.
- The description of the proposed method did not clarify the innovation of this paper. One innovation of this paper is the focus on learning normality on structure. This differentiates this work from existing one-class classification/anomaly detection methods, but this point is unclear.
- The presentation of the paper needs to be improved.
Questions
- Aegis (Inductive anomaly detection on attributed networks) is an unsupervised graph anomaly detection using GAN to strengthen the generalization ability of the model. The authors should discuss how their method compares with this work.
- Line 271, why is α set to 0.8? What is the significance of this value? The authors should discuss this in detail and provide some experiment results to support this choice.
- Figure 3 is hard to understand.
We thank the reviewer for the constructive and affirming feedback and address the concerns of the reviewer below.
W1. Discussion on unsupervised GAD methods for open-set GAD.
We would like to kindly ask the reviewer to clarify the meaning of “considerable inductive performance." Additionally, we would like to point out that, as mentioned in lines 41, 43, and 138, while it is possible to apply unsupervised GAD methods, they are far less effective than supervised methods due to the lack of prior knowledge about the anomalies of interest. This is further validated by our experimental results, which include seven widely used recent unsupervised baselines. As shown in Table 1, these methods exhibit significantly lower performance compared to supervised approaches. Furthermore, in line 94, we clearly defined the scope of our study, emphasising that it focuses on supervised methods in open-set settings. We also highlighted that unsupervised methods face different challenges when no label information is available. Please refer to our General Rebuttal above.
W2, Clearer discussion on the difference between enforcing structural normality and existing one-class classification or anomaly detection methods.
We would like to point out that this has been discussed in Section 4.2, where we highlighted the importance and advantages of enforcing structural normality compared to other one-class classification and anomaly detection objectives, such as Deep-SAD (hypersphere-based) and contrastive learning. Additionally, our unsupervised baselines encompass a wide range of GAD objectives, including one-class classification (OCGNN), contrastive learning (COLA), and reconstruction loss (ADA-GAD). These methods are significantly less effective than NSReg. These approaches primarily focus on node attributes or the volume of the normal region in the representation space, without fully utilising the valuable anomaly-discriminating information embedded in the graph structure. The key innovation of NSReg lies in capturing this information by modelling normal-node-oriented relations. Furthermore, as demonstrated through the theoretical analysis in Section 3.3, we provide a rigorous foundation for the effectiveness of structural regularisation, whereas the effectiveness of other objectives remains largely heuristic.
W3. The presentation of the paper needs to be improved.
We would like to kindly ask the reviewer to provide more specific feedback regarding this comment. We are more than happy to clarify and improve any aspects of the presentation if the reviewer could kindly point them out.
Q1. Inclusion of AEGIS as a baseline.
We would like to clarify that we have explicitly stated in the paper that our primary focus is on the supervised setting, as noted in response to "To Rev. VYmz" W1. The inclusion of unsupervised baselines is intended for comprehensiveness. According to Table 1 in the main paper, supervised GAD baseline methods consistently demonstrate significantly better performance compared to their unsupervised counterparts. For comprehensiveness, we have already included seven unsupervised baselines, including recent methods such as CoLA, CONDA, TAM, and ADA-GAD, which are more recent than AEGIS. Additionally, we included GGAN, a GAN-based unsupervised method with a similar learning scheme to AEGIS, as part of our baseline comparisons. However, to acknowledge the reviewer’s input, we have now included the results for AEGIS. The overall GAD performance of AEGIS, in terms of AUC-ROC and AUC-PR, is reported in Table 2. Please refer to the revised paper for results on unseen anomalies. Similar to the other unsupervised baselines, AEGIS achieves significantly less optimal performance than the supervised methods. This is because its augmentation process is heuristic, lacking anomaly prior knowledge, which makes it challenging to generate data samples that accurately mimic actual anomalies.
Table 2: AUC-ROC and AUC-PR (mean ± std) for detecting all anomalies, comparing NSReg with AEGIS. “-” denotes unavailable result due to out-of-memory.
| Metric | Method | Photo | Computers | CS | Yelp | Avg. |
|---|---|---|---|---|---|---|
| AUC-ROC | AEGIS | 0.543 ± 0.053 | 0.426 ± 0.067 | 0.491 ± 0.053 | - | - |
| | NSReg | 0.908 ± 0.016 | 0.797 ± 0.015 | 0.957 ± 0.007 | 0.734 ± 0.012 | 0.849 ± 0.013 |
| AUC-PR | AEGIS | 0.095 ± 0.015 | 0.132 ± 0.019 | 0.227 ± 0.025 | - | - |
| | NSReg | 0.640 ± 0.036 | 0.559 ± 0.018 | 0.889 ± 0.016 | 0.398 ± 0.014 | 0.622 ± 0.021 |
Q2. Clearer discussion on the choice of α.
The parameter α specifies the relative normality of relations between unconnected normal nodes with respect to the other types of relations. It is designed to define the normality hierarchy we aim to enforce, such that the relation scores satisfy the following order: connected normal nodes > unconnected normal nodes > unconnected normal and unlabelled nodes. This hierarchy is based on the assumption of homophilic behaviour among normal nodes, distinguishing the most related normal nodes from other, less related normal nodes. We find that the performance of our model is not sensitive to the choice of α, as long as this hierarchy is maintained. As noted in the paper (line 375), we reported this observation, and due to space constraints, the parameter sensitivity analysis is provided in Appendix C.5.
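A minimal sketch of the labelling behaviour described above, with hypothetical names (the paper's actual C(r) may differ in detail):

```python
# Illustrative sketch of a relation labelling function enforcing the
# normality hierarchy: connected normal-normal (1.0) > unconnected
# normal-normal (ALPHA) > normal-unlabelled (0.0). Names are placeholders.

ALPHA = 0.8  # relative normality of unconnected normal-normal relations

def relation_label(src_is_labelled_normal: bool,
                   dst_is_labelled_normal: bool,
                   connected: bool) -> float:
    """Assign a soft normality label to a sampled relation."""
    if src_is_labelled_normal and dst_is_labelled_normal:
        return 1.0 if connected else ALPHA
    return 0.0  # labelled normal node paired with an unlabelled node

labels = [
    relation_label(True, True, connected=True),
    relation_label(True, True, connected=False),
    relation_label(True, False, connected=False),
]
# The hierarchy holds for any 0 < ALPHA < 1.
assert labels[0] > labels[1] > labels[2]
```

Since any value of α strictly between 0 and 1 preserves the ordering, this matches the observed insensitivity of the model to the exact value.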
Q3. Understandability of Figure 3.
Figure 3 shows the data efficiency plot of NSReg compared to supervised GAD baselines in terms of AUC-PR on all test anomalies. Due to space constraints, the complete results—including both AUC-ROC and AUC-PR for all test anomalies and unseen anomalies—are presented in Figures 8 and 9 in Section C.6.
These plots adhere to standard practices commonly used in many supervised AD and GAD studies to comprehensively evaluate model performance under varying levels of data availability. Our results demonstrate that NSReg consistently outperforms the supervised baselines across different numbers of labelled seen anomalies.
If this response does not help address the concern, we would like to kindly ask the reviewer to provide more specific feedback for us to better clarify the confusion/misunderstanding.
Dear Reviewer VYmz,
We have provided a detailed point-by-point response to your comments. Could you please kindly check whether our response helps address your concerns? We look forward to engaging with you to address any further questions you may have. Thank you very much!
Cheers,
Authors of #7056
Dear Reviewer VYmz,
We sincerely appreciate the time and effort you have dedicated to reviewing our paper and providing constructive feedback. We have carefully addressed all your comments in detail and provided a point-by-point response to clarify your concerns.
As the discussion period approaches its conclusion, we kindly seek your acknowledgment of our responses. We would greatly appreciate hearing whether they sufficiently address your concerns or if there are any additional clarifications or explanations you require.
To summarise our responses:
- Unsupervised Baselines: We emphasised that our paper already includes a comprehensive set of popular unsupervised GAD methods, along with additional experiments on the method AEGIS as you suggested, bringing the total to seven unsupervised baselines.
- Novelty and Advantages of NSReg: We elaborated on the advantages of NSReg over other one-class and anomaly detection objectives, as highlighted in our ablation study. Additionally, we summarised the key novelties of NSReg in our responses to your comments and the general response to provide a clear understanding of its contributions.
- Choice of α: We clarified that this parameter has been discussed and experimented on in our paper. Furthermore, we conducted additional experiments to evaluate NSReg’s performance across a wider range of values, as detailed under "The choice of α and the performance of NSReg when it is set to [0, 0.5]" in our second-to-last response to Reviewer uvQ4.
- Paper Presentation: We have further improved the presentation of the paper, incorporating suggestions from all reviewers.
We have addressed all of your questions in the revised PDF. These questions are similar to those raised by other reviewers, who have expressed satisfaction with our responses and support for our paper. We hope our responses address your concerns as well.
Thank you once again for your time and valuable insights.
Best regards,
Authors of #7056
Dear Reviewer VYmz,
We really appreciate the time and effort you’ve dedicated to reviewing our paper and providing constructive feedback. As the discussion period nears its conclusion, we kindly seek your acknowledgment of our responses.
Most of your questions align with those of other reviewers, who have expressed satisfaction with our responses and supported the paper by increasing their ratings. We hope our clarifications address your concerns as well.
Thank you once again for your time and valuable input. We look forward to hearing from you.
Best regards,
Authors of #7056
Dear Reviewer VYmz,
We sincerely appreciate the time and effort you’ve devoted to reviewing our paper and providing valuable feedback. As the discussion period is coming to a close in a few hours, we kindly request your acknowledgment of our responses.
Many of your questions are similar to those raised by other reviewers, who have expressed satisfaction with our responses and supported the paper by increasing their ratings. We hope our clarifications address your concerns as well.
Your acknowledgment or any feedback at this stage would be greatly appreciated. Thank you once again for your time and thoughtful insights.
Best regards,
Authors of #7056
We would like to express our sincere appreciation for the constructive and informative feedback provided by all reviewers, as well as the coordination by the area chair, senior area chair, and program chair throughout the review process.
In particular, we are pleased that the reviewers have endorsed the paper in the following aspects:
- Interesting and impactful problem setting with an attractive design. “The paper tackles an important and impactful problem." (Reviewer KzVj). “The Plug-and-Play design is a notable strength." and “The problem studied in the paper is interesting." (Reviewer VYmz). “It introduces a novel open-set approach to Graph Anomaly Detection (GAD)." (Reviewer JRAN).
- Comprehensive experiments. “The paper includes thorough experiments... along with ablation studies that evaluate the contribution of each component." (Reviewer KzVj). “The experiments conducted are comprehensive." (Reviewer VYmz).
- Strong performance and theoretical analysis. “Theoretical analysis is provided, adding depth to the proposed approach." (Reviewer KzVj). “The experiments are comprehensive and yield strong results." and “The paper provides some theoretical proofs." (Reviewer JRAN). “The framework outperforms baselines on datasets included in this paper." (Reviewer uvQ4).
- Clear writing and good presentation. “The writing is clear, and the paper is well-organized, making it easy to follow." (Reviewer KzVj).
In addition, we would like to clarify the following issue.
Problem setting, roles, and contributions of NSReg. We emphasise these key points to clarify potential misunderstandings that may have led to some reviewers’ questions, such as questions about the performance of unsupervised methods in open-set GAD, or conflating the unique insights of NSReg with those of general supervised GAD approaches.
- Problem setting. We address an open-set GAD setting, a very practical application scenario where there can be test anomalies that are not from the same training anomaly classes/distributions, e.g., new types of frauds in financial networks or abusive behaviours in co-purchasing networks that do not appear in the labelled training data. Existing supervised GAD methods primarily focus on closed-set settings, where training and testing anomalies are assumed to be from the same classes/distributions.
- Contribution in terms of methodology. NSReg introduces a novel graph-structure-based regularisation to enforce strong normality in representation learning, significantly reducing the chance of misclassifying unseen anomalies as normal (i.e., better generalisation to unseen anomaly classes). Existing relevant GAD methods focus on improving GNN architectures. Leveraging structural normality to regularise supervised GAD in open-set settings has not been explored in previous works, nor has similar work been done in unsupervised GAD.
- Theoretical contribution and applicability in enabling existing SOTAs. Unlike many GAD objectives that are largely heuristic, the effectiveness of our approach is theoretically grounded. Furthermore, NSReg functions as a plug-and-play module that can be integrated into any end-to-end supervised GAD model. It is only active during the training process and does not affect the original inference runtime of the backbone model.
Role of unsupervised GAD in this work. Regarding the unsupervised GAD baselines, the discussion and experimental results on these unsupervised methods are included for comprehensiveness. Since these models do not leverage label information, they are typically less advantageous, leading to less effective performance compared to supervised methods.
Interpretation of our experimental results. As mentioned above, many GAD methods focus on improving GNN architectures, meaning that most of our baselines employ more sophisticated backbones with stronger learning capacities. The strong empirical performance of NSReg can be understood in two ways: when added to a simple two-layer GraphSAGE representation learner, the default model already outperforms state-of-the-art methods. Furthermore, when integrated with other GAD models, NSReg also significantly enhances the open-set performance of the original backbone models. While overall GAD performance is important, it is equally crucial to emphasise the substantial improvements NSReg brings to the backbone models. Since GAD performance depends heavily on the backbone, these improvements highlight the effectiveness of NSReg and the significance of our structural regularisation.
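The plug-and-play usage described in this rebuttal can be sketched as follows (placeholder function names and a squared-error stand-in for the actual losses, purely for illustration):

```python
# Illustrative-only sketch: NSReg contributes an extra loss term during
# training, while inference uses only the backbone anomaly scorer. None of
# these names correspond to the released API.

LAMBDA = 1.0  # regularisation weight (hypothetical default)

def backbone_loss(scores, labels):
    # stands in for the supervised GAD loss of the backbone model
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(labels)

def nsreg_loss(rel_scores, rel_labels):
    # stands in for the structural normality regularisation term
    return sum((s - y) ** 2 for s, y in zip(rel_scores, rel_labels)) / len(rel_labels)

def training_loss(scores, labels, rel_scores, rel_labels):
    # joint objective: backbone GAD loss plus the NSReg regulariser
    return backbone_loss(scores, labels) + LAMBDA * nsreg_loss(rel_scores, rel_labels)

total = training_loss([0.5], [1.0], [0.8], [1.0])
```

Because the regularisation term appears only in the training loss, inference cost is identical to running the backbone alone, consistent with the plug-and-play claim.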
In addition to the general rebuttal, we have provided point-by-point responses to each question from all reviewers. We hope our rebuttal helps clarify any confusion and sincerely appreciate your prompt response. We look forward to further discussion with the reviewers and the area chair.
This paper focuses on training a detection model using a small number of labelled normal and anomaly nodes to detect both seen and unseen anomalies, enhancing generalisation to both. It introduces a regulariser that enforces the learning of compact, semantically rich representations of normal nodes by differentiating the labelled normal nodes that are connected in their local neighbourhood from those that are not. The regulariser incorporates strong normality into the modelling and avoids overfitting to the seen anomalies. The proposed method is a plug-and-play module that can be integrated into multiple supervised graph anomaly detection approaches to enhance generalisability to unseen anomalies. Experimental results on several datasets and metrics demonstrate its effectiveness.
After the rebuttal, one out of four reviewers still holds a negative score. Reviewer VYmz thinks there should be a more thorough comparison with existing methods and an analysis of the deeper reasons behind the results, as well as a detailed description of the innovation; the parameter settings are also not discussed. Reviewer KzVj has some reservations about reproducibility. Reviewer uvQ4 has concerns about the datasets. Reviewer JRAN raised their score to 6. After reading the rebuttal, I think these issues are cleared up to some extent. For example, regarding the comparison with existing methods, the authors employ 8 supervised methods and 4 unsupervised methods, as shown in Table 1 of the manuscript; the analysis is provided in Sections 4.1 and C.1. The innovation is emphasised in the General Rebuttal, and a hyper-parameter analysis is newly added in Section C.6. As for the remaining concerns about the datasets, reproducibility, and presentation, I think the authors can address these minor issues in the next version. Based on these, I tend to vote (borderline) acceptance.
Accept (Poster)