PaperHub
Average rating: 6.3/10 · Poster · 4 reviewers
Ratings: 7, 7, 6, 5 (min 5, max 7, std 0.8)
Confidence: 4.0
Correctness: 3.3 · Contribution: 2.8 · Presentation: 3.3
NeurIPS 2024

UniGAD: Unifying Multi-level Graph Anomaly Detection

Submitted: 2024-05-14 · Updated: 2024-11-06
TL;DR

We propose the first unified framework for detecting anomalies at node, edge, and graph levels.

Keywords

Graph Anomaly Detection, Graph Neural Networks

Reviews and Discussion

Review
Rating: 7

Traditional graph anomaly detection focuses on a single type of graph object (e.g., node, edge, or graph). To address this, this paper introduces the first unified framework (UniGAD) for jointly detecting anomalies at the node, edge, and graph levels. The authors propose two core modules, MRQSampler and GraphStitch, to address the challenges of unifying multi-level task formats and unifying multi-level task training, respectively. Moreover, the authors theoretically prove that MRQSampler maximizes the accumulated spectral energy of subgraphs (i.e., the Rayleigh quotient) to preserve the most significant anomaly information. Extensive experiments demonstrate that UniGAD surpasses existing single-task GAD methods and graph prompt-based approaches for multiple tasks, offering robust zero-shot task transferability.

Strengths

Originality:

  1. This paper focuses on the gap in multi-level graph anomaly detection at the node, edge, and graph levels, which is an important problem. In many scenarios and datasets, the labels of these different graph objects are related, but this is often overlooked.
  2. The problem formulation in Definition 2.1 is novel in graph learning: it allows labels to be available at one or more levels, which makes the framework broadly applicable and provides strong zero-shot capabilities.
  3. The authors designed a theoretically guaranteed graph sampling algorithm that uses the Maximum Rayleigh Quotient to sample the most anomalous subgraphs. This approach ensures sampling nodes that contain anomalous information, preventing the anomalous information from being smoothed out.
  4. The authors implement multi-level task information transfer through a module called GraphStitch. Since multi-task in graph learning research remains rare, this network design is very interesting and prospective.

Quality: The paper exhibits a high level of technical quality, with theoretical proofs and a rigorous dynamic programming algorithm supporting the effectiveness of MRQSampler in maximizing the accumulated spectral energy of subgraphs. The paper proposes a robust framework that unifies multi-level tasks through a pre-trained GNN encoder, MRQSampler, and the GraphStitch network. The experiments look solid: they cover 13 datasets (both single-graph and multi-graph) and 17 SOTA models spanning node-level, edge-level, graph-level, and multi-task methods.

Clarity: The paper is well-written and clearly presents its contributions.

Significance: The UniGAD framework further improves graph anomaly detection performance and has strong zero-shot capability. It also promotes the development of multi-task learning in the graph learning domain.

Weaknesses

Overall, this paper is technically solid and novel. On page 8, the authors mention that the other multi-task methods, such as GraphPrompt and All-in-One, often run out of time (OOT) and compute redundantly. What allows UniGAD to avoid this? Is each node's representation computed only once? Please explain the difference in more detail.

Minor: in Additional Experimental Results, should the captions of Tables 8 and 9 read F1-macro? Otherwise they repeat Tables 1 and 2.

Questions

Please refer to the weaknesses: explain the difference in more detail and check for typos.

Limitations

The authors addressed limitations and societal impacts in the conclusion and appendices.

Author Response

We are grateful for your helpful comments! Below are our responses.

Q1: How UniGAD avoids redundant computation

Prompt-based methods for handling multiple objects convert all objects into induced graphs during pre-processing. As a result, the number of induced graphs to process equals the sum of the numbers of nodes, edges, and graphs, which can reach tens of millions on some datasets. Each induced graph contains many nodes, and the same node often appears in several induced graphs, which leads to significant computational redundancy.

In contrast, UniGAD employs a more efficient approach. It first obtains the representation of each node through a pre-trained GNN encoder. Then, using a sampler and pooling architecture, it derives the embeddings for other levels of objects (edges/graphs). The framework optimizes the process by eliminating the need to generate and process numerous induced subgraphs, instead leveraging the original graph structure and minimizing repetitive calculations.
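The idea of computing each node representation once and then deriving edge-level and graph-level embeddings from it can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the `node_emb` dictionary stands in for the output of a pre-trained encoder, and plain mean pooling is an assumed substitute for the sampler and pooling architecture described above.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def embed_objects(
    node_emb: Dict[int, Vector],
    edges: List[Tuple[int, int]],
    graphs: Dict[str, List[int]],
) -> Tuple[Dict[Tuple[int, int], Vector], Dict[str, Vector]]:
    """Derive edge and graph embeddings by reusing stored node embeddings."""
    # Each edge reuses the two endpoint embeddings; nothing is re-encoded.
    edge_emb = {(u, v): mean_pool([node_emb[u], node_emb[v]]) for u, v in edges}
    # Each graph reuses the embeddings of its member nodes.
    graph_emb = {g: mean_pool([node_emb[n] for n in nodes]) for g, nodes in graphs.items()}
    return edge_emb, graph_emb


# Hypothetical encoder output for a 3-node graph (assumed values).
node_emb = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
edge_emb, graph_emb = embed_objects(node_emb, [(0, 1), (1, 2)], {"g0": [0, 1, 2]})
```

The encoder cost is paid once per node, whereas an induced-subgraph pipeline would re-encode node 1 for every induced graph that contains it.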

Q2: Minor Corrections

Thank you for bringing this to our attention. The caption of Tables 8 and 9 should be F1-macro. We will thoroughly examine the manuscript for any other minor inconsistencies.

Comment

I have no further questions.

Review
Rating: 7

The paper introduces UniGAD, a unified framework for multi-level graph anomaly detection, capable of identifying anomalies at the node, edge, and graph levels. Key contributions include the Maximum Rayleigh Quotient Subgraph Sampler (MRQSampler), which optimizes subgraphs to maximize significant anomaly information, and the GraphStitch Network, which facilitates information sharing across different levels while maintaining effectiveness. Experiments on 13 datasets show that UniGAD outperforms existing methods and demonstrates strong zero-shot transferability. Overall, UniGAD leverages spectral properties and multi-task learning to achieve state-of-the-art performance in graph anomaly detection.

Strengths

1. Innovative Unified Framework: UniGAD is the first framework to jointly detect anomalies at the node, edge, and graph levels, addressing a significant gap in current GAD research. The integration of multiple anomaly detection tasks into a single model enhances its versatility and applicability across various scenarios.

2. Advanced Methodologies: The Maximum Rayleigh Quotient Subgraph Sampler (MRQSampler) and the GraphStitch Network are key innovations. MRQSampler maximizes spectral energy to preserve critical anomaly information, while the GraphStitch Network facilitates information sharing across different levels, harmonizing conflicting training goals and maintaining the effectiveness of individual tasks.

3. Robust Performance and Transferability: UniGAD is comprehensively evaluated on 13 diverse datasets, consistently outperforming existing methods. It demonstrates robust zero-shot transferability, effectively transferring knowledge across different GAD tasks without prior exposure to specific anomalies, showcasing its robustness and generalizability.

4. Strong Theoretical and Practical Foundations: The approach is supported by solid theoretical foundations, with proofs provided for the optimal conditions of MRQSampler. Additionally, the dynamic programming algorithm ensures computational efficiency, making the method scalable to large datasets. The availability of the implementation code promotes transparency and reproducibility of the results, facilitating further research and application.

Weaknesses

1. The paper modifies existing methods like All-in-One and GraphPrompt into multi-task versions for comparison. However, All-in-One already supports node, edge, and graph tasks, and GraphPrompt supports node and graph tasks. A more straightforward approach would have been to directly apply these methods to the relevant anomaly detection tasks without modifications to see their effectiveness.

2. The paper mentions that All-in-One and GraphPrompt often run out of time (OOT), but it is unclear whether this refers to the prompt tuning stage or the pre-training stage. Since both methods are known for their relatively fast prompt tuning, clarifying this distinction is crucial for understanding their performance limitations.

3. The paper lacks comparisons of time and space performance metrics. Including an analysis of the time and space efficiency of UniGAD compared to other methods would provide a more comprehensive evaluation of its practicality for large-scale and real-time applications.

Questions

Is it possible to provide All-in-One and GraphPrompt code for multi-task versions?

Limitations

None

Author Response

Thank you for your valuable comments on our work! Below are our responses.

Q1: Modification of All-in-One and GraphPrompt

We would like to clarify that we did not alter the core methodologies of GraphPrompt and All-in-One. Our modifications were limited to the data preprocessing component to accommodate the simultaneous handling of multiple object types (node/edge or node/graph) within induced graphs. The original implementations of these methods support processing only one type of graph object at a time, either by handling one object type exclusively or by processing different types sequentially (i.e., one type for pre-training and another type for prompt tuning). They do not inherently support scenarios where multiple object types are input/output simultaneously.

Code Access: We have included the modified versions of All-in-One and GraphPrompt in our source code, which is accessible through the link provided in our manuscript for further reference and verification.

Q2: OOT issues for All-in-One and GraphPrompt

The time limitation mentioned refers specifically to the pre-processing stage and the prompt tuning stage. During the pre-training stage, both these methods and UniGAD use GraphMAE, and the time consumption is similar and short. The reasons for the significant time consumption of All-in-One and GraphPrompt are:

  • Large number of induced graphs in pre-processing: The number of induced graphs becomes the sum of the node and edge counts rather than just the number of graphs. This results in tens of millions of induced k-hop subgraphs in some datasets, causing substantial pre-processing time.
  • Large number of training samples in prompt tuning: Although both methods are known for their fast prompt tuning, this only holds in few-shot settings (e.g., 5 or 10 samples for tuning). We use the fully supervised setting and have many more training samples, which significantly increases the data required for tuning.
  • Redundant computation: Induced graphs of neighboring nodes inevitably contain duplicated nodes. However, the prompt-based approach treats these duplicates as distinct nodes to be computed separately in different graphs, significantly increasing the computational load. In contrast, UniGAD employs a more efficient approach. It first obtains the representation of each node through a pre-trained GNN encoder. Then, using a sampler and pooling architecture, it derives the embeddings for other levels of objects (edges/graphs). The framework optimizes the process by eliminating the need to generate and process numerous induced subgraphs, instead leveraging the original graph structure and minimizing repetitive calculations.
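To make the scale in the first bullet concrete, the count of induced graphs can be worked out from the dataset sizes the authors report in this thread. The helper name `induced_graph_count` is hypothetical, and the task pairing (node/edge for Amazon, node/graph for T-Group) is taken from the result tables in the rebuttal.

```python
def induced_graph_count(num_nodes: int, num_edges: int = 0, num_graphs: int = 0) -> int:
    """One induced graph per object (node, edge, or graph) that must be scored."""
    return num_nodes + num_edges + num_graphs


# Amazon is evaluated on node and edge tasks (sizes as reported in the rebuttal).
amazon = induced_graph_count(num_nodes=11_944, num_edges=8_847_096)

# T-Group is evaluated on node and graph tasks.
t_group = induced_graph_count(num_nodes=11_015_616, num_graphs=37_402)

print(f"Amazon:  {amazon:,} induced graphs")
print(f"T-Group: {t_group:,} induced graphs")
```

Under these assumptions Amazon alone yields roughly 8.9 million induced graphs, which is consistent with the "tens of millions" order of magnitude quoted for the larger datasets.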

To address the OOT issue, we employed faster GPUs, extended the time limit to 2 days, and optimized the pre-processing code. The updated results are as follows:

Values are AUROC/AUPRC/F1-macro.

| Dataset (task) | GraphPrompt | All-in-One | UniGAD (Ours) |
|---|---|---|---|
| Amazon (Node) | 50.01/6.62/40.93 | 56.11/1.02/48.67 | 97.84/87.29/91.33 |
| Amazon (Edge) | 50.96/2.64/35.95 | 54.8/3.13/2.45 | 92.18/42.01/73.59 |
| Yelp (Node) | 49.83/12.41/40.90 | 49.77/46.10/14.43 | 86.23/61.00/74.57 |
| Yelp (Edge) | 49.56/13.63/42.94 | 49.13/13.49/46.29 | 79.05/40.90/66.66 |
| MNIST0 (Node) | 81.16/82.89/80.66 | OOT | 99.99/99.99/99.99 |
| MNIST0 (Graph) | 83.88/36.25/52.39 | OOT | 99.61/97.92/95.54 |
| T-Group (Node) | 47.40/1.06/50.77 | OOT | 96.19/31.31/68.69 |
| T-Group (Graph) | 50.81/2.36/49.78 | OOT | 88.78/55.64/78.09 |

We found that GraphPrompt can complete all datasets in our additional experiments, while All-in-One still fails to finish on the MNIST0/1 and T-Group datasets. This difference in performance can be attributed to their underlying mechanisms. Based on the All-in-One source code, the model must learn the token structure represented by pairwise relationships among tokens. It calculates the dot product between prompt tokens and input graph nodes to determine link establishment, which is computationally intensive. In contrast, GraphPrompt employs a simpler and faster approach, incorporating a learnable vector integrated into graph pooling through element-wise multiplication.
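The cost difference described here can be illustrated with a small sketch. It follows only the description in this response and is not taken from either code base: the sigmoid and the `threshold` parameter in the link rule are assumptions, and both function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def graphprompt_style_readout(node_emb: List[Vector], prompt: Vector) -> Vector:
    """Sum-pool node embeddings after element-wise scaling by one learnable vector.

    Cost: one multiply per (node, dimension).
    """
    return [sum(p * h[i] for h in node_emb) for i, p in enumerate(prompt)]


def all_in_one_style_links(
    node_emb: List[Vector], tokens: List[Vector], threshold: float = 0.5
) -> List[Tuple[int, int]]:
    """Link prompt token t to node n when sigmoid(token . node) exceeds a threshold.

    Cost: one dot product per (token, node) pair.
    """
    links = []
    for t, tok in enumerate(tokens):
        for n, h in enumerate(node_emb):
            score = sum(a * b for a, b in zip(tok, h))
            if 1.0 / (1.0 + math.exp(-score)) > threshold:
                links.append((t, n))
    return links


node_emb = [[1.0, 2.0], [3.0, -1.0]]
readout = graphprompt_style_readout(node_emb, prompt=[0.5, 2.0])
links = all_in_one_style_links(node_emb, tokens=[[1.0, 0.0], [-1.0, -1.0]])
```

The readout touches each node embedding once, while the link rule scales with the number of prompt tokens times the number of nodes in every induced graph.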

Q3: Time and space efficiency of UniGAD

Thank you for your suggestion. We have incorporated both time and space efficiency metrics into our evaluation, using the large-scale, real-world T-Group dataset (37,402 graphs, 93,367,082 edges, and 11,015,616 nodes). We used the same batch size for all models to ensure a fair comparison.

To provide a more straightforward comparison between single-task and multi-task baselines, we calculated the average, minimum, and maximum for combinations of single-task node-level and graph-level models, and compared these with multi-task models. The results, shown in Figure 1(a) of the supplementary PDF, indicate that in terms of execution time, our method is slower than the combination of the fastest single-level models but faster than the average combination.

Regarding peak memory usage, Figure 1(b) demonstrates that graph-level models consume significantly more memory than node-level models. Our method maintains memory consumption comparable to node-level models and substantially lower than both graph-level GAD models and prompt-based methods.

Comment

Thanks for the authors' reply. I will keep my positive score.

Comment

Thank you for your continued engagement and positive feedback on our work! We will carefully incorporate your comments into our manuscript.

Review
Rating: 6

This article presents an anomaly detection model, UniGAD, designed to be applicable across different levels including nodes, edges, and whole graphs. Leveraging the relationship between the Rayleigh quotient and anomaly degree, as described in Lemma 1, the authors have developed a novel subgraph sampling algorithm, MRQSampler. This algorithm recursively adds nodes that maximize the Rayleigh quotient, ensuring the subgraph contains the most anomalous information. Consequently, node-level and edge-level tasks are converted into graph-level tasks for unified processing. Additionally, the authors introduce the innovative GraphStitch Network, which jointly considers multi-level representations. Extensive experimental results substantiate the effectiveness of UniGAD.

Strengths

  1. The task of unifying different levels of anomaly detection on graphs is challenging and very novel.
  2. The theoretical proofs are complete and explain the motivation for MRQSampler.
  3. Comprehensive experiments demonstrate the effectiveness of UniGAD.

Weaknesses

  1. The graph signal $x$ in all proofs is treated as a vector, which implies that each node feature is a scalar. However, in practice, graph node features are often vectors. This discrepancy is not clearly addressed by the authors.

  2. The MRQSampler appears to be a recursive algorithm rather than a dynamic programming (DP) algorithm, as suggested by Algorithm 1 in the appendix. This distinction impacts the efficiency of the proposed method.

  3. UniGAD is a supervised algorithm, yet the baseline comparisons in the experiments include unsupervised algorithms (OCGIN, OCGTL). This raises concerns about the fairness and validity of the experimental comparisons.

Questions

  1. How efficient is MRQSampler (execution time and memory requirements)? Can it be parallelized on GPUs?
  2. Can UniGAD be made unsupervised? I think this is more realistic, since collecting abnormal samples during the training phase is difficult.

Limitations

No limitations need to be discussed.

Author Response

W1: The discrepancy of scalar node feature in proofs and vector node feature in practice.

Thank you for highlighting this discrepancy. In theoretical derivations, we followed the established foundations of BWGNN and RQGNN, which consider single-dimensional features in their proofs. This simplification enhances mathematical tractability and facilitates clearer theoretical analysis. In fact, the primary focus of spectral graph theory and graph signal processing is on one-dimensional vectors.

However, we recognize that real-world scenarios typically involve multi-dimensional feature vectors. To address this, we developed two approaches:

  • Pre-processing: Normalize all feature dimensions and then take the norm (1-norm in our case) to obtain a composite feature for each node, allowing us to identify the most anomalous nodes based on this comprehensive feature.
  • Post-processing: Identify anomalous nodes in each feature dimension separately, and then take the union or combination of these node sets across all dimensions.

We implemented both methods in our code, and our early experiments showed similar results between the two approaches. For efficiency, we used the pre-processing approach in UniGAD. This method ensures that the computational complexity does not increase with the number of feature dimensions, making it more scalable for high-dimensional data. We will clarify this point in the manuscript to provide a more comprehensive understanding of the rationale behind our design choices.
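The pre-processing approach and the scalar-signal Rayleigh quotient it feeds can be sketched as below. Several choices here are assumptions made for illustration and are not stated in the response: min-max scaling as the normalization, and the unnormalized Laplacian $L = D - A$, for which $x^\top L x = \sum_{(u,v) \in E} (x_u - x_v)^2$.

```python
from typing import List, Tuple


def composite_signal(features: List[List[float]]) -> List[float]:
    """Min-max normalize each feature dimension, then take each node's 1-norm."""
    dims = len(features[0])
    lo = [min(row[d] for row in features) for d in range(dims)]
    hi = [max(row[d] for row in features) for d in range(dims)]
    signal = []
    for row in features:
        scaled = [
            (row[d] - lo[d]) / (hi[d] - lo[d]) if hi[d] > lo[d] else 0.0
            for d in range(dims)
        ]
        signal.append(sum(abs(s) for s in scaled))
    return signal


def rayleigh_quotient(x: List[float], edges: List[Tuple[int, int]]) -> float:
    """Compute x^T L x / x^T x for L = D - A on an undirected edge list."""
    energy = sum((x[u] - x[v]) ** 2 for u, v in edges)
    norm = sum(v * v for v in x)
    return energy / norm if norm > 0 else 0.0


# Toy path graph 0 - 1 - 2 with two-dimensional node features (assumed values).
features = [[0.0, 10.0], [1.0, 10.0], [2.0, 30.0]]
edges = [(0, 1), (1, 2)]
x = composite_signal(features)
rq = rayleigh_quotient(x, edges)
```

Because the multi-dimensional features are collapsed to one scalar per node before any spectral quantity is computed, the cost of the quotient itself does not grow with the feature dimension.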

W2. MRQSampler: DP or Recursive

The MRQSampler algorithm utilizes principles of dynamic programming (DP), which may not be immediately apparent and thus requires clarification. The key distinction between DP and recursive or divide-and-conquer approaches is the use of memoization, i.e., storing the results of subproblems to avoid redundant computations in future calculations.

In the MRQSampler, as described in Algorithm 1 (lines 18-29), the process incorporates a bottom-up calculation where the maximum $\Delta$ values of sub-trees are computed starting from the leaves up to the root. It is important to note that sub-trees not selected in initial iterations at lower levels might still be reconsidered in subsequent iterations at higher levels. Therefore, the maximum $\Delta$ for these sub-trees, referred to as 'inferior candidates' in the algorithm, could be computed and stored during their first evaluation. These stored values are then reused in later computations, which is a hallmark of DP.

To better illustrate this process, we restructure Algorithm 1 into the following two stages:

  • Stage 1: Compute and store the maximum $\Delta$ for each sub-tree recursively, starting from the leaf nodes and moving upwards to the root. This stage has a computational complexity of $\mathcal{O}(N \log N)$.
  • Stage 2: Initiate from the root node and iteratively select the sub-tree with the highest $\Delta$ from the available candidates, incorporating it into the resultant sub-graph until the $\Delta$ of the selected sub-tree exceeds the current RQ. As new sub-trees are added to the resultant set, their child sub-trees are also added to the pool of candidates. The complexity of this stage is $\mathcal{O}(N)$. Without memoization, this stage would necessitate recalculations of the maximum $\Delta$ for these sub-trees.

The implementation of memoization in the DP paradigm to store and reuse the results of sub-tasks effectively reduces redundant computations and optimizes the overall complexity of the algorithm to $\mathcal{O}(N \log N)$. We will make the necessary revisions to the manuscript to better explain this algorithm.
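The two-stage pattern (store per-sub-tree values bottom-up, then reuse them in a top-down greedy selection) can be illustrated on a toy problem. This is not Algorithm 1: the per-node `gain`, the stored score (best total gain of a connected sub-tree rooted at a node), and the fixed `threshold` are all assumed stand-ins for $\Delta$ and the evolving RQ.

```python
import heapq
from typing import Dict, List


def subtree_scores(
    children: Dict[int, List[int]], gain: Dict[int, float], root: int
) -> Dict[int, float]:
    """Stage 1: bottom-up pass that stores one score per sub-tree root."""
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))
    best: Dict[int, float] = {}
    for v in reversed(order):  # every child is scored before its parent
        best[v] = gain[v] + sum(max(0.0, best[c]) for c in children.get(v, []))
    return best


def select_subtree(
    children: Dict[int, List[int]],
    best: Dict[int, float],
    root: int,
    threshold: float = 0.0,
) -> List[int]:
    """Stage 2: greedy top-down selection that only reads the stored scores."""
    selected = [root]
    heap = [(-best[c], c) for c in children.get(root, [])]
    heapq.heapify(heap)
    while heap and -heap[0][0] > threshold:
        _, v = heapq.heappop(heap)
        selected.append(v)
        for c in children.get(v, []):  # children of a chosen node become candidates
            heapq.heappush(heap, (-best[c], c))
    return selected


children = {0: [1, 2], 1: [3, 4]}
gain = {0: 1.0, 1: -2.0, 2: 0.5, 3: 4.0, 4: -1.0}
best = subtree_scores(children, gain, root=0)
selected = select_subtree(children, best, root=0)
```

Stage 2 never recomputes a sub-tree score, which is the memoization property the response relies on; without the stored `best` values, each candidate would need its own traversal.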

Q1. Time and memory efficiency for MRQSampler

Following your suggestion, we have conducted a comprehensive evaluation of both time and space efficiency on the large-scale, real-world T-Group dataset. In Figure 1(a), we have highlighted the time consumption of the MRQSampler module separately from the other parts of UniGAD; it accounts for about 37% of the total time. The MRQSampler offers a key efficiency advantage: it needs to be run only once to generate and record subgraphs, which can then be reused across multiple trials without recalculation when tuning parameters. In addition, we find that k-hop subgraph sampling also takes considerable time in the prompt-based methods.

Moreover, the subgraph sampling process for different nodes can be parallelized, as each node's sampling is independent. This parallelization potential further enhances the scalability of our approach, particularly for large-scale graphs. We are actively working on optimizing MRQSampler for GPU acceleration but it presents some challenges. Currently, we offer a CPU-based parallel version of the code.

W3/Q2: Can UniGAD be Unsupervised?

OCGIN and OCGTL are widely used unsupervised graph-level GAD baselines, and some supervised methods (e.g., GmapAD and RQGNN) also compare against them. We agree that such a comparison is not rigorous and will highlight this in our analysis.

UniGAD is also designed with label scarcity in mind. It focuses on scenarios with missing labels at different levels, exploring how labels from one level can compensate for another. While UniGAD requires labels, it exhibits zero-shot transferability: it applies learned knowledge to unseen scenarios without requiring labels of a specific type.

Moreover, the MRQSampler can be considered as an independent module in our framework. It leverages the correlation between spectral domain Rayleigh quotients and anomaly degrees to identify the most anomalous subgraphs. Essentially, it functions as a graph algorithm that can theoretically identify the most anomalous node-centered subgraphs in an unsupervised manner. This characteristic makes the MRQSampler adaptable to other unsupervised methods beyond the scope of UniGAD. We view this as an area of independent interest and anticipate its potential applications in unsupervised GAD methods.

Comment

The authors' response has largely convinced me, and I will keep my positive score.

Comment

Thank you for your insightful feedback and for recognizing our contributions. We greatly appreciate your engagement throughout the review process. We'll carefully incorporate your comments into our manuscript.

Review
Rating: 5

This paper presents a novel framework for detecting anomalies at node, edge, and graph levels within graph-structured data. The authors introduce the Maximum Rayleigh Quotient Subgraph Sampler (MRQSampler) to transform multi-level tasks into graph-level tasks by sampling subgraphs with high spectral energy, thus preserving significant anomaly information. Additionally, the GraphStitch Network integrates information across different levels and balances multi-level training objectives. Experimental results demonstrate the promising results on different GAD tasks.

Strengths

  1. The paper studies an interesting and underexplored problem that unifies multiple-level GAD tasks in a single framework. The proposed unification strategy is interesting. The paper is overall nicely presented and easy to follow. A comprehensive set of baseline methods is considered in the evaluation.

Weaknesses

  1. AUPR is a popular complementary metric to AUROC, commonly used by the majority of recent GAD papers. It is important for readers to understand the performance of the model with a focus on the anomaly class. Considering that the improvement on some datasets is marginal, discussing the model's performance under AUPR would be beneficial.

  2. It seems that on the edge prediction task for some datasets, the baselines achieve the top performance. I am wondering how this method would perform on datasets with a very high degree, like the T-Finance dataset. In addition, it would be interesting to see how this method performs on large-scale datasets to understand its robustness better.

  3. Some of the experiments are marked as OOM or OOT, especially for the prompt-based methods on larger graphs. While I understand these methods can be resource-intensive, a limit of 24GB max GPU RAM or 1 day wall time might be insufficient for adequately evaluating such methods.

Questions

Please refer to the weaknesses.

Limitations

N/A

Author Response

We greatly appreciate the reviewer’s thorough and constructive feedback on our paper.

Q1: AUPRC as a complementary metric

We acknowledge the importance of AUPRC as a complementary metric to AUROC and F1-macro, especially for anomaly detection with imbalanced labels. In light of your suggestion, we have now included AUPRC results in our evaluation. All results are shown in Tables 1-4 of the supplementary PDF.

Based on the results, we observe that UniGAD's performance under the AUPRC metric aligns closely with its AUROC and Macro-F1 scores. UniGAD achieves state-of-the-art performance across nearly all scenarios.
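Why AUPRC is informative under label imbalance can be seen on a toy ranking. The sketch below uses average precision as the estimate of AUPRC, which is a common convention but an assumption here, and the labels and scores are invented for illustration rather than drawn from any experiment in the paper.

```python
from typing import List


def auroc(labels: List[int], scores: List[float]) -> float:
    """Fraction of (anomaly, normal) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: List[int], scores: List[float]) -> float:
    """Mean of the precision values measured at the rank of each anomaly."""
    ranked = sorted(zip(scores, labels), reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits  # assumes at least one anomaly is present


# Two anomalies among ten objects, each ranked just below one normal object.
labels = [0, 1, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
roc = auroc(labels, scores)
ap = average_precision(labels, scores)
print(f"AUROC = {roc:.4f}, average precision = {ap:.4f}")
```

On this toy ranking the AUROC is 0.8125 while the average precision is 0.5: two false alarms near the top barely move AUROC but halve the precision on the anomaly class, which is the effect the reviewer asks the authors to report.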

Q2: Results on T-Finance, large-scale datasets, and edge prediction

We appreciate the reviewer's interest in high-degree and large-scale datasets. We've conducted additional experiments on the high-degree T-Finance dataset and highlighted our results on another large-scale dataset T-Group.

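| Model | AUROC (Node) | AUROC (Edge) | F1-macro (Node) | F1-macro (Edge) | AUPRC (Node) | AUPRC (Edge) |
|---|---|---|---|---|---|---|
| GCN/GCNE | 96.03 | 87.63 | 80.57 | 79.07 | 84.94 | 62.12 |
| GIN/GINE | 90.70 | 79.05 | 70.61 | 73.40 | 70.81 | 52.01 |
| GraphSAGE/SAGEE | 86.43 | 77.14 | 76.81 | 67.12 | 61.79 | 18.79 |
| SGC/SGCE | 78.16 | 83.01 | 62.63 | 68.76 | 19.62 | 33.47 |
| GAT/GATE | 74.21 | 83.91 | 56.69 | 65.75 | 30.35 | 54.13 |
| BernNet/BernE | 90.60 | 87.80 | 69.11 | 63.16 | 54.70 | 45.01 |
| PNA/PNAE | 92.37 | 86.19 | 70.52 | 57.45 | 67.65 | 43.70 |
| AMNet/AME | 68.17 | 92.27 | 27.69 | 70.88 | 23.07 | 68.14 |
| BWGNN/BWE | 93.58 | 69.01 | 74.31 | 64.51 | 74.73 | 30.22 |
| UniGAD-GCN | 93.93 | 93.75 | 84.92 | 84.08 | 75.30 | 69.90 |
| UniGAD-BWGNN | 96.49 | 94.32 | 89.75 | 84.90 | 85.34 | 74.37 |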

The above results on T-Finance indicate that UniGAD outperforms all baseline methods in both node-level and edge-level GAD tasks. For the two prompt-based multi-task approaches, GraphPrompt and All-in-One, the pre-processing phase proved to be excessively time-consuming and memory-intensive, failing to complete within a 2-day timeframe. The primary reason for this inefficiency is the need to generate a distinct induced graph for each edge, which dramatically increases computational demands and memory usage on the high-degree T-Finance dataset.

Large-scale Dataset Performance Analysis: We highlight our results on the T-Finance dataset (39,357 nodes, 21,222,543 edges) and the T-Group dataset (37,402 graphs, 11,015,616 nodes, 93,367,082 edges) as reported in the manuscript. Our method demonstrated superior performance across node-level, edge-level, and graph-level anomaly detection on large-scale, real-world data. It surpasses two recent powerful GAD baselines: the node-level method BWGNN and the graph-level method RQGNN. This highlights the versatility and effectiveness of our method across different levels of objects, particularly in handling large-scale scenarios.

Edge Prediction Results Analysis: For edge prediction, UniGAD achieves the best performance on 4 out of 7 datasets for AUROC and 6 out of 7 datasets on F1-macro. While our method is designed for a multi-task setting, the performance on a single level might be slightly compromised to ensure the model performs well across all tasks.

Q3: OOT and OOM under limited resources

We acknowledge the reviewer's concern about the OOM and OOT issues for some baselines on large graphs. We agree that the limit of 24GB GPU RAM and 1-day running time might be insufficient for these resource-intensive methods. To address this, we borrowed A800 80G GPUs for additional experiments during the rebuttal period and extended the time limit to 2 days.

For reference, the datasets having OOM and OOT issues are:

| Dataset | Graph_num | Edge_num | Node_num |
|---|---|---|---|
| Amazon | 1 | 8,847,096 | 11,944 |
| Yelp | 1 | 7,739,912 | 45,954 |
| MNIST0 | 70,000 | 41,334,380 | 4,939,668 |
| MNIST1 | 70,000 | 41,334,380 | 4,939,668 |
| T-Group | 37,402 | 93,367,082 | 11,015,616 |

For prompt-based multi-task methods previously facing OOT issues, we employed faster GPUs, extended the time limit, and optimized the pre-processing code. These improvements enabled GraphPrompt to complete experiments on all datasets except T-Finance. However, All-in-One remained slower than GraphPrompt and failed to finish MNIST0/1 and T-Group within 2 days. This is because All-in-One requires learning token structures through pairwise relationships and calculating dot products between prompt tokens and input graph nodes, which is computationally intensive. In contrast, GraphPrompt simply incorporates a learnable vector into graph pooling via element-wise multiplication, enabling faster processing. The updated results are as follows:

Values are AUROC/AUPRC/F1-macro.

| Dataset (task) | GraphPrompt | All-in-One | UniGAD (Ours) |
|---|---|---|---|
| Amazon (Node) | 50.01/6.62/40.93 | 56.11/1.02/48.67 | 97.84/87.29/91.33 |
| Amazon (Edge) | 50.96/2.64/35.95 | 54.8/3.13/2.45 | 92.18/42.01/73.59 |
| Yelp (Node) | 49.83/12.41/40.90 | 49.77/46.10/14.43 | 86.23/61.00/74.57 |
| Yelp (Edge) | 49.56/13.63/42.94 | 49.13/13.49/46.29 | 79.05/40.90/66.66 |
| MNIST0 (Node) | 81.16/82.89/80.66 | OOT | 99.99/99.99/99.99 |
| MNIST0 (Graph) | 83.88/36.25/52.39 | OOT | 99.61/97.92/95.54 |
| T-Group (Node) | 47.40/1.06/50.77 | OOT | 96.19/31.31/68.69 |
| T-Group (Graph) | 50.81/2.36/49.78 | OOT | 88.78/55.64/78.09 |

Two graph-level anomaly detection methods, iGAD and GmapAD, initially encountered OOM issues. Using 80GB of GPU RAM, iGAD successfully ran on the MNIST0 and MNIST1 datasets. However, T-Group exceeded memory limits due to the large number of nodes per graph. We switched the processing to CPU, which was completed in 2 days. Conversely, GmapAD could operate within the 80GB memory limit on the A800 GPU but still timed out even with a 2-day limit. We discovered that the final SVM predictor in GmapAD becomes significantly slower with a large number of training samples.

Values are AUROC/AUPRC/F1-macro.

| Dataset | iGAD | UniGAD (Ours) |
|---|---|---|
| MNIST0 | 98.93/94.79/87.73 | 99.61/97.92/95.54 |
| MNIST1 | 99.50/97.98/95.04 | 99.98/98.60/97.60 |
| T-Group | 64.44/5.92/46.51 | 88.78/55.64/78.09 |

Comment

Thank you to the authors for addressing my concerns. I will maintain my current rating for now. I look forward to discussions with the other reviewers and the area chair, and I am open to adjusting my rating if necessary after the discussion.

Comment

Thank you for your continued engagement and thoughtful consideration of our work. We'll carefully incorporate your comments into our manuscript. We’re glad that our responses have addressed the concerns you raised. We’ll be available until the end of the rebuttal if you have any follow-up questions or further points of discussion.

Author Response

Dear Reviewers,

We are deeply grateful for your constructive and insightful feedback. We sincerely appreciate your recognition of our contributions to the field of graph anomaly detection. The reviewers have highlighted several key strengths of our work, including UniGAD's unique capability to unify node, edge, and graph-level tasks, the advanced methodologies introduced through MRQSampler and GraphStitch Network, robust empirical performance with strong zero-shot transferability, solid theoretical foundations, and clear presentation. These aspects collectively address a significant gap and promote multi-task learning in current GAD research. We have carefully considered each comment and provided detailed, point-by-point responses to address all the feedback received, further strengthening our manuscript.

The additional experiments carried out during the rebuttal period are summarized as follows:

  • AUPRC: All results related to the additional AUPRC metric are presented in supplementary PDF Tables 1-4 and discussed in Q1 of the NHz6 rebuttal.
  • High-degree dataset T-Finance: The new results for the high-degree T-Finance dataset are included in Q2 of the NHz6 rebuttal.
  • Time and space evaluation: Experiments assessing time and space efficiency on the extensive T-Group dataset are depicted in Figure 1 of the supplementary PDF and are discussed in Q1 of the ji27 rebuttal and Q3 of the ymWJ rebuttal.
  • OOM and OOT Issues: The results under the extended time and space constraints to address out-of-memory and out-of-time issues are detailed in Q3 of NHz6 and Q2 of ymWJ.

We attached a one-page PDF summarizing the additional experimental results. For detailed discussion, please refer to our reviewer-specific feedback.

Final Decision

All reviewers are positive about this paper. I agree with them that this paper has proposed an effective graph anomaly detection framework to jointly address anomalies at the node, edge, and graph levels. This research direction is important and still at its early stage. Overall, this paper presents a solid work and is potentially useful to the community. I am confident that this paper should be accepted.