PaperHub
Overall rating: 5.5 / 10
Poster · 4 reviewers
Ratings: 2, 3, 4, 3 (min 2, max 4, std. dev. 0.7)
ICML 2025

Towards a Unified Framework of Clustering-based Anomaly Detection

OpenReview · PDF
Submitted: 2025-01-23 · Updated: 2025-07-24
TL;DR

A unified theoretical framework models the intrinsic connections among representation learning, clustering, and anomaly detection.

Abstract

Keywords
Anomaly Detection, Clustering

Reviews and Discussion

Official Review
Rating: 2

This paper introduces UniCAD, a novel method for anomaly detection based on clustering and representation learning. The method leverages an anomaly indicator function and a mixture of student’s t-distributions to learn an anomaly score based on the mixture distribution density. Inspired by Newton’s law of universal gravitation, the authors further refine the anomaly score by adding positional information about a sample representation. In experiments, they demonstrate the efficiency of their method by comparing their approach against various baselines on 30 different datasets.

Questions for the Authors

  • Could the authors provide a better intuition or motivation for adding directional information to the anomaly score? In which scenarios do we expect this to help? Are there examples where it could hurt to add this information? How does directional information influence normal samples, which are expected to lie inside a cluster?
  • Is there a particular reason why the authors learn representations using an autoencoder? Why would you expect this to work better than more recent self-supervised approaches, such as masking or contrastive approaches?
  • How did the authors tune the hyperparameters?
  • Is there a clear intuition as to why student’s t-distribution mixture models work better than Gaussian mixture models?
  • Under which conditions does it make sense to view anomaly detection from a clustering perspective? Are there scenarios where tackling anomaly detection without clustering would be better? An experiment on datasets where we know about the existence of multiple modes would benefit this paper, as we could compare the chosen hyperparameter k with the actual number of clusters in the datasets and evaluate its influence. E.g., the authors could take a multi-class dataset and define a set number of classes as normal whereas the rest is considered anomalous (a minimal construction of such a benchmark is sketched after this list).
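
A minimal sketch of how such a benchmark could be constructed, assuming scikit-learn's digits dataset and an illustrative choice of normal classes and contamination rate (both are assumptions, not the paper's setup):

```python
import numpy as np
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)
normal_classes = [0, 1, 2, 3, 4]              # assumed "normal" modes -> ground-truth number of clusters is 5
is_anomaly = (~np.isin(y, normal_classes)).astype(int)

# Keep all normal samples; subsample the remaining classes to a 5% contamination rate.
rng = np.random.default_rng(0)
normal_idx = np.where(is_anomaly == 0)[0]
anom_pool = np.where(is_anomaly == 1)[0]
anom_idx = rng.choice(anom_pool, size=int(0.05 * len(normal_idx)), replace=False)

idx = np.concatenate([normal_idx, anom_idx])
X_bench, y_bench = X[idx], is_anomaly[idx]    # y_bench: 0 = normal, 1 = anomaly
# A clustering-based detector can then be run with K = 3, 5, 7, ... and its
# performance compared against the known number of normal modes (5 here).
```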

Claims and Evidence

This paper contains many design choices that are not very clearly motivated.

  • It is unclear why the student’s t-distribution should work fundamentally better than a Gaussian mixture model. The authors should elaborate on this point.
  • How to learn a good representation is only briefly mentioned. The authors use an autoencoder-based approach. Why is this a good choice, and how does it compare to other self-supervised representation learning approaches?
  • Drawing a parallel to Newton’s law of universal gravitation seems far-fetched, and the authors seem to just randomly define terms to make their formula fit the definition of Newton’s gravity.
  • The explanation of why vector information fundamentally improves the anomaly score is vague and should be elaborated better. Figure 2 is hard to understand and is not convincing.

Methods and Evaluation Criteria

Approaching the anomaly detection problem from a clustering perspective makes sense, and the experimental setup is reasonable. However, experiments are limited to tabular datasets, and baselines do not include recent advances in the field.

Theoretical Claims

The paper does not contain any theoretical claims.

Experimental Design and Analysis

The experiments are based on ADBench and seem to be sound and valid.

Supplementary Material

I did not review the supplementary materials.

Relation to Prior Literature

The paper explores anomaly detection from a clustering perspective. This is an interesting perspective, and there are not too many prior works that explore the problem in a similar fashion.

Missing Essential References

Since the authors test their methods on tabular data only, they should compare to more recent works on this problem. For instance, see [1], [2], and [3], which include more recent anomaly detection methods for tabular data.

[1] Qiu, Chen, et al. "Self-Supervised Anomaly Detection with Neural Transformations." IEEE Transactions on Pattern Analysis and Machine Intelligence (2024).

[2] Li, Aodong, et al. "Anomaly Detection of Tabular Data Using LLMs." arXiv preprint arXiv:2406.16308 (2024).

[3] Qiu, Chen, et al. "Neural transformation learning for deep anomaly detection beyond images." International conference on machine learning. PMLR, 2021.

Other Strengths and Weaknesses

Strengths

  • This paper tackles anomaly detection from a clustering perspective, an interesting and novel angle for the problem.
  • I like the ablations in the experiment that demonstrate the contributions of specific parts of their method.

Weaknesses

  • The method seems very overengineered, and many parts of the pipeline lack a clear motivation or intuition other than working better on the test set in the experiments.
  • Experiments are restricted to tabular data only but do not include recent advances in tabular anomaly detection.

Other Comments or Suggestions

  • Figure 2 is unintuitive and would benefit from a more extensive description in the caption.
  • In section 3.1.1, the authors use the abbreviation MM, which is not defined.
  • There is a broken reference at the beginning of section 4.3.
  • The impact statement is missing.
Author Response

It is unclear why the student’s t-distribution should work fundamentally better than a Gaussian mixture model.

The choice of the Student’s t-distribution over a Gaussian mixture model (GMM) is motivated by its heavy-tailed nature, which enhances robustness to outliers—a critical factor in anomaly detection. Unlike GMMs, where Gaussian tails decay rapidly and outliers can skew parameter estimates, the Student’s t-distribution mitigates this by assigning lower influence to extreme values. Our ablation study’s empirical evidence (Table 3) shows that performance significantly decreases when switching to a GMM.
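
As a toy numeric illustration of this point (not the paper's model): the negative log-density of a Gaussian grows quadratically with a point's distance from the center, while that of a Student's t grows only logarithmically, so a single extreme point contributes far less to the likelihood being optimized.

```python
import numpy as np
from scipy.stats import norm, t

xs = np.array([1.0, 3.0, 10.0])   # distances from a cluster center (in std. units)
nll_gauss = -norm.logpdf(xs)      # grows roughly as x^2 / 2
nll_t3 = -t.logpdf(xs, df=3)      # grows roughly as 2 * log(x) for large x
for x, g, s in zip(xs, nll_gauss, nll_t3):
    print(f"x = {x:5.1f}   Gaussian NLL = {g:6.2f}   Student-t(df=3) NLL = {s:5.2f}")
# At x = 10 the Gaussian term is ~50.9 while the Student-t term is ~8.1, so an
# outlier dominates a Gaussian mixture's parameter estimates far more strongly.
```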

The explanation of why vector information fundamentally improves the anomaly score is vague and should be elaborated better. ...

Thank you for your feedback. Due to space limitations, we provided a detailed explanation of the benefits of our scoring design in Appendix C and illustrated its differences from traditional clustering scores in Figure 4. We will emphasize these advantages more clearly in the next version to better highlight the rationale behind our approach.

Since the authors test their methods on tabular data only, they should compare to more recent works on this problem. ...

Thank you for introducing these excellent research works. Following your suggestion, we have properly cited these articles in our revised manuscript. Additionally, we have incorporated NeuTraLAD and DROCC as comparison baselines and reported the results in the following anonymous link: https://anonymous.4open.science/r/ICML2025_9292-4E31.

The method seems very overengineered, and many parts of the pipeline lack a clear motivation or intuition other than working better on the test set in the experiments.

Thank you for your input. We hope to clarify your misunderstanding regarding the design of our method. Far from lacking clear motivation, every module in our pipeline is purposefully designed, rooted in a unified optimization objective and theoretical framework. Our approach’s strength is validated through rigorous, fair comparisons across 30 diverse datasets spanning multiple domains, where it consistently achieves superior average performance, highlighting its robustness and broad applicability.

Other Comments or Suggestions

Thank you for your detailed feedback. We will improve the clarity of Figure 2’s caption, define "MM" in Section 3.1.1, fix the broken reference in Section 4.3, and add an impact statement in the revision.

Could the authors provide a better intuition or motivation for adding directional information to the anomaly score? ...

Directional information enhances anomaly scoring by modeling samples as influenced by multiple cluster “forces,” identifying anomalies with weak or conflicting resultant forces. It may hurt in datasets with irregular or overlapping clusters, where directionality could misclassify points. For normal samples within clusters, the “force” exerted by the center of their own cluster is much greater than that of other clusters, so the directional information from other cluster centers has little influence on the assessment of their degree of abnormality.
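
A simplified sketch of what such a vector-sum score could look like (an illustration of the idea only, with assumed cluster weights and an inverse-square force law, not the paper's exact formulation): each cluster center pulls a sample toward it; a sample deep inside one cluster feels a single dominant pull and gets a low score, while a sample far from all clusters, or pulled in conflicting directions between them, has a small resultant and gets a high score.

```python
import numpy as np

def gravity_score(x, centers, weights, eps=1e-6):
    """Anomaly score = inverse magnitude of the resultant 'force' acting on x."""
    resultant = np.zeros_like(x, dtype=float)
    for mu, w in zip(centers, weights):
        diff = mu - x
        dist = np.linalg.norm(diff) + eps
        resultant += (w / dist**2) * (diff / dist)   # pull toward cluster center mu
    return 1.0 / (np.linalg.norm(resultant) + eps)

centers = np.array([[0.0, 0.0], [6.0, 0.0]])         # two assumed cluster centers
weights = np.array([0.5, 0.5])                       # assumed (equal) cluster weights
for name, pt in [("inside a cluster", [0.2, 0.1]),
                 ("between clusters", [3.0, 0.3]),
                 ("far from both", [3.0, 8.0])]:
    print(f"{name:17s} score = {gravity_score(np.asarray(pt), centers, weights):8.3f}")
```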

Is there a particular reason why the authors learn representations using an autoencoder? ...

Thank you for your comment. Autoencoders are widely used as a foundational framework for anomaly detection due to their ability to learn low-dimensional representations via reconstruction loss. We appreciate your suggestion and agree that exploring alternative representation learning objectives holds potential for future work. However, the focus of our approach is on anomaly-aware maximum likelihood optimization, which better guides representation learning. Our ablation study (Table 3) demonstrates that performance significantly drops when only the autoencoder objective is used.

How did authors tune hyperparameters?

We used fixed hyperparameters (K = 10, l = 1%) across all datasets for a fair comparison with baselines, which used default settings. These were effective on average, but tuning could improve results, as shown in the sensitivity analysis in Section 4.4.

Is there a clear intuition as to why student’s t-distribution mixture models work better than Gaussian mixture models?

Yes, the t-distribution’s heavy tails (Equation 3) make it robust to outliers, unlike GMMs’ exponential tails. This enhances clustering and scoring in anomaly detection, as shown in Table 3 (AUC-ROC drops with GMM).

Under which conditions does it make sense to view anomaly detection from a clustering perspective? ...

Thank you for your question. A clustering perspective is advantageous for anomaly detection when the data has natural groupings or multiple modes, allowing anomalies to be identified as deviations from these structures. However, in cases where clusters are absent or irrelevant to anomalies, density-based or distance-based methods may be more effective.

Reviewer Comment

I thank the authors for addressing my comments. I am still not entirely convinced by the proposed method, specifically concerning the usage of a mixture of Student's t-distributions and limiting themselves to autoencoders. However, the motivation for incorporating directional information appears more reasonable now. I thus raised my score to a 2.

Author Comment

Thank you for your constructive feedback. We’re glad that our rebuttal addressed your concerns regarding the directional information, and we appreciate the score increase to 2. Regarding the mixture of Student’s t-distributions and the use of autoencoders, although we have empirically validated their effectiveness through the ablation experiments in Table 3, we will seriously consider your valuable comments to further clarify this aspect in the writing. While the final score did not turn positive, we still sincerely appreciate the valuable comments you provided.

Official Review
Rating: 3

The paper presents an unsupervised anomaly detection framework with two key components: (i) a framework for joint learning of representation, clustering, and mixture models; (ii) a gravity-inspired anomaly scoring function based on mixture model outputs. Experiments on 30 datasets show it outperforms 17 baselines.

update after rebuttal

From the response, I acknowledge that the gravity-inspired scoring function does have a certain degree of independent contribution to improving model performance. However, regarding the two key issues I raised—the concrete real-world application scenarios and the applicable scope of the scoring function — the authors have only promised to address them in future revisions, without providing specific experiments or supporting data in the current manuscript. Therefore, based on the present version, I will maintain my original score of 3.

Questions for the Authors

  1. On the experimental specifics of the gravity-inspired scoring function: The paper states this function can more comprehensively measure sample-to-cluster relationships but does not clearly define its scope of application. What types of anomalies does this method excel at detecting? Can the authors provide concrete examples of its improved performance on different anomalies?
  2. On verifying the independent contribution of the gravity-inspired scoring function: The paper claims this function significantly boosts overall performance, yet lacks in-depth validation of its individual contribution. Could ablation studies be conducted to analyze its impact? For instance, comparing model performance with and without this function under identical experimental conditions would clarify its specific effects.

Claims and Evidence

In Section 3.2 of the paper, a novel gravity-inspired anomaly scoring method is introduced, which takes into account not only the distance from a sample to each cluster center but also the directional information. The authors assert that by synthesizing vectors, this method can more comprehensively measure the relationship between a sample and multiple cluster centers, thereby aiding in capturing complex relationships within clusters. However, the paper does not elaborate on the specific scenarios in real datasets that this method can address. Although this scoring method theoretically holds advantages, further clarification is needed regarding its applicable scope.

Methods and Evaluation Criteria

The paper's method jointly models representation learning, clustering, and anomaly detection, incorporating a gravity-inspired scoring function for parameter optimization. This approach theoretically enhances anomaly detection performance. The model was evaluated using AUC-ROC and AUC-PR metrics and tested on 30 datasets against 17 baselines, demonstrating strong generalization.

Theoretical Claims

The paper's theoretical claims are sound. I reviewed the construction of the gravity-inspired anomaly scoring formula in section 3.2 and the detailed EM algorithm derivation in Appendix B, and found no issues.

Experimental Design and Analysis

The experimental design is generally sound. The authors conducted extensive experiments across 30 diverse datasets and compared them with 17 baseline methods, effectively validating the framework's superiority. However, the independent evaluation of the gravity-inspired scoring function in section 3.2 is inadequate. Its specific impact on overall performance hasn't been separately verified, making it hard to gauge the method's actual contribution. It is recommended to supplement with relevant ablation experiments for a more precise assessment of its effectiveness.

Supplementary Material

I reviewed the supplementary material. In the methods section, I examined the detailed derivation of the EM algorithm and the discussion on group anomalies, which were rigorous and clear, enhancing my understanding of the methodology. For experimental details, I checked the specific usage of datasets, hyperparameter settings during training, and statistical analyses of different anomaly detectors' performance. Overall, the supplementary material offers rich background and experimental details, aiding in a comprehensive understanding of the paper's methods and results.

Relation to Prior Literature

Previously, anomaly detection and clustering were separate research paradigms. This paper presents a novel joint modeling framework for unsupervised anomaly detection, integrating representation learning, clustering, and anomaly detection to significantly enhance current methods. Based on a mixture model, this framework optimizes parameters by maximizing anomaly-aware data likelihood, reducing the impact of anomalous data on model training and improving detection performance.

Missing Essential References

No.

Other Strengths and Weaknesses

The gravity-inspired anomaly scoring method in the paper is notably innovative, offering a fresh perspective and solution for anomaly detection. This interdisciplinary approach provides new ways to tackle anomaly detection in complex data distributions and high-dimensional data.

Other Comments or Suggestions

No more.

Ethics Review Issues

No

Author Response

However, the paper does not elaborate on the specific scenarios in real datasets that this method can address. Although this scoring method theoretically holds advantages, further clarification is needed regarding its applicable scope.

We thank the reviewer for highlighting the need for more detail on specific application scenarios. We acknowledge that the original manuscript lacked sufficient elaboration on real-world use cases. To address this, we will revise the paper by adding a new subsection that discusses the method’s applications in diverse anomaly detection tasks. Specifically, we will include examples such as financial fraud detection (e.g., identifying irregular transaction patterns), cybersecurity intrusion detection (e.g., detecting anomalous network traffic), and medical diagnostics (e.g., flagging unusual patient records). These examples will be supported by experimental results to demonstrate the method’s performance advantages in these domains.

On the experimental specifics of the gravity-inspired scoring function: The paper states this function can more comprehensively measure sample-to-cluster relationships but does not clearly define its scope of application. What types of anomalies does this method excel at detecting? Can the authors provide concrete examples of its improved performance on different anomalies?

We appreciate the reviewer’s request for clarification on the gravity-inspired scoring function’s scope. This method excels at detecting anomalies in datasets with complex, multi-cluster structures, particularly where anomalies lie at cluster boundaries or exhibit ambiguous cluster affiliations. We will add synthetic and real-data experiments showcasing improved detection of these anomaly types compared to traditional methods.

On verifying the independent contribution of the gravity-inspired scoring function: The paper claims this function significantly boosts overall performance, yet lacks in-depth validation of its individual contribution. Could ablation studies be conducted to analyze its impact? For instance, comparing model performance with and without this function under identical experimental conditions would clarify its specific effects.

Thank you for your valuable feedback. We would like to clarify that in Table 1, we have already compared the two anomaly scoring methods and treated them as two separate variants of the model. We apologize for not clearly indicating this distinction in the ablation section. We will revise the paper to highlight this and improve the explanation, ensuring that the independent contribution of the gravity-inspired scoring function is more clearly addressed.

Reviewer Comment

I thank the authors for addressing my comments. From the response, I acknowledge that the gravity-inspired scoring function does have a certain degree of independent contribution to improving model performance. However, regarding the two key issues I raised—the concrete real-world application scenarios and the applicable scope of the scoring function — the authors have only promised to address them in future revisions, without providing specific experiments or supporting data in the current manuscript. Therefore, based on the present version, I will maintain my original score of 3.

Official Review
Rating: 4

The paper presents UniCAD, a novel unsupervised anomaly detection framework that integrates representation learning, clustering, and anomaly detection into a unified theoretical framework. By maximizing an anomaly-aware data likelihood based on a mixture model with the Student-t distribution, UniCAD effectively mitigates the impact of anomalies on representation learning and clustering. The framework derives a theoretically grounded anomaly score inspired by universal gravitation, which considers the complex relationships between samples and multiple clusters. Extensive experiments on 30 datasets across various domains demonstrate the effectiveness and generalization capability of UniCAD, outperforming 17 baseline methods and establishing it as a state-of-the-art solution for unsupervised anomaly detection.

update after rebuttal

The authors have responded to most of the concerns in a clear and constructive manner. In particular, their clarifications on the theoretical differences between UniCAD and traditional GMM-EM models are well explained. Their justification for the vector-based anomaly scoring method is reasonable, and they have acknowledged the limitations of the gravitational analogy, with a plan to revise it accordingly.  However, I remain concerned about the limited addition of recent baseline methods. Although the authors incorporated NeuTraLAD (2024) and DROCC (2020), there is still a lack of advanced baseline methods from recent years, which weakens the claim of state-of-the-art performance. I encourage the authors to further improve the comprehensiveness of their baseline selection in the final version.  Overall, while the core idea and contributions are interesting and mostly well-supported, the incomplete experimental comparison affects the overall strength of the work. I maintain my final rating as “4: Accept”.

Questions for the Authors

  1. Please explain the difference between the proposed method and the typical learning of Gaussian Mixture Models (GMMs) with EM?

  2. Why is Anomaly Scoring with Vector Sum necessarily beneficial? Although the authors provide an explanation in Section 3.2.3, they do not discuss scenarios such as the following: If a sample point belongs to a normal cluster that happens to be centrally located in the feature distribution, could it inadvertently receive a high anomaly score?

  3. It is recommended to carefully reconsider the use of gravitation as an introductory concept in Section 3.2. In Section 3.2.1, the analogy between the equation of universal gravitation and the anomaly scoring formula appears somewhat forced, as their similarity is essentially limited to both being expressed as fractions. In Section 3.2.2, the connection is further reduced to merely involving vector summation. While the gravitational analogy provides an intuitive perspective, its relevance to the proposed method is not sufficiently substantiated.

Claims and Evidence

Yes. The claims are supported by both theoretical derivation and experimental results.

Methods and Evaluation Criteria

Yes, the evaluation criteria are reasonable to me.

Theoretical Claims

Yes, I have checked the theoretical claims. Most are correct, and I suggest the authors make Equation 5 clearer; the phrase ‘the l lowest’ should be described more precisely.

Experimental Design and Analysis

Yes, the experiments are extensive, with over 30 datasets evaluated, and the results are objective.

Supplementary Material

I have reviewed all the parts.

Relation to Prior Literature

I think both clustering and anomaly detection are fundamental tasks in scientific discovery. Although clustering and anomaly detection have been widely discussed together, including representation learning in a unified framework looks interesting and reasonable. But I suggest the authors provide a broader range of dataset types (such as images) for scientific usage.

Missing Essential References

I do not have such suggestions.

Other Strengths and Weaknesses

Strengths:

  1. The paper proposes a unified framework, UniCAD, which integrates representation learning, clustering, and anomaly detection by maximizing an anomaly-aware data likelihood. This unified approach helps to better understand the interdependencies among these tasks and allows them to mutually enhance each other, leading to improved anomaly detection performance.

  2. The paper not only introduces an anomaly score based on generation probability but also further proposes a vector summation-based anomaly scoring method inspired by the theory of universal gravitation. This method effectively captures the complex relationships between samples and multiple clusters, enabling more accurate anomaly detection.

  3. The paper conducts extensive experiments on 30 datasets from diverse domains, demonstrating the effectiveness and generalization capability of UniCAD. The results show that UniCAD outperforms 17 baseline methods on most datasets, establishing it as a state-of-the-art solution for unsupervised anomaly detection.

Weaknesses:

  1. The paper lacks comparisons with the latest baseline methods. Among the 17 baseline methods included, only 3 are from after 2020, and there are no methods from 2024.

  2. Although the paper proposes an efficient iterative optimization strategy, the computational complexity of UniCAD remains high, especially for large-scale datasets. The iterative nature of the EM algorithm and the processing of high-dimensional data may lead to long training times, limiting its applicability in real-time scenarios.

  3. Two minor issues. In 4.3, "with additional experiments on other data domains presented in Appendix ??" In 3. Methodology, "Finally, we present an efficient iterative optimization strategy to optimize this model and provide a complexity analysis for the proposed model." However, I fail to find complexity analysis in the text.

Other Comments or Suggestions

I think the symbols should be used consistently. For example, vectors are represented by both bold symbols and symbols with arrows. I think this may cause misunderstanding.

Author Response

The paper lacks comparisons with the latest baseline methods. Among the 17 baseline methods included, only 3 are from after 2020, and there are no methods from 2024.

Thank you for noting the need for updated baselines. Given the rapid advancements in unsupervised anomaly detection, we have added recent methods, including NeuTraLAD (2024) and DROCC (2020), to our revised experiments. These methods are evaluated on the same datasets and metrics as UniCAD to ensure a fair and comprehensive comparison. The results are available at the following anonymous link: https://anonymous.4open.science/r/ICML2025_9292-4E31

Although the paper proposes an efficient iterative optimization strategy, the computational complexity of UniCAD remains high, especially for large-scale datasets. The iterative nature of the EM algorithm and the processing of high-dimensional data may lead to long training times, limiting its applicability in real-time scenarios.

We appreciate this observation. The computational complexity of UniCAD, detailed in Section 3.4 as O(tN(log N + Td(D + K))), is manageable with reasonable settings for the iteration count t and cluster number K. Table 2 demonstrates that UniCAD’s training and inference times are competitive with deep learning methods like DAGMM and DCOD. To address concerns about large-scale and real-time scenarios, we will provide additional runtime results on larger datasets and explore optimization techniques (e.g., parallelization or approximation) to further reduce training time, enhancing its practical utility.

Two minor issues. In 4.3, 'with additional experiments on other data domains presented in Appendix ??' In 3. Methodology, 'Finally, we present an efficient iterative optimization strategy to optimize this model and provide a complexity analysis for the proposed model.' However, I fail to find complexity analysis in the text.

Thank you for pointing out these issues. For the appendix reference in Section 4.3, we will ensure the additional experiments are fully detailed in the appendix and correct the placeholder “??”. Regarding the complexity analysis, it is provided in Appendix D.4; we will revise Section 3 to explicitly reference this section for clarity.

Please explain the difference between the proposed method and the typical learning of Gaussian Mixture Models (GMMs) with EM?

UniCAD distinguishes itself from traditional GMM-EM through several key innovations. Unlike GMM-EM, which operates directly on raw features, UniCAD incorporates deep representation learning via neural networks to project data into a lower-dimensional space. It introduces an anomaly indicator function δ(x_i) to dynamically filter out outliers during optimization, overcoming GMM-EM's assumption of normal-only samples. For scoring, UniCAD replaces conventional likelihood probabilities with a novel gravity-inspired vector-sum approach that better captures complex sample-cluster relationships. Additionally, it adopts Student's t-distribution mixture models instead of Gaussian distributions for improved outlier robustness.
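
To make the indicator mechanism concrete, a minimal sketch of the anomaly-aware refitting loop as described: fit a mixture, mark the l fraction of samples with the lowest likelihood as anomalies (δ = 0), and refit on the retained samples. A scikit-learn Gaussian mixture stands in for the paper's Student's t-mixture and joint autoencoder objective; K and l follow the defaults reported in the rebuttal (K = 10, l = 1%).

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def anomaly_aware_refit(Z, K=10, l=0.01, n_rounds=3, seed=0):
    """Iteratively exclude the l lowest-likelihood samples (delta = 0) and refit."""
    delta = np.ones(len(Z), dtype=bool)          # delta(x_i) = 1: treated as normal
    for _ in range(n_rounds):
        mm = GaussianMixture(n_components=K, random_state=seed).fit(Z[delta])
        loglik = mm.score_samples(Z)             # per-sample log-likelihood
        threshold = np.quantile(loglik, l)       # cut-off for the l lowest-likelihood samples
        delta = loglik > threshold               # suspected anomalies are excluded next round
    return mm, delta, -loglik                    # higher value = more anomalous
```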

Why is Anomaly Scoring with Vector Sum necessarily beneficial? Although the authors provide an explanation in Section 3.2.3, they do not discuss scenarios such as the following: If a sample point belongs to a normal cluster that happens to be centrally located in the feature distribution, could it inadvertently receive a high anomaly score?

The vector-sum anomaly scoring is beneficial because it comprehensively assesses a sample’s relationship with all clusters via directional “gravitational” forces, unlike scalar methods that consider only magnitude. For a sample in a normal cluster’s center, its strong “force” from that cluster dominates, yielding a large resultant force and thus a low anomaly score. In contrast, anomalies, weakly influenced by all clusters or pulled between them, have a smaller resultant force, increasing their score.

It is recommended to carefully reconsider the use of gravitation as an introductory concept in Section 3.2. In Section 3.2.1, the analogy between the equation of universal gravitation and the anomaly scoring formula appears somewhat forced, as their similarity is essentially limited to both being expressed as fractions. In Section 3.2.2, the connection is further reduced to merely involving vector summation. While the gravitational analogy provides an intuitive perspective, its relevance to the proposed method is not sufficiently substantiated.

We appreciate this critique and will revise Section 3.2 to de-emphasize the gravitational analogy as a direct equivalence. Instead, we will frame it as an intuitive inspiration, focusing on how vector-sum scoring models sample-cluster interactions as “forces” to capture complex relationships. We will provide detailed explanations and visualizations to substantiate its relevance, ensuring the method’s merit stands independent of the analogy.

Official Review
Rating: 3

The paper proposes UniCAD, a novel model for Unsupervised Anomaly Detection (UAD) that unifies representation learning, clustering, and anomaly detection within a single theoretical framework. By leveraging a mixture model with the Student-t distribution, UniCAD introduces an anomaly-aware data likelihood that enhances the joint optimization process, reducing the influence of anomalies on both representation learning and clustering. Additionally, the model formulates a theoretically grounded anomaly score inspired by universal gravitation, which captures complex relationships between data points and multiple clusters. Extensive experiments across 30 datasets from various domains demonstrate UniCAD’s superior performance and generalization ability, surpassing 15 baseline methods and establishing it as a state-of-the-art approach in unsupervised anomaly detection.

Questions for the Authors

see comments above

Claims and Evidence

The experiments overall support the claims. There are issues with parameter settings, though.

Methods and Evaluation Criteria

Yes

Theoretical Claims

Skimmed over the math derivations to check correctness

Experimental Design and Analysis

There are issues with the experimental settings

Supplementary Material

N/A

Relation to Prior Literature

Good coverage

Missing Essential References

N/A

Other Strengths and Weaknesses

Strengths:

  • Unified framework for anomaly detection that integrates representation learning and clustering
  • New anomaly score measure based on physics-inspired laws
  • Multiple datasets and methods used for experimentally demonstrating the potential of the idea

Weaknesses:

  • Experimental settings are not fair

Default parameters for baseline methods were used, which raises concerns. All methods have to be re-run and re-optimized for the datasets, similar to what you do for your solution.

  • The solution seems novel but several of its components are based on existing ideas.

Clarifying novelty vs. only citing earlier work can help strengthen the contribution

  • Substantial runtime overhead

Other Comments or Suggestions

N/A

Author Response

Experimental settings are not fair Default parameters for baseline methods were used, which raises concerns. All methods have to re-run and be re-optimized for the datasets, similar to what you do for your solution

Thank you for highlighting the concern regarding the fairness of our experimental settings. In the paper, we also used the default parameters for our method, in line with the experimental setup used for the baseline methods. Specifically, we followed the ADBench benchmark [Han et al., 2022] to ensure a fair and consistent comparison. We acknowledge that tuning parameters could potentially improve performance, but as this is an unsupervised task, it is difficult to rely on techniques such as cross-validation for hyperparameter optimization. Therefore, we opted to maintain the same default settings across all methods to ensure fairness in the comparison.

The solution seems novel but several of its components are based on existing ideas. Clarifying novelty vs. only citing earlier work can help strengthen the contribution

We appreciate your feedback on the novelty of UniCAD. While components like mixture models and the Student’s t-distribution draw from existing ideas, the core contribution of UniCAD lies in its unified theoretical framework that integrates representation learning, clustering, and anomaly detection via anomaly-aware maximum likelihood estimation, alongside a novel gravity-inspired anomaly scoring mechanism. Specifically: (1) the joint optimization of these tasks with theoretical grounding, (2) the gravity-inspired vector-sum-based anomaly scoring capturing complex sample-cluster relationships, and (3) an efficient iterative optimization strategy are unique to our approach. To clarify this, we will revise the paper to explicitly distinguish between foundational concepts and our original contributions, supported by comparative experiments and theoretical analysis to underscore UniCAD’s distinctiveness.

Substantial runtime overhead

We acknowledge the concern about runtime overhead. In the paper (Table 2), we provided a runtime comparison showing that UniCAD is competitive with deep learning baselines like DAGMM and DCOD, despite the iterative EM algorithm. The efficient iterative strategy and parameter updates reduce computational overhead, with a complexity of O(tN(log N + Td(D + K))), making it feasible for large-scale datasets.

Final Decision

This paper received one accept, two weak accepts, and one weak reject. All reviewers agree that the combined use of clustering and representation learning for anomaly detection is a valid and meaningful approach. Initial concerns were raised regarding the use of a t-distribution in place of a Gaussian, the adoption of an autoencoder architecture, comparisons with outdated baselines, and the overall fairness of the experimental evaluation. While the authors' rebuttal appears to have satisfactorily addressed most of these concerns, some reservations persist—particularly regarding the use of autoencoders and the fairness of the evaluation protocol. Nevertheless, given that the majority of reviewers lean toward acceptance, I recommend the acceptance of the paper.