Demeaned Sparse: Efficient Anomaly Detection by Residual Estimate
Abstract
Reviews and Discussion
This paper proposes a novel test for detecting anomalies in structural images using a discrete Fourier transform (DFT) under a factor model framework, which enables interpretable and effective reconstruction-based anomaly detection.
Questions for Authors
See comments above.
Claims and Evidence
To my understanding, the claims are articulated clearly enough to be comprehensible.
Methods and Evaluation Criteria
Yes, the proposed methods and evaluation criteria make sense for the anomaly detection problem at hand.
Theoretical Claims
Yes. Most of the theoretical proofs are logically sound, but some aspects are open to question. On page 3, line 154, why is the first statistic only weakly convergent, yet convergent in distribution after integration?
Experimental Designs or Analyses
Yes. Most of the experimental design and analysis is reasonable. However, the regularization coefficient alpha is not discussed, so results for additional values of alpha should be reported.
Supplementary Material
No supplementary materials have been submitted.
Relation to Broader Scientific Literature
This paper adopts a factor model commonly used in time series analysis in economics. Building on existing methods for detecting structural changes in time series data, it proposes a test for image anomaly detection that incorporates a multi-DFT dual cross-sectional dimension of positional information, and it provides the relevant asymptotic theory.
Essential References Not Discussed
No.
Other Strengths and Weaknesses
Strengths:
- This paper provides a relatively complete theoretical framework and derivations, particularly matching the properties of the asymptotic theory with the practical issues. The proofs are based on a well-established factor model framework, and the related theory is fairly comprehensive.
- The module proposed based on the test is simple and quick to implement, with validation provided regarding its resource consumption.
- The experimental results validate the effectiveness and consistency of the theory.
Weaknesses:
- As I mentioned above, the experimental design could be further improved.
- This paper requires more detailed explanations. For example, on page 3, in the section on constructing the complex-valued empirical process, it is not explained why the common factors and residuals are combined together. However, this is actually used in the later asymptotic theory section. Another issue is that the title of the paper mentions residual estimation, yet the theoretical section also incorporates common factors. The authors have many ideas to convey, but the details are not sufficiently thorough.
- The sparsification method proposed in the manuscript appears to offer several advantages; however, these benefits need to be supported by more detailed experimental analyses. Currently, the manuscript does not include experiments examining the effects of different sparsity levels on model performance. Adding such experiments would provide valuable insights into how varying degrees of sparsity influence performance outcomes. This additional analysis would significantly strengthen the manuscript and validate the proposed method's effectiveness.
- The description of the method in the paper is not very intuitive. It would be helpful to summarize the method into a structured algorithm, presented in pseudocode format.
Other Comments or Suggestions
Some minor problems:
- The notation in Equation 9 seems unusual since both loss terms are labeled as L_reg.
- There are similar typos in the paper. Please review and correct them accordingly.
- Could the module constructed by this method be applicable to a broader range of anomaly detection problems, including non-reconstructive anomaly detection methods?
Dear Reviewer 2ntV:
Firstly, we express our gratitude for your thorough review and insightful comments on our paper. Your recognition of the novelty and contribution of our work is greatly appreciated. We also greatly appreciate your rigorous and professional feedback. Your suggestions are highly meaningful, especially for a theory-driven paper like ours. We have taken your feedback as an opportunity to further refine our manuscript. We address your concerns below:
Q1: "On page 3, line 154, the question arises as to why the first statistic is weakly convergent and, after integration, becomes convergent in distribution"
Thank you for raising this question. Due to space limitations, we did not provide a detailed explanation. In our initial assumption, the Fourier transform is not performed over the , but rather over a finite interval. However, our integration is carried out over the entire real domain, which corresponds to the weak and strong convergence relationships in functional analysis. This setting is also valid for the weight function .
Q2: "The regularization coefficient alpha has not been discussed"
We have added experiments regarding the regularization coefficient and provide some of the results here:
| Regularization coefficient | Ours |
|---|---|
| 1e-2 | 95.50 / 97.10 / 89.27 |
| 1e-3 | 95.45 / 97.04 / 88.74 |
| 1e-4 | 96.19 / 96.58 / 89.01 |
| 1e-5 | 97.92 / 97.78 / 92.42 |
| 1e-6 | 98.69 / 98.32 / 93.24 |
| 1e-7 | 97.79 / 97.98 / 91.36 |
| 1e-8 | 97.03 / 96.97 / 89.32 |
W1: "on page 3, in the section on constructing the complex-valued empirical process, it is not explained why the common factors and residuals are combined together" and "this is actually used in the later asymptotic theory section. Another issue is that the title of the paper mentions residual estimation, yet the theoretical section also incorporates common factors"
The purpose of constructing the complex-valued empirical process by combining the common factors and the residuals is to unify the estimated quantities within the test statistics. Additionally, combining the common factors and the residuals helps prevent the residuals from summing to zero. In the process of estimating the residuals, the common factor is also involved, but differs across theoretical framework assumptions (as we mentioned in Section 3.1, maybe is not based on the factor structure). However, can be expressed in the form of residuals in all cases. Therefore, the estimation of the residuals is a central component of the complete theory, with serving as an intermediate variable.
W2: "the manuscript does not include experiments examining the effects of different sparsity levels on model performance"
Thank you for raising this question. In our theoretical section, our asymptotic theory suggests that as long as the estimated values are smaller than the true values, the anomaly detection task will be effective. In the experimental section, we constrain the sparsity using the regularization coefficient. Since we are applying this in an unsupervised setting, the level of sparsity is learned by the model itself. We believe that the sparsification operation enhances the effectiveness of the anomaly detection task, but the degree of sparsity cannot provide a precise bound. This is because the sparsity degree may vary for different types of unknown anomalies. Nevertheless, this raises an insightful question, and we will continue to explore this in our future research.
W3: "Writing suggestion"
Thank you for carefully pointing out our small mistake. We will make the correction in the subsequent version. Additionally, we will provide the pseudocode of our method to make it more intuitive and clearer for the readers.
Q3: "Whether the module constructed by this method potentially be applicable to a broader range of anomaly detection problems"
We have validated our method in a multi-task setting, and the results are as follows. In theory, our method should remain effective for a broader range of anomaly detection methods. Here, we apply our method to the multi-class task on the MvTec-AD dataset:
| | Ours | Ours-Base |
|---|---|---|
| Average | 93.3/94.6/85.9 | 58.0/70.8/39.3 |
Compared to the baseline method, our approach still shows a significant improvement in the multi-class task. We will follow up on this and conduct corresponding tests in the subsequent versions.
We sincerely hope that our clarifications above have increased your confidence in our work. We will be happy to clarify further if needed. We thank you again for sharing your valuable feedback on our work.
This paper proposes a reconstruction-based method for anomaly detection, by using the construction of a mask in the Fourier domain to sparsify the information by reducing the number of estimated common factors of the input images. Then a U-Net is used to reconstruct the images, using the reconstruction error as the anomaly score. Experiments are conducted on MVTec AD and VisA datasets.
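The scoring step described in this summary can be illustrated with a minimal sketch. It assumes the reconstruction is already available as an array (the U-Net itself is not modeled), and the per-pixel squared error and max pooling are illustrative choices, not necessarily those used in the paper; `anomaly_scores` is a hypothetical helper name.

```python
import numpy as np

def anomaly_scores(image, reconstruction):
    """Return a per-pixel anomaly map and a single image-level score."""
    image = np.asarray(image, dtype=float)
    reconstruction = np.asarray(reconstruction, dtype=float)
    # Assumption: squared reconstruction error serves as the pixel-level score.
    error_map = (image - reconstruction) ** 2
    if error_map.ndim == 3:
        # Average over the channel axis for H x W x C inputs.
        error_map = error_map.mean(axis=-1)
    # Assumption: the largest pixel error serves as the image-level score.
    return error_map, float(error_map.max())

# Toy example: the "reconstruction" misses a single bright defect pixel.
image = np.zeros((8, 8))
image[2, 5] = 1.0
reconstruction = np.zeros((8, 8))
error_map, image_score = anomaly_scores(image, reconstruction)
```

In this toy case the error map is zero everywhere except at the defect pixel, which is the behavior a reconstruction-based detector relies on.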
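Questions for Authors
See strengths and weaknesses.
Claims and Evidence
Not always.
Methods and Evaluation Criteria
Yes.
Theoretical Claims
Yes.
Experimental Designs or Analyses
Yes.
Supplementary Material
Yes.
Relation to Broader Scientific Literature
N/A
Essential References Not Discussed
[a] OCGAN: One-class Novelty Detection Using GANs with Constrained Latent Representations. CVPR 2019. [b] Attribute Restoration Framework for Anomaly Detection. TMM 2020. [c] A Unified Model for Multi-class Anomaly Detection. NeurIPS 2022.
Other Strengths and Weaknesses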
The proposed method is simple yet effective to improve the reconstruction-based anomaly detection. However,
- Important related work is missing from the literature review. Early on, OCGAN [a] proposed adding Gaussian noise to the input, while ARNet [b] used transformations to erase important attributes in the images; both ideas are very similar to this paper's. Later, some MAE-based methods used masks to remove part of the input, turning the reconstruction task into image inpainting. Finally, UniAD [c] showed that traditional AEs fail in AD because of the model's identity shortcut; simply by using a transformer-based architecture, the model's generalizability can be significantly improved, and a single model can even be used for all tasks. However, this paper still focuses on one-model-one-task.
- The theoretical part and the method design part are very disconnected.
- The paper is very incremental. In terms of theory, the author's contribution is only to simply extend the existing one-dimensional theory to two-dimensional space.
- Also, in terms of experimental design, it seems that only DFS Module was added on the U-Net. What if the DFS Module is added on the other architectures, such as UniAD?
Other Comments or Suggestions
See strengths and weaknesses.
Dear Reviewer a5G4:
We appreciate your review of our paper and the insightful comments you provided. We are glad to hear that you find the method we proposed to be both simple and effective. We have addressed the issues you raised as follows:
W1: "Lacking important reviews in the literature"
As you suggested, we will include a review of the related literature in the subsequent version. We greatly appreciate your thorough feedback. Unlike the papers mentioned, which operate at the feature level, our method performs a demeaned Fourier transform in the frequency domain and provides a complete theoretical framework and proof, which is not found in previous literature. (We were motivated by application cases and abstracted them into a theoretical model; based on this, we further derive the asymptotic theory of our model.) Regarding the comment "However, this paper still focuses on one-model-one-task": since our contribution is mainly focused on the theoretical field, our initial approach was to validate it on benchmark problems. However, your comment has been very insightful. Based on your suggestion, we have attempted a multi-task, one-model approach on the MvTec-AD dataset:
| | Ours | Ours-Base |
|---|---|---|
| Average | 93.3/94.6/85.9 | 58.0/70.8/39.3 |
Compared to the baseline method, our approach still shows a significant improvement in the multi-class task. We will refine the complete theoretical framework and methodology in future work. Thank you for your constructive suggestions.
W2: "The theoretical part and the method design part are very disconnected"
We thank the reviewer for raising this question, as it will help others better understand our work. (Some of our wording may have obscured the logic of the paper, but the theoretical derivations in the earlier sections and the application in the later sections are closely related: our theory explains why the DFT-based method works when it is applied.) As we mentioned in Section 4, the decentralization operation corresponds to the construction of the complex-valued empirical process in our theoretical section. It simultaneously constructs the test statistic, which helps us derive the asymptotic lower bound. Based on our asymptotic theory, in order to establish the quantitative relationship () between the true common factors and the estimated values, we designed the sparsification operation. This operation weakens the main information (non-anomalous parts) in the original image, causing the residuals (anomalous parts) to occupy a larger proportion, making anomalies easier to detect. The experimental results based on the above operations validate our asymptotic theories.
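The effect described in this response, where weakening the dominant (non-anomalous) content lets the residual occupy a larger share of the image, can be illustrated with a small NumPy sketch. It is a toy under stated assumptions, not the DFS module: the fixed rule of zeroing Fourier coefficients above `rel_threshold` times the largest magnitude, the synthetic striped image, and the names `demeaned_sparse` and `energy_share` are all hypothetical.

```python
import numpy as np

def demeaned_sparse(image, rel_threshold=0.1):
    """Demean an image, zero its dominant Fourier coefficients, and invert."""
    centered = image - image.mean()              # demeaning step
    spectrum = np.fft.fft2(centered)             # 2-D DFT
    magnitude = np.abs(spectrum)
    # Assumption: a fixed magnitude threshold stands in for a learned mask.
    mask = magnitude <= rel_threshold * magnitude.max()
    sparse = np.fft.ifft2(spectrum * mask).real  # back to the image domain
    return centered, sparse

def energy_share(array, region):
    """Fraction of total squared intensity that falls inside `region`."""
    energy = array ** 2
    return energy[region].sum() / energy.sum()

# Synthetic "normal" texture: vertical stripes (a single spatial frequency).
size = 64
columns = np.arange(size)
image = np.tile(np.sin(2 * np.pi * 4 * columns / size), (size, 1))
# Synthetic anomaly: a small bright 4 x 4 patch.
region = (slice(30, 34), slice(30, 34))
image[region] += 1.0

centered, sparse = demeaned_sparse(image)
share_before = energy_share(centered, region)
share_after = energy_share(sparse, region)
print(f"anomaly energy share: {share_before:.3f} -> {share_after:.3f}")
```

In this toy setting the stripes concentrate in two Fourier coefficients, so removing them leaves almost only the patch. Real textures are not this clean, which is presumably why the paper learns the mask rather than fixing a threshold.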
W3: "The paper is very incremental"
Thank you for raising this question. In terms of theory, our contribution is not simply an extension of existing one-dimensional theories to two-dimensional space. We also provide a complete asymptotic theory along with the corresponding proofs. The method and its applications are nontrivial. First, they require the development of new hypothesis tests, complex-valued empirical processes, test statistics, and upper and lower bounds for the asymptotic theory. Second, unlike the one-dimensional time series theory, the theory at the two-dimensional image level requires spatial relationships to serve as the central element of the theory. Finally, the proof section involves new derivations and experimental validation of the constructed asymptotic theory, with validation metrics and methods that differ from those in the one-dimensional time series scenario. We are committed to constructing a complete theoretical framework, not merely extending the methods and theories.
W4: "it seems that only DFS Module was added on the U-Net. What if the DFS Module is added on the other architectures, such as UniAD?"
We thank the reviewer for raising this question. Since our main contribution is theoretical, it is essential to validate the effectiveness of the theory and its corresponding asymptotic properties using a simple model. This is common practice in the Monte Carlo simulation sections of theoretical articles, and our aim is similar: it allows us to eliminate the additional effects introduced by complex models. Therefore, we opted for the relatively simple U-Net architecture. As pointed out in your comment, adding the DFS module to the UniAD architecture should indeed be effective. However, the UniAD framework operates at the feature-token level, which conflicts with our frequency-domain operations, so our module is not directly applicable to the UniAD framework. We appreciate your valuable suggestion, which will be of great help for our future work.
We sincerely hope that our clarifications above have increased your confidence in our work. We will be happy to clarify further if needed. We thank you again for sharing your valuable feedback on our work.
The paper "Demeaned Sparse: Efficient Anomaly Detection by Residual Estimate" proposes a novel approach for unsupervised anomaly detection in structural images using a factor model framework combined with Discrete Fourier Transform (DFT). The authors introduce a test to detect anomalies by analyzing weighted residuals in the Fourier domain, comparing them to a zero spectrum under a null hypothesis of no anomaly. They develop the Demeaned Fourier Sparse (DFS) module, which constructs masks in the Fourier domain to enhance reconstruction-based anomaly detection. The method leverages residuals to identify anomalies without requiring prior knowledge of anomaly types, offering both theoretical rigor (via asymptotic properties) and practical applicability. Experimental results on datasets like MvTec-AD demonstrate competitive performance in anomaly detection and localization, with the approach being computationally efficient compared to some existing methods.
Update after rebuttal
After reviewing the authors’ rebuttal, I appreciate the effort they’ve put into addressing the feedback. Their responses have largely alleviated my concerns, and despite some differing perspectives, I will maintain my original score.
Questions for Authors
N/A. Please check other sections.
Claims and Evidence
- Strengths:
- The paper’s theoretical backbone is a standout feature. Section 3 delivers a rigorous statistical framework through asymptotic properties, such as Theorems 3.2, 3.4, and 3.6, which mathematically define how residuals behave under normal and anomalous conditions. The derived detection rate of offers a concrete lower bound for spotting subtle anomalies, marking a meaningful contribution to the field.
- Experimental evidence supports its claims effectively in several cases. The results show the method can outperform established approaches like PatchCore and DRAEM in specific scenarios, suggesting practical potential.
- Weaknesses:
- The connection between theory and practice isn’t fully realized. The asymptotic properties assume large image dimensions (H, W → ∞), but typical dataset images (e.g., 256x256 in MvTec-AD) don’t meet this scale. The paper doesn’t bridge this divide with empirical validation, such as testing on varying image sizes. To strengthen its case, it could have emphasized how its theoretical edge translates into tangible benefits, making the evidence more compelling against competitors.
Methods and Evaluation Criteria
- Strengths:
- The methodology shines with its originality, blending a factor model with DFT and introducing the DFS module. This module projects residuals into the Fourier domain and iteratively optimizes masks (e.g., via Bernoulli sampling and sigmoid mapping) to isolate anomalies efficiently. A key advantage is its self-contained residual generation—detecting anomalies using the model itself without external data structures like PatchCore’s memory bank or DRAEM’s dual-network setup. This lean approach enhances its appeal for unsupervised settings.
- Efficiency is a clear strength, as evidenced in Table 3. The proposed method significantly undercuts heavier models like SSNF (294.67M parameters, 102.23G FLOPs) and PyramidFlow (162.20M parameters, 81.13G FLOPs), positioning it as a viable option for resource-constrained environments, such as industrial applications with limited compute power.
- Weaknesses:
- Robustness isn’t fully substantiated. The absence of statistical significance tests (e.g., p-values or confidence intervals) for the reported metrics means readers can’t confidently assess whether the performance edge is meaningful or due to chance.
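The mask construction mentioned in the strengths above (sigmoid mapping followed by Bernoulli sampling) can be sketched as follows. This is only an illustration under assumptions: the logits are fixed inputs rather than optimized parameters, the iterative optimization is omitted, and `sample_fourier_mask` and `apply_fourier_mask` are hypothetical names, not the paper's implementation.

```python
import numpy as np

def sample_fourier_mask(logits, rng):
    """Map logits to keep-probabilities with a sigmoid, then draw a binary mask."""
    probs = 1.0 / (1.0 + np.exp(-logits))      # sigmoid mapping to probabilities
    mask = rng.random(logits.shape) < probs     # one Bernoulli draw per frequency
    return probs, mask

def apply_fourier_mask(image, mask):
    """Demean the image, keep only the masked Fourier coefficients, and invert."""
    spectrum = np.fft.fft2(image - image.mean())
    # Taking the real part discards the imaginary component that a
    # non-symmetric random mask introduces.
    return np.fft.ifft2(spectrum * mask).real

rng = np.random.default_rng(0)
image = rng.random((16, 16))

# Assumption: zero logits give every frequency a 50% chance of being kept.
probs, mask = sample_fourier_mask(np.zeros((16, 16)), rng)
filtered = apply_fourier_mask(image, mask)

# Sanity check: very large logits keep every coefficient, so only the mean is removed.
_, keep_all = sample_fourier_mask(np.full((16, 16), 40.0), rng)
unchanged = apply_fourier_mask(image, keep_all)
```

Hard Bernoulli sampling is not differentiable, so a trainable version would need some relaxation or estimator. The sketch deliberately leaves that choice open because the thread does not specify it.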
Theoretical Claims
The asymptotic theory is a robust intellectual achievement. Section 3’s derivations (e.g., Proposition 3.1, Theorem 3.6) establish how residuals behave under null and alternative hypotheses, providing a statistical lens for anomaly detection. This clarity sets a high bar for theoretical rigor in the domain.
Experimental Designs or Analyses
- Strengths:
- The experimental design is well-structured, testing the method on established datasets (MvTec-AD and VisA) against strong baselines like PatchCore and DRAEM. This ensures a fair comparison within the anomaly detection community’s standard benchmarks.
- Hyperparameter exploration in Appendix E adds depth. By varying sampling functions and epochs, it demonstrates the method’s robustness and adaptability, giving readers insight into its operational flexibility.
- Qualitative results (Figures 6, 7) are visually persuasive, effectively showcasing the method’s ability to pinpoint anomaly locations in images, which aligns with its localization claims.
- Weaknesses:
- As mentioned in the Claims and Evidence section, testing on variable image sizes, if possible, would benefit the paper by demonstrating that the asymptotic property holds in practice.
Supplementary Material
- The supplementary material enriches the paper.
- Theoretical proofs in Appendix B (though partially accessible) strengthen the core claims, giving readers a deeper dive into the mathematical underpinnings.
- Analysis of computational cost in Appendix D is another important part that highlights the advantages of this methodology.
- Appendix E’s hyperparameter study provides a detailed look at how choices like sampling functions affect performance, offering practical guidance for tuning the method.
Relation to Broader Scientific Literature
- The paper builds thoughtfully on prior work, extending factor models (Fu et al., 2023) and reconstruction-based detection (Zavrtanik et al., 2021) into a new image-focused framework, blending time-series inspiration with visual analysis.
- Its Fourier-based approach is a fresh take, diverging from the CNN-heavy norm and offering a frequency-domain perspective that could inspire further exploration.
Essential References Not Discussed
Appendix A effectively addresses relevant references, providing a comprehensive overview of related work that ties the method to the current research landscape.
Other Strengths and Weaknesses
- Strengths:
- The paper is clear and well-written.
- The method offers a theoretically robust framework, with rigorous asymptotic properties and a novel Fourier-based approach enhancing anomaly detection.
- Weaknesses:
- There are some minor issues, which I mentioned in the previous sections.
Other Comments or Suggestions
The paper struggles to clearly showcase its comparative advantages—such as greater efficiency, competitive performance, and a self-contained residual construction that detects anomalies without the need for additional data—over other methods, and it would benefit from more prominently integrating these strengths into the abstract or introduction to convince readers of its practical value.
Dear Reviewer LTm8,
Firstly, we express our gratitude for your thorough review and insightful comments on our paper. Your recognition of the rigor and contribution of our work is greatly appreciated. We also greatly appreciate your rigorous and professional feedback. Your suggestions are highly meaningful, especially for a theory-driven paper like ours. We have taken your feedback as an opportunity to further refine our manuscript. We address your concerns below:
W1: "more test on variable image sizes"
We agree with the viewpoint that the asymptotic global theory should be validated on more image sizes, as you mentioned in your comment: "it could have emphasized how its theoretical edge translates into tangible benefits." To address this, we have applied our method to larger-scale scenarios, with experimental results on MvTec-AD dataset presented for 512×512 and 1024×1024 image sizes. Due to space and time limitations, we only provide the average results of metrics.
| Resolution | Ours | Ours-Base |
|---|---|---|
| 256×256 | 98.69/98.32/93.24 | 95.04/96.59/84.10 |
| 512×512 | 98.52/96.13/91.69 | 85.11/89.97/69.24 |
| 1024×1024 | 97.19/91.49/86.82 | 81.65/82.83/58.79 |
The results show that our method continues to maintain a high degree of effectiveness and is more stable than the baseline method. If possible, we will include relevant comparative results in future versions.
W2: "Robustness isn’t fully substantiated."
We agree that statistical significance tests for the reported metrics are necessary, as the main contribution of this paper is statistical theory. Since we apply our method to practical detection problems, more comprehensive metrics are needed to assess overall performance. Specifically, in the validation of the asymptotic global theory (for the entire image), the p-value contribution of a single pixel might be overlooked (p-values are less adaptive to imbalanced data, as they often fail to account for the sparsity of anomalies). Therefore, we prefer to use metrics commonly employed in the anomaly detection field as our evaluation criteria. We provide partial p-values for our method on MvTec-AD dataset.
| | Ours | Ours-Base |
|---|---|---|
| P-value | 0.0276 | 0.0792 |
As can be seen, after incorporating our method, the p-value falls below the 0.05 significance level, indicating that we can reject the null hypothesis (no anomaly) with high probability. If possible, we will include corresponding statistical significance tests for both our method and the comparative methods in future versions.
W3: "As mentioned in the Claims and Evidence section"
Please see our response to W1.
W4: "There are some minor issues, which I mentioned in the previous sections"
Please see our response to W1 and W2.
We sincerely hope that our clarifications above have increased your confidence in our work. We will be happy to clarify further if needed. We thank you again for sharing your valuable feedback on our work.
After reviewing the authors’ rebuttal, I appreciate the effort they’ve put into addressing the feedback. Their responses have largely alleviated my concerns, and despite some differing perspectives, I will maintain my original score.
Dear Reviewer LTm8,
We thank you for your thorough review of our paper and our response, and for providing constructive feedback that has significantly contributed to its improvement. Your insights have been invaluable in helping us refine our work.
Best regards,
Paper9325 Authors
Although this paper did not receive unanimous support from the reviewers before and after the rebuttal, with unchanged ratings of 4, 4, and 2, the area chair believes that the two reviewers who rated it positively provided useful comments on the paper's contributions.
From Reviewer LTm8:
The method offers a theoretically robust framework, with rigorous asymptotic properties and a novel Fourier-based approach enhancing anomaly detection.
From Reviewer 2ntV:
This paper provides a relatively complete theoretical framework and derivations, particularly matching the properties of the asymptotic theory with the practical issues.
Several concerns were raised, such as (1) the connection between theory and practice, (2) the experimental design, (3) the effects of different sparsity levels, and (4) applicability to a broader range of anomaly detection problems. The authors addressed these concerns by incorporating experiments involving "more tests on variable image sizes," "statistical significance tests," "the regularization coefficient," and "validation on the multi-task MvTec-AD dataset." These efforts were acknowledged by the reviewers.
After considering all assessments, the area chair recommends that we accept the paper.