4.0

/10

withdrawn5 位审稿人

最低3最高6标准差1.3

3.4

置信度

正确性2.2

贡献度2.2

表达2.0

ICLR 2025

When Will It Fail?: Anomaly to Prompt for Forecasting Future Anomalies in Time Series

Min-Yeong Park,Won-Jeong Lee,Seong Tae Kim,Gyeong-Moon Park

OpenReview PDF

提交: 2024-09-26更新: 2024-11-27

摘要

关键词

Time SeriesTime Series ForecastingAnomaly Detection

评审与讨论

审稿意见

评分: 3置信度: 42024-10-28

This paper addressed an interesting scenario named Anomaly Prediction (AP), where the model needs to detect abnormal time points from the unarrived future signals. It tackled this issue by establishing a unified architecture that shares the feature extractor for forecasting and anomaly detection models.

优点

S1. Time series anomaly prediction is important to various domains.

S2. This work focuses on an important problem that could have real-world applications.

S3. Appropriate ablation studies.

S4. Experimental results on various datasets.

缺点

W1. Codes and more implementation details should be provided for reproducibility.

W2: There are too few baseline methods for comparison. Many important related works and baselines are missing, such as [1] and [2]. Without caparison to these existing related works, the contributions and novelty in this paper are flawed.

[1]Multivariate Time-Series Anomaly Detection with Temporal Self-supervision and Graphs: Application to Vehicle Failure Prediction. ECML 2023.

[2]Time Series Based Data Explorer and Stream Analysis for Anomaly Prediction. 2022.

W3. The evaluation metrics reported in the work are relatively limited. Can the performance of the algorithm be reflected on other diversified metrics. Such as "AUC-ROC", "VUS_ROC", "VUS_PR"[3]. Including additional or alternative evaluation metrics could provide a more comprehensive assessment of AD methods, especially in complex scenarios.

[3] Volume Under the Surface: A New Accuracy Evaluation Measure for Time-Series Anomaly Detection.

W4. The specific innovation points of the entire article are insufficient, mostly utilizing existing technologies, such as the forecasting mechanism are widely used in anomaly tasks. The authors should add more clarification on motivations for why they choose these existing technologies.

W5. The writing quality of the article is poor and many related works are missing, making it difficult to follow, and the charts are not clearly presented. For example, Figure 2 of 68 is labeled incorrectly, it should be Figure 1.

问题

See weaknesses.

Q1: More benchmarks, such as UCR [4] should be considered. The authors should justify their choice of datasets and explain why they believe their current selections are sufficient or how they plan to expand their evaluation.

[4] Current Time Series Anomaly Detection Benchmarks are Flawed and are Creating the Illusion of Progress.

Q2: Why are the experiments in Table 8 not conducted when t<10, especially when t=0. Please discuss the implications of using t=0 or very small t values for evaluating the precision of the anomaly prediction method.

Q3: Why were the experiments in Table 3-8 only conducted on a few datasets, especially the experiments in Tables 7 and 8 were only conducted on MSL.

审稿意见

评分: 3置信度: 42024-10-30

The paper introduces a novel time series scenario, "anomaly prediction," where time series forecasting and anomaly detection are conducted simultaneously. First, it proposes a new AAFN designed to enhance anomaly-aware time series forecasting. Additionally, the paper presents an SAP module, incorporating a novel concept called APP. These APPs are trainable parameters that represent different types of anomalies. The proposed approach jointly trains this module by optimizing a predictive loss, a reconstruction loss, and two newly designed intra-signal and inter-signal correlation losses. Finally, in the main stage, part of the parameters from the AAFN and SAP modules are frozen, allowing for training of the anomaly prediction task to achieve both anomaly-aware forecasting and anomaly detection in time series data.

The key contributions of this paper include:

Proposing an interesting and novel application scenario in time series data, "anomaly prediction."
Designing the AAFN model to enable anomaly-aware time series forecasting.
Developing the SAP module with partially trainable parameters that provide anomaly indications, which effectively support the subsequent anomaly prediction task.
Conducting comprehensive experimental evaluations and ablation studies.

优点

This paper proposes a novel and compelling time series scenario, "anomaly prediction," which considers anomaly information during time series forecasting while simultaneously performing anomaly detection.
For each target, dedicated loss functions are designed to address specific objectives effectively.
Extensive experiments are conducted to demonstrate the method's effectiveness.

缺点

While the problem definition is interesting, in my opinion, predicting future anomaly patterns under the current theoretical framework is impractical. The connection between current anomalies and future anomalies is often weak in many scenarios. Your experimental results indirectly support this view, as the F1 scores after point adjustment mostly fall below 0.5, which is unacceptable in anomaly detection. Even with an F1 of 0.9, many false alarms are likely; an F1 of 0.5 would mean that operations personnel would be handling alerts almost daily.
Anomalies are generated through injection, which may lead the SAP module to learn anomaly parameter information that does not accurately reflect real-world anomaly scenarios. Likewise, the parameters learned by AAFN might differ significantly from those that would represent anomalies in real environments.
There are some grammatical errors, such as "figure 3" in line 235.
The design of numerous loss functions risks complicating training and requires additional effort to optimize parameters.

问题

There is still significant room for development in time series forecasting, especially in long-term forecasting, which presents numerous challenges due to high randomness. How do you conclude that time series forecasting can be performed concurrently with anomaly detection?
In the AAFN, you handle anomalies similarly to a 0-1 classifier. However, in real-world scenarios, classifiers cannot manage the complexities of time series anomaly detection. Furthermore, your anomalies are synthetically injected, which greatly differs from real-world environments. How, then, is the effectiveness of AAFN achieved?
In my opinion, the SAP module includes time series forecasting but does not consider anomaly information, while the AAFN includes time series forecasting and does consider anomaly information. Why are there two separate forecasting modules? Could this be redundant? And why does the SAP module’s forecasting omit anomaly information?
Upon reproduction, it appears that DCdetector and the anomaly-transformer are not the state-of-the-art in anomaly detection. Using other methods might yield better results. Additionally, you mentioned that time series forecasting tends to predict future normal patterns, so are the comparative methods used in your experiments appropriate? Shouldn’t you also train the baseline models on time series data with substantial injected anomalies for consistency?

审稿意见

评分: 3置信度: 42024-11-01

This paper propose a simple yet effective framework, Anomaly to Prompt (A2P) to tackle a problem of time series anomaly prediction. It derives the anomaly probability by random anomaly injection to forecast abnormal time points. It achieves robust anomaly detection by synthesizing anomalies to enhance the diversity of abnormal signals.

优点

S1. This paper presents an interesting approach for the task of time series anomaly prediction.

S2. The experimental findings illustrate the effectiveness of this approach.

S3. The experiments show improvements with the ablation study.

缺点

W1. The main weakness of this paper is its claimed novelty in defining a new scenario that "forecasts and detects anomaly points in the future signal". However, prediction-based anomaly detection methods already address similar tasks by identifying anomalies through trend forecasting in time series. For example, "Timeseries anomaly detection using temporal hierarchical one-class network"[1] and "Beyond Sharing: Conflict-Aware Multivariate Time Series Anomaly Detection"[2]. The authors should include and compare these existing models by novelty and effectiveness.

W2. Meanwhile, "Precursor-of-Anomaly Detection for Irregular Time Series" [3] already introduces a novel task for precursor of anomaly detection, which not only examines current anomalies but also predict potential anomalies in future signals. This paper lacks innovations and novelty in task definition and ignore these most related works and baselines.

W3. As the paper consider the anomaly probability by random anomaly injection, the authors should also include related works of existing probability models and and anomaly injection models, and compare them by novelty and effectiveness, such as "CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation"[4] and "AutoTSAD: Unsupervised Holistic Anomaly Detection for Time Series Data"[5].

W4. Directly using MSE loss to compute AAFN's output anomaly probability with the true labels is not a compatible approach. The authors should explain why they choose MSE.

W5. Injecting random anomalies into the future predictions in $X_{out}$ seems unreasonable. Anomaly prediction should be based on known historical trends, and injecting anomalies into future time segments is more likely to cause errors. For example, how to make decision that one historical trends should injecting anomalies into future time segments and another historical trends should not inject anomalies into future time segments. Moreover, this design disrupts the true normal trend, making it challenging for the model to distinguish genuine anomalies from normal sequences.

W6. $M$ and $N$ determine the number of anomaly prompts and the number of best-matched prompts with the input signal, respectively. The paper lacks specific selection ranges for $M$ and does not provide guidance on whether adjustments are needed based on the requirements of different domains.

W7. To ensure the reproducibility of this work, can you make the code anonymously publicly available?

问题

See weaknesses.

Q1: Why does the paper lack classic anomaly detection and prediction models? The authors should add more baselines for comparison, such as [1-5].

Q2: Why only a single baseline and a non-optimal one are used for comparison in the Results on Forecasting and Results on Diverse Tolerances experiments?

审稿意见

评分: 5置信度: 22024-11-03

The authors propose to predict anomalies in future amples in a time seriese that have not arrived yet. First, the future samples are forecasted, then anomalies are detected. Cross attention is used between the forecasted future input and previous input. Based on QKV embeddings, Q is the future input, K and V are from previous input. The probability of an anomaly is estimated as the activation function of the attention value from Anomaly-Aware Forecasting Network (AAFN). The loss function is the MSE of estimated and actual probability.

For forecasting future samples with potential synthetic anomalies, an Anomaly Prompt Pool (APP) is used, which is learned. Anomalies from the pool are injected into the forecasted signal similar to (Darban et al 2025). First, a feature extractor is trained on normal signal. Second, the previous signal is the query, and an anomaly prompt in the pool is a key. The prompts that correspond to the closest keys that match the query are "attached" to the embedded previous input. The prompts help generate abnormal signals. To train APP, they use Intra-Signal Anomaly Discrepancy Loss. They use the energy score from (Liu et al, 2020)--low energy scores indicate normal samples, while higher engergy scores indicate abnormal samples.

Additionally, they have a reconstruction loss for the previous signal and a forecast loss for the future signal. Moreover, they have Inter-Signal Anomaly Divergence Loss.

They compare their porposed method A2P with 4 existing methods over 6 datasets. Empirical results indicate that A2P generally outperforms the existing methods. Ablation studies indicate contributions from the different components.

优点

Anomaly prediction is an interesting problem. The proposed methods for training the Anomaly Prompt Pool is interesting.

Empirical results indicate that A2P generally outperforms the existing methods. Ablation studies indicate contributions from the diferent components.

缺点

Different parts of the methods could use more explanation, particularly how the future signal, $\hat{X}_{out}$ , is forecasted (see questions below).

When synthetic anomlies are injected into the forecasted signal, you already know where they are. Do you need to detect them?

Evaluating how well the future signal is forecasted would be beneficial. Though not the primary goal, forecasting the future signal is an intermediate goal.

问题

line 300: How do you tranfrom the original normal signal to an abnormal signal based on the "attached" prompts. Also, "the output anomaly prompts are then removed". How are these output prompts obtained? Are they the prompts attached in Eq 2?

Eq 4: What are the gamma function and k_m?. Why does the division in the first term measure discrepancy?

Eq 5: The two X' terms could use more explanation--what are they and how are they obtained?

Eq 7: How is $\hat{X}_{out}$ obtained?

伦理问题详情

n/a

审稿意见

评分: 6置信度: 32024-11-04

This paper primarily introduces the scenario of anomaly prediction and provides a solution called AAF. Based on historical input time series, AAF predicts whether an anomaly will occur at a future point. Before training AAF, an Anomaly Prompt Pool (APP) is pre-trained to generate an anomaly prompt pool. Detailed experiments have proven that this method significantly outperforms well-known existing methods, such as iTransformer and P-TST.

优点

Anomaly prediction is a promising direction that can leave valuable time for subsequent fault elimination and even predict and avoid potential faults in advance.
The method of joint training for time series prediction and anomaly detection makes sense.

缺点

There is an inconsistency between Figure 4 and Figure 3 in the paper. In Figure 3, e_{in} serves as a key and value, but in Figure 4, e_{in} serves as a query.
See the following questions.

问题

It is common that many anomalies are sudden drops or rises in indicators that have no relation to the preceding time series. Theoretically, these cannot be predicted. The paper's generally low F1-score below 50% seems to confirm this point. Can the authors provide statistics on how many of these unpredictable cases there are in the evaluation dataset to prove the feasibility of anomaly prediction or its theoretical limits?
The APP pre-training generates an anomaly prompt pool for the subsequent random injection of anomalies during training to learn the relationship between anomalies and time series. However, these injected faults, no matter how similar they appear with normal input, may differ significantly from actual situations and could affect the model's performance. Can the authors explain why these artificially injected anomalies can help learn the relationship between real-world anomalies and time series?
During the large-scale pre-training process, was the test dataset seen, and is there a data leakage issue?

撤稿通知

2024-11-27

I have read and agree with the venue's withdrawal policy on behalf of myself and my co-authors.