PaperHub

Average rating: 5.3 / 10 (Poster, 4 reviewers; min 5, max 6, std 0.4)
Individual ratings: 5, 5, 6, 5
Confidence: 3.5 · Correctness: 3.3 · Contribution: 3.0 · Presentation: 3.0

NeurIPS 2024

Frequency-aware Generative Models for Multivariate Time Series Imputation

Submitted: 2024-05-15 · Updated: 2024-11-06
TL;DR

This paper proposes a frequency-aware generative model (FGTI) for multivariate time series imputation, which integrates frequency-domain information and uses cross-domain representation learning modules to enhance imputation accuracy.

Abstract

Keywords
Time series; Time series Imputation; Generative Models; Frequency domain

Reviews and Discussion

Review
Rating: 5

This paper proposes FGTI that uses frequency-domain information with high-frequency and dominant-frequency filters for accurate imputation of residual, trend, and seasonal components. Experimental results demonstrate FGTI's effectiveness in improving both data imputation accuracy and downstream applications.

Strengths

  1. The authors leverage high-frequency and dominant-frequency features to enhance time series imputation.

  2. They introduce time-domain and frequency-domain representation learning modules to comprehensively capture relevant information.

  3. Extensive experiments are conducted to demonstrate the method's effectiveness.

Weaknesses

  1. The formulation and writing of this paper should be improved, with attention to correcting several typos.

  2. The authors claim that the residual of a time series mainly comprises high-frequency components. However, in many time series, there may be multiple scales of seasonality and trend features, potentially resulting in a final residual that is too noisy to contain useful information.

  3. It appears that the proposed method utilizes frequency features of time series as conditions for the diffusion model, which seems straightforward and may limit its contribution.

Questions

  1. In Table 2, the absence of the High-frequency filter appears to impact performance less on KDD and Guangzhou datasets. However, on PhysioNet, the situation is reversed. What factors might explain this discrepancy?

  2. How were the hyper-parameters for the experiments chosen? Could you elaborate on the methodology used for their selection?

  3. What is the purpose of Proposition 3.1? It seems unnecessary and almost self-evident.

Limitations

None

Author Response

Thank you for the thoughtful comments and valuable suggestions. We have tried our best to incorporate the suggestions in the revised version. Below, we provide our response to the questions and concerns.

1. (W1) The formulation and writing of this paper should be improved

We appreciate your feedback on improving clarity and grammatical accuracy. We have meticulously revised the manuscript, correcting errors such as "Recently, researchers attempt to utilize large language models" to "Recently, researchers have attempted to utilize large language models" (line 62), and ensuring all sentences are structurally sound and clearly presented.

2. (W2) Explain how multiple scales of seasonality and trend features in time series affect the residual's usefulness, which may be too noisy

Thank you for your comments. Since our method does not explicitly perform STL decomposition, it is not affected by multiple scales of trends and seasonality. Our method leverages the Fourier transform to analyze both dominant-frequency and high-frequency information, which inherently accommodates multiple scales of trend and seasonality. Through the dominant-frequency filter, it is possible to capture trend and seasonal features at multiple scales. In addition, even when the high-frequency information that mainly contributes to the residual is noisy, we can still use the dominant-frequency information and adjust the weight of the high-frequency condition via the attention layers for accurate imputation. As shown in Table 1, our method consistently outperforms other methods across different scenarios.
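To make the filtering idea concrete, here is a minimal NumPy sketch of extracting dominant-frequency and high-frequency conditions with FFT/IFFT; the function name, the `k_dominant` top-k rule, and the `high_cut` fraction are illustrative assumptions, not the paper's exact filters.

```python
import numpy as np

def frequency_conditions(x, k_dominant=3, high_cut=0.25):
    """Split a 1-D series into dominant-frequency and high-frequency parts via FFT.

    k_dominant: number of largest-amplitude (non-DC) bins kept as the
                dominant-frequency condition (illustrative choice).
    high_cut:   fraction of the spectrum treated as high frequency (illustrative).
    """
    n = len(x)
    spec = np.fft.rfft(x)                     # one-sided spectrum
    amp = np.abs(spec)

    # Dominant-frequency filter: keep the k largest-amplitude bins plus the DC term.
    dom_mask = np.zeros(len(spec), dtype=bool)
    dom_mask[0] = True
    dom_mask[np.argsort(amp[1:])[-k_dominant:] + 1] = True

    # High-frequency filter: keep the top fraction of the frequency axis.
    high_mask = np.zeros(len(spec), dtype=bool)
    high_mask[int((1.0 - high_cut) * len(spec)):] = True

    dominant = np.fft.irfft(np.where(dom_mask, spec, 0), n=n)
    high = np.fft.irfft(np.where(high_mask, spec, 0), n=n)
    return dominant, high

# Toy usage: a noisy daily-period series; the two outputs would act as conditions.
t = np.arange(200)
series = np.sin(2 * np.pi * t / 24) + 0.1 * np.random.randn(200)
dom_cond, high_cond = frequency_conditions(series)
```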

3. (W3) The method seems straightforward and may limit its contribution

Our core insight is that existing methods cannot accurately impute the residual term, which motivates introducing the corresponding frequency-domain information. The insight of modelling high-frequency signals for time series imputation has not yet been recognized by existing studies. Our proposed filters and representation learning modules are simple and effective for imputation, and can shed light on further work. In addition, our proposed filters and representation learning modules can also be flexibly ported to other generative models and are not limited to diffusion models.

4. (Q1) Explain why the absence of the High-frequency filter impacts performance differently on the KDD, Guangzhou, and PhysioNet datasets

Thank you for your question. The impact of dominant-frequency and high-frequency information differs depending on the characteristics of the datasets. For the PhysioNet dataset, the obtained dominant-frequency information may be biased because the dataset contains at least 79.71% missing values. For the KDD and Guangzhou datasets, the dominant-frequency information plays a larger role because of the lower missing rates. In these cases, our method can adaptively adjust the weights of high-frequency and dominant-frequency information to help the imputation model.

5. (Q2) How the hyper-parameters for the experiments were chosen

For the two critical hyperparameters of the high-frequency filter and the dominant-frequency filter, we conduct an empirical search to identify the optimal values, as shown in Fig. 9 and Fig. 10 in Appendix A.4.2. For other settings related to the diffusion model, we adopt the hyperparameters recommended by existing well-established models [23,43]. These models have demonstrated strong performance on similar tasks, and their hyperparameters have been extensively validated in that literature. To improve the presentation of the hyperparameters, we will add the above description of the diffusion model's parameters to Appendix A.4.2.
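As a sketch of what such an empirical search could look like, the loop below evaluates candidate filter settings on a validation split; `build_and_eval` and the candidate grids are hypothetical placeholders, not the authors' actual search script.

```python
import itertools

def grid_search(build_and_eval, high_cuts=(0.1, 0.2, 0.3), k_dominants=(1, 3, 5)):
    """Return the filter hyperparameters with the lowest validation RMSE.

    build_and_eval(high_cut, k_dominant) is assumed to train a model with the
    given filter settings and return its validation RMSE.
    """
    best = None
    for high_cut, k_dom in itertools.product(high_cuts, k_dominants):
        rmse = build_and_eval(high_cut, k_dom)
        if best is None or rmse < best[0]:
            best = (rmse, high_cut, k_dom)
    return best  # (best_rmse, best_high_cut, best_k_dominant)
```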

6. (Q3) Purpose of Proposition 3.1

The purpose of Proposition 3.1 is to illustrate the mechanism behind introducing the high-frequency condition and the dominant-frequency condition to facilitate generative models in imputing missing values. The proposition shows that with the introduction of the two conditions, the entropy of the target distribution is reduced and its randomness is narrowed, so that generative models can impute missing values more accurately. To add empirical evidence for this proposition, we also add experiments using CRPS to evaluate the gap between the learned and ground-truth distributions for the different generative baselines in the following table:

| Dataset | Miss. Rate | MIWAE | GPVAE | TimeCIB | GAIN | CSDI | SSSD | PriSTI | FGTI |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| KDD | 10% | 0.524 | 0.443 | 0.466 | 0.709 | 0.224 | 0.352 | 0.232 | 0.158 |
| KDD | 20% | 0.526 | 0.445 | 0.467 | 0.718 | 0.245 | 0.370 | 0.248 | 0.170 |
| KDD | 30% | 0.532 | 0.447 | 0.469 | 0.729 | 0.259 | 0.374 | 0.268 | 0.186 |
| KDD | 40% | 0.530 | 0.457 | 0.471 | 0.746 | 0.278 | 0.401 | 0.301 | 0.216 |
| Guang. | 10% | 0.312 | 0.333 | 0.360 | 0.692 | 0.265 | 0.316 | 0.209 | 0.155 |
| Guang. | 20% | 0.312 | 0.330 | 0.357 | 0.694 | 0.277 | 0.299 | 0.244 | 0.168 |
| Guang. | 30% | 0.312 | 0.335 | 0.356 | 0.695 | 0.292 | 0.353 | 0.310 | 0.193 |
| Guang. | 40% | 0.312 | 0.333 | 0.358 | 0.697 | 0.324 | 0.382 | 0.362 | 0.243 |
| Phy. | 10% | 0.689 | 0.659 | 0.466 | 0.739 | 0.544 | 0.617 | 0.444 | 0.343 |
| Phy. | 20% | 0.717 | 0.665 | 0.467 | 0.761 | 0.589 | 0.665 | 0.457 | 0.369 |
| Phy. | 30% | 0.750 | 0.674 | 0.469 | 0.787 | 0.627 | 0.630 | 0.467 | 0.389 |
| Phy. | 40% | 0.779 | 0.680 | 0.471 | 0.814 | 0.671 | 0.676 | 0.491 | 0.441 |

As shown in the above table and Table 1, our method outperforms other methods for all metrics due to the introduction of frequency domain conditions, thus echoing the proposition.
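For reference, the intuition behind Proposition 3.1 can be summarized with the standard fact that conditioning never increases entropy; the notation below is ours, not necessarily the paper's:

```latex
% With high-frequency condition c_h and dominant-frequency condition c_d,
% the target distribution the generative model must match becomes less uncertain:
H\bigl(x^{\mathrm{mis}} \mid x^{\mathrm{obs}}, c_h, c_d\bigr)
  \;\le\;
H\bigl(x^{\mathrm{mis}} \mid x^{\mathrm{obs}}\bigr).
```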

Review
Rating: 5

This paper proposes a generative model, Frequency-aware Generative Models for Multivariate Time Series Imputation (FGTI), for multivariate time series imputation. The proposed model is designed to enhance imputation performance by modeling the residual component of time series data. To this end, the paper leverages the observation that the residual components are usually high-frequency. It utilizes two filters in the frequency domain to extract high-frequency and dominant-frequency information from the time series using FFT and IFFT. The frequency information is then input into a denoising network to generate the missing data conditioned on the observed part of the time series. Experiments are performed using three real-world datasets and a number of recent baselines are benchmarked.

Strengths

  • The idea of using frequency filters to model the residual component of time-series data is natural and reasonable.
  • The empirical evaluation is systematic and the experiment results show consistent improvement compared with existing methods.

Weaknesses

  • It is not clear if the modeling of temporal dependency is sufficient in the time-frequency and attribute-frequency representation learning module. From Eq. (8-9), it seems that only two attention layers are used without considering the time stamp.
  • The model architecture and specification used for Projectors in the denoising network is not described.
  • There is a gap between Sections 3.2 and 3.3. The time-frequency representation and attribute-frequency representation learning are described and the final output $\mathbf{R}^a$ is defined. However, it is not clear how this representation is used in the diffusion model. It seems from Eq. (14) and Appendix A.2.1 that $\mathbf{R}^a$ is not involved in the parameterization $\epsilon_\theta$.

Questions

Please kindly see and clarify my comments in the Weaknesses section.

Limitations

Limitations regarding the high demand for computational resources are discussed, with relevant experimental results analyzed.

Author Response

We really appreciate your thoughtful comments and valuable suggestions. We have incorporated the suggestions in the revised paper. Below, we provide our response to the concerns.

1. (W1) Clarify if temporal dependency modelling is sufficient

We would really appreciate your feedback. We apologize for the inadequate illustration and explanation of the time-frequency and attribute-frequency representation learning modules. In line 144, we state that $\text{Encoder}(\cdot)$ is implemented with a Transformer backbone. Since a position encoding layer exists in this encoder, we can add the timestamp information to the corresponding Query and Key in the two representation learning modules. To improve the presentation, we add a figure illustrating the detailed architecture of the denoising network, which is uploaded in the supplementary PDF file at the top of the page, and we will add this figure to Section 3.3. In addition, the setup of modelling temporal dependencies and attribute dependencies with two separate attention layers is common in time series imputation methods [26,43]. Considering that our method outperforms other imputation methods for different missing rates and different missing mechanisms in Table 1, Fig. 4, Fig. 7 and Fig. 8, the modelling of temporal dependence is sufficient.
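A rough PyTorch-style sketch of the two-attention-layer setup described above (temporal attention over time steps with positional information, followed by attention over attributes); module names and dimensions are assumptions, not the released code.

```python
import torch
import torch.nn as nn

class TimeAttributeBlock(nn.Module):
    """Temporal attention along the time axis, then attention along the attribute axis."""

    def __init__(self, d_model=64, n_heads=4, max_len=512):
        super().__init__()
        # A learnable position embedding injects timestamp information into Query/Key.
        self.pos_emb = nn.Embedding(max_len, d_model)
        self.time_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attr_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, h):
        # h: (batch, length, attributes, d_model)
        b, l, k, d = h.shape
        pos = self.pos_emb(torch.arange(l, device=h.device))        # (l, d)
        h = h + pos[None, :, None, :]

        # Temporal attention: attend over time steps for each attribute.
        ht = h.permute(0, 2, 1, 3).reshape(b * k, l, d)
        ht, _ = self.time_attn(ht, ht, ht)
        h = ht.reshape(b, k, l, d).permute(0, 2, 1, 3)

        # Attribute attention: attend over attributes at each time step.
        ha = h.reshape(b * l, k, d)
        ha, _ = self.attr_attn(ha, ha, ha)
        return ha.reshape(b, l, k, d)
```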

2. (W2) Describe the model architecture and specifications used for Projectors in the denoising network

Thank you for your suggestions. We apologize for the inadequate description of the detailed structure of the denoising network. As shown in Fig. 2, which is uploaded in the supplementary PDF, the input projector is an MLP layer, and the output projector is a 2-layer MLP with a ReLU activation function. This setup is consistent with other SOTA imputation methods [26,43] and has been extensively validated in that literature.
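Based only on the description above (an MLP layer for the input projector and a 2-layer MLP with ReLU for the output projector), a hedged sketch could be as simple as the following; the channel sizes are assumptions:

```python
import torch.nn as nn

d_model, in_channels, out_channels = 64, 2, 1   # assumed sizes, not the paper's

# Input projector: a single MLP (linear) layer lifting the raw inputs to d_model.
input_projector = nn.Linear(in_channels, d_model)

# Output projector: a 2-layer MLP with ReLU mapping hidden features back to values.
output_projector = nn.Sequential(
    nn.Linear(d_model, d_model),
    nn.ReLU(),
    nn.Linear(d_model, out_channels),
)
```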

3. (W3) Clarify how time-frequency and attribute-frequency representations are used in the diffusion model

Thank you for your valuable suggestions for improving our manuscript. We apologise for not presenting the denoising network architecture of the diffusion model clearly. Fig. 2 in the supplementary PDF, which illustrates the detailed architecture of the denoising network, will be added to Section 3.3. To let the frequency-domain information guide the generative model's imputation, we incorporate the representation learning modules into the generative model architecture. As shown in the figure, $\mathbf{R}^a$ is a hidden vector in the denoising neural network of the diffusion model.

Comment

I appreciate the response from the authors and the additional figure to clarify the model architecture. My concerns have been addressed. Specifically,

  • The time information can indeed be captured by Transformer models. I would suggest the authors consider spelling this out by revising Eq. (7) from $\text{Encoder}(\cdot)$ to $\text{Transformer}(\cdot)$ to enhance clarity.
  • The architecture and specifications for the Projectors are clarified. Please add the relevant descriptions in the main text.
  • The use of $\mathbf{R}^a$ and $\mathbf{R}^t$ is clarified with the revised architecture figure.

Given that my concerns are addressed, I increase my overall rating from 4 to 5.

Comment

Thank you for your thoughtful evaluation and the feedback provided in your review. We are pleased to hear that the additional figure and our responses have helped in addressing your concerns.

  • We will certainly revise Equation (7) and Fig. 3 to replace $\text{Encoder}(\cdot)$ with $\text{Transformer}(\cdot)$ to more explicitly indicate that time information can be captured via Transformer models, as you've suggested.

  • Additionally, we will incorporate detailed descriptions of the input projector and output projector in the main text, as detailed in our response to W2, to avoid any ambiguity regarding their architecture and specifications.

  • We also acknowledge your satisfaction with the clarification provided through the revised figure regarding the use of $\mathbf{R}^a$ and $\mathbf{R}^t$. We will ensure that similar clarity is maintained throughout the manuscript to aid the understanding of our model components.

Thank you again for your invaluable feedback and guidance. If you have any further questions, please send them to us; we look forward to discussing them with you to further improve our work.

Best regards,

Authors

Review
Rating: 6

The paper proposes a new model, called FGTI, to address the issue of missing data in multivariate time series by extracting frequency-domain information. The authors argue that existing methods, in general, neglect the residual term, which is the most significant contributor to imputation errors. FGTI incorporates frequency-domain information using high-frequency and dominant-frequency filters to improve the imputation of the residual, trend, and seasonal terms. The model is evaluated on three real-world datasets and demonstrates superior performance in both imputation accuracy and downstream tasks compared to existing methods.

Strengths

  • The paper is well-written and well-motivated.
  • The authors tackle a very important problem of time-series imputation, focusing on fully utilizing frequency-domain information, which has not been considered enough.
  • The experiments are thorough, including ablation studies and different missing scenarios. The method is compared with cutting-edge baselines.

Weaknesses

  • The experiments are thorough and include multiple state-of-the-art imputation methods. However, the paper misses two important time-series imputation methods: one with a VAE-based approach [A] and one applying trend-seasonality decomposition [B]. While it's understandable that the authors might not have had enough time to include these papers since they are from ICLR 2024, technical comparisons with these methods would be necessary before acceptance.

  • While the authors provided experiments with various missing scenarios and missing rates, it is not clear why the proposed method outperforms the conventional methods and under what circumstances. For instance, extracting high-frequency information could be significantly distorted if missingness occurs with a certain frequency (which seems realistic considering periodic noises). The paper would be more convincing if the authors provided experiments that assess the effect of the trend and seasonality components of the datasets (or using synthetic datasets) and missing patterns.

  • The experiments would be more appealing if the authors provided a breakdown (seasonality and trend) of the assessments on the imputed time-series.

Minor comment:

  • Regarding Figure 1, since the RMSE and MAE comparison essentially conveys similar information about the population-level importance of the trend-seasonality decomposition, replacing one with specific time-series examples where the trend and seasonality of the missing values are correctly imputed would better illustrate the motivation. Also, information about how Figure 1 is reported is missing.

References:

[A] Choi and Lee, "Conditional Information Bottleneck Approach for Time Series Imputation," ICLR 2024.

[B] Liu et al., "Multivariate Time-series Imputation with Disentangled Temporal Representations," ICLR 2024.

Questions

  • Regarding Weakness 2: Please describe why the proposed method works better than conventional methods and under what circumstances. Do the seasonality and trend profiles of the given time-series dataset matter, and if so, how?

  • It seems that how to mask the input (to learn the underlying diffusion model) is very important. How does the model's performance change depending on the masking ratio and/or masking patterns? This is crucial as masking may distort the trend and seasonality components of the given time-series.

Limitations

N/A

Comment

2. (W2,Q1) Clarify why your method outperforms conventional ones, considering trend, seasonality, and missing patterns in datasets

Thank you for your valuable suggestions for improving our manuscript. We conduct a case study of FGTI on the trend, seasonal, and residual terms over the pre-decomposed KDD dataset. We first perform STL decomposition of the KDD dataset into Trend, Seasonal and Residual terms. Then we select 10% of the observations of the original KDD dataset as the mask positions via MCAR, and mask the corresponding positions of the three terms. To study the role of high-frequency information, dominant-frequency information, and frequency-domain information, we consider the three ablation scenarios from Section 4.3: (1) w/o Dominant-frequency filter, (2) w/o High-frequency filter, and (3) w/o Frequency condition. We report the imputation results of the different scenarios over the different terms in the following table:

| Component | Trend RMSE | Trend MAE | Seasonal RMSE | Seasonal MAE | Residual RMSE | Residual MAE |
| --- | --- | --- | --- | --- | --- | --- |
| w/o Frequency condition | 0.048 | 0.0155 | 0.0572 | 0.0364 | 0.5132 | 0.2975 |
| w/o Dominant-frequency filter | 0.0482 | 0.0157 | 0.0533 | 0.0334 | 0.4956 | 0.2814 |
| w/o High-frequency filter | 0.0409 | 0.0143 | 0.0485 | 0.0301 | 0.5129 | 0.2912 |
| FGTI | 0.0448 | 0.0159 | 0.0523 | 0.0325 | 0.5068 | 0.2885 |

We can find that for the Trend term, retaining the dominant-frequency information gives the best results, while the high-frequency information may interfere with the imputation. For the Seasonal term, the results are similar to the Trend term, but the dominant-frequency information contributes less to the imputation of the Seasonal term than of the Trend term. This suggests that the Seasonal term mainly corresponds to the dominant-frequency information, but also contains some of the high-frequency information. In contrast, the results on the Residual term show that it mainly corresponds to high-frequency information. Since we choose a Transformer as the encoder in Cross-domain Representation Learning and utilize cross-attention as the fusion mechanism for the two types of frequency-domain information in Time-frequency Representation Learning and Attribute-frequency Representation Learning, our method can adaptively adjust the weights of the high-frequency information and the dominant-frequency information for different timestamps. Thus our method can outperform conventional methods on datasets with different characteristics in most cases, as illustrated in Table 1.

In addition, our method's pipeline does not perform STL decomposition of the time series, but instead extracts frequency-domain information to guide imputation, so we do not need to know the trend and seasonality of the dataset. Thanks to the encoder structure and the cross-attention-based fusion mechanism in Cross-domain Representation Learning, we can adaptively adjust the weights of the two types of frequency-domain information on datasets dominated by different terms.
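A minimal sketch of the case-study setup described above, assuming statsmodels' STL and a simple MCAR mask; the period, missing rate, and seed are placeholders:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

def stl_case_study(series, period=24, miss_rate=0.10, seed=0):
    """Decompose a series with STL, then mask the same MCAR positions in each term."""
    res = STL(pd.Series(series), period=period).fit()
    terms = {
        "trend": res.trend.to_numpy(),
        "seasonal": res.seasonal.to_numpy(),
        "residual": res.resid.to_numpy(),
    }

    rng = np.random.default_rng(seed)
    mask = rng.random(len(series)) < miss_rate      # MCAR mask positions

    # The masked versions of each term are what an imputation model would receive.
    masked = {name: np.where(mask, np.nan, term) for name, term in terms.items()}
    return terms, masked, mask
```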

Furthermore, we consider three missing mechanisms, MCAR, MAR, and MNAR, in our comparative experiments. MCAR implies that each attribute could be missing at any frequency, while MAR potentially implies that all attributes' missing status depends on the frequency of a particular attribute (e.g., high temperatures could cause sensor failure). For MNAR, the missing status of all attributes depends on the frequency of that attribute. As shown in Fig 4, Fig 7 and Fig 8 in the manuscript, our method still outperforms conventional methods with different missing mechanisms (patterns).

Author Response

We thank the constructive comments and suggestions for further experiments. We will try our best to incorporate the suggestions in the revised version. Below, we provide our response to the questions and concerns.

1. (W1) Comparison with the most recent baselines

Thank you for the valuable suggestions. TimeCIB [a] is a very competitive VAE-based imputation method, so we incorporate TimeCIB into the baselines for comparative experiments. In addition, we have already considered TIDER [27] in our experiments.

We first report the part of updated Table 1 for the added baseline:

| Dataset | Miss. Rate | Metric | FreTS | FGTI |
| --- | --- | --- | --- | --- |
| KDD | 10% | RMSE | 0.630 | 0.406 |
| KDD | 10% | MAE | 0.412 | 0.149 |
| KDD | 20% | RMSE | 0.741 | 0.451 |
| KDD | 20% | MAE | 0.489 | 0.161 |
| KDD | 30% | RMSE | 0.796 | 0.448 |
| KDD | 30% | MAE | 0.546 | 0.176 |
| KDD | 40% | RMSE | 0.850 | 0.478 |
| KDD | 40% | MAE | 0.591 | 0.205 |
| Guang. | 10% | RMSE | 0.456 | 0.230 |
| Guang. | 10% | MAE | 0.340 | 0.170 |
| Guang. | 20% | RMSE | 0.602 | 0.258 |
| Guang. | 20% | MAE | 0.460 | 0.176 |
| Guang. | 30% | RMSE | 0.709 | 0.291 |
| Guang. | 30% | MAE | 0.547 | 0.202 |
| Guang. | 40% | RMSE | 0.787 | 0.356 |
| Guang. | 40% | MAE | 0.611 | 0.254 |
| Phy. | 10% | RMSE | 0.804 | 0.580 |
| Phy. | 10% | MAE | 0.540 | 0.286 |
| Phy. | 20% | RMSE | 0.825 | 0.577 |
| Phy. | 20% | MAE | 0.576 | 0.309 |
| Phy. | 30% | RMSE | 0.861 | 0.624 |
| Phy. | 30% | MAE | 0.603 | 0.336 |
| Phy. | 40% | RMSE | 0.883 | 0.669 |
| Phy. | 40% | MAE | 0.626 | 0.376 |

We then report the imputation results for varying missing mechanism with 10% missing values in the following table, and update Fig.7, Fig.8 and Fig.9:

| Dataset | Miss. Mech. | Metric | TimeCIB | FGTI |
| --- | --- | --- | --- | --- |
| KDD | MCAR | RMSE | 0.589 | 0.406 |
| KDD | MCAR | MAE | 0.367 | 0.149 |
| KDD | MAR | RMSE | 0.589 | 0.406 |
| KDD | MAR | MAE | 0.367 | 0.149 |
| KDD | MNAR | RMSE | 0.693 | 0.499 |
| KDD | MNAR | MAE | 0.408 | 0.174 |
| Guang. | MCAR | RMSE | 0.451 | 0.230 |
| Guang. | MCAR | MAE | 0.300 | 0.170 |
| Guang. | MAR | RMSE | 0.376 | 0.218 |
| Guang. | MAR | MAE | 0.245 | 0.150 |
| Guang. | MNAR | RMSE | 0.327 | 0.200 |
| Guang. | MNAR | MAE | 0.230 | 0.140 |
| Phy. | MCAR | RMSE | 0.697 | 0.580 |
| Phy. | MCAR | MAE | 0.450 | 0.286 |
| Phy. | MAR | RMSE | 0.327 | 0.200 |
| Phy. | MAR | MAE | 0.230 | 0.140 |
| Phy. | MNAR | RMSE | 0.697 | 0.580 |
| Phy. | MNAR | MAE | 0.450 | 0.286 |

After this, we report the updated resource consumption over the KDD dataset with 10% missing values in the following table, and update Fig.5:

| Method | GPU Consumption (MiB) | Running Time (s) |
| --- | --- | --- |
| TimeCIB | 930 | 6.52 |
| FGTI | 9103 | 3874.30 |

Finally, we report on the updated downstream application experiments in the following table, and update Fig.6:

| Method | Air quality prediction (RMSE) | Mortality forecast (AUC) |
| --- | --- | --- |
| TimeCIB | 0.63 | 0.82 |
| FGTI | 0.59 | 0.86 |

We can find that FGTI still outperforms all baselines due to the guidance of high-frequency information and dominant-frequency information, as well as the generating ability of the diffusion model.

Comment

3. (W3) Provide a breakdown (seasonality and trend) of the assessments

Thank you for your suggestions. The pipeline of our method does not perform a direct STL decomposition, but instead uses the high-frequency and dominant-frequency information to guide the imputation of the three terms. Due to the effect of missing values, we would obtain different decomposition results before and after imputation, so the ground truth for each term would be lacking for the evaluation. Therefore, we first perform STL decomposition of the original KDD dataset and then inject missing values into the three terms separately to verify the imputation results; the results are shown in Fig. 1 of Section 1. In addition, we also add a case study of the trend, seasonal and residual terms using the pre-decomposed KDD dataset, with the results shown in the table above. The results show that the dominant-frequency information mainly helps the imputation of the Trend and Seasonal terms, and the high-frequency information mainly helps the Residual term.

4. (C1) Replace RMSE or MAE with examples, and clarify Figure 1 reporting details

Thank you for your valuable suggestions for improving our manuscript. Regarding the setting of the study in Fig. 1, we first perform STL decomposition of the KDD dataset into Trend, Seasonal and Residual terms. Then we select 10% of the observations of the original KDD dataset as the mask positions via MCAR, and mask the corresponding positions of the three terms. Finally, we impute the missing values of the three terms with the different methods separately. Following your valuable suggestion, we add examples of imputations on the trend, seasonal and residual terms in Fig. 1, which is uploaded in the supplementary PDF file at the top of the page, and we add the setup for this study. The results also indicate that the SOTA time-series imputation methods are inaccurate in imputing the residual term.

5. (Q2) Explain how masking ratio/patterns affect model performance

Thank you for your comments. We agree that mask ratio and mask pattern are critical for the learning process. We randomly mask different ratios of observations as the imputation target at each training step, instead of using a fixed mask ratio, following CSDI [43]. The performance of FGTI with different mask ratios is shown in the following table:

| Mask Ratio | KDD RMSE | KDD MAE | Guangzhou RMSE | Guangzhou MAE | PhysioNet RMSE | PhysioNet MAE |
| --- | --- | --- | --- | --- | --- | --- |
| 10% | 0.4372 | 0.1925 | 0.2388 | 0.1647 | 0.5992 | 0.3235 |
| 20% | 0.4143 | 0.1700 | 0.2312 | 0.1576 | 0.5853 | 0.2893 |
| 30% | 0.4257 | 0.1707 | 0.2335 | 0.1600 | 0.5836 | 0.2800 |
| 40% | 0.4185 | 0.1697 | 0.2372 | 0.1634 | 0.6181 | 0.3105 |
| 50% | 0.4183 | 0.1714 | 0.2343 | 0.1578 | 0.6308 | 0.3138 |
| Random | 0.4057 | 0.1489 | 0.2325 | 0.1584 | 0.5801 | 0.2856 |

We find that the random mask strategy increases the learning complexity and enhances the modelling ability of the diffusion model; thus our masking strategy achieves optimal or sub-optimal performance in most cases.
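To make the random-ratio strategy concrete, the sketch below draws a fresh masking ratio over the observed entries at each training step; it follows the description above (in the spirit of CSDI) rather than the released code.

```python
import numpy as np

def sample_training_mask(observed_mask, rng):
    """Hide a randomly chosen fraction of observed entries as the imputation target.

    observed_mask: boolean array, True where the raw data is actually observed.
    Returns (cond_mask, target_mask): conditioning entries vs. entries to impute.
    """
    ratio = rng.uniform(0.0, 1.0)                    # new ratio at every training step
    u = rng.random(observed_mask.shape)
    target_mask = observed_mask & (u < ratio)        # artificially masked targets
    cond_mask = observed_mask & ~target_mask         # what the model may condition on
    return cond_mask, target_mask

# Toy usage on a (time, attribute) observation mask.
rng = np.random.default_rng(0)
observed = rng.random((48, 5)) > 0.2
cond, target = sample_training_mask(observed, rng)
```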

Then, we explore the performance when using different mask patterns. Following CSDI [43] and PriSTI [26], we consider (1) Block missing, (2) Mix missing, and (3) Random missing mask patterns. The results are shown in the following table:

| Mask Pattern | KDD RMSE | KDD MAE | Guangzhou RMSE | Guangzhou MAE | PhysioNet RMSE | PhysioNet MAE |
| --- | --- | --- | --- | --- | --- | --- |
| Block | 0.4187 | 0.1778 | 0.2387 | 0.1596 | 0.6224 | 0.3384 |
| Mix | 0.4193 | 0.1792 | 0.2325 | 0.1531 | 0.6034 | 0.3208 |
| Random | 0.4057 | 0.1489 | 0.2325 | 0.1584 | 0.5801 | 0.2856 |

It can be found that Block missing and Mix missing do not match Random missing in most cases, because the mask pattern may not correspond to the actual missing scenario. Therefore, we use the Random missing mask pattern by default.

Reference:

[a] Conditional Information Bottleneck Approach for Time Series Imputation

Comment

I have read all the rebuttals and I fully appreciate the experiments the authors made. This is an interesting paper with good motivation and thorough experiments. I tend to keep my initial decision, which is "Weak Accept".

Comment

Thank you for appreciating our detailed rebuttals and experimental efforts. We are delighted to know you find our paper interesting with solid motivation and thorough experiments.

Given these positive remarks and to strengthen our position for final acceptance, may we kindly request you to reconsider your score and potentially upgrade it? This would greatly enhance our chances of contributing our findings to a broader audience.

Best regards,

Authors

Review
Rating: 5

The authors present Frequency-aware Generative Models for Multivariate Time Series Imputation (FGTI), a model which addresses the challenge of missing data in multivariate time series by focusing on the often-overlooked residual term. The paper also incorporates frequency-domain information to enhance imputation performance. Experiments show that this outperforms many existing time series imputation baselines on three real-world datasets.

Strengths

S1. The work proposes an approach for time series imputation, which is an important topic for a wide range of applications.

S2. The authors aim to address the influence of frequency-domain information, w.r.t. both the high-frequency condition and the dominant-frequency condition.

S3. Extensive experiments have been carried out.

Weaknesses

W1. In the Introduction and Section 3.1.2, there is little explanation of why frequency components with large amplitudes can guide the imputation of both the trend and seasonal terms. Does it address the intricately entangled trend-seasonal representations, which are highly important in current approaches?

W2. In the Related Work section, I did not find a discussion relating this study to existing frequency-domain research for time series tasks, which makes it hard to clearly justify the main difference/novelty of this paper compared with current approaches.

W3. In the evaluation, only RMSE and MAE are used as metrics. However, it would be better to include additional metrics such as CRPS.

W4. It would be fairer if the authors compared their approach with a time series model using frequency domain information, which also addresses the trend, seasonal, and residual terms.

W5. Both CSDI and PriSTI address the issue of missing strategies. From the results of Figure 4, 7 and 8, the improvement of MAR and MNAR was much lower than that of MCAR on the Guangzhou and Physio datasets. Was it due to differences in the data distribution, or was it due to the setting of the hyper-parameters?

[1] Learning Latent Seasonal-Trend Representations for Time Series Forecasting.

[2] Frequency-domain MLPs are More Effective Learners in Time Series Forecasting.

[3] CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation.

[4] PriSTI: A Conditional Diffusion Framework for Spatiotemporal Imputation.

Questions

Please see Weaknesses.

Limitations

  1. My main concern with the work is that the contribution of the proposed method might be incremental. Compared with CSDI, frequency domain filters are utilized to incorporate frequency domain information, and further cross-domain representation learning and frequency-aware diffusion framework are natural implementations, which lack additional novelty.

Comment

5. (W5) What causes the lower MAR and MNAR improvements compared to MCAR on Guangzhou and PhysioNet

Thank you for your question. MCAR, MAR and MNAR reflect different missing scenarios in reality. Under MCAR, the missingness does not depend on any of the attribute values. Under MAR, the probability of missingness depends only on the available information (e.g., missing records of traffic flow data may be related to rush hour or activities in public places). Under MNAR, the missingness depends on the attribute itself (e.g., the reliability of the recording devices). The effect of the models under different mechanisms depends on the temporal and attribute relations they capture in the dataset. Under the MAR and MNAR mechanisms, the critical temporal or attribute dependencies are more likely to be missing compared to MCAR, and therefore models may not perform as well under MAR and MNAR. This also reflects the importance of introducing frequency-domain information to guide the imputation.
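For readers who want the three mechanisms spelled out, here is a toy mask generator for a (time, attribute) matrix; the thresholds and the choice of driver attribute are illustrative, not the exact protocols used in the paper's experiments.

```python
import numpy as np

def make_missing(x, mechanism="MCAR", rate=0.1, seed=0):
    """Toy missing-mask generators for MCAR, MAR, and MNAR (True = missing)."""
    rng = np.random.default_rng(seed)
    if mechanism == "MCAR":
        # Missing completely at random: independent of any value.
        return rng.random(x.shape) < rate
    if mechanism == "MAR":
        # Missing at random: missingness of all attributes driven by attribute 0.
        driver = x[:, 0]
        p = 2 * rate * (driver > np.median(driver))   # more missing on high-driver rows
        return rng.random(x.shape) < p[:, None]
    # MNAR: each attribute's own value drives its missingness.
    p = 2 * rate * (x > np.median(x, axis=0, keepdims=True))
    return rng.random(x.shape) < p
```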

6. (L1) The contribution of the proposed method might be incremental

Thank you for your comments. As far as we know, this is the first work to recognize the importance of paying special attention to the residual term when imputing time series, and to introduce frequency-domain information to guide generative models. We choose to implement FGTI with a diffusion model because diffusion models currently demonstrate excellent performance in several fields [17,23]. The frequency-domain filters and the cross-domain representation learning module can be flexibly ported to other generative models. We think this insight can shed light on potential future directions in time series imputation. Therefore, we argue that FGTI is not incremental work.

References:

[a] MSTL: A Seasonal-Trend Decomposition Algorithm for Time Series with Multiple Seasonal Patterns

[b] A wavelet-based approach for imputation in nonstationary multivariate time series

[c] Rethinking general time series analysis from a frequency domain perspective

[d] FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting

[e] Frequency-domain MLPs are More Effective Learners in Time Series Forecasting

[f] Learning Latent Seasonal-Trend Representations for Time Series Forecasting

Comment

4. (W4) Comparison with frequency-domain methods and with methods that address the three terms

Thank you for the valuable suggestions. We add LaST [f] and FreTS [e] as baselines for comparison. Since they focus on the time series forecasting task, we adapt them to the imputation task based on the TimesNet [48] setting. Furthermore, we have considered the TIDER model which takes into account trend, seasonal and residual terms in the MF process. We first report the part of updated Table 1 for the added baselines:

| Dataset | Miss. Rate | Metric | LaST | FreTS | FGTI |
| --- | --- | --- | --- | --- | --- |
| KDD | 10% | RMSE | 0.473 | 0.630 | 0.406 |
| KDD | 10% | MAE | 0.287 | 0.412 | 0.149 |
| KDD | 20% | RMSE | 0.532 | 0.741 | 0.451 |
| KDD | 20% | MAE | 0.310 | 0.489 | 0.161 |
| KDD | 30% | RMSE | 0.574 | 0.796 | 0.448 |
| KDD | 30% | MAE | 0.350 | 0.546 | 0.176 |
| KDD | 40% | RMSE | 0.634 | 0.850 | 0.478 |
| KDD | 40% | MAE | 0.393 | 0.591 | 0.205 |
| Guang. | 10% | RMSE | 0.347 | 0.456 | 0.230 |
| Guang. | 10% | MAE | 0.244 | 0.340 | 0.170 |
| Guang. | 20% | RMSE | 0.440 | 0.602 | 0.258 |
| Guang. | 20% | MAE | 0.312 | 0.460 | 0.176 |
| Guang. | 30% | RMSE | 0.545 | 0.709 | 0.291 |
| Guang. | 30% | MAE | 0.388 | 0.547 | 0.202 |
| Guang. | 40% | RMSE | 0.637 | 0.787 | 0.356 |
| Guang. | 40% | MAE | 0.458 | 0.611 | 0.254 |
| Phy. | 10% | RMSE | 0.768 | 0.804 | 0.580 |
| Phy. | 10% | MAE | 0.516 | 0.540 | 0.286 |
| Phy. | 20% | RMSE | 0.786 | 0.825 | 0.577 |
| Phy. | 20% | MAE | 0.550 | 0.576 | 0.309 |
| Phy. | 30% | RMSE | 0.825 | 0.861 | 0.624 |
| Phy. | 30% | MAE | 0.578 | 0.603 | 0.336 |
| Phy. | 40% | RMSE | 0.850 | 0.883 | 0.669 |
| Phy. | 40% | MAE | 0.603 | 0.626 | 0.376 |

We then report the results for varying missing mechanisms with 10% missing values in the following table, and update Fig.7, Fig.8 and Fig.9:

| Dataset | Miss. Mech. | Metric | LaST | FreTS | FGTI |
| --- | --- | --- | --- | --- | --- |
| KDD | MCAR | RMSE | 0.473 | 0.630 | 0.406 |
| KDD | MCAR | MAE | 0.287 | 0.412 | 0.149 |
| KDD | MAR | RMSE | 0.473 | 0.630 | 0.406 |
| KDD | MAR | MAE | 0.287 | 0.412 | 0.149 |
| KDD | MNAR | RMSE | 0.619 | 0.809 | 0.499 |
| KDD | MNAR | MAE | 0.326 | 0.473 | 0.174 |
| Guang. | MCAR | RMSE | 0.347 | 0.456 | 0.230 |
| Guang. | MCAR | MAE | 0.244 | 0.340 | 0.170 |
| Guang. | MAR | RMSE | 0.327 | 0.465 | 0.218 |
| Guang. | MAR | MAE | 0.234 | 0.356 | 0.150 |
| Guang. | MNAR | RMSE | 0.309 | 0.441 | 0.200 |
| Guang. | MNAR | MAE | 0.227 | 0.337 | 0.140 |
| Phy. | MCAR | RMSE | 0.768 | 0.804 | 0.580 |
| Phy. | MCAR | MAE | 0.516 | 0.540 | 0.286 |
| Phy. | MAR | RMSE | 0.309 | 0.441 | 0.200 |
| Phy. | MAR | MAE | 0.227 | 0.337 | 0.140 |
| Phy. | MNAR | RMSE | 0.768 | 0.804 | 0.580 |
| Phy. | MNAR | MAE | 0.516 | 0.540 | 0.286 |

After this, we report the resource consumption over KDD dataset with 10% missing values in the following table, and update Fig.5:

| Method | GPU Consumption (MiB) | Running Time (s) |
| --- | --- | --- |
| LaST | 444 | 4.63 |
| FreTS | 1118 | 9.09 |

Finally we report the updated downstream application experiments in the following table, and update Fig.6:

| Method | Air quality prediction (RMSE) | Mortality forecast (AUC) |
| --- | --- | --- |
| LaST | 0.64 | 0.80 |
| FreTS | 0.65 | 0.80 |
| FGTI | 0.59 | 0.86 |

We can find that our FGTI still outperforms all baselines due to the guidance of high-frequency information and dominant-frequency information, as well as the generating ability of the diffusion model.

Comment

3. (W3) Include additional metrics such as CRPS

Thank you for the thoughtful comments and valuable suggestions. Since CRPS is used to evaluate probabilistic imputation methods, we report the CRPS for all probabilistic imputation baselines according to the experimental settings in CSDI [43] and PriSTI [26]. First, we report the CRPS performance with different missing rates in the following table:

| Dataset | Miss. Rate | MIWAE | GPVAE | TimeCIB | GAIN | CSDI | SSSD | PriSTI | FGTI |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| KDD | 10% | 0.524 | 0.443 | 0.466 | 0.709 | 0.224 | 0.352 | 0.232 | 0.158 |
| KDD | 20% | 0.526 | 0.445 | 0.467 | 0.718 | 0.245 | 0.370 | 0.248 | 0.170 |
| KDD | 30% | 0.532 | 0.447 | 0.469 | 0.729 | 0.259 | 0.374 | 0.268 | 0.186 |
| KDD | 40% | 0.530 | 0.457 | 0.471 | 0.746 | 0.278 | 0.401 | 0.301 | 0.216 |
| Guang. | 10% | 0.312 | 0.333 | 0.360 | 0.692 | 0.265 | 0.316 | 0.209 | 0.155 |
| Guang. | 20% | 0.312 | 0.330 | 0.357 | 0.694 | 0.277 | 0.299 | 0.244 | 0.168 |
| Guang. | 30% | 0.312 | 0.335 | 0.356 | 0.695 | 0.292 | 0.353 | 0.310 | 0.193 |
| Guang. | 40% | 0.312 | 0.333 | 0.358 | 0.697 | 0.324 | 0.382 | 0.362 | 0.243 |
| Phy. | 10% | 0.689 | 0.659 | 0.466 | 0.739 | 0.544 | 0.617 | 0.444 | 0.343 |
| Phy. | 20% | 0.717 | 0.665 | 0.467 | 0.761 | 0.589 | 0.665 | 0.457 | 0.369 |
| Phy. | 30% | 0.750 | 0.674 | 0.469 | 0.787 | 0.627 | 0.630 | 0.467 | 0.389 |
| Phy. | 40% | 0.779 | 0.680 | 0.471 | 0.814 | 0.671 | 0.676 | 0.491 | 0.441 |

Then we report the CRPS by varying the missing mechanism with 10% missing values in the following table:

| Dataset | Miss. Mech. | MIWAE | GPVAE | TimeCIB | GAIN | CSDI | SSSD | PriSTI | FGTI |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| KDD | MCAR | 0.524 | 0.443 | 0.466 | 0.709 | 0.224 | 0.352 | 0.232 | 0.158 |
| KDD | MAR | 0.539 | 0.437 | 0.470 | 0.710 | 0.229 | 0.489 | 0.239 | 0.164 |
| KDD | MNAR | 0.615 | 0.435 | 0.490 | 0.715 | 0.244 | 0.456 | 0.252 | 0.174 |
| Guang. | MCAR | 0.312 | 0.333 | 0.360 | 0.692 | 0.265 | 0.316 | 0.209 | 0.155 |
| Guang. | MAR | 0.258 | 0.334 | 0.298 | 0.692 | 0.252 | 0.367 | 0.208 | 0.148 |
| Guang. | MNAR | 0.241 | 0.340 | 0.294 | 0.693 | 0.251 | 0.267 | 0.210 | 0.144 |
| Phy. | MCAR | 0.689 | 0.659 | 0.466 | 0.739 | 0.544 | 0.617 | 0.444 | 0.343 |
| Phy. | MAR | 0.679 | 0.660 | 0.593 | 0.739 | 0.550 | 0.724 | 0.454 | 0.356 |
| Phy. | MNAR | 0.726 | 0.655 | 0.608 | 0.743 | 0.566 | 0.715 | 0.476 | 0.366 |

Combining the results in Table 1 and Fig. 4, Fig. 7 and Fig. 8, it can be found that the variations of CRPS are basically consistent with those of RMSE and MAE for a specific model under different settings. Thank you again for the valuable suggestions. We will add this part of the experiments to the Appendix.
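For completeness, CRPS for a probabilistic imputer is commonly estimated from posterior samples with the identity CRPS = E|X - y| - 0.5 E|X - X'|; a generic sample-based estimator (not necessarily the exact normalization used in the paper) is:

```python
import numpy as np

def crps_from_samples(samples, target):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|.

    samples: (n_samples, ...) draws from the model's imputation distribution.
    target:  ground-truth values broadcastable to the trailing shape of `samples`.
    """
    term1 = np.mean(np.abs(samples - target), axis=0)
    term2 = 0.5 * np.mean(
        np.abs(samples[:, None, ...] - samples[None, :, ...]), axis=(0, 1)
    )
    return float(np.mean(term1 - term2))
```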

Author Response

Thank you for the thoughtful comments and valuable suggestions. Below, we provide our response to the questions and concerns.

1. (W1) Why large amplitude frequency components guide trend and seasonal term imputation

Thank you for your comments. We recognise that our current explanation is insufficient. In STL decomposition, the dominant frequency components (with large amplitude) typically correspond to the trend and seasonal terms, while the high-frequency information is often associated with the residual term [a]. This is because the trend captures long-term movements, and the seasonal component captures the major periodic patterns. We also add empirical evidence on the pre-decomposed KDD dataset in the following table:

| Component | Trend RMSE | Trend MAE | Seasonal RMSE | Seasonal MAE | Residual RMSE | Residual MAE |
| --- | --- | --- | --- | --- | --- | --- |
| w/o Frequency condition | 0.048 | 0.0155 | 0.0572 | 0.0364 | 0.5132 | 0.2975 |
| w/o Dominant-frequency filter (preserving high-frequency information) | 0.0482 | 0.0157 | 0.0533 | 0.0334 | 0.4956 | 0.2814 |
| w/o High-frequency filter (preserving dominant-frequency information) | 0.0409 | 0.0143 | 0.0485 | 0.0301 | 0.5129 | 0.2912 |
| FGTI | 0.0448 | 0.0159 | 0.0523 | 0.0325 | 0.5068 | 0.2885 |

This suggests that the Trend term mainly corresponds to the dominant-frequency information; the Seasonal term also mainly corresponds to the dominant-frequency information but contains some high-frequency information as well; and the Residual term mainly corresponds to the high-frequency information. Since we choose a Transformer as the encoder and utilize cross-attention as the fusion mechanism for the two types of frequency-domain information, our method can adaptively adjust the weights of the high-frequency and dominant-frequency information according to the characteristics of the three terms in different datasets.

We will add the above explanation to Section 1 and Section 3.1.2.

2. (W2) Discuss existing frequency domain research for time series tasks

Thank you for your valuable suggestions for improving the quality of the manuscript. Our current Related Work section mainly focuses on state-of-the-art time series imputation methods. For time series imputation methods in the frequency domain, mvLSWimpute [b] utilizes wavelet transforms to guide imputation, and APDNet [c] uses Fourier Temporal and Fourier Variable Interaction modules to model dependencies. In addition, the frequency-domain time series forecasting methods FEDformer [d] and FreTS [e] can also be applied to the imputation task. However, these methods did not consider how to use frequency-domain information to accurately model the residual term of the missing data, which is critical for boosting the overall imputation performance. In contrast, our FGTI captures high-frequency and dominant-frequency information to model the residual term more accurately, while also helping to describe the trend and seasonal terms.

We will add a discussion of frequency-domain imputation methods to the Related Work section.

Author Response

We sincerely thank all the reviewers for their thoughtful and constructive comments. We are greatly encouraged that they found our idea and contributions to be significant (Reviewers WWjA, qcDA, Q2t8 and SUMs) and technically sound (Reviewers WWjA, qcDA, SUMs). We are grateful that they identified our method as effective (Reviewers WWjA, qcDA, Q2t8 and SUMs) and our paper as well-written (Reviewers WWjA, qcDA, Q2t8). However, there are still several questions and concerns raised in the reviews that need to be addressed. Meanwhile, we have also revised the paper and the appendix according to the reviewers' valuable suggestions. The main changes are as follows:

  • We add several competitive imputation methods (i.e. LaST, FreTS, TimeCIB) recommended by reviewer WWjA and qcDA in our experiments.
  • Thanks to the comments from Reviewers WWjA and SUMs, we add a section with experimental results for the CRPS metric.
  • We also add a case study exploring the role of high-frequency condition and dominant-frequency condition in the Appendix, relying on the comments of Reviewers WWjA and qcDA.
  • We provide the experiment analysis of the mask ratio and mask pattern in the Appendix as recommended by Reviewer qcDA.
  • We check the full manuscript thoroughly and improve the presentation and grammar as recommended by Reviewer qcDA and Q2t8.
  • As shown in the attached pdf file, we update Fig. 1 according to Reviewer qcDA's suggestion and add a figure presenting the detailed architecture of the denoising network according to Reviewer Q2t8's comment.

Best,

Authors

Final Decision

This paper proposes a method called Frequency-aware Generative Models for Multivariate Time Series Imputation (FGTI). The approach aims to more carefully deal with high-frequency residual components when performing time series imputation tasks. The paper has uniformly positive reviews. The reviewers indicate that the paper is generally well-written and well-motivated. The experiments are extensive with a large and representative set of prior approaches and strong results. Some additional baselines and ablations were suggested and the authors provided them during the rebuttal phase. The reviewers who checked in during the author response indicated that their concerns were addressed by the author response.