PaperHub
Rating: 5.3 / 10 · Poster · 4 reviewers (min 4, max 7, std 1.1)
Individual scores: 7, 5, 5, 4
Confidence: 4.0 · Correctness: 3.0 · Contribution: 3.3 · Presentation: 3.0
NeurIPS 2024

Addressing Spatial-Temporal Heterogeneity: General Mixed Time Series Analysis via Latent Continuity Recovery and Alignment

OpenReview · PDF
Submitted: 2024-05-14 · Updated: 2024-12-24
TL;DR

A mixed time series analysis framework, which recovers and aligns latent continuity of mixed variables for complete and reliable spatial-temporal modeling, being amenable to various analysis tasks.

Abstract

Keywords
Mixed Time Series · General Time Series Modeling · Spatial-Temporal Heterogeneity · Latent Continuity Recovery · Adversarial Learning

Reviews and Discussion

Review
7

The authors propose to model mixed time series with both continuous variables (CVs) and discrete variables (DVs) by constructing latent continuous variables (LCVs) from the DVs. Several self-supervised learning constraints are proposed to improve the effectiveness of the LCVs, as well as the co-learning of LCVs and CVs with both cross-attention and self-attention modules.

Strengths

  1. Clarity in paper-writing. Both figures and languages are easy to understand and well-polished.
  2. Originality. The idea is novel, easy to follow, and most importantly, it makes sense.
  3. Quality. The paper is conducted on various datasets for various tasks to support its claim as a general model.
  4. Code is released and sufficient training details are provided.

Weaknesses

  1. The paper lacks evidence to support the effectiveness of the proposed self-supervised objective functions. There is no theoretical or strong intuition behind them.

1.1 For example, why does the Temporal Adjacent Smoothness Constraint matter? I know from the experiments that it seems necessary, but intuitively and by common sense, some discrete variables could indeed have sudden changes. In your meteorology example, rainfall could indeed suddenly stop at a time point because the cloud moves away from where the measurement device is. How do you defend your model in cases where a DV is indeed subject to sudden changes by its nature? Moreover, for this constraint, do we have to use this specific formulation? Minimizing the difference between every two adjacent points could be too strong. How about K-Lipschitz continuity as a constraint, where K can be determined by the CVs? I would love to see some discussion on why you selected this formulation specifically.

1.2 In addition, the adversarial framework that discriminates between CV and LCV seems weak. There is no guarantee that it will work in theory, as GAN-like structures are unstable. Even if the optimization is stable, in what condition would a model think "I cannot discriminate between CV and LCV"? What specific properties are in CVs that LCV could learn to trick a discriminator? For facial recognition, we can tell if the object in an image is like a real human face or not, based on whether some details are distorted. For time series, how to interpret that? If at some time points, the time series are experiencing abnormal variations, that does not mean it is “fake”. What specific properties are you trying to look into when you design this loss function? If you just want the LCVs to reflect some of the variations in the CVs, you could design some simpler constraints just like your Temporal Adjacent Smoothness Constraint, correct? I would love to see some discussions.

  2. Some issues with presentation. The variable z first appears in Eq. 2 and again in Eq. 4, but it is never explained in the text. Usually, the audience is much more interested in error bars than in seeing both MAE and MSE, as the two metrics are extremely similar. Please adjust accordingly.

  3. Some issues with the experimental setting. The reconstruction loss is essential, but the key issue is: are the generated LCVs actually correct? Wouldn't it be wonderful if you could test on some datasets where some DVs are discretized from CVs, and try to recover the CVs and plot the results? If the accuracy of actually recovering the CVs is not guaranteed, we do not need to "recover" a continuous time series; we could instead obtain a latent embedding from the DV, which by intuition should also work well. Therefore, this experiment should be added as a demonstration that recovering LCVs is possible; it is a prerequisite to proving it is helpful.

Questions

(Repeats Weaknesses 1.1 and 1.2 above.)

Limitations

I do not see limitations discussed.

Author Response

We deeply appreciate Reviewer nZaK's positive acknowledgment of our work's originality, clarity, and quality. We are especially grateful for the detailed and insightful feedback provided. Rest assured, we are dedicated to addressing your concerns and enhancing our work.

W1.1&Q1: Smoothness constraints and K-Lipschitz continuity

Thanks for your insightful comments. These are all great points. We would like to address them with the following aspects:

  • Purpose of the Smoothness Constraint: Considering that an ideal continuous variable should be equipped with interpretable autocorrelation or smooth variations, we design $\mathcal{L}_{\mathrm{Smooth}}$ to promote the recovered LCVs to achieve this. Also, we implement $\mathcal{L}_{\mathrm{Smooth}}$ with an adjustable coefficient $\lambda_1$. If a DV is known to commonly undergo sudden changes, $\lambda_1$ can be reduced accordingly.
  • Collaboration and Mutual Restraint with Other Losses: Essentially, the smoothness loss works synergistically with other losses, such as the reconstruction and task losses. While the smoothness loss promotes smooth changes, it is balanced by the other losses to prevent over-smoothing, which could harm reconstruction and downstream-task performance. That is to say, our model can adaptively learn a proper degree of smoothness. We elaborate on this point in Section 3.5 on Page 6 of our paper, which may be of interest to you. Also, the sensitivity analysis of $\lambda_1$ presented in Figure 17 on Page 22 verifies its robustness.
  • K-Lipschitz Continuity as a Constraint: We greatly appreciate your invaluable suggestion of using K-Lipschitz continuity, which is a wonderful idea that allows us to use the CVs as guidance for determining the degree of smoothness. We can formulate it as

$$\mathcal{L}_{\mathrm{smooth}}^{K\text{-Lipschitz}} = \sum_{t=1}^{T-1} \max\left( 0,\ \left| x_{t}^{LCV}-x_{t+1}^{LCV} \right|-K \right)$$

This term penalizes consecutive points whose difference exceeds the bound $K$, which is determined by the CVs. Empirically, we conducted experiments on the ETTh1 and ETTh2 datasets to verify its effectiveness, as shown in Table 4 in the global response PDF, which shows performance comparable to our original formulation. In light of your suggestions, we promise to add the relevant results and discussions to our paper.
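As a rough illustration of the hinge-style penalty discussed in this reply, here is a minimal numpy sketch; the function name and toy data are ours, not from the paper:

```python
import numpy as np

def k_lipschitz_smoothness(x_lcv, k):
    """Hinge penalty on adjacent-step jumps of a recovered LCV that
    exceed the bound K (a sketch of the loss form in this reply)."""
    diffs = np.abs(np.diff(x_lcv))           # |x_t - x_{t+1}| for t = 1..T-1
    return np.maximum(0.0, diffs - k).sum()  # penalize only jumps beyond K

# A smooth series stays within the bound, so it incurs no penalty;
# here each step of the sine wave is well below K = 0.1.
smooth = np.sin(np.linspace(0, 2 * np.pi, 100))
print(k_lipschitz_smoothness(smooth, k=0.1))  # → 0.0
```

A larger K tolerates bigger jumps, which is how the CVs could set the admissible smoothness level.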

W1.2&Q2: About the adversarial discrimination framework

Thank you for raising these excellent points. We are glad to address your concerns as follows:

  • Discriminating Features in Time Series: For the recovered LCVs, we believe that if they present temporal variation features (e.g., autocorrelation, periodicity) and statistical features (e.g., distributions) similar to those of the CVs, they can trick the discriminator. We have also provided visualization evidence in Figure 1 in the global response PDF, showing that the LCVs accurately recover the actual temporal variations and distributions.

  • Learning to Align Distributions: Actually, due to the difference in information granularity and distributions between CVs and DVs, directly modeling mixed variables would inevitably cause errors. The objective of our adversarial framework is to align the distributions of LCVs and CVs, akin to Domain Adaptation [1]. This alignment facilitates modeling correlations across DVs (represented by LCVs) and CVs, which is crucial for downstream tasks.

  • About Anomaly Samples: We agree that time series may experience abnormal variations, specifically in anomaly detection tasks. We would like to clarify that during the training phase, we focus on leveraging normal samples to capture the typical distributions and temporal variations, in line with previous works [2]. In the testing phase, when anomalies may occur, the discriminator is discarded and is irrelevant to the LCV recovery process.

  • Training Stability: We ensure overall training stability by incorporating other supervision signals, e.g., the smoothness loss and task loss, whose synergy and mutual restraint are discussed in Section 3.5 on Page 6.
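To make the distribution-alignment objective in this reply more concrete, here is a generic GAN-style sketch in numpy. It is our illustration of the idea (binary cross-entropy with CVs scored as "real" and LCVs as "fake"), not the paper's exact formulation:

```python
import numpy as np

def discriminator_loss(d_cv, d_lcv):
    """Binary cross-entropy for a discriminator that scores CV windows
    as real (1) and recovered-LCV windows as fake (0); d_* are sigmoid
    outputs in (0, 1). Generic GAN-style sketch, not the paper's loss."""
    eps = 1e-12
    return float(-(np.log(d_cv + eps).mean() + np.log(1 - d_lcv + eps).mean()))

def generator_alignment_loss(d_lcv):
    """The recovery network is rewarded when the discriminator scores
    its LCVs as real, pushing LCV and CV distributions to align."""
    eps = 1e-12
    return float(-np.log(d_lcv + eps).mean())

# At the alignment optimum the discriminator outputs 0.5 everywhere,
# i.e., it "cannot discriminate between CV and LCV".
d = np.full(8, 0.5)
print(round(discriminator_loss(d, d), 4))  # → 1.3863 (= 2 ln 2)
```

The 2 ln 2 value is the classic GAN equilibrium: a discriminator at chance level signals that the two distributions are indistinguishable to it.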

W2: Presentation issues

Thanks for your scientific rigor. Actually, we explain $z$ in lines 194~195 of our paper, and we will further emphasize it. Also, in light of your suggestion, the error bars of MiTSformer will be included in our paper. You can also refer to Table 4 in the global response PDF for relevant results.

W3: The effectiveness of LCV recovery

Thank you for the opportunity to clarify these points and improve our paper. We would like to address your concerns with the following aspects:

  1. Visualization Evidence: As suggested, we have visualized the recovered LCVs and the actual LCVs, together with a quantitative deviation analysis. Both results are included in Figure 1 and Table 1 in the global response PDF, showing that MiTSformer achieves accurate recovery of LCVs for DVs, re-establishing their latent fine-grained and informative temporal variations.
  2. Necessity of LCV Recovery: By recovering LCVs, we reduce the information-granularity gap between DVs and CVs and align their distributions, enabling us to flexibly and reliably model correlations across CVs and DVs for various tasks.
  3. Directly Modeling DVs: We conducted ablation experiments (Row 5 in Table 4 on Page 9) where we directly fused the embeddings of CVs and DVs without LCV recovery. As presented, the performance in both forecasting (MAE: $0.433 \rightarrow 0.529$) and classification (Acc: $71.9 \rightarrow 69.3$) decreased significantly, demonstrating the necessity and effectiveness of LCV recovery.

Reference:

[1]. Unsupervised domain adaptation by backpropagation. ICML, 2015.

[2]. Anomaly transformer: Time series anomaly detection with association discrepancy. ICLR, 2022.

Comment

I appreciate the authors' detailed response. 40% of my concerns are addressed. I increase the score to account for the overall low grading of NeurIPS submissions.

About smoothness constraints: I was not implying that you must implement K-Lipschitz continuity as an alternative, but it is okay; I still appreciate the efforts. You still have not replied as to how your formulation addresses time series with sudden changes. I think you will need to highlight this in your paper! In any case, highlighting a work's limitations will only be rewarded, not penalized. Please feel free to discuss these in your paper. It means that, at least on most time series that are less bumpy, your method will work well.

About the discrimination: I can accept your idea of "similar temporal variation features". But this is very vague and not intuitive. It would be great if you could show examples of these in real data, if the paper gets accepted. It will greatly help the audience understand why it is important and makes sense.

About LCV and CV: Still, I would recommend "Wouldn't it be wonderful if you could test on some datasets where some DVs are discretized from CVs, and you can try to recover the CVs and plot the results?" This will fully dispel my concerns.

Comment

Dear Reviewer nZaK:

We are thrilled to hear that our responses have made a positive impact on the paper. We sincerely apologize for any misunderstanding that may have occurred regarding your question and we appreciate your patience and time in rephrasing these questions. Rest assured, we are delighted to resolve your remaining concerns:

A1. About Smoothness Constraints:

Thanks for your detailed and constructive feedback. We promise to highlight the issues of addressing DVs with inherent sudden changes in our revised paper. Here we would like to address your concerns as follows:

  • Handling DVs with Inherent Sudden Changes: Our strategy for managing DVs with inherent sudden changes involves two aspects:

    1. Adjusting the Weight of the Smoothness Loss. We implement the smoothness loss with an adjustable coefficient $\lambda_1$ as $\lambda_1 \mathcal{L}_{\mathrm{Smooth}}$, allowing for flexibility based on the specific characteristics of the DVs. If a DV is known to undergo inherent sudden changes, we can set a relatively small $\lambda_1$ accordingly to reduce the potential impact.
    2. Balancing the Smoothness Loss with Restraint from Other Losses. While the smoothness loss promotes smooth changes, it is restrained by the other losses, which prevents the LCVs from becoming smoother than their actual nature. Over-smoothing DVs with inherent sudden changes would detrimentally affect other critical loss objectives, such as failing to accurately reconstruct the original DVs (i.e., impacting the reconstruction loss $\mathcal{L}_{\mathrm{Rec}}$ in Equation 5) and harming downstream tasks (i.e., impacting the task loss $\mathcal{L}_{\mathrm{Task}}$, such as classification accuracy). Thereby, these loss terms can balance and constrain $\mathcal{L}_{\mathrm{Smooth}}$ to some extent when encountering DVs with inherent sudden changes.
  • Limitations: We acknowledge that our framework, despite being applicable in most time series scenarios, may still have limitations, especially for some extreme cases that involve drastic sudden changes. As suggested, we will explicitly highlight the applicable scopes and limitations as follows:
    "The smoothness loss in our method is suitable for DVs with sudden changes that are caused by inherent smooth variations. However, it may not adequately account for DVs with inherent sudden changes that are essential characteristics of the dataset. For such cases, we can further leverage less restrictive constraints such as K-Lipschitz continuity to determine the proper smoothness level."

A2. About the Discrimination:

We are delighted that you agree with our ideas. We also greatly agree that real-data examples can help improve interpretability. We are keen to provide comparison showcases of the CVs and the LCVs recovered from DVs in our anonymous GitHub: https://anonymous.4open.science/r/MiTSformer/Visualization_showcases/Supplementary_Figures_for%20Reviewer_nZaK.pdf. You can also refer to Figure 1 in the global response PDF, which plots the recovered LCVs and the actual LCVs, showing that the recovered LCVs are indeed equipped with key temporal variation features such as autocorrelations, trends, and periodic patterns, like those in the CVs.

A3. About LCV and CV:

Thanks for kindly raising this concern. Our original statements may have caused some misunderstanding. We would like to clarify that the results you requested are included in Figure 1 in the Global Response PDF (you can download it at the bottom of "Summary of Revisions and Global Response"). Specifically, the results are based on the ETTh1 and ETTh2 benchmarks [1][2], which are real-world electric transformer datasets. The original ETT datasets consist of purely continuous variables. To meet our mixed-variable setting, we processed these datasets by randomly selecting half of the variables and discretizing them into discrete variables (DVs). This operation also allows us to visualize the recovered LCVs against the actual LCVs. As depicted in the figures, our MiTSformer achieves accurate recovery of LCVs for DVs, re-establishing their actual fine-grained and informative temporal variations. Also, we conduct a quantitative error analysis in Table 1 in the Global Response PDF, further validating the accuracy and necessity of LCV recovery.

We hope these revised explanations have addressed your concerns. Please do not hesitate to contact us if we have not answered your question completely. In addition, we would like to express our gratitude for requesting clarification. Your feedback has been immensely valuable in enhancing the overall quality of our work!

Reference

[1]. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. AAAI, 2021.

[2]. Film: Frequency improved legendre memory model for long-term time series forecasting. NeurIPS, 2022.

Comment

Dear Reviewer nZaK:

We deeply appreciate the time and effort you have invested in reviewing our paper and providing such invaluable comments. We are also truly grateful for your timely and insightful feedback during the discussion period. In the days following your feedback, we have dedicated ourselves to thoroughly studying your comments. This reflection has significantly deepened our understanding of the critical issues you highlighted. In response, we are eager to offer additional discussions and clarifications to better illustrate our work and effectively address the concerns raised.

Supplement to A1: About Smoothness Constraints:

  • Purpose of the Smoothness Constraint: We design $\mathcal{L}_{\mathrm{Smooth}}$ to promote the recovered LCVs being equipped with interpretable autocorrelation or smoothly varying trend properties, which is suitable for DVs whose apparent sudden changes are potentially caused by inherent smooth variations. Intuitively, consider a sine wave $\sin(t)$ with a mean of 0 that is discretized at a threshold of 0, transforming it into a binary discrete time series:

$$x^{DV}_t = \begin{cases} 1 & \text{if } \sin(t) > 0 \\ 0 & \text{if } \sin(t) \leq 0 \end{cases}$$

Around the threshold $0$, even small variations flip $x^{DV}_t$ to the opposite state, while the underlying process $\sin(t)$ is smooth and continuous. Our smoothness constraint helps to recover this inherent continuous nature.

  • Limitations in Handling DVs with Inherent Sudden Changes: We agree that in some cases there are DVs with inherent sudden changes that are essential characteristics of the dataset. Our strategy for managing them involves adjusting the weight $\lambda_1$ of $\mathcal{L}_{\mathrm{Smooth}}$ and balancing $\mathcal{L}_{\mathrm{Smooth}}$ with restraint from the other losses, e.g., $\mathcal{L}_{\mathrm{Rec}}$ and $\mathcal{L}_{\mathrm{Task}}$, as discussed in our previous reply. Also, we acknowledge that such strategies may still have limitations, especially in some extreme cases that involve drastic sudden changes, and we are committed to discussing this issue in our revised paper.
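The sine-discretization intuition described in this reply can be reproduced with a few lines of numpy (the variable names are ours): the binary DV jumps abruptly at the zero crossings even though the underlying continuous signal changes only slightly per step.

```python
import numpy as np

# Discretize sin(t) at threshold 0, as in the reply: x^DV_t = 1 if
# sin(t) > 0 else 0, sampled on a uniform grid over two periods.
t = np.linspace(0, 4 * np.pi, 400)
cv = np.sin(t)                     # smooth underlying continuous signal
x_dv = (cv > 0).astype(int)        # binary DV with sudden jumps

jumps = int(np.abs(np.diff(x_dv)).sum())        # count of sudden DV changes
max_cv_step = float(np.abs(np.diff(cv)).max())  # largest step of the CV
print(jumps, round(max_cv_step, 3))
```

The DV exhibits several unit-size jumps while every adjacent-step change of the continuous signal is tiny, which is exactly the situation the smoothness constraint is meant to invert.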

Supplement to A2. About the Discrimination & A3. About LCV and CV:

  • Visualization Evidence: We would like to kindly remind you that the results you requested, i.e., visualization plots of the recovered LCVs and the actual LCVs, are included in Figure 1 in the Global Response PDF (at the bottom of "Summary of Revisions and Global Response"). For your convenience, we have also prepared an anonymous link: https://anonymous.4open.science/r/MiTSformer/Visualization_showcases/Supplementary_Figures_for%20Reviewer_nZaK.pdf, which plots the actual/recovered LCVs (Figure 1) and other observed CVs (Figure 2). The visualization suggests that the LCVs are accurately recovered and are indeed equipped with key temporal variation features such as autocorrelations, trends, and periodic patterns, like those in the CVs.
  • Necessity and Effectiveness of LCV Recovery: Due to the difference in information granularity and distributions between CVs and DVs, directly modeling mixed variables would inevitably cause errors. Thereby, in our work, we propose to recover the latent continuous variables (LCVs) behind DVs to reduce the information-granularity gap between CVs and DVs and align their distributions. The recovered LCVs further facilitate spatial-temporal correlation modeling among mixed variables and benefit various downstream tasks.
Comment

Dear Reviewer nZaK,

We sincerely value the time and effort you have dedicated to reviewing our manuscript and offering such detailed feedback. Additionally, we are immensely grateful for the constructive comments you provided during the discussion process.

As we continue to refine our paper during the rebuttal and discussion period, we wish to discuss a concern raised by Reviewer GdPs regarding our experimental design of adopting benchmark datasets to generate DVs. Reviewer GdPs noted: “I do not understand the necessity and benefit of converting some of them to DVs and then to LCVs. In this experimental setting, I think it is better to simply observe the relationship between CVs.”

We recognize that the concerns raised may stem from an initial lack of clarity in our presentation, which could have obscured the understanding of the rationale and effectiveness of our experimental setting. We are actively working to enhance our explanations to ensure the objectives and benefits of our experimental design are clearly communicated. While we have taken this feedback seriously and have carefully explained these concerns directly in a detailed response to Reviewer GdPs, including both qualitative and quantitative explanations, we also wish to ensure that the rationale behind our experimental setup is clear to all reviewers.

On the one hand, we would like to clarify that our research focuses on modeling mixed time series. Despite being a fundamental issue, this problem remains underexplored in academia, with a lack of specialized benchmark datasets. Therefore, we employed abundant benchmark time-series datasets and applied discretization to convert some CVs into DVs. We would like to emphasize that this discretization strictly follows our problem formulation and aligns with the generation mechanisms of real-world DVs. Also, the original CVs allow us to evaluate the accuracy of the recovered LCVs.

On the other hand, we have carried out experiments on a private thermal power plant dataset composed of CVs and natural DVs that are not modified from continuous signals. This experiment is a mixed time series extrinsic regression task to predict a target temperature value. Due to the lack of actual LCV labels, we cannot directly evaluate the accuracy of LCV recovery. Instead, we verified its necessity and effectiveness through the performance of the downstream task. We would like to share the experimental results below, showing that MiTSformer outperforms the baselines impressively thanks to effective LCV recovery.

| Metric | MiTSformer | iTransformer | PatchTST | Dlinear | TimesNet |
|--------|------------|--------------|----------|---------|----------|
| MAE    | 0.0597     | 0.0652       | 0.0638   | 0.0723  | 0.0668   |
| RMSE   | 0.0664     | 0.0773       | 0.0732   | 0.0892  | 0.0751   |

Apologies for taking up your time. We hope we have clarified our motivation and the rationale behind our setting. It is our honor to engage in such a fruitful discussion with a knowledgeable reviewer like yourself, which has been most beneficial. We would love to receive more insightful comments from you.

Best regards!

Review
5

  • This paper introduces a type of spatial-temporal heterogeneity caused by the gap between continuous variables (CVs) and discrete variables (DVs).
  • The authors introduce latent continuous variables to create a unified continuous numerical space for both CVs and DVs, with the aim of addressing the heterogeneity caused by the gap between these types of variables.
  • The Latent Continuity Recovery architecture is innovative.

Strengths

The introduction of latent continuous variables (LCVs) to bridge the gap between DVs and CVs is an innovative approach. The proposed method for latent continuity recovery through adaptive and hierarchical aggregation of multi-scale adjacent context information is a creative combination of existing ideas.

The code is open-sourced, clean, and well-organized, demonstrating a high standard of technical implementation.

The proposed solution addresses a fundamental issue in spatial-temporal modeling, making it highly relevant for various applications in fields such as precipitation, temperature, humidity, etc.

Weaknesses

  1. The presentation for the transformation process for LCVs can be improved for better clarity. A more explicit and detailed connection between the mathematical formulas and the LCV transformation process would strengthen the clarity and comprehensiveness of the paper.
  2. The formulation and application of the regularization term to encourage smoothness across time need to be better explained.
  3. The experiments are mostly comprehensive, but the analytical explanation of the latent variables could be strengthened. This analysis would be interesting to a broad audience.

Questions

  1. How does the proposed method scale to large datasets or real-time applications? What is the computational complexity of the proposed method?
  2. Could you give a more explicit and detailed connection between the mathematical formulas and the LCV transformation process?

Limitations

Yes

Author Response

We sincerely thank Reviewer ZwAv for the positive evaluation of our work's innovativeness, creativity, and technical quality. Your detailed and insightful comments have helped us improve our work substantially. We hope your concerns are addressed.

W1&Q3: Mathematical formulas and the LCV Recovery

Thanks for your kind and valuable suggestion. We are glad to provide a more detailed and explicit explanation:

  1. LCV Recovery Network: Our LCV recovery is based on a recovery network composed of residual dilated convolutional neural networks, which hierarchically aggregate multi-scale temporal context information to transform DVs into LCVs. The detailed network structure is provided in Appendix A.3 on Page 13.
  2. Mathematical Formulation: The recovery network receives a DV as input and outputs its LCV, as depicted in Figure 4 on Page 5. This process can be described as

$$x^{LC} = \mathrm{Rec\text{-}Net}(x^{D}) = h_{n}$$

where the residual dilated convolutional network is defined by the iterative process

$$h_{i} = \mathrm{Conv}^{d_{i}}(h_{i-1}) + h_{i-1}, \quad i = 1, 2, \ldots, n$$

In this formulation, $h_{i}$ is the output of the $i$-th residual block, and $\mathrm{Conv}^{d_{i}}$ denotes the convolution along the temporal axis that aggregates contextual information with dilation rate $d_{i}$. The final output $x^{LC}$ is the result after $n$ residual blocks. Additionally, we perform z-score normalization on each $x^{LC} \in \mathbb{R}^{1 \times T}$ to ensure training stability:

$$x^{LC} \leftarrow \frac{x^{LC}-\mathrm{Mean}(x^{LC})}{\mathrm{Std}(x^{LC})}$$

where $\mathrm{Mean}$ and $\mathrm{Std}$ denote the mean and the standard deviation along the time axis, respectively. According to your suggestion, we promise to include these detailed formulations in our submission to enhance clarity and comprehensiveness.
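For illustration, here is a minimal numpy sketch of the recovery process described in this reply (residual dilated convolutions followed by z-score normalization). The kernels here are fixed toy weights chosen by us, whereas the actual network learns them:

```python
import numpy as np

def dilated_conv1d(h, w, dilation):
    """'Same'-padded 1-D convolution along time with the given dilation:
    a minimal numpy stand-in for Conv^{d_i} (kernel size = len(w))."""
    k = len(w)
    pad = dilation * (k - 1) // 2
    hp = np.pad(h, pad)  # zero-pad both ends so output length matches input
    return np.array([
        sum(w[j] * hp[t + j * dilation] for j in range(k))
        for t in range(len(h))
    ])

def recovery_net(x_dv, weights, dilations):
    """Stack of residual dilated-conv blocks followed by z-score
    normalization, mirroring Rec-Net(x^D) = h_n in the reply."""
    h = x_dv.astype(float)
    for w, d in zip(weights, dilations):
        h = dilated_conv1d(h, w, d) + h   # h_i = Conv^{d_i}(h_{i-1}) + h_{i-1}
    return (h - h.mean()) / h.std()       # z-score along the time axis

x_dv = np.array([0, 0, 1, 1, 1, 0, 0, 1], dtype=float)
weights = [np.array([0.25, 0.5, 0.25])] * 2   # hypothetical smoothing kernels
x_lcv = recovery_net(x_dv, weights, dilations=[1, 2])
print(x_lcv.shape)  # → (8,)
```

Increasing the dilation rate per block widens the temporal receptive field exponentially, which is how multi-scale context gets aggregated hierarchically.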

W2: Better explaining the smoothness regularization term

We appreciate your constructive feedback. We would like to clarify this issue in detail:

Inspired by the temporal autocorrelation nature of continuous variables, we devise the smoothness constraint loss $\mathcal{L}_{\mathrm{smooth}}$ to regularize the LCV recovery process. Mathematically, we aim to enforce smoothness between adjacent points of the LCVs as

$$\sum_{t=1}^{T-1}{\mathrm{Abs}\left( x_{t+1}^{D}-x_{t}^{D}\right) \cdot \left(x_{t+1}^{LC}-x_{t}^{LC}\right)^2}$$

where $\mathrm{Abs}\left( x_{t+1}^{D}-x_{t}^{D} \right)$ denotes the absolute difference between two adjacent time points of a DV, whose value lies in $\{0, 1\}$ ($1$ indicates a sudden change and $0$ no sudden change). For computational efficiency, we introduce a constant-valued difference matrix $\boldsymbol{S}$ and achieve the above objective by multiplying it with $x^D$ and $x^{LC}$:

$$\mathcal{L}_{\mathrm{smooth}}=\left\| \mathrm{Abs}\left( \boldsymbol{S}x^D \right) \otimes \left( \boldsymbol{S}x^{LC} \right) \right\|_{2}^{2}$$

where $\otimes$ denotes the Hadamard product.
Thanks for highlighting this issue; we promise to better explain the smoothness regularization term in our paper.
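A small numpy sketch of the loss above, assuming a binary DV so that $\mathrm{Abs}(\boldsymbol{S}x^D)$ acts as a 0/1 gate (the function name and toy data are ours):

```python
import numpy as np

def smoothness_loss(x_dv, x_lcv):
    """L_smooth = || Abs(S x^D) ⊗ (S x^LC) ||_2^2 with S the first-order
    difference matrix. For a binary DV, Abs(S x^D) is 0/1, so squaring it
    changes nothing and this equals sum_t |Δx^D_t| (Δx^LC_t)^2."""
    T = len(x_dv)
    # S is (T-1) x T with (S x)_t = x_{t+1} - x_t
    S = np.eye(T - 1, T, k=1) - np.eye(T - 1, T)
    gate = np.abs(S @ x_dv)            # 1 where the DV jumps, else 0
    return float(np.sum((gate * (S @ x_lcv)) ** 2))

x_dv  = np.array([0., 0., 1., 1., 0.])
x_lcv = np.array([0.1, 0.2, 0.8, 0.9, 0.3])
# Only the two DV transitions contribute: (0.8-0.2)^2 + (0.3-0.9)^2
print(smoothness_loss(x_dv, x_lcv))
```

Note that the gate penalizes LCV jumps exactly at the DV's sudden-change points, pushing the recovered signal toward a smooth crossing of the discretization threshold there.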

W3: Strengthening the analytical explanation of LCVs

Thanks for your kind and valuable suggestion. For a better investigation of LCV recovery, we have provided both a visualization and a quantitative analysis of the recovered LCVs and the actual LCVs. You can refer to Figure 1 and Table 1 of the global response PDF for the results.

As presented, MiTSformer achieves accurate recovery of LCVs for DVs, re-establishing their continuous, fine-grained, and informative temporal variation patterns. By recovering LCVs, we reduce the information-granularity gap between DVs and CVs and align their distributions, which facilitates sufficient and balanced correlation modeling of mixed variables and further enhances performance on various tasks. We promise to add the relevant results and discussions to our paper to further validate the effectiveness and necessity of LCV recovery.

Q1: Scale with large datasets or in real-time applications

Thanks for your insightful questions. We would like to address your concerns from the following aspects.

  • Scaling to Large Datasets: First, MiTSformer exhibits great computational efficiency, as demonstrated in Appendix B on Page 20. Second, we can equip MiTSformer with efficient linear self-attention techniques, e.g., Flowformer [1], to further reduce the computational burden. Furthermore, we can leverage efficient pre-training and fine-tuning frameworks like LoRA [2] for scalability at deployment.
  • For Real-Time Applications: MiTSformer can process real-time mixed time series data in a similar way to most time series models. In real-time applications, well-trained MiTSformer can be deployed in online systems. When new data points arrive, they can be appended to the historical data to form a full sequence and then fed into MiTSformer to generate online predictions.

According to your feedback, we will supplement the above discussions in our paper.

Q2: Computational complexity of MiTSformer

Thanks for your invaluable feedback. We highly agree that computational complexity is a key issue for investigating MiTSformer's practicality, and we analyzed it in terms of training time and memory cost in Appendix B on Page 20. In general, MiTSformer maintains great performance and efficiency, especially in training time. Enlightened by your feedback, we promise to further emphasize the computational complexity analysis in the main text of our paper.

Reference:

[1]. Flowformer: Linearizing transformers with conservation flows. ICML, 2022.

[2]. LoRA: Low-Rank Adaptation of Large Language Models. ICLR, 2022.

Comment

Dear Reviewer ZwAv:

Thanks sincerely for your thorough review and the invaluable feedback you have provided on our paper. Over the past few days, we have been diligently studying and reflecting upon this feedback, gaining a deeper understanding of the issues raised. We recognize that your schedule may be quite demanding, and it seems we have missed the opportunity for a direct discussion. Despite this, we have compiled additional explanations and discussions in response to the issues highlighted in your review, aiming to better articulate our work and address any concerns.

Supplement to W3: Strengthening the analytical explanation of LCVs

Our work focuses on modeling mixed time series that are composed of both continuous variables (CVs) and discrete variables (DVs). Due to the difference in information granularity and distributions between CVs and DVs, directly modeling mixed variables would inevitably cause errors. Thereby, in our work, we focus on recovering the latent continuous variables (LCVs) behind DVs to reduce the information granularity gap between CVs and DVs and align the distributions. The recovered LCVs can further facilitate the spatial-temporal correlation modeling among mixed variables, and benefit various downstream tasks.

Supplement to Q2: Computational complexity of MiTSformer

We analyzed the computational effort of MiTSformer regarding memory footprint, training time, and task performance (MAE) in Appendix B on Page 20 of our paper. For your convenience, we present the full computational efficiency analysis results on the ETTh1 and Electricity datasets below.

| ETTh1 | MiTSformer | iTransformer | ModernTCN | TimesNet | PatchTST | Crossformer | MICN | LightT | Dlinear | FiLM | FEDformer | Pyraformer |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MAE | 0.442 | 0.460 | 0.445 | 0.473 | 0.459 | 0.443 | 0.498 | 0.496 | 0.444 | 0.514 | 0.469 | 0.567 |
| Time costs (s/iter) | 0.065 | 0.061 | 0.062 | 0.375 | 0.063 | 0.112 | 0.117 | 0.055 | 0.047 | 0.089 | 0.282 | 0.068 |
| Memory (Mb) | 1397 | 1357 | 1395 | 8469 | 1409 | 2553 | 2419 | 1369 | 1331 | 1947 | 2531 | 1283 |

| Electricity | MiTSformer | iTransformer | ModernTCN | TimesNet | PatchTST | Crossformer | MICN | LightT | Dlinear | FiLM | FEDformer | Pyraformer |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MAE | 0.291 | 0.331 | 0.306 | 0.318 | 0.327 | 0.371 | 0.319 | 0.368 | 0.343 | 0.361 | 0.385 | 0.401 |
| Time costs (s/iter) | 1.837 | 2.348 | 2.346 | 3.213 | 2.497 | 7.516 | 2.343 | 2.279 | 2.498 | 7.298 | 7.327 | 2.2 |
| Memory (Mb) | 7503 | 2447 | 3035 | 7771 | 3813 | 23645 | 2925 | 1917 | 1801 | 1618 | 5303 | 16755 |

In general, we can observe that MiTSformer maintains great performance and efficiency compared with most baselines on datasets with a relatively small number of variables. On datasets with a relatively large number of variables (Electricity), MiTSformer occupies a relatively large memory footprint, while its training time remains efficient.

Review
5

The paper "Addressing Spatial-Temporal Heterogeneity: General Mixed Time Series Analysis via Latent Continuity Recovery and Alignment" introduces MiTSformer, a framework designed to address the challenges of mixed time series (MiTS) data, which include both continuous variables (CVs) and discrete variables (DVs). The framework recovers latent continuous variables (LCVs) behind DVs to ensure sufficient and balanced spatial-temporal modeling. MiTSformer employs hierarchical aggregation of temporal context and adversarial learning to align DVs with CVs. The framework is validated on five MiTS analysis tasks, showing state-of-the-art performance across multiple datasets.

Strengths

  1. Novel Approach: The introduction of latent continuous variables (LCVs) to handle the spatial-temporal heterogeneity in mixed time series is innovative and addresses a significant gap in current methodologies.
  2. Comprehensive Framework: MiTSformer is versatile, capable of handling various tasks such as classification, regression, anomaly detection, imputation, and long-term forecasting.
  3. Robust Performance: The framework demonstrates superior performance across a wide range of datasets and tasks, indicating its robustness and effectiveness.
  4. Detailed Analysis: The paper provides thorough empirical evaluations and ablation studies, which validate the effectiveness of the proposed method.

Weaknesses

  1. Complexity: The framework's complexity may pose implementation challenges for practitioners, potentially limiting its accessibility and usability.
  2. Data Dependency: The performance heavily relies on the quality and diversity of the training data, which may limit its applicability in scenarios with limited data availability.
  3. Scalability: While the method shows promising results, its scalability to very large datasets or real-time applications is not fully demonstrated.
  4. Limited Explanation of Hyperparameters: The paper could benefit from a more detailed explanation of the choice and tuning of hyperparameters, which is crucial for replication and practical application.

Questions

  1. How does MiTSformer handle highly non-linear temporal patterns in both CVs and DVs?
  2. Can the framework be extended to handle non-binary discrete variables, and if so, how?
  3. What are the computational requirements for training MiTSformer on large-scale datasets?
  4. How does the framework perform in real-time applications where data arrives sequentially?

Limitations

The paper presents a compelling and innovative approach to handling mixed time series data. However, the complexity of the method, data dependency, and scalability issues suggest that further development and validation are needed to ensure its broad applicability and practical utility.

Author Response

We deeply appreciate Reviewer 3NoE's positive acknowledgment of our methodology's innovation, comprehensiveness, effectiveness, and experimental robustness. We are especially grateful for the constructive and insightful feedback provided. Rest assured, we are dedicated to addressing your concerns.

W1: Implementation and usability

Thanks for kindly raising this concern and we would like to address it for you below:

  • Necessity of Each Module: We would like to emphasize that mixed time series modeling differs significantly from typical time series tasks due to variable heterogeneity. Thereby, each component in MiTSformer is essential for addressing it via latent continuity recovery and alignment. Our experiments, particularly the ablation studies displayed in Table 4 on Page 9, further justify this point.
  • User Accessibility: Like most deep time series models, MiTSformer can be trained and deployed in an end-to-end manner. Also, we have tried our best to make our framework accessible to practitioners by providing open-source, well-organized code and scripts with sufficient implementation details (Appendix A on Pages 13~19) to facilitate ease of use.

In light of your feedback, we will add relevant clarifications to our paper.

W2: Data dependency

You are right that the performance of MiTSformer, like most deep learning models, benefits from high-quality training data. However, we can empower MiTSformer's robustness under data limitations with advanced techniques such as Data Augmentation [1] and Self-supervised Pre-Training [2]. We promise to update these discussions in our paper.

W3: Scalability to very large datasets

This is an interesting question. We would like to elaborate on it with the following points:

  1. Computational Efficiency: As demonstrated in the computational efficiency analysis in Appendix B on Page 20, MiTSformer exhibits great efficiency concerning memory footprint and training time, which ensures feasibility for very large datasets.
  2. Fast Attention: MiTSformer adopts self- and cross-attention to model variable correlations, which is the primary source of computational load. When encountering datasets with a large number of variables, we can adopt efficient linear attentions, e.g., Flowformer [3] to further reduce the computational burdens.
  3. Efficient Pre-training and Fine-tuning: We can empower MiTSformer with efficient pre-training and fine-tuning frameworks like LoRA [4] to further enhance the scalability.

Enlightened by your feedback, we will include these discussions in our paper.

W4: Explanation of Hyperparameters

We highly agree that hyperparameters are a key issue for MiTSformer. Owing to page constraints, we provided the details of the hyperparameter settings in Appendix A on Pages 13-15, and a hyperparameter sensitivity analysis in Appendix C on Pages 20-22. In summary, we find that MiTSformer is relatively stable with respect to the selection of $d_{model}$ and $l_{layer}$. Also, MiTSformer is quite robust to the weights of the loss terms (i.e., $\lambda_1$, $\lambda_2$, and $\lambda_3$), and moderate weights (e.g., $0.3 \sim 1.0$) bring optimal performance. We also promise to further emphasize the hyperparameter analysis in the main text.

Q1: Handle non-linear patterns

MiTSformer effectively handles highly non-linear temporal patterns in both CVs and DVs via the spatial-temporal attention blocks depicted in Figure 5 on Page 6, where self-attention and cross-attention model the nonlinear spatial-temporal correlations within LCVs and CVs and across LCVs and CVs, respectively.

Q2: Handle non-binary discrete variables

This is an insightful and interesting question. A quick answer is "yes". Actually, DVs can take on multiple states ($\geq 2$) that reflect magnitude and can be directly input into the recovery network to obtain the LCV outputs. Note that more states in a DV imply richer information granularity. We chose the binary scenario in our experimental settings as it is the most challenging and the most commonly encountered in the real world.

Also, we conducted experiments on classification datasets with various numbers of discrete states in Table 2 of the global response PDF, showing that as the number of states increases, accuracy improves due to the richer information. The above results and discussions will be included in our paper.

Q3: Computational efficiency

We highly agree that computational complexity is a key issue for MiTSformer's practicality, and we analyzed it in terms of training time and memory cost in Appendix B on Page 20. In general, MiTSformer maintains great performance and efficiency. Enlightened by your feedback, we promise to further emphasize the analysis of computational efficiency in the main text.

Q4: Real-time applications

Thanks for your valuable question. MiTSformer can process sequentially arrived data in a similar way to most time series models. For example, well-trained MiTSformer can be deployed in real-time systems. When new data sequentially arrive, they are appended to the historical data to form a full sequence, which is then fed into the well-trained model to generate online predictions.
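This deployment pattern can be sketched generically as follows (our own simplified illustration, not code from the paper; `model` stands in for any trained predictor): sequential arrivals are buffered into a fixed-length window, and the full window is fed to the model on each new observation.

```python
from collections import deque

class OnlinePredictor:
    """Buffer sequentially arriving points into a fixed-length history
    window; once the window is full, feed it to a trained model on every
    new arrival to produce an online prediction."""

    def __init__(self, model, window_len):
        self.model = model                      # any callable: sequence -> prediction
        self.window = deque(maxlen=window_len)  # oldest points drop out automatically

    def update(self, new_point):
        self.window.append(new_point)
        if len(self.window) < self.window.maxlen:
            return None                         # warm-up: not enough history yet
        return self.model(list(self.window))

# Toy usage with a stand-in "model" that just sums the window:
predictor = OnlinePredictor(model=sum, window_len=3)
outputs = [predictor.update(x) for x in [1, 2, 3, 4]]   # [None, None, 6, 9]
```

The `deque(maxlen=...)` choice keeps the buffer at a constant size, so per-step cost stays bounded regardless of how long the stream runs.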

Reference

[1]. Self-supervised contrastive representation learning for semi-supervised time-series classification. IEEE TPAMI, 2023.

[2]. TimeSiam: A Pre-Training Framework for Siamese Time-Series Modeling. ICML, 2024.

[3]. Flowformer: Linearizing transformers with conservation flows. ICML, 2022.

[4]. LoRA: Low-Rank Adaptation of Large Language Models. ICLR, 2022.

Comment

Dear reviewer 3NoE:

We would like to express our gratitude to you for taking the time to review our paper and providing invaluable feedback. Over the past few days, we have been diligently studying and reflecting upon this feedback, gaining a deeper understanding of the issues raised. Although it appears that you may be occupied in this period and we were unable to engage in a discussion to address the concerns directly, we would like to provide additional discussions and results in response to the review questions raised to better illustrate our work and address the concerns.

Supplement to W2 - Data Dependency:

To enhance MiTSformer with better robustness against low-quality data, we can empower it with advanced techniques designed for tackling data limitations, including:

  1. Data Augmentation [1]: We can utilize data augmentation techniques to enhance the diversity of the training data, which helps improve model robustness.
  2. Efficient Pre-Training [2]: MiTSformer can be pre-trained on large, publicly available datasets with self-supervision and then fine-tuned on smaller datasets, ensuring good performance even with limited data.
  3. Domain Adaptation [3]: We can adjust MiTSformer to generalize better across different but related datasets, improving its robustness to varying data distributions.

These strategies are proven effective in handling data limitations and collectively ensure that MiTSformer remains effective and adaptable, even in data-constrained environments. Also, we promise to update the discussions in the final version.

Supplement to Q2: Handle non-binary discrete variables (DVs)

Our MiTSformer can effectively handle non-binary DVs, which can be directly input into the recovery network of MiTSformer to obtain the LCV outputs. Also, more states in a DV imply richer information granularity (with an infinite number of states representing a continuous variable). In our previous response, we provided preliminary experimental results in Table 2 of the global response PDF. Over the past few days, we have conducted further experiments to enrich these results and have observed significant improvements, particularly in anomaly detection tasks. Here we would like to share these results with you as follows:

  • Mixed Time Series Classification Results (Accuracy)

| Dataset | $N_{\mathrm{DVs}}=2$ | $N_{\mathrm{DVs}}=4$ |
| --- | --- | --- |
| EthanolConcentration | 30.4 | 30.4 |
| FaceDetection | 67.9 | 68.3 |
| Handwriting | 22.6 | 23.1 |
| Heartbeat | 74.6 | 74.1 |
| JapaneseVowels | 94.6 | 95.9 |
| PEMS-SF | 93.1 | 92.5 |
| SelfRegulationSCP1 | 91.1 | 92.8 |
| SelfRegulationSCP2 | 60.0 | 61.3 |
| SpokenArabicDigits | 98.5 | 98.7 |
| UWaveGestureLibrary | 86.3 | 85.9 |
| Average | 71.9 | 72.3 |
  • Mixed Time Series Anomaly Detection Results

| Dataset | Metric | $N_{\mathrm{DVs}}=2$ | $N_{\mathrm{DVs}}=4$ |
| --- | --- | --- | --- |
| SMD | Precision | 88.92 | 88.37 |
| SMD | Recall | 86.78 | 87.92 |
| SMD | F1-score | 87.84 | 88.14 |
| MSL | Precision | 90.66 | 90.81 |
| MSL | Recall | 80.54 | 81.22 |
| MSL | F1-score | 85.30 | 85.75 |
| SMAP | Precision | 96.71 | 96.82 |
| SMAP | Recall | 72.21 | 75.26 |
| SMAP | F1-score | 82.69 | 84.69 |
| SWaT | Precision | 96.31 | 94.33 |
| SWaT | Recall | 95.98 | 94.82 |
| SWaT | F1-score | 96.15 | 94.57 |
| PSM | Precision | 97.88 | 98.41 |
| PSM | Recall | 94.85 | 95.62 |
| PSM | F1-score | 96.83 | 96.99 |

The results show that as the number of discrete states $N_{\mathrm{DVs}}$ increases, the classification accuracy and anomaly detection performance improve due to the richer observable information.

Reference:

[1]. Self-supervised contrastive representation learning for semi-supervised time-series classification. IEEE TPAMI, 2023.

[2]. TimeSiam: A Pre-Training Framework for Siamese Time-Series Modeling. ICML, 2024.

[3]. CauDiTS: Causal Disentangled Domain Adaptation of Multivariate Time Series. ICML, 2024.

Comment

Thanks for your response, I will increase my rating

Comment

Dear Reviewer 3NoE,

We are immensely grateful for your positive recognition of our work and are deeply honored by your decision to increase the score. Guided by your suggestions, we have reflected on our work and have significantly deepened our understanding of the critical issues you highlighted, which have further improved the quality of our work.

We express our deepest gratitude for your time, effort, and support. Your constructive feedback has been immensely invaluable in enhancing the overall quality of our work. We are committed to incorporating all the discussions and experiment results into our revised paper.

Thank you once again for your support and thoughtful guidance! Best Regards!

Review
4

The MiTSformer framework includes Latent Continuity Recovery, which recovers latent continuous variables (LCVs) from discrete variables (DVs) using multi-scale temporal context and adversarial guidance, and Spatial-Temporal Attention Blocks, which capture dependencies within and across LCVs and continuous variables (CVs) through self- and cross-attention mechanisms. The paper explores general mixed time series analysis, addressing spatial-temporal heterogeneity. MiTSformer adapts to recover LCVs and capture spatial-temporal dependencies, demonstrating state-of-the-art performance across five tasks on 34 datasets.

Strengths

The paper introduces a fresh problem definition by addressing the heterogeneity between continuous variables (CVs) and discrete variables (DVs) in mixed time series analysis. The MiTSformer framework effectively recovers latent continuous variables (LCVs) from DVs and captures spatial-temporal dependencies, offering a balanced and comprehensive modeling approach. The framework is highly effective in handling various types of variables, providing clear insights into their relationships, which is invaluable for designing robust foundation models for time series.

Weaknesses

  1. The experimental setup relies on converting continuous variables (CVs) to discrete variables (DVs) for experimentation, rather than using datasets with naturally mixed CVs and DVs. This may introduce unnecessary complexity and potential information loss, weakening the paper's motivation.
  2. The complexity of the framework, which includes the recovery network and adversarial learning components, may pose challenges for practical implementation and scalability.
  3. The claim that the framework effectively recovers the inherent continuous nature of DVs and maintains temporal similarity through adversarial learning lacks robust verification and evidence.
  4. The improvements attributed to L_Dis are minimal, as shown in Table 4's ablation study, and further analytical reasons should be provided to support these findings.


Questions

The paper claims that using MiTSformer allows for freedom from the strict limitations of mixed naive Bayes models and variational inference previously used to match the distributions of continuous variables (CVs) and discrete variables (DVs). However, the reasons for this claim are not clearly explained. A detailed explanation is needed to justify why MiTSformer is less restrictive. Additionally, a performance comparison between MiTSformer and these traditional methods would be beneficial to substantiate its effectiveness.

Secondly, did you compare the latent continuous variables (LCVs) obtained from randomly sampled DVs in the forecasting tasks with the actual CV information? Such an analysis would be interesting and could provide deeper insights into the model's performance.

Limitations

One limitation mentioned is that the framework cannot be directly applied to categorical discrete variables. In practice, many time series datasets combine categorical data with time series data, such as event data and product sales demand. This limitation is significant since these types of datasets are common in real-world applications.

Author Response

We sincerely thank Reviewer GdPs for the favorable recognition of our work's problem setting, effectiveness, and invaluable contribution to designing robust time series foundation models. We deeply appreciate your detailed and perceptive feedback. Rest assured, we are committed to addressing your concerns and improving our work.

W1: Experimental settings of DVs

Thanks for raising this insightful point. Actually, in the real world, many DVs originate from latent CVs due to constraints such as measurement and storage limitations. We elaborate on this in Lines 47~58 and Figure 2 of our paper, which may interest you. Our conversion process simulates the generation process of DVs and discretizes variables while preserving their inherent coupling relationships and properties. This operation also allows us to compare the recovered LCVs with the actual ones, as investigated in our responses to W3&Q2. In light of your feedback, we will add the relevant clarifications to our paper.

W2: Practical implementation and scalability

We appreciate your valuable insights and are glad to address your concerns below.

  • Necessity: We would like to emphasize that mixed time series modeling differs largely from typical time series tasks due to variable heterogeneity. Each module in our MiTSformer is essential to address this problem. Specifically, the recovery network is a lightweight module but is indispensable for recovering LCVs. The ablation study displayed in Table 4 on Page 9 also justifies its effectiveness.
  • Computational Efficiency: We analyzed the computational effort in Appendix B on Page 20, demonstrating the great efficiency of MiTSformer. Additionally, we conducted an experiment on the recovery network based on ETTh1, showing that the computational costs brought by the recovery network are minimal, while the performance improvements are significant.

| | MiTSformer | w/o Recovery Network |
| --- | --- | --- |
| MAE | 0.442 | 0.626 |
| Time Costs (s/iter) | 0.065 | 0.060 |
| Memory Footprint (Mb) | 1397 | 1336 |
  • Pre-training for Real-world Applications: We can employ self-supervised pre-training for the backbone of MiTSformer, whose parameters can be fixed during deployment and we can fine-tune the task head if needed, ensuring both efficiency and scalability.

Thanks for the opportunity to clarify these issues and we will add them to our paper.

W3&Q2: Verification of LCV recovery

This is an invaluable and interesting suggestion. As suggested, we have included the relevant visualizations and quantitative evaluations in Figure 1 and Table 1 of the global response PDF, showing that MiTSformer achieves accurate recovery of LCVs for DVs, re-establishing their continuous and fine-grained temporal variations. In light of your suggestion, we will add the relevant results and discussions to our paper.

W4: Effectiveness of the discrimination loss

Thanks for your rigorous concerns. We design the variable modality discrimination loss $\mathcal{L}_{\mathrm{Dis}}$ to align the feature distributions of CVs and LCVs, facilitating their correlation modeling. The discrimination loss works collaboratively with the other loss terms, and we discussed their synergy in Section 3.5 on Page 6. The performance decrease without the discrimination loss is not substantial because the smoothness loss $\mathcal{L}_{\mathrm{Smooth}}$ and the reconstruction loss $\mathcal{L}_{\mathrm{Rec}}$ can promote the recovery of LCVs to some extent. However, incorporating $\mathcal{L}_{\mathrm{Dis}}$ provides a consistent performance gain across all datasets and tasks.
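For intuition, the adversarial alignment behind such a discrimination loss can be sketched generically as follows (a simplified NumPy illustration with hypothetical discriminator outputs, not the paper's exact formulation): a discriminator is trained to separate CV features (label 1) from LCV features (label 0), while the recovery side is trained adversarially to make LCV features indistinguishable from CV features.

```python
import numpy as np

def bce(probs, labels):
    """Binary cross-entropy on discriminator outputs."""
    eps = 1e-12
    probs = np.asarray(probs, float)
    labels = np.asarray(labels, float)
    return float(-np.mean(labels * np.log(probs + eps)
                          + (1 - labels) * np.log(1 - probs + eps)))

# Hypothetical discriminator outputs on two feature batches:
d_on_cv = np.array([0.9, 0.8])    # ideally close to 1 (true CV features)
d_on_lcv = np.array([0.2, 0.3])   # ideally close to 0 (recovered LCV features)

# Discriminator objective: tell CV (label 1) and LCV (label 0) features apart.
disc_loss = bce(np.concatenate([d_on_cv, d_on_lcv]),
                np.concatenate([np.ones(2), np.zeros(2)]))

# Adversarial (alignment) objective: make LCV features look like CV features,
# i.e. push the discriminator outputs on LCV features toward 1.
align_loss = bce(d_on_lcv, np.ones(2))
```

When the two objectives reach equilibrium, the discriminator can no longer separate the two feature distributions, which is the alignment effect the loss is meant to encourage.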

Q1: Comparison with mixed naive Bayes (NB) and variational inference (VI)-based methods

Thanks for your constructive feedback. We would like to resolve your concerns regarding the following aspects:

  • Theoretically, mixed NB and VI methods are typically designed for tabular data [1][2]. They struggle with time series, and often rely on certain assumptions like conditional independence [3], limiting their ability to model the correlations of DVs and CVs [2]. In contrast, MiTSformer leverages the temporal adjacencies to achieve latent continuity recovery, making it capable of handling time series data and effectively capturing inherent nonlinear correlations.

  • Empirically, we compared MiTSformer against the typical mixed NB-based model HVM [1] and the VI-based method VAMDA [2] on mixed time series classification datasets and summarized the results in Table 1 of the global response PDF, showing that MiTSformer consistently outperforms HVM and VAMDA. We promise to include these results and discussions in our paper.

About the limitations:

Thanks for highlighting this. The original expressions in Appendix D may have caused some misunderstanding. We would like to clarify that our framework is designed for DVs with numerical magnitude by effectively recovering their LCVs. Actually, categorical variables can be divided into those with numerical magnitude (e.g., ordinal categories) and those without numerical magnitude (i.e., nominal categories such as gender). Our framework is effective for the former, which is frequently encountered in real-world time series. For the event data and sales demand that you have mentioned, if categorical data represent varying levels of a quantity (e.g., sales volume), MiTSformer can also effectively handle it. We will revise the relevant clarifications to avoid any confusion.

Reference:

[1]. Hybrid variable monitoring: An unsupervised process monitoring framework with binary and continuous variables. Automatica, 2023.

[2]. A flexible probabilistic framework with concurrent analysis of continuous and categorical data. IEEE TII, 2023.

[3]. Naive Bayesian classification of structured data. Machine learning, 2004

Comment

The motivation for the proposed method is convincing. However, since the experimental dataset consists entirely of CVs, I do not understand the necessity and benefit of converting some of them to DVs and then to LCVs. In this experimental setting, I think it is better to simply observe the relationships between CVs. I am concerned about how much continuous and fine-grained temporal variation can be recovered when converting an actual DV to an LCV. Therefore, I still think that the experimental setting is not suitable for the proposed method. I will maintain my score.

Comment

Dear Reviewer GdPs:

Thanks sincerely for acknowledging the motivation of our work. We sincerely apologize for any misunderstanding that may have occurred regarding your question and we appreciate your patience and time in rephrasing these questions. Rest assured, we are glad to resolve your concerns.

Q1. The necessity and benefit of converting some of CVs to DVs:

Here we would like to further clarify our experimental settings for generating DVs from the following aspects:

  • Modeling Mixed Time Series: Different from existing methods that primarily focus on time series composed purely of continuous variables, our research focuses on modeling mixed time series composed of both continuous variables (CVs) and discrete variables (DVs), which are frequently encountered in real-world scenarios. Essentially, due to measurement limitations or storage requirements, many intrinsically continuous-valued signals are recorded in discrete-valued form as DVs. For example, in industrial sensing systems, some temperature variables are recorded as binary alarm signals based on whether they exceed the control limits.

  • Necessity and Benefits: Despite being a fundamental issue, modeling mixed time series remains underexplored in academia, with a lack of specialized benchmark datasets for mixed time series. To construct tasks for mixed time series, we employed benchmark time series datasets and applied discretization to convert some CVs into DVs. We would like to emphasize that such discretization strictly follows our problem formulation and aligns with the aforementioned generation mechanisms of DVs: it does not alter the intrinsic relationships of the variables but presents them in discrete form. Also, the original CVs selected for discretization can be regarded as the ground truth of the latent continuous variables (LCVs), which allows for comparisons between the actual LCVs and the recovered LCVs. Thereby, the discretized datasets can be viewed as benchmarks for mixed time series to facilitate future research.

  • Application to datasets with natural DVs: In addition, our model is also applicable to datasets with natural DVs. We are carrying out experiments on a real thermal power plant dataset that contains alarm discrete signals without the corresponding original continuous signals. Also, we compare the differences between these two ways as follows:

| Properties of datasets | DVs & CVs | Actual LCV labels | Ways to evaluate LCV recovery |
| --- | --- | --- | --- |
| Adopting benchmark datasets with discretization | Sufficient, covering various tasks and domains | √ | Direct (by comparing the recovered LCVs with the actual LCVs) |
| Adopting datasets with natural DVs | Limited and private | × | Indirect (via downstream task performance) |
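The alarm-style generation mechanism above can be sketched in a few lines (our own illustration; the threshold and readings are hypothetical): a CV is binarized against a control limit, while the original CV series is retained as the ground-truth LCV for evaluating recovery.

```python
import numpy as np

def discretize_to_alarm(cv, limit):
    """Binary alarm DV: 1 when the continuous value exceeds the control
    limit, else 0. The original `cv` series can be kept as the ground-truth
    latent continuous variable (LCV) for evaluating recovery."""
    return (np.asarray(cv, dtype=float) > limit).astype(int)

temps = [58.0, 61.5, 63.2, 59.4, 65.1]       # hypothetical temperature readings
dv = discretize_to_alarm(temps, limit=60.0)  # alarm fires on the 2nd, 3rd, and 5th points
```

Because the mapping is deterministic given the limit, the discretized benchmark preserves the variables' coupling relationships while discarding only within-regime magnitude, which is exactly the information the recovery network is asked to re-establish.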

Owing to time limitations, we would like to explain the following questions in the current reply first and we are committed to including the relevant results in our revised paper.

Q2. The necessity of recovering LCVs from DVs & About simply observe the relationship between CVs

In our experimental setting, the LCVs are unobservable; we can only observe the DVs and the other CVs. Thereby, we cannot directly observe the relationships between CVs and LCVs. Also, due to the differences in information granularity and distribution between CVs and DVs, directly modeling the mixed variables would inevitably cause errors. Thereby, in our work, we focus on recovering the LCVs of the DVs to reduce the information-granularity gap and align the distributions, which facilitates spatial-temporal correlation modeling among mixed variables and benefits various tasks.

Q3. How much continuous and fine-grained temporal variations can be recovered when recovering LCVs for DVs:

We have included showcases based on the ETTh1 and ETTh2 datasets to visualize (i) the observed DVs, (ii) the recovered LCVs, and (iii) the actual LCVs (i.e., the DVs before discretization) in Figure 1 of the Global Response PDF. These visualizations show that our method accurately re-establishes the continuous, fine-grained, and informative temporal variation patterns of the actual LCVs for the observed DVs. Specifically, the recovered LCVs are indeed equipped with key temporal variation features such as autocorrelations, trends, and periodic patterns, like those in CVs. For your convenience, we have also prepared an anonymous link (https://anonymous.4open.science/r/MiTSformer/Visualization_showcases/Supplementary_Figures_for_Reviewer_GdPs.pdf), which contains the visualization plots of the recovered/actual LCVs and the observed CVs.

We hope these revised explanations have addressed your concerns. Please do not hesitate to contact us if we have not answered your question completely. In addition, we would like to express our gratitude for requesting clarification. Your feedback has been immensely valuable in enhancing the overall quality of our work!

Comment

Dear Reviewer GdPs:

We would like to express our gratitude to you for actively engaging in discussions with us! Over the past two days, we have been diligently studying your feedback to gain a deeper understanding of the concerns raised, especially the concerns regarding our experimental setting. As mentioned in our previous reply, we have been carrying out experiments on our private thermal power plant dataset to verify the effectiveness of MiTSformer's application to real-world datasets with natural DVs. Here we would like to share the relevant results and discussions as follows.

I. Dataset Description:

This dataset is collected from a real-world coal mill in a thermal power plant (TPP). It comprises 3 DVs (Variables No. 1-3) and 7 CVs (Variables No. 4-10). The variables are summarized below.

| Variable No. | Variable Type | Variable Description | Unit |
|---|---|---|---|
| 1 | DV | Motor bearing temperature alarm | / |
| 2 | DV | Outlet temperature of the coal mill alarm | / |
| 3 | DV | Rotary separator bearing temperature alarm | / |
| 4 | CV | Ambient temperature | °C |
| 5 | CV | Sealed air pressure | kPa |
| 6 | CV | Outlet pressure of the coal mill | kPa |
| 7 | CV | Inlet air pressure | kPa |
| 8 | CV | Coal mill current | A |
| 9 | CV | Coal feed rate | t/h |
| 10 | CV | Motor coil temperature | °C |

In this dataset, the DVs are naturally formed rather than derived by discretizing continuous signals. Such natural DVs provide a more realistic mixed time series case that preserves the inherent relationships among variables. At the same time, this means there are no ground-truth labels for the latent continuous variables (LCVs) behind the DVs, since the original signals were never recorded or stored. We would like to emphasize that such scenarios are very common in the real world, where only coarse-grained DVs are stored instead of their fine-grained LCVs due to storage limitations or operational preferences. This widespread phenomenon further motivates our effort to recover the unobservable LCVs. In this setting, however, the lack of actual LCVs precludes direct evaluation of LCV recovery accuracy. This is one of the main reasons we use benchmark datasets with discretization in the paper's experiments: they allow us to directly assess the quality of the recovered LCVs by comparing them against the actual ones.

For better readability, we summarize once more the differences between (i) adopting benchmark datasets with discretization and (ii) adopting datasets with natural DVs (used in the experimental setting here):

| Properties of datasets | Availability | Actual LCV labels | Way to evaluate LCV recovery |
|---|---|---|---|
| (i) Benchmark datasets with discretization | Sufficient, covering various tasks and domains | ✓ | Direct (by comparing the recovered LCVs with the actual LCVs) |
| (ii) Datasets with natural DVs | Limited and private | × | Indirect (via downstream task performance) |
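To illustrate setting (i), the sketch below shows how a benchmark CV might be discretized into a binary DV while the original signal is kept as the ground-truth LCV. The sine signal and zero threshold are assumptions for illustration, not our exact discretization protocol.

```python
import numpy as np

# A continuous benchmark signal serves as the actual LCV...
t = np.linspace(0, 4 * np.pi, 100)
lcv = np.sin(t)

# ...and a thresholded, two-state version of it plays the role of the
# observed DV. The LCV is retained only for evaluating recovery.
dv = (lcv > 0.0).astype(int)

print("unique DV states:", np.unique(dv))
```

Because the LCV is kept alongside the synthetic DV, the recovered LCVs can be scored directly against it, which is impossible for naturally formed DVs like those in the TPP dataset.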

II. Experimental Setting for the TPP Dataset:

In this experiment, we aim to predict the motor coil temperature (Variable No. 10) from the other 9 variables (No. 1-9), as a mixed time series extrinsic regression task.

III. Results and Discussions: We are pleased to share results on two key aspects:

  1. Visualization of DVs and Recovered LCVs: We have included plots comparing the original DVs with the LCVs recovered by our method in https://anonymous.4open.science/r/MiTSformer/Visualization_showcases/TPP_showcases_for_Reviewer_GdPs.pdf. These visualizations offer qualitative insight into how our approach captures the underlying continuous dynamics of the DVs. Since true LCV labels are unavailable, we cannot directly evaluate the accuracy of LCV recovery; however, we can verify its necessity and effectiveness through downstream task performance.
  2. Performance on Downstream Tasks: We evaluated the prediction performance of MiTSformer along with advanced baselines, e.g., iTransformer and PatchTST, and summarize the results below:
| Metric | MiTSformer | iTransformer | PatchTST | DLinear | TimesNet |
|---|---|---|---|---|---|
| MAE | 0.0597 | 0.0652 | 0.0638 | 0.0723 | 0.0668 |
| RMSE | 0.0664 | 0.0773 | 0.0732 | 0.0892 | 0.0751 |
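For reference, the two reported metrics can be sketched as follows. The arrays below are dummy values for illustration, not the TPP measurements.

```python
import numpy as np

# Mean absolute error: average magnitude of prediction errors.
def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

# Root mean squared error: penalizes large errors more heavily.
def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical normalized temperature targets and predictions.
y_true = np.array([0.50, 0.52, 0.55, 0.53])
y_pred = np.array([0.48, 0.53, 0.57, 0.50])
print(mae(y_true, y_pred), rmse(y_true, y_pred))
```

Lower values of both metrics indicate more accurate regression, which is why MiTSformer's smaller MAE and RMSE in the table above indicate the best performance.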

It can be observed that MiTSformer outperforms the baselines by a clear margin. This is likely because the task requires predicting continuous temperature values, for which the related temperature-alarm DVs play a crucial role. By recovering their LCVs, our model can sufficiently and efficiently capture the cross-variable correlations necessary for accurate regression.

We hope these additional experiments and results address your concerns and further substantiate the validity and potential of our research. Your feedback has been immensely valuable in enhancing the overall quality of our work, and we would be happy to answer any additional questions.

Best regards and have a nice week!

Author Response

Summary of Revisions and Global Response

We sincerely thank all the reviewers for their insightful reviews and valuable comments, which are instructive for us to improve our paper further.

Pioneering the exploration of mixed time series analysis, this paper proposes MiTSformer, a task-general framework that recovers and aligns the latent continuity of mixed variables for sufficient and balanced spatial-temporal modeling, making it amenable to various tasks. Experimentally, MiTSformer establishes SOTA performance on 34 public benchmarks covering five mixed time series analysis tasks.

We are delighted that the reviewers generally held positive opinions of our paper, in that the problem definition is "fresh", "novel" and is a "fundamental issue", the proposed method "addresses a significant gap", and is "creative", "innovative", and "invaluable for designing robust time series foundation models", the empirical evaluation is "thorough", "comprehensive" and "robust", and the presentation is "clear", "easy to understand".

The reviewers also raised insightful and constructive concerns. We made every effort to address all the concerns by providing detailed clarifications, sufficient evidence, requested results, and in-depth analysis. Here is the summary of the major revisions:

  • Add in-depth investigation of LCV recovery (Reviewers GdPs, ZwAv, and nZaK). We have supplemented the visualization and quantitative evaluations to compare the recovered LCVs with their actual ones, demonstrating the necessity and effectiveness of LCV recovery.
  • Add comparison with traditional mixed data modeling approaches (Reviewer GdPs): We have supplemented theoretical and empirical comparisons between MiTSformer and traditional mixed data modeling methods, i.e., mixed naive Bayes-based and variational inference-based methods.
  • Add experiments for DVs with multiple discrete states (Reviewer 3NoE): We've analyzed and conducted experiments to demonstrate MiTSformer's capability to process multi-state DVs effectively.
  • Illustrate the applications for large datasets and real-time deployments (Reviewers 3NoE and ZwAv): We've analyzed the feasibility and efficiency of MiTSformer to tackle real-world large datasets and real-time processing. Also, we've provided relevant instructions and guidance.
  • Clarify some key concepts (Reviewers 3NoE and nZaK): We have clarified key aspects of MiTSformer, such as the mathematical formulations of LCV recovery, the implementation and effectiveness of smoothness constraints, and the adversarial framework.
  • Resolve some presentation issues (Reviewer nZaK): We resolved presentation issues by polishing equations and adding error bars of experiments for enhanced clarity.
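To make the smoothness-constraint discussion concrete, the sketch below contrasts an adjacent-difference penalty with a K-Lipschitz-style variant (the alternative raised by Reviewer nZaK, which only penalizes differences exceeding a bound K). This is a simplified numpy stand-in for the idea, not the paper's exact loss or implementation.

```python
import numpy as np

# Adjacent smoothness: penalize every change between consecutive points.
def adjacent_smoothness(x):
    return float(np.mean(np.diff(x) ** 2))

# K-Lipschitz-style variant: only changes whose magnitude exceeds K
# are penalized, so moderate jumps remain admissible.
def lipschitz_penalty(x, k):
    return float(np.mean(np.maximum(np.abs(np.diff(x)) - k, 0.0) ** 2))

smooth = np.array([0.0, 0.1, 0.2, 0.3])   # gradual LCV-like sequence
jumpy = np.array([0.0, 1.0, 0.0, 1.0])    # abruptly switching sequence
print(adjacent_smoothness(smooth), adjacent_smoothness(jumpy))
```

Under both formulations the abruptly switching sequence incurs the larger penalty, while the Lipschitz variant leaves gradual sequences entirely unpenalized once their step sizes stay below K.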

The valuable suggestions from four reviewers are very helpful for us to improve our paper. We'd be very happy to answer any further questions. Looking forward to the reviewer's feedback.

The mentioned Tables and Figures are included in the following PDF file.

  • Figure 1: Visualization of the recovered LCVs and the actual LCVs for Reviewers GdPs, ZwAv, and nZaK.
  • Table 1: Quantitative deviations between the recovered LCVs and the actual LCVs for Reviewers GdPs, ZwAv, and nZaK.
  • Table 2: Performance comparison with mixed NB- and VI-based methods for Reviewer GdPs.
  • Table 3: Results under the different number of DV's states for Reviewer 3NoE.
  • Table 4: Results of K-Lipschitz Continuity-based smoothness constraint for Reviewer nZaK.
  • Table 5: Robust analysis of MiTSformer with error bars for Reviewer nZaK.
Final Decision

The paper proposes MiTSformer, a general framework for mixed time-series consisting of both discrete variables (DVs) and continuous variables (CVs). Key components include the recovery of latent continuous variables (LCVs) from discrete variables (DVs) and the exchange of information between LCVs/CVs with self- and cross-attention blocks. The training process is guided by self-supervised smoothness/adversarial/reconstruction losses as well as task-specific supervision.

The reviews of this paper are cautiously positive after the rebuttal phase. All reviewers acknowledge the practical relevance and limited exploration of mixed time-series. The concept of latent continuity recovery with the help of a residual dilated CNN, in particular, received attention from the reviewers and was praised for its effectiveness and innovation. The reviewers also commended the paper for its comprehensive evaluation on a variety of datasets and tasks, including ablation studies of all relevant network components, and the availability of high-quality code and training details.

The concerns raised by the reviewers broadly fall into three categories: presentation, technical design, and experiments. While the paper is generally well presented, the technical presentation of the latent continuity recovery, including the introduction of the smoothness loss, is at times difficult to follow. Multiple reviewers were concerned about the complexity of the MiTSformer framework, especially with regard to reimplementation and reproduction, and the lack of intuition behind the individual objective functions. The authors defended their design choices well during the rebuttal phase, including an alternative smoothness loss based on K-Lipschitz continuity. The experiments, while generally comprehensive, notably lacked comparisons between recovered and actual LCVs, as well as tests with multi-state DVs, both of which the authors provided in their rebuttal. The reviewers also suggested additional baseline comparisons, including HVM and VAMDA, which were shown to be inferior to MiTSformer in the rebuttal as well.

With all major weaknesses addressed and the insight that the scores might also have been influenced by misunderstandings that the authors were able to clarify during the discussion period, the paper is recommended for acceptance as a poster, with the expectation that all relevant discussions and results from the rebuttal period will be integrated into the camera-ready version.