PaperHub
Overall score: 6.8/10 · Poster · 4 reviewers
Ratings: 4, 4, 4, 5 (min 4, max 5, std 0.4) · Confidence: 4.0
Novelty: 2.3 · Quality: 2.5 · Clarity: 2.5 · Significance: 2.5
NeurIPS 2025

SGN: Shifted Window-Based Hierarchical Variable Grouping for Multivariate Time Series Classification

OpenReview · PDF
Submitted: 2025-05-07 · Updated: 2025-10-29


Keywords
Deep Learning, Multivariate Time Series Classification

Reviews and Discussion

Review (Rating: 4)

This paper proposes SwinGroupNet (SGN), a novel framework for multivariate time series (MTS) classification that addresses limitations of existing methods by integrating structured variable grouping and hierarchical temporal feature extraction. The model comprises three core modules: 1) Variable Group Embedding (VGE), which partitions variables into groups based on similarity to capture intra-group and inter-group dependencies; 2) Multi-Scale Group Window Mixing (MGWM), which extracts multi-scale temporal features via periodic window partitioning and convolutional operations; 3) Periodic Window Shifting and Merging (PWSM), which leverages sequence periodic patterns to enable hierarchical temporal interaction. Extensive experiments on diverse benchmarks show SGN achieves state-of-the-art performance, with an average 4.2% accuracy improvement over existing methods.

Strengths and Weaknesses

Strengths:

  1. This paper introduces a novel perspective that transforms multivariate interactions into structured intra-/inter-group relationships, effectively balancing fine-grained dependency capture against over-smoothing.
  2. Experimental validation covers domains including healthcare and human activity recognition, demonstrating consistent superiority over baselines.

Weaknesses:

  1. In the Variable Group Embedding module, BDC (Brownian Distance Covariance) is used to model dependencies between variables. However, referring to Equation (11), its essence still relies on Euclidean distance. Does this approach genuinely capture non-linear dependencies better? Is there any performance improvement compared to other methods? Additionally, when calculating dependencies between variables, do you use the entire time series of that variable?
  2. "Considering the extensibility of periodic patterns, we further incorporate multiple scales by merging additional period windows corresponding to the remaining Top-K dominant frequencies." (page 5, lines 155–157) This statement mentions merging period windows corresponding to the remaining frequencies in Top-K. Could you elaborate in detail on how this merging process is implemented?
  3. In the "Multi-Scale Group Window Extracting" section (Lines 159–172), operations of intra-group and inter-group pointwise convolutions are not explained. Mathematical formulations are needed to clarify channel dimension transformations.
  4. While moving and merging periodic windows enhances the capture of cross-window dependencies, what specific offset value and window size after merging yield optimal results? Please provide a quantitative analysis of this.
  5. In Section 4.2, the authors selected 10 multivariate time series datasets from the UEA. Many research reports have shown the performance of all datasets. How does your work perform on all datasets? Additionally, the hyperparameter sensitivity analysis lacks an evaluation of the regularization parameter β.
  6. Notation inconsistency: $k_i$ in Equation (6) conflicts with $n_k$ in Line 699 of Appendix D, causing confusion. Notation should be unified.

Questions

  1. In the Variable Group Embedding module, BDC (Brownian Distance Covariance) is used to model dependencies between variables. However, referring to Equation (11), its essence still relies on Euclidean distance. Does this approach genuinely capture non-linear dependencies better? Is there any performance improvement compared to other methods? Additionally, when calculating dependencies between variables, do you use the entire time series of that variable?
  2. "Considering the extensibility of periodic patterns, we further incorporate multiple scales by merging additional period windows corresponding to the remaining Top-K dominant frequencies." (page 5, lines 155–157) This statement mentions merging period windows corresponding to the remaining frequencies in Top-K. Could you elaborate in detail on how this merging process is implemented?
  3. While moving and merging periodic windows enhances the capture of cross-window dependencies, what specific offset value and window size after merging yield optimal results?
  4. In Section 4.2, the authors selected 10 multivariate time series datasets from the UEA. Many research reports have shown the performance of all datasets. How does your work perform on all datasets? Additionally, the hyperparameter sensitivity analysis lacks an evaluation of the regularization parameter β.

Limitations

The applicability of SGN in other time series tasks such as forecasting and imputation remains to be further explored. Moreover, the use of intrinsic similarity to generate the assignment matrix introduces additional computational overhead.

Final Justification

I have increased my rating in light of the authors' work to clarify and improve the paper.

Formatting Issues

None.

Author Response

We sincerely appreciate your recognition of our novel perspective on modeling multivariate interactions through structured intra/inter-group relationships, as well as your acknowledgment of the strong empirical validation across diverse domains. We would like to provide some clarifications in the hope of addressing your concerns.


W1. While the VGE module employs BDC to model inter-variable dependencies, Equation (11) reveals its reliance on Euclidean distance, raising questions about its ability to truly capture non-linear relationships. Clarification is needed on whether this offers performance gains over other dependency measures, and whether the full time series of each variable is used in the computation.

A1. We thank the reviewer for the insightful question regarding our similarity metric.
In our method, we adopt BDC to measure similarity, where a positive value of dCov(X, Y) > 0 indicates the existence of statistical dependence—including non-linear relationships. To validate its effectiveness, we compare BDC with Pearson correlation on both the FLAAP and UCI-HAR datasets. As shown in the table below, BDC demonstrates a slight improvement in performance.
Notably, BDC computes similarity over the entire time series, allowing it to better preserve the temporal structure during the similarity estimation process.

| Metric | FLAAP Accuracy | FLAAP F1 | UCI-HAR Accuracy | UCI-HAR F1 |
|---|---|---|---|---|
| Pearson | 79.64 | 78.94 | 95.52 | 95.55 |
| BDC | 80.81 | 80.35 | 95.62 | 95.64 |
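For concreteness, here is a minimal NumPy sketch of the sample distance covariance computed over each variable's full series, using the standard double-centering formulation; the function names and the brute-force pairwise loop are illustrative, not the paper's implementation.

```python
import numpy as np

def distance_covariance(x: np.ndarray, y: np.ndarray) -> float:
    """Sample distance covariance dCov(x, y) via double-centered pairwise
    distance matrices; dCov > 0 signals statistical dependence, including
    non-linear dependence."""
    a = np.abs(x[:, None] - x[None, :])  # pairwise |x_i - x_j|
    b = np.abs(y[:, None] - y[None, :])  # pairwise |y_i - y_j|
    # Double-center each distance matrix: A_ij = a_ij - row mean - col mean + grand mean.
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    return float(np.sqrt(max((A * B).mean(), 0.0)))

def pairwise_bdc(X: np.ndarray) -> np.ndarray:
    """Similarity over the *entire* series of each variable pair.
    X: (num_variables, seq_len)."""
    v = X.shape[0]
    S = np.zeros((v, v))
    for i in range(v):
        for j in range(i, v):
            S[i, j] = S[j, i] = distance_covariance(X[i], X[j])
    return S
```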

W2. The paper mentions merging period windows from the remaining Top-K frequencies, but it lacks details—could the authors clarify how this merging is implemented in practice?

A2. In our experiments, we observed that the top-k dominant periods in time series are often integer multiples of one another. Based on this insight, we design our model to start with a small base period window and then progressively fuse multiple windows to reach the larger period lengths.
This periodic window fusion strategy allows SGN to capture diverse periodic patterns effectively across various temporal resolutions.
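To make this concrete, below is one plausible reading of the strategy as a PyTorch sketch: dominant periods are read off the FFT amplitude spectrum (TimesNet-style), and since they tend to be multiples of the base period, larger periods are reached by fusing adjacent base windows. The helper names and the merge factor are illustrative assumptions, not the released code.

```python
import torch

def topk_periods(x: torch.Tensor, k: int = 3):
    """x: (batch, seq_len, channels). Return the k dominant periods,
    estimated from the mean amplitude spectrum."""
    spec = torch.fft.rfft(x, dim=1).abs().mean(dim=(0, 2))  # (seq_len//2 + 1,)
    spec[0] = 0                               # drop the DC component
    top = torch.topk(spec, k).indices         # dominant frequency bins
    return (x.shape[1] // top).tolist()       # period = seq_len / frequency

def merge_windows(x: torch.Tensor, factor: int = 2):
    """Fuse `factor` adjacent base-period windows into one larger window,
    so repeated (doubling) merges reach the longer Top-K periods.
    x: (batch, num_windows, window_len, d)."""
    b, n, w, d = x.shape
    n = (n // factor) * factor                # drop the ragged tail, if any
    return x[:, :n].reshape(b, n // factor, factor * w, d)
```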


W3. In the "Multi-Scale Group Window Extracting" section (Lines 159–172), operations of intra-group and inter-group pointwise convolutions are not explained. Mathematical formulations are needed to clarify channel dimension transformations.

A3. Given the input data tensor $X \in \mathbb{R}^{num \times (d_{\text{model}} \times v_{\text{group}}) \times T}$, where $num$ is the number of temporal windows, $T$ is the periodic window length, $v_{\text{group}}$ is the number of variable groups, and $d_{\text{model}}$ is the feature dimension per group:

  • We first apply group-wise pointwise convolutions by setting the number of convolution groups to $v_{\text{group}}$, enabling intra-group interaction.
  • Next, we transpose the $v_{\text{group}}$ and $d_{\text{model}}$ dimensions, resulting in a shape of $num \times (v_{\text{group}} \times d_{\text{model}}) \times T$.
  • Then, we apply another pointwise convolution with $d_{\text{model}}$ groups, allowing inter-group feature interactions across different variable embeddings.

This two-stage group-wise and inter-group convolution design enhances both local specialization and global coordination across variable groups.
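A minimal PyTorch sketch of this two-stage pointwise convolution, following the shapes in the rebuttal; the module is a paraphrase for illustration, not the released code.

```python
import torch
import torch.nn as nn

class IntraInterGroupMix(nn.Module):
    """Two-stage pointwise (1x1) convolutions over (num_windows, d_model * v_group, T):
    first mix channels within each variable group, then mix the same feature
    index across groups."""
    def __init__(self, d_model: int, v_group: int):
        super().__init__()
        c = d_model * v_group
        # Stage 1: groups=v_group -> each group's d_model channels interact.
        self.intra = nn.Conv1d(c, c, kernel_size=1, groups=v_group)
        # Stage 2: groups=d_model -> the same feature index mixes across groups.
        self.inter = nn.Conv1d(c, c, kernel_size=1, groups=d_model)
        self.d_model, self.v_group = d_model, v_group

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        num, _, T = x.shape
        x = self.intra(x)                                   # intra-group mixing
        # Transpose the (v_group, d_model) channel layout to (d_model, v_group).
        x = x.view(num, self.v_group, self.d_model, T).transpose(1, 2)
        x = x.reshape(num, self.d_model * self.v_group, T)
        return self.inter(x)                                # inter-group mixing
```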


Q4. What are the specific offset and merged window size values that optimize cross-window dependency capture, and could you provide quantitative analysis supporting these choices?

A4. We provide detailed ablation results on the effect of shift offsets, expressed as fractions of the window length T, in the table below. As shown, different offset values yield similar performance, indicating that the essential contribution lies in enabling information exchange across shifted temporal windows, rather than relying on a specific offset.

| Shift offset | 0 | T/4 | T/2 | 3T/4 |
|---|---|---|---|---|
| FLAAP | 78.05 | 80.07 | 80.81 | 79.48 |
| UCI-HAR | 87.05 | 94.74 | 95.62 | 94.62 |
| TDBRAIN | 99.79 | 99.91 | 99.9 | 99.75 |
| PTB-XL | 73.32 | 73.75 | 73.8 | 74.19 |
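A minimal sketch of the kind of Swin-style cyclic shifting varied in this ablation, assuming a torch.roll-based implementation; the helper and its defaults are illustrative, not the released code.

```python
import torch

def shift_and_window(x: torch.Tensor, window: int, offset: int):
    """Cyclically shift the sequence by `offset` steps, then split it into
    non-overlapping windows of length `window`, so window boundaries fall in
    new places and cross-window context can mix.
    x: (batch, seq_len, d), with seq_len divisible by `window`."""
    x = torch.roll(x, shifts=-offset, dims=1)  # Swin-style cyclic shift
    b, L, d = x.shape
    return x.view(b, L // window, window, d)

# Usage: a T/2 offset performed best in the ablation above.
x = torch.randn(8, 128, 32)
windows = shift_and_window(x, window=16, offset=16 // 2)  # (8, 8, 16, 32)
```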

For the window merging strategy, we adopt a doubling-based merging approach, where the merged window size is a multiple of the original. The corresponding results and in-depth analysis are presented in Table 10 of Appendix E, demonstrating the effectiveness of our period merging design.


Q5. The evaluation on only 10 UEA datasets is limited; how does the method perform on the full set? Also, the hyperparameter sensitivity analysis omits the regularization parameter.

A5. We have supplemented the results on the full UEA multivariate time series classification benchmark. Under the same experimental settings, we compare SGN with a broader range of recent state-of-the-art methods (see the table below). Wins/Draws/Losses indicate the number of datasets (out of 30) on which SGN achieves higher, equal, or lower accuracy than the corresponding baseline method. As shown, SGN consistently achieves a greater number of first-place results and a better average rank than state-of-the-art methods, demonstrating its robustness and generalization across diverse datasets.

| Dataset | W.MUSE | M.FCN | TapNet | ShapeNet | TodyNet | SVPT | Shapeformer | MPTSNet | SGN (Ours) |
|---|---|---|---|---|---|---|---|---|---|
| ArticularyWordRecognition | 99 | 97.3 | 98.7 | 98.7 | 98.7 | 99.3 | 99 | 97.7 | 99 |
| AtrialFibrillation | 33.3 | 26.7 | 33.3 | 40 | 46.7 | 40 | 53.3 | 53.3 | 66.7 |
| BasicMotions | 100 | 95 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| CharacterTrajectories | 99 | 98.5 | 99.7 | 98 | N/A | 99 | 99.2 | N/A | 99.6 |
| Cricket | 100 | 91.7 | 95.8 | 98.6 | 100 | 100 | 94.4 | 94.4 | 100 |
| DuckDuckGeese | 57.5 | 67.5 | 57.5 | 72.5 | 58 | 70 | 64 | 68 | 64 |
| EigenWorms | 89 | 50.4 | 48.9 | 87.8 | 84 | 92.5 | N/A | N/A | 85.5 |
| Epilepsy | 100 | 76.1 | 97.1 | 98.7 | 97.1 | 98.6 | 98.6 | 97.1 | 97.8 |
| EthanolConcentration | 13.3 | 37.3 | 32.3 | 31.2 | 35 | 33.1 | 41.1 | 43.3 | 44.5 |
| ERing | 43 | 13.3 | 13.3 | 13.3 | 91.5 | 93.7 | 87.4 | 94.4 | 95.9 |
| FaceDetection | 54.5 | 54.5 | 55.6 | 60.2 | 62.7 | 51.2 | 65.8 | 69.8 | 70.3 |
| FingerMovements | 49 | 58 | 53 | 58.9 | 67.6 | 60 | 55 | 64 | 64 |
| HandMovementDirection | 36.5 | 36.5 | 37.8 | 33.8 | 64.9 | 39.2 | 41.9 | 63.5 | 75.7 |
| Handwriting | 60.5 | 28.6 | 35.7 | 45.1 | 43.6 | 43.3 | 30.2 | 34.4 | 50.4 |
| Heartbeat | 72.7 | 66.3 | 75.1 | 75.6 | 75.6 | 79 | 81.5 | 75.6 | 77.1 |
| InsectWingbeat | N/A | 16.7 | 20.8 | 25 | N/A | 18.4 | 31.4 | N/A | 68 |
| JapaneseVowels | 97.3 | 97.6 | 96.5 | 98.4 | N/A | 97.8 | 99.2 | 98.6 | 98.9 |
| Libras | 87.8 | 85.6 | 85 | 85.6 | 85 | 88.3 | 95.5 | 87.2 | 83.9 |
| LSST | 59 | 37.3 | 56.8 | 59 | 61.5 | 66.6 | 63.8 | 60.4 | 63.7 |
| MotorImagery | 50 | 51 | 59 | 61 | 64 | 65 | N/A | 65 | 65 |
| NATOPS | 87 | 88.9 | 93.9 | 88.3 | 97.2 | 90.6 | 96.1 | 94.4 | 98.3 |
| PenDigits | 94.8 | 97.8 | 98 | 97.7 | 98.7 | 98.3 | 99.1 | 98.9 | 99.1 |
| PEMS-SF | N/A | 69.9 | 75.1 | 75.1 | 78 | 86.7 | N/A | 94.2 | 88.4 |
| PhonemeSpectra | 19 | 11 | 17.5 | 29.8 | 30.9 | 17.6 | 29.3 | 14.4 | 23.1 |
| RacketSports | 93.4 | 80.3 | 86.8 | 88.2 | 80.3 | 84.2 | 88.8 | 87.5 | 93.4 |
| SelfRegulationSCP1 | 71 | 87.4 | 65.2 | 78.2 | 89.8 | 88.4 | 91.8 | 92.8 | 93.9 |
| SelfRegulationSCP2 | 46 | 47.2 | 55 | 57.8 | 55 | 60 | 56.1 | 57.2 | 60.6 |
| SpokenArabicDigits | 98.2 | 99 | 98.3 | 97.5 | N/A | 98.6 | 99.7 | 99.5 | 99.7 |
| StandWalkJump | 33.3 | 6.7 | 40 | 53.3 | 46.7 | 46.7 | 66.7 | 53.3 | 53.3 |
| UWaveGestureLibrary | 91.6 | 89.1 | 89.4 | 90.6 | 85 | 94.1 | 90 | 88.1 | 92.2 |
| Average rank | 5.57 | 7.17 | 6.13 | 4.80 | 4.27 | 3.70 | 3.19 | 3.74 | 2.13 |
| Number of top-1 | 5 | 0 | 2 | 2 | 4 | 7 | 7 | 3 | 15 |
| Wins | 20 | 28 | 27 | 23 | 21 | 19 | 15 | 20 | - |
| Draws | 4 | 0 | 1 | 2 | 2 | 3 | 5 | 4 | - |
| Losses | 4 | 2 | 2 | 5 | 3 | 8 | 7 | 3 | - |

In addition, we conduct an ablation study on the β parameter (see the table below). The results reveal a non-monotonic trend: as β increases, performance initially improves and then gradually declines, indicating that an appropriate choice of β is crucial for balancing the similarity regularization term.

| β | 0 | 0.1 | 0.2 | 0.3 |
|---|---|---|---|---|
| FLAAP | 80.24 | 80.81 | 79.4 | 80.09 |
| UCI-HAR | 94.86 | 95.62 | 94.05 | 94.42 |
| TDBRAIN | 99.88 | 99.9 | 99.75 | 99.88 |
| PTB-XL | 70.84 | 73.8 | 73.84 | 73.8 |

W6. Notation inconsistency: $k_i$ in Equation (6) conflicts with $n_k$ in Line 699 of Appendix D, causing confusion. Notation should be unified.

A6. We denote $k_i$ as the kernel size and $n_k$ as the number of convolution kernels. For example, when $n_k = 6$, we use multiple convolution kernels with sizes $k_i \in \{1, 3, 5, 7, 9, 11\}$. This design enables the model to capture temporal patterns at multiple scales effectively.
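A sketch of a multi-scale depthwise convolution bank consistent with this description ($n_k = 6$ kernels of sizes 1 through 11); the fusion-by-summation and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiScaleDWConv(nn.Module):
    """n_k depthwise 1-D convolutions with kernel sizes k_i in {1, 3, 5, 7, 9, 11};
    'same' padding keeps the temporal length unchanged."""
    def __init__(self, channels: int, kernel_sizes=(1, 3, 5, 7, 9, 11)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(channels, channels, k, padding=k // 2, groups=channels)
            for k in kernel_sizes
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, T); sum the multi-scale responses.
        return sum(branch(x) for branch in self.branches)
```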


Q7. The questions are the same as the weaknesses above.

A7. We have provided the corresponding responses in Answers A1–A6 above.

Comment

Thank you for taking the time to review our rebuttal and for completing the mandatory acknowledgement.

We hope that our responses have sufficiently addressed your concerns and clarified the points you raised. If there are any remaining uncertainties, we would be glad to further elaborate.

If permissible, we would sincerely appreciate knowing whether our clarifications contributed to a change in your overall assessment. Such feedback would be very helpful as we continue to improve the work.

Comment

Dear Reviewer,

We would like to kindly follow up regarding our rebuttal. We value your feedback greatly and would appreciate it if you could share any additional comments or questions when convenient. We are happy to provide further clarifications, experiments, or supporting materials at any time to facilitate the discussion.

Thank you very much for your time and consideration.

Review (Rating: 4)

This paper proposes SwinGroupNet (SGN), a novel framework for multivariate time series classification, which integrates three key components: Variable Group Embedding (VGE), Multi-scale Group Window Mixing (MGWM), and Periodic Window Shifting and Merging (PWSM). By combining these modules, SGN effectively captures both intra-group and inter-group variable dependencies across multiple temporal scales from raw multivariate time series data. Extensive experiments on four medical multivariate time series datasets and ten benchmark datasets from the UEA archive demonstrate that SGN achieves state-of-the-art classification performance.

Strengths and Weaknesses

Strengths:

  1. The proposed VGE, MGWM, and PWSM modules are thoughtfully designed to address key aspects of multivariate time series analysis, including variable-wise dependency modeling, multi-scale feature extraction, and frequency-domain periodicity—factors that are central to advancing current time series research.

  2. The figures and tables are clearly presented and enhance the interpretability of the proposed method.

  3. The related work section is well-organized and comprehensive, providing a clear overview of the field’s recent advancements.

Weaknesses:

  1. The overall novelty of the proposed SGN appears incremental, as it primarily integrates ideas from existing works. For instance, the core concept of the VGE module has been previously summarized and discussed in [R1], and shares similarities with contributions from [R2, R3]. Likewise, the multi-scale mixing strategy in MGWM resembles the approach adopted in TimeMixer [R4].

  2. Although VGE, MGWM, and PWSM are individually designed to capture distinct characteristics of multivariate time series, the conceptual and functional relationships among them are not clearly articulated. In particular, MGWM already captures multi-scale temporal patterns, which often include periodic structures. It remains unclear how the additional periodic modeling via PWSM complements or interacts with MGWM, and what underlying mechanism ensures their synergy.

  3. The authors claim in the abstract that existing methods either ignore inter-variable dependencies or model all variables jointly. However, a number of prior works have explicitly addressed inter-variable dependency modeling in multivariate time series classification [R5, R6], which are not discussed or empirically compared in the paper.

  4. For multivariate time series classification, the UEA 30 dataset is a widely accepted benchmark [R7, R8]. The authors evaluate their method on only 10 selected datasets, raising concerns about potential cherry-picking and limited generalizability of the results.

[R1] A Comprehensive Survey of Deep Learning for Multivariate Time Series Forecasting: A Channel Strategy Perspective. arXiv, 2025.

[R2] Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting. ICLR, 2023.

[R3] From similarity to superiority: Channel clustering for time series forecasting. NeurIPS, 2024.

[R4] Timemixer: Decomposable multiscale mixing for time series forecasting. ICLR, 2024.

[R5] SVP-T: A shape-level variable-position transformer for multivariate time series classification. AAAI, 2023.

[R6] Fully-Connected Spatial-Temporal Graph for Multivariate Time Series Data. AAAI, 2024.

[R7] The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances. DMKD, 2021.

[R8] Shapeformer: Shapelet transformer for multivariate time series classification. KDD, 2024.

Questions

  1. The UEA archive does not provide predefined validation sets for its sub-datasets. However, in the provided source code (data_loader.py, lines 723–725), it appears that a validation set is being generated. Could the authors clarify the strategy used to create the validation sets for the UEA datasets? Additionally, the released code does not include experimental scripts for running on the UEA datasets—would the authors consider making them publicly available to ensure reproducibility?

  2. Regarding the results shown in Figure 4, were all baseline results (e.g., ROCKET, TimesNet) re-implemented and re-evaluated by the authors under the same experimental setup? If so, could the authors provide details on how the reproduction was performed, including any modifications or hyperparameter settings used?

  3. Given the experimental settings adopted in [R8, R9] on the full UEA 30 time series dataset, could the authors report the classification performance of SGN under the same setting, using the published results from [R8, R9] as baselines (without the need to re-run those methods if time and resources are limited)? Such a comparison would provide a clearer understanding of SGN’s effectiveness relative to recent state-of-the-art approaches.

  4. The weaknesses outlined above—including concerns about incremental novelty, unclear interactions among proposed modules, limited UEA dataset coverage, and insufficient comparison with recent related methods—require further clarification and justification.

[R9] MPTSNet: Integrating multiscale periodic local patterns and global dependencies for multivariate time series classification. AAAI, 2025.

Limitations

Yes.

Final Justification

Based on the author's clarification, I have decided to raise my score to Borderline Accept.

Formatting Issues

None.

Author Response

Many thanks for your thoughtful and encouraging comments. We sincerely appreciate your recognition of the design and motivation behind our proposed VGE, MGWM, and PWSM modules. We would like to offer further clarifications to enhance the understanding of our contributions.


W1. The novelty of SGN seems incremental, as its core components (VGE, multi-scale mixing) closely resemble ideas from prior works such as [R1–R3].

A1. We would like to clarify the design and motivation behind our modeling of variable dependencies. We have observed in our experiments that performing variable aggregation based solely on intrinsic similarity at the initial stage fails to capture the complex dependencies among variables (see the ablation results in Table 3). To address this, we propose a two-stage variable dependency modeling strategy in SGN.

In the first stage, the VGE module leverages pairwise similarity to group variables and perform group embedding. However, our goal here is to obtain a coarse segmentation of variables, analogous to patching along the temporal axis, allowing for more efficient downstream processing. In the second stage, we explicitly model major variable dependencies within and across variable groups. This allows for deeper and more expressive modeling of intra- and inter-group variable relationships.

Compared with existing methods:

  • [R1] treats variable dependency modeling as a plugin aggregation module and does not further explore dependencies after grouping. In contrast, SGN performs continued hierarchical interactions after the initial grouping.

  • [R2] applies operations to all variables individually, without leveraging variable grouping as an inductive bias. This may lead to over-smoothing and less structured dependency modeling.

Regarding multi-scale strategies, [R3] applies linear projections to generate multiple temporal views. In contrast, SGN leverages the strong local feature extraction capability of convolutions and employs multiple 1D convolutional kernels with different receptive fields to extract multi-scale patterns within periodic windows.

[R1] From similarity to superiority: Channel clustering for time series forecasting. NeurIPS, 2024.

[R2] Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting. ICLR, 2023.

[R3] Timemixer: Decomposable multiscale mixing for time series forecasting. ICLR, 2024.


W2. The conceptual distinctions and interactions among VGE, MGWM, and PWSM are unclear, particularly regarding how PWSM complements MGWM’s existing multi-scale temporal modeling.

A2. We appreciate the opportunity to further clarify the interactions among SGN’s core modules.

  • The VGE module serves as the foundation by constructing both the periodic windows and variable groups, enabling structured and localized modeling.

  • Built upon the VGE outputs, the MGWM module operates within each periodic window, performing feature extraction along both the temporal and variable dimensions.

  • While MGWM focuses on within-window modeling, PWSM is designed to capture dependencies across windows, thereby enabling global information flow and alignment across different temporal windows.

These three components are mutually supportive and closely coupled.


W3. The paper overlooks prior works that explicitly model inter-variable dependencies, despite claiming a gap in existing methods without proper discussion or comparison.

A3. We divide variable interactions into three main categories: variable-independent modeling, variable aggregation, and variable mixing. These three strategies have been explicitly discussed in the Related Work section to highlight their conceptual differences and practical trade-offs. We appreciate the reviewer’s attention to this aspect and will incorporate additional relevant literature in the final version to provide a more comprehensive overview of related approaches.


W4. Evaluating on only 10 selected datasets from the UEA archive, rather than the full 30, raises concerns about cherry-picking and limits the generalizability of the results.

A4. We evaluate the effectiveness of SGN on four major datasets. To further assess the robustness and generalizability of SGN under diverse conditions—such as long sequences and weak periodicity—we additionally conduct experiments on ten UEA multivariate time series datasets. These datasets are commonly used in prior studies such as [R4–R6]. We have added results on all 30 UEA datasets and compared SGN with more baseline methods, as summarized in the table in our response to Q3 (A7) below.

[R4] TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis. ICLR, 2023.

[R5] ModernTCN: A Modern Pure Convolution Structure for General Time Series Analysis. ICLR, 2024.

[R6] TimeMixer++: A General Time Series Pattern Machine for Universal Predictive Analysis. ICLR, 2025.


Q1. The paper lacks clarity on the validation set construction for UEA datasets, as the code suggests a custom split despite no predefined sets. Moreover, the absence of experimental scripts for UEA datasets hinders reproducibility.

A5. We adopt a fixed random seed and utilize the StratifiedShuffleSplit method to ensure that the class distribution in the split subsets remains consistent with the original dataset. Specifically, we reserve 20% of the training set as the validation set and use the remaining 80% for model training. Due to NeurIPS policy, we are unable to upload our UEA implementation script to the anonymous link. However, we will make the code publicly available after the review process to ensure full reproducibility.
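A sketch of the described split, assuming scikit-learn's StratifiedShuffleSplit; the seed value and dummy data shapes are illustrative.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

X_train = np.random.randn(100, 9, 128)  # (samples, variables, seq_len), dummy data
y_train = np.random.randint(0, 4, 100)  # dummy class labels

# Reserve 20% of the official training set as validation, preserving the class
# distribution; a fixed seed keeps the split reproducible.
splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, val_idx = next(splitter.split(X_train, y_train))
X_tr, y_tr = X_train[train_idx], y_train[train_idx]
X_val, y_val = X_train[val_idx], y_train[val_idx]
```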


Q2. Figure 4 lacks clarity on whether baselines were re-implemented under a consistent setup; details on reproduction, modifications, and hyperparameter settings are needed.

A6. We directly follow the data used in their paper. However, unlike their setting where the test set is directly used as a validation set, we adopt a more rigorous strategy by reserving 20% of the training set as a validation set, which helps prevent information leakage and better simulates real-world scenarios.


Q3. To better assess SGN’s effectiveness, the authors are encouraged to report its performance under the same experimental settings used in [R7, R8] on the full UEA 30 dataset, using published results as baselines if re-running those methods is impractical.

A7. To further validate the effectiveness of the SGN method, we have supplemented additional comparison results under the same experimental settings with methods such as Shapeformer and MPTSNet on the full UEA dataset.
Due to time constraints, we were only able to reproduce a subset of the existing methods. The detailed results are provided in the following table. As shown, SGN consistently achieves a greater number of first-place results and a better average rank than state-of-the-art methods, demonstrating its robustness and generalization across diverse datasets. Notably, the SGN# column reports results using 20% of the training set as a validation set, rather than directly using the test set for validation as in other settings, and still demonstrates strong performance.

| Dataset | W.MUSE | M.FCN | TapNet | ShapeNet | TodyNet | SVPT | Shapeformer | MPTSNet | SGN | SGN# |
|---|---|---|---|---|---|---|---|---|---|---|
| AW | 99 | 97.3 | 98.7 | 98.7 | 98.7 | 99.3 | 99 | 97.7 | 99 | 98 |
| AF | 33.3 | 26.7 | 33.3 | 40 | 46.7 | 40 | 53.3 | 53.3 | 66.7 | 53.3 |
| BasicMotions | 100 | 95 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| Character | 99 | 98.5 | 99.7 | 98 | N/A | 99 | 99.2 | N/A | 99.6 | 97.8 |
| Cricket | 100 | 91.7 | 95.8 | 98.6 | 100 | 100 | 94.4 | 94.4 | 100 | 98.6 |
| DDG | 57.5 | 67.5 | 57.5 | 72.5 | 58 | 70 | 64 | 68 | 64 | 62 |
| EWorms | 89 | 50.4 | 48.9 | 87.8 | 84 | 92.5 | N/A | N/A | 85.5 | 80.2 |
| Epilepsy | 100 | 76.1 | 97.1 | 98.7 | 97.1 | 98.6 | 98.6 | 97.1 | 97.8 | 97.1 |
| EC | 13.3 | 37.3 | 32.3 | 31.2 | 35 | 33.1 | 41.1 | 43.3 | 44.5 | 44.5 |
| ERing | 43 | 13.3 | 13.3 | 13.3 | 91.5 | 93.7 | 87.4 | 94.4 | 95.9 | 91.9 |
| FaceDetection | 54.5 | 54.5 | 55.6 | 60.2 | 62.7 | 51.2 | 65.8 | 69.8 | 70.3 | 70.3 |
| FingerMove | 49 | 58 | 53 | 58.9 | 67.6 | 60 | 55 | 64 | 64 | 57 |
| HandMove | 36.5 | 36.5 | 37.8 | 33.8 | 64.9 | 39.2 | 41.9 | 63.5 | 75.7 | 60.8 |
| Handwriting | 60.5 | 28.6 | 35.7 | 45.1 | 43.6 | 43.3 | 30.2 | 34.4 | 50.4 | 43.2 |
| Heartbeat | 72.7 | 66.3 | 75.1 | 75.6 | 75.6 | 79 | 81.5 | 75.6 | 77.1 | 77.1 |
| Insect | N/A | 16.7 | 20.8 | 25 | N/A | 18.4 | 31.4 | N/A | 68 | 66.2 |
| JV | 97.3 | 97.6 | 96.5 | 98.4 | N/A | 97.8 | 99.2 | 98.6 | 98.9 | 98.9 |
| Libras | 87.8 | 85.6 | 85 | 85.6 | 85 | 88.3 | 95.5 | 87.2 | 83.9 | 83.9 |
| LSST | 59 | 37.3 | 56.8 | 59 | 61.5 | 66.6 | 63.8 | 60.4 | 63.7 | 60.8 |
| MI | 50 | 51 | 59 | 61 | 64 | 65 | N/A | 65 | 65 | 57 |
| NATOPS | 87 | 88.9 | 93.9 | 88.3 | 97.2 | 90.6 | 96.1 | 94.4 | 98.3 | 95.6 |
| PenDigits | 94.8 | 97.8 | 98 | 97.7 | 98.7 | 98.3 | 99.1 | 98.9 | 99.1 | 98.7 |
| PEMS-SF | N/A | 69.9 | 75.1 | 75.1 | 78 | 86.7 | N/A | 94.2 | 88.4 | 88.4 |
| Phoneme | 19 | 11 | 17.5 | 29.8 | 30.9 | 17.6 | 29.3 | 14.4 | 23.1 | 20.4 |
| RS | 93.4 | 80.3 | 86.8 | 88.2 | 80.3 | 84.2 | 88.8 | 87.5 | 93.4 | 91.4 |
| SCP1 | 71 | 87.4 | 65.2 | 78.2 | 89.8 | 88.4 | 91.8 | 92.8 | 93.9 | 91.1 |
| SCP2 | 46 | 47.2 | 55 | 57.8 | 55 | 60 | 56.1 | 57.2 | 60.6 | 57.8 |
| SA | 98.2 | 99 | 98.3 | 97.5 | N/A | 98.6 | 99.7 | 99.5 | 99.7 | 99.7 |
| StandWalkJump | 33.3 | 6.7 | 40 | 53.3 | 46.7 | 46.7 | 66.7 | 53.3 | 53.3 | 40 |
| UWave | 91.6 | 89.1 | 89.4 | 90.6 | 85 | 94.1 | 90 | 88.1 | 92.2 | 90.3 |
| Average rank | 6.21 | 8.03 | 6.13 | 5.30 | 4.69 | 4.13 | 3.56 | 4.30 | 2.13 | 4.40 |
| Number of top-1 | 5 | 0 | 2 | 2 | 4 | 7 | 7 | 3 | 15 | 4 |
| Wins | 20 | 28 | 27 | 23 | 21 | 19 | 15 | 20 | - | - |
| Draws | 4 | 0 | 1 | 2 | 2 | 3 | 5 | 4 | - | - |
| Losses | 4 | 2 | 2 | 5 | 3 | 8 | 7 | 3 | - | - |

[R7] MPTSNet: Integrating multiscale periodic local patterns and global dependencies for multivariate time series classification. AAAI, 2025.

[R8] Shapeformer: Shapelet transformer for multivariate time series classification. KDD, 2024.


Q4. The weaknesses outlined above require further clarification and justification.

A8. We have provided the corresponding responses in Answers A1–A4 above.

Comment

The authors addressed my concerns.

Comment

We sincerely thank for the valuable comments and suggestions, which helped us improve the quality of the paper. We are also grateful for the positive evaluation and the score adjustment. Your thoughtful feedback is greatly appreciated and will be reflected in the final version through updated experimental details and open-sourced results.

Regarding ShapeFormer: while it uses 80% of the training data and reserves 20% as a validation set during the shapelet discovery phase, it still uses the test set as the validation set during the final training stage. This setup is explicitly described in the original paper's Experimental Implementation Details:

"Our model was trained using the RAdam optimiser with an initial learning rate set as 0.01, a momentum of 0.9, and a weight decay of 5e-4. The training process involved a batch size of 16 for a total of 200 epochs. We configured the number of attention heads to be 16 and followed the protocol outlined in [47, 50]. This protocol involves splitting the training set into 80% for training and 20% for validation, allowing us to fine-tune hyperparameters. Once the hyperparameters were finalised, we conducted model training on the entire training set and subsequently evaluated its performance on the designated official test set."

Moreover, the public GitHub repository of ShapeFormer confirms this usage: in main.py, line 140 loads the test_loader, which is subsequently passed as the validation set on lines 164 and 168.

Therefore, ShapeFormer follows the same test-as-validation strategy as other baselines. Under this setting, SGN achieves 76.81% accuracy, compared to ShapeFormer’s 73.45%, demonstrating SGN’s superior performance.

We will update the experimental setup and related descriptions in the final version, and release all results and code for full reproducibility. We sincerely thank you again for your valuable comments and feedback.

Comment

Regarding W1 and W2: The authors' responses have addressed most of my concerns. However, I still believe the methodological novelty is incremental.

Regarding W3: My main concern is that the paper lacks in-depth analysis of works on multivariate time series classification. The discussion on variable-independent modeling, variable aggregation, and variable mixing in the Related Work section primarily focuses on forecasting tasks. However, the title and experiments in this paper are classification-oriented. Furthermore, the authors do not sufficiently review related work specifically on multivariate time series classification.

Regarding W4: The authors state that the selected datasets are commonly used in prior works such as [R4–R6]. However, [1] points out (page 8) that many recent SOTA baselines (e.g., those shown in Figure 4 of the main text) suffer from leakage—using the test set as the validation set. If all deep learning models follow this setup, comparisons may be fair among them. However, Rocket is a non-deep learning method and does not rely on such validation practices, making the comparison unfair.

Regarding Q1: The authors clarify that they “reserve 20% of the training set as the validation set, and use the remaining 80% for model training.” However, their results in Figure 4 outperform the best baseline by 1.2%. According to [1], if other baselines use the test set for validation, how can SGN—with stricter validation from training data only—achieve significantly better performance?

Regarding Q2: While I appreciate the authors' clarification, I still question the rationale for including comparison results in Figure 4 that are based on validation leakage. I suggest referencing [1] and clearly distinguishing experimental setups.

Regarding Q3: MPTSNet follows the settings of [R4–R6], achieving an average accuracy of 75.4% on 10 UEA datasets. The authors report an average accuracy of 76.1% in Figure 4. Meanwhile, MPTSNet, using strict training-validation splits (i.e., not using the test set, if it is true), reports 74.0% average accuracy over 25 UEA datasets.

From the SGN results provided, the paper reports an average accuracy of 76.81% on 25 UEA datasets. SGN# (using 20% of training set as validation) achieves 73.17%, which is similar to Shapeformer (73.45%). This raises the question: How does SGN achieve 76.81% under the UEA-25 setting when MPTSNet, under the same setting, only reaches 74.0%?

Overall, the discrepancy between experimental setups—especially regarding validation leakage—needs to be thoroughly clarified. This is particularly important for the results shown in Figure 4. If the authors can clearly address these issues in the final version, I will consider raising my score.

Reference: [1] TOTEM: TOkenized Time Series Embeddings for General Time Series Analysis, TMLR, 2024.

Comment

We sincerely appreciate your insightful suggestions. We will incorporate additional clarifications and explanations in the revised version of the paper.


Response to W3:

Our current paper primarily focuses on methods that approach the problem from the variable-wise perspective. In future versions, we will incorporate a broader set of related works on MTSC to provide a more comprehensive context.


Response to W4/Q1/Q2:

We sincerely thank the reviewer for raising this important concern regarding validation leakage in prior works.

In our experiments reported in Figure 4, we adopt a stricter protocol by reserving 20% of the training set as the validation set, while using the remaining 80% for model training. Under this setup, SGN achieves a top-1 accuracy of 76.1%, already outperforming other baselines by 1.2%, despite the stricter validation policy.

To ensure fair and comprehensive comparisons, we will additionally report SGN’s performance using the same protocol adopted by other baselines (i.e., using the test set as the validation set), which yields a slightly higher accuracy of 77.6%. This updated result will be included in the final version of the paper along with clear annotations and descriptions distinguishing the experimental setups.

We emphasize that SGN’s strong performance stems not from validation policies but from its core architectural design: the VGE module constructs variable groups and periodic windows to expose inherent structure; the MGWM module jointly models temporal and inter-variable interactions within each periodic window using multi-scale convolutions; and the PWSM module captures rich dependencies between windows. These three modules work synergistically to enable robust and generalizable modeling of multivariate time series, which is consistently reflected in our empirical results across diverse benchmarks.

We appreciate the reviewer’s suggestion to reference [R1], and we will cite it appropriately and further clarify all experimental protocols in the final version.


Response to Q3:

We would like to further clarify the experimental protocol. Under the same evaluation setting as adopted by prior works [R2–R4]—where the test set is used for validation—SGN achieves an average accuracy of 77.6% on the 10 UEA datasets, compared to 75.4% for MPTSNet. In contrast, the result reported in the main paper (76.1%) was obtained using 20% of the training set as the validation set, which is a stricter and more realistic setup.

Furthermore, for the 25 UEA datasets we report in the rebuttal, SGN achieves 76.81% average accuracy under the same experimental setting (i.e., using the test set for validation), while MPTSNet achieves 74.0%. This setup is consistent with the implementation in MPTSNet's public GitHub codebase (see data_provide.py, line 69, and train.py, line 152, where test_loader is used for validation).
In contrast, SGN# refers to our result using 20% of the training set for validation, maintaining a stricter evaluation protocol. We also include results under both validation setups in the updated tables for completeness.

| Datasets | Model | 20% of training set for validation | Test set for validation |
|---|---|---|---|
| UEA-10 | SGN | 76.1 | 77.6 |
| UEA-10 | MPTSNet | / | 75.4 |
| UEA-25 | SGN | 73.17 | 76.81 |
| UEA-25 | MPTSNet | / | 74.00 |

The superior performance of SGN can be attributed to its effective modeling of both variable-wise and temporal-wise dependencies. SGN explicitly captures intra- and inter-group variable interactions and performs multi-scale temporal extraction. In contrast, MPTSNet primarily focuses on temporal patterns and lacks dedicated modeling along the variable dimension, which may limit its performance on complex multivariate data.

To ensure reproducibility, we will release our full UEA experiment scripts, and provide clear documentation on the experimental settings and configurations.

[R1] TOTEM: TOkenized Time Series Embeddings for General Time Series Analysis. TMLR, 2024.

[R2] TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis. ICLR, 2023.

[R3] ModernTCN: A Modern Pure Convolution Structure for General Time Series Analysis. ICLR, 2024.

[R4] TimeMixer++: A General Time Series Pattern Machine for Universal Predictive Analysis. ICLR, 2025.

评论

Thank you for your response. Based on your clarification, I now understand that MPTSNet reports classification results on the UEA 25 datasets using the test set as the validation set. This experimental setup is unfair to deep learning methods like Shapeformer and non-deep learning baselines such as Rocket and DTW, which do not use test samples during training.

According to your reply, under a fair setting where 20% of the training set is used as the validation set (e.g., SGN# with an average accuracy of 73.17%), your proposed method performs on par with existing SOTA methods such as Shapeformer (average accuracy of 73.45%) on the UEA 25 datasets, showing no clear advantage.

I strongly encourage the authors to revise Figure 4 in the final version, ensuring that all reported results are based on a fair evaluation protocol—specifically, without using the test set as validation. Please also provide corrected results for the UEA 25 datasets under this setting.

Finally, while I do not have sufficient time to verify the results using the provided code, I hope the authors will make the final code publicly available to facilitate reproducibility and further research in this area. I believe that if the reported results can be reliably reproduced, this work will make a valuable contribution to multivariate time series classification.

In light of these considerations, I have decided to raise my score to Borderline Accept.

Review (Rating: 4)

This paper proposes SwinGroupNet (SGN), a novel approach for multivariate time series (MTS) classification, with key innovations including:

  • Variable Group Embedding (VGE): a dynamic grouping strategy based on Brownian Distance Covariance (BDC) that partitions variables into clusters, enabling separate modeling of intra-group and inter-group dependencies to balance the trade-offs between independent and mixed variable modeling paradigms.
  • Integrated periodic analysis and multi-scale convolution for simultaneous feature extraction across both temporal and variable dimensions.
  • Periodic window shifting to enhance cross-window interactions and mitigate limitations of localized modeling.

Strengths and Weaknesses

Strengths:

  1. It demonstrates relatively excellent performance in multivariate time series classification tasks.

Weaknesses:

  1. The architecture presented in Figure 2 is difficult to interpret, especially the subfigures.
  2. Line 142: the final loss includes $\mathcal{L}_{\text{task}}$; in fact, the experimental validation appears limited to classification tasks.

Questions

The proposed PWSM module exhibits strong dependence on periodic assumptions through its FFT-based windowing mechanism. This architectural choice may degrade model performance when processing (a) non-periodic signals (e.g., white noise), (b) transient events (e.g., fault spikes in industrial data), or (c) irregular sampling scenarios, all common in real-world time series applications. What considerations did the authors have regarding this?

Limitations

1. Regarding the selection of baseline methods, we recommend incorporating state-of-the-art multivariate time series representation approaches.
2. Evaluating the method's generalizability through additional experiments on fundamental time series tasks such as forecasting and anomaly detection.

Final Justification

Thanks for the responses. I'll maintain my original rating.

Formatting Issues

None.

Author Response

Thank you for your positive feedback. We're glad that you find our method demonstrates relatively excellent performance in multivariate time series classification tasks. We would like to provide some clarifications in the hope of addressing your concerns.


W1. The architecture presented in Figure 2 is difficult to interpret, especially the subfigures.

A1. We apologize for the lack of clarity in the original model illustration, and we thank the reviewer for bringing this to our attention. Below, we provide a more detailed explanation of the SGN model architecture, which consists of three main modules: VGE, MGWM, and PWSM.

  • VGE (Variable Group Embedding):

    • This module serves two purposes. First, it computes an assignment matrix based on intrinsic variable similarity, which is used to fuse variables into groups. These groups are then encoded using group embeddings to obtain compact representations.
    • Second, the module applies FFT to estimate dominant periods and uses them to segment the time series into periodic windows.
  • MGWM (Multi-Scale Group Window Mixing):
    This module performs joint feature extraction along both the temporal and variable dimensions.

    • First, in the temporal dimension, it applies multi-scale depthwise convolutions using multiple kernel sizes.

    • Next, in the variable dimension, it performs group-wise intra/inter pointwise convolutions to capture both within-group and across-group interactions.

  • PWSM (Periodic Window Shifting and Merging):
    This module includes two components:

    • Period Merging, which fuses features across adjacent windows to enhance temporal continuity.

    • Period Shifting, which introduces dynamic offsets to the window boundaries, helping the model adapt to imperfect or misaligned periodicity.

We hope this explanation provides a clearer understanding of our model architecture and its components.


W2. Line 142: the final loss includes $\mathcal{L}_{\text{task}}$; in fact, the experimental validation appears limited to classification tasks.

A2. We apologize for the confusion. To clarify, the SGN model is primarily designed for multivariate time series classification.

  • $\mathcal{L}_{\text{task}}$ corresponds to the classification loss, which serves as the main supervision signal. We will correct the symbols in subsequent versions.

  • $\mathcal{L}_{\text{sim}}$ is a similarity-based regularization term, which encourages more structured and meaningful variable grouping by penalizing inconsistent similarity patterns.
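A sketch of the combined objective as we read it: the classification loss plus a β-weighted similarity regularization term. The β weighting and function signature are assumptions for illustration (the β = 0.1 default mirrors the best value in the β ablation reported above).

```python
import torch
import torch.nn.functional as F

def total_loss(logits: torch.Tensor, labels: torch.Tensor,
               sim_reg: torch.Tensor, beta: float = 0.1) -> torch.Tensor:
    task = F.cross_entropy(logits, labels)  # L_task: main supervision signal
    return task + beta * sim_reg            # L_sim, weighted by beta
```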


Q1. The proposed PWSM module exhibits strong dependence on periodic assumptions through its FFT-based windowing mechanism. This architectural choice may degrade model performance when processing (a) non-periodic signals (e.g., white noise), (b) transient events (e.g., fault spikes in industrial data), or (c) irregular sampling scenarios, all common in real-world time series applications. What considerations did the author have regarding this?

A3. Thanks for your valuable insight. We would like to respond from the following two perspectives:

  1. In the UEA datasets used in Table 7 of Appendix C.3, there are several datasets with few variables and weak periodicity, such as EthanolConcentration and Handwriting, each containing only three variables. Despite these challenges, our method consistently outperforms baseline models, demonstrating its robustness in low-dimensional and weakly periodic settings.
  2. While SGN is designed to leverage periodic patterns, it does not rely rigidly on periodicity. Instead, we incorporate a dynamic shifting mechanism that enables local context fusion across windows, allowing the model to remain effective even when the underlying periodic structure is weak or inconsistent.

In addition, we have included more multivariate datasets and conducted comparisons against a broader range of state-of-the-art multivariate methods to further validate the effectiveness of our approach. The specific results are shown in the full-UEA comparison table reproduced in our response to the first review above. Wins/Draws/Losses indicate the number of datasets (out of 30) on which SGN achieves higher, equal, or lower accuracy than the corresponding baseline method. As shown, SGN consistently achieves a greater number of first-place results and a better average rank than state-of-the-art methods, demonstrating its robustness and generalization across diverse datasets.

Comment

The authors addressed my concerns.

评论

Thank you for your careful review of our work. We are glad to have addressed your concerns and sincerely appreciate your recognition and constructive feedback. We will incorporate the corresponding updates in the revised version of the paper.

Review (Rating: 5)

The paper introduces SwinGroupNet (SGN), a novel framework for multivariate time series (MTS) classification that explicitly models both intra- and inter-variable dependencies using a grouped embedding approach and hierarchical temporal feature extraction. By combining three components—Variable Group Embedding (VGE), Multi-Scale Group Window Mixing (MGWM), and Periodic Window Shifting and Merging (PWSM)—SGN aims to preserve semantic structures, address heterogeneous variable types, and exploit periodic temporal patterns. Experimental results on multiple benchmark datasets show promising improvements over prior work.

Strengths and Weaknesses

Strengths

  1. Clear Motivation: The paper convincingly discusses the limitations of univariate decomposition and full multivariate modeling approaches in MTS.
  2. Novel Architecture: The use of group-based embeddings and hierarchical temporal modeling (through PWSM and MGWM) is an interesting and original idea.
  3. Strong empirical results: Demonstrates consistent state-of-the-art performance across diverse benchmarks with a reported 4.2% improvement on average.
  4. Code Availability: Open-sourcing the implementation increases reproducibility and community impact.

Weaknesses

  • Architectural Complexity Without Computational Justification

    While SGN demonstrates strong performance, the proposed design—incorporating Variable Group Embedding (VGE), Multi-Scale Group Window Mixing (MGWM), and Periodic Window Shifting and Merging (PWSM)—introduces considerable architectural complexity. However, the paper does not provide any analysis of computational cost (e.g., training/inference time, parameter count, FLOPs, or GPU memory usage), which is critical when evaluating practical deployment, especially for long multivariate sequences. Without this, it is difficult to assess the efficiency vs. performance tradeoff compared to simpler or more lightweight baselines.

  • Lack of Controlled Comparison With Existing Channel Dependency Methods

    Although the motivation for capturing inter-variable dependency is valid, the paper does not thoroughly compare SGN with alternative methods for modeling channel correlations. For instance, prior work like MSGNet (AAAI 2024) has explored efficient multi-scale inter-series correlation modeling. Additionally, no study is conducted using a common architectural backbone to test whether simpler methods (e.g., channel attention, global attention, temporal convolutions) underperform consistently compared to SGN.

  • Scalability: No discussion is included on the computational complexity, especially for large MTS with many variables and long sequences.

Questions

1. How does the computational cost (in terms of FLOPs, training time, inference time, and memory usage) of SGN compare to the baseline models?
2. Does the model generalize well to datasets with fewer variables or less clear periodic structure? The reliance on periodic patterns in PWSM might not apply broadly.

Limitations

No.

Final Justification

The authors have clarified additional details about the implementation and baseline comparisons.

Formatting Issues

No

Author Response

Many thanks for your valuable and encouraging feedback. We sincerely appreciate that you found our paper to be well-motivated, with a novel architecture and strong empirical performance. We would like to provide further clarifications to address your comments in more detail.


W1. Architectural Complexity Without Computational Justification

A1. We appreciate the reviewer's concern regarding computational efficiency. For the input time series data $X \in \mathbb{R}^{d_m \times \text{group} \times L}$, where $d_m$ denotes the embedding dimension and $L$ is the temporal length (with $L \gg d_m$), we adopt a combination of depthwise and pointwise convolutions for feature extraction. This design allows for efficient modeling with time complexity $\mathcal{O}(L \cdot d_m^2)$ and space complexity $\mathcal{O}(d_m^2)$. A comparison with other methods is detailed in the table below:

| Methods | Time complexity | Space complexity |
|---|---|---|
| SGN (Ours) | $O(L d_m)$ | $O(L d_m)$ |
| TVNet | $O(L d_m^2)$ | $O(d_m^2 + L d_m)$ |
| MICN | $O(L d_m^2)$ | $O(L d_m^2)$ |
| FEDformer | $O(L)$ | $O(L)$ |
| Crossformer | $O\left(\frac{d_m}{L_{\text{seg}}^2} L^2\right)$ | $O(L d_m)$ |

We have also conducted a detailed comparison with baseline methods in terms of FLOPs and GPU memory usage, as shown in Appendix E. The results confirm that our approach achieves favorable computational efficiency while maintaining strong performance.
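As a concrete illustration of why the depthwise + pointwise factorization is cheap, here is a small PyTorch sketch comparing parameter counts against a standard convolution; the channel and kernel sizes are arbitrary examples, not the paper's configuration.

```python
import torch.nn as nn

def n_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

c, k = 128, 7                                      # illustrative channels / kernel size
standard = nn.Conv1d(c, c, k, padding=k // 2)      # ~ c*c*k weights
separable = nn.Sequential(
    nn.Conv1d(c, c, k, padding=k // 2, groups=c),  # depthwise: ~ c*k weights
    nn.Conv1d(c, c, kernel_size=1),                # pointwise: ~ c*c weights
)
print(n_params(standard), n_params(separable))     # 114816 vs 17536
```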


W2. Lack of Controlled Comparison With Existing Channel Dependency Methods.

A2. We sincerely appreciate your insightful suggestion. Following your advice, we have included additional baseline methods for a more comprehensive comparison. Specifically, we categorized these methods into the following groups:

  • Multi-scale methods: MSGNet, MedFormer
  • Channel attention methods: iTransformer, Crossformer
  • Global attention methods: Transformer, PatchTST
  • Temporal convolution methods: TCN, ModernTCN

This categorization allows for a more structured evaluation, and the extended comparisons further demonstrate the effectiveness and robustness of our proposed method. The specific results are shown in the following table:

| Model | TDBRAIN Acc. | TDBRAIN F1 | FLAAP Acc. | FLAAP F1 | UCI-HAR Acc. | UCI-HAR F1 | PTB-XL Acc. | PTB-XL F1 |
|---|---|---|---|---|---|---|---|---|
| MSGNet | 85.64 | 85.63 | 74.38 | 74.01 | 92.43 | 92.84 | 69.96 | 57.64 |
| Medformer | 89.62 | 89.62 | 74 | 73.84 | 90.17 | 90.27 | 72.87 | 62.02 |
| Crossformer | 81.56 | 81.5 | 76.33 | 76.14 | 90.66 | 90.68 | 73.3 | 62.59 |
| iTransformer | 74.67 | 74.65 | 75.83 | 75.57 | 93.47 | 93.46 | 69.28 | 56.2 |
| TCN | 88.5 | 88.42 | 76.11 | 75.79 | 92.72 | 92.66 | 72.67 | 62.04 |
| ModernTCN | 87.6 | 87.54 | 71.66 | 71.37 | 92.75 | 92.8 | 72.85 | 61.33 |
| Transformer | 87.17 | 87.1 | 74.14 | 73.71 | 90.68 | 90.69 | 70.59 | 59.05 |
| PatchTST | 79.25 | 79.2 | 56.23 | 55.57 | 86.83 | 87.17 | 73.23 | 62.61 |
| SGN (Ours) | 99.9 | 99.9 | 80.81 | 80.35 | 95.62 | 95.64 | 73.8 | 63.43 |

W3. No discussion is included on the computational complexity, especially for large MTS with many variables and long sequences.

A3. We sincerely thank the reviewer for the thoughtful suggestion. Our SGN model effectively reduces computational complexity by combining periodic windowing and fusion mechanisms for long sequences, along with the use of depthwise and pointwise convolutions. As shown in Table 7 of Appendix C.3, our method outperforms other baselines even on challenging datasets such as EthanolConcentration (with a sequence length of 1751) and FaceDetection (with 144 variables), demonstrating strong performance on both long sequences and high-dimensional multivariate data.


Q1. How does the computational cost (in terms of FLOPs, training time, inference time, and memory usage) of SGN compare to the baseline models?

A4. We have provided the corresponding response in A1 of W1.


Q2. Does the model generalize well to datasets with fewer variables or less clear periodic structure? The reliance on periodic patterns in PWSM might not apply broadly.

A5. Thanks for your valuable insight. We would like to respond from the following two perspectives:

  1. In the UEA datasets used in Table 7 of Appendix C.3, there are several datasets with few variables and weak periodicity, such as EthanolConcentration and Handwriting, each containing only three variables. Despite these challenges, our method consistently outperforms baseline models, demonstrating its robustness in low-dimensional and weakly periodic settings.
  2. While SGN is designed to leverage periodic patterns, it does not rely rigidly on periodicity. Instead, we incorporate a dynamic shifting mechanism that enables local context fusion across windows, allowing the model to remain effective even when the underlying periodic structure is weak or inconsistent.
Comment

Thanks to the authors for the detailed response. While it addressed part of my concerns, my issue regarding a unified backbone has yet to be addressed; I will keep my score.

Comment

We sincerely thank the reviewer for the valuable comments and constructive feedback. We would like to clarify that we implemented our method and all the baseline models within the unified framework provided by the Time-Series-Library project from Tsinghua University, following the same code structure to ensure fairness and consistency in experimental settings.

Comment

Dear Reviewer,

We would like to kindly follow up regarding our rebuttal. We value your feedback greatly and would appreciate it if you could share any additional comments or questions when convenient. We are happy to provide further clarifications, experiments, or supporting materials at any time to facilitate the discussion.

Thank you very much for your time and consideration.

Final Decision

The paper proposes a method for multivariate timeseries classification which works by grouping features, modeling inter and intra-group dependencies and learning multi-scale features. Most reviewers praised the paper's novelty, strong empirical results and clarity of presentation.

The comparison against other channel dependency methods is appreciated, as are the additional experimental results presented in response to reviewers eNCr and 6yyx.

During the response period, the authors addressed the reviewer's concerns, leading to a unanimous decision to accept the paper.