PaperHub

Score: 6.8/10 · Poster · 4 reviewers (min 3, max 5, std 0.8)

Ratings: 5, 5, 3, 4 · Confidence: 3.3

Novelty: 2.5 · Quality: 2.5 · Clarity: 2.8 · Significance: 2.8

NeurIPS 2025

SIFusion: A Unified Fusion Framework for Multi-granularity Arctic Sea Ice Forecasting

Submitted: 2025-05-10 · Updated: 2025-10-29

Abstract

Keywords
Arctic Sea Ice Forecasting · Multi-granularity Fusion

Reviews and Discussion

Official Review

Rating: 5

This paper addresses forecasting pan-Arctic sea ice concentration (SIC). Unlike prior (deep learning-based) methods, the proposed method takes into account several time scales from short-term fluctuations and long-term trends. Particularly, it embeds time-series data of three time scales independently, which are then processed by Transformer encoders. The attention block of the Transformer encoder captures interactions between time scales, while the subsequent feedforward network applies a nonlinear transformation to each time scale independently. The experiments show that the proposed method better predicts the sea ice extent (SIE) even when unusual increases in Arctic sea ice occur. The quantitative evaluations also exhibit the superiority of the proposed method.
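A minimal numpy sketch of the design described above (illustrative only, not the authors' code; all shapes, names, and weights are assumptions): each granularity is embedded into its own token sequence, a shared attention block runs over the concatenated tokens to capture inter-scale interactions, and the feedforward transform is then applied to each scale independently.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # token dimension (illustrative)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(x):
    # single-head self-attention over ALL tokens -> inter-scale interactions
    q, k, v = (x @ w for w in (Wq, Wk, Wv))
    return softmax(q @ k.T / np.sqrt(d)) @ v

Wq, Wk, Wv = (rng.normal(0, 0.1, (d, d)) for _ in range(3))
ffn = {s: rng.normal(0, 0.1, (d, d)) for s in ("daily", "weekly", "monthly")}

# independently embedded token sequences for the three granularities
tokens = {"daily": rng.normal(size=(7, d)),    # 7 daily steps
          "weekly": rng.normal(size=(8, d)),   # 8 weekly averages
          "monthly": rng.normal(size=(6, d))}  # 6 monthly averages

x = np.concatenate(list(tokens.values()))  # (21, d) joint sequence
x = x + attention(x)                       # shared attention across scales

# per-scale feedforward: split back and transform each granularity on its own
splits = np.split(x, np.cumsum([7, 8]))
out = [np.maximum(0, part @ ffn[s]) for part, s in zip(splits, ffn)]
```

The key point of the sketch is that attention mixes tokens across granularities while the feedforward step keeps them separate, mirroring the review's description.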

Strengths and Weaknesses

Strengths

  • The proposed method is designed with a clear awareness of the weakness of prior methods, namely, not considering the distinct contributions from different time scales.
  • A new deep model based on Transformer encoders with input and output embedding networks is proposed so that several time scales can be taken into account with their interplay.
  • The experiments demonstrate that the explicit introduction of multiple time scales significantly improves SIC forecasting. Table 1 shows that the proposed method (SIFusion) outperforms baseline methods that are specialized to specific time scales.

Weaknesses

Overall, this work is solid and appears to have no significant weaknesses. I raise the following questions to improve this work further.

The idea of introducing multiple time scales is intuitive, but deeper analysis may provide valuable insights. Namely:

  1. The prior U-Net-based methods only use single-granularity input, but what if we train a model on datasets containing multiple time scales in a mixture? Does it still encounter non-trivial challenges? Why can the proposed model resolve it?
  2. The embedding of input of different time scales is done by a shared spatial encoder. Are there any differences in the embedding vectors between time scales?
  3. Which time scales are more important than others? For the task with a 7-day lead time, does the longest time scale really help? Table 2 shows that training on a single granularity is insufficient, but it would be more interesting to know how the other time scales complement the target time scale.

The time complexity is also of interest. Generally speaking, taking into account multiple scales is more expensive than using a single scale.

Minor comments. Please be aware of the implication of italic fonts in math. For example, in Eq. (1), the italic LN can be read as a product of two variables L and N. If the authors mean an abbreviation, e.g., Layer Normalization (LN), it should be set in roman font as \mathrm{LN}. The same applies to MLP, z_s, and \mathrm{SIC}_{diff}.
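For illustration, the intended upright forms could be produced in LaTeX roughly as follows (a sketch, not the paper's actual source):

```latex
% variables stay italic; operator/abbreviation names are set upright
\mathrm{LN}(x), \qquad \mathrm{MLP}(z_s), \qquad \mathrm{SIC}_{\mathrm{diff}}
```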

Questions

Please refer to the three questions raised in Weaknesses.

Limitations

Yes.

Final Justification

This paper addresses forecasting pan-Arctic sea ice concentration (SIC). Unlike prior (deep learning-based) methods, the proposed method considers several time scales from short-term fluctuations and long-term trends. The experiments show that the proposed method better predicts the sea ice extent (SIE) even when unusual increases in Arctic sea ice occur. The quantitative evaluations also exhibit the superiority of the proposed method.

The rebuttal and discussion clarified my major concerns regarding the ablation study, providing evidence for the value of introducing independent time scales. Through the discussion, I understood that the proposed architecture does not offer computational efficiency gains. However, I do not consider this a critical limitation, and I therefore recommend that this work be accepted to the conference.

Formatting Concerns

No concerns on paper formatting

Author Response

Thanks for acknowledging the value of our work and our effort to explore effective neural network architectures for polar scientific research.


Response to Q1: Training on a mixture of temporal scales.

Thanks for your question. Indeed, training prior U-Net-based approaches on a dataset that mixes time scales is an intriguing idea. Because the temporal distributions of the different time scales are not identical, however, this could degrade the performance of those models. A succinct example: suppose we set the input length of the model to 7, i.e., seven temporal steps. The model would then simultaneously learn the evolving Arctic sea ice pattern from 7 daily values, 7 weekly averages, and 7 monthly averages. The mixture of means and variances from these three different time scales would produce a misleading data distribution that causes forecasting error.
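This distribution mismatch can be illustrated with a toy numpy example (a hypothetical seasonal signal, not the paper's data): averaging the same daily series over weekly or monthly windows shrinks its variance, so pooling all three scales as one training distribution mixes incompatible statistics.

```python
import numpy as np

rng = np.random.default_rng(42)
t = np.arange(720)  # ~2 years of daily steps (illustrative)
# seasonal cycle plus short-term noise, standing in for daily SIC variability
daily = np.sin(2 * np.pi * t / 365) + 0.6 * rng.normal(size=t.size)

weekly = daily[:714].reshape(-1, 7).mean(axis=1)    # 102 weekly averages
monthly = daily.reshape(-1, 30).mean(axis=1)        # 24 thirty-day "months"

# averaging damps the high-frequency noise: each scale has its own variance,
# so a single-scale model trained on the pooled data sees a misleading mixture
print(daily.var(), weekly.var(), monthly.var())
```

Under these assumptions the daily series has clearly larger variance than its weekly and monthly averages, which is the mismatch the response describes.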

In contrast, our proposed architecture mitigates this non-trivial issue by explicitly mapping encoded input sea ice data sequences from different temporal scales into independent sequential tokens. Therefore, the unique temporal variation pattern of a specific time scale is preserved rather than mixed with patterns from other time scales. Additionally, by adopting the proposed approach, our model could exploit inter-granularity correlations to improve the overall performance.

Response to Q2: Embedding vectors of different time scales.

Many thanks for raising this point. Based on the analysis in Q1, we know that the temporal distributions of different time scales are distinct from each other. In contrast, the spatial structure across different time scales is similar. For instance, given a monthly average sea ice map and a daily sea ice map sampled from the same month, it would be difficult to tell which one represents the averaged value and which one the daily value. The same holds for the comparison between daily values and weekly averages, or between monthly and weekly averages. This is because all time scales spatially represent the same pan-Arctic area. The variation caused by averaging along the time axis does not drastically change the spatial structure or impede the learning of the spatial embedding. Therefore, we adopt the shared spatial encoder, which is sufficient for compressing sea ice data from different time scales into a latent feature space. Additional benefits are a simpler architecture and a reduction in the overall parameter count.

Response to Q3: Complement of different time scales.

We appreciate your insightful question. To further validate the importance of a specific time scale, we conduct experiments under the following settings:

  1. Training set period: from the year 2000 to 2013; the validation and test periods remain unchanged.
  2. We trained baselines of the three single granularities, and SIFusion with all three time scales, on this training dataset.
  3. We trained three two-scale combinations: Daily-Weekly (Model-1), Weekly-Monthly (Model-2), and Daily-Monthly (Model-3). The overall architecture is the same as SIFusion, except that one granularity is removed from the modeling process.

The experiment results are organized as follows:

Performance at Daily Scale:

| Model | RMSE | MAE | R Squared | NSE | SIE Difference |
| --- | --- | --- | --- | --- | --- |
| Model-1: Daily-Weekly (7 Days Daily) | 0.0667 | 0.0120 | 0.9884 | 0.9863 | 0.109 |
| Model-3: Daily-Monthly (7 Days Daily) | 0.0610 | 0.0139 | 0.9861 | 0.9836 | 0.129 |
| 7 Days Daily Baseline | 0.0823 | 0.0200 | 0.9670 | 0.9613 | 0.285 |
| SIFusion (Daily Scale) | 0.0261 | 0.0132 | 0.9883 | 0.9862 | 0.087 |

Adding longer-term predictions to the daily time scale improves the daily forecasts. Since both weekly and monthly data contain the preceding long-term trends, they are expected to be useful and to facilitate initialization of the Arctic sea ice state. Considering the lower SIE error and higher R Squared of Model-1, the daily data may align more naturally with the weekly granularity than with the monthly one.
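For reference, the reported metrics could be computed along the following lines (a hedged sketch; the authors' exact conventions and grid-cell areas may differ — here R Squared is taken as squared Pearson correlation, NSE as the Nash-Sutcliffe efficiency, and SIE uses the 15% concentration threshold mentioned later in the discussion, with a hypothetical 25 km cell size):

```python
import numpy as np

def rmse(obs, pred):
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

def mae(obs, pred):
    return float(np.mean(np.abs(obs - pred)))

def r_squared(obs, pred):
    # squared Pearson correlation (one common convention; the paper may differ)
    return float(np.corrcoef(obs.ravel(), pred.ravel())[0, 1] ** 2)

def nse(obs, pred):
    # Nash-Sutcliffe efficiency: 1 - SS_res / SS_tot w.r.t. the observed mean
    return float(1 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2))

def sie(sic, cell_area_km2=625.0):
    # sea ice extent: total area of cells with concentration >= 15%
    # (25 km x 25 km cells assumed purely for illustration)
    return float((sic >= 0.15).sum() * cell_area_km2)  # km^2

obs = np.array([0.0, 0.2, 0.5, 0.9])   # toy SIC values
pred = np.array([0.1, 0.2, 0.4, 0.9])
```

The SIE Difference column would then be the absolute difference between `sie(pred)` and `sie(obs)`, aggregated over forecast dates.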

Performance at Weekly Scale:

| Model | RMSE | MAE | R Squared | NSE | SIE Difference |
| --- | --- | --- | --- | --- | --- |
| Model-1: Daily-Weekly (8 Weeks' Average) | 0.0658 | 0.0160 | 0.9743 | 0.9693 | 0.269 |
| Model-2: Weekly-Monthly (8 Weeks' Average) | 0.0719 | 0.0154 | 0.9697 | 0.9638 | 0.203 |
| 8 Weeks' Average Baseline | 0.0775 | 0.0177 | 0.9671 | 0.9624 | 0.291 |
| SIFusion (Weekly Scale) | 0.0639 | 0.0132 | 0.9650 | 0.9582 | 0.253 |

We find that predicting at the weekly time scale alongside either shorter or longer time scales is effective for improving the overall performance. This could be a benefit of the additional training data. By combining the weekly and monthly granularities, Model-2 offers RMSE and MAE performance close to the SIFusion model, and it achieves more accurate sea ice extent identification.

Performance at Monthly Scale:

| Model | RMSE | MAE | R Squared | NSE | SIE Difference |
| --- | --- | --- | --- | --- | --- |
| Model-2: Weekly-Monthly (6 Months' Average) | 0.0698 | 0.0190 | 0.9144 | 0.9075 | 0.299 |
| Model-3: Daily-Monthly (6 Months' Average) | 0.0589 | 0.0121 | 0.8499 | 0.8393 | 0.315 |
| 6 Months' Average Baseline | 0.0714 | 0.0167 | 0.8859 | 0.8766 | 0.346 |
| SIFusion (Monthly Scale) | 0.0640 | 0.0133 | 0.8870 | 0.8788 | 0.251 |

For forecasting at the monthly scale, we find that adding the daily granularity is slightly less beneficial than adding the weekly averages.

Overall, from the above experiments, we find that longer-term temporal granularities tend to provide useful information that improves the overall performance. Adding an extra time scale to the model is rewarding, which supports our proposed approach.

Response to the Increased Training Time

Based on the above experiments, we collect the training time of all models to assess the overhead of introducing extra temporal scales:

Training Time of Experiments

| Model | Training Time (GPU hours) |
| --- | --- |
| 7 Days Daily Baseline | 16.6 |
| 8 Weeks' Average Baseline | 18.3 |
| 6 Months' Average Baseline | 14.5 |
| Daily-Weekly | 38.9 |
| Weekly-Monthly | 33.1 |
| Daily-Monthly | 31.9 |
| SIFusion (All three time scales) | 49.3 |

From the above table, we can see that the training time grows almost linearly with the number of temporal granularities, which is reasonable since the models process proportionally more data.

Response to Minor Comments

Thanks for your kind suggestion. We will revise the italic fonts in equations to accurately convey their meanings.


Once more, thank you for acknowledging the contribution of our work. We look forward to your active participation in the discussion and will respond promptly.

Comment

Thank you for your response. I have a few follow-up comments; please see them below. Overall, I'm willing to keep my scores, while my final judgment may be affected by the outcome of the communication between the authors and the other reviewers.


Response to Q1

The example assumes a naive mixture without time-scale info. This will fail for sure, as the problem is ill-defined - the model has no clue to distinguish, e.g., daily and weekly sequences. I assume some straightforward injection of the time scale, e.g., by addition or concatenation of a position vector that represents the time scale. My original question asks about the advantage of the proposed architecture over the simplest but reasonable baseline (i.e., mixture + simple injection of time-scale info that requires no architectural change).


Response to the Increased Training Time

Thank you for the supplementary information. We have three scales and three times more training time (7 Days Daily Baseline vs. SIFusion). This means the proposed architecture has no gain in time efficiency. So, SIFusion should beat all the time-scale-specialized models simultaneously (Table 1 shows this is indeed the case); otherwise, training three specialized models is more practical, as they can be trained in parallel and re-trained efficiently.

Comment

Response to Q1:

Thank you for your speedy response, the kind explanation, and the insightful suggestion of directly injecting time-scale information. We acknowledge the ill-posed problem setting in our previous response. We re-designed the training settings to further explore the impact of a mixture of time scales on the model's performance. The time information corresponding to a unique time scale is first embedded into tokens, then injected into the spatially tokenized SIC data. The newly generated tokens now carry information both about the variation of sea ice and about which time scale the SIC data represents. These tokens are input to the sequential modeling backbone to generate future predictions.
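A minimal numpy sketch of this injection (illustrative assumptions throughout; the actual model presumably uses learned embeddings inside its backbone): one embedding vector per time scale is added to every spatial token, so mixed-scale batches remain distinguishable.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16  # token dimension (illustrative)

# one (notionally learned) embedding vector per time scale
scale_emb = {"daily": rng.normal(size=d),
             "weekly": rng.normal(size=d),
             "monthly": rng.normal(size=d)}

def inject(spatial_tokens, scale):
    # add the scale's embedding to every spatially tokenized SIC step,
    # marking which temporal granularity each token came from
    return spatial_tokens + scale_emb[scale]

daily_tokens = rng.normal(size=(7, d))   # 7 tokenized daily SIC maps
weekly_tokens = rng.normal(size=(8, d))  # 8 tokenized weekly averages
mixed_batch = np.concatenate([inject(daily_tokens, "daily"),
                              inject(weekly_tokens, "weekly")])
```

The `mixed_batch` would then be fed to a single shared backbone, which is the "mixture + simple injection" baseline the reviewer asked about.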

We have trained four models under the following mixture of time scale settings:

  1. Model-1: Trained on a mixture of 7-day daily and 8 weeks' average data.
  2. Model-2: Trained on a mixture of 8 weeks' average and 6 months' average data.
  3. Model-3: Trained on a mixture of 7-day daily and 6 months' average data.
  4. Model-4: Trained on a mixture of all three time scales.

We have attached the evaluation results below.

Performance at Daily Scale:

| Model | RMSE | MAE | R Squared | NSE | SIE Difference |
| --- | --- | --- | --- | --- | --- |
| Model-1: Mixture of Daily-Weekly (7 Days Daily) | 0.1441 | 0.0490 | 0.8917 | 0.8722 | 0.298 |
| Model-3: Mixture of Daily-Monthly (7 Days Daily) | 0.1448 | 0.0490 | 0.9117 | 0.8958 | 0.399 |
| Model-4: Mixture of All Time Scales (7 Days Daily) | 0.1659 | 0.0685 | 0.7833 | 0.7440 | 0.509 |
| 7 Days Daily Baseline | 0.0823 | 0.0200 | 0.9670 | 0.9613 | 0.285 |
| SIFusion (Daily Scale) | 0.0261 | 0.0132 | 0.9883 | 0.9862 | 0.087 |

From the table above, we find that even with direct injection of embedded time-scale information, the mixture of training data causes a noticeable performance drop in daily-scale prediction (SIFusion vs. Model-4). Specifically, the daily-weekly mixture (Model-1) suffers a less severe performance drop than the other mixture settings (Model-3 & Model-4).

Performance at Weekly Scale:

| Model | RMSE | MAE | R Squared | NSE | SIE Difference |
| --- | --- | --- | --- | --- | --- |
| Model-1: Mixture of Daily-Weekly (8 Weeks' Average) | 0.1410 | 0.0479 | 0.8784 | 0.8546 | 0.337 |
| Model-2: Mixture of Weekly-Monthly (8 Weeks' Average) | 0.1414 | 0.4757 | 0.8533 | 0.8247 | 0.424 |
| Model-4: Mixture of All Time Scales (8 Weeks' Average) | 0.2145 | 0.1064 | 0.4427 | 0.3372 | 1.969 |
| 8 Weeks' Average Baseline | 0.0775 | 0.0177 | 0.9671 | 0.9624 | 0.291 |
| SIFusion (Weekly Scale) | 0.0639 | 0.0132 | 0.9650 | 0.9582 | 0.253 |

Similar to the models' performance at the daily time scale, their weekly performance under the various mixtures of time-scale data is also inferior to our SIFusion.

Performance at Monthly Scale:

| Model | RMSE | MAE | R Squared | NSE | SIE Difference |
| --- | --- | --- | --- | --- | --- |
| Model-2: Mixture of Weekly-Monthly (6 Months' Average) | 0.1334 | 0.0453 | 0.4686 | 0.4317 | 2.209 |
| Model-3: Mixture of Daily-Monthly (6 Months' Average) | 0.1343 | 0.0456 | 0.4254 | 0.3853 | 2.295 |
| Model-4: Mixture of All Time Scales (6 Months' Average) | 0.1337 | 0.4562 | 0.5011 | 0.4665 | 2.163 |
| 6 Months' Average Baseline | 0.0714 | 0.0167 | 0.8859 | 0.8766 | 0.346 |
| SIFusion (Monthly Scale) | 0.0640 | 0.0133 | 0.8870 | 0.8788 | 0.251 |

As for the performance at the monthly scale, the mixture of time-scale data greatly impedes the learning of useful monthly information in all settings (Model-2 to Model-4).

Overall, according to these additional experiments, we find that learning simultaneously from different temporal distributions is non-trivial and challenging.

Response to Increased Training Time:

We fully agree that the increase in training time should bring benefits to the task to some extent. The primary motivation of our work is to explore the fusion of different temporal scales and to advance the forecasting skill of sea ice models. Since sea ice forecasts at the individual time scales are steadily approaching the upper bound of forecast skill, our proposed SIFusion framework pushes this upper bound further through multi-granularity fusion while maintaining reasonable inference speed. We believe this trade-off is acceptable.


Sincere thanks for your comments and your patience while we completed the additional experiments. We look forward to your reply. Since we are now in the Author-Reviewer Discussion phase, please do not hesitate to raise any additional questions you may have. We are committed to addressing all reviewer concerns and will ensure timely responses. Thank you once again for your contributions to our work.

Official Review

Rating: 5

Sea ice forecasting is critically important geopolitically, for transport, for ecosystems, and because of its effect on the long-term climate. AI models for sea ice forecasting have recently been developed over multiple different time scales (medium-range, sub-seasonal, and seasonal) which are competitive with, or outperform, the traditional physics-based modelling approaches at orders of magnitude smaller computational cost. Previous AI approaches have largely focussed on a single forecasting lead time and they have used inputs of just one temporal granularity. Due to the multiple time-scales of relevance for sea-ice forecasting, this paper takes a different approach, using inputs with a range of different granularities (daily, weekly or monthly averages of sea ice concentration) and predicting at medium-range, sub-seasonal, and seasonal time scales. The approach improves over strong baselines in several sea ice forecasting tests.

Strengths and Weaknesses

Strengths

I enjoyed reading the paper. I like the central idea of a single model operating across all time-scales and granularities. I found the results in the paper impressive. I like the figures — it’s clear that a lot of work has gone into them — figure 3 is especially helpful for understanding the architecture. It was good that the paper compared alternatives to their multi-granular fusion module. The paper was pretty clear.

Weaknesses

I found the description of the architecture quite dense and hard to get through. The figure definitely helped, but I felt that the architecture involved many standard deep learning blocks arranged in a sensible way, rather than a major innovation. To be clear, I don’t see this as an issue — I think the major contribution is the idea to model many different temporal scales jointly and to show substantial improvement in the experiments — rather than a specific architecture that is necessarily the only one capable of doing this. However, I wonder whether the model description could be improved.

I might have expected some comparison to physical models in the results section. I suspect that in some cases the baseline AI approaches which are compared to have been shown to go beyond the physical models already, and so improving on the AI models entails further improvement over the physical models, but it would be useful to state what is known about this.

In the limitations section, there is a reference to climate variables. I think the authors rather mean that the model predictions could be improved by coupling the sea ice models to medium-range, sub-seasonal, and seasonal weather forecasts. Meteorological variables, particularly wind speed and temperature, could be added as input variables. I would interpret "climate" as referring to time scales beyond 1 year.

Minor comments

line 84 "models usually rely on the high-performance computing of the CPU cluster “ -> "models usually require a high-performance CPU cluster”

line 89 "the temporal information inherent in sea ice modeling can not be fully exploited” - I think that this needs to be justified or backed up with a citation

line 93 - I found the discussion of multi-scale representations a bit confusing. On the one hand there is discussion of modelling multiple “spatial” scales in vision. Then this is connected to modelling multiple temporal scales for sea ice, but isn't it also important to capture multiple spatial scales in sea ice modelling? I’m not sure why this isn’t mentioned as it appears to be the more natural connection. Moreover, in weather modelling which is also arguably a similar multi-scale phenomenon, state-of-the-art models like GenCast or Aurora only condition on the last two six-hour snapshots of the atmosphere, but there are of course a range of temporal dynamical behaviour with different spatial scales which these methods capture (e.g. from tropical cyclones to the MJO). So it’s not strictly necessary to explicitly have multi-resolution temporal information in the input. At least in principle, as the physics are Markovian, if we did have the complete state, then it wouldn’t be necessary to have access to other inputs. In the light of this, I think the argumentation needs to be tightened up in this section.

line 122 "ill-posed properties” this isn’t the right term here - perhaps just say that the UNet is “ill-suited” to sequence modeling and spatial-channel fusion

line 215 "provided in Appendix” -> "provided in the Appendix”

line 236 "we adopt performance of sub-seasonal forecasting methods as SICNet90 [36], IceFormer [18], and seasonal forecasting methods IceNet [7], MT-IceNet [16] that reported in the original paper for reference.” -> "we use the performance of sub-seasonal forecasting methods SICNet90 [36] and IceFormer [18], and seasonal forecasting methods IceNet [7] and MT-IceNet [16] that is reported in the original papers."

The term “variate” is a bit over-used in my opinion and “variable” would be simpler, clearer, and more usual.

Questions

It seems to me that there are at least two different reasons why the model might be performing well. First, the multi-scale information contains more information than one modality alone and this improves forecasts. Second, training across multiple different temporal scales learns a better shared representation of sea ice which facilitates better prediction. Do we know which one of these is the main contribution? E.g. I could imagine for the longer term predictions, the second reason might be the dominant one — would pre-training at medium range and fine-tuning to seasonal time-scales be an even better approach in this case, a bit like the Aurora model used for weather? I think this experiment is outside of the scope of the paper, but thinking through whether there is evidence of how the model is operating and whether the pre-train and fine-tune approach would be worse, would be fruitful.

Limitations

Yes

Final Justification

I thought this was a good contribution and the authors did quite an impressive amount of work during the rebuttal period.

Formatting Concerns

None

Author Response

Thanks for valuing our work and acknowledging the effectiveness of the proposed approach.


Response to Q1: The main contributing time scale.

We agree with your view that longer-term predictions contribute more to the improvement of the overall performance. For instance, suppose our model forecasts at the beginning of a melting season. On the one hand, the prediction of the monthly average sea ice could provide critical hints that the sea ice extent is about to drop drastically, which would facilitate the short-term predictions. On the other hand, the input of historical monthly averages also provides the model with information about the amount and variation of sea ice during the winter season, which is correlated with the strength of the albedo and the energy exchange in the pan-Arctic region. This helps the short-term prediction set up a correct initial state.

Response to Q2: Exploring pre-training and fine-tuning.

We appreciate your insightful question, and we fully agree that the pre-train and fine-tune approach is definitely worth exploring. Hence, we set up the following experiment for further investigation:

  1. Training set period: from the year 2000 to 2013; the validation and test periods remain unchanged.
  2. Model parameters are reduced to suit the shorter training period, due to limited computational resources and time.
  3. Baselines of the three single granularities, and SIFusion with all three time scales.
  4. Pre-train and fine-tune experiment design:
       a. Pre-train a 7 days single-granularity model, then fine-tune on 6 months' average data.
       b. Pre-train an 8 weeks' average model, then fine-tune on 6 months' average data.
  5. We additionally trained two models for comparison: SIFusion and the single-granularity model that operates at the monthly scale.

We report the experiment results as follows.

Performance at Monthly Scale:

| Model | RMSE | MAE | R Squared | NSE | SIE Difference |
| --- | --- | --- | --- | --- | --- |
| Pre-train on 7 Days, Fine-tune on 6 Months' Average | 0.1147 | 0.0396 | 0.5630 | 0.5362 | 1.89 |
| Pre-train on 8 Weeks' Average, Fine-tune on 6 Months' Average | 0.0544 | 0.0129 | 0.9441 | 0.9400 | 0.283 |
| 6 Months' Average Baseline | 0.0714 | 0.0167 | 0.8859 | 0.8766 | 0.346 |
| SIFusion (Monthly Scale) | 0.0640 | 0.0133 | 0.8870 | 0.8788 | 0.251 |

We find that pre-training a 7 days forecasting model and then fine-tuning on monthly data suffers a performance drop on all metrics. This could be caused by the temporal distributions of the daily and monthly scales being distinct from each other; hence fine-tuning the pre-trained model on monthly data could be difficult to converge. In the other setting, pre-training an 8 weeks' average forecasting model and then fine-tuning to predict 6 months' averages improves performance on most evaluation metrics, although the identification of sea ice extent is inferior. The success of fine-tuning in this setting could indicate that the temporal distributions of the sub-seasonal and seasonal time scales are closer than those of the daily and monthly scales; hence the forecasting skill could be further improved. We will continue investigating this approach to fully exploit the potential of the pretrain-finetune paradigm.

Response to Weaknesses: description of the proposed architecture and performance of physical models.

Sincere thanks for your kind suggestion.

  1. Improve the description of the model:

We agree that our proposed architecture is hard to get through when explained in plain text alone. The description could be improved by providing an equation, or an example of how the input data dimensions are manipulated. This would clarify the difference between our multi-granularity fusion module and the previously used vanilla Transformer.

  2. Regarding the physical model:

Since the physical sea ice forecasting model SEAS5 [1] only provides a binary index indicating the sea ice extent, predicted from the first date of each month, it supports only limited evaluation metrics (for reference, we used SEAS5 to forecast the first 216 days of 2016; the calculated SIE difference is 0.3359 million square kilometers). We therefore choose a surrogate model [2] that is commonly used to benchmark dynamical sea ice forecasting models and provides more evaluation metrics. We train this model from the dynamical-model benchmark under our experimental setting and compare it with ours. From the following tables, we can see that the dynamical-model benchmark performs well at the shortest time scale, and its performance drops as the prediction horizon extends to the seasonal scale.

Comparison with the Dynamical Model Benchmark

Performance at Daily Scale:

| Model | RMSE | MAE | R Squared | NSE |
| --- | --- | --- | --- | --- |
| Dynamical Model Benchmark | 0.0476 | 0.0081 | 0.9735 | 0.9705 |
| 7 Days Daily Baseline | 0.0823 | 0.0200 | 0.9670 | 0.9613 |
| SIFusion (Daily Scale) | 0.0261 | 0.0132 | 0.9883 | 0.9862 |

Performance at Weekly Scale:

| Model | RMSE | MAE | R Squared | NSE |
| --- | --- | --- | --- | --- |
| Dynamical Model Benchmark | 0.0897 | 0.0190 | 0.9008 | 0.8902 |
| 8 Weeks' Average Baseline | 0.0775 | 0.0177 | 0.9671 | 0.9624 |
| SIFusion (Weekly Scale) | 0.0639 | 0.0132 | 0.9650 | 0.9582 |

Performance at Monthly Scale:

| Model | RMSE | MAE | R Squared | NSE |
| --- | --- | --- | --- | --- |
| Dynamical Model Benchmark | 0.1020 | 0.0219 | 0.8892 | 0.8758 |
| 6 Months' Average Baseline | 0.0714 | 0.0167 | 0.8859 | 0.8766 |
| SIFusion (Monthly Scale) | 0.0640 | 0.0133 | 0.8870 | 0.8788 |
  3. Regarding the limitations section: Indeed, our initial intention in the limitations section was to discuss the potential of leveraging meteorological and oceanic variables correlated with sea ice to facilitate forecasting. We will revise the expression to be more precise.

Response to Minor Comments

Thank you for kindly bringing these issues to our attention. We will revise the expressions in lines 84, 89, 122, 215, and 236, as well as the use of the term "variate", accordingly.

Reference

[1] Johnson, S. J.; Stockdale, T. N.; Ferranti, L.; Balmaseda, M. A.; Molteni, F.; Magnusson, L.; Tietsche, S.; Decremer, D.; Weisheimer, A.; Balsamo, G.; et al. 2019. SEAS5: the new ECMWF seasonal forecast system. Geoscientific Model Development, 12(3): 1087–1117.

[2] Niraula, B.; and Goessling, H. F. 2021. Spatial damped anomaly persistence of the sea ice edge as a benchmark for dynamical forecast systems. Journal of Geophysical Research: Oceans, 126(12): e2021JC017784.


We thank you once again and warmly invite you to engage in the ongoing discussion. We will respond promptly.

Comment

I'm happy with the author's response and will maintain my score.

Comment

Thank you once again for dedicating your valuable time to the review. We sincerely appreciate your positive assessment of our work and the insightful suggestions that could help us further refine our paper.

Sincerely yours,

Authors.

Official Review

Rating: 3

Thanks to the authors for their contributions.

This paper proposes the SIFusion or Sea Ice Fusion framework, a transformer-based forecasting framework that unifies multi-granularity Arctic sea ice concentration (SIC) prediction. The authors claim that the existing deep learning models typically forecast SIC at a fixed temporal resolution (e.g., daily or seasonal), missing the inter-granularity dependencies that may exist between different temporal scales. SIFusion introduces a novel design that independently embeds spatial features from daily, weekly, and monthly SIC data, then fuses them through a multi-head attention-based transformer backbone. The architecture models both intra- and inter-granularity dependencies. Extensive experiments on pan-Arctic SIC data are presented. Additionally, the model demonstrates robustness under abnormal SIC trends (e.g., 2022 anomalies) and superior generalization over unseen conditions.

Strengths and Weaknesses

Strengths

  • A novel contribution for sea ice forecasting, comprising the fusion of multi-temporal information
  • Practical solution to a relevant problem, considering multi-time stamp input and multi-scale analysis for climate applications
  • Comprehensive documentation of the ablation study

Weaknesses

  • No benchmarking with the existing multi-temporal modeling frameworks is present

Questions

My questions are listed below:

  • It is not clear to me how the model will perform if additional climatic variables are added for the analysis. Do the authors perform any such ablation study to compare it with the baseline models?
  • How was the AI-ready dataset curated? It is recommended to explain the data curation steps properly for the reproducibility of research.
  • If the contribution is fusion of inter and intra-granularity, then how does the transformer architecture accommodate this?
  • What is the reason behind selecting only these time spans for the temporal granularities? Will the model still perform if I change the temporal granularities? I'm curious to see how sensitive the model is to different multi-temporal granularities.
  • If the idea is to use transformer architecture for leveraging multi-temporal granularity information, why did the authors not consider using pre-trained weather and climate foundation models (FM)? Is it possible to use any of the existing pre-trained FMs and map it for SIC?
  • It is hard to assess the novelty in the proposed method since it seems like a conventional spatio-temporal processing, which many transformer networks are capable of doing. How is the "independent spatial tokenization" mentioned in the contribution different?

Limitations

Yes. The authors have discussed the limitations.

Final Justification

The authors provided clarification for the multi-temporal granularity and explained their implementation of independent spatial tokenization in the discussion period. However, lacking a direct comparison with the existing prominent deep learning models is still a concerning issue. I increased my score to acknowledge their efforts in providing detailed clarifications.

Formatting Concerns

  • Small-case Headings - introduction
  • Inconsistent formatting - Page 5
Author Response

Thanks for your valuable review and comments on our work.


Response to Q1: Additional climatic variables.

Thanks for your insightful question. Adding multiple climatic variables could be effective and beneficial, as you suggested, but it also poses a non-trivial challenge for the model to learn multiple variables. For verification, we integrate correlated variables into the modeling of Arctic sea ice variation based on the SIFusion architecture. We choose three atmospheric ERA5 variables, i.e., 2-meter air temperature (t2m), the u component of 10-meter wind (u10), and the v component of 10-meter wind (v10), to investigate the impact of integrating them into our proposed architecture.

We conduct additional experiments under the following settings:

  1. Training set period: from the year 2000 to 2013; validation and test period remain unchanged.
  2. We download hourly global ERA5 data (with resolution of 128 x 256) and average 24-hour files into daily values. To geophysically align global ERA5 with pan-Arctic SIC data, we regrid and interpolate those variables to the same Arctic grid.
  3. We trained baselines of three single granularities and SIFusion with all three time scales on the training dataset.
  4. We utilize two sets of spatial encoder-decoders, one for SIC data and the other for encoding three ERA5 variables. The sequential backbone is the same as SIFusion. The input and prediction length of all ERA5 data and SIC are identical, i.e., 7 days. The additional variables are normalized to [0,1]. We denote this model as Model-1.

Experiment Results:

| Model | RMSE | MAE | R Squared | NSE | SIE Difference |
| --- | --- | --- | --- | --- | --- |
| Model-1: Trained with Selected ERA5 Variables (7 Days Daily Performance) | 0.0800 | 0.0462 | 0.9337 | 0.9157 | 0.401 |
| 7 Days Daily Baseline | 0.0823 | 0.0200 | 0.9670 | 0.9613 | 0.285 |
| SIFusion (Daily Scale) | 0.0261 | 0.0132 | 0.9883 | 0.9862 | 0.087 |

Comparing the performance of Model-1 with the 7 Days Daily baseline, we find that adding ERA5 variables yields a slightly lower RMSE but a much higher MAE. Given the lower R Squared and NSE and the larger SIE difference, the prediction error likely falls around the 15% sea ice concentration threshold used to determine sea ice extent. Adding additional variables to the model can be challenging, especially when the spatio-temporal distributions of the variables differ substantially from one another. We will further investigate the effective integration of extra climatic variables.

Response to Q2: Dataset pre-processing.

Thanks for your kind suggestion. We provide a brief summary of our pre-processing procedure to curate the AI-ready dataset and will open-source the data pre-processing code to facilitate reproduction.

  1. We download the NOAA/NSIDC Climate Data Record of Passive Microwave Sea Ice Concentration G02202 Version 4 dataset from the official website.
  2. The official data is preserved in the NetCDF format. We use the open-source Python package “netCDF4” to read data from the files with the “.nc” suffix.
  3. The “netCDF4” package provides us an API to extract sea ice concentration data by providing the official variable name, i.e., “cdr_seaice_conc”.
  4. Since the acquired SIC data contains mask values that stand for land, ocean, and lake, we filter out these values with special meanings.
  5. We calculate the weekly and monthly average of SIC based on daily values to generate the AI-ready dataset. We will add a brief summary of data curation to the dataset section.
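Steps 4-5 above can be sketched as follows. This is a minimal numpy illustration under stated assumptions: the flag values and the averaging windows here are placeholders for illustration, not the exact G02202 v4 encoding.

```python
import numpy as np

# Hypothetical illustration of steps 4-5: the actual land/ocean/lake flag
# values in G02202 v4 differ; here we simply replace them with NaN.
def mask_special_values(sic, flag_values=(2.51, 2.52, 2.53)):
    """Replace non-physical flag values with NaN so they are excluded later."""
    sic = sic.astype(float).copy()
    for v in flag_values:
        sic[np.isclose(sic, v)] = np.nan
    return sic

def temporal_average(daily_sic, window):
    """Average daily SIC maps over non-overlapping windows (7 -> weekly)."""
    t, h, w = daily_sic.shape
    n = t // window
    trimmed = daily_sic[: n * window].reshape(n, window, h, w)
    return np.nanmean(trimmed, axis=1)  # NaN-masked cells are ignored

daily = np.random.rand(28, 4, 4)        # 28 synthetic daily SIC maps
weekly = temporal_average(mask_special_values(daily), window=7)
print(weekly.shape)                     # (4, 4, 4): four weekly averages
```

A monthly average would use the same routine with a window of roughly 30 days.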

Response to Q3: Intra- and inter-granularity.

Thanks for raising this point. In our SIFusion architecture, the modeling of intra-granularity is naturally achieved by two steps:

  1. Spatial patterns of sea ice are tokenized for each time step within a specific granularity.
  2. Temporally concatenated spatial tokens are further embedded as one holistic token that represents the temporal pattern of the sea ice sequence.

For inter-granularity modeling, sea ice variations at each temporal granularity are first independently embedded as a single token. Multi-head self-attention is then applied across the granularity dimension to explicitly capture pairwise correlations.
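A minimal numpy sketch of this inter-granularity step, assuming a single attention head with random projections (our actual model uses multi-head attention inside Transformer encoders):

```python
import numpy as np

# Toy single-head self-attention across the granularity axis: each of the
# G=3 granularity tokens (daily/weekly/monthly) attends to the others.
def granularity_attention(tokens, rng=np.random.default_rng(0)):
    """tokens: [B, G, D] -- one embedded token per temporal granularity."""
    b, g, d = tokens.shape
    wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)   # [B, G, G] pairwise
    attn = np.exp(scores - scores.max(-1, keepdims=True))
    attn /= attn.sum(-1, keepdims=True)              # softmax over granularities
    return attn @ v                                  # [B, G, D] fused tokens

tokens = np.random.rand(2, 3, 8)   # batch of 2; daily/weekly/monthly tokens
fused = granularity_attention(tokens)
print(fused.shape)                 # (2, 3, 8)
```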

Response to Q4: Change of the temporal granularities.

Thanks for your intriguing question. Firstly, we choose to investigate the fusion of daily, weekly, and monthly granularities based on the sea ice research literature. Since these three granularities are commonly studied scales, our motivation is to further exploit the inter-granularity correlations between those time scales and to achieve simultaneous forecasting at all scales. Secondly, to investigate the sensitivity to different granularities and quantitatively enhance the understanding of adding or removing a specific granularity, we train three combinations: Daily-Weekly (Model-2), Weekly-Monthly (Model-3), and Daily-Monthly (Model-4). The overall architecture is similar to SIFusion, except that one granularity is removed from the modeling process. The experiment results are organized as follows:

Performance at Daily Scale:

| Model | RMSE | MAE | R Squared | NSE | SIE Difference |
| --- | --- | --- | --- | --- | --- |
| Model-2: Daily-Weekly (7 Days Daily) | 0.0667 | 0.0120 | 0.9884 | 0.9863 | 0.109 |
| Model-4: Daily-Monthly (7 Days Daily) | 0.0610 | 0.0139 | 0.9861 | 0.9836 | 0.129 |
| 7 Days Daily Baseline | 0.0823 | 0.0200 | 0.9670 | 0.9613 | 0.285 |
| SIFusion (Daily Scale) | 0.0261 | 0.0132 | 0.9883 | 0.9862 | 0.087 |

Adding long-term predictions to the daily time scale improves the performance of daily forecasts. Since both weekly and monthly data contain previous long-term trends, they are expected to be useful and to facilitate initialization of the Arctic sea ice state. Given the lower SIE error and higher R Squared of Model-2, the 7-day daily horizon may align more naturally with the weekly granularity.

Performance at Weekly Scale:

| Model | RMSE | MAE | R Squared | NSE | SIE Difference |
| --- | --- | --- | --- | --- | --- |
| Model-2: Daily-Weekly (8 Weeks' Average) | 0.0658 | 0.0160 | 0.9743 | 0.9693 | 0.269 |
| Model-3: Weekly-Monthly (8 Weeks' Average) | 0.0719 | 0.0154 | 0.9697 | 0.9638 | 0.203 |
| 8 Weeks' Average Baseline | 0.0775 | 0.0177 | 0.9671 | 0.9624 | 0.291 |
| SIFusion (Weekly Scale) | 0.0639 | 0.0132 | 0.9650 | 0.9582 | 0.253 |

We find that pairing the weekly time scale with either a shorter or a longer time scale improves the overall performance, likely owing to the additional training data. By combining the weekly and monthly granularities, Model-3 offers RMSE and MAE performance close to the SIFusion model and achieves more accurate sea ice extent identification.

Performance at Monthly Scale:

| Model | RMSE | MAE | R Squared | NSE | SIE Difference |
| --- | --- | --- | --- | --- | --- |
| Model-3: Weekly-Monthly (6 Months' Average) | 0.0698 | 0.0190 | 0.9144 | 0.9075 | 0.299 |
| Model-4: Daily-Monthly (6 Months' Average) | 0.0589 | 0.0121 | 0.8499 | 0.8393 | 0.315 |
| 6 Months' Average Baseline | 0.0714 | 0.0167 | 0.8859 | 0.8766 | 0.346 |
| SIFusion (Monthly Scale) | 0.0640 | 0.0133 | 0.8870 | 0.8788 | 0.251 |

For forecasting at the monthly scale, adding the daily granularity is slightly less beneficial than adding weekly average predictions. Overall, these experiments show that longer-term temporal granularities tend to provide useful information that improves the overall performance, and that adding an extra time scale to the model is rewarding, which supports our proposed approach.

Response to Q5: Using pre-trained weather model.

We sincerely appreciate your insightful inquiry. Indeed, it is an innovative and quite promising idea to leverage pre-trained weather and climate foundation models to boost the overall performance. Specifically, the hidden atmospheric and oceanic dynamics learned by these foundation models are coupled with sea ice changes, and they could be leveraged to improve the forecasting skill. We will definitely look into the transfer of knowledge embedded in climate foundation models to SIC forecasting. We will add this to the discussion of future work.

Response to Q6: Independent spatial tokenization.

Thanks for your question. Most prior works that leverage the U-Net architecture concatenate variables from different time steps to construct an input batch of shape [B,C,H,W], where B is the batch size, C is the total number of concatenated daily/weekly/monthly SIC maps, and H and W are the spatial dimensions of the SIC data. There are no independent spatial and temporal encoding processes: the data are concatenated along the time axis and transformed altogether by the U-Net encoder. Different from this prior approach, the input of our proposed SIFusion has shape [B,C,1,H,W]. This means that C stays unchanged during spatial feature extraction, i.e., "independent spatial tokenization", and we obtain spatial tokens of shape [B,C,Feats]. This allows us to explicitly extract useful intra-granularity information that benefits forecasting changes of sea ice. Moreover, we can further model inter-granularity correlations.
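The shape flow can be illustrated with a toy stand-in for the spatial encoder (a shared linear projection here, whereas SIFusion uses a Swin Transformer; the dimensions are illustrative):

```python
import numpy as np

# Shape-only sketch of "independent spatial tokenization": each of the C
# time steps is encoded separately, so the C axis is preserved and we end
# up with one spatial token per time step rather than one fused volume.
def independent_spatial_tokenize(x, feats=16, rng=np.random.default_rng(0)):
    """x: [B, C, 1, H, W] -> spatial tokens [B, C, feats]."""
    b, c, _, h, w = x.shape
    proj = rng.standard_normal((h * w, feats)) / np.sqrt(h * w)
    flat = x.reshape(b, c, h * w)   # each time step kept separate
    return flat @ proj              # shared encoder applied per step

x = np.random.rand(2, 7, 1, 8, 8)   # e.g. 7 daily SIC maps per sample
tokens = independent_spatial_tokenize(x)
print(tokens.shape)                 # (2, 7, 16)
```

A U-Net-style pipeline would instead collapse the time axis into channels ([B, C, H, W]) and mix all steps in the first convolution.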

Response to weakness.

Thanks for your question. Since prior sea ice frameworks mainly focus on forecasting at a single granularity, we evaluate the proposed framework by comparing against their reported results.

Response to Formatting Issues.

We sincerely thank you for bringing these issues to our attention.

  1. We will revise the heading to 'Introduction'.
  2. We will adjust the spacing of Figure 4 to produce a consistent line of text on page 5.

Once more, sincere thanks for your review, and grateful for your insightful query. We look forward to your active participation in the discussion and we will respond promptly.

Comment

Thank you to the authors for providing detailed explanations. However, my concern remains the same: there is no comparison with other prominent deep learning models for sea-ice forecasting. I am increasing my score to 3 in recognition of the authors' detailed explanation of temporal granularity.

Comment

Thank you for raising your score for our paper.


Response to Concern:

To address your concern “no comparison with other prominent deep-learning models is provided”, we explicitly compare with MT-IceNet, a prominent approach for pan-Arctic sea-ice forecasting that leverages single- and bi-monthly SIC sequences.

We summarize the comparison of forecasting accuracy of different deep learning models in our main paper as follows (we add the additional recent weekly subseasonal DL model Unicorn, and the monthly seasonal DL model Atsicn for reference):

Performance at Daily Scale:

| Model | RMSE | MAE |
| --- | --- | --- |
| SICNet | 0.0490 | 0.0100 |
| ConvLSTM | 0.0681 | 0.0263 |
| PredRNN | 0.0594 | 0.0220 |
| SimVP | 0.0640 | 0.0238 |
| SIFusion (Daily Scale) | 0.0261 | 0.0132 |

Performance at Weekly Scale:

| Model | RMSE | MAE |
| --- | --- | --- |
| Unicorn [1] (4 Weeks' Average) | 0.071 | 0.23 |
| SIFusion (8 Weeks' Average) | 0.0639 | 0.0132 |

Performance at Monthly Scale:

| Model | RMSE | MAE |
| --- | --- | --- |
| IceNet | 0.1820 | 0.0916 |
| MT-IceNet | 0.0777 | 0.0197 |
| Atsicn [2] (Ensemble of LSTMs) | 0.0987 | NA |
| SIFusion (Monthly Scale) | 0.0640 | 0.0133 |

To the best of our knowledge, no existing sea ice forecasting deep learning models adopt a multi-temporal approach at the daily or weekly scale, which further underscores the novelty of our proposed SIFusion framework.

To further verify the effectiveness of our proposed method, we re-implement the benchmark for state-of-the-art dynamical models, namely the spatially damped anomaly persistence [3]. The comparison results using our model are attached below.

Performance at Daily Scale:

| Model | RMSE | MAE | R Squared | NSE |
| --- | --- | --- | --- | --- |
| Dynamical Model Benchmark | 0.0476 | 0.0081 | 0.9735 | 0.9705 |
| 7 Days Daily Baseline | 0.0823 | 0.0200 | 0.9670 | 0.9613 |
| SIFusion (Daily Scale) | 0.0261 | 0.0132 | 0.9883 | 0.9862 |

Performance at Weekly Scale:

| Model | RMSE | MAE | R Squared | NSE |
| --- | --- | --- | --- | --- |
| Dynamical Model Benchmark | 0.0897 | 0.0190 | 0.9008 | 0.8902 |
| 8 Weeks' Average Baseline | 0.0775 | 0.0177 | 0.9671 | 0.9624 |
| SIFusion (Weekly Scale) | 0.0639 | 0.0132 | 0.9650 | 0.9582 |

Performance at Monthly Scale:

| Model | RMSE | MAE | R Squared | NSE |
| --- | --- | --- | --- | --- |
| Dynamical Model Benchmark | 0.1020 | 0.0219 | 0.8892 | 0.8758 |
| 6 Months' Average Baseline | 0.0714 | 0.0167 | 0.8859 | 0.8766 |
| SIFusion (Monthly Scale) | 0.0640 | 0.0133 | 0.8870 | 0.8788 |

The performance figures validate that our proposed SIFusion is more skillful in all three temporal scales.

Moreover, we applied for a run of the physical sea ice forecasting model from ECMWF, i.e., SEAS5 [4]. Since SEAS5 only initializes forecasts on the first date of each month and only predicts SIE, we calculate the average SIE predicted by the physical model over the first 6 months as an additional reference:

Performance at Monthly Scale:

| Model | SIE Difference |
| --- | --- |
| SEAS5 for the first 6 months of 2016 | 0.335 |
| 6 Months' Average Baseline | 0.346 |
| SIFusion (Monthly Scale) | 0.251 |

Thank you once again for your comments. Please do not hesitate to raise any further questions; we are committed to addressing all reviewer concerns promptly.

Reference

[1] Park, J., Hong, S., Cho, Y., & Jeon, J. J. (2024). Unicorn: U-Net for Sea Ice Forecasting with Convolutional Neural Ordinary Differential Equations. arXiv preprint arXiv:2405.03929.

[2] Zhu, Y., Qin, M., Dai, P., Wu, S., Fu, Z., Chen, Z., ... & Du, Z. (2023). Deep learning‐based seasonal forecast of sea ice considering atmospheric conditions. Journal of Geophysical Research: Atmospheres, 128(24), e2023JD039521.

[3] Niraula, B., & Goessling, H. F. (2021). Spatial damped anomaly persistence of the sea ice edge as a benchmark for dynamical forecast systems. Journal of Geophysical Research: Oceans, 126(12), e2021JC017784.

[4] Johnson, S. J.; Stockdale, T. N.; Ferranti, L.; Balmaseda, M. A.; Molteni, F.; Magnusson, L.; Tietsche, S.; Decremer, D.; Weisheimer, A.; Balsamo, G.; et al. 2019. SEAS5: the new ECMWF seasonal forecast system. Geoscientific Model Development, 12(3): 1087–1117.

Review
4

This paper introduces SIFusion, a unified deep learning framework for multi-granularity Arctic Sea Ice Concentration (SIC) forecasting. The proposed method addresses the problem of fixed temporal granularity in existing works, and proposes to cultivate temporal multi-granularity (daily, weekly, and monthly specifically). The effectiveness is demonstrated empirically on real-world datasets.

Strengths and Weaknesses

The paper is rather complete, positions itself as a worth-trying method on NSIDC data.

There is improved performance in the SIC forecasting area.

The major concern is that the significance of this work is not enough for this venue. The paper lacks theoretical investigation on understanding how the method improves the performance. The setup seems pretty much tailored to SIC forecasting. Is there any idea that might be valuable beyond this problem?

Questions

Maybe I missed it, but how was the Swin Transformer selected? Is it a crucial part of the performance?

Is this method applicable to time series with missing values?

Is the choice of daily, weekly, monthly crucial? I can see the choice of daily is natural, for weekly/monthly, is there any specific reason about not using a different scale, say 5 days, 10 days, etc?

Limitations

yes.

Final Justification

The authors addressed my concerns about the application on missingness and granularities. I therefore increased my rating to 4.

Formatting Issues

Figure 5: what is the meaning of Single and SIFM? Lines 147-148 are not properly shown.

Author Response

Thanks for your dedication to reviewing our work.


Response to Q1: Choice of Swin Transformer.

Thanks for your question. We select the Swin Transformer as our spatial encoder mainly for the following reasons: (a) the Swin Transformer uses a shifted-window strategy that helps the model effectively capture spatial correlations and features between neighboring local Arctic regions; (b) it builds a hierarchical feature map by merging neighboring sea ice map patches, a design that preserves local detail while effectively extracting global features of the pan-Arctic region, which facilitates more accurate capture of the overall distribution of sea ice extent.

Response to Q2: Application for missing time series.

Thanks for your insightful question. To verify whether our method is applicable to time series with missing values, we conduct additional experiments under the following settings:

  1. Training set period: from the year 2000 to 2013; validation and test periods remain unchanged.
  2. Decreased model parameters for the shorter training period due to limited computational resources and time.
  3. We trained baselines of three single granularities and SIFusion with all three time scales on the training dataset without masking.
  4. We randomly mask the input of SIFusion at the daily scale, with the number of applied daily masks N in [1, 3].
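Setting 4 can be sketched as follows; zero-filling the masked days is our simplifying assumption for illustration, and the exact masking scheme may differ:

```python
import numpy as np

# Hedged sketch of the random daily masking: zero out N in [1, 3] of the
# daily input maps to simulate missing time steps.
def mask_random_days(daily_input, rng=np.random.default_rng(0)):
    """daily_input: [T, H, W] -- mask between 1 and 3 random days."""
    masked = daily_input.copy()
    n = rng.integers(1, 4)                                   # N in [1, 3]
    days = rng.choice(daily_input.shape[0], size=n, replace=False)
    masked[days] = 0.0                                       # masked days zeroed
    return masked, days

x = np.random.rand(7, 8, 8)        # 7 daily SIC maps
masked, days = mask_random_days(x)
```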

We report the results as follows.

Performance at Daily Scale:

| Model | RMSE | MAE | R Squared | NSE | SIE Difference |
| --- | --- | --- | --- | --- | --- |
| 7 Days Daily Baseline | 0.0823 | 0.0200 | 0.9670 | 0.9613 | 0.285 |
| SIFusion | 0.0261 | 0.0132 | 0.9883 | 0.9862 | 0.087 |
| SIFusion (Masked at Daily Scale) | 0.0633 | 0.0141 | 0.9751 | 0.9713 | 0.094 |

We could find that when training SIFusion with randomly masked daily values, the performance slightly drops, but still outperforms the daily baseline.

Performance at Weekly Scale:

| Model | RMSE | MAE | R Squared | NSE | SIE Difference |
| --- | --- | --- | --- | --- | --- |
| 8 Weeks' Average Baseline | 0.0775 | 0.0177 | 0.9671 | 0.9624 | 0.291 |
| SIFusion | 0.0639 | 0.0132 | 0.9650 | 0.9582 | 0.253 |
| SIFusion (Masked at Daily Scale) | 0.0679 | 0.0162 | 0.9688 | 0.9627 | 0.267 |

Performance at Monthly Scale:

| Model | RMSE | MAE | R Squared | NSE | SIE Difference |
| --- | --- | --- | --- | --- | --- |
| 6 Months' Average Baseline | 0.0714 | 0.0167 | 0.8859 | 0.8766 | 0.346 |
| SIFusion | 0.0640 | 0.0133 | 0.8870 | 0.8788 | 0.251 |
| SIFusion (Masked at Daily Scale) | 0.0719 | 0.0166 | 0.8711 | 0.8612 | 0.307 |

Similarly, the performance at the weekly and monthly scales also drops slightly, but both still outperform the baselines. This indicates that our SIFusion remains applicable to time series with missing values.

Response to Q3: Change of temporal scale.

We appreciate your insightful inquiry. Incorporating additional temporal scales contributes to the improvement of SIFusion's forecasting skill. We agree that incorporating weekly/monthly scales is a natural choice. The main reason we specifically use 7 days at the daily scale is that 7 days naturally aligns with one week's time; this alignment with tokens at the weekly scale could facilitate the model's learning. We also train a revised version of SIFusion with a 10-day forecast horizon at the daily scale.

Performance at Daily Scale:

| Model | RMSE | MAE | R Squared | NSE | SIE Difference |
| --- | --- | --- | --- | --- | --- |
| 7 Days Daily Baseline | 0.0823 | 0.0200 | 0.9670 | 0.9613 | 0.285 |
| SIFusion | 0.0261 | 0.0132 | 0.9883 | 0.9862 | 0.087 |
| SIFusion (Trained with 10 Days at Daily Scale) | 0.0643 | 0.0140 | 0.9772 | 0.9739 | 0.098 |

Performance at Weekly Scale:

| Model | RMSE | MAE | R Squared | NSE | SIE Difference |
| --- | --- | --- | --- | --- | --- |
| 8 Weeks' Average Baseline | 0.0775 | 0.0177 | 0.9671 | 0.9624 | 0.291 |
| SIFusion | 0.0639 | 0.0132 | 0.9650 | 0.9582 | 0.253 |
| SIFusion (Trained with 10 Days at Daily Scale) | 0.0664 | 0.0155 | 0.9579 | 0.9498 | 0.347 |

Performance at Monthly Scale:

| Model | RMSE | MAE | R Squared | NSE | SIE Difference |
| --- | --- | --- | --- | --- | --- |
| 6 Months' Average Baseline | 0.0714 | 0.0167 | 0.8859 | 0.8766 | 0.346 |
| SIFusion | 0.0640 | 0.0133 | 0.8870 | 0.8788 | 0.251 |
| SIFusion (Trained with 10 Days at Daily Scale) | 0.0663 | 0.0153 | 0.8874 | 0.8789 | 0.295 |

By comparing the performance of SIFusion with SIFusion trained with 10-day leads at the daily scale, we find that the daily and monthly figures are close, but the identification of sea ice extent at the weekly scale suffers a larger performance drop. This could indicate that keeping the alignment between the daily scale and a one-week token is beneficial. As for the 8 weeks' average at the weekly scale, its time span is still covered by the monthly tokens, so a monthly token could represent the information of the weekly scale.

Response to Weaknesses: Valuable application beyond this problem.

Many thanks for raising this point. Our model is capable of capturing the variation of sea ice across multiple temporal scales, thereby enhancing overall forecasting skill. Specifically, the monthly scale incorporates important seasonal trends for predicting future sea ice. At the onset of the melt season, for example, SIFusion's monthly prediction provides guidance that can subsequently steer shorter-range forecasts based on the expected long-term trend. Moreover, the historical monthly means fed into the model carry the imprint of winter ice volume and its variability, information intimately linked to pan-Arctic albedo strength and energy exchange, thereby providing short-term predictions with a more accurate initial state.

Considering that the surface ocean is in direct contact with the sea ice and that oceanic variables, e.g., sea surface salinity, impact the concentration of sea ice, our model could be extended to the prediction of oceanic variables and the joint modeling of multiple sea-ice-related variables.

Response to Formatting Issues.

Thank you for bringing the formatting issues to our attention. We will revise Figure 5 and its caption to clarify that it represents the comparison between the single daily time scale and our proposed SIFusion.


Once again, sincere thanks for your review and insightful questions. We look forward to your active participation in the discussion and we will respond promptly.

Comment

Thanks for the detailed reply. Most of my concerns are addressed, and I have increased my rating accordingly; I am inclined toward acceptance. I'd encourage the authors to include the experiments from the reply about missingness and granularity in the final version to close the gap.

Comment

Thank you once again for your precious time in reviewing our paper, insightful suggestions raised in the comments, and for raising the rating towards acceptance of our paper. We fully acknowledge that the model’s capability to handle missingness and different granularity is indeed crucial, and we will surely incorporate this discussion into the experiment section of the revised final version of our paper.

Sincerely yours,

Authors.

Comment


Dear Reviewers, Program Chairs, Senior Area Chairs, and Area Chairs,


We sincerely thank you for your precious time, insightful suggestions, and engaging discussions throughout the review process.

As the Author-Reviewer Discussion session comes to a close, we are pleased that our rebuttal and revisions have addressed many of the reviewers’ concerns, and we are encouraged by the positive feedback we have received:

  • Reviewer 73pK found our work to be a “novel contribution” of fusing multi-temporal data for sea-ice forecasting, recognized our work as a “practical solution” to an important climate problem, and praised the “comprehensive documentation” of our ablation study.
  • Reviewer YeEK highlighted our paper as “rather complete” and a “worth-trying method” on NSIDC data, and noted the “improved performance” in SIC forecasting.
  • Reviewer vhuq “enjoyed reading the paper”, liked our “central idea of a single model operating across all time-scales and granularities”, and praised the results as “impressive”, while also praising the “pretty clear” presentation and “helpful” figures.
  • Reviewer X1aA recognized our work's “clear awareness of prior weaknesses,” praised the “new deep model” that enables “several time scales” and their “interplay”, and highlighted the “significant improvement” in SIC forecasting where SIFusion “outperforms” scale-specialized baselines.

In response to Reviewers' constructive suggestions, we have conducted additional experiments for clarification and exploration:

  • Design Choice and Ablation:

    • As Reviewer YeEK suggested, we clarified that we leverage Swin Transformer to facilitate our model to capture the overall distribution of sea ice extent.

    • As Reviewer YeEK suggested, we used a different scale for daily forecasts, and we will add this discussion to the experiment section.

    • As Reviewer vhuq suggested, we added a comparison with the dynamical model benchmark.

    • As Reviewer 73pK suggested, we changed the combinations of different temporal granularities. Our proposed SIFusion still has superior performance.

    • As Reviewer 73pK suggested, we added two recent deep learning models for a more comprehensive comparison and further validate the effectiveness of our model.

  • Model Description and Explanation

    • As Reviewer vhuq suggested, we will add equations to facilitate the understanding of the proposed architecture.

    • As Reviewer 73pK suggested, we have elaborated on the curation of AI-ready dataset.

    • As Reviewer vhuq and X1aA suggested, we explained that the long-term time scale could be the main contribution to the increase in performance.

    • As Reviewer X1aA suggested, we discussed the spatial similarity between embedding vectors from different time scales.

    • As Reviewer 73pK suggested, we explained the modeling of intra- and inter-granularity within our proposed SIFusion.

    • As Reviewer 73pK suggested, we have clarified the difference between our independent spatial tokenization and previous approaches.

  • Extended Capability

    • As Reviewer YeEK suggested, we applied our model to the missing time series scenario, and we will add this discussion to the experiment section.

    • As Reviewer X1aA suggested, we trained models on different combinations of mixed time scales along with tokenized scale information, which showed that SIFusion can effectively leverage three temporal granularities.

    • As Reviewer vhuq suggested, we explored the pre-training and fine-tuning approach.

    • As Reviewer 73pK suggested, we added additional climatic variables to our model.

    • As Reviewer 73pK suggested, we will add the discussion of leveraging pre-trained climate foundation models to our future work.

  • Computational Efficiency

    • As Reviewer X1aA suggested, we reported the training time of the proposed SIFusion and baseline models.

Key contributions

  • We explored the potentially overlooked inter-granularity information by previous methods for Arctic SIC forecasting that could facilitate accurate predictions.
  • We proposed SIFusion to leverage independent spatial tokenization of SIC and effectively unify three temporal granularities for better overall representation and improved forecasting performance.
  • SIFusion could outperform previous deep learning models that only leverage single temporal granularity data, and it could be further applied to sea-ice related oceanic variables to advance towards a more practical Arctic sea ice forecasting system.

With less than 12 hours remaining, we welcome any further feedback and are happy to continue the discussion. Thank you once again for your thoughtful reviews and kind support.

Best regards,

Authors.

Final Decision

This paper introduces SIFusion, a transformer-based framework for Arctic sea ice concentration (SIC) forecasting that addresses the limitation of existing deep learning models, which typically operate at a fixed temporal resolution and overlook dependencies across time scales. By independently embedding spatial features from daily, weekly, and monthly SIC data and fusing them through a multi-head attention-based transformer, SIFusion captures both intra- and inter-granularity correlations. Empirical results on pan-Arctic datasets show that SIFusion achieves improved accuracy, robustness to anomalies (e.g., 2022 SIC trends), and better generalization compared to strong baselines. Overall, while the contribution may be somewhat application-specific, the paper presents a well-executed study with convincing empirical evidence.