PaperHub
3.8 / 10
Rejected · 4 reviewers
Lowest 2 · Highest 3 · Std dev 0.4
Scores: 2, 2, 2, 3
ICML 2025

ASRC-SNN: Adaptive Skip Recurrent Connection Spiking Neural Network

OpenReview · PDF
Submitted: 2025-01-22 · Updated: 2025-06-18

Abstract

Keywords
spiking neural networks (SNNs), recurrent spiking neural networks (RSNNs), long-term temporal modeling, adaptive skip recurrent connection

Reviews and Discussion

Review (Rating: 2)

This research considers neurons and recurrent structures as an integrated system and systematically analyzes gradient propagation along the temporal dimension, uncovering a difficult gradient vanishing problem. To tackle this challenge, the study proposes innovative architectural modifications that enhance the network's ability to maintain and propagate gradients over extended temporal sequences. Specifically, the introduction of ASRC significantly mitigates the gradient vanishing issue, allowing the network to better capture long-term dependencies in sequential data.

Questions to Authors

See weaknesses.

Claims and Evidence

Yes

Methods and Evaluation Criteria

Yes

Theoretical Claims

Yes

Experimental Design and Analysis

Yes

Supplementary Material

Yes

Relation to Prior Work

N/A

Missing Important References

N/A

Other Strengths and Weaknesses

Strengths

  1. Well written.

  2. The SRC, as an alternative to the vanilla recurrent structure, can effectively alleviate the gradient vanishing problem, which is crucial for model stability and training quality on time-series data.

Weaknesses

  1. The ASRC model does not outperform existing methods such as PMSN [1] and TC-LIF [2] on the PS-MNIST dataset. This suggests that the ASRC approach may face challenges in handling certain types of spatiotemporal data, indicating a need for further optimization to enhance its generalizability across diverse tasks.

  2. ASRC raises the model's computational complexity, increasing training time and computational resource requirements.

[1] PMSN: A Parallel Multi-compartment Spiking Neuron for Multi-scale Temporal Processing. arXiv, 2024.

[2] TC-LIF: A Two-Compartment Spiking Neuron Model for Long-term Sequential Modelling. AAAI, 2024.

Other Comments or Suggestions

N/A

Author Response

We thank the reviewer for their feedback. We also hope the reviewer will take note of the paper's other strengths, such as the positive points mentioned by Reviewer 3VEv, Reviewer keA4, and Reviewer 9KzL. Below, we provide point-by-point responses to the weaknesses.

Weakness 1: The ASRC model does not outperform existing methods such as PMSN [1] and TC-LIF [2] on the PS-MNIST dataset. This suggests that the ASRC approach may face challenges in handling certain types of spatiotemporal data, indicating a need for further optimization to enhance its generalizability across diverse tasks.

Response 1: I. We would like to clarify that, as shown in Table 1 of this paper, our method outperforms TC-LIF. We acknowledge that it does not surpass PMSN on the PS-MNIST dataset. However, we want to emphasize that there are many works on neurons in this field, and our approach is based on a new perspective that focuses on the synergistic role between recurrent structures and neurons.

II. We would like to clarify that this paper focuses on long-term sequence modeling. While extending it to other spatiotemporal tasks is valuable, it is beyond the scope of this paper.

Weakness 2: ASRC raises the model's computational complexity, increasing training time and computational resource requirements.

Response 2: Below, we present the computational overhead of our models on PS-MNIST, the most complex dataset in our study, as shown in the two tables. The increase in memory consumption with larger values of $T_{\lambda}/\lambda$ is significant, while the increase in training time is relatively marginal. The training time of ASRC-SNN is approximately 17% longer than SRC-SNN. When $T_{\lambda}$ and $\lambda$ are close, the memory consumption of ASRC-SNN is slightly higher than SRC-SNN. Considering the trade-off between computational overhead and performance, we recommend selecting relatively small values of $T_{\lambda}/\lambda$ whenever possible. In this case, selecting $T_{\lambda} = 11$ and $\lambda = 8$ results in good model performance, while the additional computational overhead remains acceptable compared to the vanilla RSNN.

Table 1. Computational metrics of ASRC-SNN on PS-MNIST

| $T_{\lambda}$ | Memory (GB) | Training time (hours) | Accuracy (%) |
| --- | --- | --- | --- |
| 11 | 5.12 | 32.86 | 95.15 |
| 21 | 8.99 | 32.96 | 95.23 |
| 31 | 10.21 | 32.90 | 95.22 |
| 41 | 13.67 | 33.61 | 95.36 |
| 51 | 15.20 | 33.43 | 95.40 |

Table 2. Computational metrics of SRC-SNN on PS-MNIST

| $\lambda$ | Memory (GB) | Training time (hours) | Accuracy (%) |
| --- | --- | --- | --- |
| 1 (vanilla) | 2.43 | 27.93 | 84.59 |
| 2 | 2.67 | 28.23 | 90.65 |
| 8 | 4.11 | 28.10 | 93.83 |
| 16 | 6.39 | 28.20 | 94.48 |
| 24 | 8.21 | 28.36 | 94.44 |

Reviewer Comment

I appreciate your response and extra experiments. Most of the concerns have been addressed. But ASRC is still not as good as PMSN, so I will not change my score.

Author Comment

We sincerely appreciate your acknowledgment that most concerns have been addressed, and we recognize PMSN as a strong contribution to the field. However, we highlight that direct comparison between the two works may not be fully equitable, as our research objectives are not entirely identical.

PMSN primarily aims at improving the spiking neuron model and emphasizes feedforward architectures for multi-scale temporal tasks, with a strong focus on computational efficiency. In contrast, our work focuses on recurrent spiking neural networks (RSNNs), exploring the interplay between RNN structures and spiking neurons. In this sense, we believe the two approaches are complementary rather than directly competing.

Beyond its contributions to SNNs, our ASRC method represents an architectural innovation for RNNs. As noted in our conclusion: "The essence of ASRC lies in learning a discrete position along the temporal dimension, with the potential to extend this method to learning a discrete position in both time and space." We believe this paradigm may offer broader methodological implications and inspire new research directions.

Review (Rating: 2)

This paper proposes an Adaptive Skip Recurrent Connection (ASRC) framework for Spiking Neural Networks (SNNs) to address gradient vanishing in long-term temporal modeling. By unifying the analysis of neurons and recurrent structures, the authors identify gradient propagation challenges and introduce SRC (fixed skip connections) and its adaptive variant, ASRC (learnable skip spans via temperature-scaled Softmax). Experiments on four benchmarks demonstrate ASRC-SNN’s superior performance and robustness compared to vanilla RSNNs and SRC-SNN.
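To make the mechanism summarized above concrete, below is a minimal, hypothetical PyTorch-style sketch of an ASRC recurrent layer. It is not the authors' implementation; the module and parameter names (ASRCLayer, span_logits) and the choice of a single shared recurrent weight matrix are assumptions. A fixed-span SRC corresponds to replacing the softmax-weighted sum with the spikes from exactly t - lambda steps back.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASRCLayer(nn.Module):
    """Hypothetical sketch of an adaptive skip recurrent connection (ASRC) layer.

    Assumptions not taken from the paper: a single recurrent weight matrix is
    shared across candidate spans, the LIF neuron uses a soft reset, and the
    surrogate gradient for the spike function is omitted for brevity.
    """

    def __init__(self, in_dim, hidden_dim, max_span=11, alpha=0.5, v_th=1.0):
        super().__init__()
        self.ff = nn.Linear(in_dim, hidden_dim)                   # feedforward input current
        self.rec = nn.Linear(hidden_dim, hidden_dim, bias=False)  # recurrent weights
        # Softmax kernel: one logit per candidate skip span 1..max_span,
        # initialized uniformly so no span is preferred at the start.
        self.span_logits = nn.Parameter(torch.zeros(max_span))
        self.max_span, self.alpha, self.v_th = max_span, alpha, v_th
        self.temperature = 1.0                                    # annealed externally per epoch

    def forward(self, x):                                         # x: (T, batch, in_dim)
        T, B, _ = x.shape
        H = self.ff.out_features
        u = torch.zeros(B, H, device=x.device)                    # membrane potential
        history = [torch.zeros(B, H, device=x.device) for _ in range(self.max_span)]
        out = []
        for t in range(T):
            # Temperature-scaled softmax over candidate spans; as the temperature
            # decays, the weights concentrate on a single learned span.
            w = F.softmax(self.span_logits / self.temperature, dim=0)
            rec_in = sum(w[k] * history[-(k + 1)] for k in range(self.max_span))
            u = self.alpha * u + self.ff(x[t]) + self.rec(rec_in)
            s = (u >= self.v_th).float()                          # spike (surrogate grad omitted)
            u = u - self.v_th * s                                 # soft reset
            history.append(s)
            history.pop(0)
            out.append(s)
        return torch.stack(out)                                   # spike trains, (T, batch, hidden)
```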

Questions to Authors

See weaknesses.

Claims and Evidence

Most of the claims have corresponding evidence. For example, for the claim "SRC improves performance over vanilla RSNNs", the evidence in Table 1 shows SRC-SNN outperforms most of the prior SNNs.

Methods and Evaluation Criteria

The methodology and evaluation make sense. But longer temporal sequence tasks could be used to demonstrate the superiority of ASRC.

Theoretical Claims

Relevant theoretical analysis was checked, but no further Lemma or Theorem insight was provided.

Experimental Design and Analysis

The experimental datasets are valid, but additional insightful experiments are lacking.

Supplementary Material

All the appendices have been reviewed.

Relation to Prior Work

see below

Missing Important References

There seems to be corresponding literature.

Other Strengths and Weaknesses

Strengths:

The integration of adaptive skip connections into SNNs is innovative. The dynamic adjustment of skip spans through temperature annealing offers a fresh perspective on addressing gradient vanishing. The experiments cover multiple datasets and ablation studies. The proposed ASRC could enhance SNNs' applicability to complex temporal tasks.


Weaknesses:

  1. The temporal sequence length could be marked below each dataset to make it more intuitive.

  2. The accuracy is not better than PMSN on the PS-MNIST dataset, and that network is also not recurrent.

  3. Additional experiments incorporating other neurons into the ASRC architecture could prove its effectiveness and generality. The authors claim that PLIF and GLIF have no effect on ASRC, which should be illustrated with specific data.

  4. This work lacks deeper theoretical insight or additional ablation experiments that explain the mechanism more fundamentally. It could explore the performance relationship between the decay term $\alpha$ of LIF and $\lambda$/$T_{\lambda}$ in SRC/ASRC; there may be some natural pattern or trend, because LIF suffers from the temporal gradient vanishing problem, different values of $\alpha$ influence how fast the gradient vanishes along the temporal dimension, and $\lambda$ determines how this problem is addressed.

  5. The increased GPU memory and training time of LIF with SRC and with ASRC could be given as a reference.

Other Comments or Suggestions

More ablation experiments or other datasets could be used to validate ASRC more broadly. See weaknesses.

Author Response

We sincerely thank Reviewer 9KzL for their review. Below, we provide point-by-point responses to the weaknesses.

Weakness 1: The temporal sequence length could be marked below each dataset to make it more intuitive.

Response 1: We thank the reviewer for the helpful suggestion and will implement it in our final manuscript.

Weakness 2: The accuracy is not better than PMSN on the PS-MNIST dataset, and that network is also not recurrent.

Response 2: Yes, you are right. However, we clarify that there are many works on neurons in this field, and our approach is based on a new perspective that considers the synergistic role between recurrent structures and neurons.

Weakness 3: Additional experiments incorporating other neurons into the ASRC architecture could prove its effectiveness and generality. The authors claim that PLIF and GLIF have no effect on ASRC, which should be illustrated with specific data.

Response 3: Because neurons that differ too much from LIF would require reanalyzing gradient propagation along the temporal dimension and would likely lead to conclusions different from those presented in this paper, we have only considered PLIF and GLIF neurons.

Below, we present the neuron ablation experiment results for SRC-SNN and ASRC-SNN. First, consider Table 2. When replacing LIF with PLIF in SRC-SNN, performance improves on PS-MNIST but decreases on SSC. The skip span $\lambda$ is set to 12 for PS-MNIST and 3 for SSC. On PS-MNIST, where the skip span is larger, an appropriate membrane potential decay factor can better regulate the flow of information transmitted through the membrane potential within the skip span, especially for the first skip connection linked to the current time step, leading to improved performance. On SSC, where the skip span is smaller, controlling membrane potential transmission is likely less critical than with larger spans, making a slight performance drop reasonable. GLIF introduces additional complexity, significantly complicating the backpropagation topology in the temporal dimension of SRC-SNN. Learning proper temporal dependencies is challenging, so replacing LIF with GLIF in SRC-SNN leads to a performance drop.

In ASRC-SNN, replacing LIF with PLIF does not improve performance, possibly because simultaneously learning both an optimal match between the skip span and the membrane potential factor while ensuring global optimization is challenging. Similarly, the complex gating mechanism of GLIF introduces more learnable parameters, making it even harder for the model to learn a good match between the skip span and GLIF parameters, which explains the significant performance degradation when using GLIF. Additionally, we note that replacing LIF with GLIF increases the training time on PS-MNIST by approximately 60%.

Table 1. Ablation study on neurons in ASRC-SNN

| Neuron | Accuracy on PS-MNIST (%) | Accuracy on SSC (%) |
| --- | --- | --- |
| LIF | 95.40 | 81.93 |
| PLIF | 95.16 | 81.93 |
| GLIF | 94.19 | 80.44 |

Table 2. Ablation study on neurons in SRC-SNN

| Neuron | Accuracy on PS-MNIST (%) | Accuracy on SSC (%) |
| --- | --- | --- |
| LIF | 94.78 | 81.83 |
| PLIF | 95.11 | 81.57 |
| GLIF | 93.98 | 81.00 |

Weakness 4: This work lacks deeper theoretical insight or additional ablation experiments that explain the mechanism more fundamentally. It could explore the performance relationship between the decay term $\alpha$ of LIF and $\lambda$/$T_{\lambda}$ in SRC/ASRC; there may be some natural pattern or trend, because LIF suffers from the temporal gradient vanishing problem, different values of $\alpha$ influence how fast the gradient vanishes along the temporal dimension, and $\lambda$ determines how this problem is addressed.

Response 4: We thank the reviewer for the helpful suggestions. I. We are training our model on sequential CIFAR (timesteps = 1024), which takes a lot of time. We will provide the training results later. Additionally, we found that ASRC-SNN is more effective than SRC-SNN under sparse connectivity. Please refer to "Response 4" to Reviewer 3VEv.

II. Exploring the performance relationship between the decay term $\alpha$ of LIF and $\lambda$/$T_{\lambda}$ in SRC/ASRC requires extensive additional experimentation. We will include these detailed analyses in our final manuscript.

Weakness 5: The increased GPU memory and training time of LIF with SRC and with ASRC could be given as a reference.

Response 5: Please refer to "Response 2" to Reviewer wx2U.

Reviewer Comment

I appreciate your response and extra experiments. Most of the concerns have been addressed. But I still have some concerns.

The intuition is that an ASRC architecture using more complex neurons than LIF should improve performance. But the results do not show this, which suggests that ASRC is not a general method and that performance relies on the empirical hyperparameters $\lambda$ and $T_{\lambda}$.

If the authors can show that ASRC or SRC is a general method framework, I will improve the score. Maybe you can try to apply TC-LIF/CLIF/PMSN or other neurons with ASRC or SRC compared to the naive recurrent architecture with the same neuron.

Author Comment

Thank you very much for your feedback. We truly appreciate your constructive suggestion regarding the generality of the ASRC/SRC framework.

As shown in Tables 1 and 2, both SRC and ASRC consistently achieve higher accuracy than the standard RSNN across different spiking neuron models (PLIF, GLIF, CLIF, TC-LIF) on two datasets (PS-MNIST and SSC). These results suggest that our method is not limited to a specific neuron type and can serve as a general recurrent structure for a wide range of spiking neurons.

Finally, we provide the performance of our models on the complex sequential CIFAR dataset (with 1024 timesteps). As shown in Tables 3 and 4, the results further demonstrate that SRC-SNN improves over RSNN, and that ASRC-SNN has even stronger ability in modeling long-term dependencies and shows better robustness compared to SRC-SNN.

Table 1. Evaluating the Generality of ASRC and SRC Across Spiking Neuron Models on PS-MNIST

| Neuron | Accuracy of RSNN (%) | Accuracy of SRC (%) | Accuracy of ASRC (%) |
| --- | --- | --- | --- |
| PLIF | 86.6 | 95.11 | 95.16 |
| GLIF | 91.68 | 93.98 | 94.19 |
| CLIF | 88.58 | 92.57 | 94.62 |
| TC-LIF | 95.23 | 95.67 | 95.84 |

Table 2. Evaluating the Generality of ASRC and SRC Across Spiking Neuron Models on SSC

| Neuron | Accuracy of RSNN (%) | Accuracy of SRC (%) | Accuracy of ASRC (%) |
| --- | --- | --- | --- |
| PLIF | 81.30 | 81.57 | 81.93 |
| GLIF | 79.47 | 81.00 | 80.44 |
| CLIF | 73.09 | 81.93 | 82.02 |
| TC-LIF | 69.16 | 75.13 | 77.37 |

Table 3. ASRC-SNN performance on sequential CIFAR

| $T_{\lambda}$ | Accuracy (%) |
| --- | --- |
| 21 | 71.51 |
| 41 | 71.88 |

Table 4. SRC-SNN performance on sequential CIFAR

| $\lambda$ | Accuracy (%) |
| --- | --- |
| 1 (vanilla) | 59.86 |
| 12 | 67.60 |
| 24 | 64.62 |
| 36 | 62.78 |

Review (Rating: 2)

The paper introduces ASRC-SNN, a spiking neural network architecture that incorporates adaptive skip recurrent connections to improve long-term temporal modeling. It identifies and addresses the gradient vanishing problem in recurrent spiking neural networks (RSNNs), which occurs when gradients propagate over long time spans. The authors propose replacing the standard recurrent connections with skip recurrent connections (SRC) and further extend this idea with adaptive skip recurrent connections (ASRC), where the model learns the skip span dynamically using a temperature-scaled softmax kernel. The model is evaluated on sequential MNIST, permuted sequential MNIST, Google Speech Commands, and Spiking Google Speech Commands datasets. Experimental results show that SRC improves over standard RSNNs, while ASRC further improves over SRC by allowing different layers to learn appropriate skip spans independently. The results suggest that ASRC-SNN is more robust and better at capturing long-term dependencies compared to existing methods.

Update after rebuttal

Thanks to the authors for the detailed and thoughtful response. I appreciate the clarifications around computational overhead, the robustness to hyperparameters, and the discussion on potential local minima for skip spans. It’s also good to hear that larger-scale experiments are underway.

That being said, my overall view of the paper remains the same. While the additional explanations are helpful, they don't fully address the bigger concerns I had — especially the lack of experiments on more challenging datasets and the fairly limited theoretical grounding. The efficiency analysis is still missing, and the experimental gains, while promising, aren’t strong enough to outweigh the added complexity in my opinion. So I’m keeping my original score of Weak Reject.

Questions to Authors

  1. What is the computational overhead of learning adaptive skip spans compared to using fixed skips or standard recurrent connections? A runtime comparison would clarify whether ASRC is practical for large-scale problems.

  2. How does ASRC-SNN perform on larger, real-world spiking datasets such as event-based vision tasks? Would the method still be effective when applied to more complex sequences?

  3. How sensitive is ASRC-SNN to hyperparameter tuning? Does the softmax temperature decay require careful selection, or is the method robust across different datasets?

  4. How does ASRC compare to alternative approaches such as spiking GRUs or gated recurrent SNNs, which also attempt to mitigate gradient vanishing? Would incorporating gating mechanisms alongside skip connections improve performance further?

  5. Is there a risk that learned skip spans converge to suboptimal solutions? How does the model ensure that it selects useful temporal dependencies rather than defaulting to local minima?

Claims and Evidence

The paper claims that ASRC-SNN solves the gradient vanishing problem in RSNNs and improves performance on long-term temporal tasks. The experiments provide reasonable evidence that ASRC-SNN performs better than baseline RSNNs, especially in datasets with long-term dependencies like PS-MNIST and speech recognition tasks. The claim that SRC mitigates gradient vanishing is plausible, as skipping over time steps can help maintain gradients, but the explanation lacks rigorous mathematical justification. The claim that ASRC is superior to SRC is supported by empirical results, but the advantage is marginal in some cases, making it unclear whether the complexity of learning adaptive skips is always justified. The paper does not analyze potential drawbacks, such as the additional computational cost of learning skip spans dynamically, and does not compare against other methods that might also alleviate gradient issues, such as gated spiking architectures. The evidence supports the claims to some extent, but the lack of theoretical analysis and broader comparisons weakens the argument.

Methods and Evaluation Criteria

The method of introducing skip recurrent connections is reasonable for addressing the gradient vanishing problem, and the evaluation criteria, mainly accuracy on temporal classification benchmarks, are standard in the field. The chosen datasets are relevant but mostly small-scale, meaning the real-world scalability of the method remains uncertain. The experiments focus only on accuracy, without discussing training efficiency, memory consumption, or computational overhead introduced by ASRC. It would be more convincing to analyze how much additional cost ASRC incurs and whether the improvement justifies it. The model is also not tested on more complex spiking datasets, such as neuromorphic event-based vision tasks, which would better showcase its advantages over simpler architectures.

Theoretical Claims

The paper provides some mathematical expressions to describe gradient propagation and the effects of skip recurrent connections, but it does not offer a formal proof that ASRC-SNN systematically mitigates the gradient vanishing problem. The key derivations related to gradient propagation are mostly heuristic, and the claim that SRC prevents vanishing gradients is stated without proving that it consistently avoids exponential decay in all cases. The theoretical basis for using the softmax-based adaptive skip mechanism is also weak; while the authors reference softmax annealing behavior, they do not prove that the model converges to an optimal skip span. A stronger theoretical argument, possibly including convergence guarantees or an analysis of how skip spans interact with different time scales, would improve the paper.

Experimental Design and Analysis

The experiments compare ASRC-SNN against several spiking models on relevant datasets, but the evaluation has some weaknesses. The authors do not provide runtime analysis or efficiency comparisons, which are crucial for spiking networks that often aim for energy efficiency. The reported improvements in accuracy are sometimes small, and there is no statistical significance analysis to show whether ASRC consistently outperforms SRC. The choice of datasets is somewhat limited to standard benchmarks, and the results do not explore whether ASRC-SNN generalizes well to more challenging real-world tasks. Ablation studies show the effect of skip coefficients and softmax kernel behavior, but there is no deeper analysis of why certain choices work better in specific settings. A comparison with alternative recurrent SNN architectures, such as spiking GRUs or LSTMs, would have made the evaluation stronger.

Supplementary Material

The appendix provides hyperparameter settings and additional experimental details, but there are no major theoretical supplements. The provided tables help clarify the impact of skip coefficients, but the discussion of computational efficiency is missing. The training configuration details are useful but do not include a discussion on how sensitive the model is to hyperparameter tuning. No implementation details are provided to assess how easy it would be to reproduce the results.

Relation to Prior Work

The paper is well-situated within the literature on recurrent SNNs and gradient propagation issues. It references prior work on improving spiking neuron models and incorporating recurrent structures, but the discussion is mostly limited to direct competitors. There is little connection to broader machine learning literature, such as meta-learning approaches for temporal dependencies or energy-based models that could also provide alternatives to gradient-based training. The paper does not discuss potential trade-offs compared to alternative recurrent architectures, such as gated SNNs, which have been explored in recent work.

Missing Important References

No.

Other Strengths and Weaknesses

The paper presents an interesting and well-motivated idea that could help mitigate gradient issues in recurrent SNNs. The introduction of skip connections and adaptive learning of skip spans is an intuitive extension, and the empirical results suggest it is a promising approach. However, the paper lacks a strong theoretical foundation, and the experimental results, while positive, do not clearly demonstrate that ASRC is necessary in all cases. The increase in model complexity from learning adaptive skips is not justified by efficiency analysis, and the small accuracy improvements on some tasks raise questions about whether the method is worth the added computational cost. The writing is clear overall, but some sections could be more precise in distinguishing between empirical observations and theoretical claims.

Other Comments or Suggestions

The paper would benefit from a more rigorous analysis of computational efficiency, including training time and memory usage comparisons. A theoretical discussion of when ASRC is expected to work better than simple skip connections would improve the argument. The authors should also explore whether ASRC generalizes well to more complex tasks beyond the datasets tested here. Finally, the discussion should include potential downsides of the approach, such as its sensitivity to hyperparameters.

Author Response

We sincerely appreciate the reviewer's thorough and insightful comments. Below we provide point-by-point responses to the Questions.

Question 1: What is the computational overhead of learning adaptive skip spans compared to using fixed skips or standard recurrent connections? A runtime comparison would clarify whether ASRC is practical for large-scale problems.

Response 1: Please refer to "Response 2" to Reviewer wx2U.

Question 2: How does ASRC-SNN perform on larger, real-world spiking datasets such as event-based vision tasks? Would the method still be effective when applied to more complex sequences?

Response 2: I. This is a valuable direction to comprehensively assess the generalizability of our approach. However, to the best of our knowledge, the mainstream architectures used for event-based vision benchmarks, such as CIFAR10-DVS and DVS-Gesture, are not recurrent. This suggests that we may need to explore suitable new recurrent architectures. Additionally, our study focuses on long-term temporal modeling, whereas the timesteps for CIFAR10-DVS and DVS-Gesture are typically set to no more than 20. In summary, this is beyond the scope of this paper.

II. We are training our model on sequential CIFAR (timesteps = 1024), which takes a lot of time. We will provide the training results later.

Question 3: How sensitive is ASRC-SNN to hyperparameter tuning? Does the softmax temperature decay require careful selection, or is the method robust across different datasets?

Response 3: I. In our final setup, the learning rate of the softmax kernel is set to 100 times the global learning rate. Under this setting, ASRC-SNN is robust to the choice of the softmax temperature decay. Additionally, we find that setting the softmax kernel's learning rate to 1× or 10× the global learning rate requires careful decay rate adjustment.

II. We did not carefully select the softmax temperature decay. As mentioned in the last part of Section 3.4.1 of this paper: "In our experiments, the exponential decay factor is set to 0.96." Fine-tuning the decay precisely across different datasets could further improve ASRC-SNN.
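As a reference for how these two training details could be wired together, here is a hedged sketch reusing the hypothetical ASRCLayer from the earlier sketch on this page. The 100x learning-rate factor and the 0.96 decay come from the responses above; the base learning rate, epoch count, and the training loop itself are placeholders, not values from the paper.

```python
import torch

# `model` is assumed to be an nn.Module containing ASRCLayer blocks (see the earlier sketch).
num_epochs, base_lr, temp_decay = 200, 1e-3, 0.96   # epoch count and base LR are assumptions

# Separate parameter group so the softmax-kernel logits train at 100x the global rate.
kernel_params = [m.span_logits for m in model.modules() if isinstance(m, ASRCLayer)]
kernel_ids = {id(p) for p in kernel_params}
other_params = [p for p in model.parameters() if id(p) not in kernel_ids]

optimizer = torch.optim.Adam([
    {"params": other_params, "lr": base_lr},
    {"params": kernel_params, "lr": 100 * base_lr},
])

for epoch in range(num_epochs):
    # ... run one training epoch with `optimizer` here (placeholder) ...
    # After each epoch, shrink the temperature so the span distribution sharpens.
    for m in model.modules():
        if isinstance(m, ASRCLayer):
            m.temperature *= temp_decay
```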

Question 4: How does ASRC compare to alternative approaches such as spiking GRUs or gated recurrent SNNs, which also attempt to mitigate gradient vanishing? Would incorporating gating mechanisms alongside skip connections improve performance further?

Response 4: I. The gating mechanism can create temporal shortcuts that help prevent gradient vanishing [1]. However, we believe these shortcuts are difficult to observe and inherently uncertain. SRC can create fixed temporal shortcuts, while ASRC allows for flexible adjustment of the shortcut span.

II. Since this is not the core focus of this work and spiking GRUs do not have publicly available code, we have not explored this in SNNs. We will release our code after this paper is accepted, and we believe it is easy to extend.

[1] Dampfhoffer M, Mesquida T, Valentian A, et al. Investigating current-based and gating approaches for accurate and energy-efficient spiking recurrent neural networks[C]//International Conference on Artificial Neural Networks. Cham: Springer Nature Switzerland, 2022: 359-370.

Question 5: Is there a risk that learned skip spans converge to suboptimal solutions? How does the model ensure that it selects useful temporal dependencies rather than defaulting to local minima?

Response 5: Thanks to the reviewer for the insightful questions. We've also given these issues some thought. First, we emphasize that the softmax kernel initially assigns equal weights to skip connections of different spans, with no manual bias. During training within an epoch, we do not guide the distribution of the softmax kernel’s weights. Only after completing an epoch do we slightly sharpen this distribution by reducing the temperature parameter. Therefore, these issues can be seen as related to whether certain parameters in BP-based neural networks will converge to local optima. Of course, we acknowledge that the convergence of the softmax kernel parameters is quite unique. This question may require knowledge from other fields, such as non-convex optimization theory, to answer. We can only state that our experimental results show that the convergence of the skip spans is good. Finally, we point out that the clear dynamic changes in the softmax kernel during training might provide material for research in other fields.

Review (Rating: 3)

This paper proposes the Skip Recurrent Connection (SRC) as a replacement for the vanilla recurrent structure and also proposes the Adaptive Skip Recurrent Connection (ASRC), a method that can learn the skip span of skip recurrent connection in each layer of the network.

Questions to Authors

Not applicable.

Claims and Evidence

  1. This paper has a profound intention, which is "other works overlooking the importance of analyzing neurons and recurrent structures as an integrated framework."

  2. The introduction is well-written and can vividly describe the problem being solved.

Methods and Evaluation Criteria

  1. Eq. 1-3 are confusing. Substituting Eq. 1 into Eq. 3 gives $U^{l}[t]=\alpha U^{l}[t-1]-V_{th} S^{l}[t]+I^{l}[t]$, which is not consistent with Eq. 4. Please check the symbol definitions carefully (see the sketch after this list).

  2. Theoretical analyses are intuitive and easy to understand but lack rigorous theoretical proof.
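For reference, the substitution described in point 1 can be reconstructed as follows, assuming the paper's Eq. 1-3 take the common soft-reset LIF form (an assumption; the exact equations are not reproduced from the paper):

```latex
% Assumed LIF formulation (an assumption, not copied from the paper):
%   charging:   H^{l}[t] = \alpha\, U^{l}[t-1] + I^{l}[t]
%   firing:     S^{l}[t] = \Theta\!\left(H^{l}[t] - V_{th}\right)
%   soft reset: U^{l}[t] = H^{l}[t] - V_{th}\, S^{l}[t]
% Substituting the charging equation into the reset gives the reviewer's expression:
\begin{equation}
  U^{l}[t] = \alpha\, U^{l}[t-1] + I^{l}[t] - V_{th}\, S^{l}[t]
\end{equation}
```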

Theoretical Claims

Not applicable.

Experimental Design and Analysis

  1. Experiments proved the effectiveness of SRC but not its generalization; for example, experiments on event-driven datasets such as DVS-Gesture and CIFAR-DVS are lacking.

  2. SRC is simple and effective. Can each timestep have a different $T_{\lambda}$?

  3. This paper lacks an ablation experiment on whether the performance gain of SRC increases as the number of timesteps increases.

  4. In terms of power consumption, since each neuron's connections across different timesteps remain dense, the power consumption remains the same as before. So, can a neuron's connections across different timesteps be made sparse?

  5. This paper lacks ablation experiments on different neurons with SRC. You should not simply state "and the results show no performance improvement with these substitutions"; the reasons behind this should be analyzed.

Supplementary Material

Not applicable.

Relation to Prior Work

Not applicable.

Missing Important References

Not applicable.

Other Strengths and Weaknesses

Not applicable.

Other Comments or Suggestions

Not applicable.

Author Response

We sincerely appreciate the reviewer's comments, especially the valuable questions raised. We also hope the reviewer will take note of the paper's other strengths, such as the positive points mentioned by Reviewer keA4 and Reviewer 9KzL. Below, we provide point-by-point responses to the questions.

Question 1: Experiments proved the effectiveness of SRC but not its generalization; for example, experiments on event-driven datasets such as DVS-Gesture and CIFAR-DVS are lacking.

Response 1: Please refer to "Response 2" to Reviewer keA4.

Question 2: SRC is simple and effective. Can each timestep have a different $T_{\lambda}$?

Response 2: Is the reviewer suggesting that different time steps adapt to different skip spans? If so, this would be a further extension of ASRC-SNN. We would like to point out that this extension introduces additional parameters: $T_{\lambda} \times \text{number of layers} \times \text{timesteps}$, whereas ASRC-SNN has $T_{\lambda} \times \text{number of layers}$ additional parameters compared to SRC-SNN. We will incorporate this experiment in our final manuscript as a reference. Additionally, we will release our code after the paper is accepted, and we believe it is easy to extend.
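As a rough illustration of this parameter comparison, here is a hypothetical count. The layer count of 3 is an assumption; $T_{\lambda} = 11$ and 784 timesteps follow the PS-MNIST settings mentioned elsewhere in the rebuttal.

```python
# Hypothetical parameter-count comparison for the per-timestep extension above.
T_lambda, num_layers, timesteps = 11, 3, 784            # layer count is an assumption

asrc_extra = T_lambda * num_layers                      # ASRC-SNN vs SRC-SNN
per_timestep_extra = T_lambda * num_layers * timesteps  # a separate span per timestep

print(asrc_extra)          # 33
print(per_timestep_extra)  # 25872
```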

Question 3: This paper lacks an ablation experiment on whether the performance gain of SRC increases as the number of timesteps increases.

Response 3: We clarify that the timesteps for the datasets used in our paper are: GSC (101), SSC (250), SMNIST (784), and PS-MNIST (784). We will later provide experimental results of our models on sequential CIFAR with 1024 timesteps.

Question 4: In terms of power consumption, since each neuron's connections across different timesteps remain dense, the power consumption remains the same as before. So, can a neuron's connections across different timesteps be made sparse?

Response 4: Thank you for raising this great question! The experiments we conducted further demonstrate the advantage of ASRC-SNN. Below, we present our experimental results. Across different sparsity levels, ASRC-SNN consistently outperforms SRC-SNN. This advantage is particularly evident on the PS-MNIST dataset, which has complex temporal dependencies—where increasing sparsity further amplifies ASRC-SNN’s superiority over SRC-SNN. A possible reason is that sparse connectivity demands stronger temporal modeling capabilities, and ASRC-SNN is better suited to handle this challenge. Additionally, we observed that as sparsity increases, our model’s performance on SSC does not degrade significantly, which may be related to the tendency to overfit easily on this dataset.

Table 1. The performance of ASRC-SNN under different sparsity rates

| Sparsity rate | Accuracy on PS-MNIST (%) | Accuracy on SSC (%) |
| --- | --- | --- |
| 0.00 | 95.40 | 81.93 |
| 0.25 | 95.04 | 81.83 |
| 0.50 | 93.84 | 81.53 |
| 0.75 | 90.21 | 80.51 |

Table 2. The performance of SRC-SNN under different sparsity rates

| Sparsity rate | Accuracy on PS-MNIST (%) | Accuracy on SSC (%) |
| --- | --- | --- |
| 0.00 | 94.78 | 81.83 |
| 0.25 | 93.41 | 81.53 |
| 0.50 | 91.25 | 80.68 |
| 0.75 | 86.34 | 80.18 |
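For reference, sparse recurrent connectivity of this kind is commonly implemented with a fixed binary mask on the recurrent weight matrix. The sketch below is hypothetical, not the authors' code; the function name and the 256-unit layer size are illustrative.

```python
import torch
import torch.nn as nn

def sparsify_recurrent(rec: nn.Linear, sparsity: float, seed: int = 0) -> torch.Tensor:
    """Zero out a fixed random fraction of the recurrent weights and return the
    binary mask so it can be re-applied after each optimizer step."""
    g = torch.Generator().manual_seed(seed)
    mask = (torch.rand(rec.weight.shape, generator=g) >= sparsity).float()
    with torch.no_grad():
        rec.weight.mul_(mask)
    return mask

# Example: prune 75% of a 256x256 recurrent weight matrix (sparsity rate 0.75).
rec = nn.Linear(256, 256, bias=False)
mask = sparsify_recurrent(rec, sparsity=0.75)
# After every optimizer.step(): rec.weight.data.mul_(mask)
```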

Question 5: This paper lacks ablation experiments on different neurons with SRC. You should not simply state "and the results show no performance improvement with these substitutions"; the reasons behind this should be analyzed.

Response 5: Please refer to "Response 3" to Reviewer 9KzL.

Reviewer Comment

The authors have addressed most of my concerns; I have raised my score.

Final Decision

The paper proposes adaptive skip connections to address the vanishing gradient problem in recurrent spiking neural nets. The reviewers find the idea innovative and well motivated and agree with the authors that the adaptive skip recurrent connections improve over fixed ones. The reviewers raise a number of concerns, including unclear performance on more complex datasets, questions about the efficiency of the method, limited empirical improvements given the added complexity, and lower performance than existing prior art (PMSN). The authors have successfully addressed most of the concerns raised by the reviewers during the discussion period, but several reviewers remain somewhat reserved, although it seems that two of them did not see or acknowledge all efforts made to address their points (Reviewer keA4: efficiency analysis; Reviewer 9KzL: generality). While I think the paper could make a worthwhile contribution to ICML, the lack of support from the reviewers made it quite borderline, and calibration with respect to other submitted papers meant that it was excluded from the program.