PaperHub
Overall rating: 5.8/10 (withdrawn), 4 reviewers
Scores: 6, 6, 5, 6 (min 5, max 6, std 0.4)
Confidence: 3.5 | Correctness: 2.8 | Contribution: 2.5 | Presentation: 2.8
ICLR 2025

When SNN meets ANN: Error-Free ANN-to-SNN Conversion for Extreme Edge Efficiency

OpenReview | PDF
Submitted: 2024-09-28 | Updated: 2025-01-26
TL;DR

In this paper, we propose a novel ANN-to-SNN conversion framework for ultra low-latency and energy efficiency using a new integrate-and-fire (IF) neuron model.

Abstract

Spiking Neural Networks (SNN) are now demonstrating comparable accuracy to convolutional neural networks (CNN), thanks to advanced ANN-to-SNN conversion techniques, all while delivering remarkable energy and latency efficiency when deployed on neuromorphic hardware. However, these conversion techniques incur a large number of time steps and, consequently, high spiking activity. In this paper, we propose a novel ANN-to-SNN conversion framework that incurs an exponentially lower number of time steps compared to that required by existing conversion approaches. Our framework modifies the standard integrate-and-fire (IF) neuron model used in SNNs with no change in computational complexity and shifts the bias term of each batch normalization (BN) layer in the trained ANN. To reduce spiking activity, we propose training the source ANN with a fine-grained $\ell_1$ regularizer with surrogate gradients that encourages high spike sparsity in the converted SNN. Our proposed framework thus yields lossless SNNs with low latency and low compute energy, thanks to the low time steps and high spike sparsity, and high test accuracy, for example, 75.12% with only 4 time steps on the ImageNet dataset. Code will be made available.
Keywords
SNN, ANN-to-SNN conversion, IF model, ImageNet, spiking activity

Reviews and Discussion

Review (Rating: 6)

This work focuses on SNNs on CV tasks, particularly the image recognition task on ImageNet and CIFAR-10. The overall method is making SNN neurons more artificial to reduce the gap between the target ANN and the SNN after ANN-to-SNN conversion to achieve ultra-fast SNN inference. Constraints on firing rate and quantization are added during ANN training; an error correction method is used during SNN inference, specifically, by revising the spiking neuronal model.

The writing style and table style mainly follow [1], and Figure 1 likely follows the style of Figure 1 in [2].

[1] Bu T, Fang W, Ding J, et al. Optimal ANN-SNN conversion for high-accuracy and ultra-low-latency spiking neural networks[J]. arXiv preprint arXiv:2303.04347, 2023.

[2] Li C, Ma L, Furber S. Quantization framework for fast spiking neural networks[J]. Frontiers in Neuroscience, 2022, 16: 918793.

Strengths

  1. The abstract is well-written, and I particularly like the first sentence. It clearly and effectively summarizes the current state of research on SNN algorithms.

  2. Bias correction itself is not new, but this paper gives a formulation for conducting the bias shift with the BN layers, which is new.

Weaknesses

  1. Line 59: "Our resulting SNN can be implemented on neuromorphic chips, such as Loihi (10)." Please provide experimental results demonstrating implementation on Loihi, or clarify how the proposed method is compatible with Loihi's constraints without actual implementation.

  2. Line 125 "QCFS can enable ANN-to-SNN conversion with minimal error for arbitrary T and Q, where T denotes the total number of SNN time steps. ". How "arbitrary" is ensured? Please provide specific examples or experimental results demonstrating QCFS performance across a range of T and Q values, or explain the theoretical basis for this claim.

  3. Line 159 "Importantly, the resulting function is equivalent to the ANN ReLU activation function, because ϕ^l(T) ≥ 0." Is the resulting function Equation 6? Why is this function equal to ReLU? Please provide a more detailed explanation or mathematical proof of why ϕ^l(T) ≥ 0 implies equivalence to ReLU, including any assumptions or conditions required.

  4. Line 192 "As our work already enables a small value of T, the drop in SNN performance with further lower T < log2Q becomes negligible compared to prior works." Why does a small T make your performance drop negligible? I do not see an obvious logical relationship between the time steps T and the performance drop. Does any proof support this? Please provide empirical evidence or theoretical analysis demonstrating how the performance drop changes with T, particularly for T < log2Q, compared to prior works.

  5. " Moreover, at low timesteps, the deviation error increases as shown in Fig. 2(a), and even dominates the total error, which highlights its importance for our use case. " Why will the deviation error dominate the total error? Is this your imagination or do you actually have quantitative experiments to support this? Please provide quantitative analysis or experimental results that break down the components of the total error at different timesteps, showing how the proportion of deviation error changes. Alternatively, citing papers that illustrated this point. I strongly disagree with giving such a strong claim without any support.

  6. "In particular, spikes are transmitted to the next layer as soon as they are computed. Moreover, our implemented framework adheres to this scheme and thus our reported accuracies are consistent with the asynchronous implementation." This sentence is a little bit confusing. Why wait T time for each layer will not slow down SNN inference and will not damage the asynchronous implementation? Please clarify how your method maintains asynchronous behavior while still requiring T timesteps for each layer, perhaps by providing a more detailed description of the spike transmission process or a diagram illustrating the timing of computations across layers.

  7. Line 804. In this equation, it seems that β_c^l is replaced by β^l directly. Why is it legitimate to make this direct replacement?

  8. Line 209 "In this section, we propose our ANN-to-SNN conversion framework, which involves training the source ANN using the QCFS activation function (5), followed by 1) shifting the bias term of the BN layers, and 2) modifying the IF model where the neuron spiking mechanism and reset are pushed after the input current accumulation over all the time steps." This summary of Section 5 completely fails to mention the contents of Section 5.2. Is the L1 norm introduced in Section 5.2 still a major part of your method? Please include the content from Section 5.2 in the summary if it is indeed a major part of the method, or explain why it was omitted and clarify its role in the overall framework.

  9. Line 329. Could you give an example of what "a" in the equation represents? The other two questions are: how can t in the SNN be explicitly represented in the ANN during ANN training, and how is the summation conducted in ANN training? Does it require modifying each layer's computations to support summation over the time dimension?

  10. The addition of constraints in training for better ANN-to-SNN conversion, the modifications of neuronal models, and the application of error correction methods, as well as the overarching methodology of enhancing SNN performance by making the SNN more artificial, are not new ideas. They were proposed in [1] and probably in other previous studies. Please give credit to the previous related paper(s).

  11. Are you the first one to use L1 norm in SNN research? If not, please cite previous papers and discuss the differences to previous work that uses L1 Norm. Also, I do not find the L1 norm contents to have a tight connection to your other methods. L1 norm is also not discussed in the related work section and the summary in the beginning of the method section.

[1] Li C, Ma L, Furber S. Quantization framework for fast spiking neural networks[J]. Frontiers in Neuroscience, 2022, 16: 918793.

Questions

Besides the questions in the weaknesses section, I have three minor points listed below:

  1. "However, these conversion techniques incur a large number of time steps, and consequently, high spiking activity". It is not obvious more time steps lead to high spike activity. Does any equation support this?

  2. "while leveraging quantization-aware training in the ANN domain(2; 5). ". Could you specify which contents in reference (2) talk about "quantization-aware training in the ANN domain"?

  3. "In fact, there is only a ∼3% (36.2% to 33.0%) drop in the spiking activity of a VGG16-based SNN". Revise to "We observed ~3%...". "there is" is inappropriate, as there is not any "there" that shows these detailed quantitative results in the rest of the paper. If you actually want to point to a position that presents these results, please state exactly which figure and results in which section you refer to, e.g., "In fact, we can see from Figure x that it only has a ∼3% (36.2% to 33.0%) drop".

Comment

Weakness: "In particular, spikes are transmitted to the next layer as soon as they are computed. Moreover, our implemented framework adheres to this scheme and thus our reported accuracies are consistent with the asynchronous implementation." This sentence is a little bit confusing. Why wait T time for each layer will not slow down SNN inference and will not damage the asynchronous implementation? Please clarify how your method maintains asynchronous behavior while still requiring T timesteps for each layer, perhaps by providing a more detailed description of the spike transmission process or a diagram illustrating the timing of computations across layers.

Response: We demonstrate in Appendix A.3.2 that, assuming the underlying hardware supports parallel processing, the layer-by-layer propagation method—requiring T time steps for each layer—results in lower delay compared to the step-by-step propagation method. In step-by-step propagation, the subsequent layer is processed only after the spike from the previous layer is computed. Intuitively, this is because parallel processing accelerates computations in layer-by-layer propagation, whereas the step-by-step approach is inherently sequential. This conclusion is further validated in Reference [3], accepted by Science Advances, in its section 'Distinction of Step Modes and Propagation Patterns'.

The layer-by-layer propagation approach supports asynchronous behavior by allowing each time step of each layer to be processed independently and concurrently. This enables a pipelined execution, where computations for one layer can proceed while others are ongoing, without requiring synchronization for the completion of all layers. Asynchronous chips such as Loihi have forms of (asynchronously implemented) barrier synchronization to separate the processing of one time step from the next, and this technique can be used in our layer-by-layer propagation scheme just as it is used in the more conventional step-by-step counterpart. The output spikes in the emitting phase of our layer-by-layer scheme can be processed either one time step at a time sequentially or with a certain degree of parallelism using barrier synchronization between the time steps. Moreover, our proposed method can enable the transmission of the output spikes of a particular layer to the next layer (for potential aggregation) as soon as they are computed, to increase concurrency and reduce latency.

Weakness: Line 329. Could you give an example of what "a" in the equation represents? The other two questions are: how can t in the SNN be explicitly represented in the ANN during ANN training, and how is the summation conducted in ANN training? Does it require modifying each layer's computations to support summation over the time dimension?

Response: As mentioned in line 314, $a^{i,l}_t$ denotes the $t^{th}$ bit of the $i^{th}$ activation value in layer $l$. We would like to clarify that $t$ is the bit position (starting from the least significant bit) in the ANN, not the time step. The summation is conducted across the different bit positions during ANN training.
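To make the bit-position summation concrete, here is a minimal NumPy sketch (our illustration, not the authors' code) that extracts the bits of already-quantized QCFS activation levels and sums them; since the decomposition operates on the quantized ANN activations, no layer needs to be modified with an explicit time dimension.

```python
import numpy as np

Q = 8                                   # quantization levels of the QCFS activation (assumed)
T = int(np.log2(Q))                     # number of bits, which also equals the SNN time steps

# Quantized activation levels of one layer, i.e. integers in [0, Q-1] produced by QCFS.
levels = np.array([4, 7, 0, 5])

# bits[i, t] is the t-th bit of the i-th activation (here indexed from the least significant bit).
bits = (levels[:, None] >> np.arange(T)) & 1

print(bits)        # [[0 0 1] [1 1 1] [0 0 0] [1 0 1]]
print(bits.sum())  # total number of set bits: the quantity a bit-level l1 penalty would act on
```

In actual training the bit extraction is non-differentiable, so, as the abstract notes, a surrogate gradient would be needed to back-propagate through it.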

Weakness: The addition of constraints in training for better ANN-to-SNN conversion, the modifications of neuronal models, and the application of error correction methods, as well as the overarching methodology of enhancing SNN performance by making the SNN more artificial, are not new ideas. They were proposed in [1] and probably in other previous studies. Please give credit to the previous related paper(s).

Response: We agree that prior works have focused on error correction methods between ANNs and SNNs by making SNNs more artificial through various conversion approaches [4-6]. These relevant studies, including [4], are now referenced in the 'Related Works' section of our paper, where we also highlight that our method derives inspiration from these works. However, the key contribution of our work is that our proposed error correction method significantly reduces the accuracy gap between the ANN and SNN at very low time steps ($T{\sim}2{-}4$), while providing substantial computational efficiency, thanks to our $\ell_1$ regularizer.

Question: "However, these conversion techniques incur a large number of time steps, and consequently, high spiking activity". It is not obvious more time steps lead to high spike activity. Does any equation support this?

Response: It is challenging to theoretically prove that more time steps directly lead to higher spiking activity in SNNs. However, intuitively, as the number of time steps increases, the probability of additional spikes firing also rises, thereby increasing overall spiking activity. Empirically, this relationship is demonstrated in our work, as shown in Fig. 7. This phenomenon is also observed in other studies, such as Fig. 2 in [9], where the correlation between increased time steps and spiking activity is evident.

Comment

Weakness: Line 192 "As our work already enables a small value of $T$, the drop in SNN performance with further lower $T<\log_2 Q$ becomes negligible compared to prior works." Why does a small $T$ make your performance drop negligible? I do not see an obvious logical relationship between the time steps $T$ and the performance drop. Does any proof support this? Please provide empirical evidence or theoretical analysis demonstrating how the performance drop changes with $T$, particularly for $T<\log_2 Q$, compared to prior works.

Response: Unlike prior ANN-to-SNN conversion methods (e.g., QCFS [2] and others based on QCFS), which achieve similar but not identical ANN and SNN accuracies at $T=Q$, our approach ensures identical accuracies for both the ANN and SNN at $T=\log_2 Q$. To achieve close to state-of-the-art (SOTA) ANN accuracy, a minimum value of $Q$ (typically $Q=8$) is required. Hence, prior methods struggle to achieve high-quality SNN performance with very small time steps (e.g., $T$ in the range of 2–4, which is considerably smaller than the typical value of $T=8$).

In contrast, our method ensures SOTA SNN accuracy even at smaller values of $T$, specifically $T=\log_2 Q=3$ for $Q=8$. Moreover, the drop in accuracy when reducing $T$ further (e.g., to $T=1$ or $T=2$) is negligible, especially when compared to the significant performance degradation observed in prior works that rely on larger values of $T$.

Empirically, this is demonstrated in Tables 1 and 2 of our paper, where our SNN accuracies significantly outperform those of prior works for $T=2$, which satisfies $T<\log_2 Q$. It is important to note that most prior works, with the exception of QCFS [2], do not report accuracies for $T<\log_2 Q$. This highlights the effectiveness of our approach, particularly for small values of $T$, where other methods struggle to maintain competitive accuracy.

Weakness: "Moreover, at low timesteps, the deviation error increases as shown in Fig. 2(a), and even dominates the total error, which highlights its importance for our use case. " Why will the deviation error dominate the total error? Is this your imagination or do you actually have quantitative experiments to support this? Please provide quantitative analysis or experimental results that break down the components of the total error at different timesteps, showing how the proportion of deviation error changes. Alternatively, citing papers that illustrated this point. I strongly disagree with giving such a strong claim without any support.

Response: We apologize for not providing quantitative results to support our claim. We now provide empirical evidence demonstrating that the deviation error (modified to "unevenness error" in the revision per the suggestion from reviewer XCYA) dominates the total error at low time steps. The newly added Figure 3(b) in the revision shows the breakdown of different errors, with a clear illustration of how the proportion of unevenness error increases at lower time steps. For example, the percentage of the unevenness error increases from 57.7% to 73.9% as the number of time steps decreases from 6 to 2. Specifically, Figure 2(a), which was included in the original manuscript, highlights the increase in deviation error as the number of time steps is reduced.

Weakness: Line 804. In this equation, it seems that $\beta_c^l$ is replaced by $\beta^l$ directly. Why is it legitimate to make this direct replacement?

Response: We would like to clarify that we do not replace only $\beta_c^l$ with $\beta^l$. We replace all the SNN trainable parameters with their ANN counterparts, because we substitute the ANN activations with the accumulated SNN input current over $T$ time steps, as warranted by Condition-I, shown in Eq. 7.

Weakness: Line 209 "In this section, we propose our ANN-to-SNN conversion framework, which involves training the source ANN using the QCFS activation function (5), followed by 1) shifting the bias term of the BN layers, and 2) modifying the IF model where the neuron spiking mechanism and reset are pushed after the input current accumulation over all the time steps." This summary of Section 5 completely fails to mention the contents of Section 5.2. Is the L1 norm introduced in Section 5.2 still a major part of your method? Please include the content from Section 5.2 in the summary if it is indeed a major part of the method, or explain why it was omitted and clarify its role in the overall framework.

Response: We thank you for pointing this out. The $\ell_1$ norm introduced in Section 5.2 is indeed a key component of our method. The primary results, as shown in Tables 1–4, are based on ANN training with this fine-grained $\ell_1$ regularizer. We have now included the contents from Section 5.2 in the summary of our contributions at the beginning of Section 5.

Comment

Thank you for your thorough and detailed review, as well as for acknowledging the strengths of our work. We have addressed the weaknesses and questions you raised below.

Weakness: Line 59, "Our resulting SNN can be implemented on neuromorphic chips, such as Loihi (10)." Please provide experimental results demonstrating implementation on Loihi, or clarify how the proposed method is compatible with Loihi's constraints without actual implementation.

Response: In order to enable the deployment of our SNN on Loihi, we implement our SNN with the proposed neuron model in the Lava-DL library [1], which supports modular operations, allowing us to flexibly reorder the neuron model's operational sequence. Specifically, we adapted the CUrrent BAsed (CUBA) leaky integrate-and-fire (LIF) neuron model by shifting from a sequential process—current accumulation, threshold comparison, and potential reset within each time step—to accumulating current across all time steps first, followed by threshold comparison and reset at each time step. Additionally, during threshold comparison and reset, we introduced a right-shift operation to halve the quantized membrane potential, adhering to Loihi's requirements.

The table below compares the accuracies of our SNN in PyTorch and Lava-DL. We observed an average accuracy drop of approximately 0.3% on CIFAR-10 across both VGG and ResNet architectures when using the Lava-DL implementation compared to the PyTorch version. This discrepancy is likely due to the quantization of weights and synaptic inputs inherent to the Lava-DL framework, which introduces slight computational differences. These results are included in the revised manuscript to provide a detailed analysis of the impact of deploying the SNN model on Loihi via Lava-DL.

| Architecture | Version | T | Accuracy (%) |
|---|---|---|---|
| VGG16 | PyTorch | 2 | 94.21 |
| VGG16 | Lava-DL | 2 | 94.15 |
| VGG16 | PyTorch | 4 | 95.82 |
| VGG16 | Lava-DL | 4 | 95.61 |
| ResNet18 | PyTorch | 2 | 96.12 |
| ResNet18 | Lava-DL | 2 | 95.77 |
| ResNet18 | PyTorch | 4 | 96.68 |
| ResNet18 | Lava-DL | 4 | 96.02 |

Weakness: Line 125 "QCFS can enable ANN-to-SNN conversion with minimal error for arbitrary T and Q, where T denotes the total number of SNN time steps. ". How "arbitrary" is ensured? Please provide specific examples or experimental results demonstrating QCFS performance across a range of T and Q values, or explain the theoretical basis for this claim.

Response: Reference [2] states in Theorem II that QCFS enables ANN-to-SNN conversion with zero expected error for arbitrary values of $T$ and $Q$, as proven in their Appendix A.3. However, this is primarily valid for large values of $T$ (e.g., $T \geq 8$), because for small values of $T$, the unevenness error dominates, which is not addressed in [2]. Intuitively, the arbitrariness is guaranteed by the shift term of 0.5 in the ANN domain and the initial membrane potential of $\frac{\theta^l}{2}$ in the SNN domain, where $\theta^l$ denotes the SNN threshold in layer $l$. Empirically, this is validated by the negligible difference between ANN and SNN accuracies for arbitrary $T \geq 8$ and $Q$, as shown in Tables 2 and 3 of [2].
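For reference, and up to notational differences, the QCFS activation proposed in [2] takes the clip-floor-shift form below (our restatement, not an equation copied from the paper under review); the 0.5 shift inside the floor is the ANN-side counterpart of the $\frac{\theta^l}{2}$ initial membrane potential mentioned above:

$$h^l(x) \;=\; \lambda^l\,\mathrm{clip}\!\left(\frac{1}{Q}\left\lfloor \frac{x\,Q}{\lambda^l} + \frac{1}{2} \right\rfloor,\; 0,\; 1\right),$$

where $\lambda^l$ is the trainable clipping threshold of layer $l$ and $Q$ is the number of quantization levels.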

Weakness: Line 159, "Importantly, the resulting function is equivalent to the ANN ReLU activation function, because $\phi^l(T)\geq 0$." Is the resulting function Equation 6? Why is this function equal to ReLU? Please provide a more detailed explanation or mathematical proof of why $\phi^l(T)\geq 0$ implies equivalence to ReLU, including any assumptions or conditions required.

Response: The resulting function $\phi^l(T)$, which is expanded in Equation 6, can be understood as a modification of the input current behavior. Specifically, $\phi^l(T)$ is equivalent to a ReLU function because it mimics the behavior of the rectified linear activation: it outputs zero for negative values of the input (since the accumulated input current $s^l(t)$ is zero when negative) and directly reflects the positive values of the input current. This behavior aligns with the typical ReLU function, where negative inputs are clipped to zero, and positive inputs are passed through unchanged.

This analogy is essential in understanding the transition from SNNs to ANNs using spike-based models and is consistent with how neuron models are adapted to be compatible with classical activation functions like ReLU in neural network architectures.

Comment

Question: "while leveraging quantization-aware training in the ANN domain (2, 5). ". Could you specify which contents in reference (2) talk about "quantization-aware training in the ANN domain"?

Response: Reference (2) proposed the concept of the straight-through estimator, which forms the basis for quantization-aware training in the ANN domain. In the revised manuscript, we have modified the relevant section to read: "while leveraging quantization-aware training in the ANN domain (2,3,4,5), inspired by the straight-through estimator method (6).", where 3, 4, and 5 denote references [4], [5], and [6] below respectively. In particular, we credit additional works on ANN-to-SNN conversion techniques [4-6] that incorporate similar approaches, thus connecting our work to these advancements in the field.
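For readers unfamiliar with the straight-through estimator, the sketch below (a minimal PyTorch illustration of the general STE idea, not code from the paper or from reference (2)) shows quantization-aware training in its simplest form: a clip-floor-shift quantizer in the forward pass whose gradient is treated as the identity in the backward pass.

```python
import torch

class QuantizeSTE(torch.autograd.Function):
    """Uniform clip-floor-shift quantizer with a straight-through estimator."""

    @staticmethod
    def forward(ctx, x, q_levels, scale):
        # Quantize to q_levels discrete steps in [0, scale] (QCFS-style form, assumed here).
        return scale * torch.clamp(torch.floor(x * q_levels / scale + 0.5), 0, q_levels) / q_levels

    @staticmethod
    def backward(ctx, grad_output):
        # STE: pretend the quantizer is the identity when back-propagating.
        return grad_output, None, None

x = torch.randn(4, requires_grad=True)
y = QuantizeSTE.apply(x, 8, 1.0)   # Q = 8 levels, clipping scale 1.0
y.sum().backward()                 # gradients reach x as if y were x
```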

Question: "In fact, there is only a ∼3% (36.2% to 33.0%) drop in the spiking activity of a VGG16-based SNN". Revise to "We observed ~3%...". "there is" is inappropriate, as there is not any "there" that shows these detailed quantitative results in the rest of the paper. If you actually want to point to a position that presents these results, please state exactly which figure and results in which section you refer to, e.g., "In fact, we can see from Figure x that it only has a ∼3% (36.2% to 33.0%) drop".

Response: Thank you for this suggestion. Our Fig. 7(a) demonstrates that the spiking activity of a VGG16-based SNN drops by approximately 3% (from 36.2% to 33.0%) when the L1 regularizer is not used (please refer to $\lambda=0$). We have updated the sentence in the revision to: "In fact, we can see from Fig. 7(a) that the spiking activity of a VGG-16 based SNN drops only ${\sim}3$% (36.2% to 33.0%)."

References

[1] https://github.com/lava-nc/lava-dl

[2] Bu T, Fang W, Ding J, Dai P, Yu Z, Huang T. Optimal ANN-SNN Conversion for High-accuracy and Ultra-low-latency Spiking Neural Networks. International Conference on Learning Representations (ICLR), 2023.

[3] Fang W, Chen Y, Ding J, et al. SpikingJelly: An open-source machine learning infrastructure platform for spike-based intelligence. Science Advances, 2023.

[4] Li C, Ma L, Furber S. Quantization framework for fast spiking neural networks[J]. Frontiers in Neuroscience, 2022.

[5] Hu Y, Zheng Q, Jiang X, Pan G. Fast-SNN: Fast Spiking Neural Network by Converting Quantized ANN. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023.

[6] Gao H, He J, Wang H, et al. High-accuracy deep ANN-to-SNN conversion using quantization-aware training framework and calcium-gated bipolar leaky integrate and fire neuron. Frontiers in Neuroscience, 2023.

[7] Narduzzi S, Bigdeli S, Liu S, Dunbar L. Optimizing The Consumption Of Spiking Neural Networks With Activity Regularization. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022.

[8] Ho N, Chang I. TCL: an ANN-to-SNN Conversion with Trainable Clipping Layers. Design Automation Conference (DAC), 2021.

[9] Datta G, Liu Z, Beerel P. Can we get the best of both Binary Neural Networks and Spiking Neural Networks for Efficient Computer Vision?. International Conference on Learning Representations (ICLR), 2024.

Comment

Thank you once again for your detailed feedback! It has definitely helped improve the quality of our work. We also appreciate the increased score and your support throughout this process!

Best,

Authors

Comment

We apologize for missing the response to weakness 11. Please find the same below.

Weakness: Are you the first one to use L1 norm in SNN research? If not, please cite previous papers and discuss the differences to previous work that uses L1 Norm. Also, I do not find the L1 norm contents to have a tight connection to your other methods. L1 norm is also not discussed in the related work section and the summary in the beginning of the method section.

Response: To the best of our knowledge, we are the first to apply a bit-level $\ell_1$ regularizer during training in the ANN domain for ANN-to-SNN conversion. This technique enhances spike sparsity when the trained ANN is converted to the SNN. While there have been a few works exploring the use of $\ell_1$ regularizers in SNN training [7-8], we did not find any work that applies it specifically for ANN-to-SNN conversion. We have now referenced these works in Lines 114-117 in the 'Related Works' section to further highlight the connection of the $\ell_1$ regularizer to the central theme of our paper. Notably, the $\ell_1$ regularizer contributes to the energy efficiency results shown in Fig. 5, with a regularizer constant of $\lambda=1e{-}8$.

[7] Narduzzi S, Bigdeli S, Liu S, Dunbar L. Optimizing The Consumption Of Spiking Neural Networks With Activity Regularization. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022.

[8] Ho N, Chang I. TCL: an ANN-to-SNN Conversion with Trainable Clipping Layers. Design Automation Conference (DAC), 2021.

Comment

Thank you very much for your continued engagement and feedback. We are glad to hear that our response has addressed most of the weaknesses and concerns you raised. We are currently preparing a response to the points you raised. We just wanted to clarify which weakness you were referring to in point 7. Is it the following?

Line 804: In this equation, it appears that $\beta_c^l$ is directly replaced by $\beta^l$. Could you clarify why this direct substitution is valid?

However, we believe this might actually be the weakness mentioned in point 6 above. Could you kindly clarify?

Best,

Authors

Comment

Yes, this corresponds to "Line 804: In this equation, it appears that $\beta_c^l$ is directly replaced by $\beta^l$"

Comment

Thank you for your detailed responses. Most of the weaknesses and concerns are addressed. My replies to some points are listed below:

Weakness 3. I still do not see a clear causal relationship between "because ϕ^l(T) ≥ 0" and "equivalence to ReLU." It seems that you have a clear explanation, but in the sentence (Line 181), several intermediate steps are omitted in demonstrating how a positive value leads to equivalence with ReLU. To improve clarity and reduce confusion, consider revising the sentences to provide more detail. Alternatively, you could direct readers to a section where a detailed explanation of this causal relationship is provided.

Weakness 4. Thank you for your explanation. It is very clear, and I appreciate the detailed response.

Weakness 5. I appreciate the new results provided. They effectively address my concerns.

Weakness 7. Thank you for your reply. However, my concerns remain unresolved. In the last sentence of Theorem I (Line 815), $\beta_c^l$ does not equal $\beta^l$. Yet, in the Proof (Line 820), $\beta_c^l$ is replaced with $\beta^l$ directly, which seems inconsistent.

Weakness 11. It appears that this weakness has not been addressed. If I missed your response to this point, please feel free to direct me to it.

Question 1. Since no existing theoretical proof supports the claim that more time steps lead to higher spike activity, using "Consequently" in this sentence implies an unsupported causal relationship. I suggest removing the causal language in this context.

Comment

Thank you very much for your thoughtful comments and follow-up response. Please find our responses to your points below:

Weakness 3: We have now added the explanation as to how $\phi^l(T)$ is equivalent to ReLU in Lines 180-183 in the revision.

Weakness 7: We would like to clarify that we do not replace $\beta_c^l$ with $\beta^l$ directly.

In Line 820, we satisfy Eq. 7 by substituting the accumulated input current of the SNN over all time steps on the LHS and the ANN pre-activation value on the RHS. Upon closer inspection, the LHS accounts for the entire SNN pre-activation output (including both the convolutional and batch normalization layers), while the RHS includes only the ANN convolutional output (excluding batch normalization). This is done to satisfy Eq. 7. Thus, there is no direct replacement of $\beta_c^l$ with $\beta^l$. We use the SNN parameters ($W^l_c$, $\mu^l_c$, $\sigma^l_c$, $\gamma^l_c$, and $\beta^l_c$) in the accumulated input current on the LHS and the ANN parameters ($W^l$, $\mu^l$, $\sigma^l$, $\gamma^l$, and $\beta^l$) in the ANN pre-activation value on the RHS. This formulation demonstrates how Theorem-I satisfies Eq. 7. Please let us know if this clarifies your concern. If not, we would be more than happy to provide further explanation.

Question 1: Thanks for this suggestion. We have now removed the word 'Consequently' in Line 15 of the revision.

Comment

Thank you for providing a detailed rebuttal. It effectively addresses all my concerns.

I have updated my score from 5 to 6. Good luck with your paper!

Review (Rating: 6)

This paper presents a novel ANN-to-SNN conversion framework that incorporates a modified IF neuron model and shifts the bias term of each batch normalization layer in the source ANN, along with a fine-grained ℓ1 regularizer. The framework achieves remarkable results, reaching 75.12% accuracy on ImageNet with only 4 time steps. This work makes two significant contributions: it achieves an exponential reduction in the required number of time steps while simultaneously establishing a new state-of-the-art benchmark for ANN-to-SNN conversion methods.

Strengths

  1. The paper provides rigorous theoretical analysis, with clear mathematical proofs demonstrating how the modified IF neuron model and batch normalization layer bias shifts successfully eliminate quantization error at T = log2Q. The theoretical foundation is well-established and thoroughly documented.
  2. The framework achieves exceptional empirical results, notably reaching 75.12% accuracy on the challenging ImageNet dataset with only 4 time steps. This represents a significant advancement in the field, as achieving such high accuracy with such low latency on large-scale datasets has been a longstanding challenge in ANN-to-SNN conversion research.

Weaknesses

  1. The motivation for introducing the fine-grained ℓ1 regularizer is inadequately explained. While the paper's primary goal is to address the low accuracy of ANN-to-SNN conversion under low time steps, the authors don't sufficiently justify why compressing spiking activity is necessary. This raises questions about whether such compression might actually degrade SNN performance rather than enhance it.
  2. The paper lacks comprehensive comparisons with state-of-the-art BPTT-only methods. Current BPTT-based approaches have demonstrated the capability to achieve over 80% accuracy on ImageNet with only 4 time steps[1], yet this benchmark is not addressed in the comparative analysis.
  3. The paper's error analysis framework lacks clarity. While the proposed method appears to target unevenness error, Section 4 focuses mainly on deviation error. In [2], deviation error is categorized into clipping error, quantization error, and unevenness error. The authors seem to use QCFS from [2] to address the first two errors and their novel ANN-to-SNN conversion framework to solve Unevenness error. However, this logical progression and the relationships between different error types are not clearly explained.
  4. The paper exhibits organizational issues with equation numbering and placement. For instance, equations 3 and 4 on line 133 are not presented in sequential order, indicating a need for better structural organization of mathematical content.

[1] Yao M, Hu J, Hu T, et al. Spike-driven transformer v2: Meta spiking neural network architecture inspiring the design of next-generation neuromorphic chips[J]. arXiv preprint arXiv:2404.03663, 2024.

[2] Bu T, Fang W, Ding J, et al. Optimal ANN-SNN Conversion for High-accuracy and Ultra-low-latency Spiking Neural Networks[C]//International Conference on Learning Representations.

Questions

  1. Review and improve the logical consistency throughout the paper's details.
  2. Clarify the similarities and differences between deviation error and unevenness error.
  3. The authors should release their code to demonstrate the reproducibility of their proposed method.
Comment

Thank you for your positive feedback and acknowledging the strengths of our work. Below, we address the concerns and questions you raised, providing detailed responses to each point.

Weakness: The motivation for introducing the fine-grained $\ell_1$ regularizer is inadequately explained. While the paper's primary goal is to address the low accuracy of ANN-to-SNN conversion under low time steps, the authors don't sufficiently justify why compressing spiking activity is necessary. This raises questions about whether such compression might actually degrade SNN performance rather than enhance it.

Response: While the primary objective of our work is to enhance the accuracy of ANN-to-SNN conversion under low time steps, we also aim to improve the energy efficiency of the converted SNN as a secondary goal. This improvement is achieved through the spiking activity compression facilitated by our $\ell_1$ regularizer. As demonstrated in Appendix A.4 and Figure 6, the accuracy drop resulting from this compression is minimal, particularly for $T\geq 3$, where $T$ represents the total number of time steps.

Weakness: The paper lacks comprehensive comparisons with state-of-the-art BPTT-only methods. Current BPTT-based approaches have demonstrated the capability to achieve over 80% accuracy on ImageNet with only 4 time steps [1], yet this benchmark is not addressed in the comparative analysis.

Response: We apologize for the oversight in not including comparisons with state-of-the-art BPTT-based methods. To address this, we have now added comparisons with some of the latest BPTT-based approaches [1-2], including reference [1], in Table 3 of the revision. However, it is important to acknowledge that these methods rely on transformer networks, which may contribute to their superior accuracy. This is likely due to their enhanced representational power compared to our method, which is based on CNNs.

Weakness: The paper's error analysis framework lacks clarity. While the proposed method appears to target unevenness error, Section 4 focuses mainly on deviation error. In [2], deviation error is categorized into clipping error, quantization error, and unevenness error. The authors seem to use QCFS from [2] to address the first two errors and their novel ANN-to-SNN conversion framework to solve Unevenness error. However, this logical progression and the relationships between different error types are not clearly explained.

Response: We apologize for any confusion caused by the terminology issue. Specifically, when we mentioned "deviation error", we intended to refer to "unevenness error". This terminology has been corrected in the revised manuscript.

As you correctly noted, we utilize QCFS from [3] to mitigate the clipping and quantization error. Meanwhile, our proposed conversion framework specifically addresses the unevenness error. This clarification has been included in the updated text to ensure precision and transparency.

Weakness: The paper exhibits organizational issues with equation numbering and placement. For instance, equations 3 and 4 on line 133 are not presented in sequential order, indicating a need for better structural organization of mathematical content.

Response: We apologize for the organizational issues in the manuscript. To address these, we have revised the order of Eqs. 3 and 4, as shown in the updated version. Additionally, we have carefully reviewed the entire manuscript to ensure that the structural organization of the mathematical content is now consistent and clear.

Question: The authors should release their code to demonstrate the reproducibility of their proposed method.

Response: We have included our code in the supplementary material for your review, and we will make the code publicly available upon acceptance of the paper.

[1] Yao M, Hu J, Hu T, et al. Spike-driven transformer v2: Meta spiking neural network architecture inspiring the design of next-generation neuromorphic chips[J]. arXiv preprint arXiv:2404.03663, 2024.

[2] Zhou Z, Che K, Fang W, et al. Spikformer V2: Join the High Accuracy Club on ImageNet with an SNN Ticket[J]. arXiv preprint arXiv:2401.02020.

[3] Bu T, Fang W, Ding J, et al. Optimal ANN-SNN Conversion for High-accuracy and Ultra-low-latency Spiking Neural Networks[C]//International Conference on Learning Representations.

Comment

Dear Reviewer XCYA,

We sincerely thank you for your time and effort in reviewing our manuscript and providing constructive feedback. As the discussion phase concludes, we hope our detailed responses have addressed your concerns effectively. If you require any additional clarifications or have unresolved issues, please feel free to reach out. We are more than willing to continue addressing your questions and improving our work based on your valuable insights.

Best,

Authors

Comment

Many thanks for the detailed responses. I have decided to maintain the opinion that I agree to accept.

Review (Rating: 5)

This paper outlines a framework for transitioning Quantized Neural Networks (QNN) to Spiking Neural Networks (SNN) with the goal of markedly reducing the time steps for achieving top-tier accuracy while keeping computational complexity low. Modifications to the standard integrate-and-fire (IF) neuron model, adjustments to the bias terms in batch normalization (BN) layers, and the addition of regularizer terms are proposed to boost the efficiency of the conversion.

Strengths

  • The manuscript analyzes key errors inherent in ANN-to-SNN conversion approaches, highlighting areas critical for improving conversion accuracy.
  • The work implements binary encoding during the conversion of Quantized Neural Networks (QNN) to Spiking Neural Networks (SNN), which reduces the number of time steps required.

Weaknesses

  • The paper's reliance on training ANNs from scratch using the QCFS activation function restricts its ability to convert pre-existing ANNs directly to SNNs. Thus, the definition of 'error-free' conversion provided is narrow, only ensuring the error-free transition from QCFS-equipped ANNs (equivalent to QNNs) to SNNs.
  • As in the domain of QNNs->SNNs, the approach to modifying integrate-and-fire (IF) neurons by separating their Accumulation and Generation phases does not significantly stand out from similar methods previously established in Spikeconverter [1].

[1] Liu, Fangxin, Wenbo Zhao, Yongbiao Chen, Zongwu Wang, and Li Jiang. "Spikeconverter: An efficient conversion framework zipping the gap between artificial neural networks and spiking neural networks." In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 2, pp. 1692-1701. 2022.

Questions

see Weaknesses.

Comment

Thank you for your thorough and detailed review, as well as for acknowledging the strengths of our work. We have addressed the weaknesses and questions you raised below.

Weakness: The paper's reliance on training ANNs from scratch using the QCFS activation function restricts its ability to convert pre-existing ANNs directly to SNNs. Thus, the definition of 'error-free' conversion provided is narrow, only ensuring the error-free transition from QCFS-equipped ANNs (equivalent to QNNs) to SNNs.

Response: It is true that our ANN-to-SNN conversion framework is based on the QCFS activation function, and thus, it cannot be directly applied to ANNs trained using the ReLU function. However, we ran new experiments that demonstrate that we need to fine-tune the ANNs with the QCFS function for only a small number of epochs when they are pre-trained with the ReLU function. In particular, as shown in the Table below, for both VGG16 and ResNet20, we only need 30 epochs of fine-tuning with the QCFS function for ANNs pre-trained with the ReLU function to achieve the same accuracy as training with the QCFS function for 300 epochs (as done in our original experiments). These results have been included in Appendix A.9 in the revised manuscript, and we believe they illustrate the flexibility and practical applicability of our method.

| Number of epochs | Architecture | Type | Accuracy |
|---|---|---|---|
| 300 | VGG16 | QCFS pre-training | 95.82% |
| 30 | VGG16 | ReLU pre-training + QCFS fine-tuning | 95.47% |
| 300 | ResNet20 | QCFS pre-training | 93.60% |
| 30 | ResNet20 | ReLU pre-training + QCFS fine-tuning | 93.51% |

Weakness: As in the domain of QNNs->SNNs, the approach to modifying integrate-and-fire (IF) neurons by separating their Accumulation and Generation phases does not significantly stand out from similar methods previously established in Spikeconverter [1].

Response: While [1] separates the aggregation and emission phases, similar to our work, there are some notable differences, as explained in Section 5.1. We further elaborate these differences below.

(1) We train the source ANN using the QCFS activation function, which enables us to establish a mathematical equivalence between the activation output bits of the ANN and the spike outputs of the SNN at each time step. The specific activation function used in [1] is unclear, although it mentions ReLU activation.

(2) Our method embeds both the timing and binary value of spikes within the accumulated input current (as indicated by the term in Eq. 9). This approach allows us to achieve the same accuracy as the baseline SNN with a significantly reduced number of steps compared to [1], and with negligible complexity overhead.

(3) We provide a mathematical proof demonstrating that our proposed neuron model completely eliminates the conversion error, i.e., the difference between the ANN activation output bits and the SNN spike outputs at each time step. We provide empirical validation of this proof in Section 6.1. In contrast, [1] empirically shows that their inverted LIF model with k=2 only reduces, but does not eliminate the conversion error, without offering a mathematical justification.

(4) Reference [1] may incur additional training complexity within the SNN domain, which is costly since the backward pass requires the gradients to be integrated over every time step. The authors mention "For post-conversion training, we use SGD with..." in the "Results" section. Our method, however, relies solely on conversion and does not train the converted SNN at all.

We believe that due to the Points (2) and (3) above, our method yields better SNN accuracy compared to [1] at low time steps. We also believe that the superior accuracy of [1] may be partially attributed to Point (4).

That said, since [1] is a pioneering work on segregated neuron implementation, and is highly related to our method, we will discuss [1] in our 'Related Works' section, and summarize its differences with our method in the revision. Thank you for bringing it to our attention.

[1] Liu, Fangxin, Wenbo Zhao, Yongbiao Chen, Zongwu Wang, and Li Jiang. "Spikeconverter: An efficient conversion framework zipping the gap between artificial neural networks and spiking neural networks." In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 2, pp. 1692-1701. 2022.

Comment

Dear Reviewer V7jF,

We sincerely thank you for your time and effort in reviewing our manuscript and providing constructive feedback. As the discussion phase concludes, we hope our detailed responses have addressed your concerns effectively. If you require any additional clarifications or have unresolved issues, please feel free to reach out. We are more than willing to continue addressing your questions and improving our work based on your valuable insights.

Best,

Authors

Review (Rating: 6)

The paper presents a novel ANN-SNN conversion framework that significantly reduces the number of required time steps while maintaining high accuracy and computational efficiency. By integrating the QCFS activation function and adjusting the biases of the BN layers, the paper demonstrates how to achieve consistency in activation outputs between ANN and SNN without increasing computational complexity.

Strengths

  1. The writing is generally smooth, and the arguments presented are supported by corresponding proofs and experiments.
  2. The experimental results show certain advantages, suggesting that this may be a promising approach.

Weaknesses

  1. This paper merely introduces a spiking neuron model rather than a new conversion paradigm, which limits its novelty.
  2. The integration of BN layers is not new.
  3. The authors' discussion and proofs seem to revolve around the equation s^l(t) = a^l_{t}. However, the authors directly present this point without sufficiently establishing the rationale for the validity of this equation, which raises doubts.
  4. The data presented in Table 1 is concerning, as it does not align with the data from the original papers cited.

Questions

  1. Can the authors provide a detailed example to clarify the meaning of the relationship between $s^l(t) = a^l_t$, $T = \log_2 Q$, and the relationship between $a^{(l-1)}$ and $2^{(t-1)} s^{(l-1)}(t)$?
  2. In Table 1, for the ResNet18 model, the ANN accuracy for the OPI method is stated as 92.74%, whereas the original paper reports an accuracy of 96.04%. The results for the BOS method on VGG16 and ResNet18 models for T=8,T=16,T=32 are inconsistent with the original paper. Additionally, for the ResNet20 model, the original ANN accuracy for BOS is reported as 91.77%, but the authors present it as 93.3%. The results for T=6,T=8,T=16,T=32 also do not match those in the original BOS paper. Please provide a reasonable explanation for these discrepancies.
Comment

Thank you for your thorough and detailed review, as well as for acknowledging the strengths of our work. We have addressed the weaknesses and questions you raised below.

Weakness: This paper merely introduces a spiking neuron model rather than a new conversion paradigm, which limits its novelty.

Response: Our approach is built on a conversion paradigm that integrates the following key steps:

ANN Training: We train an ANN model using the QCFS activation function with Q quantization levels and a bit-level fine-grained $\ell_1$ regularizer.

Parameter Transfer: All trainable parameters from the ANN, except the bias term of the BN layers, are transferred to the SNN. The bias term is specifically computed as detailed in Theorem-I to address the offset correction.

SNN Inference: The spiking neuron model is adapted during SNN inference to operate with $T=\log_2 Q$ time steps, as described in Theorem-II.

These steps collectively form a novel ANN-to-SNN conversion paradigm where the spiking neuron model is an integral component, rather than the sole contribution. This framework enhances the efficiency and accuracy of ANN-to-SNN conversion and addresses fundamental challenges such as activation offset correction. Thus, we believe our work advances the field beyond introducing a new spiking neuron model.

Weakness: The integration of BN layers is not new.

Response: While we acknowledge that the integration of batch normalization (BN) layers with convolutional layers is a well-established practice, the specific use of BN layers to correct the error offset between ANN and SNN activations is a novel contribution of our work. This approach, as also noted by Reviewer iivj, has not been explored before. By leveraging BN layers for this purpose, we provide a unique mechanism to bridge the gap between ANN and SNN activations, which we believe strengthens the originality of our method.

Weakness & Question: The authors' discussion and proofs seem to revolve around the equation $s^l(t) = a^l_t$. However, the authors directly present this point without sufficiently establishing the rationale for the validity of this equation, which raises doubts.

Response: Thank you for pointing out this crucial aspect. The equation $s^l(t) = a^l_t$ plays a fundamental role in our conversion framework by establishing a direct mapping between the SNN spike outputs and the bitwise representation of the ANN activations in the QCFS layer.

Here’s an expanded explanation:

Bitwise Encoding of Activations: The QCFS layer outputs in the ANN represent activations in a quantized form with Q discrete levels. These activations can be expressed in a binary format, where each bit corresponds to a specific level of precision. The SNN emulates this process by assigning one time step (t) to each bit in the binary representation.

Spiking Representation: At each time step $t$, the integrate-and-fire (IF) neuron in the SNN outputs a spike ($s^l(t)$) if the binary value of the corresponding bit in the QCFS activation is 1. This ensures that the cumulative spike train over $T=\log_2 Q$ time steps reconstructs the full quantized activation value $a^l$ of the ANN.

Zero Error Guarantee: This bitwise alignment inherently guarantees that the total activation encoded by the SNN matches the ANN activation precisely, leading to zero error between the final outputs of the ANN and SNN under these conditions. Consequently, this also ensures that the accuracies of the two networks are identical, as we have empirically validated in Tables 1 and 2.

Empirical Validation: The consistency between the theoretical framework and empirical results in our experiments further supports the correctness of the equation.

We have now summarized these details in the revised manuscript to clearly articulate the rationale behind this equation and its implications for the ANN-to-SNN conversion process.
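As a compact summary of the zero-error argument (our own restatement, using the notation of the example below and assuming $\theta^l=\lambda^l$ and $Q=2^T$): if $s^l(t)=a^l_t$ holds at every bit position $t$, then the threshold-weighted spike train reconstructs the quantized ANN activation exactly,

$$\sum_{t=1}^{T} s^l(t)\,\frac{\theta^l}{2^{t}} \;=\; \frac{\lambda^l}{Q}\sum_{t=1}^{T} a^l_t\, 2^{\,T-t},$$

so the two networks produce identical layer outputs and, therefore, identical accuracies.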

A detailed example to clarify the meaning of this relationship is provided below.

Comment

Let us assume that the source ANN has Q=8 quantization levels (3 bits), its layer $l$ has two neurons with pre-activation values 3.7 and 8.2, and $\lambda^l=7$. This implies $T=\log_2 Q=3$ and $\theta^l=7$. The outputs of the QCFS function for these two neurons will be 4 and 7 respectively (see Eq. 2), and their bit-wise representations will be 100 and 111 respectively. Under the equivalence of Eq. (8), the aggregated input current to the SNN in layer $l$, which is the same as the initial membrane potential of the two neurons, will also be $u^l_1(1)=3.7$ and $u^l_2(1)=8.2$. Let us now compute the spike outputs of these two neurons ($s^l_1(t)$ and $s^l_2(t)$) over the three time steps using Eqs. (10) and (11).

1a. $T=1$: $s^l_1(1) = H(u^l_1(1)-\frac{\theta^l}{2^1}) = H(3.7-\frac{7}{2}) = 1$

1b. $T=1$: $s^l_2(1) = H(u^l_2(1)-\frac{\theta^l}{2^1}) = H(8.2-\frac{7}{2}) = 1$

2a. $T=2$: $u^l_1(2) = u^l_1(1) - s^l_1(1)\frac{\theta^l}{2^1} = 3.7-3.5=0.2$, $s^l_1(2) = H(u^l_1(2)-\frac{\theta^l}{2^2}) = H(0.2-\frac{7}{2^2}) = 0$

2b. $T=2$: $u^l_2(2) = u^l_2(1) - s^l_2(1)\frac{\theta^l}{2^1} = 8.2-3.5=4.7$, $s^l_2(2) = H(u^l_2(2)-\frac{\theta^l}{2^2}) = H(4.7-\frac{7}{2^2}) = 1$

3a. $T=3$: $u^l_1(3) = u^l_1(2) - s^l_1(2)\frac{\theta^l}{2^2} = 0.2-0=0.2$, $s^l_1(3) = H(u^l_1(3)-\frac{\theta^l}{2^3}) = H(0.2-\frac{7}{2^3}) = 0$

3b. $T=3$: $u^l_2(3) = u^l_2(2) - s^l_2(2)\frac{\theta^l}{2^2} = 4.7-1.75=2.95$, $s^l_2(3) = H(u^l_2(3)-\frac{\theta^l}{2^3}) = H(2.95-\frac{7}{2^3}) = 1$

Hence, $s^l_1=\{1, 0, 0\}$ and $s^l_2=\{1, 1, 1\}$, which matches the bit-wise ANN representations (100 and 111).
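The example above is easy to verify numerically. The short Python sketch below (our illustration, not the released code) implements the accumulate-then-fire dynamics with the per-step halving threshold exactly as in steps 1a-3b and reproduces the spike trains and the quantized QCFS values:

```python
def accumulate_then_fire(u0, theta, T):
    """Modified IF neuron: all input current is accumulated into u0 first; threshold
    comparison and soft reset then run for T steps against a threshold that halves each step."""
    u, spikes = u0, []
    for t in range(1, T + 1):
        thr = theta / 2**t          # theta / 2^t, i.e. a right shift of the threshold per step
        s = 1 if u >= thr else 0    # Heaviside comparison H(u - theta / 2^t)
        u -= s * thr                # soft reset by the threshold that was crossed
        spikes.append(s)
    return spikes

theta, T = 7, 3
s1 = accumulate_then_fire(3.7, theta, T)   # -> [1, 0, 0], matching the bits 100
s2 = accumulate_then_fire(8.2, theta, T)   # -> [1, 1, 1], matching the bits 111

# The threshold-weighted spike trains recover the quantized values 4*(7/8) and 7*(7/8):
print(sum(s * theta / 2**t for t, s in enumerate(s1, start=1)))  # 3.5
print(sum(s * theta / 2**t for t, s in enumerate(s2, start=1)))  # 6.125
```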

Weakness & Question: The data presented in Table 1 is concerning, as it does not align with the data from the original papers cited.

Response: We have identified a typographical error in Table 1 concerning the ANN accuracy of the OPI method with ResNet18, where the value was mistakenly copied from ResNet20's ANN accuracy (92.74%). Additionally, we have corrected the ANN accuracy of the BOS method with ResNet20 on CIFAR10. We sincerely apologize for the oversight and have rectified these errors in the revision.

The BOS method accuracies reported in Table 1 of our paper differ from those in the original BOS paper. This discrepancy arises because the original paper does not account for the additional time steps required to initialize the membrane potential prior to SNN inference. To ensure a fair comparison with other ANN-to-SNN conversion methods, these initialization steps are included in the total number of time steps in our analysis, as noted in the caption of Table 1. For instance, the accuracy reported with T=2 in the BOS paper corresponds to the accuracy with T=6 in our paper.

Comment

Dear Reviewer ediR,

We sincerely thank you for your time and effort in reviewing our manuscript and providing constructive feedback. As the discussion phase concludes, we hope our detailed responses have addressed your concerns effectively. If you require any additional clarifications or have unresolved issues, please feel free to reach out. We are more than willing to continue addressing your questions and improving our work based on your valuable insights.

Best,

Authors

Withdrawal Notice

I have read and agree with the venue's withdrawal policy on behalf of myself and my co-authors.