One-Hot Multi-Level LIF Spiking Neural Networks for Enhanced Accuracy-Latency Tradeoff
We propose a one-hot multi-level leaky integrate-and-fire (M-LIF) neuron to reduce the number of timesteps T during SNN inference while improving accuracy and maintaining the low-spike rates of traditional SNNs.
Abstract
Reviews and Discussion
This paper introduces a one-hot Multi-Level Leaky Integrate-and-Fire (M-LIF) neuron model, which expands neuron outputs beyond binary spikes by using multiple binary-weighted spike lanes. The M-LIF model enhances the accuracy-energy tradeoff by enabling higher accuracy with fewer timesteps compared to conventional SNNs. Experimental results demonstrate that M-LIF SNNs achieve higher accuracy on static datasets like ImageNet and significantly reduce latency on dynamic datasets like DVS-CIFAR10 while maintaining energy efficiency.
Strengths
- Significance: The paper tackles a crucial challenge in SNNs by developing a method to reduce the number of timesteps while maintaining high accuracy, addressing the need for more energy-efficient SNNs.
- Clarity: The paper is exceptionally well-written and organized.
- Validation: The authors perform comprehensive experiments on both static and dynamic datasets.
Weaknesses
- Novelty: The proposed M-LIF neuron model closely resembles existing concepts such as multi-spike, burst spike, and multi-threshold neurons. While the approach of restricting outputs to powers of two introduces some variation, it does not fundamentally differentiate the model from prior research, potentially limiting its originality. Moreover, the authors do not discuss any of these methods.
- Performance: Although the paper presents extensive experiments, the results on datasets like CIFAR10 and CIFAR100 under equivalent timesteps (e.g., T=1, S=3 compared to traditional SNNs with T=4) show suboptimal performance. This raises concerns about the effectiveness of the proposed method in improving accuracy.
- Lack of Hardware Implementation Discussion: The paper does not address how the proposed M-LIF model can be supported by existing or future hardware architectures. It remains unclear how the model can be practically deployed in neuromorphic hardware.
Questions
- Could you elaborate on the fundamental differences between your proposed M-LIF neuron model and existing models such as multi-spike, burst spike, and multi-threshold neurons?
- In your energy calculations, do you factor in the weights associated with different spike lanes? For instance, if your M-LIF model operates with T=1 and S=3, making it equivalent to a conventional SNN with T=4, should the energy consumption not scale proportionally by the number of spikes (i.e., multiply by 4)?
- How does your proposed M-LIF model align with current or emerging hardware architectures? Could you provide insights into the feasibility and potential challenges of implementing M-LIF neurons in hardware to achieve the claimed energy efficiency improvements?
Q4: In your energy calculations, do you factor in the weights associated with different spike lanes? For instance, if your M-LIF model operates with T=1 and S=3, making it equivalent to a conventional SNN with T=4, should the energy consumption not scale proportionally by the number of spikes (i.e., multiply by 4)?
A4: Thank you for raising this question. We appreciate the opportunity to clarify our energy calculations.
In one-hot M-LIF SNNs, the weights are indeed shared across spike lanes. One-hot M-LIF does not introduce overhead in that regard.
With respect to energy scaling proportionally, a (T=1, S=3) one-hot M-LIF SNN would be equivalent to a (T=4) traditional SNN in terms of overall spike activity only in the worst case, where every neuron fires at every timestep. However, energy consumption is not determined solely by the number of timesteps; it also depends on the firing rate, which is shaped by the learned threshold and leakage parameters of the neuron model during training. Therefore, energy consumption does not scale proportionally with the maximum number of spikes alone but depends on the firing rates observed during inference. Our energy calculations account for these firing rates, yielding a more accurate estimate of computational energy that reflects actual operating conditions and gives a better picture of the efficiency improvements offered by the one-hot M-LIF model.
We hope this explanation addresses your concern and makes the relationship between timesteps, firing rates, and energy consumption in our model clearer. Thank you again for your insightful question.
References
[1] Xiao et al., "Fast and accurate classification with a multi-spike learning algorithm for spiking neurons." In IJCAI 2019.
[2] Miao et al., "A supervised multi-spike learning algorithm for spiking neural networks." In IJCNN 2018.
[3] Wang et al., "MT-SNN: Enhance Spiking Neural Network with Multiple Thresholds." In ArXiv 2023.
[4] Feng et al., "Multi-Level Firing with Spiking DS-ResNet: Enabling Better and Deeper Directly-Trained Spiking Neural Networks." In IJCAI 2022.
[5] Wang et al., “Bursting Spikes: Efficient and High-performance SNNs for Event-based Vision.” In arXiV 2023.
[6] Li et al., “Efficient and Accurate Conversion of Spiking Neural Network with Burst Spikes”. In IJCAI 2022.
[7] Datta et al., "Can we get the best of both Binary Neural Networks and Spiking Neural Networks for Efficient Computer Vision?" In ICLR 2024.
[8] Lee et al., "Reconfigurable Dataflow Optimization for Spatiotemporal Spiking Neural Computation on Systolic Array Accelerators." In ICCD 2020.
[9] Lee et al., "Parallel Time Batching: Systolic-Array Acceleration of Sparse Spiking Neural Computation." In HPCA 2022.
[10] Horowitz, “1.1 computing’s energy problem (and what we can do about it).” In ISSCC 2014.
Q3: Lack of Hardware Implementation Discussion: The paper does not address how the proposed M-LIF model can be supported by existing or future hardware architectures. It remains unclear how the model can be practically deployed in neuromorphic hardware. How does your proposed M-LIF model align with current or emerging hardware architectures? Could you provide insights into the feasibility and potential challenges of implementing M-LIF neurons in hardware to achieve the claimed energy efficiency improvements?
A3: Thank you for your insightful feedback. We appreciate the opportunity to elaborate on the hardware implementation aspects of the proposed M-LIF model.
The M-LIF model is designed to be adaptable to many existing hardware architectures. For example, previous work has shown the feasibility of leveraging systolic arrays for performing SNN inference using a small number of timesteps [8-9]. Adapting these methods to one-hot M-LIF SNNs would involve storing the exponent s of the spike lane output 2^s, where s ∈ {0, …, S−1} and S is the number of spike lanes, instead of a traditional single-bit activation.
Our experiments indicate significant benefits from using up to four spike lanes (S=4), with accuracy improvements saturating beyond that point (as detailed in the ablation study in response to Q2 for Reviewer egRh). Consequently, storage requirements would increase by only 1 or 2 additional bits per activation compared to traditional SNNs. This minimal memory overhead allows us to achieve higher accuracy while also minimizing the number of timesteps, with which the number of high-energy memory loads of FP32 weights and membrane potentials scales linearly.
During computations, instead of masking the weight, we perform an INT8 addition on the exponent, as shown in Appendix A.1. This addition is significantly cheaper than the multiplication required in ANNs. For a hardware implementation of one-hot M-LIF thresholding, a simple priority encoder can be used to return the first non-zero bit, scanning from the most significant bit of the membrane potential after accumulation.
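To make this datapath concrete, below is a minimal Python sketch (our own illustration rather than code from the paper; the fixed-point scaling and function names are assumptions) of the priority-encoder thresholding and the compact exponent storage described above.

```python
def priority_encode(v_mem_fixed, v_th_fixed, num_lanes):
    """Hypothetical priority encoder: membrane potential and threshold are
    integers in the same fixed-point scale. Returns the spike-lane exponent
    s in 0 .. num_lanes-1, or -1 if the neuron stays silent."""
    if v_mem_fixed < v_th_fixed:
        return -1
    # Most significant set bit of (v_mem / v_th), i.e. how many threshold
    # doublings the membrane potential has crossed.
    s = (v_mem_fixed // v_th_fixed).bit_length() - 1
    return min(s, num_lanes - 1)

# With S = 4 lanes, the exponent plus a "no spike" state fits in 3 bits,
# i.e. only 1-2 bits more per activation than a binary spike.
lane = priority_encode(v_mem_fixed=5 * 256, v_th_fixed=256, num_lanes=4)  # -> 2, i.e. output 2**2
```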
Moreover, it is worth noting that current neuromorphic hardware such as Loihi 2 supports multi-level SNNs, as mentioned by Reviewer NG6F. Therefore, one-hot M-LIF SNNs can indeed be mapped onto such neuromorphic hardware, ensuring their practical applicability and alignment with existing and emerging hardware architectures.
References
[1] Xiao et al., "Fast and accurate classification with a multi-spike learning algorithm for spiking neurons." In IJCAI 2019.
[2] Miao et al., "A supervised multi-spike learning algorithm for spiking neural networks." In IJCNN 2018.
[3] Wang et al., "MT-SNN: Enhance Spiking Neural Network with Multiple Thresholds." In ArXiv 2023.
[4] Feng et al., "Multi-Level Firing with Spiking DS-ResNet: Enabling Better and Deeper Directly-Trained Spiking Neural Networks." In IJCAI 2022.
[5] Wang et al., “Bursting Spikes: Efficient and High-performance SNNs for Event-based Vision.” In arXiV 2023.
[6] Li et al., “Efficient and Accurate Conversion of Spiking Neural Network with Burst Spikes”. In IJCAI 2022.
[7] Datta et al., "Can we get the best of both Binary Neural Networks and Spiking Neural Networks for Efficient Computer Vision?" In ICLR 2024.
[8] Lee et al., "Reconfigurable Dataflow Optimization for Spatiotemporal Spiking Neural Computation on Systolic Array Accelerators." In ICCD 2020.
[9] Lee et al., "Parallel Time Batching: Systolic-Array Acceleration of Sparse Spiking Neural Computation." In HPCA 2022.
[10] Horowitz, “1.1 computing’s energy problem (and what we can do about it).” In ISSCC 2014.
Q2: Performance: Although the paper presents extensive experiments, the results on datasets like CIFAR10 and CIFAR100 under equivalent timesteps (e.g., T=1, S=3 compared to traditional SNNs with T=4) show suboptimal performance. This raises concerns about the effectiveness of the proposed method in improving accuracy.
A2: Thank you for your valuable observation. We acknowledge that for the Transformer-2-512 architecture on CIFAR10 and CIFAR100, the performance of the one-hot M-LIF method (T=1, S=3) is comparable to that of the traditional SNN (T=4). However, our primary goal with one-hot M-LIF was to reduce the number of timesteps during SNN inference while maintaining accuracy and preserving the low spike rates characteristic of traditional SNNs.
By reducing the timesteps, we aim to decrease the energy overhead associated with multi-timestep processing. The need for multi-timestep processing poses a significant challenge for widespread SNN deployment, as it often results in increased memory storage and access requirements, which can exceed the computational costs. Therefore, the (T=1, S=3) one-hot M-LIF SNN offers a more favorable tradeoff than the (T=4) SNN because it achieves comparable accuracy while avoiding the memory access overhead that scales linearly with timesteps.
For further illustration, please refer to our response to Q4 for Reviewer bytk, where we provide an example of the impact of multi-timestep processing on the total energy consumption of VGG16 for ImageNet, based on the memory energy model proposed in [7]. Additionally, for the larger and more challenging ImageNet dataset, evaluations of the spike-driven Transformer-8-512 show that the one-hot M-LIF SNN outperforms the traditional SNN in terms of accuracy.
Lastly, our approach demonstrates improved performance for dynamic vision classification tasks on DVS-CIFAR10, as shown in Table 3 of our submission. It surpasses traditional SNNs in accuracy and preserves accuracy better than traditional SNNs in low-latency, memory-efficient regimes.
References
[1] Xiao et al., "Fast and accurate classification with a multi-spike learning algorithm for spiking neurons." In IJCAI 2019.
[2] Miao et al., "A supervised multi-spike learning algorithm for spiking neural networks." In IJCNN 2018.
[3] Wang et al., "MT-SNN: Enhance Spiking Neural Network with Multiple Thresholds." In ArXiv 2023.
[4] Feng et al., "Multi-Level Firing with Spiking DS-ResNet: Enabling Better and Deeper Directly-Trained Spiking Neural Networks." In IJCAI 2022.
[5] Wang et al., “Bursting Spikes: Efficient and High-performance SNNs for Event-based Vision.” In arXiV 2023.
[6] Li et al., “Efficient and Accurate Conversion of Spiking Neural Network with Burst Spikes”. In IJCAI 2022.
[7] Datta et al., "Can we get the best of both Binary Neural Networks and Spiking Neural Networks for Efficient Computer Vision?" In ICLR 2024.
[8] Lee et al., "Reconfigurable Dataflow Optimization for Spatiotemporal Spiking Neural Computation on Systolic Array Accelerators." In ICCD 2020.
[9] Lee et al., "Parallel Time Batching: Systolic-Array Acceleration of Sparse Spiking Neural Computation." In HPCA 2022.
[10] Horowitz, “1.1 computing’s energy problem (and what we can do about it).” In ISSCC 2014.
Q1: Novelty: The proposed M-LIF neuron model closely resembles existing concepts such as multi-spike, burst spike, and multi-threshold neurons. While the approach of restricting outputs to powers of two introduces some variation, it does not fundamentally differentiate the model from prior research, potentially limiting its originality. Moreover, the authors do not discuss any of these methods. Could you elaborate on the fundamental differences between your proposed M-LIF neuron model and existing models such as multi-spike, burst spike, and multi-threshold neurons?
A1: Thank you for your thoughtful question. We appreciate the opportunity to further clarify the novelty of our proposed one-hot M-LIF neuron model and its fundamental differences from existing approaches like multi-spike, burst-spike, and multi-threshold neurons. We have now added a paragraph in Section 2.5 of our submission to better contextualize these methods relative to our work.
Multi-spike [1,2] and multi-threshold [3,4] neurons increase activation precision by introducing uniform activation quantization into SNNs. This leads to higher computational costs, particularly in terms of energy consumption, as it breaks the multiplication-free nature of traditional SNNs and often requires multiplications or multiple successive additions to handle the increased precision. Burst-spike models [5,6] work by increasing the spike rate within a single timestep, which also enhances precision. However, burst-spike methods often do not scale down to unit-timestep processing, as demonstrated in [5], where ImageNet results are only reported for multiple timesteps. Memory energy scales linearly with the number of timesteps, and the cost of a single memory access is often higher than that of a single compute operation [10]. In contrast, our one-hot M-LIF neuron reduces the timestep requirement to a single timestep (T=1), which significantly cuts down on the associated memory access overhead.
Our one-hot M-LIF neuron utilizes a single threshold (and its powers of two) to ensure that only one spike lane fires per timestep. This restriction is learned during training, making it a unique feature not present in the multi-spike and multi-threshold approaches. The power-of-two encoding enables scaling weights efficiently with a single INT8 addition to their exponent values during membrane potential updates, avoiding the multiplications or complex additions typical of the models in [1,2,3,4]. Please also refer to the answer to Q1 for Reviewer NG6F for a comparison between one-hot M-LIF SNNs and multi-threshold [3,4] SNNs.
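As a simplified illustration of these dynamics (a sketch under our own assumptions about the leak and reset conventions, not the paper's exact formulation), one timestep of a one-hot M-LIF neuron can be written as follows:

```python
import math

def one_hot_mlif_step(v_mem, input_current, leak=0.9, v_th=1.0, num_lanes=3):
    """One hypothetical timestep: leaky integration followed by at most one
    power-of-two spike, selected via the single learned threshold v_th."""
    v_mem = leak * v_mem + input_current
    spike = 0.0
    if v_mem >= v_th:
        # Largest s (capped by the number of lanes) with v_mem >= v_th * 2**s.
        s = min(int(math.log2(v_mem / v_th)), num_lanes - 1)
        spike = 2.0 ** s              # output in {1, 2, 4, ...}; exactly one lane fires
        v_mem -= spike * v_th         # assumed soft reset by the emitted level
    return v_mem, spike

v, out = one_hot_mlif_step(v_mem=0.0, input_current=2.6)  # -> out == 2.0 (lane s = 1)
```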
The unique features of our approach are also demonstrated in our experimental results. Unlike burst-spike [5,6] models that require multi-timestep processing, our model achieves high accuracy with unit-timestep processing, which is particularly beneficial for both energy efficiency and scalability. We provide comparisons against unit-timestep SNNs in Table 1 of our manuscript for static image classification. Finally, our method is scalable to more complex tasks like ImageNet classification and spike-driven transformer architectures.
References
[1] Xiao et al., "Fast and accurate classification with a multi-spike learning algorithm for spiking neurons." In IJCAI 2019.
[2] Miao et al., "A supervised multi-spike learning algorithm for spiking neural networks." In IJCNN 2018.
[3] Wang et al., "MT-SNN: Enhance Spiking Neural Network with Multiple Thresholds." In ArXiv 2023.
[4] Feng et al., "Multi-Level Firing with Spiking DS-ResNet: Enabling Better and Deeper Directly-Trained Spiking Neural Networks." In IJCAI 2022.
[5] Wang et al., “Bursting Spikes: Efficient and High-performance SNNs for Event-based Vision.” In arXiV 2023.
[6] Li et al., “Efficient and Accurate Conversion of Spiking Neural Network with Burst Spikes”. In IJCAI 2022.
[7] Datta et al., "Can we get the best of both Binary Neural Networks and Spiking Neural Networks for Efficient Computer Vision?" In ICLR 2024.
[8] Lee et al., "Reconfigurable Dataflow Optimization for Spatiotemporal Spiking Neural Computation on Systolic Array Accelerators." In ICCD 2020.
[9] Lee et al., "Parallel Time Batching: Systolic-Array Acceleration of Sparse Spiking Neural Computation." In HPCA 2022.
[10] Horowitz, “1.1 computing’s energy problem (and what we can do about it).” In ISSCC 2014.
I appreciate the authors' detailed response and the effort put into addressing the concerns raised. However, after careful consideration and review of other reviewers' opinions, I maintain my original score.
The manuscript introduces a one-hot multi-level leaky integrate-and-fire (M-LIF) neuron model that represents the output of spiking neural networks (SNNs) as one-hot binary vectors. The focus is on training ultra-low latency spiking models with high accuracy, and the results show that the proposed model can outperform iso-architecture LIF models in terms of accuracy.
Strengths
The manuscript presents a simple, intuitive approach that is clearly described and achieves high accuracy with just a single timestep on large static image classification datasets, such as ImageNet. This approach offers an interesting solution for developing ultra-low latency SNN models.
Weaknesses
The manuscript lacks comparisons with other methods that use graded (weighted) spikes. The idea of utilizing multibit outputs (one-hot encoded or otherwise) is not novel, as it has been previously explored, for instance in [1], and is a feature available in neuromorphic hardware like Loihi 2 [2], as well as in commonly used SNN libraries like snnTorch [3]. I recommend that the authors provide a discussion highlighting the unique features of the proposed model in comparison to these existing methods and tools.
Additionally, while the manuscript emphasizes the energy efficiency of the proposed spiking models over other SNN models and artificial neural networks (ANNs), these energy comparisons lack context without specifying the hardware architecture used. Moreover, as noted by the authors, memory access often dominates energy consumption, which is omitted from these comparisons. Given that memory-related energy can be orders of magnitude higher than compute energy, and considering that SNNs may require more memory access than ANNs due to membrane potential updates, the energy comparisons may be unrealistic. I suggest the authors replace these energy comparisons with more objective metrics, such as the number of multiply-accumulate (MAC) operations, memory usage, and average firing rate.
[1] Ponghiran, W., & Roy, K. (2022). Spiking Neural Networks with Improved Inherent Recurrence Dynamics for Sequential Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 36(7), 8001-8008. https://doi.org/10.1609/aaai.v36i7.20771
[2] M. Davies et al., "Advancing Neuromorphic Computing With Loihi: A Survey of Results and Outlook," in Proceedings of the IEEE, vol. 109, no. 5, pp. 911-934, May 2021, doi: 10.1109/JPROC.2021.3067593.
Questions
- What unique features distinguish the proposed model from existing models in the literature, as well as from available neuromorphic hardware and open-source SNN projects? How does the proposed model perform in comparison to these alternatives?
- Regarding the energy efficiency claims in the weaknesses section, could the authors elaborate on the assumptions made for the energy comparisons? How might these comparisons change if memory access energy is included?
Q3: What unique features distinguish the proposed model from existing models in the literature, as well as from available neuromorphic hardware and open-source SNN projects? How does the proposed model perform in comparison to these alternatives?
A3: Thank you for your thoughtful question. We appreciate the opportunity to further clarify the unique features of our proposed model.
Our one-hot M-LIF neuron utilizes one-hot (power-of-two) encoding of activations, meaning that only one of the spike lanes fires per timestep. This one-hot constraint is applied and learned during training, which sets our model apart from previous binary activation SNNs [4], [5], [6], [8], which do not incorporate such a learned one-hot activation scheme. This encoding approach has several advantages, as it ensures activation precision without the need for multiplication operations. Instead, the neuron’s output corresponds to powers of two, which can be efficiently used to modify the exponents of FP32 weights with a single INT8 addition during membrane potential updates, offering computational efficiency without sacrificing accuracy.
Additionally, our work includes a comprehensive evaluation across different architectures, including both convolutional neural networks and more complex, spike-driven transformer models [5]. This is in contrast to previous works that focused on specific networks or architectures.
Our model is distinct from the work of authors like those in [7], who introduced uniform activation quantization to offload multi-timestep processing and improve activation precision per timestep. However, this quantization approach introduces additional computational overhead and breaks the multiplication-free property of SNNs. Table 1 of our manuscript compares our one-hot M-LIF SNN to binary-activated SNNs that aim to minimize the number of timesteps while preserving accuracy. We also provide a comparison to log-quantized ANNs in Table 2, showing that our model, using unit-timestep processing, achieves competitive accuracy and energy efficiency. To clarify the performance differences with other methods, we have included the following comparison to [7] below:
| Neuron Model | Dataset | Architecture | T | Accuracy (%) | Comp. Energy (mJ) |
|---|---|---|---|---|---|
| LIF [1] | CIFAR100 | VGG-9 | 1 | 72.63 | 190 |
| Parallel-MT (2-bit uniformly quantized activation) [1] | CIFAR100 | VGG-9 | 1 | 73.89 | 440 |
| Cascade-MT (4-bit uniformly quantized activation) [1] | CIFAR100 | VGG-9 | 1 | 74.80 | 740 |
| LIF [3] | CIFAR100 | Spike-driven Transformer-2-512 | 1 | 75.8 | 0.221 |
| One-hot M-LIF (3 spike lanes, activations ∈ {0, 1, 2, 4}) | CIFAR100 | Spike-driven Transformer-2-512 | 1 | 78.2 | 0.478 |
As seen in this table, our one-hot M-LIF model (with 3 spike lanes) achieves 78.2% accuracy while using only 0.478 mJ of computational energy, a marked improvement over the Parallel-MT and Cascade-MT models, which achieve 73.89% and 74.80% accuracy, respectively, while consuming much higher computational energy.
IJCAI22 [10] also proposes introducing uniform activation quantization into SNNs but only evaluates on small datasets (CIFAR10, DVS-CIFAR10). We see that our one-hot M-LIF model significantly outperforms their results for both CIFAR10 and DVS-CIFAR10 datasets.
| Neuron Model | Dataset | Architecture | T | Accuracy (%) |
|---|---|---|---|---|
| MLF [2] | CIFAR10 | MLF(K=3) + spikingDS-ResNet | 4 | 94.25 |
| One-hot M-LIF (3 spike lanes, activations ∈ {0, 1, 2, 4}) | CIFAR10 | Spike-driven Transformer-2-512 | 1 | 95.4 |
| MLF [2] | DVS-CIFAR10 | MLF(K=3) + spikingDS-ResNet | 10 | 70.36 |
| One-hot M-LIF (4 spike lanes, activations ∈ {0, 1, 2, 4, 8}) | DVS-CIFAR10 | VGGSNN | 10 | 84.7 |
Our one-hot M-LIF SNN achieves higher accuracy on CIFAR10 (95.4% vs. 94.25%) and over 14% higher accuracy on DVS-CIFAR10 (84.7% vs. 70.36%) when compared to [10], showcasing superior performance across both static and dynamic datasets.
Finally, we also note that AAAI22 [1] focuses exclusively on speech recognition tasks, whereas our experimental results are centered on static and dynamic image classification, providing broader applicability and relevance across a variety of tasks.
Please see references in first comment
Thank you for your detailed response.
Based on your reply, I understand that the key distinction of the proposed method compared to previous approaches lies in constraining the multibit representation of spikes to powers of two. The motivation for this choice is the improved efficiency of hardware implementations, as multiplication by powers of two can be executed more efficiently. While this approach appears to yield good accuracy, the paper lacks a detailed discussion of why this is the case. For instance, would the model achieve similar results if values other than powers of two were used, such as powers of b (where b is an arbitrary value), or even a different set of numbers not based on any specific power? A thorough discussion and analysis of why your approach outperforms prior methods, beyond simply presenting the results, would strengthen the paper.
Moreover, one of the claims is that the model can perform additions instead of multiplications, as in standard LIF neurons, supported by the explanation in Figure 4. However, in commercial hardware, the operation shown in Figure 4 would likely still be executed as a floating-point (FP) multiplication rather than an INT8 addition. I recommend including a reference to support the claim that commercial hardware performs multiplication of a floating-point number and a power of two as described, to substantiate this assertion.
Regarding the energy efficiency of the proposed method, I appreciate the inclusion of memory-related energy consumption. However, the discussion remains insufficient. I suggest considering the methodology presented in [1] for a more robust comparison of energy consumption between ANNs and SNNs. Alternatively, the authors may wish to soften their claims about the energy efficiency of SNNs compared to ANNs. Under the current assumptions, the energy consumption estimates are unrealistic, making claims such as "20x lower energy consumption than ANNs" inaccurate. Instead, I recommend focusing on comparisons among SNN models.
Additionally, it would be important to report the energy consumption (in terms of MACs and firing rates) and accuracy at iso-performance and iso-energy points when comparing SNN models. This analysis would provide a more comprehensive evaluation of the proposed method's efficiency.
[1] Zhanglu Yan, Zhenyu Bai, Weng-Fai Wong (2024). “Reconsidering the energy efficiency of spiking neural networks.” https://arxiv.org/abs/2409.08290
Q7: Additionally, it would be important to report the energy consumption (in terms of MACs and firing rates) and accuracy at iso-performance and iso-energy points when comparing SNN models. This analysis would provide a more comprehensive evaluation of the proposed method's efficiency.
A7: Thank you for your suggestion to report energy consumption at iso-performance and iso-energy points. We agree that this type of analysis can provide valuable insights. However, we would like to clarify the methodology employed in our work to evaluate energy consumption and accuracy.
In our approach, firing rates are not predetermined during training. Instead, the architecture, number of timesteps, and—for one-hot M-LIF SNNs—the number of spike lanes are fixed prior to training. The per-layer firing rates are derived during inference over the test set, consistent with methodologies established in prior works. Because firing rates are not fixed beforehand, it is inherently challenging to determine exact iso-energy or iso-performance points. However, this does not diminish the significance of the results we present. The comparisons across Tables 1, 2, and 3 demonstrate that one-hot M-LIF SNNs achieve better energy-accuracy tradeoffs and often highlight points where our method offers clear advantages in efficiency.
To illustrate this efficiency, we draw attention to specific results in Table 3:
- A traditional SNN (S=1, T=10) from ICLR22 [7] consumes more compute energy yet achieves lower accuracy than a one-hot M-LIF SNN (S=4, T=10).
- Similarly, at T=3, the one-hot M-LIF SNN (S=4) achieves markedly higher accuracy than the traditional SNN (S=1) from ICLR22 [7], again at lower compute energy.
These examples, among many others provided across Tables 1, 2, and 3, highlight the proposed method’s superior computational efficiency. While we cannot compute strict iso-energy or iso-performance points due to the nature of firing rate estimation, the results clearly showcase how one-hot M-LIF SNNs achieve higher accuracy at lower compute energy levels compared to traditional SNNs.
References
[1] Yan, et al., “Reconsidering the energy efficiency of spiking neural networks.” In arXiV 2024.
[2] Miyashita et al., “Convolutional Neural Networks using Logarithmic Data Representation.” In arXiV 2016.
[3] Przewlocka-Rus et al., “Power-of-Two Quantization for Low Bitwidth and Hardware Compliant Neural Networks.” In tinyML 2022.
[4] Lee et al., "Reconfigurable Dataflow Optimization for Spatiotemporal Spiking Neural Computation on Systolic Array Accelerators." In ICCD 2020.
[5] Lee et al., "Parallel Time Batching: Systolic-Array Acceleration of Sparse Spiking Neural Computation." In HPCA 2022.
[6] Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A, Section 3-437. (2024). Intel Corporation. Available at: https://cdrdv2.intel.com/v1/dl/getContent/671200.
[7] Deng et. al, “Temporal efficient training of spiking neural network via gradient re-weighting.” In ICLR 2022.
[8] Chowdhury et al., “Towards ultra low latency spiking neural networks for vision and sequential tasks using temporal pruning.” In ECCV 2022.
[9] Yao, et al., “Spike-driven transformer.” In NeurIPS 2023.
[10] N. Rathi et al., "DIET-SNN: A Low-Latency Spiking Neural Network With Direct Input Encoding and Leakage and Threshold Optimization." In TNNLS 2023.
[11] Datta et al., "Can we get the best of both Binary Neural Networks and Spiking Neural Networks for Efficient Computer Vision?" In ICLR 2024.
Q6: Regarding the energy efficiency of the proposed method, I appreciate the inclusion of memory-related energy consumption. However, the discussion remains insufficient. I suggest considering the methodology presented in [1] for a more robust comparison of energy consumption between ANNs and SNNs. Alternatively, the authors may wish to soften their claims about the energy efficiency of SNNs compared to ANNs. Under the current assumptions, the energy consumption estimates are unrealistic, making claims such as "20x lower energy consumption than ANNs" inaccurate. Instead, I recommend focusing on comparisons among SNN models.
A6: Thank you for referencing [1]. We appreciate the suggestion to refine our energy efficiency discussion, but we would like to clarify several key aspects:
First, the initial reviewer comment requested a discussion on how including memory-related energy consumption might impact our comparisons, without referencing [1] or any specific methodology. In response, we adopted a memory energy modeling approach based on prior work published at ICLR24 [11]. We conducted an iso-architecture comparison across multiple prior works and included the results for VGG16 on ImageNet in the rebuttal and the revised manuscript (Appendix A.5). This expanded discussion addresses the inclusion of memory access energy.
Second, the methodology in [1], while interesting, represents concurrent work submitted to arXiv only a month prior to the ICLR25 submission deadline (August 29, 2024). Our manuscript focuses on comparisons with prior, peer-reviewed works widely accepted by the community, such as ECCV22 [8], NeurIPS23 [9], and TNNLS23 [10]. These works employ well-established methodologies for estimating computational energy expenditure, which our manuscript also adopts.
Regarding the claim of "20× lower energy consumption than ANNs," we emphasize that this pertains specifically to computational energy expenditure and is consistently labeled as such throughout our manuscript (Tables 1, 2, and 3 under “Comp. Energy,” and in Sections 4 and 6). For example, Section 6 explicitly describes the gain as an enhancement of the computational efficiency. For further clarity, we direct the reviewer to Section 4 (page 7), where we explicitly state:
“It is known that memory access energy can be significantly higher than compute energy (Horowitz, 2014; Han et al., 2015), and the number of memory accesses scales linearly with the number of timesteps in SNNs (Chowdhury et al., 2022). However, estimating memory energy improvements would depend on hardware architecture and system configuration. Therefore, as noted in Chowdhury et al. (2022), we are restricting our attention to the computational energy benefits, defined in Equation 11 (Chowdhury et al., 2022), of one-hot M-LIF SNNs and conventional SNNs over ANNs. As a result, we consider this to be an optimistic energy gain estimate when T > 1. Note that when T = 1, memory requirements are identical for both SNNs and ANNs.”
In summary, our computational energy estimates are realistic, grounded in prior work, and clearly labeled to prevent misinterpretation. While we appreciate the reference to [1], incorporating it fully would require significantly broader adjustments that go beyond the current scope of our manuscript. Nevertheless, we thank the reviewer for the suggestion.
References
[1] Yan, et al., “Reconsidering the energy efficiency of spiking neural networks.” In arXiV 2024.
[2] Miyashita et al., “Convolutional Neural Networks using Logarithmic Data Representation.” In arXiV 2016.
[3] Przewlocka-Rus et al., “Power-of-Two Quantization for Low Bitwidth and Hardware Compliant Neural Networks.” In tinyML 2022.
[4] Lee et al., "Reconfigurable Dataflow Optimization for Spatiotemporal Spiking Neural Computation on Systolic Array Accelerators." In ICCD 2020.
[5] Lee et al., "Parallel Time Batching: Systolic-Array Acceleration of Sparse Spiking Neural Computation." In HPCA 2022.
[6] Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A, Section 3-437. (2024). Intel Corporation. Available at: https://cdrdv2.intel.com/v1/dl/getContent/671200.
[7] Deng et. al, “Temporal efficient training of spiking neural network via gradient re-weighting.” In ICLR 2022.
[8] Chowdhury et al., “Towards ultra low latency spiking neural networks for vision and sequential tasks using temporal pruning.” In ECCV 2022.
[9] Yao, et al., “Spike-driven transformer.” In NeurIPS 2023.
[10] N. Rathi et al., "DIET-SNN: A Low-Latency Spiking Neural Network With Direct Input Encoding and Leakage and Threshold Optimization." In TNNLS 2023.
[11] Datta et al., "Can we get the best of both Binary Neural Networks and Spiking Neural Networks for Efficient Computer Vision?" In ICLR 2024.
Q5: Moreover, one of the claims is that the model can perform additions instead of multiplications, as in standard LIF neurons, supported by the explanation in Figure 4. However, in commercial hardware, the operation shown in Figure 4 would likely still be executed as a floating-point (FP) multiplication rather than an INT8 addition. I recommend including a reference to support the claim that commercial hardware performs multiplication of a floating-point number and a power of two as described, to substantiate this assertion.
A5: Thank you for raising this point. We appreciate your suggestion and have elaborated on this point with an additional reference in Appendix A.2 of the revised manuscript. For commercial hardware, it is worth noting that modern instruction set architectures, such as x86, include specific support for efficiently scaling floating-point numbers. For example, the x87 floating-point unit (FPU) provides specialized instructions like fscale [6], which are designed to scale floating-point numbers by powers of two, rather than performing a general floating-point multiplication as implied. This indeed suggests that the operation described in Figure 4 aligns with existing commercial hardware capabilities.
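As a self-contained, hardware-agnostic check of this property (our own demonstration, not taken from the paper or from [6]), the snippet below shows that multiplying an FP32 value by a power of two changes only the 8-bit exponent field of its IEEE-754 encoding, which is exactly what the INT8 addition in Figure 4 exploits:

```python
import struct

def fp32_fields(x):
    """Return (sign, exponent, mantissa) of the IEEE-754 single-precision encoding of x."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

w = 0.3125                       # an arbitrary normal (non-denormal) FP32 weight
for s in range(3):               # scale by 2**s for s = 0, 1, 2
    sign, exp, mant = fp32_fields(w * (2 ** s))
    print(s, exp, hex(mant))     # exponent field grows by exactly s; sign and mantissa unchanged
```

(The shortcut holds for normal FP32 values; denormals and overflow would need special handling in a real design.)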
Our proposed approach is also designed to be adaptable to various hardware architectures, including custom ASICs or FPGAs, where the operation depicted in Figure 4 can indeed be implemented efficiently. As shown in prior work [4-5], leveraging systolic arrays for SNN inference with a small number of timesteps is feasible, and these methods can be extended to one-hot M-LIF SNNs. Specifically, this would involve storing the exponent s of the spike lane output 2^s, where s ∈ {0, …, S−1} and S represents the number of spike lanes, rather than traditional single-bit activations. Our experiments indicate that using up to four spike lanes (S=4) provides significant accuracy improvements, with diminishing returns beyond this point (as detailed in our ablation study, referenced in response to Q2 for Reviewer egRh). This approach introduces only a minimal memory overhead—1 to 2 additional bits per activation—compared to traditional SNNs, enabling both higher accuracy and reduced timesteps. The reduction in timesteps, in turn, proportionally reduces the high-energy memory loads of FP32 weights and membrane potentials, contributing to overall efficiency.
References
[1] Yan, et al., “Reconsidering the energy efficiency of spiking neural networks.” In arXiV 2024.
[2] Miyashita et al., “Convolutional Neural Networks using Logarithmic Data Representation.” In arXiV 2016.
[3] Przewlocka-Rus et al., “Power-of-Two Quantization for Low Bitwidth and Hardware Compliant Neural Networks.” In tinyML 2022.
[4] Lee et al., "Reconfigurable Dataflow Optimization for Spatiotemporal Spiking Neural Computation on Systolic Array Accelerators." In ICCD 2020.
[5] Lee et al., "Parallel Time Batching: Systolic-Array Acceleration of Sparse Spiking Neural Computation." In HPCA 2022.
[6] Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A, Section 3-437. (2024). Intel Corporation. Available at: https://cdrdv2.intel.com/v1/dl/getContent/671200.
[7] Deng et. al, “Temporal efficient training of spiking neural network via gradient re-weighting.” In ICLR 2022.
[8] Chowdhury et al., “Towards ultra low latency spiking neural networks for vision and sequential tasks using temporal pruning.” In ECCV 2022.
[9] Yao, et al., “Spike-driven transformer.” In NeurIPS 2023.
[10] N. Rathi et al., "DIET-SNN: A Low-Latency Spiking Neural Network With Direct Input Encoding and Leakage and Threshold Optimization." In TNNLS 2023.
[11] Datta et al., "Can we get the best of both Binary Neural Networks and Spiking Neural Networks for Efficient Computer Vision?" In ICLR 2024.
Q4: While this approach appears to yield good accuracy, the paper lacks a detailed discussion of why this is the case. For instance, would the model achieve similar results if values other than powers of two were used, such as powers of b (where b is an arbitrary value), or even a different set of numbers not based on any specific power? A thorough discussion and analysis of why your approach outperforms prior methods, beyond simply presenting the results, would strengthen the paper.
A4: Thank you for your thoughtful question. The choice of using powers of two (base 2) is primarily motivated by the substantial efficiency benefits for hardware implementation, as detailed in Section 4 of our manuscript. The base-2 representation aligns seamlessly with commercial hardware, which employs binary number representations such as the IEEE-754 standard for FP32.
In contrast, using an arbitrary base other than two (e.g., base 3) introduces a non-integer scaling factor for the base-2 exponents used in FP32, leading to overhead in hardware processing. Powers of two allow for a straightforward INT8 addition on the exponents during computation, as demonstrated in Appendix A.2 and Figure 4, which is significantly more efficient than the full-precision FP32 multiplications required in an ANN. Additionally, the hardware implementation of one-hot M-LIF thresholding benefits from this choice, as it enables the use of a simple priority encoder to efficiently determine the most significant non-zero bit of the accumulated membrane potential. This simplicity is lost when arbitrary bases are used.
The use of powers of two has been studied and adopted in prior ANN works [2-3] precisely because of its compatibility with hardware efficiency and binary number system representations. Regarding the observed improvements in accuracy compared to binary-activated SNNs, our approach expands the range of activations, which allows one-hot M-LIF SNNs to encode and learn more effectively within a single timestep. While this is an intuitive advantage, our experimental results spanning a comprehensive set of evaluations serve to empirically validate this design choice. We hope this explanation addresses your concerns.
References
[1] Yan, et al., “Reconsidering the energy efficiency of spiking neural networks.” In arXiV 2024.
[2] Miyashita et al., “Convolutional Neural Networks using Logarithmic Data Representation.” In arXiV 2016.
[3] Przewlocka-Rus et al., “Power-of-Two Quantization for Low Bitwidth and Hardware Compliant Neural Networks.” In tinyML 2022.
[4] Lee et al., "Reconfigurable Dataflow Optimization for Spatiotemporal Spiking Neural Computation on Systolic Array Accelerators." In ICCD 2020.
[5] Lee et al., "Parallel Time Batching: Systolic-Array Acceleration of Sparse Spiking Neural Computation." In HPCA 2022.
[6] Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A, Section 3-437. (2024). Intel Corporation. Available at: https://cdrdv2.intel.com/v1/dl/getContent/671200.
[7] Deng et. al, “Temporal efficient training of spiking neural network via gradient re-weighting.” In ICLR 2022.
[8] Chowdhury et al., “Towards ultra low latency spiking neural networks for vision and sequential tasks using temporal pruning.” In ECCV 2022.
[9] Yao, et al., “Spike-driven transformer.” In NeurIPS 2023.
[10] N. Rathi et al., "DIET-SNN: A Low-Latency Spiking Neural Network With Direct Input Encoding and Leakage and Threshold Optimization." In TNNLS 2023.
[11] Datta et al., "Can we get the best of both Binary Neural Networks and Spiking Neural Networks for Efficient Computer Vision?" In ICLR 2024.
Q2: Regarding the energy efficiency claims in the weaknesses section, could the authors elaborate on the assumptions made for the energy comparisons? How might these comparisons change if memory access energy is included?
A2: Thank you for your insightful comments. We appreciate the importance of contextualizing energy comparisons and recognize that memory access can significantly impact energy consumption, particularly in SNNs. We will clarify the assumptions behind our energy comparisons. Below is a detailed response to your concerns.
Energy Comparisons and Assumptions
As noted in Section 4, energy modeling for SNNs has been performed using the methodology adopted from prior works in the field [4-8]. In the absence of a specified hardware architecture, we focus on computational energy, a metric that is commonly used in SNN papers. Specifically, we calculate computational energy based on the number of multiply-accumulate (MAC) or add-accumulate (AC) operations, the spike rates per layer, and the unit MAC/AC energies for a 45nm technology node taken from [9]. Therefore, the computational energies in Tables 1, 2, and 3 reflect both the number of MAC/AC operations and the average firing rates per layer, providing a reasonable estimate of the energy consumed by non-zero operations.
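For transparency, this calculation can be summarized by the sketch below (a paraphrase of the standard methodology; the layer statistics and unit energies shown are illustrative placeholders rather than values from our paper or tables):

```python
def ann_compute_energy(macs_per_layer, e_mac):
    """ANN compute energy: every MAC is executed."""
    return sum(macs_per_layer) * e_mac

def snn_compute_energy(acs_per_layer, firing_rates, timesteps, e_ac):
    """SNN compute energy: accumulate operations only fire at the measured
    per-layer spike rates, repeated over the number of timesteps."""
    return timesteps * sum(ops * r for ops, r in zip(acs_per_layer, firing_rates)) * e_ac

# Illustrative placeholders only:
E_MAC, E_AC = 4.6e-12, 0.9e-12      # 45nm unit energies in joules, in the style of [9]
layer_ops = [1.8e9, 9.2e8, 4.6e8]   # hypothetical per-layer MAC/AC counts
rates = [0.12, 0.08, 0.05]          # hypothetical measured average firing rates
print(ann_compute_energy(layer_ops, E_MAC),
      snn_compute_energy(layer_ops, rates, timesteps=1, e_ac=E_AC))
```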
Memory Access Energy
We acknowledge in our manuscript that memory access energy can be orders of magnitude higher than compute energy, particularly when dealing with multi-timestep SNNs. As you rightly pointed out, memory access energy plays a crucial role in energy consumption, especially since SNNs require more memory accesses than ANNs due to multi-timestep processing. This is indeed important when comparing SNNs and ANNs and motivates the goal of our unit-timestep one-hot M-LIF approach. By reducing the number of timesteps during SNN inference while improving accuracy and maintaining the low-spike rates of traditional SNNs, we can decrease the multi-timestep processing energy overhead. In our manuscript, we compare unit-timestep one-hot M-LIF SNNs with ANNs in Tables 1 and 2. There, the memory access patterns for ANNs are identical to those in unit-timestep one-hot M-LIF SNNs with the latter benefitting from a smaller memory footprint due to the lower bit-width activations.
Impact of Memory Access Energy
Although defining a precise memory architecture requires detailed assumptions regarding weight, input, and partial output reuse, we have included an analysis based on the memory energy model provided in Appendix A.8.2 of [8] for convolutional neural networks. This model allows us to estimate the impact of memory energy by incorporating per-layer spike rates, which we obtained from multiple prior works [4], [6], [8].
For our analysis, we focused on VGG16 trained on ImageNet, as prior works [4], [6], [8] provide detailed per-layer spike rates, and we used the same 45nm technology for energy modeling, allowing for consistent energy comparisons with previous works [4] and [6]. The table below illustrates the impact of memory access energy on the overall energy comparison.
| Method | S | T | Accuracy (%) | Comp. Energy (mJ) | Mem. Energy (mJ) | Total Energy (mJ) |
|---|---|---|---|---|---|---|
| ANN | / | / | 72.56 | 71.2 | 781 | 852.2 |
| Diet-SNN [7] | 1 | 5 | 69 | 6.09 | 58.8 | 64.89 |
| Temporal Pruning [5] | 1 | 1 | 69 | 2.89 | 15.5 | 18.39 |
| BANN ICLR24 [4] | 1 | 1 | 68 | 3.4 | 17.1 | 20.5 |
| One-hot M-LIF SNN | 3 | 1 | 71.05 | 3.73 | 22.1 | 25.83 |
As shown above, one-hot M-LIF SNNs consume substantially less total energy than ANNs while maintaining higher accuracy than prior unit-timestep SNN works (71.05% vs. 69% [5] and 68% [4]). This table also highlights the advantage of unit-timestep processing: we achieve higher accuracy than Diet-SNN [7] while consuming far less total energy, the majority of which, in their case, stems from the memory energy overhead of multi-timestep (T=5) processing.
We hope this provides a clearer and more contextualized view of our energy comparisons. Finally, we have added this analysis for memory access energy impact in Appendix A.5 of our manuscript where we also include a graph illustrating a comparison of per layer spike rates for VGG16 ImageNet against [4], [6], [8].
Q1: The manuscript lacks comparisons with other methods that use graded (weighted) spikes. The idea of utilizing multibit outputs (one-hot encoded or otherwise) is not novel, as it has been previously explored, for instance in [1], and is a feature available in neuromorphic hardware like Loihi 2 [2], as well as in commonly used SNN libraries like snnTorch [3]. I recommend that the authors provide a discussion highlighting the unique features of the proposed model in comparison to these existing methods and tools.
A1: Thank you for your thoughtful feedback and for bringing relevant works to our attention. We appreciate the opportunity to clarify the distinction between our approach and existing methods that use graded (weighted) spikes, as well as to highlight the unique features of our proposed model. We have added a discussion in Section 2.5 that compares our work with the methods mentioned.
In AAAI22 (Figure 4) [1], the authors essentially propose uniform activation quantization as a way to alleviate the burden of multi-timestep processing while preserving better activation precision per timestep. However, this approach breaks the multiplication-free property of traditional SNNs by introducing uniform quantization, which leads to an increase in spike rates. This is a key difference from our approach, as explained in Section 4 of our manuscript. Our one-hot M-LIF neuron utilizes a one-hot (power-of-two) encoding of activations, ensuring that only one of the S spike lanes is activated per timestep. This constraint is applied and learned during training, unlike in [1], where no similar restriction is enforced. The outputs from our neurons correspond to powers of two, enabling efficient computation. Specifically, these powers-of-two outputs allow us to modify the exponents of FP32 weights through a single INT8 addition during membrane potential updates, making our approach energy-efficient and maintaining the multiplication-free property of SNNs. In contrast, methods such as those in [1] that use uniformly quantized activations (e.g., 3 or 4 bits) still require multiplications, which can be less efficient in terms of both computation and energy consumption.
While works [2] and [3] support multi-bit SNN operations with broader functionality compared to our proposed one-hot encoding, we would like to emphasize the unique computational efficiency offered by one-hot M-LIF neurons. The output range of one-hot M-LIF neurons represents a subset of uniformly quantized activations, so our approach can provide distinct efficiency benefits, particularly when hardware is specifically designed to leverage these properties, as discussed in our paper. Nonetheless, the one-hot M-LIF SNNs can be effectively mapped onto existing neuromorphic hardware architectures, thereby demonstrating both practical applicability and strong alignment with current technological advancements.
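As a small illustration of this subset relationship (a hypothetical helper; the level sets match the examples in our tables, e.g., activations in {0, 1, 2, 4} for S = 3):

```python
def one_hot_levels(num_lanes):
    """Output levels of a one-hot M-LIF neuron with S = num_lanes spike lanes."""
    return [0] + [2 ** s for s in range(num_lanes)]

print(one_hot_levels(3))   # [0, 1, 2, 4] -- a sparse subset of the uniform 3-bit levels 0..7
print(one_hot_levels(4))   # [0, 1, 2, 4, 8]
```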
We hope this clarifies the unique aspects of our model in comparison to existing approaches and tools.
References
[1] Ponghiran et al., “Spiking Neural Networks with Improved Inherent Recurrence Dynamics for Sequential Learning.”. In AAAI 2022.
[2] M. Davies et al., "Advancing Neuromorphic Computing With Loihi: A Survey of Results and Outlook," in Proceedings of the IEEE 2021.
[3] https://snntorch.readthedocs.io/en/latest/index.html
[4] Chowdhury et al., “Towards ultra low latency spiking neural networks for vision and sequential tasks using temporal pruning.” In ECCV 2022.
[5] Yao, et al., “Spike-driven transformer.” In NeurIPS 2023.
[6] N. Rathi et al., "DIET-SNN: A Low-Latency Spiking Neural Network With Direct Input Encoding and Leakage and Threshold Optimization." In TNNLS 2023.
[7] Wang et al., "MT-SNN: Enhance Spiking Neural Network with Multiple Thresholds." In ArXiv 2023.
[8] Datta et al., "Can we get the best of both Binary Neural Networks and Spiking Neural Networks for Efficient Computer Vision?" In ICLR 2024.
[9] Horowitz, “1.1 computing’s energy problem (and what we can do about it).” In ISSCC 2014.
[10] Feng et al., "Multi-Level Firing with Spiking DS-ResNet: Enabling Better and Deeper Directly-Trained Spiking Neural Networks." In IJCAI 2022.
This paper proposes a novel one-hot multi-level leaky integrate-and-fire (M-LIF) neuron model, which represents activations as a set of one-hot, binary-weighted spike lanes. Experimental results on various datasets demonstrate that the M-LIF model achieves a better trade-off between accuracy and energy efficiency compared to other methods.
Strengths
- The paper is well organized and uses formulas and figures to aid understanding.
- The proposed spike lane concept is innovative, as it enhances the information in activation maps while maintaining the energy efficiency characteristic of SNNs. The authors validate the effectiveness of the proposed model through experiments on both static and dynamic datasets. Overall, the M-LIF model achieves the best trade-off between accuracy and energy efficiency.
- The paper also compares the M-LIF model with LQ-ANNs and demonstrates through experiments that M-LIF generally performs better.
Weaknesses
- Given that the proposed spike lanes share similarities with quantized (S-bit activation, S is the number of spike lanes) models, the authors should include additional comparisons, both in terms of accuracy and FLOPs/energy consumption, with state-of-the-art quantization approaches such as [1][2][3]. This would provide a more comprehensive and fair evaluation of the method's performance.
- Although the authors provide some results for models with varying numbers of spike lanes, an additional ablation study would be beneficial. Specifically, the authors could conduct experiments with a fixed number of time steps, varying the number of spike lanes (S) from 1 to an upper bound (e.g., 8), and plot the accuracy versus S. This would help identify the point of saturation. Additionally, a complementary study could be performed by fixing the number of spike lanes and varying the number of time steps from 1 to an upper bound (e.g., 10). Such an analysis would offer a clearer comparison of the impact of spike lanes versus the time steps on model performance.
[1] Jaehyeon Moon et al, Instance-Aware Group Quantization for Vision Transformers, CVPR 2024
[2] Yefei He et al, BiViT: Extremely Compressed Binary Vision Transformers, ICCV 2023
[3] Yanjing Li et al, Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer, NeurIPS, 2022
Questions
For the DVS-CIFAR10 dataset, as the authors add an additional multi-level input layer encoding, could the authors provide a latency comparison between the M-LIF model and other existing models?
Q3: For the DVS-CIFAR10 dataset, as the authors add an additional multi-level input layer encoding, could the authors provide a latency comparison between the M-LIF model and other existing models?
A3: Thank you for your insightful question regarding the latency comparison between our one-hot M-LIF SNN model and existing models, specifically in relation to the DVS-CIFAR10 dataset.
Both traditional SNNs and our one-hot M-LIF SNNs require a pre-processing step, performed once per inference, to convert the asynchronous event stream into frames prior to running model inference. This step is necessary for both models to enable effective processing of the input data. In our approach, we use a multi-level input layer encoding that allows us to reduce the number of timesteps required for inference. Specifically, we are able to halve the number of timesteps, which effectively halves the overall latency. With this reduction, our model with multiple spike lanes achieves slightly higher accuracy than traditional SNNs run for twice as many timesteps, as reported in Table 3 of our submission. It is important to note that the pre-processing overhead introduced by the multi-level input layer encoding is minimal compared to the latency incurred by multi-timestep inference. As a result, the overall latency of our model remains competitive, with the benefit of achieving higher accuracy at a lower number of timesteps.
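Purely for illustration (the pairwise frame aggregation and nearest-level quantization below are our own assumptions and may differ from the exact encoding used in the paper), a multi-level input layer can halve the number of input frames along these lines:

```python
import numpy as np

def multi_level_input_encoding(event_frames, num_lanes=4):
    """Hypothetical encoding: merge adjacent event frames pairwise (halving T)
    and snap the accumulated event counts to the nearest one-hot level."""
    levels = np.array([0] + [2 ** s for s in range(num_lanes)])
    merged = event_frames[0::2] + event_frames[1::2]            # shape (T/2, H, W)
    idx = np.abs(merged[..., None] - levels).argmin(axis=-1)    # nearest level per pixel
    return levels[idx]

frames = np.random.randint(0, 2, size=(10, 48, 48))   # 10 binary event frames (toy input)
encoded = multi_level_input_encoding(frames)           # 5 multi-level input frames
```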
In summary, while the additional multi-level input layer encoding introduces a minor overhead, its impact on latency is negligible when compared to the significant latency savings gained from reducing the number of timesteps. Therefore, our one-hot M-LIF SNN model offers a more efficient solution, achieving comparable or slightly better performance with reduced computational delay.
References
[1] Jaehyeon Moon et al., “Instance-Aware Group Quantization for Vision Transformers.” In CVPR 2024
[2] Yefei He et al., “BiViT: Extremely Compressed Binary Vision Transformers.” In ICCV 2023
[3] Yanjing Li et al., “Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer.” In NeurIPS, 2022
[4] Yao, et al., “Spike-driven transformer.” In NeurIPS 2023.
[5] Chowdhury et al., “Towards ultra low latency spiking neural networks for vision and sequential tasks using temporal pruning.” In ECCV 2022.
[6] Horowitz, “1.1 computing’s energy problem (and what we can do about it).” In ISSCC 2014.
I appreciate the authors' detailed response, particularly the additional experiments involving the increased value of S. I agree that the paper presents good empirical results. However, the concept of quantizing activations using one-hot (power-of-two) encoding closely aligns with existing work on quantization. While I acknowledge the additional efforts, such as the modifications to the surrogate gradient, to integrate this approach with SNNs, I believe the overall novelty of the work may not fully meet the standards typically expected at ICLR. I am sorry, but I decided to maintain my original score.
Q2: Although the authors provide some results for models with varying numbers of spike lanes, an additional ablation study would be beneficial. Specifically, the authors could conduct experiments with a fixed number of time steps, varying the number of spike lanes (S) from 1 to an upper bound (e.g., 8), and plot the accuracy versus S. This would help identify the point of saturation. Additionally, a complementary study could be performed by fixing the number of spike lanes and varying the number of time steps from 1 to an upper bound (e.g., 10). Such an analysis would offer a clearer comparison of the impact of spike lanes versus the time steps on model performance
A2: Thank you for your valuable suggestion regarding the ablation study. We appreciate your interest in further understanding the effects of varying spike lanes and time steps on model performance.
In response to your suggestion, we have conducted an ablation study where we varied the number of spike lanes (S) while keeping the number of timesteps fixed at T = 3 for VGGSNN on DVS-CIFAR10, and we provide the results below. Our findings indicate that as the number of spike lanes increases from S = 1 to S = 4, there is a significant improvement in accuracy, after which performance saturates:
| T | S | Accuracy (%) |
|---|---|---|
| 3 | 1 | 74.7 |
| 3 | 2 | 78.6 |
| 3 | 3 | 79.8 |
| 3 | 4 | 82.5 |
| 3 | 5 | 82.5 |
Additionally, when we increase the number of timesteps to T = 5, we observe a similar saturation in accuracy as the number of spike lanes grows. We also note that the accuracy improvement due to spike lanes becomes less pronounced, but the overall accuracy ceiling improves, as shown below:
| T | S | Accuracy (%) |
|---|---|---|
| 5 | 1 | 78.0 |
| 5 | 2 | 81.5 |
| 5 | 3 | 83.0 |
| 5 | 4 | 83.3 |
| 5 | 5 | 82.9 |
From these results, we conclude that both timesteps and spike lanes contribute to better model performance. We also note that multi-timestep processing introduces significant energy overhead, particularly with respect to memory access. As noted in previous works [5], memory access energy for each timestep can be an order of magnitude higher than the energy required for FP32 additions [6]. This increase in memory energy makes traditional SNNs less efficient than ANNs when the performance gains from additional timesteps do not outweigh the energy costs. Nevertheless, we include additional results below to show one-hot M-LIF SNN accuracy when T > 1, showing improved accuracy (but at the energy cost of multi-timestep processing).
| Dataset | Architecture | S | T | Accuracy (%) |
|---|---|---|---|---|
| CIFAR10 | ResNet20 | 2 | 1 | 92.61 |
| CIFAR10 | ResNet20 | 2 | 2 | 93.10 |
| CIFAR10 | ResNet20 | 2 | 3 | 93.68 |
| CIFAR100 | VGG16 | 2 | 1 | 71.23 |
| CIFAR100 | VGG16 | 2 | 2 | 72.68 |
| CIFAR100 | VGG16 | 3 | 1 | 72.59 |
| CIFAR100 | VGG16 | 3 | 2 | 73.12 |
We hope this additional analysis addresses your request and further clarifies the trade-offs between spike lanes and timesteps. We have also added this ablation study to Appendix A.4 of our manuscript.
References
[1] Jaehyeon Moon et al., “Instance-Aware Group Quantization for Vision Transformers.” In CVPR 2024
[2] Yefei He et al., “BiViT: Extremely Compressed Binary Vision Transformers.” In ICCV 2023
[3] Yanjing Li et al., “Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer.” In NeurIPS, 2022
[4] Yao et al., “Spike-driven transformer.” In NeurIPS 2023.
[5] Chowdhury et al., “Towards ultra low latency spiking neural networks for vision and sequential tasks using temporal pruning.” In ECCV 2022.
[6] Horowitz, “1.1 computing’s energy problem (and what we can do about it).” In ISSCC 2014.
Q1: Given that the proposed spike lanes share similarities with quantized (S-bit activation, S is the number of spike lanes) models, the authors should include additional comparisons, both in terms of accuracy and FLOPs/energy consumption, with state-of-the-art quantization approaches such as [1][2][3]. This would provide a more comprehensive and fair evaluation of the method's performance
A1: Thank you for your thoughtful suggestion and for highlighting works [1-3]. We agree that a comprehensive comparison is essential for a robust evaluation of the method. However, it is important to clarify the distinctions between our work and [1-3].
Papers [1] and [3] employ uniform quantization techniques, whereas our approach utilizes one-hot encoding, a critical difference that provides additional hardware benefits and preserves the multiplication-free property of spiking neural networks (SNNs). This distinction is significant because while uniform quantization introduces multiplications—which increase computational complexity and require additional hardware resources—our method avoids this, offering a more efficient solution for SNNs.
Paper [2] focuses on quantizing weights but does not address activation quantization, which is a core aspect of our work. Our approach centers on the quantization of activations through one-hot (power-of-two) encoding, which necessitates modifications to both the neuron model (one-hot M-LIF) and the surrogate gradients used during training. This allows us to achieve energy-efficient, high-accuracy performance in low-latency, memory-efficient regimes (i.e., unit-timestep processing), without sacrificing the fundamental property of SNNs that distinguishes them from traditional artificial neural networks (ANNs)—the ability to perform computations without multiplications.
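To make the multiplication-free property concrete, the sketch below contrasts a standard multiply-accumulate with the equivalent computation when every activation is either zero or a single power of two (at most one lane firing). The function names and example values are hypothetical, chosen only to illustrate the arithmetic; this is not our implementation.

```python
# Illustrative only: with activations restricted to {0, 1, 2, 4, ...}
# (at most one spike lane firing), each weight-activation product reduces
# to scaling the weight's exponent, so no multiplier is required.
import math

def mac_with_multiplies(weights, activations):
    return sum(w * a for w, a in zip(weights, activations))

def mac_multiplication_free(weights, activations):
    acc = 0.0
    for w, a in zip(weights, activations):
        if a == 0:                    # no lane fired -> no work for this synapse
            continue
        k = int(math.log2(a))         # lane index encoded by the activation value
        acc += math.ldexp(w, k)       # w * 2**k via an exponent addition
    return acc

weights = [0.25, -1.5, 0.8, 2.0]
acts = [1, 0, 4, 2]                   # one-hot power-of-two activations
assert mac_with_multiplies(weights, acts) == mac_multiplication_free(weights, acts)
```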
Furthermore, we believe that comparing our method directly with the works you mentioned—[1], [2], and [3]—is not entirely aligned with the scope of our paper. These works focus on optimizing quantization for vision transformer architectures, specifically minimizing the impact of quantization on the attention mechanism. In contrast, our work is focused on optimizing SNNs, particularly through the novel application of one-hot encoding to activations, which enhances both the accuracy and energy efficiency of SNNs.
In our evaluation, we compare our approach against state-of-the-art spike-driven transformer architectures [4], demonstrating significant improvements in performance, especially for large and challenging datasets such as ImageNet. Additionally, we have conducted a direct comparison between unit-timestep one-hot M-LIF SNNs and log-quantized artificial neural networks (LQ-ANNs). Our results show that one-hot M-LIF SNNs either match or outperform LQ-ANNs in both accuracy and inference energy efficiency. Specifically, for CIFAR-10, the performance and energy efficiency are comparable, while for CIFAR-100, our method is up to 54% more energy-efficient than LQ-ANNs, with similar accuracy. These findings, particularly the superior scalability of one-hot M-LIF SNNs on large datasets such as ImageNet, underline the strengths of our approach and the distinct contributions we make to the field of SNNs.
We hope this explanation clarifies the distinctions between our work and the approaches you suggested and highlights the specific contributions of our research to the field of spiking neural networks.
References
[1] Jaehyeon Moon et al., “Instance-Aware Group Quantization for Vision Transformers.” In CVPR 2024
[2] Yefei He et al., “BiViT: Extremely Compressed Binary Vision Transformers.” In ICCV 2023
[3] Yanjing Li et al., “Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer.” In NeurIPS, 2022
[4] Yao et al., “Spike-driven transformer.” In NeurIPS 2023.
[5] Chowdhury et al., “Towards ultra low latency spiking neural networks for vision and sequential tasks using temporal pruning.” In ECCV 2022.
[6] Horowitz, “1.1 computing’s energy problem (and what we can do about it).” In ISSCC 2014.
This paper introduces the one-hot multi-level leaky integrate-and-fire (M-LIF) neuron, aimed at balancing accuracy and energy efficiency in spiking neural networks (SNNs).
Strengths
S1. It is interesting to explore a new dimension (beyond the time step) for achieving an accuracy-latency trade-off.
S2. The experiments are quite comprehensive, and the source code is valuable.
Weaknesses
W1. The novelty of employing multi-level thresholds in SNNs appears limited, as similar concepts have been introduced in related works [1,2]. What distinguishes M-LIF from these existing approaches?
[1] Wang et al., "MT-SNN: Enhance Spiking Neural Network with Multiple Thresholds." ArXiv 2023.
[2] Feng et al., "Multi-Level Firing with Spiking DS-ResNet: Enabling Better and Deeper Directly-Trained Spiking Neural Networks." IJCAI 2022.
W2. How does M-LIF differ from quantization networks? Specifically, does an M-LIF with S=K operate equivalently to a K-bit quantized neural network (QNN) with quantized activation values?
W3. Table 1 shows that M-LIF significantly increases computational energy consumption, which poses a drawback for M-LIF. For instance, on CIFAR10, although the M-LIF Transformer with S=3 and T=4 achieves the highest accuracy (only 0.3% higher than the Spike-driven Transformer with S=1 and T=4), its computational energy usage is considerably higher. This trend is similarly observed on CIFAR100 and ImageNet.
Questions
The questions I am most concerned about are raised in the Weakness section, particularly W1 and W2.
Q2: How does M-LIF differ from quantization networks? Specifically, does an M-LIF with S=K operate equivalently to a K-bit quantized neural network (QNN) with quantized activation values?
A2: Thank you for this insightful question. While K-bit quantized neural networks represent activations using discrete values, they do not naturally extend to neuromorphic tasks, such as dynamic vision image classification. In contrast, a one-hot M-LIF spiking neural network (SNN) with S = K employs power-of-two values to represent activations, while also extending to neuromorphic tasks, as demonstrated by our experiments on DVS-CIFAR10.
In our paper, we elaborate on these distinctions and similarities in Sections 2.4 and 3.2, where we discuss the relationship between log-quantized artificial neural networks (LQ-ANNs) and unit-timestep M-LIF SNNs. Additionally, Section 5.1.3 presents our evaluation comparisons against log-quantized ANNs. The key differences between LQ-ANNs and one-hot M-LIF SNNs are as follows: (1) during inference, neuron dynamics in M-LIF differ when T > 1 due to their extensibility to neuromorphic tasks, and (2) during training, we learn threshold and leakage parameters for static tasks and employ back-propagation through time when T > 1.
Q3: Table 1 shows that M-LIF significantly increases computational energy consumption, which poses a drawback for M-LIF. For instance, on CIFAR10, although the M-LIF Transformer with S=3 and T=4 achieves the highest accuracy (only 0.3% higher than the Spike-driven Transformer with S=1 and T=4), its computational energy usage is considerably higher. This trend is similarly observed on CIFAR100 and ImageNet.
A3: Thank you for bringing attention to the computational energy consumption in one-hot M-LIF SNNs compared to traditional SNNs. Your observation is valid concerning the increased computational energy of the spike-driven transformer evaluated on CIFAR-10 when comparing the (S=3, T=4) and (S=1, T=4) scenarios. Here, we acknowledge that while the one-hot constraint of the M-LIF SNN guarantees an upper bound on the number of spikes (ensuring that it will not exceed the worst-case scenario of a traditional SNN), there is no explicit guarantee of lower computational energy consumption compared to a traditional SNN with the same architecture and number of timesteps.
However, it is important to contextualize this increase in computational energy within the broader trade-off between computational and memory access energy. As noted, the memory access energy is typically much higher than the computational energy. For example, accessing an 8kB cache in 45nm technology consumes approximately 10 pJ per 32-bit access, compared to just 0.9 pJ for an addition operation [4]. Multi-timestep processing in traditional SNNs incurs a significant increase in memory energy overhead, scaling linearly with the number of timesteps. In contrast, our one-hot M-LIF approach mitigates this overhead by reducing the reliance on multi-timestep processing, making it a more favorable solution from a holistic energy perspective.
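As a rough, back-of-the-envelope illustration of this trade-off, the sketch below uses the 45nm figures quoted above (0.9 pJ per FP32 addition, roughly 10 pJ per 32-bit memory access). The operation counts, spike rate, and simple linear model are placeholder assumptions, not the energy model used in our paper.

```python
# Back-of-the-envelope sketch of compute vs. memory energy. The constants
# follow the 45nm figures cited in this thread; all other numbers are
# placeholder assumptions for illustration.
E_ADD_PJ = 0.9    # FP32 addition
E_MEM_PJ = 10.0   # 32-bit memory access

def total_energy_pj(timesteps, synaptic_ops, spike_rate, mem_accesses_per_step):
    # Compute energy scales with the spikes that actually fire; memory energy
    # scales roughly linearly with the number of timesteps.
    compute = timesteps * spike_rate * synaptic_ops * E_ADD_PJ
    memory = timesteps * mem_accesses_per_step * E_MEM_PJ
    return compute + memory

# Hypothetical network: halving the timesteps mostly saves memory energy.
for T in (4, 2, 1):
    e = total_energy_pj(T, synaptic_ops=1e9, spike_rate=0.15, mem_accesses_per_step=1e8)
    print(f"T={T}: {e / 1e9:.2f} mJ")
```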
Additionally, the observed increase in computational energy is not universally present across all datasets and architectures. For instance, for ResNet-20 on CIFAR10, our approach demonstrates lower computational energy consumption compared to higher-timestep SNNs (excluding memory access energy). Similarly, for the spike-driven transformer on ImageNet, our configuration achieves higher accuracy while consuming less computational energy than its higher-timestep counterpart. These examples underscore the broader benefits of our approach, particularly in scenarios where memory energy plays a dominant role in overall energy consumption. We are happy to discuss any further clarifications regarding these findings and trade-offs.
References
[1] Wang et al., "MT-SNN: Enhance Spiking Neural Network with Multiple Thresholds." In ArXiv 2023.
[2] Feng et al., "Multi-Level Firing with Spiking DS-ResNet: Enabling Better and Deeper Directly-Trained Spiking Neural Networks." In IJCAI 2022.
[3] Yao et al., “Spike-driven transformer.” In NeurIPS 2023.
[4] Horowitz, “1.1 computing’s energy problem (and what we can do about it).” In ISSCC 2014.
Q1: The novelty of employing multi-level thresholds in SNNs appears limited, as similar concepts have been introduced in related works [1,2]. What distinguishes M-LIF from these existing approaches?
A1: Thank you for highlighting the prior works on multi-level thresholds in SNNs. We have now added a paragraph in Section 2.5 of our submission to better contextualize these methods relative to our work. Our proposed one-hot M-LIF neuron model addresses critical computational challenges inherent in approaches [1,2]. We will outline these distinctions below.
In arXiv23 [1], the authors introduce parallel and cascade multi-threshold (MT) neuron models to enhance activation precision per timestep and mitigate the burden of multi-timestep processing. The parallel-MT model uses multiple threshold values, summing spikes from all firing lanes at each timestep, while the cascade-MT model processes them sequentially. These approaches utilize uniform activation quantization, which breaks the multiplication-free characteristic of SNNs and significantly raises spike rates. In contrast, our one-hot M-LIF neuron maintains a single threshold value, leveraging its power-of-two multiples to fire only one spike lane per timestep. This one-hot constraint is incorporated and learned during training—a crucial distinction not found in [1]. The resulting neuron outputs, represented as powers-of-two, allow for efficient FP32 weight exponent updates via a single INT8 addition, whereas [1]’s uniformly quantized outputs (e.g., 4-bit) require multiplications. Furthermore, our evaluations span both convolutional neural networks and more complex, high-performing spike-driven transformer architectures. Unlike [1], we also compare with log-quantized ANNs, which exhibit similarities to our one-hot M-LIF SNNs when operating with a unit timestep. The authors of [1] limit their evaluations to VGG-9 on CIFAR100, using numbers from a 45nm technology for energy estimates. A direct comparison demonstrates that our approach achieves superior accuracy gains with less computational energy overhead:
| Neuron Model | Dataset | Architecture | T | Accuracy (%) | Comp. Energy (mJ) |
|---|---|---|---|---|---|
| LIF [1] | CIFAR100 | VGG-9 | 1 | 72.63 | 190 |
| Parallel-MT (2-bit uniformly quantized activation) [1] | CIFAR100 | VGG-9 | 1 | 73.89 | 440 |
| Cascade-MT (4-bit uniformly quantized activation) [1] | CIFAR100 | VGG-9 | 1 | 74.80 | 740 |
| LIF [3] | CIFAR100 | Spike-driven Transformer-2-512 | 1 | 75.8 | 0.221 |
| One-hot M-LIF (3 spike lanes, activations ∈ {0,1,2,4}) | CIFAR100 | Spike-driven Transformer-2-512 | 1 | 78.2 | 0.478 |
IJCAI22 [2] similarly introduces uniform activation quantization, compromising the multiplication-free property of SNNs. Unlike our work, it does not impose constraints on input/output encoding during training, which is critical for one-hot behavior that enables preserving activation precision per timestep while maintaining computational simplicity. [2] employs individual thresholds that are not multiples of each other, resulting in more parameters. From an evaluation perspective, [2] focuses on residual networks without reporting results on larger datasets like ImageNet or on SOTA architectures, such as spike-driven transformers. [2] also does not offer comprehensive energy analyses to estimate the overhead of their approach. Below, we provide a comparison illustrating our method's superior performance on CIFAR10 and DVS-CIFAR10 datasets:
| Neuron Model | Dataset | Architecture | T | Accuracy (%) |
|---|---|---|---|---|
| MLF [2] | CIFAR10 | MLF(K=3) + spikingDS-ResNet | 4 | 94.25 |
| One-hot M-LIF (3 spike lanes, activations ∈ {0,1,2,4}) | CIFAR10 | Spike-driven Transformer-2-512 | 1 | 95.4 |
| MLF [2] | DVS-CIFAR10 | MLF(K=3) + spikingDS-ResNet | 10 | 70.36 |
| One-hot M-LIF (4 spike lanes, activations ∈ {0,1,2,4,8}) | DVS-CIFAR10 | VGGSNN | 10 | 84.7 |
I agree with the other reviewers that the novelty of this paper is limited. I greatly appreciate the authors' efforts in their rebuttal. However, I have decided to maintain my original score.
This paper proposes a one-hot multi-level leaky integrate-and-fire (M-LIF) neuron model to optimize the accuracy-latency trade-off of deep spiking neural networks. The proposed M-LIF model represents the inputs and outputs of hidden layers as a set of one-hot binary-weighted spike lanes. Experimental results demonstrate superior accuracy in image classification results with only one time step.
Strengths
- The proposed one-time-step SNNs can significantly improve energy efficiency by avoiding repeated memory accesses of the membrane potential and by reducing the number of spikes.
- The accuracy obtained by the proposed method on ImageNet is good.
Weaknesses
- While the paper has moderately strong empirical results, the proposed method is not much different from bit-serial activation quantized neural networks [1], where the spike lanes can be regarded as bits sequentially arriving and accumulating at each neuron. The overhead of the time steps is thus shifted to the spike lanes.
- I understand there is one-hot encoding in the spike lanes; however, this contribution alone may not be sufficient to warrant acceptance at ICLR. This is also similar to power-of-two quantization [2], which has been extensively explored in several quantization works.
- I see some empirical comparisons of this method with log-quantized ANNs, and different modes of operation during training. However, during inference, the operations are almost similar. Even during training, I am not sure if the differences significantly affect the accuracies.
- Several prior SNN works [3-4] have explored the interplay between the ANN activation bit-width and SNN time steps (which forms the foundation of this work); however, they have been ignored in this work.
- I see only one comparison with one-time-step SNNs in Table 1. It would be better to add more comparisons, including the results from [4].
[1] Lam et al., "Precision Batching: Bitserial Decomposition for Efficient Neural Network Inference on GPUs." In PACT 2021.
[2] https://openreview.net/pdf?id=BkgXT24tDS
[3] "Are Conventional SNNs Really Efficient? A Perspective from Network Quantization", CVPR 2024
[4] "Can we get the best of both Binary Neural Networks and Spiking Neural Networks for Efficient Computer Vision?", ICLR 2024
Questions
Please respond to the weaknesses mentioned above.
Q3: I see some empirical comparisons of this method with log-quantized ANNs, and different modes of operations during training. However, during inference, the operations are almost similar. Even during training, I am not sure if the differences significantly affect the accuracy metrics
A3: Thank you for your thoughtful feedback regarding the comparison between log-quantized ANNs (LQ-ANNs) and our unit-timestep one-hot M-LIF SNNs. While layer computation during inference may appear structurally similar, one critical distinction lies in the degree of sparsity induced by the thresholds and weights learned during training. These differences in sparsity have a direct impact on computational energy, aligning with our empirical observations.

The variations in thresholds and weights between LQ-ANNs and M-LIF SNNs arise from their distinct training processes. One-hot M-LIF SNNs are trained in a single-phase approach, whereas LQ-ANN training is typically conducted in two phases per epoch: the first phase records a percentile value for each layer's input activation distribution using the full training dataset and full-precision inference, and the second phase employs a straight-through estimator to approximate the gradient with respect to the quantized activations.

While there are certain similarities between one-hot M-LIF SNNs with a unit timestep (T = 1) and LQ-ANNs, key differences remain. As detailed in Equations (9-10) in Section 3.2 of our submission, these differences include variations in the firing thresholds and in the final output value ranges. Moreover, one-hot M-LIF SNNs can be extended to sequential processing (i.e., T > 1). Another significant distinction is the use of surrogate gradients in one-hot M-LIF SNNs during training: we adapted these surrogate gradients from traditional SNN research to fit the specific characteristics of the one-hot M-LIF neuron model, which further differentiates our approach and enhances the effectiveness and accuracy of our model relative to LQ-ANNs.
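For illustration, the snippet below sketches a generic single-threshold spike function with a rectangular surrogate gradient, i.e., the kind of machinery contrasted here with the straight-through estimator of LQ-ANNs. The threshold and width values are placeholders, and the surrogate we actually adapt for the one-hot M-LIF neuron differs in its details.

```python
# Generic surrogate-gradient spike function (illustrative, not the exact
# surrogate used for the one-hot M-LIF neuron).
import torch

class SurrogateSpike(torch.autograd.Function):
    @staticmethod
    def forward(ctx, membrane, threshold, width):
        ctx.save_for_backward(membrane)
        ctx.threshold, ctx.width = threshold, width
        return (membrane >= threshold).float()   # hard spike in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        (membrane,) = ctx.saved_tensors
        # Rectangular surrogate: pass gradients only near the threshold.
        near = (torch.abs(membrane - ctx.threshold) < ctx.width).float()
        return grad_output * near / (2 * ctx.width), None, None

u = torch.randn(8, requires_grad=True)            # membrane potentials
spikes = SurrogateSpike.apply(u, 1.0, 0.5)
spikes.sum().backward()                           # gradients flow through the surrogate
print(spikes, u.grad)
```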
References:
[1] Lam et al. "Precision Batching: Bitserial Decomposition for Efficient Neural Network Inference on GPUs." In PACT 2021.
[2] Yuhang Li et al., “Additive powers-of-two quantization: An efficient non-uniform discretization for neural networks.” In ICLR 2020.
[3] Shen et al., "Are Conventional SNNs Really Efficient? A Perspective from Network Quantization." In CVPR 2024.
[4] Datta et al., "Can we get the best of both Binary Neural Networks and Spiking Neural Networks for Efficient Computer Vision?" In ICLR 2024.
[5] Chowdhury et al., “Towards ultra low latency spiking neural networks for vision and sequential tasks using temporal pruning.” In ECCV 2022.
[6] Yao et al., “Spike-driven transformer.” In NeurIPS 2023.
[7] N. Rathi et al., "DIET-SNN: A Low-Latency Spiking Neural Network With Direct Input Encoding and Leakage and Threshold Optimization." In TNNLS 2023.
[8] Horowitz, “1.1 computing’s energy problem (and what we can do about it).” In ISSCC 2014.
Q1: While the paper has moderately strong empirical results, the proposed method is not much different from bit-serial activation quantized neural networks [1], where the spike lanes can be regarded as bits sequentially arriving and accumulating at each neuron. The overhead of the time steps is thus shifted to the spike lanes
A1: Thank you for highlighting this relevant aspect of quantized activation artificial neural networks (ANNs). We appreciate your reference to PACT21 [1], which indeed provides valuable insights into accelerating quantized neural networks using a bit-serial approach for weights and activations on GPUs. However, it is important to note that PACT21 focuses on accelerating quantized ANNs without delving into techniques for improving quantization-aware training. More specifically, it does not explore this learning aspect with spiking neural networks (SNNs).
Our work centers on enhancing the accuracy and efficiency of SNNs, which are inherently distinct from ANNs due to their use of the leaky integrate-and-fire (LIF) neuron model and their applicability to neuromorphic tasks. Our primary contribution lies in the one-hot (power-of-two) encoding of activations during both training and inference of SNNs, necessitating modifications to the neuron model (M-LIF) and the surrogate gradients employed during training. The M-LIF model achieves higher accuracy than conventional SNNs in low-latency, memory-efficient regimes (unit timestep) while preserving the multiplication-free nature of SNNs. This stands in contrast to alternative multi-threshold approaches that utilize uniform activation quantization, leading to increased spike rates and the reintroduction of multiplications. Additionally, unlike the bit-serial processing approach highlighted in [1], the spike lanes in M-LIF SNNs can be processed in parallel with a single addition to the weight exponent, as detailed in Section 4 of our submission. This parallelization is possible due to the predetermined one-hot nature of the activations, ensuring that only one "bit-position" (spike lane) is non-zero at any given timestep.
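As a purely illustrative aside, the snippet below shows why scaling an FP32 weight by a power of two amounts to a small addition on its 8-bit exponent field. It demonstrates only the arithmetic identity behind the exponent update; it is not a description of our hardware mapping, and exponent overflow handling is omitted.

```python
# Illustration of the arithmetic identity only: adding the lane index k to
# the 8-bit exponent field of an IEEE-754 float equals multiplying by 2**k.
# Overflow/underflow of the exponent field is ignored in this sketch.
import struct

def scale_by_power_of_two(w: float, k: int) -> float:
    bits = struct.unpack("<I", struct.pack("<f", w))[0]
    exponent = (bits >> 23) & 0xFF                      # 8-bit exponent field
    bits = (bits & ~(0xFF << 23)) | (((exponent + k) & 0xFF) << 23)
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(scale_by_power_of_two(0.75, 2))   # 3.0  (== 0.75 * 2**2)
print(scale_by_power_of_two(-1.5, 1))   # -3.0 (== -1.5 * 2**1)
```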
References:
[1] Lam et al. "Precision Batching: Bitserial Decomposition for Efficient Neural Network Inference on GPUs." In PACT 2021.
[2] Yuhang Li et al., “Additive powers-of-two quantization: An efficient non-uniform discretization for neural networks.” In ICLR 2020.
[3] Shen et al., "Are Conventional SNNs Really Efficient? A Perspective from Network Quantization." In CVPR 2024.
[4] Datta et al., "Can we get the best of both Binary Neural Networks and Spiking Neural Networks for Efficient Computer Vision?" In ICLR 2024.
[5] Chowdhury et al., “Towards ultra low latency spiking neural networks for vision and sequential tasks using temporal pruning.” In ECCV 2022.
[6] Yao et al., “Spike-driven transformer.” In NeurIPS 2023.
[7] N. Rathi et al., "DIET-SNN: A Low-Latency Spiking Neural Network With Direct Input Encoding and Leakage and Threshold Optimization." In TNNLS 2023.
[8] Horowitz, “1.1 computing’s energy problem (and what we can do about it).” In ISSCC 2014.
Q5: I see only one comparison with one-time-step SNNs in Table 1. It would be better to add more comparisons, including the results from [4].
A5: Thank you for highlighting the need for additional comparisons. In response, we have updated our submission to include comparisons with the results from [4] for iso-architecture SNNs in Table 1. It is worth noting that Table 2 in ICLR24 [4] provides only one comparison with a unit-timestep SNN, specifically referring to an older (2021) arXiv version of what later became ECCV22 [5]. In contrast, our paper compares against the most recent and up-to-date reference, which outperforms the results reported in ICLR24 [4] for the VGG16 model on ImageNet. Finally, our paper includes an evaluation of spike-driven transformers [6], the state-of-the-art architecture for SNNs, which is absent from the evaluations presented in ICLR24 [4]. By focusing on up-to-date and state-of-the-art comparisons, we aim to present a comprehensive and relevant evaluation of our proposed approach.
References:
[1] Lam et al. "Precision Batching: Bitserial Decomposition for Efficient Neural Network Inference on GPUs." In PACT 2021.
[2] Yuhang Li et al., “Additive powers-of-two quantization: An efficient non-uniform discretization for neural networks.” In ICLR 2020.
[3] Shen et al., "Are Conventional SNNs Really Efficient? A Perspective from Network Quantization." In CVPR 2024.
[4] Datta et al., "Can we get the best of both Binary Neural Networks and Spiking Neural Networks for Efficient Computer Vision?" In ICLR 2024.
[5] Chowdhury et al., “Towards ultra low latency spiking neural networks for vision and sequential tasks using temporal pruning.” In ECCV 2022.
[6] Yao et al., “Spike-driven transformer.” In NeurIPS 2023.
[7] N. Rathi et al., "DIET-SNN: A Low-Latency Spiking Neural Network With Direct Input Encoding and Leakage and Threshold Optimization." In TNNLS 2023.
[8] Horowitz, “1.1 computing’s energy problem (and what we can do about it).” In ISSCC 2014.
Thanks for your detailed response. However, I still think the idea of the paper is similar to recently proposed quantization works, as I mentioned above, and there are hardly any novel insights to support acceptance at a venue like ICLR. Apologies for not being able to increase the score.
Q4: Several prior SNN works [3-4] have explored the interplay between the ANN activation bit-width and SNN time steps (that forms the foundation of this work), however, they have been ignored in this work.
A4: Thank you for pointing out these prior works. We appreciate the opportunity to clarify how our approach builds upon and differs from the methods discussed in [3-4]. We have now added a paragraph in Section 2.5 of our submission to better contextualize these methods relative to our work.
The authors of [4] primarily focus on binary-activation ANNs, which are equivalent to unit-timestep SNNs [5] and which we have compared to in Table 1 of our submission. In contrast, [3] introduces uniform activation quantization in SNNs to preserve higher activation precision per timestep while offloading the burden of multi-timestep processing. However, this approach compromises the multiplication-free property of SNNs. Our proposed one-hot M-LIF neuron model diverges from these approaches by employing a one-hot (power-of-two) encoding of activations, which allows only one of the spike lanes to fire per timestep. This one-hot constraint is trained and maintained throughout training—a critical distinction not present in [3-4]. Moreover, the output activations of our neuron correspond to powers-of-two and are efficiently used to update the membrane potential by modifying FP32 weight exponents using a single INT8 addition, as opposed to requiring multiplications like the 4-bit uniformly quantized activations described in [3]. Regarding evaluations, CVPR24 [3] emphasizes joint quantization of both activations and weights for distinct neural network architectures from those in our study, making direct comparisons challenging. Weight quantization, as explored in [3], is orthogonal to our work and could complement one-hot M-LIF SNNs for potential further energy optimization. While ICLR24 [4] does not demonstrate the applicability of its method on spike-driven transformer architectures [6], which is the state-of-the-art SNN architecture backbone, we were able to draw comparisons using VGG16 on ImageNet. ICLR24 [4] uses a comparable computational energy model to the one used in our study and also provides memory energy metrics and spike rates, facilitating a comparative analysis. Since our submission already compares results using a 45nm technology baseline [8], we recalculated the energy consumption metrics of VGG16 from [4] using the same 45nm values from [8]. Additionally, one of ICLR24 [4]'s key contributions is reducing timesteps down to 1 without iterative temporal pruning, as employed in ECCV22 [5]. Similarly, our method achieves unit-timestep SNN training without temporal pruning while also offering a superior overall accuracy-energy tradeoff compared to standard ANNs as shown in the table below. One-hot M-LIF SNNs achieve energy efficiency that is lower than ANNs, while maintaining higher accuracy (up to ) than prior unit-timestep SNN works. Furthermore, we highlight the advantage of using unit-timestep processing in this table since we achieve higher accuracy compared to [7] while consuming less total energy, the majority of which stems from the memory energy overhead of multi-timestep () processing.
| Method | S | T | Accuracy (%) | Comp. Energy (mJ) | Mem. Energy (mJ) | Total Energy (mJ) |
|---|---|---|---|---|---|---|
| ANN | / | / | 72.56 | 71.2 | 781 | 852.2 |
| Diet-SNN [7] | 1 | 5 | 69 | 6.09 | 58.8 | 64.89 |
| Temporal Pruning [5] | 1 | 1 | 69 | 2.89 | 15.5 | 18.39 |
| BANN ICLR24 [4] | 1 | 1 | 68 | 3.4 | 17.1 | 20.5 |
| One-hot M-LIF SNN | 3 | 1 | 71.05 | 3.73 | 22.1 | 25.83 |
See references in previous comments
Q2: I understand there is one-hot encoding in the spike lanes, however, this contribution alone may not be sufficient to warrant acceptance in ICLR. This is also similar to power-of-two quantization [2] (https://openreview.net/pdf?id=BkgXT24tDS), that has been extensively explored in several quantization works
A2: Thank you for referencing ICLR20 [2]. We appreciate the opportunity to clarify the differences between our proposed approach and additive power-of-two quantization as described in the cited work. In ICLR20 [2], quantization levels are represented as sums of powers-of-two, which consequently introduces additional add operations during multiplication with inputs at these quantization levels. In contrast, our proposed one-hot M-LIF SNNs ensure only single (one-hot) power-of-twos inputs/outputs, maintaining a simpler and more efficient operation. Moreover, ICLR20 [2] does not explore the application of one-hot (power-of-two) encoded input/output representations within the domain of spiking neural networks (SNNs) or how these representations can enable improved accuracy-latency and energy trade-offs.
We have acknowledged prior work on power-of-two quantization in our submission, specifically in Sections 3.2 and 2.4, where we discuss log-quantized artificial neural networks (LQ-ANNs). Our comparisons illustrate key distinctions between LQ-ANNs and one-hot M-LIF SNNs, particularly in terms of training approaches and applicability to neuromorphic tasks such as dynamic vision classification. These differences include: (1) during inference, neuron dynamics differ when the time step T > 1 due to the extensibility of M-LIF neurons to neuromorphic tasks, and (2) during training, we leverage back-propagation through time for T > 1 while also learning threshold and leakage parameters for static tasks.
To the best of our knowledge, this work is the first to introduce a one-hot encoding scheme in SNNs. The one-hot M-LIF neuron we propose retains the multiplication-free property characteristic of SNNs, as it modifies the exponents of FP32 weights using a single INT8 addition during membrane potential updates. Moreover, due to the one-hot constraint, the overall spiking rate (and consequently the number of additions) per layer per timestep in one-hot M-LIF SNNs is not necessarily higher than that of conventional SNNs, even with multiple spike lanes per neuron.

This allows one-hot M-LIF SNNs to extract greater learning potential within a single timestep while maintaining comparable computational complexity and energy efficiency relative to traditional SNNs. This is evident in Table 1 of our manuscript, where one-hot M-LIF spike-driven transformers preserve accuracy better than traditional spike-driven transformers as the number of timesteps is reduced. Integrating the one-hot constraint is nontrivial, as it necessitates modifications to both the neuron model and the surrogate gradients. We have demonstrated its efficacy through extensive evaluations across various neural network architectures and datasets, including both static and dynamic tasks, as detailed in our submission.
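To convey the firing rule informally, the sketch below implements a single timestep of a multi-level neuron in which one threshold and its power-of-two multiples select at most one spike lane per neuron. The leak value, soft-reset rule, and tensor shapes are assumptions made for the example; they do not reproduce the exact dynamics or training behaviour of our model.

```python
# Minimal sketch of a one-hot multi-level firing rule: a single threshold
# theta and its power-of-two multiples choose at most one spike lane per
# neuron per timestep. Leak, reset, and shapes are placeholder assumptions.
import torch

def one_hot_mlif_step(membrane, inp, theta=1.0, leak=0.9, num_lanes=3):
    membrane = leak * membrane + inp                           # leaky integration
    levels = theta * (2.0 ** torch.arange(num_lanes))          # theta, 2*theta, 4*theta, ...
    crossed = (membrane.unsqueeze(-1) >= levels).sum(dim=-1)   # highest level crossed
    out = torch.where(crossed > 0, 2.0 ** (crossed - 1), torch.zeros_like(membrane))
    membrane = membrane - theta * out                          # soft reset by emitted value
    return membrane, out                                       # out in {0, 1, 2, ..., 2**(S-1)}

u = torch.zeros(5)
x = torch.tensor([0.3, 1.2, 2.6, 5.0, 0.0])
u, s = one_hot_mlif_step(u, x)
print(s)   # tensor([0., 1., 2., 4., 0.])
```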
References:
[1] Lam et al. "Precision Batching: Bitserial Decomposition for Efficient Neural Network Inference on GPUs." In PACT 2021.
[2] Yuhang Li et al., “Additive powers-of-two quantization: An efficient non-uniform discretization for neural networks.” In ICLR 2020.
[3] Shen et al., "Are Conventional SNNs Really Efficient? A Perspective from Network Quantization." In CVPR 2024.
[4] Datta et al., "Can we get the best of both Binary Neural Networks and Spiking Neural Networks for Efficient Computer Vision?" In ICLR 2024.
[5] Chowdhury et al., “Towards ultra low latency spiking neural networks for vision and sequential tasks using temporal pruning.” In ECCV 2022.
[6] Yao et al., “Spike-driven transformer.” In NeurIPS 2023.
[7] N. Rathi et al., "DIET-SNN: A Low-Latency Spiking Neural Network With Direct Input Encoding and Leakage and Threshold Optimization." In TNNLS 2023.
[8] Horowitz, “1.1 computing’s energy problem (and what we can do about it).” In ISSCC 2014.
We once again thank all reviewers for their commitment and valuable feedback. The reviewers' comments provided great insights to improve the manuscript. We were delighted that the reviewers appreciated the strength and significance of our unit-timestep one-hot M-LIF SNN ImageNet energy-efficiency and accuracy results (bytk, 7r4n), the comprehensive and thorough experimental validation alongside valuable source code (NG6F, egRh, SBrN), the innovative one-hot spike lane concept for enhanced accuracy-latency tradeoff (NG6F, egRh), and the organization and clarity of the manuscript (egRh, SBrN).
Revisions to the uploaded manuscript are highlighted in red. We summarize the major modifications as follows:
- (bytk, NG6F, egRh, 7r4n, SBrN) Discussion on uniqueness with respect to other neuron-models (multi-spike, multi-threshold, burst) [Section 2.5]
- (bytk) Comparison with additional unit-timestep SNN work [Table 1]
- (egRh) Ablation studies on DVS-CIFAR10 for accuracy with respect to varying number of spike lanes and timesteps [Appendix A.4]
- (bytk, 7r4n) Impact of memory access energy on VGG16 ImageNet evaluations [Appendix A.5]
We would like to express our gratitude to the reviewers for their thoughtful feedback and the time invested in evaluating our work. During the rebuttal process, we addressed the primary concerns raised, including providing additional comparisons to prior works, memory energy estimations, and ablation studies analyzing the impact of varying spike lanes and timesteps on accuracy.
Despite these efforts, the reviewers remain divided on the paper's significance. In their view, the proposed neuron model, which leverages one-hot spike lanes to reduce the number of timesteps T during SNN inference while improving accuracy and maintaining the low spike rates of traditional SNNs (an approach not explored in prior works), does not meet ICLR's standards. While we believe that our results, which outperform recent works published at leading venues (e.g., ECCV22, ICLR22, ICLR24) in terms of accuracy and energy tradeoffs, demonstrate the effectiveness and impact of our approach, we respect the reviewers' perspectives.
Given the current scores and the likelihood of rejection, we have decided to withdraw the paper from further consideration.