PaperHub
Rating: 4.4 / 10 — Rejected
4 reviewers · ratings 2, 2, 2, 4 (min 2, max 4, std 0.9)
ICML 2025

Canonic Signed Spike Coding for Efficient Spiking Neural Networks

OpenReview · PDF
Submitted: 2025-01-23 · Updated: 2025-06-18

Abstract

Keywords
Spiking Neural Networks · spike encoding scheme · ANN-SNN conversion

Reviews and Discussion

Official Review
Rating: 2

The paper proposes the Canonic Signed Spike (CSS) coding scheme, which enhances encoding capacity while maintaining network simplicity. Additionally, the Over-Fire-and-Correct method is introduced to enable efficient computation. The primary contribution lies in minimizing conversion loss when transforming artificial neural networks (ANNs) into spiking neural networks (SNNs).

Questions for the Authors

Can the proposed method be extended to support activation functions beyond ReLU, similar to the approach in Stöckl & Maass (2021)?

The paper claims that the algorithm is hardware-friendly. Could the authors elaborate on how it can be efficiently implemented in hardware?

Claims and Evidence

The paper claims to achieve minimal conversion loss from ANN to SNN while preserving computational efficiency. However, while the proposed approach is promising, it shares similarities with existing methods, particularly the work of Stöckl & Maass (2021), and does not appear to surpass existing techniques in terms of flexibility.

Methods and Evaluation Criteria

The validation methods used in the paper are correct. The authors apply Leaky Integrate-and-Fire (LIF) neurons for SNN conversion and introduce modulation mechanisms to ensure precise transformation. The methodology is well-grounded in established conversion techniques.

Theoretical Claims

The theoretical analysis focuses on conversion error, and no apparent errors are present in the conclusions. The mathematical formulations appear consistent with existing ANN-to-SNN conversion frameworks.

Experimental Design and Analysis

The experimental evaluation conducted on CIFAR-10 and ImageNet is reasonable and aligns with the standard benchmarks used in SNN research.

Supplementary Material

The authors did not provide supplementary materials.

Relation to Prior Literature

The work primarily targets neuromorphic applications, significantly reducing deployment costs by eliminating the need for quantization training.

Essential References Not Discussed

The references used in the paper are well-structured and appropriate for the study.

Other Strengths and Weaknesses

The proposed method does not require model quantization, which reduces training requirements and enhances deployment efficiency.

Other Comments or Suggestions

None.

Author Response

Thank you for your thorough review. Below, we address some key points of your concerns.


Reference Implementation in Hardware

Compared to traditional rate coding with IF neurons, our method introduces three additional components: (1) membrane potential amplification, (2) silent period control, and (3) handling of input and output negative spikes.

Silent period control is efficiently managed by a state register. When a neuron starts computing, the register outputs 0. After P clock cycles (where P is the silent period length), it switches to 1 and remains there until a reset. This control signal is shared across multiple neurons, resulting in minimal hardware overhead.

Membrane potential amplification is implemented as a simple shift operation. Since the shift amount is fixed, it can be achieved simply by introducing a grounding line (logic 0) at the least significant bit (LSB) of the membrane potential input to the adder. Specifically, the [n-2:0] bits are hardwired to the adder's input [n-1:1], while the LSB of the input is tied to 0. The threshold comparison ensures that the residual value does not exceed $2^{n-1}$, thus preventing overflow. As a result, the amplification operation incurs no additional cost.

Handling negative spikes is straightforward and requires a two’s complement addition, which can be efficiently performed by the adder. Specifically, the most significant bit (MSB) of the membrane potential is used to determine its polarity. If the MSB is 1, we take the two’s complement and compare it with the positive threshold in the comparator. This approach effectively compares the absolute value of the membrane potential with the threshold, maintaining a single comparator, which incurs minimal hardware overhead. Based on the comparison and the MSB value, we can determine both the presence and the polarity of the spike.
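
For concreteness, the following bit-level sketch models the mechanisms described above in Python (our own illustration, not the authors' RTL; the register width, function name, and gating signal are hypothetical):

```python
# Illustrative model of the described design: shift-by-wiring amplification,
# silent-period gating, and MSB-based sign handling on an n-bit register.

N_BITS = 16
MASK = (1 << N_BITS) - 1  # n-bit register, two's-complement contents

def neuron_step(u, z, theta, silent_done):
    """One timestep of the fixed-point neuron update.

    u           -- membrane potential register (two's complement)
    z           -- integrated synaptic input for this step (signed int)
    theta       -- positive firing threshold
    silent_done -- shared state-register output: 0 during the silent
                   period, 1 afterwards
    """
    # Amplification: fixed left shift by 1, realized in hardware by wiring
    # bits [n-2:0] to adder input [n-1:1] and grounding the LSB.
    u = (u << 1) & MASK
    # Input integration: two's-complement addition handles negative
    # spike inputs for free.
    u = (u + z) & MASK

    if not silent_done:
        return u, 0  # no spikes are emitted during the silent period

    # Sign handling: the MSB gives the polarity; if negative, take the
    # two's complement so a single comparator sees the absolute value.
    msb = (u >> (N_BITS - 1)) & 1
    mag = ((~u + 1) & MASK) if msb else u

    if mag >= theta:                    # compare |u| with the threshold
        spike = -1 if msb else 1        # spike polarity from the MSB
        u = (u - spike * theta) & MASK  # remove the emitted quantity
        return u, spike
    return u, 0
```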

Thus, the only notable addition is the two’s complement unit. The silent period control and spike emission modules contribute negligible overhead. We provide an illustration of the reference design at this URL. The amplification operation incurs virtually no overhead and constitutes only a small proportion of the total operations. For details on the proportion, please refer to our response to Reviewer ujyj.

Based on our analysis and experimental results, our method is hardware-friendly and has minimal impact on energy consumption.

Our Main Contributions

First, we would like to clarify that we do not use LIF neurons ($\beta < 1$); instead, we introduce the novel TSA neuron ($\beta > 1$). The difference in $\beta$ leads to distinct weight patterns, and using LIF neurons would cause significant conversion errors due to the residual membrane potential at the end. Additionally, we introduce a negative threshold mechanism to further reduce inference latency. For more details, please refer to our response to Reviewer cfNu.

Second, we highlight the differences between our work and that of Stöckl & Maass [1]. Their approach relies on complex neuron designs and increased computational latency to ensure conversion accuracy and support GeLU activation. In contrast, while our method does not support GeLU activation, it maintains network simplicity and significantly reduces inference latency. Moreover, ReLU activation remains the most widely adopted target for conversion. For additional experiments related to latency, please refer to our response to ujyj.

Lastly, we emphasize that our approach leverages stepwise weighting, which incurs minimal additional cost while delivering substantial benefits. Our method is also flexible, as it adheres to the standard ANN-SNN conversion framework, requiring only the replacement of IF neurons with TSA neurons. This enables our approach to be effectively applied to architectures such as Transformers and tasks like object detection, significantly reducing the required number of timesteps. For further details, we have provided additional experiments on these aspects in our response to pam4 for your reference.

[1] Optimized Spiking Neurons Classify Images with High Accuracy through Temporal Coding with Two Spikes


If you have any further questions, we would be happy to address them.

Official Review
Rating: 2

The paper aims to improve the conversion of Artificial Neural Networks (ANNs) to Spiking Neural Networks (SNNs) by developing a more efficient spike coding scheme, which has improved encoding capacity and reduced computational overhead.

Questions for the Authors

See Weaknesses & Questions.

Claims and Evidence

Based on a careful review, the claims in the paper are generally well-supported by evidence.

Methods and Evaluation Criteria

The proposed methods and evaluation criteria align with the research problem of reducing the conversion loss between ANNs and SNNs.

Theoretical Claims

The paper's theoretical proofs are mathematically rigorous and provide support for the proposed Canonic Signed Spike (CSS) coding scheme.

Experimental Design and Analysis

The experimental designs are sound, validating the proposed Canonic Signed Spike coding scheme effectively. Experimental results demonstrate their method's performance and advantages.

Supplementary Material

I thoroughly reviewed the entire Appendix sections (A-E) in the document. The appendices provide critical mathematical foundations and supplementary experimental evidence.

Relation to Prior Literature

The paper's key contributions are within the ANN-SNN conversion learning algorithm. More specifically, the authors extend work by Li et al. (2022) and Wang et al. (2022) that uses negative spikes, and then introduce a more systematic approach to negative spike correction. Compared to previous methods, the proposed methods provide more efficient information encoding.

Essential References Not Discussed

No.

Other Strengths and Weaknesses

Strengths:

  1. The authors develop a spike coding scheme called CSS that has improved encoding capacity and reduced computational overhead.
  2. The paper's theoretical proofs are mathematically rigorous and support the proposed CSS coding scheme.
  3. Experimental results demonstrate their method's performance.

Weaknesses & Questions:

  1. Can the paper's method be extended to broader network architectures, such as Spiking Transformers that contain the attention mechanism?
  2. Why did the authors only validate their method on simple image classification tasks? Notably, state-of-the-art methods in ANN-SNN conversion such as Fast-SNN [1] and SpikeZIP-TF [2] have verified their methods across multiple tasks, such as object detection, semantic segmentation, and natural language understanding.
  3. The SyOPs and energy efficiency claims in Table 4 require further examination. Why is the energy consumption calculated for the rate-based and CSS-based methods lower than that of the single-spike TTFS method?
  4. Can the proposed method support conversion on neuromorphic datasets?
  5. Why did the authors not compare with the most recent state-of-the-art ANN-to-SNN conversion algorithms [2,3,4]? Refs: [1] Fast-SNN: Fast Spiking Neural Network by Converting Quantized ANN; [2] SpikeZIP-TF: Conversion is All You Need for Transformer-based SNN; [3] Towards High-performance Spiking Transformers from ANN to SNN Conversion; [4] A universal ANN-to-SNN framework for achieving high accuracy and low latency deep Spiking Neural Networks

Other Comments or Suggestions

It is recommended that the authors refer to Fast-SNN [1] and SpikeZIP-TF [2] for experimental design. [1] Fast-SNN: Fast Spiking Neural Network by Converting Quantized ANN; [2] SpikeZIP-TF: Conversion is All You Need for Transformer-based SNN

Author Response

Thank you for your thorough review. Below, we address some key points of your concerns.


First, we would like to emphasize that our core contribution lies in the innovation of the encoding method. Our work is not a continuation of Li et al. (2022) and Wang et al. (2022) because our core focus is on weighting the spikes, whereas they still rely on rate coding. Although both approaches use negative spikes, their roles are different, as we explained in Section 4.2.

By leveraging spike weighting, we enhance the encoding capacity of spike sequences. Compared to mainstream rate coding or TTFS coding, our approach significantly reduces the required timesteps. Our method follows the existing ANN-SNN conversion framework, making it straightforward to transition from rate coding to CSS coding. Since the core objective of ANN-SNN conversion is to accurately represent ANN activations using spike sequences, our approach is not limited to specific network architectures or tasks but rather provides a broadly applicable optimization.
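
As a rough illustration of this capacity gap (our own back-of-the-envelope sketch, assuming the decode form implied by the stepwise weighting with $\beta = 2$ and weights $\beta^{T-t}$):

```latex
% Rate coding: T timesteps carry a spike count k \in \{0,\dots,T\},
% i.e. only T+1 distinguishable levels:
%   x \approx \frac{\theta}{T} \sum_{t=1}^{T} s_t, \qquad s_t \in \{0,1\}.
% Stepwise-weighted (CSS-style) coding with \beta = 2: each timestep
% contributes a distinct binary weight,
%   x \approx \theta \sum_{t=1}^{T} s_t\, \beta^{T-t},
% giving 2^T levels with s_t \in \{0,1\}, and a wider signed range with
% s_t \in \{-1,0,1\}. Matching 2^T levels with rate coding would require
% on the order of 2^T - 1 timesteps.
```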

Our experimental design aims to verify the accuracy of the encoding and the effectiveness of the TSA design. We choose image classification as the primary task because it is the most common benchmark in SNN research. We adopt CNN architectures as they are still the most widely used in ANN-SNN conversion. Additionally, we compare with CNNs of the same structure to highlight the impact of the encoding method, thereby better demonstrating the value of our work. To address your concerns regarding applicability, we have further extended our method to ViTs and object detection tasks, with preliminary results provided below.

Extended Experiments

Conversion of Transformer architectures

To demonstrate that CSS can also encode activations in Transformers, we converted ViT-S and ViT-B for the ImageNet classification task. We used the pre-trained weights provided in SpikeZIP-TF [1], where "32Level" denotes the quantization precision. As shown, our encoding scheme significantly reduces the required timesteps under the same weights. Additionally, we provide the actual runtime for a more intuitive comparison.

| Method | Architecture | Param | Encoding Scheme | Timestep | Acc. | Runtime |
|---|---|---|---|---|---|---|
| SpikeZIP-TF | ViT-S-32Level | 22.05M | rate | 64 | 81.45% | 3492.62s |
| CSS-SNN | ViT-S-32Level | 22.05M | CSS | 6 | 81.51% | 325.55s |

Object detection tasks

To demonstrate that CSS can be applied to object detection tasks, we conducted experiments on the VOC2007 dataset using the same architecture and weights as in Fast-SNN [2]. The results are shown in the table below. As observed, our method not only reduces the required timesteps but also significantly lowers the conversion loss.

| Method | Architecture | ANN mAP | Encoding Scheme | Timestep | SNN mAP |
|---|---|---|---|---|---|
| Fast-SNN | YOLOv2 (ResNet-34-4b) | 76.16 | rate | 15 | 76.05 |
| CSS-SNN | YOLOv2 (ResNet-34-4b) | 76.16 | CSS | 4 | 76.18 |
| Fast-SNN | YOLOv2 (ResNet-34-3b) | 75.27 | rate | 7 | 73.43 |
| CSS-SNN | YOLOv2 (ResNet-34-3b) | 75.27 | CSS | 3 | 75.20 |

In addition, we have also included the experiment on the neuromorphic dataset in our response to Reviewer cfNu, which you may find useful for reference.

We would like to emphasize once again that our core contribution lies in the nonlinear encoding scheme, which has broad applicability. For existing rate-based ANN-SNN frameworks, one only needs to replace the IF neurons with TSA neurons, as we have done in the experiments above.

[1] SpikeZIP-TF: Conversion is All You Need for Transformer-based SNN
[2] Fast-SNN: Fast Spiking Neural Network by Converting Quantized ANN

Energy Estimation

Our energy estimation is based on an open-source codebase, and we have carefully re-examined the code without identifying any issues. Although it may seem counterintuitive, it is entirely plausible that CSS exhibits lower energy consumption than TTFS. First, CSS operates with only three timesteps, meaning each neuron can fire at most three spikes. Second, the CSS encoding scheme applies a more aggressive quantization to ANN activations—many small activations in the ANN are encoded as zero. In contrast, TTFS utilizes more timesteps for fine-grained quantization, encoding a greater number of "less important" activations. As a result, although TSA neurons may fire multiple spikes, significantly fewer neurons are activated in CSS coding. This combined effect leads to CSS achieving lower overall energy consumption.


If you have any further questions, we would be happy to address them.

Reviewer Comment

Thank you for your reply. I am grateful to the authors for conducting additional experiments to answer my questions; some of my concerns have been resolved, so I will modify my rating. However, I still have the following concerns/questions.

  1. Can the proposed method support neuromorphic datasets?

  2. The authors have effectively demonstrated the feasibility of their approach in visual processing tasks. As they note, "the core contribution lies in the nonlinear encoding scheme, which has broad applicability. For existing rate-based ANN-SNN frameworks, one only needs to replace the IF neurons with TSA neurons." However, it remains unclear whether this method could be successfully applied to text-based tasks such as NLP or NLU. Furthermore, the potential applicability to speech processing tasks, which inherently contain richer temporal information structures, warrants investigation. How might the proposed encoding scheme perform when extended to these different tasks?

  3. The authors state: "our approach follows a nonlinear accumulation process. This key difference allows us to accumulate information more quickly, thereby reducing the required timesteps." Does the proposed method introduce floating-point multiplication operations? Does this compromise the important spike-driven advantages of SNNs?

  4. Although TTFS uses more timesteps for fine-grained quantization, it emits at most one spike across all timesteps. With the same network structure, the number of spikes emitted under TTFS encoding should therefore be lower than under the method proposed in this paper. Additionally, how should we understand the authors' explanation that "TTFS encodes more 'less important' activations"?

Author Comment

Thank you for raising your score and providing further feedback! We are happy to address your concerns.

Additional Experiments

Neuromorphic Datasets

We have already presented results on the DVS128Gesture dataset in our response to Reviewer cfNu. Here, we further include experiments with ResNet-18 on CIFAR10-DVS and N-Caltech101 datasets. We implemented a simple rate coding scheme as the baseline, and the results are presented in the table below. The results demonstrate that our method is fully compatible with neuromorphic datasets, significantly reduces the number of timesteps, and further mitigates the conversion loss.

| Method | ANN Acc. | Dataset | Coding Scheme | T | SNN Acc. |
|---|---|---|---|---|---|
| - | 90.94% | DVS128Gesture | rate | 128 | 90.56% |
| CSS-SNN | 90.94% | DVS128Gesture | CSS | 6 | 90.89% |
| - | 83.03% | N-Caltech101 | rate | 128 | 82.51% |
| CSS-SNN | 83.03% | N-Caltech101 | CSS | 8 | 82.76% |
| - | 78.35% | CIFAR10-DVS | rate | 256 | 77.87% |
| CSS-SNN | 78.35% | CIFAR10-DVS | CSS | 8 | 78.15% |

Natural Language Processing

We have already demonstrated the effectiveness of our method on the Transformer architecture, making its application to NLP tasks a natural extension. We conducted experiments using the RoBERTa model on the IMDB Movie Review and SST-2 datasets, using the pretrained ANN provided in SpikeZIP-TF for conversion. The results are presented in the table below, demonstrating the effectiveness of our method on NLP tasks. We additionally report the runtime, which clearly highlights the efficiency of our approach.

| Method | Arch | Dataset | Coding Scheme | T | Acc. | Runtime |
|---|---|---|---|---|---|---|
| SpikeZIP-TF | RoBERTa-B-32Lv | SST-2 | rate | 64 | 92.32% | 169.45s |
| CSS-SNN | RoBERTa-B-32Lv | SST-2 | CSS | 5 | 92.32% | 19.68s |
| SpikeZIP-TF | RoBERTa-B-32Lv | IMDB-MR | rate | 64 | 81.30% | 4964.80s |
| CSS-SNN | RoBERTa-B-32Lv | IMDB-MR | CSS | 5 | 81.36% | 489.51s |

Audio Classification

Our method can also be applied to audio processing. We conducted audio classification experiments using ResNet-18 on the GTZAN and ESC-50 datasets. The results, shown in the table below, further demonstrate the strong applicability of our method.

| Method | ANN Acc. | Dataset | Coding Scheme | T | SNN Acc. |
|---|---|---|---|---|---|
| - | 90.62% | GTZAN | rate | 256 | 89.54% |
| CSS-SNN | 90.62% | GTZAN | CSS | 8 | 90.28% |
| - | 75.15% | ESC-50 | rate | 256 | 74.97% |
| CSS-SNN | 75.15% | ESC-50 | CSS | 8 | 75.00% |

We have already demonstrated the applicability of our design across visual, textual, and auditory tasks, as you suggested. We sincerely appreciate your constructive suggestions and believe these additional results further enrich our work. However, we hope you understand that it is impractical to enumerate and evaluate our method on every possible task.

We would like to reiterate that our method is not limited to specific models or tasks. Instead, it introduces a general innovation in encoding. Our proposed CSS coding significantly reduces the number of timesteps while preserving the simplicity of the rate-based conversion process. Wherever ANN-to-SNN conversion is applicable, our method can be readily adopted. This makes our approach a substantial contribution to the SNN community.

Further Clarifications

Efficient Implementation of Spike Weighting

In our method, spike weights are applied by doubling the membrane potential at each timestep, which corresponds to a left shift in hardware. Since the shift amount is fixed, it can be implemented purely through wiring, eliminating the need for shift registers. This design introduces no floating-point operations, preserves the spike-driven nature of SNNs, and incurs negligible hardware cost. For a more intuitive understanding, we provide an illustration of the reference design at this URL. You may also refer to our response to Reviewer MkoZ for more detailed information.

Reduced Spike Count

To address your follow-up question, we offer a more detailed breakdown of the spike counts. Suppose the activation value lies within $[0, x_p]$:

  • For TTFS coding [2] with $T = 64$, activations in $[\frac{1}{64}x_p, x_p]$ are encoded as exactly one spike;
  • For CSS coding with $T = 3$, activations in $[\frac{1}{8}x_p, x_p]$ are encoded into a spike sequence.

Considering the typical activation distribution in ANNs—where a large proportion of activations are close to zero—TTFS ends up encoding more values. Also note that CSS uses only three timesteps, requiring just around 1.5 spikes per activation. We visualize the activation distribution and report the average spike counts for encoding in each layer at this URL.
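
A small numerical sketch of this argument (the half-normal activation distribution is our assumption for illustration; the ~1.5 spikes/activation figure is taken from the statement above):

```python
# Sketch: compare how many activations each scheme must encode, assuming
# post-ReLU activations roughly follow a half-normal distribution.
import numpy as np

rng = np.random.default_rng(0)
acts = np.abs(rng.normal(size=1_000_000))  # assumed activation samples
x_p = acts.max()                           # clipping/max value

# TTFS (T = 64): anything above x_p/64 costs exactly one spike.
ttfs_fires = acts >= x_p / 64

# CSS (T = 3): anything below x_p/8 is quantized to zero; encoded values
# use a 3-step signed sequence of ~1.5 spikes on average (per the response).
css_fires = acts >= x_p / 8

print(f"TTFS: {ttfs_fires.mean():.1%} of neurons fire, "
      f"{ttfs_fires.mean():.2f} spikes/activation")
print(f"CSS:  {css_fires.mean():.1%} of neurons fire, "
      f"{1.5 * css_fires.mean():.2f} spikes/activation")
```

Under this assumed distribution, far fewer neurons cross the CSS cutoff, which can outweigh the extra spikes per encoded activation; real post-ReLU distributions are typically even sparser, strengthening the effect.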


Finally, we believe our method introduces a meaningful innovation and delivers strong practical effectiveness, contributing to the advancement of ANN-SNN conversion. We sincerely appreciate the time and effort you have dedicated to reviewing our work!

Official Review
Rating: 2

In this work, the authors propose a new neural coding scheme, named canonic signed spike (CSS) coding. For the proposed encoding, they also introduce over-fire-and-correct and threshold optimization methods. The coding method transmits information efficiently as binary spikes, while the accumulated membrane potential expresses information over time. The authors theoretically prove the correctness of information transmission under this coding method. According to the authors' experiments, higher accuracies were achieved on image recognition tasks with CNN models.

Questions for the Authors

Please refer to the above comments.

Claims and Evidence

In order for the proposed method to be useful, the feasibility of implementation on neuromorphic hardware must be discussed. To implement the proposed method on neuromorphic hardware, synchronization between layers is essential. In addition, the operation of Equation 8 greatly undermines the advantage of event-based neuromorphic processors: each membrane potential must be updated at every time step even if there are no input spikes, which is a major disadvantage. It will not be easy to implement on neuromorphic hardware. Is there a solution for this?

Methods and Evaluation Criteria

There is no detailed description or analysis of the proposed method. In addition to theoretical proof, it is necessary to experimentally show what influence each factor has. In addition, ablation studies and overhead analysis for the proposed method are required for evaluation.

Does the time step include a silent period? The inference procedure is not clearly explained.

If T and the silent period (P) overlap, a dependency arises between layers, requiring T×L timesteps in total. In this case, can we say that inference takes T timesteps rather than T×L?

There is a lack of detailed explanation about OFC.

What would be the performance if there were no negative spikes?

What is the ratio of shift operation and input integration in energy consumption analysis?

“the optimal threshold ~ accuracy in rate coding” (lines 30-33): Isn’t this the optimal threshold for the proposed CSS coding? In addition, a detailed explanation and analysis of the optimal threshold are required.

Theoretical Claims

There seems to be no major problem with the theoretical claims.

Experimental Design and Analysis

Additional experiments are required to prove the superiority of the proposed method. It seems to be applicable to other tasks besides image classification. What are the experimental results for tasks such as object detection and segmentation? Also, what if it is applied to transformer models other than CNN? What are the experimental results for neuromorphic datasets?

Supplementary Material

Yes, I reviewed it along with the manuscript.

Relation to Prior Literature

Yes, I reviewed it along with the manuscript.

Essential References Not Discussed

The proposed method is similar to the method of the paper below in that it expresses information according to the integrated time difference by utilizing temporal information. In addition, it is similar in that it operates by dividing the integration (silent) and firing phases to transfer temporal information between layers. It is necessary to compare and discuss the methods of the papers below.

Temporal-Coded Spiking Neural Networks with Dynamic Firing Threshold: Learning with Event-Driven Backpropagation, ICCV-23

T2FSNN: deep spiking neural networks with time-to-first-spike coding, DAC-20

Other Strengths and Weaknesses

Please refer to the above comments.

Other Comments or Suggestions

I think it would be helpful to have an overall figure of the proposed approach.

Author Response

Thank you for your thorough review. Below, we address some key points of your concerns.


Energy Overhead of Spike Weighting

To achieve nonlinear encoding, we double the membrane potential at each time step. First, we would like to emphasize that this method enhances the encoded information at almost no additional cost. In our design, the shift amount is fixed for each step, allowing it to be implemented purely through wiring. Specifically, the [n-2:0] bits are hardwired to the adder's input [n-1:1], while the LSB of the input is tied to 0. This eliminates the need for shift registers and incurs negligible energy consumption. Furthermore, the amplification is performed independently for each neuron, whereas the majority of operations arise from inter-neuron connections (e.g., convolutional or fully connected layers). Therefore, even if all neurons perform a shift at every time step, the overall cost remains minimal.

We provide a breakdown of the amplification operation's contribution to total operation count in the table below. The experiment was conducted with ResNet-18 on CIFAR-10. It can be observed that the operations for weighting are also minimal in number (accounting for just 4% of AC operations). Overall, their impact can be considered negligible.

| Timestep | Amp Ops | ACs | MACs |
|---|---|---|---|
| 8 | 4.42M | 108.5M | 14.72M |
| 4 | 2.23M | 76.89M | 7.36M |

Additionally, we would like to clarify that our approach is indeed quite hardware-friendly. We have provided a detailed reference design in our response to Reviewer MkoZ, which we encourage you to check for further details. We also provide an illustration of the reference design at this URL.

Methodological Details

OFC Method

The goal of OFC is to control the residual membrane potential in order to reduce conversion loss. This is achieved by lowering the firing threshold (causing Over-Firing) and introducing negative spikes (to Correct the excess firing). The method is also applicable to rate coding, as residual membrane potential impacts conversion loss in that setting as well. We address the question of how much to lower the threshold by deriving an optimal threshold mathematically. Notably, we have already conducted ablation experiments on negative spikes in Section 5.3, and experimental validation of the optimal threshold is provided in Section 5.4.
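
As a schematic illustration of this mechanism (our simplified reading of the description above, not the paper's exact algorithm; the TSA β-weighting is omitted to isolate the over-fire-and-correct behavior, and `theta_opt` stands in for the analytically derived threshold):

```python
# Schematic sketch of the Over-Fire-and-Correct idea: a lowered threshold
# encourages over-firing, and negative spikes correct the excess, keeping
# the residual membrane potential bounded.

def ofc_step(u, z, theta_opt):
    """One firing-phase step with a lowered threshold and signed spikes."""
    u = u + z
    if u >= theta_opt:       # over-fire: the lowered threshold fires earlier
        return u - theta_opt, +1
    if u <= -theta_opt:      # correct: a negative spike cancels excess firing
        return u + theta_opt, -1
    return u, 0              # residual stays within (-theta_opt, theta_opt)
```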

Inference Procedure of TSA

We provide the pseudocode for TSA’s forward propagation in Algo. 1, where the input spans all timesteps for clarity. However, our actual implementation employs a pipelined inference process: TSA processes spikes at each timestep, adjusting its behavior based on its local phase (e.g., silent periods).

Due to the pipelined processing, while the total delay from input to output is $P \times L$, each image only occupies $T+P$ timesteps per layer, where $T$ timesteps are used for primary neural computation. We report $T$ in tables, as it represents the actual encoding steps per input. This standard is also applied when comparing with TTFS coding and [1], and a similar approach is found in [2].
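
To make this accounting concrete (an illustrative example with assumed numbers, not a measurement): with $T = 8$, $P = 1$, and $L = 16$ layers, the first output appears after $P \times L = 16$ timesteps of pipeline fill, yet in steady state each layer holds a given image for only $T + P = 9$ timesteps, so the per-input cost is governed by $T + P$ rather than $T \times L$, and we report $T = 8$.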

To further illustrate efficiency, we report actual runtime below. Although measured on standard GPUs, these results still reflect the pipelined inference design. We implemented rate coding as a baseline, setting $P=T$ to simulate [1]. Using four 2080Ti GPUs, we validated 50,000 images on ImageNet with VGG-16. Additionally, we processed 2,000 images one by one and averaged the latency to obtain the inference latency per image (LPI).

| Coding Scheme | T | P | Acc. | Runtime | LPI |
|---|---|---|---|---|---|
| rate | 256 | 0 | 70.50% | 11816.04s (19×) | 1279ms (9.2×) |
| CSS | 8 | 8 | 75.18% | 886.31s (1.43×) | 801ms (5.76×) |
| CSS | 8 | 1 | 75.17% | 621.15s (1×) | 139ms (1×) |

[1] Optimized Spiking Neurons Classify Images with High Accuracy through Temporal Coding with Two Spikes
[2] Bridging the Gap between ANNs and SNNs by Calibrating Offset Spikes

Related Work

The two works you mentioned focus on implementing TTFS coding. Although both approaches involve accumulating information over time, we would like to emphasize that TTFS accumulates information in a linear manner [3], whereas our approach follows a nonlinear accumulation process. This key difference allows us to accumulate information more quickly, thereby reducing the required timesteps.
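
Unrolling the two update rules makes the distinction concrete (a sketch of the argument above; $z[t]$ denotes the input at step $t$):

```latex
% Linear accumulation (TTFS-style):
%   u[t] = u[t-1] + z[t]  \;\Rightarrow\;  u[T] = \sum_{t=1}^{T} z[t],
% so every timestep contributes with equal weight 1.
% Nonlinear accumulation (TSA, \beta > 1):
%   u[t] = \beta\, u[t-1] + z[t]  \;\Rightarrow\;
%   u[T] = \sum_{t=1}^{T} \beta^{T-t}\, z[t],
% so earlier inputs are geometrically amplified, letting a short window
% carry the same information as a much longer linear one.
```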

In our paper, we acknowledge that the silent period concept has been used in TTFS coding and [1]. However, we do not claim a contribution to the silent period itself; rather, we aim to minimize it to reduce inference latency. By incorporating the OFC method, we reduce $P$ to 1, whereas in [3,5], $P=T$.

[3] Temporal-Coded Spiking Neural Networks with Dynamic Firing Threshold: Learning with Event-Driven Backpropagation


If you have any further questions, we would be happy to address them.

Official Review
Rating: 4

The paper proposes an implicitly weighted spiking mechanism for direct ANN-to-SNN conversion. The weight of a spike, $\beta^{T-t}$, is determined by its temporal location $t \in [1, 2, \cdots, T]$, where an earlier spike gets a higher weight than spikes that arrive later, as $\beta > 1$. Further, the authors use single-bit signed spikes to reduce the approximation error computed with respect to the ANN activation. The empirical evaluations are performed on the CIFAR-10 and ImageNet datasets, where the authors compare the proposed method with existing methods, showing a reduction in temporal latency in direct ANN-to-SNN conversion.

Questions for the Authors

Q1: How do the present neuronal dynamics compare to LIF dynamics?

Q2. Can the experimental evaluations be extended to CIFAR-100?

Q3: What are the challenges in applying the method to neuromorphic datasets, such as N-MNIST, N-Caltech, and DVS-CIFAR-10?

Claims and Evidence

Yes.

Methods and Evaluation Criteria

The experimental evaluation could be extended to the CIFAR-100 dataset.

Theoretical Claims

The theoretical claims made in the paper are supported with detailed derivations.

Experimental Design and Analysis

The experimental design is sound.

Supplementary Material

Yes.

Relation to Prior Literature

The paper is well referenced.

Essential References Not Discussed

The authors can include the recent publication [1], which uses signed rate encoding to reduce the variance of noise introduced by the input pixels.

[1] Bhaskar Mukhoty, Hilal AlQuabeh, and Bin Gu, Improving Generalization and Robustness in SNNs Through Signed Rate Encoding and Sparse Encoding Attacks, in The Thirteenth International Conference on Learning Representations (2025).

Other Strengths and Weaknesses

Since the ANN-to-SNN methods pre-suppose the existence of an ANN model, it can be difficult to apply such a method to a neuromorphic dataset where no ANN model exists, or ANN models are equally challenging to train due to the inherent temporal dimension of data.

Other Comments or Suggestions

None.

Author Response

Thank you for your thorough review. Below, we address some key points of your concerns.


Comparison with LIF

The neuron dynamics of TSA and LIF can both be given by the following equation:

$$u_{i}^{l}[t] = \beta\, u_{i}^{l}[t-1] + z_{i}^{l}[t] - S_{i}^{l}[t]$$

Apart from the difference in handling negative spikes, the key distinction between TSA and LIF lies in the choice of $\beta$. While both mechanisms serve to weight the input, TSA sets $\beta > 1$, resulting in a weight pattern that decreases over time. This design is primarily motivated by two factors:

  1. Enabling rapid transmission of most information.
  2. Reducing the weight of the final residual information, which is crucial for conversion accuracy.
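
A minimal simulation of these dynamics illustrates the second point (the inputs, threshold, and restriction to positive spikes are our own illustrative choices): decoding spikes with weights $\beta^{T-t}$ satisfies decoded + residual = target, and with $\beta > 1$ the residual carries the smallest weight.

```python
# Minimal simulation of u[t] = beta*u[t-1] + z[t] - S[t], contrasting
# TSA (beta > 1) with LIF-style decay (beta < 1).

def run(beta, z_seq, theta=1.0):
    T = len(z_seq)
    u, decoded = 0.0, 0.0
    for t, z in enumerate(z_seq, start=1):
        u = beta * u + z
        s = theta if u >= theta else 0.0   # positive spikes only, for brevity
        u -= s
        decoded += (beta ** (T - t)) * s   # spike carries weight beta^(T-t)
    target = sum((beta ** (T - t)) * z
                 for t, z in enumerate(z_seq, start=1))
    return decoded, target, u              # residual u has weight beta^0 = 1

z = [0.4, 0.3, 0.2, 0.1]
for beta in (2.0, 0.5):
    decoded, target, residual = run(beta, z)
    print(f"beta={beta}: target={target:.3f}, decoded={decoded:.3f}, "
          f"residual={residual:.3f} ({residual / target:.0%} of the target)")
```

With $\beta = 2$ the residual contributes the least-significant share of the encoded value, whereas with $\beta = 0.5$ it dominates, mirroring the accuracy gap in the table below.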

For comparison, we set $\beta = 0.5$ in the table below and observed a significant increase in conversion error. We conducted experiments using VGG-16 on the CIFAR-10 dataset. It can be observed that when using (ternary) LIF neurons, it is necessary to extend the silent period to reduce conversion loss, which significantly impacts output latency. This further highlights the importance of adopting a decreasing weight pattern.

| Neuron | Timestep | Silent Period | Acc. |
|---|---|---|---|
| TSA | 8 | 1 | 96.68% |
| LIF | 8 | 1 | 84.19% |
| LIF | 8 | 4 | 95.32% |
| LIF | 8 | 8 | 96.16% |

Additional Experiments

Our main contribution is compressing the timesteps for conversion through a stepwise weighting mechanism, which is both convenient and flexible: it still follows the standard ANN-SNN conversion framework, requiring only the replacement of IF neurons with TSA neurons. Therefore, our method is applicable to a wide range of network architectures, datasets, and tasks. We have included additional experimental results on CIFAR-100 in the table below. The experiments were conducted based on the full-precision VGG-16.

| Method | ANN Acc. | Coding Scheme | Timestep | SNN Acc. |
|---|---|---|---|---|
| OPI | 76.31% | rate | 128 | 76.25% |
| SNN Calibration | 77.89% | rate | 256 | 77.68% |
| TSC | 71.22% | TSC | 1024 | 70.97% |
| LC-TTFS | 70.28% | TTFS | 50 | 70.15% |
| CSS-SNN | 76.56% | CSS | 8 | 76.51% |

Moreover, we have conducted experiments on object detection tasks and applied our encoding method to Transformer architectures. The results of these experiments can be found in our response to Reviewer pam4.

Regarding neuromorphic datasets, as you pointed out, one of the challenges of applying our method is the absence of an ANN counterpart, which is also a general limitation of ANN-SNN conversion. A possible solution [1] is to integrate temporal information into static features and then train an ANN for classification. We conducted experiments using ResNet18 on the DVS128Gesture dataset, with the results shown in the table below. For comparison, we implemented a simple rate coding as the baseline.

| Method | ANN Acc. | Coding Scheme | Timestep | SNN Acc. |
|---|---|---|---|---|
| - | 90.94% | rate | 128 | 90.56% |
| CSS-SNN | 90.94% | CSS | 6 | 90.89% |

[1] Masked Spiking Transformer


If you have any further questions, we would be happy to address them.

Final Decision

The paper proposes a novel Canonic Signed Spike (CSS) coding scheme and an Over-Fire-and-Correct (OFC) method to enhance encoding capacity under reduced timesteps in ANN-to-SNN conversion. The aim is to improve the accuracy of rate-based encoding for spiking neural networks. The authors evaluate their methods on CIFAR-10 and ImageNet, reporting improved performance and demonstrating the potential of their approach in time-constrained scenarios.

After the rebuttal phase, the paper received scores of 4, 2, 2, and 2. Across the reviews, the strengths of the paper include well-organized theoretical derivation, novel methodology, and acknowledged performance improvement. These contributions are particularly pointed out by reviewers MkoZ and pam4. On the other hand, several common concerns were raised: insufficient comparison and discussion of related work, a lack of clarity on the applicability of the method to other datasets (e.g., CIFAR-100, neuromorphic datasets) and architectures, and limited discussion on neuromorphic hardware deployment and its implications.

In the rebuttal, the authors addressed the dataset generalization issue raised by Reviewer cfNu, provided clarifications on related literature requested by Reviewer MkoZ, and offered a pipeline-based implementation strategy in response to Reviewer ujyj's deployment concerns. However, I think their response to Reviewer ujyj did not fully resolve the reviewer's comments. Given the overall scores and the fact that all reviewers called for substantial improvements in comparison and analysis, I consider the paper to be on the borderline and recommend rejection. I encourage the authors to revise the paper by enhancing the related work discussion, expanding experimental validation, and addressing neuromorphic deployment in more detail.