Spikachu: A Scalable, Causal, and Energy Efficient Framework for Neural Decoding with Spiking Neural Networks
Abstract
Reviews and Discussion
This paper uses the neurosymbolic framework to make spiking neural networks (SNNs) behave more like Bayesian neural networks in practice. This can happen due to the universal computability of both neural networks and traditional modes of AI, which are considered Turing complete. The specific approach, called Spikachu, combines traditional computing paradigms with neural-network computing paradigms. The primary benefit is energy efficiency on commodity computing hardware.
Strengths and Weaknesses
Strengths: Good use of the neurosymbolic framework; good validation.
Weaknesses: see questions.
Questions
How should this work be evaluated with respect to other more theoretical works?
Limitations
N/A
Final Justification
See updated review below for justification. No increase in score after author rebuttal.
Formatting Issues
N/A
Thank you for taking the time to review our manuscript. We are excited to read that our approach makes "good use of the neurosymbolic framework" and that you found our experimental validation comprehensive. We are also very happy to see that you marked our work as "excellent" (4/4) in terms of clarity, significance, and originality, and "good" (3/4) in terms of quality. Those scores, combined with the absence of weaknesses or unaddressed limitations in your report, speak to the high caliber of our work. We do hope that, based on those, you will reconsider your overall score (3/6), which corresponds to "borderline reject".
Below, we provide our response to your comments and questions. If you would like any further clarifications, we will be happy to provide them during the discussion period.
Weaknesses
We are proud to report that the reviewer identified no weaknesses in our work.
Questions
- How should this work be evaluated with respect to other more theoretical works?
Thank you for this insightful question. We agree that evaluating applied work such as ours alongside more theoretical research requires careful consideration of differing goals and contributions.
Our work primarily aims to address practical, real-world challenges in neural decoding, such as causality, scalability, and energy efficiency, all of which are essential for model integration into online BCI systems. Rather than proposing new theoretical learning principles or formal guarantees, we focus on extending the more theoretical works such as POYO [Azabou et al. (2023)], CEBRA [Schneider et al. (2023)], and LFADS [Sussillo et al. (2016)] towards real-world utility in the context of BCIs.
We believe that the value of this work lies in its conceptual contributions to simultaneously causal, scalable, and energy-efficient neural decoding, particularly through the introduction of the following key innovations:
- Causal harmonizer for cross-session and cross-subject generalization. To the best of our knowledge, we are the first to adapt cross-attention with learnable latent queries (similar to the Perceiver encoder) into a causal harmonizer capable of integrating neural activity across multiple sessions and subjects. This is a significant step forward, as it enables models that were previously limited to single-session training to operate across larger, more diverse datasets, reaping the well-known benefits of scale. We further demonstrate in Appendix F.4 (lines 1126–1149) that this harmonizer is a modular and general-purpose component that can be integrated not only into our proposed architecture but also into other widely used neural decoding models such as MLPs, GRUs, and LSTMs.
- Multi-scale SNN module for temporal feature extraction. We introduce a multi-scale SNN module designed to process streaming neural signals at distinct temporal resolutions. This approach enables our framework to capture rich representations of neural dynamics without relying on transformer-based architectures, distinguishing it from prior works such as Spikformer and POYO. Our ablation study (Appendix F.2, lines 1079–1103; Figure 11) reveals that this module is the most critical component for overall decoding performance.
- First causal framework for multi-session, multi-subject neural decoding. To our knowledge, this is the first neural decoding framework that is simultaneously causal and trainable across multiple sessions and subjects. While many existing models (e.g., POYO, LFADS, CEBRA) can generalize across sessions and subjects, they rely on non-causal architectures (i.e. require past and future context to make a prediction), limiting their applicability outside offline research settings. Our framework addresses this gap directly.
- SNN pretraining across subjects, sessions, and tasks. We are also the first to show that SNN-based neural decoders can be pretrained on large multi-session, multi-subject datasets and learn neural representations that generalize to unseen sessions, subjects, and tasks. While this has been demonstrated in ANN-based models (e.g., POYO, CEBRA), it was not previously known whether similar transferability would hold for SNNs given their different training dynamics and representational capacities.
- Energy-constrained neural decoding at scale. Finally, we show that it is possible to build a multi-session, multi-subject neural decoder that respects the energy constraints required for deployment on edge computing platforms. Although this is not the only path toward practical BCI systems, we believe it is a viable and underexplored one, especially for scenarios where continuous tethering to a high-performance computer is infeasible.
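The harmonizer idea above can be illustrated with a minimal sketch: a fixed set of learnable latent queries cross-attends over a variable number of per-electrode features, producing a fixed-size, session-independent latent. This is a generic single-head, unmasked example under our own illustrative names and dimensions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def harmonize(latents, electrode_feats, W_q, W_k, W_v):
    """Cross-attention: fixed latent queries attend over a variable
    number of electrode features, yielding a fixed-size output."""
    Q = latents @ W_q            # (n_latents, d)
    K = electrode_feats @ W_k    # (n_electrodes, d)
    V = electrode_feats @ W_v    # (n_electrodes, d)
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]), axis=-1)
    return attn @ V              # (n_latents, d), independent of n_electrodes

rng = np.random.default_rng(0)
d, n_latents = 8, 4
latents = rng.normal(size=(n_latents, d))   # learnable parameters in practice
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

# Two "sessions" with different electrode counts map to the same latent shape.
out_a = harmonize(latents, rng.normal(size=(96, d)), W_q, W_k, W_v)
out_b = harmonize(latents, rng.normal(size=(64, d)), W_q, W_k, W_v)
assert out_a.shape == out_b.shape == (n_latents, d)
```

Because the output shape depends only on the number of latent queries, downstream modules can be shared across sessions and subjects with different electrode counts.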
These contributions aim to bridge the gap between theoretical developments and practical applicability, which we believe is a necessary step toward translating BCI systems from the research lab to the real world. As such, we suggest that this work be evaluated on how effectively it translates theoretical ideas into robust, deployable models, and on whether it will inspire further theoretical and empirical research on causal, scalable, and efficient models for neural decoding.
Limitations
We are proud to report that the reviewer identified no unaddressed limitations in our work.
We appreciate your acknowledgement of reading our rebuttal, and kindly request that, if there are any unaddressed concerns or open questions, you reach out to us for further discussion. We are also glad to remark that the deadline for this phase has been extended to Aug 8, 11:59 pm AoE, and we remain at your disposal for any clarifications needed until then!
The paper describes a neural decoding framework, that is, a spiking model designed to map brain signals to some appropriate action. Inputs are taken from a cortical implant. Various architectures from the literature are implemented using spiking neurons, resulting in a causal framework for inference. Experiments are done using data from an animal controlling a cursor; the goal is to predict the cursor velocity. Results show that the model performs well whilst being the most energy efficient amongst a sensible choice of competitors.
Strengths and Weaknesses
Strengths:
The paper is very clearly written. The use of spiking nets makes sense for BCI signals where the original signal is itself spiking, and the application is itself quite persuasive. The work shows that a spiking model can perform well against good alternatives for decoding brain signals. Further, it is by far the most (potentially) energy efficient. The processing pipeline is sensible. Spikachu is a cool name.
Weaknesses:
My main worry is that, however well intended, the work is essentially mainly engineering. The use of spiking is really only to achieve energy efficiency in a future neuromorphic architecture, and the component techniques are otherwise known. Of course the demonstration that it works is interesting, but I'm less certain whether that contribution is novel enough for NeurIPS. The energy calculations are presumably hypothetical, with the actual implementation being on conventional hardware. Other solutions appear to perform better than the spiking (although I fully appreciate that energy efficiency is an important goal).
Questions
It may be in the paper, but does one of the baseline approaches in table 2 try to mimic the architecture in figure 1?
Limitations
I don't see the limitations addressed explicitly, but I don't see a need, other than perhaps ethical points. There are clearly ethical issues associated with brain implants and animal testing, but the datasets appear to be independent of the study.
Final Justification
My original negative opinion was based quite heavily on my belief that the work is mainly engineering and not so novel. After the review process I find myself simply disagreeing with the authors (and hesitant to just argue) and in conflict with the other reviewers (two of whom have bumped up scores significantly). Whilst I stand by the negative opinion I have lowered the confidence score reflecting the fact that I have probably missed something; it seems the right answer in a Bayesian sense.
Formatting Issues
None
Thank you for your thoughtful feedback and suggestions. We are excited to read that our "paper is very clearly written" and that our "model can perform well against good alternatives for decoding brain signals" while being "by far the most (potentially) energy efficient".
Please find our answers to your questions/comments below.
Weaknesses
- My main worry is that, however well intended, the work is essentially mainly engineering.
Thank you for this thoughtful comment. We appreciate your concern regarding the balance between engineering and conceptual contributions in our work. We respectfully emphasize that our primary contributions are conceptual in nature. Our work addresses longstanding challenges in the field, particularly the need for causal, scalable, and energy-efficient models that generalize across sessions and subjects, and introduces several key innovations to advance neural decoding to this end.
Below, we highlight the main conceptual contributions of our work:
- Causal harmonizer for multi-session, multi-subject neural decoding. We are the first to use cross-attention with learnable queries to integrate multi-session, multi-subject data in a causal manner. This is a significant step forward, as it enables models that were previously limited to single-session training to operate across larger, more diverse datasets, reaping the well-known benefits of scale. Importantly, in Appendix F.4, we show that the harmonizer can be integrated with widely used neural decoding models such as MLP, GRU, and LSTM.
- Multi-scale SNN module. We introduce a multi-scale SNN module designed to process streaming neural signals at distinct temporal resolutions. This approach enables our framework to capture rich representations of neural dynamics without relying on transformer-based architectures, distinguishing it from prior works such as Spikformer and POYO. Our ablation study (Appendix F.2) reveals that this module is the most critical component for overall decoding performance.
- First causal framework for multi-session, multi-subject neural decoding. To our knowledge, this is the first neural decoding framework that is simultaneously causal and trainable across multiple sessions and subjects. While many existing models (e.g., POYO, LFADS, CEBRA) can generalize across sessions and subjects, they rely on non-causal architectures (i.e. require past and future context to make a prediction), limiting their applicability outside offline research settings. Our framework addresses this gap directly.
- SNN pretraining across sessions, subjects, and tasks. We are also the first to scale SNN-based neural decoders to multi-session, multi-subject data. We show that the learned neural representations generalize to unseen sessions, subjects, and tasks. While this has been demonstrated in ANN-based models (e.g., POYO, CEBRA), it was not previously known whether similar transferability would hold for SNNs given their different training dynamics and representational capacities.
- Energy-constrained neural decoding at scale. We show that it is possible to build a multi-session, multi-subject neural decoder that respects the energy constraints required for deployment on edge computing platforms. Although this is not the only path toward practical BCI systems, we believe it is a viable and underexplored one, especially for scenarios where continuous tethering to a high-performance computer is infeasible.
In summary, our contributions are primarily conceptual, and in this work we establish several key firsts required for causal, scalable, and energy-efficient neural decoding.
- The use of spiking is really only to achieve energy efficiency in a future neuromorphic architecture, and the component techniques are otherwise known.
Thank you for this thoughtful comment.
First, as outlined in our response to weakness 1, we introduce multiple novel components, most notably: (1) the neural harmonizer that facilitates causal multi-session, multi-subject training, and (2) the multi-scale SNN block that processes streaming inputs at distinct temporal scales to build rich latent representations.
Second, while energy efficiency is a compelling advantage of SNNs, it is not the sole reason for adopting them in our framework. SNNs offer other key properties that align with the demands of real-time neural decoding systems:
- They are inherently causal, enabling streaming inference without relying on future data or long context windows, since context is implicitly encoded in the neuron’s membrane potential.
- Their bio-inspired design makes them particularly well-suited for interpreting real neural signals.
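The first property can be illustrated with a minimal leaky integrate-and-fire (LIF) update: every output depends only on past inputs, because all context is carried forward in the membrane potential. This is a generic sketch with illustrative parameters and a soft-reset scheme, not the authors' exact neuron model.

```python
def lif_stream(inputs, beta=0.9, threshold=1.0):
    """Process a stream of input currents one bin at a time.
    The membrane potential v carries all past context; no future
    bins are ever accessed, so the computation is strictly causal."""
    v, spikes = 0.0, []
    for x in inputs:
        v = beta * v + x              # leaky integration of past inputs
        s = 1 if v >= threshold else 0
        v -= s * threshold            # soft reset after a spike
        spikes.append(s)
    return spikes

print(lif_stream([0.6, 0.6, 0.0, 0.9, 0.9]))  # -> [0, 1, 0, 1, 0]
```

Note that each spike decision is emitted as soon as its input bin arrives, which is exactly the streaming regime online BCIs require.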
Based on your comment, we will ensure to better motivate the choice of SNNs in the Introduction (section 1) of our revised manuscript.
- Of course the demonstration that it works is interesting, but I'm less certain whether that contribution is novel enough for NeurIPS
We are excited to hear that "the demonstration that it works is interesting". Please find the main contributions of this work in our response to your weakness 1. We hope that based on those you will reconsider.
- The energy calculations are presumably hypothetical, with the actual implementation being on conventional hardware
Thank you for this insightful comment.
We agree that the energy consumption figures presented in our work are estimates. However, we emphasize that they are grounded in rigorous calculations detailed in Appendix E and are based on assumptions drawn from well-established prior works [i.e. Zhu et al. (2024, Autonomous Driving with Spiking Neural Networks), Lv et al. (2023, SpikeBERT: A Language Spikformer Learned from BERT with Knowledge Distillation)]. While we recognize that absolute energy values may vary depending on the hardware, our efficiency claims rely on hardware-agnostic metrics such as FLOPs, ensuring fair comparisons that are not biased due to any specific hardware or software.
Based on your feedback, we will revise the manuscript to explicitly label all energy-related claims as estimates, and we will add a paragraph in the discussion section (section 5) expanding on this limitation.
- Other solutions appear to perform better than the spiking (although I fully appreciate that energy efficiency is an important goal).
Thank you for this thoughtful observation.
As you correctly noted, POYO does achieve slightly higher decoding performance than our model in its standard configuration (while requiring 2 orders of magnitude more energy per inference). However, we would like to emphasize that this comparison is not entirely fair. POYO is a non-causal model (i.e. it utilizes both past and future neural activity to make predictions). In contrast, Spikachu is causal (i.e. relies only on past inputs). This distinction is critical, as POYO’s access to future data is not feasible in online BCI applications.
To enable a fair comparison, we benchmarked Spikachu against a causal variant of POYO (described in Appendix C.2; "POYO-causal" in Table 2). Under this causal constraint, Spikachu outperforms POYO-causal by a notable margin, with ΔR² = 4.37% on the CO task and ΔR² = 11.32% on the RT task.
Questions
- It may be in the paper, but does one of the baseline approaches in table 2 try to mimic the architecture in figure 1?
Thank you for this insightful question. In the original submission, we did not include a direct comparison between our model and an equivalent ANN variant.
To address this, we implemented an ANN variant of our model (referred to as Spikachu-ANN), in which we replaced all spiking LIF neurons with stateless ReLU neurons, while keeping all other architectural components and training procedures identical. We then evaluated Spikachu-ANN on the same dataset that we used for comparing against other baselines (see section 4.2 and Table 2).
We summarize the results below.
| Model \ R² | CO | RT |
|---|---|---|
| Spikachu | 0.8399 | 0.6762 |
| Spikachu-ANN | 0.5332 | 0.3642 |
Clearly, Spikachu-ANN consistently underperformed relative to its SNN version. This is not surprising, as the spiking neurons in the SNN-based implementation implicitly retain temporal information through membrane potential dynamics. In contrast, the ReLU-based neurons in Spikachu-ANN are stateless and therefore incapable of capturing temporal dependencies across time steps.
Based on your feedback we will include Spikachu-ANN as a baseline in section 4.2 and Table 2 of our revised manuscript.
Limitations
- I don't see the limitations addressed explicitly, but I don't see a need, other than perhaps ethical points. There are clearly ethical issues associated with brain implants and animal testing, but the datasets appear to be independent of the study.
Thank you for raising this important concern. We fully acknowledge the ethical considerations surrounding animal research and the use of brain implants. While, as you correctly noted, we did not collect the datasets ourselves, we hold ourselves accountable for ensuring that any data we use comes from studies that adhere to the highest ethical standards. To that end, we would like to emphasize that all datasets used in this work were collected by accredited research institutions and received approval from the relevant institutional ethical committees. We are committed to using only data from studies that prioritize animal welfare and follow established ethical guidelines.
If you feel it would be helpful, we would be happy to add a dedicated paragraph in the discussion section addressing the ethical considerations of brain implants and animal research. We appreciate your input on this.
Thank you for your response to all of the reviews. I say all of them because the other reviews and responses suggest I must concede to having been somewhat harsh in dismissing the work as just engineering. However, I must make a position clear:
I don't see causality as being particularly novel in that it does not demand a particular innovation in the implementation; the difficulty is retaining performance without the look-ahead. Nor do I see the application of a known technique (notably pre-training) in an SNN context as novel; it is clear a priori that it will work.
That the harmoniser module is ANN is important; it currently dictates that the architecture requires a conventional processor in the chain, effectively cancelling out any power saving of moving other components to SNN. It is notable that in a rebuttal above regarding an ablation study, the authors state "Clearly, our model did not rely on the ANN harmonizer to perform." This in turn reduces the impact of any novelty. Another response above, "Fortunately, the ANN-harmonizer can be easily replaced once spiking cross-attention is developed.", seems to carry a certain risk.
So the novelty for me is just the Multi-Scale SNN module.
Whilst it was always just a question, I do not find the replacement of the SNN components with ReLU to be persuasive for exactly the reason that the authors point out: The SNNs have memory. Rather, the ReLU would need at least a single recurrent connection to be comparable.
Overall, I do not wish to change my recommendation; the work clearly has strengths, but NeurIPS is a selective venue.
Thank you for engaging in the discussion! We are excited to read that “the work clearly has strengths” and we appreciate you acknowledging that your initial assessment may have been harsh. Please find our responses to your comments below.
I don't see causality as being particularly novel in that it does not demand a particular innovation in the implementation; the difficulty is retaining performance without the look-ahead
Thank you for recognizing that retaining performance without look-ahead is challenging. Our model directly addresses this challenge by achieving strong decoding performance under strict causality. This is important because non-causal models cannot be used online and therefore, their impact is confined to the research lab.
We also kindly note that novelty is not confined to "a particular innovation in the implementation". We invite you to review our work based on the novelty of our ideas and ability to inspire future work towards causal approaches for neural decoding.
Nor do I see the application of a known technique (notably pre-training) in an SNN context as novel; it is clear a-priori that it will work
We respectfully disagree. While it is reasonable to hypothesize that pretraining could benefit decoding performance, empirical validation is essential, as pretraining can sometimes degrade performance [e.g., Kumar et al. (2022); Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution]. To our knowledge, this is the first work demonstrating its effectiveness in SNN based neural decoding.
That the harmoniser module is ANN is important; it currently dictates that the architecture requires a conventional processor in the chain, effectively cancelling out any power saving of moving other components to SNN
We respectfully disagree. We provide quantitative evidence in Section 4.2 and Table 2 that our model achieves substantial energy savings compared to other approaches. Importantly, those results are supported by FLOP-based, hardware-agnostic comparisons:
| Model | MACs (M) | ACs (K) |
|---|---|---|
| GRU | 2.52 | 16.05 |
| MLP | 2.64 | 3.59 |
| LSTM | 3.26 | 23.63 |
| POYO | 467.06 | 10.24 |
| Spikachu | 0.96 | 791.97 |
Our model requires over 2.6× fewer MACs than the next most efficient model and orders of magnitude fewer MACs than POYO, demonstrating meaningful energy savings.
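For reference, FLOP-count-based comparisons of this kind are typically converted to energy using per-operation figures such as the widely cited 45 nm estimates from Horowitz (E_MAC ≈ 4.6 pJ, E_AC ≈ 0.9 pJ). The sketch below applies that standard conversion to the counts in the table above; it illustrates the estimation method and is not the authors' exact Appendix E calculation.

```python
E_MAC, E_AC = 4.6, 0.9  # pJ per operation (Horowitz-style 45 nm estimates)

counts = {  # (MACs in millions, ACs in thousands), from the table above
    "GRU":      (2.52,   16.05),
    "MLP":      (2.64,    3.59),
    "LSTM":     (3.26,   23.63),
    "POYO":     (467.06, 10.24),
    "Spikachu": (0.96,  791.97),
}

# pJ -> microjoules per inference
energy_uJ = {m: (mac * 1e6 * E_MAC + ac * 1e3 * E_AC) * 1e-6
             for m, (mac, ac) in counts.items()}

for m, e in sorted(energy_uJ.items(), key=lambda kv: kv[1]):
    print(f"{m:9s} ~{e:8.2f} uJ")
# Under this estimate Spikachu comes out lowest (~5 uJ),
# and POYO roughly 420x higher (~2150 uJ).
```

Because AC operations are far cheaper than MACs under these figures, Spikachu's large AC count adds little to its total, while POYO's MAC count dominates.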
It is notable that in a rebuttal above regarding an ablation study, the authors state: “Clearly, our model did not rely on the ANN harmonizer to perform.” This in turn reduces the impact of any novelty
We would like to clarify a potential misunderstanding. The harmonizer enables scaling up training to multi-session, multi-subject data. Our ablations simply show that in single-session settings, the model performs well even without it, which highlights the generalizability of our SNN backbone and the role of the harmonizer in facilitating scalability.
Another response above: “Fortunately, the ANN-harmonizer can be easily replaced once spiking cross-attention is developed.” seems to carry a certain risk.
We agree. Our intention is to highlight the flexibility of our approach and its potential for a fully SNN-based implementation as spiking attention mechanisms mature.
So the novelty for me is just the Multi-Scale SNN module
Thank you for recognizing this contribution of our work.
Whilst it was always just a question, I do not find the replacement of the SNN components with ReLU to be persuasive […] comparable
We appreciate your clarification and note that we performed this experiment in good faith to address your inquiry.
To further address your concern, we retrained Spikachu-ANN with access not only to the current but also to the four preceding timesteps, a well-known strategy for incorporating temporal context. The results across 99 sessions (same data used in Section 4.2) are shown below:
| Model \ R² | CO | RT |
|---|---|---|
| Spikachu-ANN-with-memory | 0.5348 | 0.3643 |
| Spikachu | 0.8399 | 0.6762 |
Our approach outperformed its ANN variant even when the ANN was given memory.
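The memory-augmented ANN variant described above corresponds to the standard sliding-window trick: each prediction sees the current bin concatenated with its k predecessors. A generic sketch under illustrative shapes and window size, not the authors' exact preprocessing:

```python
import numpy as np

def window_stack(x, k=4):
    """Concatenate each timestep with its k predecessors (zero-padded
    at the start), turning (T, n) inputs into (T, (k+1)*n) inputs."""
    T, n = x.shape
    padded = np.vstack([np.zeros((k, n)), x])
    # column order: oldest bin first, current bin last
    return np.hstack([padded[i:i + T] for i in range(k + 1)])

x = np.arange(12.0).reshape(6, 2)   # T=6 bins, 2 channels
xw = window_stack(x, k=4)
assert xw.shape == (6, 10)          # each row now holds 5 bins of context
assert np.allclose(xw[0, :8], 0)    # first bin has only zero-padding behind it
assert np.allclose(xw[-1, -2:], x[-1])  # current bin occupies the last columns
```

Unlike a stateful LIF neuron, this gives the ANN only a fixed, finite context window, which may partly explain the remaining performance gap.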
Overall, I do not wish to change my recommendation; the work clearly has strengths, but NeurIPS is a selective venue.
We respect your recommendation and are grateful for your engagement.
We kindly encourage you to evaluate our work not solely based on "a particular innovation in the implementation", but rather in its ability to inspire future work towards causal, scalable, and energy-efficient neural decoding. We believe that the impact of research lies in its potential to inspire follow-up studies and practical adoption.
As a reflection, your scores were: Quality: 3 (good), Clarity: 4 (excellent), Significance: 3 (good), Originality: 2 (fair). Based on those scores, your overall recommendation for rejection appears somewhat unfair.
Nevertheless, we sincerely appreciate your thoughtful feedback, which has helped us strengthen both our work and the clarity of its presentation.
The authors introduce Spikachu - a "spiking" transformer model for decoding spikes for multi-session, multi-subject from electrophysiology experiments in non-human primates performing a motor control task. The goal is for the network to predict the x and y velocities from binned spike data. They claim substantial (between 2.26-418.81x) energy efficiency advantages over alternative methods, such as POYO, while achieving similar performance. They propose in addition a way to causally map neural recordings from multiple subjects in a common latent space and will release both models and code publicly.
Strengths and Weaknesses
While I find the overall motivation of the study compelling, I have some concerns about methodology and purported energy efficiency.
Strength:
- Cute name
- Causal attention model that performs about as well as a GRU
- The method matches prior work in terms of decoding performance
- Using the accounting performed in the work there is a claim to a significant energy efficiency advantage
- This is to my knowledge the first attempt to use a "spiking" attention mechanism for neural decoding
Weaknesses:
- Adapting an existing architecture (Spikformer) to a task with an existing transformer-based solution (POYO) has limited novelty.
- The spiking transformer uses no temporal dynamics: the actual timestep used is T=1 (see page 26, footnote 3), making most of the background on spiking neurons and bio-inspired neuron models in Appendix A redundant. A more straightforward description would be that the authors are training a linear attention model with a perceptron activation function.
- Energy efficiency estimates are unrealistic and should not be stated without qualifications. While I recognize that it is common practice to make claims of energy efficiency for SNN, these claims should be clearly marked as estimates everywhere they are made and the limitations of these estimates should be discussed in the main body of the article.
- Energy consumption is first and foremost a property of the hardware that the model is run on. The main point of the cited reference (Horowitz) is not that E_AC is 0.9 pJ (the quantity entering the energy estimate for SNN), or that E_MAC is 4.6 pJ, but that in a modern processor an addition will typically require 70 pJ -- meaning only a fraction of the overall energy for addition is spent on the computation itself. On a typical processor you would therefore expect the difference between a MAC and an AC operation to be more on the order of a few percent, since most energy is expended on control and memory movement.
- The parameter size of the proposed transformer model makes an implementation in a realistic BCI interface questionable.
- The authors do not measure the actual energy consumption of the models as they run on the hardware they actually have available. Doing so would make the difference between the estimated energy and the actual energy clear.
Questions
- What is the energy per sample for the tested model on the GPU you are running on?
- Can you describe the spiking neuron model and gradient computation explicitly for the case (T = 1) that you are using. It should simplify significantly from the description of A.1.
- Can you estimate the chip size (based on memory requirements) that would be required to implement your model with the energy estimates you are stating?
- Can you explicitly estimate the cost of memory operations under ideal conditions?
Limitations
The claimed energy efficiency advantages are estimates that are unrealistic and have no basis in any existing hardware implementation. The authors should clearly discuss the limitations of the energy estimates and complement them with actual measurements.
Final Justification
Authors sufficiently addressed my concerns.
Formatting Issues
None
Thank you for your insightful comments and questions, and for pointing out that our work is the “first attempt to use a 'spiking' attention mechanism for neural decoding” with “a significant energy efficiency advantage”.
Due to space constraints, we provide only the beginning of each of your prompts.
Weaknesses
- Adopting an existing architecture (Spikformer)…
Thank you for this thoughtful comment. While it is true that our model incorporates components from the literature, we introduce several novel elements.
- We introduce a multi-scale SNN module designed to process streaming neural signals across distinct temporal scales. Notably, this component is not transformer-based, setting our design apart from other works (i.e. POYO and Spikformer). Our ablation study (Appendix F.2) demonstrates that this component benefits decoding performance most.
- We contribute a causal harmonizer that enables scalable training across multi-session, multi-subject data. In Appendix F.4, we demonstrate that the harmonizer can also be incorporated into other common neural decoding models, such as MLP and GRU.
- Our model uses spiking self-attention, a causal, SNN-specific mechanism that, contrary to vanilla self-attention (used in POYO), operates only across the electrode dimension of the data. Temporal relations are implicitly captured via the internal state of spiking neurons.
- Our framework is causal, which is essential for online applications. All other frameworks capable of handling multi-session, multi-subject data (including POYO, LFADS, and CEBRA) are non-causal.
In summary, we introduce key innovations that expand the scalability and real-time applicability of neural decoding systems.
- The spiking transformer uses no temporal dynamics…
Thank you for raising this important point. We would like to clarify that our model uses temporal dynamics. Specifically, it processes spike trains in a streaming fashion, one input bin at a time, without requiring repeated input presentations. By T=1 we indicate that we do not artificially repeat inputs. This contrasts with applications involving static data, where SNNs are typically run on the same input multiple times, implying T>1. The total number of membrane potential update steps is equal to the number of input spike bins, typically a trial’s worth of data.
To demonstrate the utility of temporal dynamics in our model, based on your feedback, we re-implemented Spikachu as an ANN without any dynamics (LIF neurons replaced by ReLU neurons) and tested it on the same data as all baseline models (section 4.2). The results are summarized below.
| Model \ R² | CO | RT |
|---|---|---|
| Spikachu | 0.8399 | 0.6762 |
| Spikachu-ANN | 0.5332 | 0.3642 |
This shows that the temporal dynamics of our model are essential for its performance. We will ensure to clarify this point in Appendix E.2 and add this ANN version of Spikachu as an additional baseline in section 4.2.
- Energy efficiency estimates are unrealistic and should not be stated without qualifications…
Thank you for this insightful comment. Indeed, the energy consumption figures presented in our work are estimates. However, we emphasize that they are grounded in rigorous calculations detailed in Appendix E and are based on assumptions drawn from prior works [3, 4]. While we recognize that energy values may vary depending on the hardware, our efficiency claims rely on hardware-agnostic metrics such as FLOPs, ensuring fair comparisons that are unbiased by specific hardware or software.
To address your feedback, we will label all energy-related claims as estimates, and we will add a paragraph in the discussion (section 5) on this limitation.
- Energy consumption is first and foremost a property of the hardware…
Thank you for this important comment. Indeed, in conventional hardware, memory access and data movement dominate energy costs.
To preserve generality, we present hardware-agnostic estimates of energy consumption. FLOPs allow us to do that, as they do not depend on hardware specifics. Importantly, as shown in [2], FLOPs are a good proxy for memory costs as well, since memory operations scale proportionally with the number of arithmetic operations.
Additionally, our model would see further memory efficiency gains when deployed on neuromorphic hardware, which, unlike conventional CPU/GPUs, has efficient SRAM memory integrated with processing units. The integrated memory, combined with SNNs' compact binary transmission signals, which minimize communication overhead (unlike ANNs, which transmit floats), would amplify our model's energy efficiency gains beyond what is reflected in our FLOP-based calculations.
In summary, while we acknowledge that energy costs are hardware-dependent, our FLOP-based estimates provide a fair and implementation-agnostic way to capture costs not only associated with computation but also memory movement. We will emphasize this in Appendix E.2.
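For context, the FLOP-based convention used in works such as [3, 4] can be sketched as follows; the 4.6 pJ/MAC and 0.9 pJ/AC figures are commonly cited 45 nm CMOS values and serve only as illustrative assumptions here:

```python
E_MAC = 4.6e-12  # J per multiply-accumulate (commonly assumed 45 nm figure)
E_AC = 0.9e-12   # J per accumulate (spike-driven addition)

def estimate_energy(n_mac, n_ac):
    """Hardware-agnostic energy estimate from operation counts alone."""
    return n_mac * E_MAC + n_ac * E_AC

# Illustrative comparison: a dense ANN layer performing 1M MACs vs. an
# SNN layer where only 20% of 1M potential ACs fire (sparse activity).
ann_energy = estimate_energy(n_mac=1_000_000, n_ac=0)
snn_energy = estimate_energy(n_mac=0, n_ac=200_000)
```

Because both terms are linear in the operation counts, sparser spiking activity lowers the estimate directly, independent of the hardware the model runs on.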
- The parameter size of the proposed transformer model make an implementation in a realistic BCI interface questionable.
Thank you for raising this important concern.
Spikachu contains <4M trainable parameters (therefore, also < 4M synapses) and ~10K LIF neurons. This is well within the capabilities of current neuromorphic chips. Loihi 2 supports 1M neurons and 120M synapses. Darwin 3 supports 2.35M neurons and 100M synapses. Their die sizes are 31 mm² and 358 mm², respectively, so Spikachu can fit on a chip < 31 mm². Importantly, this form factor is well within the physical constraints of fully implantable BCIs, which fit within the surface area of a U.S. quarter (~462 mm²) [7].
In addition, prior works have deployed SNN-based neural decoders. The authors of [2] successfully deployed a 4-layer spiking MLP with ~1K neurons on neuromorphic hardware for BCI. Similarly, in [5], a real-time, 20K-neuron SNN decoder was used on two monkeys in closed-loop BCI experiments.
Thus, from a size and capacity standpoint, Spikachu can already be accommodated by current neuromorphic platforms. We will add these points to the discussion (section 5) in revisions.
- The authors do not measure the actual energy consumption of the models…
Thank you for this insightful comment. We attempted this; please see our reply to Question 1.
Questions
- What energy per sample for the tested model on the GPU you are running on?
Thank you for this insightful question.
We attempted to measure real-time energy using both PyNVML and Zeus packages on NVIDIA 2080Ti and RTX 4090 GPUs. However, we found that the results on both GPUs were inconsistent and not repeatable. This is expected with GPU power profiling, a task complicated by factors such as dynamic power scaling, memory management, and coarse power sampling rates (only 1 Hz for NVML), which introduce significant noise.
In summary, while we agree that measuring real-world energies would have been insightful, we unfortunately cannot accurately perform these measurements on our available hardware. We are happy to share details of our scripts if it would be helpful.
We also note that it would be more appropriate to measure these energy costs on neuromorphic hardware, rather than on a GPU (see also our response to limitation 2 of reviewer Bmuc).
- Can you describe the spiking neuron model and gradient computation explicitly for the case (T = 1) that you are using…
We appreciate your attention to this technical point.
As explained in our response to Weakness 2, the standard LIF neuron dynamics described in Appendix A.1 apply to our model without simplification. Gradient computations follow the standard Back-Propagation Through Time (BPTT), as implemented in SpikingJelly [6].
- Can you estimate the chip size (based on memory requirements) that would be required to implement your model…
Thank you for this pertinent question. With current technology, a chip < 31 mm² is enough. We refer you to our reply to Weakness 5 for more details.
- Can you explicitly estimate the cost of memory operations under ideal conditions?
Thank you for this insightful suggestion.
To address this, we explicitly estimated the cost of memory operations under ideal conditions for Spikachu and the baselines on the data described in section 4.2. Following [2], we assumed three loads and one store per MAC, and two loads and one store per AC. The results are summarized below.
| Model \ Ops | LOAD | STORE | Total |
|---|---|---|---|
| MLP | 7.93 M | 2.65 M | 10.58 M |
| GRU | 7.61 M | 2.54 M | 10.15 M |
| LSTM | 9.85 M | 3.29 M | 13.14 M |
| POYO | 1.40 G | 467 M | 1.86 G |
| Spikachu | 4.47 M | 1.75 M | 6.22 M |
Clearly, Spikachu requires the fewest memory operations across the board. We will include these results in revisions.
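For reference, the counting convention behind this table can be sketched as follows; the MAC/AC totals in the example are illustrative, not the models' actual operation counts:

```python
def memory_ops(n_mac, n_ac):
    """Loads/stores implied by operation counts, following [2]:
    3 loads + 1 store per MAC, 2 loads + 1 store per AC."""
    loads = 3 * n_mac + 2 * n_ac
    stores = n_mac + n_ac
    return loads, stores, loads + stores

# A MAC-dominated (ANN) workload loads ~3x what it stores, while an
# AC-dominated (spiking) workload approaches a 2:1 load/store ratio.
ann = memory_ops(n_mac=1_000_000, n_ac=0)
snn = memory_ops(n_mac=0, n_ac=700_000)
```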
Limitations
- The claimed energy efficiency advantages are estimates…
Thank you for this constructive criticism. We will address this in the discussion section and refer you to our reply to Weakness 3.
- The authors should clearly discuss the limitations of the energy estimates…
We kindly refer you to our replies for Weaknesses 3 and 4 and Question 4, and reiterate our pledge to clarify the text to reflect the estimated nature of our energy findings.
Bibliography
[1] Azabou et al., A Unified, Scalable Framework for Neural Population Decoding, 2023
[2] Liao et al., An Energy-Efficient Spiking Neural Network for Finger Velocity Decoding for Implantable Brain-Machine Interface, 2022
[3] Zhu et al., Autonomous Driving with Spiking Neural Networks, 2024
[4] Lv et al., SpikeBERT: A Language Spikformer Learned from BERT with Knowledge Distillation, 2023
[5] Dethier et al., Spiking neural network decoder for brain-machine interfaces, 2011
[6] Fang et al., SpikingJelly: An open-source machine learning infrastructure platform for spike-based intelligence, 2023
[7] Musk et al., An Integrated Brain-Machine Interface Platform With Thousands of Channels, 2019
We appreciate your acknowledgement of reading our rebuttal, and kindly request that, if there are any unaddressed concerns or open questions, you reach out to us for further discussion. We are also glad to remark that the deadline for this phase has been extended to Aug 8, 11:59 pm AoE, and we remain at your disposal for any clarifications needed until then!
The paper presents Spikachu, a novel neural decoding framework based on spiking neural networks (SNNs). It is designed to achieve scalable, causal, and energy-efficient decoding in brain-machine interfaces. It shows competitive decoding performance and dramatic improvements in energy efficiency.
Strengths and Weaknesses
Strengths
- Offers a compelling perspective on addressing the limitations of current BCI systems, such as energy efficiency, causality, and real-time processing
- Presents a novel method to integrate spiking neural networks with attention mechanism to improve energy efficiency without sacrificing performance
- Demonstrates promising transfer learning capabilities across sessions, subjects, and tasks, which is an important step toward scalable neural decoding in real life
Weaknesses
- Lack of justification why ANN harmonizer is used here, instead of using SNN end-to-end, given the paper’s emphasis on efficiency
- Lack of ablation studies to isolate the performance contributions of the ANN harmonizer versus the SNN components
- Lack of representation analysis to see if any structure emerges in the learned session embeddings, for example, clustering by subjects or task conditions
Questions
- In Figure 3, why does Spikachu-mp result in lower energy consumption than single-session training, despite involving a larger dataset?
- In Figure 5B, decoding performance fluctuates rather than improves with additional training data. Can the authors comment on possible causes?
- In Figure 6, why does transfer learning to a new task yield performance comparable to training from scratch?
- Are the new-task generalization results cross-validated across sessions or trials? What are the confidence intervals?
- Have prior works shown neural decoding or BCI devices on neuromorphic hardware? How might Spikachu translate to such platforms?
Limitations
- Energy consumption estimates mainly focus on FLOPs, ignoring other costly operations such as memory access
- Lack of real life comparison between Spikachu and its ANN baselines in both accuracy and energy consumption (instead of simulated energy consumption)
Final Justification
The authors’ rebuttal has resolved my concerns through detailed explanations and additional analysis. I have updated my score accordingly.
Formatting Concerns
No formatting concerns
Thank you for your thoughtful feedback. We are excited to read that our approach offers “competitive decoding performance and dramatic improvements in energy efficiency” that represent “an important step toward scalable neural decoding in real life.”
Weaknesses
- Lack of justification why ANN harmonizer is used here, instead of using SNN end-to-end, given the paper’s emphasis on efficiency.
Thank you for the constructive criticism. We agree that an end-to-end SNN architecture would be favorable.
However, such a design would limit training to single sessions. To scale to multi-session data (where neural input dimensions vary across sessions and subjects), we require a mechanism to project inputs into a shared, fixed-dimensionality latent space. The ANN harmonizer fulfills this role via cross-attention with learnable queries to map variable inputs to a fixed-dimensional latent.
We resorted to implementing the harmonizer with ANNs because cross-attention with learnable queries has not been translated to SNNs yet. Fortunately, the ANN-harmonizer can be easily replaced once spiking cross-attention is developed.
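For illustration, such a harmonizer can be sketched in PyTorch as follows; the module name, dimensions, and the use of `nn.MultiheadAttention` are assumptions for the example, not the paper's implementation:

```python
import torch
import torch.nn as nn

class Harmonizer(nn.Module):
    """Sketch of cross-attention with learnable queries: maps a variable
    number of electrode tokens to a fixed-size latent representation.
    All sizes here are illustrative placeholders."""
    def __init__(self, d_model=64, n_queries=16, n_heads=4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_queries, d_model))
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, electrode_tokens):  # (batch, n_electrodes, d_model)
        b = electrode_tokens.shape[0]
        q = self.queries.unsqueeze(0).expand(b, -1, -1)
        latent, _ = self.attn(q, electrode_tokens, electrode_tokens)
        return latent  # (batch, n_queries, d_model), independent of n_electrodes

h = Harmonizer()
out_a = h(torch.randn(2, 96, 64))   # e.g. a 96-electrode session
out_b = h(torch.randn(2, 130, 64))  # e.g. a 130-electrode session
```

Because the learnable queries fix the output token count, sessions with different electrode counts map to a latent of the same shape.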
We will make sure to better motivate the ANN harmonizer in section 3.1.
- Lack of ablation studies to isolate the performance contributions of the ANN harmonizer versus the SNN components
Thank you for the suggestion. Please refer to Appendix F.2 for a comprehensive ablation study. The results of ablating the ANN harmonizer are summarized below:
| Model \ R² | CO | RT |
|---|---|---|
| Spikachu | 0.8399 | 0.6762 |
| Spikachu w/o Harmonizer | 0.8344 | 0.6771 |
Clearly, our model does not rely on the ANN harmonizer for its performance.
- Lack of representation analysis to see if any structure emerges in the learned session embeddings, for example, clustering by subjects or task conditions.
Thank you for this constructive suggestion. To address this, we conducted a clustering analysis on the embedding space of pretrained Spikachu-mp.
Spikachu-mp does not have explicit session-specific embeddings; however, it does learn unit embeddings for each electrode of each session. To investigate whether structure emerged in the latent space, we aggregated the unit embeddings of each session into a 2D matrix and applied PCA, retaining the first five PCs as a summary representation for each session. We then used these session-level representations as features in two distinct Linear Discriminant Analyses: one to assess separability by subject (monkeys C, J, and M) and another by task (CO vs RT). Visualizing these projections into the LDA space revealed clear clustering both by subject and by task.
To quantify the separability, we trained an SVM classifier using 5-fold cross-validation with features being the summary representations of each session (the PCs of the aggregated unit embeddings) to predict (1) the subject and (2) the task associated with each session. We achieved an accuracy of 0.69 ± 0.02 for subject (chance = 0.33) and 0.79 ± 0.02 for task classification (chance = 0.5).
Those results suggest that structure emerges in the latent space of our model. We are prohibited from including figures in the rebuttal but we promise to do so in revisions.
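For concreteness, the analysis can be sketched end-to-end on synthetic data; the embedding sizes, the session summary (per-session PCA via SVD), and the leave-one-out nearest-centroid classifier standing in for the cross-validated SVM are all assumptions for the example, not the exact procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

def session_summary(unit_embeddings, n_pcs=5):
    """Summarize a (units x dim) embedding matrix by its mean and the
    directions of its top principal components."""
    mean = unit_embeddings.mean(axis=0)
    _, _, vt = np.linalg.svd(unit_embeddings - mean, full_matrices=False)
    return np.concatenate([mean, vt[:n_pcs].ravel()])

# Synthetic stand-in: 30 sessions from 3 subjects, each with 50 units of
# 16-d embeddings, plus a subject-specific offset so some structure exists.
subjects = np.repeat([0, 1, 2], 10)
X = np.stack([session_summary(rng.normal(s, 1.0, size=(50, 16)))
              for s in subjects])

# Leave-one-out nearest-centroid classification of the subject label.
correct = 0
for i in range(len(X)):
    mask = np.arange(len(X)) != i
    centroids = [X[mask & (subjects == s)].mean(axis=0) for s in (0, 1, 2)]
    pred = int(np.argmin([np.linalg.norm(X[i] - c) for c in centroids]))
    correct += int(pred == subjects[i])
accuracy = correct / len(X)
```

On such synthetic data the classifier exceeds the 1/3 chance level whenever a subject-specific offset is present, mirroring the above-chance separability reported for the real embeddings.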
Questions
- In Figure 3, why does Spikachu-mp result in lower energy consumption than single-session training, despite involving a larger dataset?
Thank you for this insightful question. We identified that pretraining across multiple sessions and subjects produces a model with sparser activations, which promote energy savings (see section 4.5 and Appendix F.1).
Note that the energy savings refer to inference, not training. In fact, multi-session training is more resource-intensive than single-session training, but it yields a model with more sparsely activated neurons, which in turn requires less energy during inference. We will further clarify this in section 4.5 in revisions.
- In Figure 5B, decoding performance fluctuates rather than improves with additional training data. Can the authors comment on possible causes?
Thank you for this insightful observation.
We attribute this to the fact that different levels of pretraining included different recording sessions. Naturally, the decoding performance of our model was better for some sessions than others (simultaneously across all conditions, i.e. training from scratch, finetuning, transferring), which explains why the model performance seems to fluctuate.
However, this should not be interpreted as a fluctuation in the performance gain offered by pretraining. Inspection of the difference between the colored bars (finetuning/transferring) and gray bars (training from scratch) in Figure 5B shows that the performance gains correlate with the amount of pretraining data. For enhanced clarity, we plotted the performance gain in Figure 8B.
We are happy to include Figure 8 in the main text if you think it would help.
- In Figure 6, why does transfer learning to a new task yield performance comparable to training from scratch?
Thank you for this insightful question. We note that transferring did boost performance: it accelerated learning by 2.33x and improved energy efficiency by 3.03%.
Regarding the lack of decoding performance improvement, we attribute this to the substantial domain shift between the pretraining and transfer conditions. Spikachu-mp was pretrained on the CO and RT data from Perich et al. (2018), all collected from the same lab under the same experimental conditions. The transfer was performed on MC-RTT from Pei et al. (2021), which was collected in a different lab with a different experimental setup (e.g., touchscreen-based instead of manipulandum). Given this substantial domain shift, the representations learned by pretraining did not exactly match those of the target domain, which explains why decoding performance did not improve.
- Are the new-task generalization results cross-validated across sessions or trials? What are the confidence intervals?
Thank you for this thoughtful suggestion. We have now conducted 3-fold cross-validation on transferring to both the MC-RTT and MC-Maze. The results are shown below.
| Task \ R² | Split 1 | Split 2 | Split 3 | Mean | 95% CI |
|---|---|---|---|---|---|
| MC RTT | 0.57 | 0.50 | 0.54 | 0.536 | ±0.0324 (±6.0%) |
| MC Maze | 0.79 | 0.82 | 0.83 | 0.813 | ±0.0192 (±2.4%) |
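For reproducibility, the confidence intervals in the table can be computed from the per-split scores as follows (assuming a 95% normal-approximation interval, 1.96·σ/√n with population σ, which matches the reported values; the exact procedure is an inference from the numbers):

```python
import math

def ci95(scores):
    """Mean and 95% CI half-width (normal approximation, population std)."""
    n = len(scores)
    mean = sum(scores) / n
    sigma = math.sqrt(sum((s - mean) ** 2 for s in scores) / n)  # ddof = 0
    return mean, 1.96 * sigma / math.sqrt(n)

rtt_mean, rtt_ci = ci95([0.57, 0.50, 0.54])    # ~0.536 ± 0.0324
maze_mean, maze_ci = ci95([0.79, 0.82, 0.83])  # ~0.813 ± 0.0192
```

With only three folds, a Student-t interval would be noticeably wider; the normal approximation is the one consistent with the table.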
Spikachu-mp transferred consistently across all splits. We will incorporate these results in the revised manuscript.
- Have prior works shown neural decoding or BCI devices on neuromorphic hardware? How might Spikachu translate to such platforms?
Thank you for this insightful question. As discussed in Section 2 (lines 103–109), prior studies have successfully deployed SNN-based neural decoders in closed-loop BCI experiments. Here we highlight two compelling demonstrations:
- Dethier et al. (2013) deployed an SNN neural decoder on neuromorphic hardware and used it in closed-loop, online experiments with two monkeys.
- Leone et al. (2023) implemented an SNN neural decoder on Intel’s Loihi chip and showed that it matched the performance of larger ANN models.
Considering that Spikachu's complexity is on par with the models used in these works, we believe it would be feasible to deploy Spikachu on similar neuromorphic platforms (e.g. Intel's Loihi 2). For quantitative insights, see our response to weakness 5 of reviewer dGKH.
Limitations
- Energy consumption estimates mainly focus on FLOPs, ignoring other costly operations such as memory access.
Thank you for raising this insightful concern. Here we will clarify our rationale for focusing on FLOPs as a proxy for energy consumption.
- Using FLOPs as an energy proxy is a widely accepted practice in literature. Many influential works [Zhu et al. (2024; Autonomous driving with Spiking Neural Networks); Lv et al. (2023; SpikeBERT); Zhou et al. (2022; Spikformer)] rely on FLOPs to measure energy costs. By doing the same, we ensure comparability with previous works.
- FLOPs provide an implementation-agnostic metric for assessing computational cost, avoiding variability due to specific hardware or software.
- FLOPs serve as a proxy not only for computation but also for memory usage since memory access costs for MACs and ACs are linearly related to FLOPs. Based on Liao et al. (2022; An energy-efficient spiking neural network for finger velocity decoding for implantable brain-machine interface), standard practices assume a fixed number of memory loads and stores per MAC and AC.
- Our target deployment is on neuromorphic hardware (e.g. Loihi 2), which does not follow the von Neumann architecture. These chips employ local SRAM-based memory, which is far more energy efficient than the DRAM used in conventional CPUs/GPUs, effectively minimizing memory transfer costs.
For a quantitative analysis of memory costs for Spikachu and baselines, please refer to question 4 of reviewer dGKH.
- Lack of real life comparison between Spikachu and its ANN baselines in both accuracy and energy consumption (instead of simulated energy consumption).
We appreciate your constructive criticism and echo that real-world validation of Spikachu is a critical next step (as mentioned in lines 358-363).
That said, a real-life comparison between Spikachu and baselines is extremely challenging as it would require customizing a neuromorphic chip to support both Spikachu and the ANN baselines. Given the limited access to neuromorphic hardware and that their software ecosystems are underdeveloped, this effort would be a separate publication in itself. As such, we prioritized rigorous computational comparisons to ensure fair and reproducible results. However, we hope our work inspires collaboration between neuroscience and hardware communities to address these barriers in future work. We will further expand on this in our discussion (section 5).
I thank the authors for their thoughtful and detailed rebuttal. The additional experiments and clarifications address many of my initial concerns, including ablation, representation and generalization to new task. I have two follow-up questions for clarification:
- "summary representations of each session... to predict (1) the subject and (2) the task associated with each session. We achieved an accuracy of 0.69 ± 0.02 for subject (chance = 0.33) and 0.79 ± 0.02 for task classification (chance = 0.5)." Is it expected for subject decoding accuracy to be high, given that the model aims to harmonize neural activity across subjects?
- In Figure 8c, the error bar for the orange bar at 20 sessions appears disproportionately large compared to the others. Could the authors comment on whether this is due to high variance, a visualization artifact, or some other factor?
Thank you for following up with more feedback and for engaging in the discussion! We are excited to read that our additional experiments and clarifications address many of your initial concerns including the “ablation, representation and generalization to new task".
Please find our responses to your follow-up questions below.
- "summary representations of each session... to predict (1) the subject and (2) the task associated with each session. We achieved an accuracy of 0.69 ± 0.02 for subject (chance = 0.33) and 0.79 ± 0.02 for task classification (chance = 0.5)." Is it expected for subject decoding accuracy to be high, given that the model aims to harmonize neural activity across subjects?
Thank you for this thoughtful question.
Since our model was not explicitly trained to produce any structured organization in its latent space, we did not have a strong prior expectation regarding per-subject separability before conducting this analysis. That said, we were not entirely surprised by the observed clustering. Intuitively, it makes sense that unit embeddings from the same subject, collected across multiple sessions, would tend to be more similar to each other than to embeddings from different subjects. This natural similarity likely leads to the emergence of subject-specific clusters in the latent space.
A related observation was made by Azabou et al. (2023; A Unified, Scalable Framework for Neural Population Decoding), who analyzed the latent space of their pretrained model’s session embeddings. Like our work, their model was not trained to enforce any explicit structure in the latent space. Their pretraining included data from monkeys C, J, and M from Perich et al. (2018; A neural population mechanism for rapid learning), the same monkeys we used to pretrain Spikachu-mp. Their analysis revealed the emergence of subject-specific clusters with some degree of overlap (see Figure 5D of their manuscript), though they did not quantify separability.
In light of these findings, our observation of per-subject clustering is not entirely unexpected. The fact that our clusters are not perfectly separable (as evident by the classification accuracy being at 0.69 ± 0.02) further supports the idea that while embeddings from the same subject are generally more similar, there is still some overlap between subjects in the latent space.
Thank you again for raising this insightful point. We would also be very interested to hear your perspective on whether subject-level separability should be expected and we look forward to including both the representation analysis as well as this follow-up discussion in our revised manuscript.
- In Figure 8c, the error bar for the orange bar at 20 sessions appears disproportionately large compared to the others. Could the authors comment on whether this is due to high variance, a visualization artifact, or some other factor?
Thank you for your careful attention to detail.
The error bar for the orange bar at the 20-session pretraining level appears larger than the others because of the high variance in the percent difference in energy per inference within this group. This group is relatively small, consisting of transfers to the six RT sessions for the held-out monkey T.
For each of these sessions, the percent difference between the energy of the transferred and from-scratch trained models was:
- -0.067428
- -0.175149
- 0.208243
- -0.030774
- 0.051188
- -0.044045
These values yield a mean difference of −0.97 % with a SEM of 5.28 %, which produces a visually large error bar given the scale of the plot. We intentionally chose this scale to accommodate the full range of data across all bars in Figure 8C, which makes this particular error bar appear disproportionately large.
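The mean and SEM can be verified directly from the six listed values with standard-library Python:

```python
import math

diffs = [-0.067428, -0.175149, 0.208243, -0.030774, 0.051188, -0.044045]

mean = sum(diffs) / len(diffs)                                # sample mean
var = sum((d - mean) ** 2 for d in diffs) / (len(diffs) - 1)  # sample variance
sem = math.sqrt(var / len(diffs))                             # standard error
```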
In summary, the large error bar reflects the high variance in this small group, rather than an anomaly in our results. We hope this clarifies your concern, and we are happy to provide additional details if helpful.
I thank the authors for the detailed follow-up and clarifications. Their responses addressed all of my remaining concerns. I am satisfied with the responses and have raised my score accordingly.
Dear Reviewers,
Thank you for taking the time to review our manuscript and for your meaningful feedback. We really appreciate it and believe our work has become stronger because of your suggestions!
However, as of now (11am EST, Aug 4th), we have not heard back from you regarding our rebuttal. With the discussion period concluding in less than three days, we would like to kindly encourage your engagement, so that a meaningful discussion can begin in a timely manner. Doing so would give us sufficient opportunity to address any follow-up questions or concerns you might have.
We really appreciate your feedback and look forward to improving our work based on your suggestions.
The paper introduces Spikachu, a causal, scalable, and energy-efficient spiking-neural-network (SNN) framework for neural decoding in BCIs. It processes binned spikes directly, learns shared latent representations with spiking modules (including spiking self-attention over electrodes), and decodes behavior online.
Key components and clarifications made during review: A multi-scale SNN module (non-Transformer) is highlighted as the largest contributor to accuracy; a causal “harmonizer” enables scalable multi-session training and is shown to plug into MLP/GRU baselines; spiking self-attention operates across electrodes (temporal relations handled by neuron state). Authors stress causality vs popular non-causal decoders (POYO/LFADS/CEBRA). Against a causalized POYO, Spikachu wins by 4.37% (CO) and 11.32% (RT). An ANN ablation (Spikachu-ANN) shows large drops (CO 0.84 to 0.53; RT 0.68 to 0.36), indicating performance relies on spiking dynamics rather than architecture alone. Energy numbers are estimates (hardware-agnostic, FLOP-based) and will be labeled as such.
Strengths
- Clear motivation for causality and online use; wide coverage across animals/sessions/tasks.
- Authors provide a store/load ops table to reason about memory movement (dominant in energy), not just MACs.
- Feasibility argument for neuromorphic deployment. Model has <4M parameters and ~10k LIF neurons; fits well within Loihi-2 capabilities and even within a practical implant form factor (<31 mm^2).
Weaknesses
- All energy results are estimates; the attempted GPU profiling was unstable, and no neuromorphic measurements are reported. This tempers the otherwise strong efficiency story.
- In standard offline settings, POYO can outperform Spikachu; while the causalized comparison is fair for real-time BCIs, the paper should more clearly delineate offline vs. online tradeoffs.
- The initial write-up confused some reviewers about temporal dynamics; the rebuttal clarifies streaming-bin dynamics, but the camera-ready must make this explicit.
- While the representation clustering is promising, the paper could better connect learned features to neurophysiological structure and include closed-loop experiments to demonstrate end-to-end BCI utility (acknowledged as future work).
Recommendation

I recommend Accept (poster). The paper addresses an important practical gap: causal, efficient neural decoding with substantive algorithmic contributions and broad empirical evidence. The authors engaged constructively during rebuttal, adding causalized baselines, crucial ablations, representation analyses, and memory-ops accounting. Although energy is not directly measured, the hardware-agnostic framing, large performance gaps over causal baselines, and strong scaling/transfer results make this a valuable contribution to both SNNs and BCI decoding. Two reviewers gave an accept and one reviewer provided a reject.

Reviewer QBku (reject). Concerns: views the work as engineering with hypothetical energy savings; wants real measurements and stronger novelty beyond assembling known pieces. My weighing: while the request for measured energy is fair, the paper's causal framing, multi-scale SNN design, spiking attention, scaling/transfer results, and clear wins over causal baselines constitute a meaningful scientific contribution. With the rebuttal clarifications and added analyses, I find the evidence sufficient for acceptance, acknowledging the limitation on measurement.