NeuralFuse: Learning to Recover the Accuracy of Access-Limited Neural Network Inference in Low-Voltage Regimes
NeuralFuse provides model-independent protection for on-chip AI accelerators, allowing them to maintain stable performance when suffering low-voltage-induced bit errors.
Abstract
Reviews and Discussion
The paper proposes a method to handle the bit errors that arise when the SRAM voltage is reduced to lower power consumption. The method preprocesses the input data into error-resilient forms. Experiments show a tradeoff of a 24% power reduction against a 2-3% accuracy loss in CIFAR-10/ResNet settings.
Strengths
- The reduction of SRAM power is appreciated for lowering cost.
- The training method that recovers accuracy under 1% weight bit errors is also important.
Weaknesses
- Reducing the SRAM voltage requires hardware access and will cause more errors than ideal uniform bit-flipping errors. I think tolerating 1% error is useful and interesting, but the motivation might come from more than just lowering SRAM voltages.
- I did not catch whether the weights in the generator (the image preprocessing network) also suffer bit errors. Do they? If the generator network is assumed to be error-free, that seems inconsistent with the overall system configuration.
- By reducing the power by 30%, the accuracy loss on the CIFAR-10 dataset is 2-3%. The worth of this tradeoff is debatable. What if I use a 30% smaller network and run the SRAM at full power? In the experiments, ResNet50 and ResNet18 clearly have the same clean accuracy, yet ResNet18 is 3 times smaller, so one could easily achieve a 3x power reduction by using ResNet18 instead of ResNet50. A more rigorous experimental setup is needed to validate the motivation of the paper: why would one trade the bit errors of a reduced SRAM voltage rather than simply use a smaller network? This is my major concern.
Questions
The questions are listed in the "Weaknesses" section.
Limitations
n/a
We thank the reviewer for the effort and time spent reviewing our paper. Our responses to your concerns are as follows:
1. (More Motivation) First of all, thank you for recognizing the value of our work in recovering accuracy under 1% bit errors. In fact, while energy savings are a beneficial aspect of our approach, the primary motivation behind NeuralFuse extends beyond merely lowering SRAM voltages. Our main goal is to provide a robust error protection mechanism that can be employed in scenarios where voltage instability is a concern.
For instance, the model runs at the minimum bit-error-free voltage (V_min) most of the time, and NeuralFuse can be activated as a temporary error-protection mechanism only when the system encounters voltage instability or other unforeseen scenarios that induce bit errors. This ensures the reliability of the model's output during such critical periods, maintaining the system's overall performance and robustness. This consideration also addresses broader scenarios where bit errors might occur due to other factors such as environmental variations, component aging, or transient faults. By focusing on error resilience, NeuralFuse can be applied in various contexts where maintaining model accuracy under non-ideal conditions is essential.
2. (Generator Errors) To ensure that NeuralFuse functions correctly, our experimental setting assumes that NeuralFuse operates at nominal voltage, meaning it should be error-free. Previous research has demonstrated that multiple chip units can be integrated while running at different voltages [1]. Therefore, regarding the concern about 'inconsistency in the overall system configuration': although the voltage settings of different parts may differ, such a system design and configuration is feasible and ensures that NeuralFuse is error-free during operation.
3. (Small Network) The reviewer's intuition is correct. Indeed, using a smaller network such as ResNet18 instead of ResNet50 can achieve significant power savings due to the reduced model complexity. However, as noted above, NeuralFuse could be much more useful in circumstances where bit errors are unexpected. In real-world applications, voltage instability or other transient conditions can introduce bit errors unpredictably. NeuralFuse provides a robust solution that can be activated dynamically in response to such errors, ensuring the reliability of model output during critical periods.
Beyond handling bit errors, our approach can also mitigate accuracy drops due to precision loss. In Section 4.6, our results demonstrate a promising use case in dealing with unseen bit-quantization errors. This capability broadens NeuralFuse's applicability, making it valuable in scenarios where precision constraints are relaxed to save power, but accuracy still needs to be preserved.
Therefore, in terms of efficiency, the system designers can use either smaller models or apply a low-voltage regime with NeuralFuse to achieve desired energy efficiency. In terms of robustness, NeuralFuse can act as an insurance mechanism, assuring model performance during critical periods when voltage instability or other factors induce bit errors. This reliability is particularly important for safety-critical applications, where maintaining model accuracy is essential despite adverse conditions.
[1] Rotaru et al. Design and development of high density fan-out wafer level package (HD-FOWLP) for deep neural network (DNN) chiplet accelerators using advanced interface bus (AIB). (ECTC 2021)
Dear Reviewer LgKi:
As the discussion period draws to a close, we want to check whether our rebuttal has addressed your concerns. We greatly value the time and effort you have dedicated to reviewing our work and are eager to address any additional concerns or suggestions you may have.
To summarize our rebuttal:
- We highlighted that NeuralFuse’s primary motivation extends beyond just reducing SRAM voltage. It’s designed to offer robust error protection in scenarios involving voltage instability, environmental variations, or transient faults, ensuring system reliability.
- Regarding generator errors, we clarified that NeuralFuse operates error-free at nominal voltage, as supported by previous research, ensuring consistency within the system's design.
- Lastly, we acknowledged the possibility of using smaller networks but emphasized that NeuralFuse provides additional robustness during unexpected bit errors, making it especially valuable for safety-critical applications.
Please refer to our rebuttal for more detailed explanations. Feel free to let us know if there are any further questions, comments, or suggestions. We are more than happy to incorporate your feedback into the revision process.
Thank you once again for your time and consideration.
Yours Sincerely,
Authors
Dear Reviewer LgKi,
Thank you for your time and effort in reviewing our work, and we really appreciate your support.
There are only a few hours left before the rebuttal deadline, and we would like to know whether our responses successfully address your concerns. Please also let us know if you have other concerns!
Warm regards,
Authors
This paper presents NeuralFuse, a module that produces error-resistant data representations by learning input transformations, in order to address the accuracy loss of deep neural networks (DNNs) caused by low-voltage-induced bit errors in SRAM, allowing DNNs to continue operating accurately at low voltage without model retraining. NeuralFuse was tested on multiple models and datasets, and it showed that it could recover accuracy by a reasonable margin and save SRAM access energy. It supports two scenarios: restricted access, which trains using a white-box surrogate model, and relaxed access, which permits backpropagation. NeuralFuse exhibits robustness against low-precision quantization and is transferable and adaptable. The authors argue that this development may be helpful for edge devices and on-chip AI.
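For concreteness, the add-on inference path described above can be sketched as follows (PyTorch-style Python). This is an illustrative reconstruction, not the authors' code: the class name, the residual-plus-clamp form of the transformation, and the clamp range are all assumptions.

```python
import torch
import torch.nn as nn

class NeuralFuseWrapper(nn.Module):
    """Hypothetical wrapper: prepend an input-transformation generator to a frozen base model."""
    def __init__(self, generator: nn.Module, base_model: nn.Module):
        super().__init__()
        self.generator = generator      # assumed to run error-free at nominal voltage
        self.base_model = base_model    # may be perturbed by low-voltage bit errors
        for p in self.base_model.parameters():
            p.requires_grad_(False)     # the base model is never retrained

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Map the input to an error-resistant representation, then run the base model.
        x_robust = torch.clamp(x + self.generator(x), -1.0, 1.0)  # residual form is an assumption
        return self.base_model(x_robust)
```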
Strengths
- The proposed NeuralFuse operates as an add-on module, meaning it can be integrated with existing DNNs without requiring modifications to the base models. This non-intrusive approach makes it applicable to various models and scenarios, including those with limited access to model internals, such as cloud APIs.
- The results demonstrate high robustness to low-voltage-induced bit errors and low-precision quantization. NeuralFuse shows reasonable performance recovery across different datasets and architectures, as well as high transferability and adaptability to unseen models.
- Experimentation shows that during low-voltage operation, it can recover the performance drop due to bit errors, with energy efficiency as a fringe benefit.
Weaknesses
- The authors assume that the NeuralFuse module itself is free of low-voltage SRAM bit errors, claiming that its function can be performed by a general-purpose core operating at nominal voltage. However, in that case, the latency of running this module will be an order of magnitude higher at inference time, and the total power consumption might even exceed that of the base model. If it instead runs in SRAM, then it should itself be vulnerable to bit errors, and no such analysis is provided. From the writing, it may be assumed that the energy and latency calculations in Appendices C and D assume the proposed module resides in SRAM, which would change drastically if the NeuralFuse operation were performed by a CPU.
- It is seen from Figure 3 and Table 1 that in bigger models (e.g., ResNet-50), the standard deviation of performance is very high across different random test bit-error patterns; furthermore, the smaller the generator architecture, the worse the performance in general. So, to scale up performance, bigger generators might be needed, which in turn will be harder to optimize and more resource-consuming.
- The authors claim that their approach has an advantage over previous methods because it does not retrain the base model; however, training the generator itself has been shown to be equally hard and time-consuming, and this may scale up for bigger base models or generators, so the advantage here is not very apparent.
- If the base model is running at nominal voltage, then there should not be any bit errors, and in that case the added NeuralFuse module will only increase latency and power consumption, with an additional performance decrease due to the transformation of the input to the base model. So it is only logical to adopt an adaptive approach in which this module is applied only when the SRAM voltage drops below the minimum required voltage.
Questions
Refer to the Weaknesses section.
Limitations
I have raised some concerns in the Weaknesses section; these are possible limitations of this work. The authors may work on these points to overcome them.
We thank the reviewer for the effort and time spent reviewing our paper. Our responses to your concerns are as follows:
1. (Latency) We understand the reviewer's concern. However, we respectfully disagree that our evaluation would change drastically if the NeuralFuse operation were performed by a CPU. The additional overhead of a CPU is influenced by many factors, such as the CPU architecture, instruction set, process node (14 nm or 3 nm), or even the manufacturer. Therefore, in this paper, we simplify these confounding factors and evaluate only the latency added by NeuralFuse. Our SRAM-based evaluation already achieves notable results. Nevertheless, we acknowledge that these are important factors, and we will discuss them further in the revision.
2. (Big/Small NeuralFuse) In practice, the choice of the base model and the NeuralFuse generator depends on the problem to be solved. In current applications, achieving better performance often requires longer training and larger models, so we believe this is an inevitable but acceptable issue.
3. (Training of NeuralFuse) Regarding model retraining, we believe it is important to consider not only the time-consuming nature of the training process but also the sensitivity of the retrained model to hyperparameters, which can easily cause training to fail. Previous work [1] has noted that adversarial weight training over all vulnerable weight-bit combinations is not feasible. Therefore, we believe our NeuralFuse technique still holds significant advantages over retraining-based approaches [2].
4. (Applicable Scenario) This is a well-thought-out concern. As noted by Reviewer aRGH, practitioners might want to avoid the scenarios where NeuralFuse would be useful, because NeuralFuse alone can alleviate low-voltage inference challenges to a notable extent. However, in our paper we consider NeuralFuse an add-on module, meaning that it can be activated not only in a low-voltage regime but in both nominal- and low-voltage regimes. Although activating NeuralFuse in a nominal-voltage regime may incur additional costs in accuracy degradation and latency, it helps protect the module from an unstable power supply and mitigates the bit-error-induced accuracy drop.
Nevertheless, from a hardware perspective, it is also feasible to enable NeuralFuse only when the main module is running in low-voltage regimes and disable it in nominal-voltage settings. This ensures that, in nominal-voltage regimes, NeuralFuse does not introduce any energy consumption due to the additional latency and SRAM space requirements.
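To illustrate this adaptive deployment, here is a hedged sketch in Python; the voltage-sensing interface, the threshold name, and the function signature are hypothetical, and the actual gating mechanism would be hardware-specific.

```python
def adaptive_inference(x, base_model, neuralfuse, sram_voltage, v_min):
    # Hypothetical gating: enable NeuralFuse only in the low-voltage regime,
    # where weight bit errors are expected; bypass it at nominal voltage so
    # that it adds no latency or energy overhead there.
    if sram_voltage < v_min:      # below the minimum bit-error-free voltage
        x = neuralfuse(x)
    return base_model(x)
```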
[1] He et al. Defending and harnessing the bit-flip based adversarial weight attack. (CVPR 2020)
[2] Stutz et al. Bit error robustness for energy-efficient dnn accelerators. (MLSys 2021)
I appreciate the author's response. While I increased my rating, I would still be inclined toward rejection. The reason is the performance gap with a smaller generator and the need for retraining, which undermines the paper's claim that it does not require retraining, leading to an efficient approach. Indeed, a larger generator (which may be necessary for performance) would incur training costs and latency.
We thank the reviewer for the feedback. However, there may be some misunderstandings, as our work aims to not retrain the base model. If the reviewer is referring to "training of NeuralFuse generators," we would like to point out that the NeuralFuse generators considered in our experiments are relatively small compared to the size of the deployed base model (see Table 7 in Appendix C) and would be more efficient to train. Furthermore, it is difficult to retrain the base model in all cases; this is also why we use a white-box surrogate model for the restricted access scenario to demonstrate that NeuralFuse is highly transferable to different base models. In other words, there is no free lunch. It is impossible to have such a protection module with zero training cost. To the best of our knowledge, our proposed NeuralFuse (a plug-and-play module) is the best practice for reducing the retraining cost.
We sincerely appreciate the reviewers' feedback and are just one post away from answering any follow-up questions you may have. We look forward to your feedback.
Dear Reviewer e1Hd,
Thank you for your time and effort in reviewing our work, and we really appreciate your support.
We would like to know whether our responses successfully address your concerns. Please also let us know if you have other concerns!
Warm regards,
Authors
The paper presents NeuralFuse, a novel approach to address the accuracy degradation of deep neural networks (DNNs) in low-voltage regimes. The core idea is to learn an input transformation module that can generate error-resistant data representations, thereby protecting DNN accuracy even when bit errors occur due to voltage scaling. The proposed method is model-agnostic and doesn't require retraining of the deployed DNNs, making it suitable for access-limited scenarios like cloud-based APIs or non-configurable hardware. Experimental results demonstrate that NeuralFuse can significantly recover accuracy while achieving energy savings.
Strengths
- Novelty and Practicality: The paper introduces a new perspective on mitigating the impact of bit errors in low-voltage DNN inference by focusing on input transformation. This approach is model-agnostic and doesn't necessitate retraining, making it practical for real-world scenarios where model access is limited.
- Effectiveness: The experimental results across various datasets, DNN models, and NeuralFuse implementations showcase the effectiveness of the proposed method in recovering accuracy and achieving energy savings. The paper also demonstrates the versatility of NeuralFuse in handling low-precision quantization and adversarial weight perturbation.
- Thorough Evaluation: The paper provides a comprehensive evaluation of NeuralFuse, including ablation studies, comparisons with baselines, and qualitative analysis, which strengthens the validity of the claims.
Weaknesses
- Limited Evaluation on Complex Models and Tasks: The evaluation of NeuralFuse is primarily focused on image classification tasks using CNN-based models. It would be beneficial to assess its performance on more complex tasks, such as object detection or natural language processing, and with different types of neural networks, such as Transformers or RNNs, to ensure its broader applicability.
- Lack of Comparison with Post-Training Quantization: The paper demonstrates the effectiveness of NeuralFuse in recovering accuracy loss due to quantization. However, it would be valuable to compare its performance with post-training quantization techniques that also aim to reduce model size and energy consumption without retraining.
- Applicability to Dynamic Voltage Scaling: The paper assumes a fixed low-voltage setting during inference. It would be valuable to explore the applicability of NeuralFuse in scenarios with dynamic voltage scaling, where the voltage might change during inference based on the workload or energy constraints.
- Impact of NeuralFuse on Interpretability: Does the input transformation introduced by NeuralFuse affect the interpretability of the base model? It would be interesting to analyze how NeuralFuse impacts the ability to explain the model's decisions.
Questions
- Potential for Hardware Acceleration of NeuralFuse: The NeuralFuse module itself might introduce computational overhead. Could NeuralFuse be implemented or accelerated in hardware to minimize its impact on inference latency and energy consumption?
- Limited Exploration of the Impact on Latency: While the paper acknowledges the latency overhead introduced by NeuralFuse, a more detailed analysis of its impact on real-time applications would be beneficial. It would be valuable to quantify the latency overhead for different NeuralFuse architectures and base models.
- Assumption of Random Bit Errors: The paper assumes a random distribution of bit errors. However, in practice, bit errors might exhibit spatial or temporal correlations. It's worth investigating the robustness of NeuralFuse to different bit error patterns.
Limitations
- Potential for Adversarial Attacks on NeuralFuse: The paper doesn't discuss the potential for adversarial attacks specifically targeting the NeuralFuse module. It's worth investigating whether the input transformations introduced by NeuralFuse could be exploited to craft adversarial examples that bypass the error resistance mechanism.
We thank the reviewer for recognizing the novelty, practicality, and effectiveness of our work. We address your comments in the following:
1. (Complex Models and Tasks) As algorithm designers, we chose CNN-based classification models, a representative problem for validating our idea. As noted by Reviewer aRGH, our approach tackles the model's bit errors from a different angle, and one can readily extend it to various task-specific models. Due to the limited rebuttal time, we may not be able to provide results on other tasks, but we will include our findings on these applications in the revision.
2. (Post-Training Quantization) We conducted an experiment using a setting similar to Section 4.6 and Appendix H. In this experiment, we apply post-training quantization to induce precision loss in the base model, meaning that no quantization-aware training is used when training the base model. The experimental results (see Table 27 in the attached file) show that our NeuralFuse generators can still recover the accuracy of the reduced-precision, post-training-quantized model under a 0.5% BER on the CIFAR-10 pre-trained model, even though the base model is more vulnerable to bit errors without quantization-aware training. This experiment demonstrates the robustness of NeuralFuse, which is resistant not only to low voltage (bit errors) but also to precision loss (quantization).
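For reference, here is a minimal sketch of a simple post-training quantization step (uniform, symmetric, per-tensor); the exact scheme used in our experiments may differ, so treat the scale choice and rounding as illustrative assumptions.

```python
import torch

def quantize_dequantize(w: torch.Tensor, n_bits: int = 8) -> torch.Tensor:
    # Uniform symmetric per-tensor quantization followed by dequantization,
    # emulating the precision loss a reduced-precision deployment would see.
    qmax = 2 ** (n_bits - 1) - 1                # e.g., 127 for 8-bit
    scale = w.abs().max() / qmax                # per-tensor scale (illustrative choice)
    q = torch.round(w / scale).clamp(-qmax - 1, qmax)
    return q * scale                            # weights as seen by the quantized model
```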
3. (Dynamic Voltage Scaling) The reviewer's suggestion is really interesting. However, unstable voltage can easily damage chips or DNN accelerators; for instance, [1] mentions that unstable power can cause chips or AI accelerators to break down, and recent Intel processors have also suffered from voltage issues that cause CPU damage [2]. To avoid such side effects, we used fixed voltages in our experiments, which allows a more accurate evaluation of our method and reflects reality. On the other hand, if there are only slight voltage changes, then because NeuralFuse is trained with our proposed EOPM optimizer, it will not overfit to specific error patterns; instead, it learns the error distribution under the specified bit-error percentage. In this scenario, even if the error patterns change, final performance is not significantly affected, as demonstrated by our experiments across ten different error models.
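For readers unfamiliar with this training scheme, a simplified sketch of one expectation-over-perturbed-models step follows; the bit-error sampler, the number of sampled models, and the function names are illustrative assumptions, and the actual objective in the paper also balances clean accuracy.

```python
import copy

def eopm_step(generator, base_model, x, y, sample_bit_errors, loss_fn, n_models=5):
    # Average the training loss over several base-model copies, each perturbed
    # with freshly sampled random bit errors, so the generator learns the error
    # distribution rather than any single fixed error pattern.
    total = 0.0
    for _ in range(n_models):
        perturbed = copy.deepcopy(base_model)   # the base model itself stays untouched
        sample_bit_errors(perturbed)            # hypothetical helper: flip weight bits at the target BER
        total = total + loss_fn(perturbed(generator(x)), y)
    (total / n_models).backward()               # gradients reach the generator; an optimizer
                                                # step on the generator's parameters follows
```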
4. (Interpretability) These indeed are worth exploring, and in fact, we have conducted some analyses regarding the interpretability of the base model. In Appendix K, we demonstrate the output distribution at the final linear layer of the base model under three scenarios: 1) the clean base model without errors, 2) the perturbed base model with random bit errors, and 3) the perturbed base model with NeuralFuse. Based on the t-SNE visualization, we observed that the output distribution of the perturbed model is very chaotic. However, after applying NeuralFuse, the output distribution clearly groups into 10 classes. This indicates that NeuralFuse can indeed correct the outputs of the base model.
5. (Hardware Acceleration) The adoption of any special hardware for NeuralFuse will depend on how hardware manufacturers design the architecture. Previous literature [3] has mentioned that, to accommodate the specific architectures of DNNs, IC design manufacturers can develop corresponding hardware to support these specialized computations. Therefore, we believe that NeuralFuse can reduce its additional latency and energy consumption through such specialized hardware.
6. (Latency) In Appendix D, we have evaluated the latency overhead introduced by NeuralFuse. Although NeuralFuse brings a certain degree of extra latency, we deemed it an inevitable tradeoff for reducing energy consumption in our setting.
7. (Random Bit Errors) This is a great suggestion! In fact, as mentioned in our paper (line 121), bit-cell failures for a given memory array are randomly distributed and independent of each other; that is, the spatial distribution of bit flips can be assumed to be random, as it generally differs from one array to another, both within and between chips. Nevertheless, we ran an additional experiment to explore non-uniform bit-flipping scenarios. The table below shows the perturbed accuracy under a non-uniform/non-random attack (i.e., the first/last layers were implemented at V_min and the other layers at sub-V_min voltages); a sketch of the bit-flip model follows the table. In this setting, the perturbed accuracy is higher than when attacking the whole model, due to fewer perturbed parameters. The experimental results also show that NeuralFuse can still recover the perturbed accuracy.
| Base Model | Perturbed Acc | ConvL | DeConvL | UNetL |
|---|---|---|---|---|
| ResNet18 | 43.8%±12.4% | 88.6%±0.8% | 90.0%±0.4% | 85.0%±0.5% |
| VGG19 | 41.5%±13.4% | 86.0%±3.7% | 85.8%±5.6% | 84.3%±2.1% |
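The sketch below illustrates the kind of random bit-flip injection used in such simulations: each bit of each 8-bit quantized weight flips independently with probability equal to the bit error rate. This is an illustrative re-implementation, not our exact fault simulator.

```python
import torch

def flip_bits(q_weights: torch.Tensor, ber: float) -> torch.Tensor:
    # q_weights: int8 tensor of quantized weights; returns a perturbed copy.
    w = q_weights.view(torch.uint8)            # reinterpret the same bits as uint8
    mask = torch.zeros_like(w)
    for bit in range(8):                       # each bit flips independently with prob. ber
        flips = (torch.rand(w.shape) < ber).to(torch.uint8)
        mask = mask | (flips << bit)
    return (w ^ mask).view(torch.int8)         # XOR applies the sampled bit flips
```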
8. (Adversarial Attacks on NeuralFuse) This is really an interesting idea. However, we respectfully disagree that this omission represents a limitation of our work, as we believe it falls outside the immediate scope of our current work. Our primary goal is to mitigate the effects of random bit errors induced by low-voltage SRAM operation through input pre-processing, which is a distinct challenge from adversarial robustness. Nonetheless, we agree that integrating adversarial robustness with NeuralFuse would further enhance the overall reliability and security of the system.
[1] When Does Poor Power Quality Cause Electronics Failures? [link]
[2] Instability Reports on Intel Core 13th and 14th Gen Desktop Processors [link]
[3] Zhang et al. SpArch: Efficient Architecture for Sparse Matrix Multiplication. IEEE International Symposium on High Performance Computer Architecture (HPCA 2020)
I really appreciate the authors' response, which addresses most of my concerns. I update my rating from 4 to 5. Thank you.
We thank the reviewer for the encouraging response! We are glad our response could address your concerns. Thank you for the endorsement and recommendation of acceptance.
The authors train an input pre-processing module which aims to counteract the effects of random bit errors induced by low-voltage SRAM operation. They demonstrate the ability to avoid most accuracy drops on a handful of CNNs while operating in a 0.5-1% error regime.
Disclosure: I have reviewed this paper in the past. I stand by much of my previous review, as the paper has not changed substantially.
Strengths
The paper takes a rather unique approach to trying to counteract bit errors. Instead of fixing the model or the hardware (both of which have been studied extensively, as the authors admit), they pre-transform the input to avoid stepping in areas of the input space which are vulnerable to random bit errors in the model weights.
The paper is clearly scoped. The authors avoid some common pitfalls with energy cost savings (e.g., they account for the costs of their own technique), and the explanation of end-to-end considerations was probably necessary for readers not used to HW energy measurement. Overall, I felt that the authors did a solid job of balancing practicality (you don't want to get too far into SRAM design tweaks) and depth (the energy simulation setup seems like a very reasonable configuration).
The authors' actual learned model is fairly simplistic. Perhaps others might see this as a weakness, but as I see it, the benefit of this paper is in following the authors' perspective flip to its logical conclusion. If further work wants to attempt something more sophisticated for the learned transformation, great. Better surrogates for transfer? Sure. Plenty of room to follow on later, if someone wants.
Ultimately, I see this work as a "perspective paper". The energy savings are not anything to write home about, but the approach is qualitatively different, and that makes it valuable. It's healthy to have a method that attacks the problem from a different angle, even if it doesn't quite stack up to state of the art. By analogy, it's worth exploring a very early automobile even if a horse can outrun it---maybe there's more down this road, maybe not. But the authors have at least shown that you can do something here, and the explanation and foundation they've provided is stable enough to build on.
Weaknesses
The authors' approach is not competitive with most existing HW techniques addressing low-voltage operation. This is less bad than it sounds. NeuralFuse bolts a model on to the front of an existing not-optimized-for-robustness model and tries to make the best of things. If one can add low-voltage-aware hardware modifications or low-voltage-aware models, one should. This doesn't make NeuralFuse a bad idea, but it probably does mean that HW, SW, and model builders should be trying to avoid the scenarios where NeuralFuse would be useful. This ultimately limits the practical utility of the approach.
I was a bit underwhelmed by the actual reported energy/accuracy values achieved. One of the elephants in the room that the authors skirt around is the zero accuracy degradation scenario. In order for a real-world operator to accept any kind of accuracy degradation, the power savings must be enormous, usually measured in factors (i.e., a 5x or 10x reduction for a 1% accuracy loss would be reasonable). In order to get good power savings, you really want to be aggressive lowering the voltage, but BER skyrockets with even small adjustments. So this is kind of the name of the game in low-voltage fault tolerance: it's a lot easier to get energy savings by allowing accuracy drop. But the test that usually separates the wheat from the chaff is when you dial that down to zero measurable accuracy drop. You can always say "well there's a trade-off a user can adjust", but in this case, the "trade-off" is dominated by one end of the spectrum, and if there's not a lot of savings in that area, then it kind of condemns the approach.
Questions
Feel free to address the zero accuracy degradation scenario above. (I'll note that there are plenty of other papers from the HW community that do solve this problem without accuracy loss, so while I accept that the authors' approach is different---and valuably so, I'm not willing to accept an argument that it's an unreasonably high bar. Just harder.)
Limitations
Addressed above. No societal concerns.
Thank you so much for recognizing our work as unique and valuable and especially for pointing out its potential to inspire a number of follow-up works. We are thrilled that you enjoyed reading our paper and provided such encouraging reviews and constructive comments.
We address the answers to your concerns/questions in the following:
1. (Limited Practical Utility) We thank the reviewer for raising this viewpoint; this is a well-thought-out comment! While accuracy degradation is indeed a concern that developers aim to avoid, our work primarily seeks to explore a novel perspective on handling bit errors induced by low-voltage SRAM operation. In particular, NeuralFuse serves as a robust error-protection mechanism that can be employed in scenarios where voltage instability is a concern. Of course, NeuralFuse can also be used to save energy, and our experiments have demonstrated its efficiency in this respect. That being said, our intention is, as the reviewer said, to demonstrate that pre-processing inputs can indeed mitigate accuracy drops, even if the energy savings are modest compared to state-of-the-art hardware techniques.
It is also important to recognize that our approach is complementary rather than competitive with existing hardware solutions. In scenarios where hardware modifications are not feasible or desired, our method offers a software-centric solution that can be readily applied to existing models. We believe this flexibility is a significant strength, as it provides an additional tool for developers facing stringent power constraints.
2. (Zero Accuracy Degradation) We agree that achieving substantial power savings without any accuracy loss is a challenging goal and a critical step for practical utility. Our current results indicate reasonable power savings with minimal accuracy loss. At this stage, NeuralFuse serves as a proof-of-concept, and we are optimistic that with further research, more advanced models could significantly enhance the trade-off between energy savings and accuracy maintenance. One possible direction is to refine our pre-processing module to be more adaptive to the specific error characteristics of the low-voltage SRAM.
On the other hand, we are considering hybrid approaches that combine our input transformation with lightweight hardware modifications to further mitigate errors without compromising accuracy. This integrated approach could offer the best of both worlds—maintaining accuracy while still achieving meaningful power savings.
In summary, the value of our approach lies in its different attack angles. It serves as a foundation for future work that could potentially integrate more sophisticated pre-processing techniques with existing hardware solutions, thereby providing a more robust overall system.
Both comments sort of hit on the same topic, so I'll lump them together: that NeuralFuse is complementary rather than competitive with HW approaches (agree) and that it can be used in a hybrid approach with both (optimistic). I'd like to agree with the second part, but in practice, there's a lot of evidence to suggest that it's not that easy. There's been a fair number of HW papers that have tried lumping several techniques together (including hybrid SW/HW), and the results are sometimes that different techniques end up cannibalizing each others' gains. It's not always the case, so it's fine to be optimistic that NeuralFuse might dovetail perfectly with other techniques and allow zero accuracy degradation with even more aggressive voltage settings. But ultimately, that's a claim that needs experimentation and proof before it's valid. So I'm strongly with the authors when I say I also hope it's true, but we'll both need to see the evidence before knowing so.
"In summary, the value of our approach lies in its different attack angles."
I agree strongly with this statement, and it's largely the reason for my review score. I think there's a long way to go if NeuralFuse is to be proven useful in practice, but the paper demonstrates enough proof of a concept to allow the community to run with the idea if they so choose.
We thank the reviewer for the prompt response! We are glad our response is in the same boat as the reviewer. We totally agree that "our work allows the community to run with the idea," and we are devoted to deploying our method into real applications.
Thank you for the endorsement and recommendation of acceptance.
We sincerely appreciate all reviewers' valuable feedback and the efforts of the program chair and area chair. We are particularly pleased that reviewers found our paper well-written (aRGH), featuring a novel idea (aRGH, nPtF), highlighting energy-efficiency benefits (nPtF, LgKi), providing thorough analysis (aRGH, nPtF), and being practical for real-world scenarios (aRGH, nPtF, e1Hd). We have addressed your specific questions and concerns. Additionally, we have included further experimental results on post-training quantization for Reviewer nPtF in the attachment. We are committed to addressing any further issues raised and improving our manuscript accordingly, and we look forward to your feedback on our response.
Dear reviewers,
Authors submitted rebuttals, which should be visible to you now.
Please read the rebuttals carefully and start discussions with the authors now.
The reviewer-author discussion period will end on August 13, 2024. Since authors usually need time to prepare their responses, your quick response would be greatly appreciated.
In case you requested additional experiments or analysis and the authors provided them in the rebuttal, please pay extra attention to the results.
Thank you,
Your AC
Dear reviewers,
This is the final reminder for the reviewer-author discussions. It will end on August 13 11:59 AoE, and then we will start AC-reviewer discussions.
If you have already concluded the discussions with the authors, thank you so much!
If you have not responded to the author rebuttal yet, please do so immediately. We have been waiting for your response.
In case you missed it, the general author rebuttal includes a PDF file.
Best,
Your AC
The authors submitted rebuttals. Some reviewers had discussions with the authors, and some left follow-up comments after the author-reviewer discussion period. During the AC-reviewer discussion period, all reviewers shared post-rebuttal thoughts on this work. We find this paper borderline, but all the reviewers are either in favor of or OK with accepting the work. My recommendation is Accept (poster) for NeurIPS 2024.
In this work, the authors introduce an add-on module to address accuracy degradation when reducing energy consumption (i.e., reducing the supply voltage) due to random bit flips in SRAM. We find 1) value, 2) novelty, and 3) practicality in this type of approach, which overall slightly outweigh the weaknesses of this work.
I also want to suggest that the authors reflect the reviewers' comments in the camera-ready. For example, reviewers shared concerns about the limited evaluations in this work, which I agree with (e.g., the dataset choices seem not well justified). The authors mentioned that "NeuralFuse serves as a proof-of-concept". This does not seem aligned with the claims made in the current manuscript, and I would like the authors to revise the claims and emphasize that "NeuralFuse serves as a proof-of-concept" in the camera-ready.