H-Tuning: Toward Low-Cost and Efficient ECG-based Cardiovascular Disease Detection with Pre-Trained Models
This study enables the use of pre-trained models with high computational efficiency and robust performance, exploring a path toward low-cost and efficient CVD detection.
Abstract
Reviews and Discussion
This paper proposes H-Tuning, a novel framework that reduces the computational cost of fine-tuning large pre-trained models for ECG-based cardiovascular disease detection by integrating mix-order optimization, low-rank adaptation, and layer-dependent tuning. Additionally, it employs knowledge distillation to transfer knowledge to smaller models, significantly reducing inference costs and enabling efficient deployment on low-resource devices while maintaining high diagnostic performance.
update after rebuttal
I'd keep the current score.
Questions For Authors
Nothing particular.
Claims And Evidence
The claims made in the submission are well-supported by clear and convincing evidence. The paper provides quantitative experiments on four publicly available ECG datasets, demonstrating that H-Tuning significantly reduces GPU memory consumption (by 6.34×), inference latency (by 19.8×), and the number of model parameters (by 194.2×) while maintaining comparable performance to standard fine-tuning methods. It also includes comparisons against multiple baseline methods, such as Full Fine-Tuning (Full FT), LoRA, MeZO, and Addax, showing superior efficiency and performance. Furthermore, ablation studies confirm the contribution of different components (e.g., mix-order optimization, gradient refinement, and knowledge distillation), while sensitivity analyses demonstrate the robustness of the approach across different hyperparameter settings. Overall, the empirical results strongly support the paper’s core claims regarding computational efficiency and diagnostic performance.
Methods And Evaluation Criteria
The proposed methods and evaluation criteria make sense for the problem of ECG-based cardiovascular disease (CVD) detection using pre-trained models.
Theoretical Claims
This paper does not appear to present formal mathematical proofs.
Experimental Designs Or Analyses
The experimental design is methodologically strong, using diverse datasets, meaningful baselines, and rigorous efficiency metrics.
Supplementary Material
No.
Relation To Broader Scientific Literature
Although this paper focuses on pre-trained ECG models for CVD detection, the proposed approach has strong potential to be generalized and applied to many other fields. Thus it is worth discussion by the general audience of ICML.
Essential References Not Discussed
Nothing particular.
Other Strengths And Weaknesses
Nothing particular.
Other Comments Or Suggestions
Nothing particular.
We are grateful for your insightful comments on our work. In this rebuttal round, we provide an external validation on a wearable ECG dataset and a more thorough ablation study to support the claims of our study. At the same time, experiments on different backbones demonstrate the flexibility of the proposed method (please refer to our responses to the other reviewers).
The authors propose H-Tuning, a model pipeline for efficiently fine-tuning pre-trained models for ECG classification to enable cardiac diagnosis under limited computation resources. They combine zeroth-order optimization, low-rank adaptation, and knowledge distillation to reduce computation time and memory requirements at fine-tuning and inference time. They demonstrate superior runtime, memory footprint, and classification performance compared with numerous related methods on four public datasets. Design choices are justified with thorough ablation studies.
update after rebuttal: I am happy with the authors' responses and believe this submission to be worthy of publication. I am happy to change my score to reflect this.
给作者的问题
Could the authors highlight the differences of your proposed mix-order optimization from Addax and LoHO?
Could the authors also discuss in more depth the difference and added value of H-Tuning compared to CE-SSL (Zhou, R., Liu, Z., Clifton, L., Clifton, D. A., Chan, K. W., Zhang, Y.-T., and Dong, Y. Computation-efficient semisupervised learning for ecg-based cardiovascular diseases detection. arXiv preprint arXiv:2406.14377, 2024.)?
Claims And Evidence
The claims are convincing, as the study introduces a thorough experimental design and evaluation. The proposed method has been evaluated on multiple datasets with convincing performance impact and compared with multiple related state-of-the-art techniques. Finally, the proposed model design choices are supported by various ablation studies.
Methods And Evaluation Criteria
The proposed method has not been assessed on external databases, so it is impossible to assess the generalisability of the final classifiers, which was one of the key challenges in the 2020 PhysioNet Challenge.
Moreover, it would have been interesting to compare the proposed method with the approaches of the winning challenge team, providing a strong baseline.
Theoretical Claims
No
Experimental Designs Or Analyses
An assessment of the generalizability of the learned representation has not been performed; testing the classifier on an external database would have been interesting. Comparing their approach with other state-of-the-art techniques (2021 PhysioNet Challenge entries) would have been informative.
Supplementary Material
no
Relation To Broader Scientific Literature
It would have been interesting to compare the results with baseline approaches on the downstream tasks, and not only assess how the proposed technique compares with other SSL techniques.
It would have been also interesting to compare the performance of the proposed technique with CE-SSL (Zhou, R., Liu, Z., Clifton, L., Clifton, D. A., Chan, K. W., Zhang, Y.-T., and Dong, Y. Computation-efficient semisupervised learning for ecg-based cardiovascular diseases detection. arXiv preprint arXiv:2406.14377, 2024.), which has a very similar study design.
Essential References Not Discussed
The authors have omitted the literature of the past 2020 and 2021 PhysioNet/CinC Challenges on rhythm classification.
Other Strengths And Weaknesses
The novelty of the proposed work lies in the combination of existing state-of-the-art model fine-tuning techniques.
Other Comments Or Suggestions
- Font size of Figure 2 could be increased.
- Typo in L224? "Additionally, we tune the deep layers using the proposed mix-order optimization method…" -> shallow layers?
- Table 3: Teacher "None" might be confusing before reading the manuscript; add a short description in the table caption.
We sincerely thank the reviewer for all your questions and suggestions.
- Methods And Evaluation Criteria: To assess the generalizability of our classifiers in mobile cardiac healthcare, an external validation set consisting of 7000 wearable 12-lead ECG signals, provided by [1], is used for testing. We first utilized the G12EC, PTB-XL, Ningbo, and Chapman datasets for downstream training, where only 10% of labeled ECG signals are used to fine-tune the backbone. We employ two fine-tuning methods (Full FT and LoRA) and H-Tuning to train three teacher models, followed by knowledge distillation to create three corresponding student models. The CVD detection performance of the six classifiers on the external dataset is shown in Table R1.
Table R1 External validation on a wearable ECG dataset
| Methods | Macro AUC | Macro |
|---|---|---|
| Teacher Models | ||
| Full FT | 0.870 | 0.570 |
| LoRA | 0.879 | 0.579 |
| H-Tuning | 0.866 | 0.567 |
| Student Models | ||
| Full FT | 0.867 | 0.534 |
| LoRA | 0.874 | 0.543 |
| H-Tuning | 0.880 | 0.551 |
The results demonstrate that the teacher generated by H-Tuning achieves comparable performance to Full FT and LoRA, but with significantly less GPU memory consumption, as shown in our manuscript. Additionally, our student model performs better than the compared methods.
We cannot compare our classifiers with the winning challenge team on the PhysioNet 2020/2021 test datasets, which are not publicly available. However, we can report the performance of H-Tuning using the model designed by the winning team (see our response to Reviewer m5NF, Table R4, reference [2]).
[1] Lai, J., et al. Practical intelligent diagnostic algorithm for wearable 12-lead ECG via self-supervised learning on large-scale dataset. Nature Communications, 14(1), 3741.
- Experimental Designs Or Analyses: Please refer to the above section.
- Relation To Broader Scientific Literature: (1) We need to clarify that the proposed H-Tuning and the compared SOTA methods in our manuscript are all fine-tuning methods, which do not utilize unlabeled data for semi-supervised or self-supervised learning. In addition, the experiments presented in our manuscript are all conducted on the downstream datasets. External validation was performed on a wearable 12-lead ECG dataset (Table R1). (2) Comparisons between H-Tuning and CE-SSL: please see Questions For Authors.
- Essential References Not Discussed: We thank the reviewer for this comment. The downstream datasets used in our study were also included in the PhysioNet challenge. We will add this reference in the next version of our manuscript.
- Other Comments Or Suggestions: We are grateful for your valuable suggestions. We will correct the issues in the next version of our manuscript.
- Questions For Authors: (1) LoHO applies first-order optimization and zeroth-order optimization to different trainable parameters. It did not explore how to utilize their advantages in a more flexible way to fully optimize all parameters. Addax and the proposed mix-order optimization both utilize first-order gradients to refine the direction of the zeroth-order gradients. However, our mix-order optimization introduces a gradient normalization technique to regulate the norm of the zeroth-order gradients, which has not been explored by Addax. Our ablation studies in Section 3.4 demonstrate that the proposed technique has a clear positive impact on fine-tuning performance. At the same time, Table 1 in our manuscript demonstrates the superior performance of H-Tuning compared with Addax and LoHO. (2) The differences between CE-SSL and H-Tuning can be listed as:
a. CE-SSL is a semi-supervised method, which utilizes unlabeled data to achieve robust CVD classification performance. In contrast, H-Tuning is a supervised method focusing on low-cost and efficient fine-tuning and inference.
b. CE-SSL utilizes first-order optimization for model training, which needs extensive activation outputs for gradient backpropagation. H-Tuning avoids this drawback by designing a mix-order optimization method, which greatly reduces the GPU memory costs.
c. In our study, we integrated H-Tuning with a knowledge distillation technique to reduce the model inference costs on wearable devices, which was not explored by CE-SSL.
d. To ensure a fair performance comparison between CE-SSL and H-Tuning, we remove the semi-supervised learning module from CE-SSL while preserving the other modules, which makes CE-SSL a supervised fine-tuning method. We report their average performance across four downstream datasets (following the data split method and the backbone used in our manuscript).
Table R2
| Methods | Memory (GB) | MAP | Macro |
|---|---|---|---|
| CE-SSL (without unlabeled data) | 9.024 | 0.562 | 0.616 |
| H-Tuning | 1.453 | 0.535 | 0.600 |
The results show that H-Tuning performs similarly to CE-SSL with much less GPU memory costs.
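For readers unfamiliar with the mechanism discussed in this response, here is a minimal numpy sketch of the mix-order idea: a MeZO-style two-forward-pass (SPSA) zeroth-order estimate whose norm is rescaled against a first-order reference gradient before mixing. The mixing weight, normalization rule, and all function names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def spsa_grad(loss_fn, w, eps=1e-3, rng=None):
    """MeZO-style two-forward-pass (SPSA) zeroth-order gradient estimate."""
    rng = rng or np.random.default_rng(0)
    z = rng.standard_normal(w.shape)               # random perturbation
    scale = (loss_fn(w + eps * z) - loss_fn(w - eps * z)) / (2 * eps)
    return scale * z

def mix_order_step(loss_fn, w, fo_grad, lr=0.1, mix=0.5, rng=None):
    """One update: the first-order gradient refines the direction, and the
    zeroth-order estimate is renormalized to the first-order norm
    (hypothetical normalization rule and mixing weight)."""
    g_zo = spsa_grad(loss_fn, w, rng=rng)
    g_zo *= np.linalg.norm(fo_grad) / (np.linalg.norm(g_zo) + 1e-12)
    return w - lr * (mix * g_zo + (1 - mix) * fo_grad)

# toy quadratic: loss(w) = ||w||^2, true gradient = 2w
loss = lambda w: float(np.sum(w ** 2))
w0 = np.ones(4)
w1 = mix_order_step(loss, w0, 2 * w0)
```

On the toy quadratic, one mix-order step reduces the loss, since the normalized zeroth-order term always has a nonnegative projection onto the true gradient.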
I thank the authors for their responses.
I hope that external validation will be included in the revised manuscript.
Many thanks for taking the time to review our responses. The external validation (Table R1) will be included in the revised manuscript along with more evaluation metrics (Coverage, Ranking Loss, MAP, and Macro score).
Table R1 External validation on the wearable 12-lead ECG dataset.
| Methods | Ranking Loss | Coverage | Macro AUC | MAP | Macro | Macro |
|---|---|---|---|---|---|---|
| Teacher Models | ||||||
| Full FT | 0.137 | 5.595 | 0.870 | 0.600 | 0.314 | 0.570 |
| LoRA | 0.134 | 5.440 | 0.879 | 0.598 | 0.319 | 0.579 |
| H-Tuning | 0.141 | 5.484 | 0.866 | 0.575 | 0.312 | 0.567 |
| Student Models | ||||||
| Full FT | 0.135 | 5.462 | 0.867 | 0.566 | 0.287 | 0.534 |
| LoRA | 0.129 | 5.384 | 0.874 | 0.582 | 0.302 | 0.543 |
| H-Tuning | 0.127 | 5.297 | 0.880 | 0.598 | 0.311 | 0.551 |
This paper aims to detect cardiovascular disease by fine-tuning large-scale pre-trained models using ECG signals. It focuses on low-cost and efficient fine-tuning through a mix-order optimization with low-rank adaptation and a novel layer-dependent model update scheme. Then, a knowledge distillation technique is introduced for smart devices. Experiments on the G12EC, PTB-XL, Ningbo, and Chapman datasets show the proposed method achieves comparable or even better performance with lower time cost and reduced memory consumption.
update after rebuttal
I would like to keep the score.
Questions For Authors
NA
Claims And Evidence
- The proposed method is based on the ECG setting. But is there any reason this framework has to be specific to ECG rather than other signals like EEG? It would be better if the authors expanded the experiments to other signals to strengthen the core AI/ML contribution.
Methods And Evaluation Criteria
- They propose a framework (H-Tuning) developed to integrate mix-order optimization with low-rank adaptation and a novel layer-dependent model update scheme, enhancing both computational efficiency and robustness.
- The mix-order optimization provides a low-cost solution and does not require text data.
- The study is set in the context of smart devices. However, as I understand it, the data collected by smart devices include a significant amount of noise. Is there any preprocessing applied to remove different types of noise?
- Unclear writing:
- In line 023 of the abstract, what is meant by “a joint framework” and “a holistic method”? Do they mean a framework that includes both fine-tuning and downsampling tasks? What is being joined, and what subtasks should a non-holistic method focus on?
- In Table 1 and Table 2, what is the unit for Memory? MB? GB?
Theoretical Claims
The authors claim the effectiveness of the zeroth-order optimization and the mixed-order (zeroth-order and first-order) optimization.
Experimental Designs Or Analyses
- The authors investigate the performance of the student model under various lead configurations, considering that most mobile ECG devices have only 1–3 leads.
- The authors claim in the abstract that the computational costs for fine-tuning are unaffordable; however, the backbone in the proposed method only consists of 50 million parameters, which is easy to train from scratch. It would be better if the authors conducted experiments using a larger model; otherwise, this motivation is not well supported.
- The authors claim that for signal preprocessing, a band-pass filter (1–47 Hz) is applied to remove potential noise from the raw ECG recordings, such as power-line interference and motion artifacts (Section 3.1). It would be better if the authors had conducted additional experiments to show that the motion artifacts are removed.
- The authors claim that the student model meets the requirements for mobile cardiac healthcare. Are any of the datasets used collected by mobile devices? If yes, which dataset? If not, how can it be proven that this method has the potential to be embedded into mobile phones?
- It would strengthen the authors' argument if the authors included more ablation studies to demonstrate the effectiveness of the Low-Rank Adaptation module and the Model Update Scheme. A comparison between the current results and those obtained by removing the fine-tuning block would support their claims. Additionally, comparing the results with those from a standard layer (not the shallow and deeper layers) could provide further insights.
- In Table 1, MeZO and MeZO+LoRA only achieve about 0.5 AUROC, almost equivalent to random decisions. This is unusual and worth explaining.
- In Table 4, regarding Time/Iter, the efficiency of H-Tuning does not appear to be significantly improved.
Supplementary Material
NA - no additional information. No code is provided.
Relation To Broader Scientific Literature
It would be better if the authors could provide the reason for selecting this particular backbone. Additionally, it would be useful to know whether the proposed framework can also work with other backbones.
Essential References Not Discussed
NA
Other Strengths And Weaknesses
NA
Other Comments Or Suggestions
Some notations are unclear.
- In line 156, Equation (2), section 2.1, what does “η” represent?
- In line 159, Section 2.1, what does “E” represent?
We sincerely thank the reviewer for all your questions and suggestions.
- Claims:
The reasons for choosing the ECG setting: (1) The ECG community offers many open-access datasets for model training and evaluation. (2) Developing low-cost methods for accurate and mobile cardiac healthcare is an important topic in AI/ML for health. We apologize for not being able to extend the experiments to other signals within this timeframe, but we will include this as future work in the next version of the manuscript.
- Methods:
Q3: A bandpass filter (1-47 Hz) was used to filter out the noise, which is also adopted by [2].
[2] P. Nejedly et al., “Classification of ECG using ensemble of residual CNNs with attention mechanism,” in Proc. Comput. Cardiol., 2021, pp. 1–4.
Q4: An ideal joint framework should not only reduce the computational costs associated with fine-tuning and deploying pre-trained models but also maintain the performance of the fine-tuned models on downstream datasets. Therefore, under the constraint of achieving comparable fine-tuning performance to full fine-tuning, the joint framework proposed in our study includes two subtasks: (1) reducing the fine-tuning cost and (2) reducing the inference cost.
The unit for Memory is GB.
- Experimental Designs:
Q2: Yes, a backbone with 50 million parameters can be trained or fine-tuned on high-end GPUs (>8 GB memory). However, we want to point out that the costs for fine-tuning and inference become prohibitive on low-end devices (with 2-4 GB memory), which are affordable and typically deployed in clinics or home settings. Therefore, the motivation of our research is to maintain model performance under limited resources. We validated that H-Tuning achieves similar performance to Full FT while reducing GPU memory costs by 6.34 times, fulfilling our research objective. Additionally, we utilized a knowledge distillation technique to reduce the inference costs of the fine-tuned models, enabling them to be deployed on mobile devices (<20 MB memory).
Q3: Motion artifacts often behave as baseline wander. Theoretically, a bandpass filter (1-47 Hz) can filter out most baseline wander noise (generally below 1 Hz). We will add a comparison between the raw and the filtered signals in the next version of the manuscript.
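As an illustration of the band-pass step described above (a sketch, not the authors' actual code), a zero-phase Butterworth filter in SciPy removes sub-1 Hz baseline wander while leaving the ECG band intact:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(sig, fs, low=1.0, high=47.0, order=4):
    """Zero-phase Butterworth band-pass (the 1-47 Hz pipeline)."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, sig)

# synthetic 500 Hz trace: a 10 Hz in-band tone plus 0.2 Hz baseline wander
fs = 500
t = np.arange(0, 10, 1 / fs)
tone = np.sin(2 * np.pi * 10 * t)
raw = tone + 2.0 * np.sin(2 * np.pi * 0.2 * t)
clean = bandpass(raw, fs)
```

The wander, well below the 1 Hz cutoff, is strongly attenuated, while the in-band 10 Hz component passes nearly unchanged.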
Q4: In the previous manuscript, no dataset was collected by wearable devices. To support our claims, we conducted an external validation of the final classifiers generated by H-Tuning on a wearable 12-lead ECG dataset. Please refer to our response to Reviewer ChMh (Table R1).
Q5: There is no 'standard layer' in the proposed model update scheme. All layers are trainable. Without this scheme, all layers can be optimized with first-order optimization, such as Full FT or LoRA. However, we agree that we should include an ablation study on the low-rank adaptation. The results are shown in Table R3.
Table R3
| Method | Time/iter (s) | Memory (GB) | Macro |
|---|---|---|---|
| H-Tuning w/o low-rank adaptation | 0.358 | 2.002 | 0.571 |
| H-Tuning | 0.408 | 1.453 | 0.600 |
| Full FT | 0.401 | 9.212 | 0.605 |
Q6: The success of MeZO-based methods in fine-tuning pre-trained models relies on prompts [3]. However, prompts are not feasible in ECG analysis because there are no text inputs, making MeZO unable to find a stable optimization path. The MeZO paper also reported a performance collapse (30.3%-47.2% loss) in the absence of prompts.
[3] Malladi, S., et al. (2023). Fine-tuning language models with just forward passes. Advances in Neural Information Processing Systems, 36, 53038-53075.
Q7: H-Tuning integrates low-rank adaptation to reduce GPU memory consumption and decrease the variance of gradient estimation, which introduces extra time for the forward process. As shown in Table R3, training time is significantly reduced without this module, at the expense of increased memory costs.
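As background on the low-rank adaptation referenced above, here is a minimal numpy sketch of a LoRA-style layer. The shapes and zero-initialization of B follow the LoRA paper; the class name and hyperparameters (r, alpha) are illustrative assumptions.

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update scale * B @ A."""
    def __init__(self, w, r=4, alpha=8, rng=None):
        rng = rng or np.random.default_rng(0)
        d_out, d_in = w.shape
        self.w = w                                       # frozen pre-trained weight
        self.a = 0.01 * rng.standard_normal((r, d_in))   # trainable
        self.b = np.zeros((d_out, r))                    # trainable, zero-init
        self.scale = alpha / r

    def __call__(self, x):
        # trainable params: r * (d_in + d_out), vs d_in * d_out if unfrozen
        return x @ (self.w + self.scale * self.b @ self.a).T

w = np.eye(8)                      # stand-in for a pre-trained weight
layer = LoRALinear(w, r=2)
x = np.ones((1, 8))
y0 = layer(x)                      # B = 0 -> identical to the frozen layer
layer.b += 0.1                     # pretend one gradient step on B
y1 = layer(x)
```

Because B is zero-initialized, the adapted layer starts exactly at the pre-trained function, and only the small adapter matrices move during fine-tuning.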
- Supplementary Material: We will release our code after the publication of our manuscript.
- Relation To Literature: The architecture of the backbone used in our study is provided by [4], which validated its effectiveness on various datasets. We also verified that the performance of H-Tuning is comparable to Full FT on another backbone provided by [2].
Table R4
| Method | Params | Macro AUC | Macro |
|---|---|---|---|
| H-Tuning with backbone in [2] | 15.8M | 0.902 | 0.576 |
| H-Tuning with backbone in [4] | 50.4M | 0.913 | 0.600 |
| Full FT with backbone in [2] | 15.8M | 0.908 | 0.590 |
| Full FT with backbone in [4] | 50.4M | 0.918 | 0.605 |

[4] Zhou, R., et al. Computation-efficient semisupervised learning for ECG-based cardiovascular diseases detection. arXiv:2406.14377, 2024.
- Other Comments: η is the learning rate. E denotes the expectation of the zeroth-order gradients over the random perturbation vector.
- Note: All evaluation metrics are averaged across 4 downstream datasets and 4 seeds.
Additional comments for Q2: given that 50M is relatively small for modern pre-trained models, how does the method generalize to larger backbones? To demonstrate the effectiveness of H-Tuning and knowledge distillation in reducing fine-tuning and inference costs, comprehensive and rigorous experiments should be conducted across models of varying sizes, particularly larger-scale backbones. This would strengthen the generalizability of the proposed method and better support the motivation related to computational affordability.
Additional comments for Q3: Some experimental justification is needed; as mentioned in the papers below, motion artifacts cannot be removed by simple filtering, since the MA's frequency content overlaps that of the ECG.
- Lee S Y, Su P H, Hung Y W, et al. Motion artifact reduction algorithm for wearable electrocardiogram monitoring systems. IEEE Transactions on Consumer Electronics, 2023, 69(3): 533-547.
- Pholpoke B, Songthawornpong T, Wattanapanitch W. A micropower motion artifact estimator for input dynamic range reduction in wearable ECG acquisition systems. IEEE Transactions on Biomedical Circuits and Systems, 2019, 13(5): 1021-1035.
First of all, we are grateful for your time and effort in reviewing our paper.
Additional comments for Q2:
We conducted additional experiments to compare the performance of H-Tuning and the other fine-tuning methods using a larger backbone provided by [1]. The large backbone has 113.49M parameters, demonstrating a complexity comparable to RoBERTa-base (125M) and DeBERTaV3 (184M), both of which are commonly used in evaluating different fine-tuning methods for natural language processing tasks [2, 3]. The results shown in Table R5 demonstrate that H-Tuning achieves similar performance to Full FT and LoRA while using significantly less GPU memory (by 6.04x). Additionally, H-Tuning demonstrated better fine-tuning performance than Addax and LoHO, which are SOTA memory-efficient fine-tuning methods. In conclusion, these results provide direct evidence of the generalizability of H-Tuning to larger backbones.
Table R5
| Method | Params (M) | Memory (GB) | MAP | Macro |
|---|---|---|---|---|
| Full FT | 113.49 | 13.78 | 0.541 | 0.593 |
| LoRA | 3.20 | 12.94 | 0.545 | 0.599 |
| Addax | 113.49 | 3.488 | 0.501 | 0.563 |
| LoHO | 113.49 | 3.487 | 0.498 | 0.557 |
| H-Tuning | 3.20 | 2.28 | 0.540 | 0.615 |
We also investigated how the backbone size influences the knowledge distillation process. Specifically, we fine-tune different backbones using the proposed H-Tuning to generate the teacher models. Subsequently, the knowledge distillation method is applied to generate the corresponding student models, which contain only 0.26M parameters. Table R6 shows the performance of the student models, which indicates that an increase in teacher size can improve student performance.
Table R6
| Teacher Model | Student's MAP | Student's Macro |
|---|---|---|
| [4] (15.8 M) | 0.539 | 0.602 |
| [1] (50 M) | 0.544 | 0.603 |
| [1] (113.49 M) | 0.551 | 0.618 |
Note: All metrics are averaged across 4 downstream datasets and 4 seeds.
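The knowledge distillation step discussed above can be sketched as a Hinton-style loss: hard-label cross-entropy plus a temperature-scaled KL term toward the teacher's soft labels. This is a generic single-label sketch, not the paper's exact objective (the multi-label CVD setting would use per-class sigmoids, and λ and T here are hypothetical values).

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, labels, T=2.0, lam=0.5):
    """Hinton-style KD: lam * CE(hard labels) + (1-lam) * T^2 * KL(teacher || student)."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1).mean()
    ce = -np.log(
        softmax(student_logits)[np.arange(len(labels)), labels]
    ).mean()
    return lam * ce + (1 - lam) * (T ** 2) * kl

teacher = np.array([[2.0, 0.0, -1.0]])
student = np.array([[1.5, 0.2, -0.8]])
l = distill_loss(student, teacher, np.array([0]))
```

The KL term vanishes when the student matches the teacher, so the student is pulled toward the teacher's soft predictions while still fitting the hard labels.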
Additional comments for Q3:
There are two primary types of motion artifacts in wearable ECG signals. The most common type is baseline wander, which has no frequency content overlapping with the ECG and can be filtered out by a bandpass/highpass filter, as demonstrated in previous studies [4, 5]. In our research, we follow the pre-processing pipeline (bandpass filter) provided by the PhysioNet 2021 challenge winner [4]. The ECG classification model using this pre-processing pipeline ranked first among all models, justifying our choice of the pipeline in our study. On the other hand, there is another type of motion artifact, which is more complex and has frequency content overlapping with the ECG [6, 7]. We agree that simple filtering cannot remove it from the ECG. According to [8], three types of methods have the potential to remove it:
(1) Adaptive filtering, such as Least Mean Square (LMS).
(2) Wavelet Transform;
(3) Blind Source Separation (BSS).
According to [6, 7], LMS-based methods require a reference channel to remove the artifact, such as the electrode-tissue impedance. However, such reference is absent in the public ECG datasets we used. According to [8], the BSS-based method is computationally expensive and cannot meet our requirement for low-cost inference. Therefore, we choose the discrete wavelet transform (DWT) to pre-process the wearable ECG dataset [5] used in our study and compare it with the bandpass-based pipeline [4]. In Table R7, we present the performance of the H-Tuning with two pre-processing pipelines on the wearable ECG dataset. The results indicate that the DWT performs similarly to bandpass on CVDs detection using wearable ECG.
Table R7
| H-Tuning with | Teacher's AUC | Teacher's Macro | Student's AUC | Student's Macro |
|---|---|---|---|---|
| DWT | 0.862 | 0.562 | 0.876 | 0.541 |
| Bandpass (1-47 Hz) | 0.866 | 0.567 | 0.880 | 0.551 |
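To make the DWT option above concrete, here is a toy one-level Haar transform with soft-thresholded detail coefficients, in pure numpy. This is an illustrative sketch only; the actual pipeline would likely use a deeper multi-level wavelet decomposition (e.g. via PyWavelets), and the threshold value is an assumption.

```python
import numpy as np

def haar_dwt(x):
    """One-level Haar split into approximation (a) and detail (d) bands."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return a, d

def haar_idwt(a, d):
    """Exact inverse of haar_dwt."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def denoise(x, thr=0.5):
    """Soft-threshold the detail band, keep the approximation band."""
    a, d = haar_dwt(x)
    d = np.sign(d) * np.maximum(np.abs(d) - thr, 0.0)
    return haar_idwt(a, d)

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 512)
sig = np.sin(2 * np.pi * 5 * t)          # smooth, slowly varying component
noisy = sig + 0.3 * rng.standard_normal(512)
rec = denoise(noisy)
```

The transform is exactly invertible, and thresholding the detail band suppresses broadband noise while leaving the smooth component largely intact.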
[1] Zhou, et al. Computation-efficient semisupervised learning for ecg-based cardiovascular diseases detection. arXiv:2406.14377, 2024.
[2] Hu, et al. "Lora: Low-rank adaptation of large language models." ICLR 1.2 (2022): 3.
[3] Zhang, et al. "Adalora: Adaptive budget allocation for parameter-efficient fine-tuning." arXiv preprint arXiv:2303.10512 (2023).
[4] P. Nejedly et al., “Classification of ECG using ensemble of residual CNNs with attention mechanism,” in Proc. Comput. Cardiol., 2021, pp. 1–4.
[5] Lai, J., et al. Practical intelligent diagnostic algorithm for wearable 12-lead ECG via self-supervised learning on large-scale dataset. Nature Communications, 14(1), 3741.
[6] Lee S Y, et al. Motion artifact reduction algorithm for wearable electrocardiogram monitoring systems. IEEE Transactions on Consumer Electronics, 2023, 69(3): 533-547.
[7] Pholpoke, B., et al. A micropower motion artifact estimator for input dynamic range reduction in wearable ECG acquisition systems. IEEE Transactions on Biomedical Circuits and Systems, 2019, 13(5): 1021-1035.
[8] Berwal, Deepak, et al. "Motion artifact removal in ambulatory ECG signal for heart rate variability analysis." IEEE Sensors Journal 19.24 (2019): 12432-12442.
This paper received two accepts and one weak reject. While the core contribution of combining existing fine-tuning techniques for pre-trained ECG models may not be highly novel, the approach is well-motivated and shows strong potential for generalization beyond cardiovascular disease applications. The paper is methodologically sound, and its relevance to broader domains makes it a valuable contribution, although it lacks theoretical backup. Although one reviewer expressed limited enthusiasm due to the incremental nature of the work, the overall consensus supports acceptance. The authors are encouraged to clarify the broader impact and potential extensions of their method in the final version.