Does One-shot Give the Best Shot? Mitigating Model Inconsistency in One-shot Federated Learning
We propose a novel one-shot FL framework to address the model inconsistency in existing methods, thereby circumventing the "garbage in, garbage out" dilemma and achieving an average performance improvement of 10.86% without requiring auxiliary information.
Abstract
Reviews and Discussion
The paper investigates one-shot Federated Learning (OFL), which aims to reduce the communication costs associated with traditional multi-round Federated Learning. The authors highlight that existing OFL methods face significant challenges due to "garbage in, garbage out" issues, where inconsistent local models lead to a degraded global model. The proposed solution, denoted as FAFI, introduces Self-Alignment Local Training (SALT) and Informative Feature Fused Inference (IFFI) to create consistent features in one-shot local models and enhance global model inference. Through extensive experiments, FAFI outperforms existing OFL methods by a significant margin.
Questions for Authors
See weaknesses.
- Besides, to highlight the novelty, could you illustrate the detailed differences between the proposed method and previous works?
- In Line 69, "we observe that better performance is always accompanied by larger parameter discrepancies." However, the later analysis shows that a large discrepancy is not desirable and should be minimized (Line 258). Could you interpret this in more detail? I'm willing to increase my score if the above weaknesses and questions are addressed.
Claims and Evidence
Claims made in the submission are supported by clear and convincing evidence.
Methods and Evaluation Criteria
The proposed methods and evaluation criteria (e.g., benchmark datasets) make sense for the problem at hand.
Theoretical Claims
I checked the correctness of proofs for theoretical claims.
Experimental Design and Analysis
The experimental design is comprehensive. However, the feature fusion part might introduce extra communication or computation; the authors should provide a more detailed interpretation of this.
Supplementary Material
Yes.
Relation to Broader Literature
This work is related to one-shot federated learning. The experimental results reveal a general weakness of previous OFL methods. Figure 4 also provides a holistic view comparing the accuracy and communication costs of different OFL and multi-round FL methods.
Essential References Not Discussed
For feature alignment, several works also enhance feature quality, including [1,2,3], which likewise propose aggregating same-class features and repelling features of different classes. For prototype-based FL, works [1,4,5] are also related. The authors should consider adding them for discussion and clarifying how the proposed feature alignment and prototype learning differ from them. Moreover, in Section 4.3, what is the difference between the proposed feature fusion and FuseFL?
[1] Virtual Homogeneity Learning: Defending against Data Heterogeneity in Federated Learning. In ICML 2022. [2] Model-Contrastive Federated Learning. In CVPR 2021. [3] FedImpro: Measuring and Improving Client Update in Federated Learning. In ICLR 2024. [4] Fedproto: Federated prototype learning across heterogeneous clients. In AAAI 2022. [5] No fear of heterogeneity: Classifier calibration for federated learning with non-iid data. In NeurIPS 2021.
Other Strengths and Weaknesses
Strengths:
- The paper thoroughly identifies and analyzes the inconsistencies within and between one-shot local models, providing a theoretical foundation for the proposed method.
- The introduction of Self-Alignment Local Training (SALT) effectively addresses intra-model inconsistencies by fostering invariant feature representations.
- The use of Informative Feature Fused Inference (IFFI) on the server-side improves the aggregation process, leading to better global model performance.
- Extensive empirical validation across multiple datasets demonstrates the effectiveness of the FAFI framework in a variety of scenarios, establishing its robustness.
- The proposed solution achieves a notable accuracy improvement over baseline models, showcasing its potential impact on the field.
Weaknesses:
- The paper does not address the potential limitations and complexities of implementing the FAFI framework in real-world federated learning environments.
- There may be a lack of comprehensive comparisons with other state-of-the-art methods that also aim to reduce communication costs, beyond the eleven baselines included.
- Although the paper discusses privacy concerns, the utilized label information may still expose data privacy to some degree.
- For feature alignment, several works also enhance feature quality, including [1,2,3], which likewise propose aggregating same-class features and repelling features of different classes. For prototype-based FL, works [1,4,5] are also related. The authors should consider adding them for discussion and clarifying how the proposed feature alignment and prototype learning differ from them. Moreover, in Section 4.3, what is the difference between the proposed feature fusion and FuseFL?
- The paper could benefit from a more in-depth analysis of the computational efficiency and resource requirements of the FAFI framework compared to existing methods.
- Some formatting errors: in Line 216, the formula overflows the text box.
- Some writing issues: in Line 164, \Delta_intra is not defined in the main text. The augmentation function A is also unclear; why is an augmentation function A needed here? I guess the A in the appendix means the global dataset.
[1] Virtual Homogeneity Learning: Defending against Data Heterogeneity in Federated Learning. In ICML 2022. [2] Model-Contrastive Federated Learning. In CVPR 2021. [3] FedImpro: Measuring and Improving Client Update in Federated Learning. In ICLR 2024. [4] Fedproto: Federated prototype learning across heterogeneous clients. In AAAI 2022. [5] No fear of heterogeneity: Classifier calibration for federated learning with non-iid data. In NeurIPS 2021.
Other Comments or Suggestions
See weaknesses.
We would like to thank the reviewer for the time spent reviewing. We appreciate your recognition that our presentation is clear and our method is effective and robust. Please see our detailed feedback on your concerns below.
W1: Potential limitations and complexities of implementation.
Ans for W1: Thank you for your valuable comments. Indeed, FAFI may have limitations in handling domain shifts/multi-task scenarios. FAFI assumes that invariant feature representations should remain consistent or similar, enabling them to mitigate model inconsistencies. However, in domain shift/multi-task scenarios, feature representations often exhibit significant discrepancies across different domains or tasks, particularly in the isolated training paradigm.
A potential solution is to cluster and ensemble feature representations across various domains or tasks, allowing unbiased representations to encapsulate multi-domain/task knowledge.
Regarding implementation in real-world FL environments, as detailed in Algorithm 1 (Appendix C), FAFI requires only two modifications compared to existing OFL methods, making it easily adaptable for both clients and the server (a minimal sketch follows the list):
- On the client side, we train the feature extractor in a contrastive manner and use the prototypes instead of the classifier.
- On the server side, fusion is performed at the feature level rather than the parameter or prediction level.
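For illustration, a minimal, hypothetical PyTorch-style sketch of the two client-side changes is given below; the contrastive loss form, temperature `tau`, and weight `lam` are our own simplifying assumptions rather than the paper's exact objective:

```python
import torch
import torch.nn.functional as F

def salt_local_step(encoder, prototypes, x, y, augment, tau=0.5, lam=1.0):
    """Illustrative SALT-style step: contrastive feature alignment plus
    learnable class prototypes in place of a linear classifier.
    `prototypes` is a learnable (num_classes, dim) tensor."""
    z1 = F.normalize(encoder(augment(x)), dim=1)  # two augmented views
    z2 = F.normalize(encoder(augment(x)), dim=1)
    # Self-supervised alignment: pull the two views of each sample together.
    logits = z1 @ z2.t() / tau
    loss_ssl = F.cross_entropy(logits, torch.arange(x.size(0), device=x.device))
    # Prototype loss: classify by similarity to the learnable prototypes.
    proto_logits = z1 @ F.normalize(prototypes, dim=1).t() / tau
    loss_proto = F.cross_entropy(proto_logits, y)
    return loss_ssl + lam * loss_proto
```

At inference, a sample is then assigned to the class whose (fused) prototype is most similar to its feature, so no classifier parameters need to be aggregated.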
We will highlight these in our revised version.
W2: Comparisons with other methods
Ans for W2: Thanks for your important comments. We have conducted new experiments comparing with 3 methods, i.e., FedCompress, FedKD, and FedACG, which reduce the communication cost via gradient compression, gradient factorization, and transferring gradient momentum, respectively. These new results on CIFAR-10 are provided below and added to our revision. The results show that FAFI achieves better efficiency than all baselines.
| Methods | Acc. | Comm. Cost |
|---|---|---|
| FedCompress(C=1) | 15.45 | 9.86MB |
| FedKD(C=1) | 20.67 | 34.89MB |
| FedACG(C=1) | 19.77 | 44.7MB |
| FedCompress(C=50) | 33.34 | 0.48GB |
| FedKD(C=50) | 60.12 | 1.70GB |
| FedACG(C=50) | 68.23 | 2.18GB |
| Ours | 77.83 | 44.7MB |
W3: Privacy issue
Ans for W3: Thanks for your very valuable comments. We agree with the statement that there is no free lunch in the privacy-utility trade-off [1], as the tenet of FL is to seek a balance among efficiency, effectiveness, and privacy. FAFI, like most OFL methods, aims to ensure efficiency for affordable deployment while pushing the effectiveness boundary as far as possible and keeping privacy risk below an acceptable level. As for FAFI, the possible leakage is the prototypes, which previous works have shown not to compromise privacy. For stricter privacy requirements, one potential solution is implementing differential privacy or using noised samples, at the cost of some effectiveness degradation (a brief sketch of this option follows).
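To illustrate the differential-privacy option, a client could clip and perturb its prototypes before upload. The sketch below is the standard Gaussian mechanism; `clip_norm` and `sigma` are placeholder values, not tuned settings from the paper:

```python
import torch

def privatize_prototypes(prototypes, clip_norm=1.0, sigma=0.1):
    """Gaussian-mechanism sketch: bound each prototype's L2 norm, then
    add isotropic Gaussian noise before sending it to the server."""
    norms = prototypes.norm(dim=1, keepdim=True).clamp(min=1e-12)
    clipped = prototypes * torch.clamp(clip_norm / norms, max=1.0)
    return clipped + torch.randn_like(clipped) * sigma * clip_norm
```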
W4 & Q1: Differences from feature-enhanced and prototype-based methods
Ans for W4 & Q1: Thanks for your valuable comments. We present the differences between our proposed FAFI and the feature-enhanced and prototype-based methods as follows:
For feature-enhanced works, we note that FAFI is one-shot and selectively enhances features at the inference stage for high performance.
| Methods | Paradigm | Stage | Feature Enhancement |
|---|---|---|---|
| VHL | Mul. | Train | Generative |
| MOON | Mul. | Train | Contrastive |
| FedImpro | Mul. | Train | Estimative |
| FuseFL | One-shot | Train | Adaptor-based |
| Ours | One-shot | Inference | Selective |
For prototype-based methods, FAFI aligns global prototypes by aggregating learnable local prototypes in one round.
| Methods | Paradigm | Prototypes |
|---|---|---|
| VHL | Mul. | Generative |
| FedProto | Mul. | Statistical |
| CCVR | Mul. | GMM-based |
| Ours | One-shot | Learnable |
Mul. is short for multi-round. We will add these discussions in our revision.
W5: More in-depth efficiency analysis
Ans for W5: Thanks for your valuable comments. We have conducted the memory and computation cost analysis of FAFI. Due to limited space, please see the response to Reviewer Chua W1 & Q1.
W6 & W7 & Q2: Writing issues
Ans: Thanks for your careful reading.
- The \Delta_intra is the performance discrepancy between any two samples with the same label y. For any two samples x_i and x_j, we can find a function A that satisfies x_j = A(x_i); A abstracts the feature variation among samples (formalized after this list).
- The 'performance' in L69 refers to the performance of the local model: we observe that better local performance is always accompanied by larger parameter discrepancies (Fig. 2-b). The performance in L258 denotes the performance of the global model.
Sorry for these misunderstandings. We will revise these writing issues and format errors and provide more explanations for clarity.
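For concreteness, one plausible formalization of the clarification above (our paraphrase; the paper's exact definition may differ):

$$
\Delta_{\text{intra}} \;=\; \big|\, \ell\big(f(x_i),\, y\big) \;-\; \ell\big(f(x_j),\, y\big) \,\big|, \qquad x_j = \mathcal{A}(x_i),\;\; y_i = y_j = y,
$$

where $f$ is the local model, $\ell$ the loss, and $\mathcal{A}$ the augmentation-style function mapping between same-class samples.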
[1] No free lunch theorem for security and utility in federated learning. ACM TIST, 2022.
Thanks for the responses. The new clarifications are clear, and the differences from previous methods are now clear. I'd like to raise my score.
Dear Reviewer 8kHb:
We're grateful for your quick feedback during this busy period. We deeply appreciate your consideration in raising the score. Your constructive comments have significantly contributed to the refinement of our work. We will add all the above discussions in our final revision. Thanks a lot for your valuable comments!
We will remain open and ready to delve into any more questions or suggestions you might have until the last moment.
Best regards and thanks
Existing OFL methods focus on server-side aggregation, which falls into the 'garbage in, garbage out' pitfall. The authors unravel the root cause of such garbage inputs: intra-model and inter-model inconsistencies in the face of data heterogeneity. To address these, they design self-alignment local training and informative feature fused inference. Experimental results verify the effectiveness, scalability, and efficiency of the proposed method.
Questions for Authors
- It is unclear how L_ssl and L_proto contribute to mitigating the intra-model inconsistency.
- How do the weights in feature fusion impact the performance?
Claims and Evidence
Clear and convincing claims.
Methods and Evaluation Criteria
The method is suitable for the proposed model inconsistency problem. The evaluation criterion, accuracy on three classification datasets, is suitable.
Theoretical Claims
Correct theoretical claims. The manuscript contains two theorems, Theorem 3.1 for intra-model inconsistency and Theorem 3.2 for inter-model inconsistency.
Experimental Design and Analysis
I have assessed the soundness and validity of the experimental designs, and the authors have effectively validated the proposed method’s effectiveness.
Supplementary Material
There is no supplementary material.
Relation to Broader Literature
They provide a novel solution for mitigating model inconsistency. Existing OFL methods all focus on server-side aggregation based on inferior local models. This paper proposes to obtain high-quality local models through self-alignment local training and to utilize the extracted features for aggregation.
Essential References Not Discussed
Almost all essential references have been included and discussed.
Other Strengths and Weaknesses
Strengths:
- This manuscript proposes FAFI which achieves a good performance while requiring only one communication round.
- The paper is the first to systematically identify the "garbage in, garbage out" pitfall in OFL caused by intra- and inter-model inconsistencies. The visualization and theoretical analysis on the model inconsistencies are interesting.
- The authors utilize self-supervised learning methods and prototype learning for better local models and propose a feature fusion-based inference method. The feature fusion is novel in OFL.
- FAFI is data-free, which does not require the source data or any other auxiliary information.
Weaknesses:
- SALT consists of two parts, feature alignment and category-wise prototype learning; it is unclear how each part contributes to mitigating the intra-model inconsistency. Please provide an ablation study on L_ssl and L_proto.
- Noise impact in IFFI. How does the noise representation used for attention weighting in feature fusion (§4.3) impact the final performance? Please provide more evaluations to demonstrate the impact of the noise.
Other Comments or Suggestions
- Inconsistent descriptions of existing OFL method categories (Sec. 2 and Sec. 5)
We sincerely thank the reviewer for taking the time to review our work. We greatly appreciate your finding that our method is efficient and data-free. Please find our detailed responses to your concerns below.
W1 & Q1: Ablation study on L_ssl and L_proto.
Ans for W1 & Q1: Thanks for your important comments. As suggested, we have conducted new ablation studies on L_ssl and L_proto, and the results are shown below.
| Feature Extractor | Prototypes / Classifier | CIFAR-10 | CIFAR-100 | Tiny-ImageNet |
|---|---|---|---|---|
| w/o L_ssl | Classifier + L_CE | 17.34 | 6.45 | 8.31 |
| w/o L_ssl | Prototypes + L_CE | 18.23 | 7.12 | 8.92 |
| w/o L_ssl | Prototypes + L_proto | 52.34 | 33.34 | 23.12 |
| w/ L_ssl | Classifier + L_CE | 22.34 | 12.45 | 10.44 |
| w/ L_ssl | Prototypes + L_CE | 50.12 | 31.89 | 22.34 |
| w/ L_ssl | Prototypes + L_proto | 77.83 | 45.48 | 43.62 |
We note that the performance of FAFI is significantly improved by using L_ssl for enhanced feature representations and L_proto for discriminative prototypes, and the combination of L_ssl and L_proto achieves the best results. Additional details will be added in Appendix F in our revision.
W2 & Q2: Impact of the feature fusion strategy.
Ans for W2 & Q2: Thanks for your important comments. As suggested, we have conducted new evaluations on the impact of the feature fusion strategy and its noise-based attention weighting, and the results are shown below.
| Feature Fusion Strategy | CIFAR-10 | CIFAR-100 | Tiny-ImageNet |
|---|---|---|---|
| Average | 72.83 | 37.48 | 30.12 |
|  | 71.83 | 35.40 | 28.09 |
|  | 70.12 | 33.02 | 26.34 |
|  | 68.23 | 31.12 | 24.12 |
|  | 69.12 | 32.77 | 25.12 |
| (Ours) | 77.83 | 45.48 | 43.62 |
We note that our fusion strategy facilitates the best performance in the tested cases, as it effectively captures informative features compared to the other strategies. Additional details and discussions about this hyperparameter will be included in Appendix F in the revision.
Thanks for the authors' response. After reading the rebuttal, my main concerns have been addressed, and I will maintain my score.
Dear Reviewer BvHZ:
We're grateful for your quick feedback during this busy period. We deeply appreciate your positive assessment. Your constructive comments have significantly contributed to the refinement of our work. Thanks a lot for your valuable comments! We will remain open and ready to delve into any more questions or suggestions you might have until the last moment.
Best regards and thanks
This manuscript tries to solve the 'garbage in garbage out' problem caused by inconsistent models in the one-shot federated learning paradigm. They propose FAFI, a novel OFL framework consisting of two key components: SALT for invariant feature learning and IFFI for server-side feature fusion-based inference. Extensive experiments on CIFAR-10/100 and Tiny-ImageNet demonstrate FAFI’s superiority over 11 baselines, achieving a 10.86% average accuracy improvement.
Questions for Authors
- Please provide the details about the noise in IFFI.
- Please provide more settings to verify the efficiency of FAFI.
Claims and Evidence
The claims made in this manuscript are supported by clear and convincing evidence.
Methods and Evaluation Criteria
The proposed method and evaluation criteria are suitable for the problem.
Theoretical Claims
The theoretical claims in this paper are correct. However, there are some typos in the appendix, e.g., in line 653.
Experimental Design and Analysis
The experimental parts are well-organized, and the presentation in the main paper and appendix is good. It would be better to present more results on efficiency.
Supplementary Material
No supplementary material.
Relation to Broader Literature
The paper situates its contributions within the broader federated learning (FL) literature by empirically and theoretically analyzing the limitations of existing one-shot FL (OFL) methods. The paper explicitly differentiates FAFI from related approaches in Sec.4.4, including prototype-based methods and model merging techniques.
Essential References Not Discussed
The paper has cited the related works that are essential to understanding the key contribution.
Other Strengths and Weaknesses
Strengths:
- The paper is well-demonstrated and well-organized. The figures, such as Figure 1(a), Figure 2(a), and Figure 3, are good.
- The paper is well-motivated. The analysis of intra- and inter-model inconsistencies is supported by both empirical evidence (Grad-CAM visualizations) and theoretical proofs (Theorems 3.1 and 3.2).
- The experimental part is well-organized. Evaluations across diverse non-IID settings and clients scales over 11 baselines validate FAFI's effectiveness, scalability, and efficiency.
- The method significantly reduces communication overhead while outperforming multi-round FL baselines in efficiency-accuracy trade-offs (Figure 4). This aligns well with real-world FL deployment needs.
Weaknesses:
- Details on the noise in IFFI (Eq. 8) are unclear. It would be better to provide the code and hyperparameter settings.
- Efficiency in extreme non-IID settings, i.e., smaller values of the Dirichlet concentration parameter. Figure 4 only provides evaluations for a single heterogeneity level. Please provide more settings to verify its efficiency.
Other Comments or Suggestions
- Some typos in the appendix, e.g., in line 653.
- It is recommended that open-source code be provided to replicate the experimental results.
We sincerely thank the reviewer for taking the time to review our work. We greatly appreciate your finding that the paper is well-motivated and well-demonstrated. Please find our detailed responses to your concerns below.
W1 & Q1: Details in IFFI.
Ans for W1 & Q1: Thanks for your important comments. Sorry for the missing explanations about IFFI. We use the Gaussian distribution in Eq. 8. For clarity, we will add an ablation study on the hyperparameters. More details can be seen in the response to Reviewer BvHZ W2 & Q2 and will be added in Appendix E in the revision.
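For intuition, a hypothetical sketch of noise-aware, attention-weighted feature fusion is given below; the query construction, `sigma`, and `tau` are our own assumptions, and Eq. 8 in the paper defines the actual form:

```python
import torch
import torch.nn.functional as F

def fuse_features(client_feats, sigma=0.1, tau=1.0):
    """Hypothetical IFFI-style fusion for one test sample.
    client_feats: (num_clients, dim) features from each local extractor.
    Each client's feature is scored against a Gaussian-noise-perturbed
    mean feature, and the softmax scores weight the fused feature."""
    query = client_feats.mean(dim=0) + sigma * torch.randn_like(client_feats[0])
    scores = client_feats @ query / tau   # similarity to the noisy query
    weights = F.softmax(scores, dim=0)    # attention weights over clients
    return (weights.unsqueeze(1) * client_feats).sum(dim=0)
```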
W2 & Q2: More settings to verify the efficiency of FAFI.
Ans for W2 & Q2: Thanks for your important comments. We have conducted new evaluations on CIFAR-10 under two degrees of data heterogeneity, one extreme and one moderate non-IID setting, and the results are shown below.
| Methods | Acc. (extreme non-IID) | Acc. (moderate non-IID) | Comm. Cost |
|---|---|---|---|
| MA-Echo | 36.77 | 51.23 | 44.7 MB |
| O-FedAvg | 12.13 | 17.43 | 44.7 MB |
| FedFisher | 40.03 | 47.01 | 48.2 MB |
| FedDF | 35.53 | 41.58 | 44.7 MB |
| F-ADI | 35.93 | 48.35 | 44.7 MB |
| F-DAFL | 38.32 | 46.34 | 44.7 MB |
| DENSE | 38.37 | 50.26 | 44.7 MB |
| Ensemble | 41.36 | 45.43 | 44.7 MB |
| Co-Boosting | 39.20 | 58.49 | 44.8 MB |
| FuseFL | 54.42 | 73.79 | 53.32 MB |
| IntactOFL | 48.22 | 61.13 | 44.7 MB |
| FedAvg (C=50) | 23.45 | 27.44 | 2.12 GB |
| FedProx (C=1) | 13.53 | 17.58 | 44.7 MB |
| FedProx (C=50) | 23.32 | 26.76 | 2.12 GB |
| SCAFFOLD (C=1) | 12.45 | 16.23 | 89.4 MB |
| SCAFFOLD (C=50) | 27.22 | 30.45 | 4.36 GB |
| FedCav (C=1) | 12.49 | 16.77 | 44.8 MB |
| FedCav (C=50) | 26.45 | 30.23 | 2.12 GB |
| FedProto (C=1) | 12.11 | 16.23 | 44.7 MB |
| FedProto (C=50) | 28.31 | 32.55 | 2.12 GB |
| FedDC (C=1) | 11.32 | 15.23 | 44.7 MB |
| FedDC (C=50) | 30.23 | 44.23 | 2.12 GB |
| Ours | 71.84 | 77.83 | 44.7 MB |
C denotes the number of communication rounds. We note that FAFI can achieve competitive performance even in the extremely heterogeneous scenario while requiring only 44.7 MB of communication cost. Additional details will be included in Appendix F in our revised version.
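For reference, heterogeneity settings of this kind are commonly generated by Dirichlet label partitioning. Below is a standard sketch (our own illustration, not necessarily the paper's exact script; `dirichlet_partition` and its defaults are assumptions), where a smaller `alpha` yields a more extreme non-IID split:

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with Dirichlet(alpha) label
    skew; smaller alpha gives more extreme non-IID partitions."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        rng.shuffle(idx)
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for cid, part in enumerate(np.split(idx, cuts)):
            client_indices[cid].extend(part.tolist())
    return client_indices
```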
C1 & C2: Typos and open-source code
Ans for C1 & C2: Thanks for your comments. We will fix the typos in Appendix A, and we will open-source the code.
The paper addresses the critical challenge of model inconsistency in OFL due to heterogeneous data. They identify two key inconsistencies—intra-model and inter-model—and propose a novel framework, FAFI, which combines client-side self-aligned training (SALT) and server-side informative feature fusion (IFFI). Extensive experiments on three classification datasets demonstrate significant performance improvements over 11 baselines.
Questions for Authors
- My main concern with this work is its applicability to resource-constrained scenarios, such as edge or mobile environments, where more lightweight models (e.g., MobileNet) tend to be used. Can FAFI perform well on such lightweight models? Besides, the diverse computation capabilities across clients may prevent all clients from adopting models with the same architecture. Can FAFI support heterogeneous model architectures?
Claims and Evidence
The claims are clear and convincing.
Methods and Evaluation Criteria
FAFI consists of SALT and IFFI for intra- and inter-model inconsistencies, respectively. Evaluations on CIFAR-10/100 and Tiny-ImageNet are appropriate.
Theoretical Claims
I have checked the theoretical claims and proofs. Theorem 3.1 for intra-model inconsistency, Theorem 3.2 for inter-model inconsistency, and their proofs in the appendix are correct.
Experimental Design and Analysis
I have checked the experimental designs and analyses.
Supplementary Material
The authors have not provided the supplementary material.
Relation to Broader Literature
Unlike approaches that focus on server-side aggregation, this paper emphasizes improving local training strategies to achieve better models and mitigate the ‘garbage in, garbage out’ pitfall.
Essential References Not Discussed
References are essential and sufficient for understanding the key contributions.
Other Strengths and Weaknesses
Strengths:
- The proposed method is technically sound. Moreover, leveraging self-supervised methods and prototype learning is novel in OFL scenarios.
- Comprehensive survey on existing OFL methods and good discussion on analogous methods, such as prototype-based FL and model merging approaches.
- Significant performance improvement and extensive, sufficient evaluations.
Weaknesses:
- The computational cost of SALT's contrastive learning (e.g., batch size = 256) seems expensive for resource-constrained clients, which could be limiting in edge scenarios with restricted computational capabilities.
- The reliance on class-wise prototypes assumes a fixed and known number of classes, which may not hold in dynamic or open-set FL scenarios.
- Some notations in the theoretical analysis lack explanation.
- Federated learning under heterogeneous model scenarios is a very practical research problem, where the model architectures differ across clients. FAFI only considers the homogeneous model scenario.
Other Comments or Suggestions
No
We sincerely thank the reviewer for taking the time to review our work. We greatly appreciate your recognition of the proposed method as both technically sound and high-performing. Please find our detailed responses to your concerns below.
W1 & Q1: Applicability to resource-constrained scenarios.
Ans for W1 & Q1: Thanks for your important comments. We note that our proposed FAFI can support lightweight models and is applicable to resource-constrained scenarios. We have added statistics on the computational and memory costs with MobileNet and ResNet-18 on CIFAR-10.
| Methods | Memory Cost | Computation Cost (GPU / CPU) | Accuracy |
|---|---|---|---|
| MobileNet | | | |
| O-FedAvg | 1942 MB | 6s / 138s | 10.44 |
| IntactOFL | 1942 MB | 8s / 141s | 32.31 |
| Ours (B=32) | 1086 MB | 11s / 183s | 55.21 |
| Ours (B=256) | 1838 MB | 14s / 227s | 58.33 |
| Ours (B=512) | 3082 MB | 17s / 240s | 59.64 |
| ResNet-18 | | | |
| O-FedAvg | 4638 MB | 8s / 197s | 12.13 |
| IntactOFL | 4638 MB | 10s / 204s | 48.33 |
| Ours (B=32) | 2789 MB | 18s / 307s | 69.73 |
| Ours (B=256) | 8792 MB | 27s / 319s | 70.24 |
| Ours (B=512) | 14723 MB | 32s / 428s | 71.84 |
We use the GPU memory occupied during local training as the metric for memory cost, while computation cost is measured by the time taken per epoch on a GPU (RTX 4090) and a CPU (Intel Core i7-11700K). Here, B represents the batch size. We believe that FAFI can achieve competitive performance even in resource-constrained scenarios, requiring only 11 seconds per epoch on a GPU with less than 2 GB of memory, or approximately 3 minutes on a CPU. Additional details will be included in our revised version.
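Such measurements can be reproduced with standard PyTorch utilities. A minimal sketch under the stated setup (`profile_epoch` and `train_one_epoch` are hypothetical names; it records peak GPU memory and per-epoch wall-clock time):

```python
import time
import torch

def profile_epoch(train_one_epoch, use_cuda=True):
    """Measure wall-clock time and peak GPU memory for one local epoch."""
    if use_cuda:
        torch.cuda.reset_peak_memory_stats()
    start = time.perf_counter()
    train_one_epoch()                      # run the client's training loop once
    if use_cuda:
        torch.cuda.synchronize()           # wait for queued GPU work to finish
        peak_mb = torch.cuda.max_memory_allocated() / 2**20
    else:
        peak_mb = float("nan")             # CPU memory would be tracked separately
    return time.perf_counter() - start, peak_mb
```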
W2: Assumption on the number of classes, which may not hold in dynamic or open-set FL scenarios.
Ans for W2: Thanks for your comments. FAFI assumes that the number of classes is known in advance. However, in dynamic or open-set FL scenarios, the number of classes may change over time. We note that FAFI can be easily extended to these scenarios by extracting the invariant features of the new classes and learning a new discriminative prototype for them. We will include this discussion in our revision.
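A minimal sketch of this extension (hypothetical; `add_class_prototype` is our own illustration and assumes a frozen invariant feature extractor plus a few samples of the newly observed class):

```python
import torch
import torch.nn.functional as F

def add_class_prototype(prototypes, encoder, new_samples):
    """Append a prototype for a new class: embed its samples with the
    frozen invariant feature extractor and use their normalized mean."""
    with torch.no_grad():
        feats = F.normalize(encoder(new_samples), dim=1)
    new_proto = F.normalize(feats.mean(dim=0, keepdim=True), dim=1)
    return torch.cat([prototypes, new_proto], dim=0)
```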
W3: Some notions are not clear.
Ans for W3: Thanks for your comments. Sorry for the unclear notations. For clarity, 1) \Delta_intra is the performance discrepancy between any two samples with the same label y; 2) A refers to the function that abstracts the feature variation among samples. We will clarify these notations in the revision.
W4 & Q1: Model heterogeneity scenarios.
Ans for W4 & Q1: Thanks for your comments. We note that FAFI can support heterogeneous models. We have conducted new evaluations on heterogeneous models with five different architectures (LeNet, ResNet-18, VGG, MobileNet, ResNet-50) on CIFAR-10 under data heterogeneity, as shown below, and have added more details in Appendix F. We only report part of the results here due to the limited space.
| Client 0 | Client 1 | Client 2 | Client 3 | Client 4 | IntactOFL | Ours |
|---|---|---|---|---|---|---|
| LeNet | ResNet-18 | VGG | MobileNet | ResNet-50 | 48.33 | 71.84 |
| VGG | MobileNet | ResNet-50 | LeNet | ResNet-18 | 52.55 | 72.12 |
| ResNet-50 | LeNet | ResNet-18 | VGG | MobileNet | 54.12 | 72.45 |
Thanks to the authors for the further experimental analysis and for clarifying the rationale, which have addressed most of my questions and concerns. I appreciate this manuscript's interesting task, clear presentation, and extensive experiments.
Dear Reviewer Chua:
We're grateful for your quick feedback during this busy period. We deeply appreciate your consideration in raising the score. Your constructive comments have significantly contributed to the refinement of our work. Thanks a lot for your valuable comments! We will remain open and ready to delve into any more questions or suggestions you might have until the last moment.
Best regards and thanks
The paper addresses OFL's challenges caused by inconsistent models. Two technical innovations are proposed, achieving over 10% accuracy improvement.