PaperHub
8.9/10 · Spotlight · 4 reviewers
Ratings: 5, 4, 4, 5 (min 4, max 5, std 0.5)
ICML 2025

FedSSI: Rehearsal-Free Continual Federated Learning with Synergistic Synaptic Intelligence

OpenReview · PDF
Submitted: 2025-01-23 · Updated: 2025-07-24

Abstract

Keywords
Federated Learning, Continual Federated Learning, Data Heterogeneity

Reviews and Discussion

Official Review
Rating: 5

This paper introduces FedSSI, a novel regularization algorithm for continual federated learning that addresses the challenges of knowledge forgetting and data heterogeneity without replay. FedSSI is shown, both empirically and theoretically, to reduce computational overhead while outperforming state-of-the-art methods.

update after rebuttal

My concerns are mainly addressed during rebuttal. Thus, I will keep my positive rating.

Questions for Authors

  1. Generative AI is an emerging topic. Only classification tasks have been studied and evaluated in the current paper. Can FedSSI generalize well to generation tasks, e.g., using diffusion models?
  2. The proposed method depends on SI. If SI has inherent limitations or does not perform well under certain conditions, how can the performance of FedSSI be ensured?

Claims and Evidence

The claims in this paper are well-supported by clear and compelling evidence.

Methods and Evaluation Criteria

FedSSI is significant in addressing the current challenges in CFL by saving resources and alleviating heterogeneity.

Theoretical Claims

The theoretical claims in FedSSI are supported by clear proofs.

Experimental Design and Analysis

The paper presents a comprehensive evaluation with sufficient baselines across various datasets and scenarios.

Supplementary Material

The appendix provides the experimental settings and reports the resource costs as well as detailed results for each incremental task. This offers solid data support for the practicality of the method.

Relation to Broader Literature

NA

Essential References Not Discussed

References are high-quality and sufficient.

Other Strengths and Weaknesses

Strengths:

  1. The paper is well-written and easy to follow, with a clear motivation and a thorough discussion of the limitations of previous work.
  2. This paper is highly commendable for its pioneering exploration of CFL from the perspective of resource constraints. The research provides valuable insights and inspiration for future work.
  3. The proposed method FedSSI is well-motivated and technically sound.
  4. The experiment design is reasonable and comprehensive, and the analysis of results is thorough and exhaustive.

Weaknesses:

  1. Although the proposed method avoids data rehearsal, introducing the PSM could add storage overhead that poses challenges for edge devices. The authors should discuss the storage costs or provide strategies to mitigate this issue.

Other Comments or Suggestions

NA

Author Response

Thank you for your careful review and valuable comments. In the following, we give point-by-point responses to each comment.

Q1. Concerns about the storage overhead caused by PSM.

R1: Thank you for this constructive suggestion. The PSM will be trained along with the global model on the current local task. Since this is purely local training assisted by an already converged global model, the training of the PSM is very fast (accounting for only 1/40 of the training cost per task and requiring no communication). We calculate and save the parameter contributions during the local convergence process of the PSM, which can then be locally discarded after its contribution has been computed. Then, each client trains on the new task with the local model and parameter contribution scores.
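
A minimal PyTorch-style sketch of this step is given below; it assumes the standard SI path-integral importance computed over the PSM's local updates, and all function and variable names are illustrative rather than taken from the actual implementation.

```python
import torch

def train_psm_and_compute_contributions(psm, loader, loss_fn, epochs=1, lr=0.01, xi=0.1):
    """Train the PSM locally and accumulate SI-style per-parameter contribution scores.

    omega accumulates -grad * delta_theta along the optimization path (how much each
    parameter's movement reduced the loss); the final score normalizes omega by the
    total squared displacement, as in Synaptic Intelligence.
    """
    start = {n: p.detach().clone() for n, p in psm.named_parameters()}
    omega = {n: torch.zeros_like(p) for n, p in psm.named_parameters()}
    opt = torch.optim.SGD(psm.parameters(), lr=lr)

    for _ in range(epochs):
        for x, y in loader:
            prev = {n: p.detach().clone() for n, p in psm.named_parameters()}
            opt.zero_grad()
            loss_fn(psm(x), y).backward()
            opt.step()
            with torch.no_grad():
                for n, p in psm.named_parameters():
                    if p.grad is not None:
                        # running path integral of the loss decrease per parameter
                        omega[n] += -p.grad * (p.detach() - prev[n])

    with torch.no_grad():
        contributions = {
            n: omega[n] / ((p.detach() - start[n]) ** 2 + xi)
            for n, p in psm.named_parameters()
        }
    # The PSM itself can now be discarded; only `contributions` (and the anchor
    # weights used by the regularizer) need to be kept for the next task.
    return contributions
```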

In practice, the model used for an FL task is not very large, since edge clients are mostly resource-limited. The memory demand of the PSM is comparable to that of existing methods such as FedProx, which likewise maintains an extra model of the same size as the federated model.

Possible strategy: If an FL task involves large models and little memory is left on the edge device, it is reasonable to assume that one prerequisite is satisfied: the system's transmission capacity (possibly after optimization) is sufficient. In such a case, the PSM can be kept on the server and only downloaded and updated locally to compute the parameter contributions, then uploaded back to the server.

Q2. Combination of Generative AI and FedSSI.

R2: Thank you for raising this concern. Generative AI has emerged as a prominent research area in recent years, as it leverages the generalization capabilities of LLMs to enhance task performance and significantly improve productivity. However, even in centralized learning paradigms, continual learning for Generative AI remains hindered by multiple challenges. Unlike conventional backbone models, which primarily face catastrophic forgetting, Generative AI systems more frequently encounter difficulties in acquiring new knowledge. Additionally, the current practice of deploying Generative AI models on small edge devices through quantization and compression complicates continual learning under resource-constrained conditions. While continual federated learning has not yet seen published studies specifically addressing Generative AI, and FedSSI is designed for traditional CFL tasks, the growing deployment of edge devices equipped with Generative AI capabilities, together with their potential to collect real-world data from novel scenarios, underscores the urgent need for such research.

Q3. Dependence on SI Algorithm.

R3: Thank you for raising this concern. Although the FedSSI algorithm uses the PSM to improve the SI algorithm, the selection of SI itself is not arbitrary. In our empirical experiments, we analyzed a large number of existing CFL methods and techniques, as well as traditional CL techniques. Among them, SI was chosen as the most appropriate, and it has been widely recognized for its efficiency and feasibility in most scenarios. We believe this assumption is acceptable, much as many FL studies are built on CNN and ResNet-series networks without verifying the feasibility of their techniques on a single linear network.

Official Review
Rating: 4

This paper focuses on continual federated learning and systematically analyzes the resource consumption of existing works. The authors propose a resource-friendly method based on the SI algorithm, FedSSI, which balances local and global knowledge. Extensive experiments and analytical understanding are provided to verify its effectiveness.

update after rebuttal

After reviewing the author rebuttal and discussions, and given the quality of this paper and the thorough rebuttal, I vote for the acceptance of this paper. The authors have resolved my concerns regarding the experimental settings and paper details.

Questions for Authors

While FedSSI calculates the contribution of each parameter during the training process, will its performance and cost be affected by the number of model parameters? Nowadays, LLMs are a very hot topic. Can this method still work with LLMs?

Claims and Evidence

All claims are well supported by evidence.

Methods and Evaluation Criteria

Studying the lightweight federated continual learning is interesting and is crucial for the practical deployment in real-world applications. The proposed FedSSI is simple yet effective with abundant experiments and analysis.

Theoretical Claims

The analytical understanding is easy to read and solid.

Experimental Design and Analysis

The paper presents a comprehensive evaluation with sufficient baselines across various datasets and scenarios.

Supplementary Material

I have read the supplementary materials about experiments and settings.

Relation to Broader Literature

N/A

Essential References Not Discussed

References are sufficient.

Other Strengths and Weaknesses

Strengths:

  1. The paper is well-organized and easy to read.
  2. The experiments are adequate and sufficient.
  3. The research topic is interesting and may contribute to practical applications.
  4. The proposed method is easy to follow and seems promising.

Weaknesses:

  1. Although this article demonstrates a high level of quality, there are still minor typos in its presentation. Eq. (1) is not rigorous: the left-hand side of the equals sign should be w, and the authors should unify the notation for the model w. In Table 2, "CIFAI100" is a misspelling of CIFAR100.
  2. The authors should provide more details about the data partition and task setting. It would be better to expand on this, especially for readers who may not be familiar with continual federated learning.

Other Comments or Suggestions

N/A

Author Response

Thank you very much for providing us with positive comments. In the following, we give detailed responses to each review.

Q1. Concerns about the selection of the hyperparameter λ

R1: Thanks a lot for raising this concern. In Table 3, α refers to the degree of data heterogeneity, while λ is a control coefficient in the training process of the PSM. In Proposition 1, we show that adjusting λ controls whether the knowledge in the PSM leans towards the local or the global distribution (i.e., it is related to α). When α has a higher value, indicating a trend towards a homogeneous distribution, clients need to focus more on local knowledge; this means that by setting a larger λ value, the PSM can rely more on local knowledge. Although we cannot directly relate α and λ with a simple formula due to the complexity of the problem, even specialized research on personalized federated learning (PFL) relies on methods such as Gaussian mixture modeling for this purpose. Such a study goes beyond the resource-efficient CFL that we focus on in this manuscript, so we did not develop new mechanisms for this issue. In this paper, we can judge empirically and theoretically that there is a positive correlation between α and λ, which is supported by Proposition 1 and the extensive experiments in Table 3. We will consider this issue in our future research.
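
As an illustration of how λ can enter the PSM training, a minimal sketch is given below. It assumes a proximal-style objective in which λ weighs the local data term against closeness to the converged global model; the exact formulation in the paper may differ, and all names are illustrative.

```python
import copy
import torch

def train_psm(global_model, loader, loss_fn, lam=0.5, mu=1.0, lr=0.01, epochs=1):
    """Hypothetical PSM objective: lam * local data loss + (1 - lam) * proximal term.

    A larger lam lets the PSM fit the local distribution more closely (more *local*
    knowledge); a smaller lam keeps it nearer the converged global model (more
    *global* knowledge), matching the qualitative role of lambda described above.
    """
    psm = copy.deepcopy(global_model)
    anchor = [p.detach().clone() for p in global_model.parameters()]
    opt = torch.optim.SGD(psm.parameters(), lr=lr)

    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            data_loss = loss_fn(psm(x), y)
            prox = sum(((p - g) ** 2).sum() for p, g in zip(psm.parameters(), anchor))
            loss = lam * data_loss + (1.0 - lam) * (mu / 2.0) * prox
            loss.backward()
            opt.step()
    return psm
```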

Q2. Concerns about the framework of FedSSI and more details about CFL

R2: Thank you for this helpful comment. We agree that a framework diagram could improve accessibility for readers less familiar with CFL. However, the regularization-based methods in FedSSI are inherently theoretical, making it challenging to explicitly visualize their nuanced mechanisms (e.g., PSM step or SI module) within a high-level framework. To address this, we have included Algorithm 1, which details the iterative steps of FedSSI. We appreciate your suggestion and will consider providing the framework in our final version. Moreover, we will further provide the relevant experimental details in the supplementary: we assign different numbers of tasks to various datasets. Using CIFAR-10 as an example, we set five tasks for class-incremental tasks, each covering two classes without data overlap. For domain-incremental tasks, each domain represents one task.

Q3. Concerns about FedSSI's adaptability to heterogeneous models

R3: Thank you for raising this concern. FedSSI can still work under model heterogeneity, but this heterogeneity introduces novel challenges to CFL systems that have not yet been addressed. FedSSI's core mechanism utilizes the PSM module to address data heterogeneity and employs SI for continual learning. Notably, the SI algorithm is architecture-agnostic, as it operates by quantifying parameter contributions during gradient updates. Similarly, the PSM module, an initial copy of the local model, is independent of other clients' model architectures. However, model heterogeneity exacerbates system heterogeneity due to divergent feature representation spaces across architectures. As CFL is an emerging research area, existing studies have yet to address model heterogeneity systematically. The reviewer's suggestion highlights a promising research direction that we will prioritize in future work.

Reviewer Comment

The authors have addressed my concerns well. Therefore, I vote for the acceptance of this paper.

Author Comment

Sincere thanks for your response! We will further improve our manuscript later.

Best of luck!

Official Review
Rating: 4

The paper introduces a continual federated learning method, FedSSI, aimed at mitigating catastrophic forgetting without rehearsal. FedSSI employs a personalized surrogate model to strike a balance between global and local knowledge during training. Experimental results show that FedSSI outperforms other baselines.

Questions for Authors

For edge devices with limited resources, each device may employ models with different architectures. In such a situation, can FedSSI still maintain its advantage?

Claims and Evidence

The claims in the paper are clear and supported by convincing evidence.

Methods and Evaluation Criteria

The proposed method is technically sound, and the evaluation is sufficient, with various settings and advanced baselines.

Theoretical Claims

The theoretical analysis is explicit and easy to understand.

Experimental Design and Analysis

The experimental designs and statistical analyses are rigorous and valid.

Supplementary Material

The supplementary material is a comprehensive description of the settings and additional experiment results.

Relation to Broader Literature

N/A

Essential References Not Discussed

All essential references are included.

Other Strengths and Weaknesses

Pros:

  • It is meaningful to explore the training cost in CFL, where edge devices are often equipped with portable but weak hardware.
  • The paper is well-organized and easy to read.
  • This paper innovatively introduces the PSM and successfully addresses the issue of data heterogeneity in CFL at a low cost.
  • The authors have conducted extensive experimental validations across multiple datasets and CFL scenarios, demonstrating the effectiveness of FedSSI.

Cons:

  • The hyperparameter λ is set based on the data heterogeneity. An adaptive adjustment strategy could be discussed to enhance the robustness of FedSSI.
  • The authors can provide the framework of FedSSI for readers who are not so familiar with CFL. It will help enhance the understanding of the technique and CFL settings.

Other Comments or Suggestions

N/A

Author Response

Thank you very much for this professional review. The critical comments have been addressed carefully, and responses have been given one by one.

Q1. Minor typos in our manuscript.

R1: Thank you very much for this helpful comment. We apologize for the misspelling in Table 2 and will correct it. We will carefully polish our manuscript to further improve the presentation. In Eq. (1), we formulate the overall optimization objective of CFL: w^t denotes the converged global model, and the superscript t indexes the task (i.e., the number of tasks seen so far).
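
For readers less familiar with CFL, one common way to write such an overall objective with a single, unified model notation w is sketched below; this is only an illustrative form and not necessarily identical to Eq. (1) in the paper.

```latex
% Illustrative CFL objective (assumed form, not the paper's exact Eq. (1)):
% after task t, the global model w^t should perform well on all tasks seen so far,
% aggregated over the K clients' local datasets D_k^tau.
\min_{w^{t}} \;
\sum_{\tau=1}^{t} \sum_{k=1}^{K}
\frac{\lvert D_{k}^{\tau} \rvert}{\sum_{\tau'=1}^{t} \sum_{k'=1}^{K} \lvert D_{k'}^{\tau'} \rvert}
\, \mathbb{E}_{(x,y) \sim D_{k}^{\tau}}
\big[ \ell\bigl(w^{t}; x, y\bigr) \big]
```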

Q2. Insufficient description of experimental settings.

R2: Thank you for this valuable comment. We will further provide the relevant experimental details in the supplementary: we assign different numbers of tasks to various datasets. Using CIFAR-10 as an example, we set five tasks for class-incremental tasks, each covering two classes without data overlap. For domain-incremental tasks, each domain represents one task.
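
A small sketch of such a class-incremental split is shown below; it is illustrative only, and the actual preprocessing scripts may differ.

```python
import numpy as np

def class_incremental_tasks(labels, num_tasks=5, seed=0):
    """Split a labelled dataset into `num_tasks` tasks with disjoint class sets.

    For CIFAR-10 with num_tasks=5, each task covers two classes and no data
    is shared across tasks.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes = rng.permutation(int(labels.max()) + 1)
    per_task = len(classes) // num_tasks
    tasks = []
    for t in range(num_tasks):
        task_classes = classes[t * per_task:(t + 1) * per_task]
        task_indices = np.where(np.isin(labels, task_classes))[0]
        tasks.append({"classes": task_classes.tolist(), "indices": task_indices})
    return tasks
```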

Q3. Concerns about the LLM foundation for FedSSI.

R3: Thanks for raising this concern. LLMs have gained significant attention for their strong performance on conventional tasks, but their high computational and communication overheads prevent deployment on edge devices. Current CFL methods, including FedSSI, mainly focus on training with traditional architectures (e.g., CNN, ResNet), and we will later consider the LLM foundation for CFL in our future research. Moreover, calculating the contribution for each parameter incurs negligible computational overhead due to the PSM module. The PSM will be trained along with the global model on the current local task. Since this is purely local training assisted by an already converged global model, the training of the PSM is very fast (accounting for only 1/40 of the training cost per task and requiring no communication). We calculate and save the parameter contributions during the local convergence process of the PSM, which can then be locally discarded after its contribution has been computed. Then, each client trains on the new task with the local model and parameter contribution scores.

Official Review
Rating: 5

The paper introduces FedSSI, a regularization-based continual federated learning (CFL) method designed to address catastrophic forgetting and data heterogeneity without requiring data rehearsal or heavy computational overhead. It identifies limitations in applying traditional regularization techniques like Synaptic Intelligence (SI) to heterogeneous data in federated learning scenarios. To overcome this, FedSSI proposes a Personalized Surrogate Model (PSM) that leverages both local and global information to calculate a surrogate loss tailored to client-specific data heterogeneity effectively. Experiments conducted across multiple benchmarks—including CIFAR10, CIFAR100, Tiny-ImageNet, Digit10, Office31, and Office-Caltech-10—demonstrate that FedSSI significantly outperforms existing methods, achieving accuracy improvements of up to 11.52% across different scenarios.

Questions for Authors

1. Hyperparameter Selection: How sensitive is FedSSI to the choice of λ, particularly under realistic scenarios where data distribution shifts might be unpredictable? Could an adaptive strategy for tuning λ be implemented practically? Clarifying this would help assess the practical usability of FedSSI in dynamic scenarios.

2. Computational Overhead: While FedSSI is designed to be computationally efficient, does computing the surrogate model on resource-limited edge devices introduce computational overhead?

3. Client Participation: How does FedSSI's performance scale with increasing numbers of clients? Have you evaluated its robustness or convergence speed in highly scaled FL scenarios (hundreds or thousands of clients with partial participation)?

4. Theoretical Bound on λ: Is there a theoretical guideline or bound for selecting the optimal λ based on measurable properties of client data distributions (non-IID)?

Claims and Evidence

The claims made by the authors, particularly the effectiveness of FedSSI in handling data heterogeneity and mitigating catastrophic forgetting, are supported by substantial experimental evidence. Experiments cover diverse datasets, clearly demonstrating performance gains over baseline methods. The claim that FedSSI addresses limitations of traditional regularization methods (e.g., SI) is convincingly supported by experiments illustrating the superior performance of FedSSI under various levels of data heterogeneity.

Methods and Evaluation Criteria

The evaluation criteria and methods proposed (such as Class-Incremental and Domain-Incremental learning tasks, along with different data heterogeneity levels using Dirichlet distribution) are appropriate for validating the method in realistic CFL scenarios. The benchmarks and comparison baselines used in the evaluation make sense and cover a comprehensive range of existing approaches in the literature.
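
For readers unfamiliar with the Dirichlet-based heterogeneity protocol mentioned here, a minimal illustrative sketch of the standard label partition (not the paper's actual partitioning code) is:

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Assign sample indices to clients using a per-class Dirichlet(alpha) prior.

    Smaller alpha produces more skewed (heterogeneous) client label distributions;
    larger alpha approaches an IID split.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in range(int(labels.max()) + 1):
        idx_c = rng.permutation(np.where(labels == c)[0])
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cut_points = (np.cumsum(proportions)[:-1] * len(idx_c)).astype(int)
        for client_id, chunk in enumerate(np.split(idx_c, cut_points)):
            client_indices[client_id].extend(chunk.tolist())
    return [np.array(ci) for ci in client_indices]
```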

Theoretical Claims

The paper includes theoretical discussions about the convergence and effectiveness of the Personalized Surrogate Model (PSM). The authors provide a theoretical analysis of the personalized surrogate model's convergence. Proposition 1 and Theorem 1 are theoretically sound and based on prior established results. No specific proof issues or errors were identified upon review.

Experimental Design and Analysis

The experimental designs are thorough, clearly defined, and valid for the CFL tasks explored. The authors have conducted comprehensive experiments across multiple datasets and scenarios, including consideration of data heterogeneity levels, and the analyses provided (e.g., comparing test accuracy and communication efficiency) are sound and convincing.

Supplementary Material

The supplementary material mentioned includes appendices with additional experimental details, baseline descriptions, and hyperparameter settings. I reviewed these parts as described in the main paper, and they adequately support the primary results.

Relation to Broader Literature

The paper situates itself clearly within the broader literature on Continual Federated Learning and builds explicitly on Synaptic Intelligence (SI), extending this traditional continual learning technique to federated learning settings. It also positions itself relative to recent works addressing catastrophic forgetting in federated scenarios (FedWeIT, FOT, FedCIL, etc.), clearly articulating its contributions against existing CFL approaches.

Essential References Not Discussed

This paper reviewed relevant references.

Other Strengths and Weaknesses

Strengths

  1. The paper addresses a challenging and important problem in CFL: catastrophic forgetting without rehearsal.

  2. FedSSI extends synaptic intelligence for federated learning scenarios and heterogeneous data distributions.

  3. Extensive experimental validation convincingly supports the efficacy of the proposed method.

Weaknesses

  1. The personalized surrogate model introduces an additional local model update step, and while computational overhead is claimed to be minimal, practical implications of this overhead on low-resource edge devices might require further clarification.

  2. The balance hyper-parameter (λ) requires careful tuning; however, the paper doesn't provide a fully automatic or adaptive mechanism for setting it dynamically in real-world deployments.

Other Comments or Suggestions

The paper is well written. No such major suggestions on writing.

Author Response

Q1&Q4. Concerns about the selection of the hyperparameter λ and its theoretical bound

R1: Thank you for this valuable comment. We conducted relevant experiments in Table 3, where α refers to the degree of data heterogeneity and λ is a control coefficient in the training process of the PSM. In Proposition 1, we show that adjusting λ controls whether the knowledge in the PSM leans towards the local distribution or the global distribution (i.e., it is related to α). When α has a higher value, indicating a trend towards a homogeneous distribution, clients need to focus more on local knowledge; this means that by setting a larger λ value, the PSM can rely more on local knowledge.

Although we cannot directly relate α and λ with a simple formula due to the complexity of the problem, even specialized research on personalized federated learning (PFL) relies on strategies such as Gaussian mixture modeling for this purpose. Another possible approach can be adapted from APFL [1], where the optimal weights are empirically determined through iterative gradient descent during optimization. However, this process may introduce additional model parameters and computational overhead.
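
For reference, the APFL-style adaptive update of the mixing weight mentioned above can be sketched as follows; this is a simplified illustration of the idea in [1], with illustrative names and a projection to [0, 1] added for clarity.

```python
import torch

def apfl_alpha_step(local_params, global_params, mixed_grads, alpha, lr_alpha=0.01):
    """One gradient step on the APFL mixing weight alpha, where the personalized
    model is v_bar = alpha * v_local + (1 - alpha) * w_global.

    d(loss)/d(alpha) = <v_local - w_global, grad_loss(v_bar)>, so alpha drifts toward
    whichever component (local or global) currently lowers the loss more.
    """
    grad_alpha = sum(
        torch.sum((v.detach() - w.detach()) * g.detach())
        for v, w, g in zip(local_params, global_params, mixed_grads)
    )
    alpha = alpha - lr_alpha * float(grad_alpha)
    return min(max(alpha, 0.0), 1.0)  # keep the mixing weight in [0, 1]
```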

This specific study goes beyond the resource-efficient CFL that we focus on in this manuscript, so we did not develop new mechanisms for this issue. In this paper, we can empirically and theoretically judge that there exists a positive correlation between α and λ, which is supported by Proposition 1 and extensive experiments conducted in Table 3. We will consider this issue in our future research work.

[1] Deng Y, Kamani M M, Mahdavi M. Adaptive personalized federated learning[J]. arXiv preprint arXiv:2003.13461, 2020.

Q2. Concerns about the computational overhead brought by PSM.

R2: Thank you for raising this concern. The computational overhead of PSM is negligible compared to the overall CFL training process. The computational overhead of the PSM module scales proportionally with the complexity of learning tasks. For edge devices with limited computational capacity, their CFL tasks are typically simpler, thus the PSM’s computational demands scale down correspondingly. Specifically, the PSM will be trained along with the global model on the current local task. Since this is purely local training assisted by an already converged global model, the training of the PSM is very fast (accounting for only 1/40 of the training cost per task and requiring no communication). We calculate and save the parameter contributions during the local convergence process of the PSM, which can then be locally discarded after its contribution has been computed. Then, each client trains on the new task with the local model and parameter contribution scores. We analyze this issue in Line 232 in our submitted manuscript.
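
To make the last step concrete, below is a minimal sketch of how the saved contribution scores can regularize training on the next task, using the standard SI-style quadratic surrogate loss; the names are illustrative and not taken from the actual implementation.

```python
import torch

def surrogate_penalty(model, anchor_params, contributions, coeff=1.0):
    """SI-style quadratic penalty: parameters with high contribution scores from
    previous tasks are discouraged from drifting away from their anchor values.
    """
    device = next(model.parameters()).device
    penalty = torch.zeros((), device=device)
    for name, param in model.named_parameters():
        if name in contributions:
            penalty = penalty + (contributions[name] * (param - anchor_params[name]) ** 2).sum()
    return coeff * penalty

# During local training on a new task, each client would minimize something like:
#   loss = task_loss(model(x), y) + surrogate_penalty(model, anchors, scores)
```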

Q3. Concerns about the scalability of FedSSI.

R3: Thank you for your helpful comment. We apologize that we are unable to simulate thousands of clients for scalability experiments due to hardware limitations. However, we validated the scalability of FedSSI on 100 clients, which is also a common scale in FL work. We conducted further experiments by increasing the number of clients to 100 while reducing the client selection rate to 10%. We performed related experiments on CIFAR10 and Digit10 (α = 10.0), with the results as follows:

| Dataset | Metric | FedAvg | FL+EWC | Re-Fed | FOT | FedSSI |
| --- | --- | --- | --- | --- | --- | --- |
| CIFAR10 | A(f) | 18.67 | 19.93 | 19.44 | 21.26 | 23.61 |
| CIFAR10 | Ā | 45.80 | 46.38 | 44.08 | 47.02 | 47.14 |
| Digit10 | A(f) | 55.91 | 56.82 | 54.91 | 56.06 | 59.35 |
| Digit10 | Ā | 70.37 | 70.40 | 66.24 | 69.69 | 71.27 |

Since the dataset needs to be divided into different numbers of tasks, an excessive number of clients can lead to a very small number of samples per client, making model training difficult. However, FedSSI still maintains a leading position.

Final Decision

Thank you for submitting your work to ICML 2025, and for your efforts in the rebuttal phase to clarify the reviewers' concerns. Since all the reviewers affirmed the novelty of the authors' work, I recommend acceptance of the paper.