PaperHub

NeurIPS 2025 · Poster · 4 reviewers
Overall score: 6.8/10
Ratings: 4, 4, 5, 4 (min 4, max 5, std 0.4)
Confidence: 2.5
Novelty: 2.8 · Quality: 3.3 · Clarity: 3.3 · Significance: 2.8

Robust Federated Finetuning of LLMs via Alternating Optimization of LoRA

OpenReview · PDF
Submitted: 2025-05-02 · Updated: 2025-10-29
TL;DR

RoLoRA improves federated fine-tuning through alternating optimization of LoRA, enhancing expressiveness and robustness. It reduces communication costs by half and outperforms alternatives.

Keywords

LoRA, Federated Learning

Reviews and Discussion

Review

Rating: 4

In this paper, the authors present RoLoRA, robust federated fine-tuning of LLMs via alternating optimization of LoRA. This is a novel federated learning framework that addresses the limitations of prior work on training LoRA adapters in distributed settings.

Strengths and Weaknesses

The strengths of this paper are summarized as follows:

  1. This paper is well-motivated. It provides a rigorous mathematical analysis, with proofs covering both the linear and non-convex cases, that shows why alternating optimization is necessary. The algorithm is also simple and effective, maintaining computational efficiency and exact model aggregation.

  2. The paper also provides thorough experimental results on multiple models, tasks, and federated settings.

The weaknesses of this paper are summarized as follows:

Alternating optimization has been extensively studied in many other areas, such as weighted low rank approximation and low rank matrix completion. The key contribution of this paper seems to be applying this alternating optimization to study federated LoRA. However, this paper does not make any fundamental changes to this algorithm. It would be better if the authors could compare this paper with alternating optimization in other fields and highlight the novelty of this paper.

I am not an expert in this field, so I hope the AC will put more weight on the other reviewers' comments.

Questions

Please see weaknesses.

Limitations

Yes

Justification for Final Rating

I have read the authors’ response, and to the best of my knowledge, it is convincing. All proofs appear to be correct. However, I have chosen to maintain my original score (Borderline Accept), which reflects a relatively neutral stance, as I am not an expert in this area.

Formatting Issues

No

Author Response

Thank you for your thoughtful feedback and for recognizing the strengths of our work, including the theory, simplicity, and thorough experiments. We appreciate your positive assessment.

W1: Need for Clearer Distinction from Prior Alternating Optimization Methods

While conceptually inspired by classic methods, to the best of our knowledge, no existing algorithm both alternates the optimization of components and alternates the aggregation of the corresponding components. This alternating strategy addresses the inexact aggregation issue unique to federated LoRA, as discussed in Section 2.3. RoLoRA differs in both its specific algorithmic design and its application focus, as discussed in Appendix A2.

First, we distinguish our work from the fields of matrix sensing and matrix completion. Unlike traditional methods [1,7] that assume centralized data, RoLoRA is explicitly designed for the decentralized federated learning setting. More importantly, our key algorithmic difference lies in the aggregation strategy itself. Prior work in federated matrix factorization [2] performs alternating updates only locally within each client. In contrast, RoLoRA introduces an alternating aggregation scheme at the server level, where the up- and down-projection matrices (B and A) are aggregated in alternating communication rounds. This alternating aggregation is critical for mitigating destructive interference between the LoRA components, a challenge unique to the federated LoRA context.
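The server-level alternation described above can be sketched as follows. This is a minimal illustration under our own naming (`rolora_round`, the `client_updates` dict layout), not the authors' reference implementation:

```python
import numpy as np

def rolora_round(server_A, server_B, client_updates, round_idx):
    """One communication round of the alternating aggregation scheme.

    In even rounds, clients train only A (down-projection) with B frozen;
    in odd rounds, only B (up-projection) with A frozen. Since the frozen
    factor is identical on every client, averaging the trained factor
    gives an exact aggregate of the product B @ A.
    """
    if round_idx % 2 == 0:
        # Clients trained A with B frozen; the server averages the A's.
        server_A = np.mean([upd["A"] for upd in client_updates], axis=0)
    else:
        # Clients trained B with A frozen; the server averages the B's.
        server_B = np.mean([upd["B"] for upd in client_updates], axis=0)
    return server_A, server_B
```

Because only one factor is transmitted and averaged per round, the per-round aggregation stays exact while halving the communication relative to sending both factors.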

Second, while RoLoRA shares a similar derivation framework with Multi-Task Representation Learning (MLRL) methods such as FedRep [3] and Vaswani et al. [4], it differs fundamentally in objective. MLRL aims to learn personalized models by keeping final layers diverse and unaggregated. In contrast, RoLoRA is designed to train a single, high-performing global model. This difference is reflected in Algorithm 2 (line 7) and in the theoretical analysis (Eqs. 35-40), where RoLoRA's global aggregation on b is captured through a union bound over each b_i, followed by transforming the bound to their average. Moreover, the architectural roles of the components differ: in FedRep, the local head acts as a down-projection, while in RoLoRA, built on the LoRA framework, the matrix B acts as an up-projection, mapping the adaptation back to the original feature space.

Finally, we differentiate RoLoRA from personalized federated learning algorithms that also use an alternating structure. These methods [5,6] are inherently designed to create separate parameters for each client, whereas RoLoRA operates entirely within the global model paradigm. This distinction leads to different motivations and theoretical approaches. Our alternating scheme is motivated by the structural decomposition of LoRA matrices, not the separation of personal and global parameters. Consequently, our proof technique is entirely novel in this context, drawing from matrix sensing theory to analyze the learning process, in stark contrast to the standard FL convergence analysis used in personalized FL literature.

[1]Jain et al., "Low-rank Matrix Completion using Alternating Minimization", STOC13

[2]Wang et al. "Federated Matrix Factorization: Algorithm Design and Application to Data Clustering", TIP22

[3]Collins et al. "Exploiting Shared Representations for Personalized Federated Learning." ICML21

[4]Vaswani et al. "Efficient federated low rank matrix recovery via alternating gd and minimization: A simple proof", TIT24

[5]Mishchenko et al. "Partially Personalized Federated Learning: Breaking the Curse of Data Heterogeneity", TMLR25

[6]Pillutla et al. "Federated Learning with Partial Model Personalization", ICML22

[7]Ward et al. "Convergence of Alternating Gradient Descent for Matrix Factorization", NeurIPS23

Comment

Thank you very much for the clarifications. I have no further questions and will maintain my original score.

Review

Rating: 4

This paper proposes RoLoRA, an alternating optimization method for LoRA in LLMs, and provides a thorough theoretical analysis to demonstrate the importance of training the down-projection matrix in federated LoRA updates.

Strengths and Weaknesses

Strengths:

  • The theoretical analysis is well-grounded. The use of both linear and non-linear models to demonstrate how the initialization of matrix A significantly affects the performance of FFA-LoRA via angle distance is insightful.

Weaknesses:

  • The proposed method is not inherently specific to LLMs and could be applied to any LoRA-based update, reducing its uniqueness in the LLM context.

  • The theoretical analysis focuses on linear models, which do not capture the autoregressive characteristics of modern LLMs.

Questions

  • It seems that you used 3, 20, and 50 clients in your experiments. Could you clarify the rationale behind choosing these specific values, especially the large jump from 3 to 20?

  • (Discussion) In my opinion, a core issue in FFA-LoRA is that random initialization does not guarantee that the subspace spanned by matrix A includes the optimal subspace spanned by A*. I believe that alternating optimization of A and B could mitigate this issue. However, it's worth noting that in standard LoRA implementations, A is typically initialized with Gaussian random values while B is initialized to zero. This suggests that A serves as a basis while B acts as a coefficient matrix. Therefore, treating A and B symmetrically may not be optimal. Is there a principled approach to gradually adapt matrix A toward the optimal basis while ensuring matrix B is updated effectively?

  • Is there any evaluation on how rank affects the performance gap between RoLoRA and FFA-LoRA in a centralized setting? Since angle distance may decrease as the rank increases, it would be valuable to understand whether higher ranks reduce the effectiveness gap between the two methods.
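The initialization asymmetry raised in the discussion question above can be seen in a standard LoRA layer. A minimal sketch (the `init_lora` name and the scale choice are illustrative, not from the paper):

```python
import numpy as np

def init_lora(d_out, d_in, rank, seed=0):
    """Standard LoRA initialization: A (down-projection) is Gaussian,
    acting as a random basis, while B (up-projection) is zero, acting
    as coefficients. The initial update B @ A therefore vanishes, so
    fine-tuning starts exactly from the pretrained weights."""
    rng = np.random.default_rng(seed)
    A = rng.normal(scale=1.0 / rank, size=(rank, d_in))
    B = np.zeros((d_out, rank))
    return A, B
```

With this convention, the subspace spanned by A is fixed at random unless A itself is trained, which is the reviewer's concern about FFA-LoRA's frozen-A design.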

Limitations

Yes, the paper has addressed the fact that the theoretical analysis centers on linear models.

  • Moreover, the theoretical analysis fails to capture the autoregressive architecture of modern decoder-only LLMs. As such, it may not be sufficient to support the claimed benefits in LLM training. If the authors are unable to extend the theory to autoregressive or non-linear models more representative of LLMs, I recommend placing greater emphasis on empirical evaluations, or supporting the claim of effectiveness on general neural architectures, rather than LLMs, through theoretical analysis.

Justification for Final Rating

The rebuttal addresses most of my concerns. The experiments on alternative strategies are insightful. I hope there will be further deep investigation into these insights in the camera-ready version. I'll raise my score.

Formatting Issues

  • In line 30,

However, these studies typically apply the FedAVG algorithm directly to LoRA modules, resulting in in-exact model updates, as we will discuss later in the paper.

could be revised for clarity. Since the limitations of directly applying FedAvg to LoRA are already well-known in the community, it might be more effective to succinctly state the issue and directly transition to introducing the methods that address it, without referring to future discussion in the paper.

  • On page 18, one of the equations exceeds the page width.

Author Response

We sincerely thank the reviewer for their thorough, thoughtful, and insightful feedback. We'll address each point below.

Q1: Client Number Selection

Our choice of 3, 20, and 50 clients was intended to comprehensively showcase RoLoRA's performance across different scales of FL scenarios, and specifically to highlight RoLoRA's advantage in achieving both faster convergence (Fig. 3) and higher final accuracy (Table 1) in large-scale settings. To provide a smoother transition and further strengthen our conclusions, we have now added experimental results for a 10-client setting with a rank-4 adapter. The results show that RoLoRA still outperforms the other methods; in the 10-client setting, RoLoRA's performance gain over the other methods falls between the gains observed in the 3-client and 20-client settings.

Acc (std)  | MNLI         | QQP          | QNLI
LoRA       | 81.48 (2.19) | 84.11 (0.14) | 87.73 (0.67)
FFA-LoRA   | 83.19 (0.64) | 84.35 (0.06) | 89.88 (0.13)
RoLoRA     | 84.95 (0.8)  | 85.25 (0.39) | 90.3 (0.76)

Q2: Asymmetry in LoRA and Basis Adaptation Strategy

We thank the reviewer for their deep and insightful question regarding the symmetric treatment of matrices A and B in RoLoRA. To empirically investigate whether an asymmetric update schedule might be preferable, we conducted new experiments comparing our standard approach against alternative strategies in a 50-client setting with a rank-4 adapter on QQP. Strategy 1 prioritizes updating B, performing three rounds of B updates for every round of A updates (i.e., B, B, B, A, ...), while Strategy 2 prioritizes updating the basis matrix A (i.e., B, A, A, A, ...). While A and B serve inherently asymmetric roles, we observe that aggressively updating either matrix without sufficiently adapting the other leads to misalignment. Furthermore, we considered even more fine-grained approaches to handling this asymmetry, such as assigning different learning rates to A and B. We adapted LoRA+'s strategy [4] to the federated setting by using a larger learning rate for B: Strategies 3 and 4 correspond to lr_B = 2 lr_A and lr_B = 4 lr_A, respectively. Our experimental results show that a symmetric approach with equal learning rates still yields the best performance.

It appears that a balanced, symmetric alternation between the matrices A and B yields the most stable and effective optimization trajectory. RoLoRA's design provides this balance and provides a robust strategy to update A and B matrices. We will add a remark clarifying this in the paper.

Round               | 1     | 2     | 3     | 4     | 5     | 6     | 7     | 8     | ... | 30
RoLoRA schedule     | B     | A     | B     | A     | B     | A     | B     | A     | ... | A
Test Acc.           | 42.48 | 63.18 | 77.77 | 79.01 | 80.75 | 82.10 | 82.52 | 83.4  | ... | 86.30
Strategy 1 schedule | B     | B     | B     | A     | B     | B     | B     | A     | ... | B
Test Acc.           | 42.48 | 63.18 | 77.73 | 78.82 | 78.77 | 79.42 | 81.54 | 81.86 | ... | 85.15
Strategy 2 schedule | B     | A     | A     | A     | B     | A     | A     | A     | ... | A
Test Acc.           | 42.48 | 63.18 | 63.18 | 63.18 | 68.96 | 74.80 | 74.88 | 75.01 | ... | 84.77
Strategy 3 schedule | B     | A     | B     | A     | B     | A     | B     | A     | ... | A
Test Acc.           | 47.54 | 63.18 | 65.86 | 74.92 | 79.56 | 80.56 | 81.26 | 81.99 | ... | 85.7
Strategy 4 schedule | B     | A     | B     | A     | B     | A     | B     | A     | ... | A
Test Acc.           | 63.18 | 63.18 | 65.23 | 71.41 | 74.66 | 76    | 77.05 | 79.38 | ... | 84.36
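The update schedules compared above can be generated as follows. A small sketch under our own shorthand (the pattern strings and the `update_schedule` name are ours, not code from the paper):

```python
def update_schedule(strategy, num_rounds):
    """Return which LoRA factor ("A" or "B") is trained in each round,
    for the update schedules compared in the ablation."""
    patterns = {
        "RoLoRA":     "BA",    # symmetric alternation
        "Strategy 1": "BBBA",  # prioritize updating B
        "Strategy 2": "BAAA",  # prioritize updating the basis A
    }
    pattern = patterns[strategy]
    # Repeat the base pattern cyclically over the training rounds.
    return [pattern[t % len(pattern)] for t in range(num_rounds)]
```

Strategies 3 and 4 reuse the symmetric "BA" schedule and instead scale the learning rate of B, so only the three cyclic patterns are listed here.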

Q3: Impact of Rank on RoLoRA vs. FFA-LoRA

To investigate the reviewer's hypothesis, we conducted a new set of experiments in a centralized setting as requested. We evaluated both methods on MNLI and QQP using 8 LoRA adapters attached to the query and value projections of the last 4 layers, trained for 50,000 iterations to ensure full convergence. We note that as the rank increases from 1 to 64, the gap between the two schemes decreases, as suggested by the reviewer. Another technique for narrowing the performance gap between the two schemes is increasing the number of adapters, as discussed in Section 5.1 ("Effect of Number of Fine-Tuning Parameters"). With sufficient adapters, FFA-LoRA can achieve peak accuracy comparable to RoLoRA. However, in federated settings where resources are constrained, RoLoRA is more advantageous.

Task | Method   | rank-1 | rank-2 | rank-32 | rank-64
QQP  | FFA-LoRA | 80.66  | 81.51  | 83.3    | 83.32
QQP  | RoLoRA   | 83.93  | 84.59  | 85.78   | 85.79
QQP  | Diff     | +3.27  | +3.08  | +2.51   | +2.47
MNLI | FFA-LoRA | 69.61  | 74.01  | 75.53   | 75.51
MNLI | RoLoRA   | 77.26  | 77.41  | 78.03   | 78.05
MNLI | Diff     | +7.65  | +3.40  | +2.5    | +2.54

Limitation: Theoretical Analysis for Autoregressive LLMs

Our theoretical analysis deliberately focuses on simplified models to provide foundational insights. The linear model provides a tractable case where different methods can be directly compared and their limitations formally established. We then empirically demonstrate that similar findings also apply to more complex neural networks and large language models. This approach aligns with standard practice in federated learning, where simplified yet representative linear models are used to isolate and understand the core mechanics of new methods [1,2]. Within this setting, we formally prove the mechanisms behind RoLoRA's improved performance and also provide a convergence analysis in non-convex loss landscapes.

It is also important to note that this simplification is common across the LoRA literature. To the best of our knowledge, no existing work on LoRA or its variants has provided a theoretical analysis that fully captures the autoregressive architecture of modern decoder-only LLMs. Prior theoretical analyses are typically limited to linear regression settings [4,6] or rely on assumptions such as a rank-one shift, rank-one LoRA, or a frozen B [7]. Our model is thus aligned with the current state of research [3,4,5,6], while addressing a more complex federated setting.

We view our extensive empirical evaluations on LLMs as the primary validation of our claims. The strong performance gains demonstrated in these experiments directly align with the predictions of our theory, showing that the foundational insights successfully transfer from the theoretical model to complex, practical applications.

[1]Collins, Liam, et al. "Exploiting shared representations for personalized federated learning." ICML21

[2]Collins, Liam, et al. "Fedavg with fine tuning: Local updates lead to representation learning." NeurIPS22

[3]Zhu, Jiacheng, et al. "Asymmetry in Low-Rank Adapters of Foundation Models." ICML24

[4]Hayou, Soufiane et al. "LoRA+: Efficient Low Rank Adaptation of Large Models." ICML24

[5]Zhang, Yuanhe et al. "LoRA-One: One-Step Full Gradient Could Suffice for Fine-Tuning Large Language Models, Provably and Efficiently." ICML25

[6]Zhang, Fangzhao et al. "Riemannian preconditioned LoRA for Fine-Tuning Foundation Models." ICML24

[7]Dayi, Arif Kerem et al. "Gradient dynamics for low-rank fine-tuning beyond kernels."

Revisions to Paper Formatting

We thank the reviewer for their careful reading and helpful suggestions for improving the paper's formatting and clarity. We will address both points in the revision. For line 30, we will revise the sentence to succinctly state the well-known issue of applying FedAvg to LoRA and transition directly to the methods that address it.

Comment

Thank you for your rebuttal, which addresses most of my concerns. The experiments on alternative strategies are insightful. I hope there will be further deep investigation into these insights in the camera-ready version. I'll raise my score.

Comment

Thank you for your kind response and for recognizing the value of our additional experiments. We’re pleased to hear that they provided helpful insights, and we will incorporate a deeper investigation of these findings in our next revision.

Review

Rating: 5

This paper describes a novel method for federated learning with LoRA that improves on prior-art performance. It belongs to a class of algorithms that address the shortcoming of directly applying standard federated optimization methods such as FedAvg to LoRA modules. LoRA's efficiency in module storage opens up impactful opportunities in federated learning applications where communication efficiency is crucial. However, brute-force combining of the two methods often leads to interference and poor performance, as theoretically and empirically shown in this paper; hence, a prior body of research has been dedicated to resolving the performance gap. This paper builds on prior novel strategies based on the treatment of the adaptation matrices A and B during federated optimization. The authors observe that much of the performance gap can be recovered by taking turns optimizing A and B in alternating communication rounds, versus prior methods that freeze A or B, or that optimize both A and B in every round.

Strengths and Weaknesses

Strengths:

  1. Quality and clarity: well written and comprehensive, well structured with detailed analysis and easy to follow
  2. Originality: novel method that wins in its simplicity and effectiveness
  3. Significance: especially applicable today where LLMs have found mainstream and user privacy is crucial

Weaknesses:

  1. Societal impact evaluation may be missing

Questions

  1. Could you comment on how the number of local steps impact the performance of RoLoRA and FFA-LoRA?
  2. Could you comment on the intuition behind why FedAvg + LoRA performs relatively well in Table 2? How does data heterogeneity impact performance of RoLoRA?

Limitations

No, except for one mention of the data privacy guarantee. My suggestion is to stress the significance of privacy protection for client data.

Justification for Final Rating

A simple, effective method, clearly written, with strong applicability.

Formatting Issues

NA

Author Response

We are deeply grateful to the reviewer for their thoughtful and encouraging reviews. Below, we address your insightful questions.

Q1: Effect of local steps

To provide a clear, empirical answer, we have conducted an ablation study on the number of local steps in a 50-client setting with rank-4 adapter. For a fair comparison, we kept the total computational budget (Local Steps * Total Rounds) constant across all settings. FFA-LoRA's performance consistently degrades on both datasets as the number of local steps increases. This indicates that with more local work per round, FFA-LoRA suffers severely from client drift, where local models overfit to their own data. RoLoRA maintains high performance across all settings.

Accuracy (std) by (local steps, total rounds):

Dataset | Method   | (1, 600)     | (5, 120)     | (10, 60)     | (20, 30)
MNLI    | FFA-LoRA | 72.52 (0.68) | 71.73 (1.27) | 69.64 (4.31) | 69.97 (5.57)
MNLI    | RoLoRA   | 84.39 (0.34) | 84.96 (0.18) | 84.79 (0.23) | 82.98 (3.36)
QQP     | FFA-LoRA | 80.51 (1.38) | 80.2 (1.65)  | 79.07 (1.21) | 78.44 (0.41)
QQP     | RoLoRA   | 85.24 (0.56) | 85.44 (0.8)  | 84.77 (0.77) | 85.71 (0.18)

Q2: FedAvg+LoRA Performance and Data Heterogeneity

We believe the relatively good performance of FedAvg+LoRA in Table 2 is partly due to the small number of clients (10). Our findings suggest that the performance of FedAvg+LoRA is more sensitive to the number of clients than to data heterogeneity alone. With fewer clients, model aggregation is more accurate; as the number of clients increases, the aggregation inexactness becomes more pronounced.

While RoLoRA, like all federated methods, is inevitably affected by data heterogeneity, it demonstrates greater robustness to client drift induced by non-IID data distributions. Evidence for this can be found in our ablation study on the number of local epochs, where RoLoRA maintains competitive performance under increased local update steps, which is known to exacerbate drift. By contrast, FFA-LoRA exhibits a sharper performance degradation in non-IID settings, aligned with our observation in the same ablation study. Further comparisons between i.i.d. and non-i.i.d. scenarios in the 10-client setting can be drawn from Table 2 and the additional experiments provided in response to Reviewer jkws under “Client Number Selection.” These observations highlight RoLoRA's improved resilience to heterogeneity, particularly in scenarios with significant local deviation.

Limitation: privacy protection for user data

Thank you for this important and constructive suggestion. While RoLoRA was not explicitly designed with privacy mechanisms such as differential privacy, we recognize that its ability to mitigate inexactness through alternating optimization and aggregation may help address a key challenge for DP-aware federated learning, where inexact model updates can be particularly problematic. We integrated NbAFL [1] with ε = 10 and δ = 1e-6 in a 3-client setting. RoLoRA outperforms the other methods on both tasks.

Method   | MNLI (std)   | QQP (std)
LoRA     | 72.79 (5.23) | 57.97 (8.23)
FFA-LoRA | 79.65 (0.56) | 78.16 (0.60)
RoLoRA   | 81.08 (0.81) | 81.64 (0.35)

[1]Wei, Kang, et al. "Federated learning with differential privacy: Algorithms and performance analysis." TIFS2020

Comment

Thank you for the rebuttal. I enjoyed the paper and the thoughtful rebuttal comments thoroughly, learning quite a bit along the way. I will maintain my score.

Review

Rating: 4

This paper proposes RoLoRA, a parameter-efficient fine-tuning method for federated fine-tuning of LLMs. RoLoRA learns the two LoRA low rank terms in an alternating minimizing style, achieving SoTA fine-tuned accuracy without increasing the communication cost.

Strengths and Weaknesses

As an emergency reviewer, I was assigned this paper without bidding. I apologize for my limited knowledge of federated learning and hope the following comments and questions are reasonable.

Strength

  1. Clear method description: the method and derivations in this paper are well organized
  2. Sufficient experiments: the paper covers both ideal toy experiments and practical problems, with additional experiments in the appendix.
  3. The improvements over the baselines are clear: the fine-tuning process of RoLoRA is more stable, and it achieves much higher model performance when the number of clients is large.

Weakness

Limited by my knowledge background, I did not observe weakness/limitations of this paper.

Questions

  1. According to line 288, comparisons against baselines are run under the same number of communication rounds. Does this mean roughly the same communication cost, i.e., the same amount of data transferred?
  2. In Table 1, the rank is 4 and 8, which seems smaller than the common values of 32 used in PEFT papers. Could the author explain the reason behind it, or show the results with higher rank?
  3. In Figure 3, could the author explain why the error band of MNLI accuracy is much wider than the other two baselines?

Limitations

Limited by my knowledge background, I did not observe weakness/limitations of this paper.

Justification for Final Rating

All of my concerns have been addressed. I would like to keep my current score due to my limited knowledge in this area.

Formatting Issues

None

Author Response

We sincerely thank the reviewer for their time and effort. We are grateful for the insightful questions, which we address below.

Q1: Clarification on Communication Rounds vs. Communication Cost

Thank you for this question. It allows us to clarify a key contribution of our work: RoLoRA's ability to achieve superior performance at a significantly reduced communication cost. Our emphasis on using the same number of communication rounds is intended to ensure a fair comparison within a typical federated learning setting. First, we note that in our paper, unless otherwise specified (as in Figure 11 in the appendix), the communication cost per round is by default different across the methods being compared, since RoLoRA and FFA-LoRA only need to transmit a single LoRA component in each round due to their freezing nature. In contrast, standard LoRA and FlexLoRA must transmit both LoRA components (A and B). This means that for a given rank, RoLoRA has half the communication cost per round compared to LoRA and FlexLoRA. Second, to make the comparison even more direct, in Figure 11 (Appendix), we compare all methods under an identical total communication budget (i.e., the same total megabytes transferred). To achieve this, we configured RoLoRA and FFA-LoRA with twice the number of LoRA adapters so their total communication cost matches that of FedAvg with LoRA. The results show that when the communication cost is equalized, RoLoRA can converge to a better accuracy.
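The per-round cost accounting described above can be made concrete with a small back-of-the-envelope helper. This is our own illustrative sketch (the function name, dimensions, and fp16 assumption are ours, not from the paper):

```python
def upload_mb_per_round(d, rank, num_adapters, factors_sent, bytes_per_param=2):
    """Per-client upload per communication round.

    Each LoRA factor of a d x d projection holds rank * d parameters.
    Standard LoRA and FlexLoRA transmit both factors per round
    (factors_sent=2), while RoLoRA and FFA-LoRA transmit only the
    currently trained one (factors_sent=1). fp16 parameters assumed.
    """
    return num_adapters * factors_sent * rank * d * bytes_per_param / 2**20

# For any fixed configuration, RoLoRA's per-round cost is half of LoRA's.
lora_mb = upload_mb_per_round(d=768, rank=4, num_adapters=24, factors_sent=2)
rolora_mb = upload_mb_per_round(d=768, rank=4, num_adapters=24, factors_sent=1)
```

Doubling the number of adapters for RoLoRA, as in Figure 11, makes the two totals equal, which is the equal-budget comparison described above.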

Q2: Rank-32 results

We chose rank-4 and rank-8 in Table 1 to reflect practical scenarios in federated settings, where communication and memory constraints on edge devices often necessitate smaller ranks.

To address the reviewer's concern, we conducted additional experiments with a rank-32 adapter in the 20-client and 50-client settings. The results, averaged over three runs, align with our previous observations: RoLoRA consistently outperforms the other methods at higher ranks when the number of clients is large.

We will include the rank-32 results in the revision.

Clients | Acc (std)  | MNLI         | QQP
20      | LoRA       | 79.72 (0.38) | 83.66 (0.02)
20      | FFA-LoRA   | 80 (0.47)    | 84.08 (0.31)
20      | RoLoRA     | 85.91 (0.63) | 86.37 (0.09)
50      | LoRA       | 70.84 (4.63) | 79.75 (0.31)
50      | FFA-LoRA   | 74.47 (1.57) | 80.65 (0.31)
50      | RoLoRA     | 85.46 (0.08) | 86.15 (0.26)

Q3: Error Band in Fig.3

The observed variance arises from a combination of training dynamics and task difficulty. As shown in Fig.3 (MNLI), none of the methods have fully converged—RoLoRA is nearing convergence, while LoRA and FFA-LoRA are still early in training. High variance across runs is expected in the intermediate phase due to different optimization paths from random initialization. This effect is also evident in SST-2 and QNLI.

Additionally, variance magnitude depends on task difficulty. MNLI's complexity leads to more diverse training trajectories and higher variance, whereas simpler tasks like SST-2, QNLI, or BoolQ show more stable results. We consistently observe greater variance on harder tasks like PIQA and SIQA.

Comment

Thanks to the authors for the clarification. My concerns have been addressed. I would like to keep my current score due to my limited knowledge in this area.

Comment

For clarity and simplicity, we will refer to Reviewers WEwx, jkws, 3VmG, 5j6K as R1, R2, R3, and R4, respectively, in the following response.

We sincerely thank all reviewers for their thoughtful and constructive feedback. We are encouraged by their recognition of the key strengths and contributions of our work.

In particular, we appreciate the acknowledgment of our rigorous theoretical analysis (R2, R4), with one reviewer finding it particularly 'insightful'. We also value the recognition of the clarity and comprehensive nature of our manuscript (R1, R3), and the novelty and effectiveness of RoLoRA in the federated LoRA context (R3). We are also grateful for the positive comments on our sufficient and thorough experimental results which demonstrate clear improvements over baselines (R1, R4).

We have carefully addressed each individual comment provided by the reviewers and believe we have successfully responded to their concerns. Below, we summarize the core responses to the reviewers' comments.

Updates of experimental results and in-depth discussions during Rebuttal

  • Added new experimental results for 10-client and rank-32 settings. To address questions about performance at different scales (R1, R2), we have included new results for a 10-client setting (rank-4) and a rank-32 setting (20 and 50 clients), demonstrating RoLoRA's consistent advantages.
  • Added ablation study on the impact of local steps. In response to R3, we conducted an ablation study showing RoLoRA's robustness to increased local computation, contrasting it with FFA-LoRA, which suffers from client drift.
  • Added new experiments analyzing the impact of rank in a centralized setting. As requested by R2, we investigated the performance gap between RoLoRA and FFA-LoRA across various ranks (1 to 64), confirming the gap narrows at higher ranks but RoLoRA maintains an advantage, especially in resource-constrained scenarios.
  • Added ablation study on asymmetric update strategies for LoRA matrices. To address R2's insightful question, we empirically tested multiple asymmetric update schedules and learning rates for matrices A and B, concluding that RoLoRA's balanced, symmetric approach yields the most stable and effective performance.
  • Added new experimental results on integrating differential privacy. Following R3's suggestion, we conducted experiments integrating RoLoRA with the NbAFL framework, demonstrating its superior performance in a privacy-preserving federated setting.
  • Appendix A2: Further clarified the novelty of RoLoRA with a detailed comparison to related work in areas like matrix completion, multi-task representation learning and personalized FL, explicitly distinguishing our contribution (in response to R4).
  • Added in-depth discussion on data heterogeneity and FedAvg+LoRA. In response to R3, we elaborated on why FedAvg+LoRA performs well in low-client-number settings and highlighted RoLoRA's superior resilience to data heterogeneity, referencing our new ablation studies.

We thank the reviewers for their constructive feedback, which has provided a clear path for improving our work. We will implement all the proposed clarifications and experiments in the next revision. In particular, we appreciate R2's valuable suggestion and will conduct a deeper investigation into the insights on alternative strategies, with the goal of including this extended analysis in the next revision.

We look forward to the reviewers' favorable consideration and remain grateful for their valuable feedback.

Final Decision

All reviewers suggest accepting the paper, so it will be accepted. Please address the remaining concerns in the camera-ready version.