PaperHub
Rating: 6.4/10 · Poster · 4 reviewers
Scores: 4, 4, 4, 4 (min 4, max 4, std 0.0)
Confidence: 3.0
Novelty: 2.5 · Quality: 2.8 · Clarity: 3.0 · Significance: 2.5
NeurIPS 2025

Accelerated Vertical Federated Adversarial Learning through Decoupling Layer-Wise Dependencies

OpenReview · PDF
Submitted: 2025-05-08 · Updated: 2025-11-06

Abstract

Vertical Federated Learning (VFL) enables participants to collaboratively train models on aligned samples while keeping their heterogeneous features private and distributed. Despite their utility, VFL models remain vulnerable to adversarial attacks during inference. Adversarial Training (AT), which generates adversarial examples at each training iteration, stands as the most effective defense for improving model robustness. However, applying AT in VFL settings (VFAL) faces significant computational efficiency challenges, as the distributed training framework necessitates iterative propagations across participants. To this end, we propose the **_DecVFAL_** framework, which substantially accelerates **_VFAL_** training through a dual-level ***Dec***oupling mechanism applied during adversarial sample generation. Specifically, we first decouple the bottom modules of clients (directly responsible for adversarial updates) from the remaining networks, enabling efficient _lazy sequential propagations_ that reduce communication frequency through delayed gradients. We further introduce _decoupled parallel backpropagation_ to accelerate delayed gradient computation by eliminating idle waiting through parallel processing across modules. Additionally, we are the first to establish a convergence analysis for VFAL, rigorously characterizing how our decoupling mechanism interacts with existing VFL dynamics, and prove that _DecVFAL_ achieves an $\mathcal{O}(1/\sqrt{K})$ convergence rate matching that of standard VFL. Experimental results show that _DecVFAL_ ensures competitive robustness while achieving about a $3\sim10\times$ speedup.
Keywords
Vertical Federated Learning · Adversarial Training · Adversarial Sample · Robustness

Reviews and Discussion

Review (Rating: 4)

This paper proposes the DecVFAL framework to accelerate vertical federated adversarial training. DecVFAL decouples the clients' bottom modules and parallelizes backpropagation to eliminate idle waiting time. Experiments on four public benchmarks demonstrate that DecVFAL effectively balances computational efficiency with model robustness.

Strengths and Weaknesses

  • Strengths
  1. The paper is well-written and relatively easy to follow.
  2. There is extensive theoretical analysis of convergence.
  3. Well-designed ablation studies on the impacts of components and hyperparameters are conducted.
  • Weaknesses
  1. The problem setting seems niche, with only 2 clients in the main experiments. Even in the ablation studies, the maximum number of clients is only 7.
  2. The model deployed on the server side is kept as a single-layer perceptron. Analysis or justification for this choice would be helpful.
  3. The speedup depends on the split position. Training time increases as more layers are included in the bottom module, which restricts the architecture.

Questions

  1. Will deploying more complex models on the server side, such as a multi-layer perceptron, affect the results?
  2. Typo: "single-layer perception" in line 859

Limitations

yes

Final Justification

The rebuttal effectively addresses the main concerns, particularly on client scaling and server model analysis. Therefore, I maintain my score of 4.

Formatting Issues

N/A

Author Response

Thank you for your insightful comments and for recognizing the contribution of our work. Your feedback has been invaluable in enhancing the quality and clarity of our manuscript. We reply to the weaknesses and questions.

W1-Client Scale:

Experiments on More Clients. Thank you for this valuable feedback. We have conducted additional experiments on MNIST with 14 and 28 clients as shown in Table 1. These results demonstrate that DecVFAL maintains strong robustness and computational efficiency across different client configurations.

Table 1: Results with Varying Client Numbers (MNIST)

| No. Clients | Method | Clean | FGSM | PGD | AA | Time (h) ↓ | Speedup (vs PGD) |
|---|---|---|---|---|---|---|---|
| 14 | PGD | 96.79 | 86.33 | 85.16 | 78.31 | 3.09 | - |
| 14 | FreeAT | 97.72 | 65.10 | 67.50 | 47.48 | 2.47 | 1.25× |
| 14 | YOPO | 96.74 | 83.64 | 83.19 | 74.50 | 1.77 | 1.74× |
| 14 | DecVFAL | 97.11 | 89.63 | 86.40 | 79.44 | 0.71 | 4.35× |
| 28 | PGD | 95.95 | 88.53 | 87.16 | 80.27 | 3.59 | - |
| 28 | FreeAT | 96.83 | 82.39 | 83.00 | 65.92 | 0.87 | 4.12× |
| 28 | YOPO | 95.93 | 89.09 | 88.86 | 83.75 | 0.83 | 4.32× |
| 28 | DecVFAL | 96.00 | 89.99 | 88.04 | 82.63 | 0.73 | 4.92× |

W2&Q1-Server model:

  1. It is worth noting that 'server' and 'client' represent functional roles rather than physical deployment constraints. The single-layer perceptron for the server model is a standard configuration in VFL research, following established VFL frameworks in which the server typically handles final classification while clients process feature-specific computations [1-3].

  2. Thank you for this insightful question. We tested DecVFAL (M=6, N=8) on the MNIST dataset with various server-side models: 1-layer, 4-layer, and 16-layer perceptrons. The experimental results reveal that as the server model becomes more complex (with more layers), it tends to degrade the entire framework's parallel efficiency. This happens because the server acts as a single module, and when it takes longer to process, all other modules must wait for its output before proceeding.

Table: Results with Different Server Model Complexities

| Server Model | Method | Clean | FGSM | PGD | AA | Time (s/epoch) ↓ |
|---|---|---|---|---|---|---|
| 1-Layer | DecVFAL | 98.10 | 90.40 | 91.93 | 89.58 | 9.866 |
| 4-Layer | DecVFAL | 98.35 | 90.57 | 91.70 | 88.74 | 9.986 |
| 16-Layer | DecVFAL | 97.75 | 89.78 | 89.60 | 83.97 | 13.218 |

W3-Split Position:

We acknowledge this important practical consideration. When one module contains significantly more layers, other modules must idle waiting for its output, reducing overall efficiency as demonstrated in Table 6. However, our framework provides flexibility to optimize split positions based on computational balance. We recommend treating server-client communication as a module boundary and ensuring each module's computation time is balanced and slightly greater than the communication delay. This design enables modules to simultaneously send embeddings while receiving gradients, effectively eliminating communication-induced idle time and achieving maximum acceleration benefits.
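A rough feasibility check for this guideline, as a minimal sketch: communication can be fully overlapped only if every module's compute time is at least the server-client round-trip delay, so embeddings and gradients can be exchanged while the next module is still computing. The helper name and all timing figures below are illustrative, not measurements from the paper.

```python
def split_hides_communication(module_times_s: list[float],
                              comm_delay_s: float) -> bool:
    """True if every module computes long enough to hide one link round-trip."""
    return all(t >= comm_delay_s for t in module_times_s)

# Balanced split: each module ~12 ms of compute vs. a 10 ms link delay.
print(split_hides_communication([0.012, 0.011, 0.013], comm_delay_s=0.010))  # True
```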

Q2-Typo:

Thank you for catching this typo. We will correct "single-layer perception" to "single-layer perceptron" in line 859.

[1] LIU, Yang, et al. Vertical federated learning: Concepts, advances, and challenges. IEEE transactions on knowledge and data engineering, 2024, 36.7: 3615-3634.

[2] WEI, Kang, et al. Vertical federated learning: Challenges, methodologies and experiments. arXiv preprint arXiv:2202.04309, 2022.

[3] WANG, Ganyu, et al. A unified solution for privacy and communication efficiency in vertical federated learning. Advances in Neural Information Processing Systems, 2023, 36: 13480-13491.

Comment

Thanks for the authors' clarification. My concerns have been addressed.

Review (Rating: 4)

This paper introduces DecVFAL, a novel framework that addresses the computational efficiency challenges in VFAL through an innovative dual-level decoupling mechanism. Specifically, the framework first employs "lazy sequential backpropagation" to decouple the client bottom module from the rest of the network, reducing communication frequency between participants; second, it implements "decoupled parallel backpropagation" to enable asynchronous computation across modules, eliminating idle waiting time. The authors provide the first convergence analysis for VFAL, proving that DecVFAL achieves an O(1/√K) convergence rate matching standard VFL. Experimental results demonstrate that DecVFAL achieves 3-10x training acceleration compared to existing methods while maintaining model robustness.

Strengths and Weaknesses

Strengths

  1. The paper addresses a critical computational bottleneck in VFAL, which has significant implications for practical deployment. The proposed dual-level decoupling mechanism represents a novel approach that cleverly resolves the inherent sequential dependencies in VFAL.

  2. The paper provides the first convergence analysis for VFAL, considering the interaction of multi-source approximation gradients, decoupling mechanisms, and VFL architecture, which is a significant contribution to the theoretical foundation.

  3. The work includes extensive evaluations across multiple datasets, different model architectures, and various attack types, demonstrating the effectiveness and generalizability of the method. The detailed analysis of key parameters (M, N), number of modules, and split positions provides valuable insights into the method's design principles.

Limitations and Questions

  1. While the paper mentions differential privacy guarantees, there is limited analysis of whether the decoupling mechanism (particularly delayed gradients) introduces new privacy vulnerabilities. Could malicious participants extract additional information from these delayed gradients?

  2. The proposed decoupling strategies are primarily based on feed-forward network architectures. How would these strategies adapt to models with complex structures (e.g., Transformers or networks with dense skip connections), particularly when these structures are distributed across different participants?

  3. While DecVFAL reduces the number of communication rounds, each round may transmit more information. How does this trade-off affect overall performance in bandwidth-constrained environments?

  4. The experiments indicate that M and N parameter choices significantly impact performance. Is it possible to develop adaptive strategies that dynamically adjust these parameters based on task characteristics, communication costs, and computational resources?

Questions

See above.

Limitations

See above.

Formatting Issues

None

Author Response

We sincerely thank you for your insightful comments and constructive criticisms. Your feedback has been invaluable in improving the quality and clarity of our manuscript. Below, we address the weaknesses and respond to your questions.

W1-Privacy Analysis:

  1. Delayed gradients do not introduce new privacy vulnerabilities. They transmit the same content as immediate gradients, differing only in transmission timing. Malicious participants cannot extract additional private information from delayed gradients beyond what they could already obtain from immediate gradients.

  2. DecVFAL operates within the VFL-CZOFO framework [1], which provides differential privacy through zeroth-order optimization. The delayed gradients undergo the same privacy-preserving transformations as standard gradients, maintaining equivalent protection levels.

  3. While AT increases communication rounds and may require a larger privacy budget compared to standard VFL-CZOFO, we will provide detailed privacy analysis in the revised version to quantify this trade-off between robustness and privacy.

W2-Model with Complex Architectures:

Decoupling easily adapts to complex architectures by treating functional blocks as module boundaries [2-6]. For complex structures like Transformers and residual networks, we can treat entire functional blocks (including their attention or residual connections) as single dynamical units. This preserves the sequential flow while enabling parallel processing. Our ResNet-18 experiments illustrate this concept: residual blocks naturally work as complete dynamical units, maintaining both the residual connections and the dynamical systems framework (footnote 4, page 7).
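To make the block-boundary idea concrete, here is a minimal sketch assuming a torchvision ResNet-18; the grouping into four modules is an illustrative choice, not the paper's exact split.

```python
import torch.nn as nn
from torchvision.models import resnet18

net = resnet18(num_classes=10)
modules = nn.ModuleList([
    # Each entry is one "dynamical unit"; residual connections stay
    # inside their blocks, so decoupling never cuts through them.
    nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool, net.layer1),
    net.layer2,
    net.layer3,
    nn.Sequential(net.layer4, net.avgpool, nn.Flatten(), net.fc),
])
```

The same pattern would apply to a Transformer by treating each encoder block (attention plus MLP, with its internal skip connections) as one unit.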

W3-Transmitted Content:

DecVFAL and standard VFAL transmit the same content (i.e., embeddings in forward propagation and gradients in backward propagation). DecVFAL achieves efficiency by reusing gradients to reduce the total number of communication rounds.

W4-The selection of M and N:

  1. The optimal M and N values depend on many factors: dataset size and features, model structure, and training settings such as the learning rate and adversarial step size. Because of these complex interactions, developing an adaptive strategy is difficult.
  2. Practical Selection Strategy. Empirically, M and N can be selected by referencing the choice of r in PGD-r: M×N should be slightly larger than r, and maximizing M within the communication budget typically achieves excellent performance (only M relates to server-client communication); see the sketch below.
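A toy encoding of this rule of thumb (a hypothetical helper, not from the paper's code): take M as large as the server-client communication budget allows, then pick the smallest N with M×N slightly larger than the reference PGD-r iteration count.

```python
import math

def select_m_n(r: int, max_comm_rounds: int) -> tuple[int, int]:
    M = max_comm_rounds          # only M costs server-client rounds
    N = math.ceil(r / M)
    if M * N <= r:               # enforce "slightly larger than r"
        N += 1
    return M, N

print(select_m_n(r=10, max_comm_rounds=6))  # (6, 2): the DecVFAL-6-2 setting
```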

[1] WANG, Ganyu, et al. A unified solution for privacy and communication efficiency in vertical federated learning. Advances in Neural Information Processing Systems, 2023, 36: 13480-13491.

[2] CHEN, Ricky TQ, et al. Neural ordinary differential equations. Advances in neural information processing systems, 2018, 31.

[3] LI, Qianxiao, et al. Maximum principle based algorithms for deep learning. Journal of Machine Learning Research, 2018, 18.165: 1-29.

[4] LI, Qianxiao; HAO, Shuji. An optimal control approach to deep learning and applications to discrete-weight neural networks. In: International Conference on Machine Learning. PMLR, 2018. p. 2985-2994.

[5] HAN, Jiequn, et al. A mean-field optimal control formulation of deep learning. Research in the Mathematical Sciences, 2019, 6.1: 1-41.

[6] WEINAN, Ee. A proposal on machine learning via dynamical systems. Communications in Mathematics and Statistics, 2017, 5.1: 1-11.

Comment

Dear Reviewer yT3W,

We appreciate your insightful review and the important questions you raised about privacy analysis, complex architectures, and parameter selection. Our rebuttal provides detailed responses to each of your concerns.

We would welcome any additional feedback or clarifications you might need, as this would help us ensure a comprehensive discussion of our work. Any additional feedback in the remaining two days would be much appreciated.

Thank you for your thoughtful evaluation.

Best regards,

The authors of Paper 10820

Review (Rating: 4)

The paper proposes DecVFAL, a fast adversarial learning technique for vertical federated learning setting. To accelerate training:

  1. Client-side performs lazy refinement steps on the small bottom module ‘N’ times while keeping the upper-layer gradients frozen.
  2. Use delayed gradients concurrently to update the rest of the modules.

Authors provide theoretical guarantees for convergence and demonstrate 3-10x speed-up on various datasets.

优缺点分析

Strengths: Although the technique comprises well-known tricks (gradient reuse, asynchronous updates), it shows good practical training speed-ups. The paper provides extensive comparisons with baselines, with ablations on the number of clients, modules, and other decoupling hyperparameters.

Weaknesses:

  1. Novelty is marginal, as the technique largely reuses known ideas in this context.

  2. Most of the compared baselines are outdated (mostly from 2019), and some recent ones (e.g., FastAT, ATAS) are only lightly covered.

  3. The networks used are small (e.g., 84% clean accuracy on CIFAR-10). It is unclear whether the results hold for deeper models.

  4. DecVFAL sometimes yields higher robust accuracy than full PGD despite using weaker inner maximizers with a small module on the client side. This calls for strong clarification and justification, since even the theoretical analysis suggests performance degradation due to the added bias.

  5. The results are inconsistent. For instance, why do the PGD and FreeLB accuracy trends differ strongly between Figure 3 and Figure 4 for different datasets?

  6. The results for the impact of split position in Table 6 seem counter-intuitive. Larger bottom modules provide better gradients, which suggests performance should increase.

  7. Experiments max out at seven clients and assume perfectly aligned sample IDs. Larger client counts and heterogeneous environments are untested.

  8. The sensitivity of the parameters M and N should be carefully studied and discussed.

  9. An analysis of the communication cost is missing. Using decoupled modules suggests a substantial increase in communication of the smashed layers' embeddings, which should be quantified.

Questions

  1. Can you briefly state the significance of adversarial learning in VFL and why it needs acceleration in practical scenarios, considering that most applications (e.g., in hospitals) involve collaboration among a few parties that would have enough resources?

  2. In this VFL scenario, a separate server is utilized between parties and assumed to hold the labels. Is this realistic in VFL? What other threats is this setup exposed to? How would the method adapt when parties have non-overlapping user sets and rely on secure entity matching?

Limitations

Yes, some of them.

Final Justification

My doubts about the experimental settings are clear now, and the additional results and discussion on my comments seem satisfactory. However, considering that most of the gains are marginal for bigger datasets (roughly 2-3×), I am increasing my score by 1 point.

Formatting Issues

Figure 5 is in fact a table and is also wrongly referred to in Section 6.5.

DecVFAL-3-3 and DecVFAL-6-2 are used without proper explanation.

Author Response

We sincerely thank you for your insightful comments and constructive criticisms. Your feedback has been invaluable in improving the quality and clarity of our manuscript. Below, we address the weaknesses and respond to your questions.

W1-Novelty:

Due to space limitations, please refer to W3 of Reviewer ikWZ.

W2-Baselines:

  1. We selected state-of-the-art AT acceleration methods (PGD, FreeAT, YOPO). They specifically target iteration complexity reduction, directly addressing the primary bottleneck in VFL scenarios: communication overhead. These baselines provide the most relevant comparison for communication-constrained environments like VFAL.

  2. While we recognize recent AT developments, most focus on improving robustness through sample optimization (DOM [1]) or loss function enhancements (MART [2]) rather than addressing the computational bottleneck that VFAL faces. These approaches tackle complementary but different challenges.

  3. Following your valuable suggestion, we enhanced our CIFAR-10 experiments to include recent methods like DOM [1] and MART [2] in Table 1. Importantly, our decoupling framework is compatible with these optimization strategies, allowing for combined benefits as shown in our "DecVFAL-6-2+DOM" results.

Table 1: Comparison with Recent AT Methods on CIFAR-10

| Method | Clean | PGD (4/255) | AA (4/255) | PGD (8/255) | AA (8/255) | PGD (12/255) | AA (12/255) | Time (h) ↓ | Speedup (vs PGD) |
|---|---|---|---|---|---|---|---|---|---|
| PGD | 78.00 | 68.47 | 63.08 | 44.61 | 34.55 | 34.59 | 16.29 | 8.23 | 1.00× |
| DOM [1] | 75.06 | 63.74 | 62.26 | 44.26 | 36.59 | 36.87 | 21.29 | 13.26 | 0.62× |
| MART [2] | 80.98 | 64.91 | 63.76 | 43.22 | 32.46 | 31.75 | 14.28 | 7.85 | 1.05× |
| DecVFAL-6-2 | 81.83 | 68.59 | 62.34 | 46.27 | 36.48 | 36.14 | 18.72 | 2.75 | 3.00× |
| DecVFAL-6-2+DOM | 74.31 | 66.22 | 61.44 | 46.24 | 39.16 | 39.17 | 21.97 | 3.47 | 2.37× |
| DecVFAL-3-3 | 81.24 | 66.19 | 60.58 | 42.96 | 31.76 | 31.98 | 15.11 | 1.45 | 5.68× |

W3-Deeper model:

  1. Standard Benchmarks for Comparison. Our experimental settings follow well-established protocols from VFL research [6-7] and adversarial training studies (e.g., RobustBench [4]). This ensures fair comparison with existing methods and enables reproducible results.

  2. Deeper Architectures Included. We have actually evaluated DecVFAL on several deeper models, including ResNeXt-50 on CIFAR-100 and Tiny-ImageNet (detailed in Section 6.1 and Appendix C.6). Since DecVFAL is designed to be architecture-agnostic, it naturally scales to modern deep networks without structural modifications.

  3. Additional Experiments. We are currently conducting experiments with deeper networks on CIFAR-10 to further validate our approach. These results will be available within one week.

W4-PGD Performance:

The key insight is that DecVFAL performs significantly more adversarial updates ($M\times N$) than standard PGD ($r$ iterations), where typically $M\times N > r$ in our experiments (e.g., $5\times 10=50$ vs. 40 updates for MNIST, $6\times 2=12$ vs. 10 updates for CIFAR-10). While individual updates use weaker gradients, the substantially higher update frequency provides more comprehensive exploration of the adversarial space, compensating for per-update approximation errors.
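To make the $M\times N$ structure concrete, below is a minimal single-process sketch of the lazy inner loop, assuming a PyTorch model split into a `bottom` module (adjacent to the input) and a `rest` module standing in for all remaining client modules plus the server. Function and variable names, step sizes, and the L-infinity projection are illustrative choices, not the paper's code.

```python
import torch

def lazy_adv_example(x, y, bottom, rest, loss_fn,
                     M=6, N=2, alpha=2/255, eps=8/255):
    """M outer rounds with a full propagation; N bottom-only steps per round."""
    x_adv = x.clone().detach()
    for _ in range(M):
        # Full forward/backward (the communication-heavy part in VFAL):
        # obtain the gradient at the cut between `bottom` and `rest`.
        x_adv.requires_grad_(True)
        h = bottom(x_adv).detach().requires_grad_(True)
        loss = loss_fn(rest(h), y)
        g_delayed = torch.autograd.grad(loss, h)[0]
        # N cheap local refinements that reuse the (increasingly stale) gradient.
        for _ in range(N):
            g_x = torch.autograd.grad(bottom(x_adv), x_adv,
                                      grad_outputs=g_delayed)[0]
            with torch.no_grad():
                x_adv = x + (x_adv + alpha * g_x.sign() - x).clamp(-eps, eps)
            x_adv.requires_grad_(True)
    return x_adv.detach()
```

With M=6 and N=2 this performs 12 sign-gradient updates while paying for only 6 full propagations, mirroring the DecVFAL-6-2 configuration discussed above.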

W5-Performance differences:

The primary reason for the accuracy differences is the different adversarial sample generation settings: 40 iterations on MNIST versus 10 iterations on CIFAR-10.

  1. Slower Convergence for PGD. The high iteration count combined with MNIST's 2-layer perceptron model results in slower convergence compared to others.
  2. Overfitting for FreeLB: FreeLB's gradient accumulation mechanism combined with 40 iterations causes overfitting issues, leading to poor generalization and accuracy degradation.

W6-Number of Layers in Bottom Module:

This counter-intuitive result in Table 6 is caused by error accumulation during backpropagation. While larger bottom modules do provide better gradient approximations and reduce the initial delayed gradient error, this error still persists and becomes progressively larger as it propagates backward through the network layers. This accumulation effect ultimately leads to the unstable performance we observed.

W7-Client Size and Heterogeneous Environments:

  1. Experiments on More Clients. We have conducted additional experiments on MNIST with 14 and 28 clients as shown in Table 1 in W1 of Reviewer Ecn8.
  2. About Heterogeneous Settings. Sample ID alignment is indeed a separate challenge in VFL that involves entity resolution and record linkage [5]. Our work focuses on the complementary problem of speeding up adversarial training when participants already have aligned data, following standard VFL assumptions.

W8-The selection of M and N:

Due to space limitations, please refer to W4 of Reviewer yT3W.

W9-Communication:

  1. Decoupling does not introduce additional communication overhead, as only the server module and the top-layer client modules require true server-client communication, while other inter-module communications are purely local computations that barely affect efficiency.
  2. To fully address your concern, we have included detailed communication cost statistics below to quantify the actual communication requirements and demonstrate the efficiency of our approach.

Table 2: Communication Cost

| Methods | MNIST (MB) ↓ | CIFAR10 (MB) ↓ | CIFAR100 (MB) ↓ | Speedup (vs PGD) |
|---|---|---|---|---|
| PGD | 294.48 | 122.92 | 261.86 | 1.00× |
| FreeLB | 294.48 | 122.92 | 261.86 | 0.98× |
| FreeAT | 117.79 | 98.34 | 209.49 | 1.18× |
| YOPO | 147.24 | 61.46 | 130.93 | 0.99× |
| DecVFAL | 73.62 | 73.75 | 130.93 | 3.00× |

Q1-Why AT and Acceleration are needed:

  1. Why AT is essential in VFL. VFL faces unique security challenges as detailed in Appendix A.3. These include third-party attackers modifying embeddings during communication and malicious participants corrupting local features. While we could use different defenses for each threat, AT offers a unified solution that builds inherent robustness against multiple attack types simultaneously.

  2. The resource utilization challenge. You're absolutely right that institutions like hospitals have substantial computational resources. However, this actually shifts the bottleneck to how efficiently we use these resources. In VFAL, generating each adversarial sample requires multiple rounds of client-server communication for gradient updates. This creates significant idle time where powerful computing resources sit unused.

  3. Our approach reduces communication rounds from $r$ to $M$, allowing modules to process computations in parallel during communication delays. This means institutions can fully utilize their existing computational infrastructure rather than leaving it idle. Our experiments show a consistent 3-10× speedup; a back-of-the-envelope illustration follows this list.
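A back-of-the-envelope version of the round-count claim above, as a sketch (the per-round payload is assumed identical in both schemes, consistent with W3 above, so it cancels in the ratio):

```python
# Standard VFAL pays one server-client round per PGD step (r rounds per
# adversarial example); DecVFAL pays one per outer lazy round (M rounds).
r, M = 10, 6                               # CIFAR-10 settings used in the paper
print(f"round reduction: {r / M:.2f}x")    # ~1.67x, matching Table 2's
                                           # 122.92 MB -> 73.75 MB on CIFAR10
```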

Q2-VFL setup:

  1. The server-holding-labels setup is a standard and realistic assumption in the VFL literature, as evidenced by numerous works [3,6,7], where one party naturally possesses the target labels. An example is fraud detection, where a bank holds transaction labels (fraud or not) and transaction-history features while partnering with e-commerce platforms holding user shopping features.
  2. It is worth noting that 'server' and 'client' are roles within the VFL framework rather than fixed physical entities: in practical deployments, these roles can be distributed across different participating organizations based on their data ownership and computational capabilities, with the label-holding participant typically assuming the server role for coordination purposes.
  3. Our threat model considers multiple adversarial attack vectors including third-party adversaries intercepting embeddings and malicious clients, as detailed in Appendix A.3, though VFL involves other security threats like backdoor and inference attacks that fall outside our scope.
  4. For scenarios with non-overlapping user sets requiring secure entity alignment, this represents another research topic in VFL [3,5], and decoupling can be integrated with existing privacy-preserving entity resolution techniques since AT operates independently of the entity alignment phase.

Typo:

Thank you for catching the typos. We will correct them and clarify that DecVFAL-3-3 and DecVFAL-6-2 indicate the settings of M and N.

[1] LIN, Runqi, YU, Chaojian, HAN, Bo and LIU, Tongliang. On the Over-Memorization During Natural, Robust and Catastrophic Overfitting. In: The Twelfth International Conference on Learning Representations. 2024.

[2] WANG, Yisen, et al. Improving adversarial robustness requires revisiting misclassified examples. In: International conference on learning representations. 2020.

[3] LIU, Yang, et al. Vertical federated learning: Concepts, advances, and challenges. IEEE transactions on knowledge and data engineering, 2024, 36.7: 3615-3634.

[4] CROCE, Francesco, et al. RobustBench: a standardized adversarial robustness benchmark. In: Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2).

[5] HUANG, Lingxiao, et al. Coresets for Vertical Federated Learning: Regularized Linear Regression and K-Means Clustering. Advances in Neural Information Processing Systems, 2022, 35: 29566-29581.

[6] WEI, Kang, et al. Vertical federated learning: Challenges, methodologies and experiments. arXiv preprint arXiv:2202.04309, 2022.

[7] WANG, Ganyu, et al. A unified solution for privacy and communication efficiency in vertical federated learning. Advances in Neural Information Processing Systems, 2023, 36: 13480-13491.

Comment

Dear Reviewer 2ipB,

Regarding your concern "W3-Deeper model", we are very excited to share new experimental results that directly address the concern about model depth. We have completed experiments on CIFAR-10 using WideResNet-70-16, a significantly deeper architecture with 70 layers. DecVFAL achieves robust accuracy comparable to PGD while providing a 2.46× speedup. The results demonstrate that DecVFAL maintains its effectiveness on deep networks:

| Method | Clean Acc | FGSM | PGD | AutoAttack | Time (h) | Speedup |
|---|---|---|---|---|---|---|
| Clean | 90.69 | 17.77 | 3.34 | 0.24 | 13.56 | - |
| PGD | 87.52 | 66.34 | 68.57 | 62.75 | 41.37 | 1.0× |
| FreeAT | 90.30 | 54.11 | 56.67 | 46.78 | 36.17 | 1.14× |
| DecVFAL | 87.91 | 66.74 | 68.88 | 63.44 | 16.81 | 2.46× |

We hope our rebuttal can address your concerns. If you have any questions, we would be very happy to discuss them with you in depth.

Comment

Dear Reviewer 2ipB,

Thank you for your detailed review and the specific concerns you raised. We have provided extensive responses addressing novelty, baseline comparisons, deeper network evaluation, and communication analysis. We've also included additional experimental results on WideResNet-70-16 as requested.

With two days left in the discussion phase, we'd welcome any further thoughts on our responses. This would allow us to address your concerns more thoroughly.

Thank you for your careful evaluation.

Best regards,

The authors of Paper 10820

Comment

After considering the rebuttal, I appreciate the extra experiments and clarifications, but the core concerns remain: marginal novelty; incomplete and partly unfair empirical comparison (I appreciate the response to W5, but I would ask the authors to provide experiments with consistent settings across datasets to make them easier to compare; further, as in Table 1 of the main paper, the computational advantage over PGD drops from 10× on MNIST to 3× on CIFAR-10, which calls for extension to more datasets); and limited evidence of benefit at realistic scale/heterogeneity. Therefore, I would like to maintain my score.

Comment

Dear Reviewer 2ipB,

Thank you for your positive response and for increasing your score. Your constructive feedback has been invaluable in strengthening our work, and we appreciate your thorough engagement throughout this review process.

Should you have any further questions or concerns, we would be more than happy to address them. We wish you continued success in your research endeavors.

Best regards,

The authors of Paper 10820

Comment

Dear Reviewer 2ipB,

Thank you for your thoughtful engagement with our work. We respectfully address your concern about "marginal novelty" by clarifying our significant technical and theoretical contributions that extend well beyond incremental improvements.

Fundamentally Different Application of Decoupling

While previous decoupling works focused on parameter updates during standard training, DecVFAL is the first approach to apply decoupling to adversarial sample generation—the most computationally intensive component of AT. This represents a fundamental innovation rather than a marginal adaptation.

Our key insight is that adversarial sample generation offers unique advantages for decoupling that are unavailable in standard training. Since model parameters remain frozen during adversarial sample generation, our approach enables true parallel processing rather than the approximate parallelism of forward propagations in previous work. This fundamental difference allows us to achieve substantially greater efficiency gains while maintaining robustness guarantees.

To demonstrate the standalone value of our approach beyond VFL, we evaluated DecVFAL in a centralized (non-VFL) setting with a single client, simulating local AT on CIFAR-10. Even when applied to standard PGD-based AT, our method achieves significant improvements:

| Method | Time (h) | PGD-10 | AA | Speedup |
|---|---|---|---|---|
| PGD-10 | 6.29 | 50.54 | 45.73 | 1.00× |
| DecVFAL-6-2 | 2.16 | 50.31 | 45.77 | 2.91× |

Moreover, VFL's natural vertical partitioning perfectly aligns with our layer-wise decoupling mechanism, enabling even greater efficiency gains in distributed settings, as confirmed by our experimental results in the paper.

Unique Technical Challenges

Applying decoupling to adversarial sample generation in VFL introduces unprecedented challenges not addressed in prior work:

1. Preserving Complex Optimization Structure: Unlike standard training, AT involves an intricate minimax optimization that alternates between adversarial sample generation and model parameter updates. Our framework carefully maintains this delicate balance while introducing parallelization. This is non-trivial and required novel algorithmic design.

2. Managing Multi-source Approximation Errors: The distributed nature of VFL compounds approximation challenges significantly. The VFL framework inherently requires communication-efficient techniques, necessitating the introduction of compression and gradient approximation methods that generate multiple sources of approximation errors. These accumulated errors pose a substantial risk of degrading the robustness of AT. Our framework innovatively balances gradient quality and computational efficiency through carefully designed approximation strategies that preserve model robustness while achieving substantial speedups.

Comment

Theoretical Challenges

The Core Theoretical Challenge. Analyzing VFAL convergence presents an unprecedented theoretical challenge due to the simultaneous interaction of three distinct approximation error sources within a complex distributed min-max optimization framework. Unlike standard AT or FL that deal with single error sources, VFAL requires handling delayed gradients from our decoupling mechanism, compressed gradients from VFL communication constraints, and estimated gradients from zeroth-order optimization—all while maintaining the intricate inner-outer loop structure of AT across distributed participants. Each error propagates through different pathways and interacts in non-trivial ways within the distributed min-max optimization structure.

Theoretical Innovation. We developed a unified error decomposition framework that rigorously characterizes how these three error sources interact within the VFAL system. We leverage optimal control theory to analyze the complex interplay between adversarial sample generation (inner loop) and parameter updates (outer loop). Through this formulation, we established Lemma 1, transforming temporal delays in our decoupling mechanism into spatially bounded approximation errors. This foundation allowed us to prove that the three error sources can be independently bounded and linearly combined without exponential error amplification.

Achieving Optimal Convergence with Error Characterization. Our Theorem 1 establishes that DecVFAL maintains the $\mathcal{O}(1/\sqrt{K})$ convergence rate identical to standard VFL, demonstrating that our substantial computational acceleration comes at zero theoretical cost. More importantly, we provide a precise characterization of each error component, with the total approximation error decomposed as $I_1 + I_2 + E_p + E_c + E_z$, where each term has computable bounds. Our analysis reveals the bias term $\mathcal{O}(N\mathcal{M}_K/M)$ that directly captures the trade-off between modularity and performance, while the $\pi(M,N)$ function provides explicit guidance for optimal parameter selection. This theoretical framework not only proves correctness but delivers actionable design principles for practitioners.
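Schematically, using only the quantities named above rather than the exact constants of Theorem 1 (the attribution of the bias term to the delayed-gradient component is our reading of this discussion), the guarantee has the shape:

$$
\frac{1}{K}\sum_{k=0}^{K-1}\mathbb{E}\,\big\|\nabla f(\theta_k)\big\|^2
\;\lesssim\;
\underbrace{\mathcal{O}\!\Big(\tfrac{1}{\sqrt{K}}\Big)}_{\text{standard VFL rate}}
+\underbrace{I_1+I_2}_{\text{min-max coupling}}
+\underbrace{E_p}_{\substack{\text{delayed,}\\ \mathcal{O}(N\mathcal{M}_K/M)}}
+\underbrace{E_c}_{\text{compressed}}
+\underbrace{E_z}_{\text{ZOO}}
$$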

We believe these contributions constitute significant novelty and technical advancement that go beyond merely applying known techniques to a new setting. Our approach fundamentally changes how AT is conducted in distributed settings, enabling practical deployment of robust VFL systems.

Respectfully,

The authors of Paper 10820

Comment

Thank you for this important feedback. We would like to clarify our experimental design and provide additional evidence.

Addressing the Experimental Setting Concerns

We understand the reviewer's concern about consistency across datasets. However, we respectfully clarify that our experimental configurations follow well-established practices in AT literature, as evidenced in benchmark frameworks like PGD, YOPO, TRADES, and RobustBench.

The different iteration settings between MNIST (40 iterations) and CIFAR (10 iterations) reflect fundamental dataset characteristics rather than arbitrary choices:

  • MNIST: The simpler feature space requires more iterations to generate sufficiently challenging adversarial examples
  • CIFAR: The higher-dimensional, more complex feature space enables effective perturbations with fewer iterations

Crucially, all baseline methods use identical dataset-specific configurations within each experimental setting, ensuring fair comparison with baselines.

Additional Experiments with Consistent Settings

To directly address your concern, we have conducted supplementary experiments on MNIST using 10 iterations (consistent with CIFAR settings). The following table presents accuracy achieved at fixed time intervals, demonstrating DecVFAL-6-2's efficiency advantages:

| Algorithm (time/epoch) | 100s | 200s | 500s | 800s | 1000s | 1500s | 2000s | 2500s | 3000s |
|---|---|---|---|---|---|---|---|---|---|
| PGD-10 (31.62s) | 85.76% (3) | 90.00% (6) | 93.49% (15) | 94.28% (27) | 94.97% (32) | 95.52% (47) | 95.90% (63) | 96.10% (79) | 95.90% (95) |
| FreeAT-8 (26.05s) | 87.47% (4) | 91.95% (8) | 94.69% (19) | 95.35% (31) | 95.64% (38) | 96.04% (58) | 96.07% (77) | 96.07% (96) | 96.07% (100) |
| FreeLB-10 (28.94s) | 80.07% (3) | 84.48% (7) | 87.68% (17) | 85.20% (29) | 81.82% (35) | 77.57% (52) | 75.39% (69) | 75.39% (86) | 75.39% (100) |
| YOPO-5-3 (19.91s) | 89.82% (5) | 93.92% (10) | 95.40% (25) | 95.61% (40) | 95.43% (50) | 95.68% (75) | 95.69% (100) | - | - |
| DecVFAL-6-2 (8.76s) | 90.82% (11) | 94.27% (23) | 96.01% (57) | 96.34% (91) | 96.34% (100) | - | - | - | - |

Numbers in parentheses indicate completed epochs at each time point.

The results demonstrate the significant advantages of DecVFAL-6-2. DecVFAL-6-2 achieves 96% accuracy within 500 seconds, while competing methods require 2-6× longer training time for comparable performance. With 8.76 seconds per epoch, DecVFAL-6-2 completes 100 epochs in 876 seconds, representing a 2.3-3.6× speedup over alternatives while achieving superior final accuracy. At every measured time point, DecVFAL-6-2 outperforms all baselines, confirming robust performance across the entire training trajectory.

These additional experiments with consistent iteration settings across datasets validate our original findings and demonstrate that DecVFAL-6-2's advantages stem from algorithmic improvements rather than experimental configuration differences. The consistent performance gains across different settings confirm the method's effectiveness and practical value.

Respectfully,

The authors of Paper 10820

Comment

Thank you for providing additional results and clarifying my doubts. I am happy to increase my score.

Review (Rating: 4)

DecVFAL accelerates adversarial training in VFL via dual-level decoupling (lazy sequential + parallel backpropagation), achieving 3–10× speedup with rigorous convergence guarantees and competitive robustness.

Strengths and Weaknesses

Strengths:

  1. Rigorous Technical Foundation: Proposes a mathematically grounded dual-decoupling mechanism (lazy sequential + parallel backpropagation) with convergence guarantees (Theorem 1, Corollary 1).
  2. Comprehensive Evaluation: Validated across 4 datasets (MNIST, CIFAR-10/100, Tiny-ImageNet), 8 baselines (PGD, FreeAT, YOPO), and 10 attack methods (FGSM, PGD, CW, etc.), demonstrating a 3-10× speedup without compromising robustness (Tables 1-3).
  3. Theoretical-Experimental Alignment: The convergence analysis explicitly accounts for VFL-specific challenges (e.g., gradient delays, compression errors) and aligns with empirical results.

Weaknesses:

  1. Limited Real-World Data: Experiments use partitioned public datasets (Sec. 6.1). Testing on industry VFL benchmarks (e.g., finance/healthcare) would strengthen practicality.
  2. Bias Term Trade-Off: The convergence rate includes $\mathcal{O}(N\mathcal{M}_K/M)$ (Corollary 1), implying performance degradation with many modules. Not empirically explored beyond Table 5.
  3. Decoupling Inspiration: Builds on prior work (e.g., Synthetic Gradients [27], ADMM [57]), though the VFAL-specific extensions are non-trivial.

Questions

  1. While experiments on public datasets (MNIST/CIFAR) demonstrate efficiency, how does DecVFAL perform on industry VFL benchmarks (e.g., financial fraud detection or healthcare datasets with heterogeneous feature distributions)?

  2. Corollary 1 identifies a bias term $\mathcal{O}(N\mathcal{M}_K/M)$ that degrades performance as the number of modules $\mathcal{M}_K$ increases. What practical strategies can mitigate this?

  3. DecVFAL assumes synchronous client-server updates (Algorithm 1). How would asynchronous settings (common in cross-device VFL [36]) affect convergence?

Limitations

See questions

Formatting Issues

no

Author Response

We sincerely appreciate your insightful comments and constructive feedback, which have been invaluable in improving the quality and clarity of our manuscript. Below, we provide detailed responses to the identified weaknesses and questions.

W1&Q1-Real-World Dataset:

Thank you for recognizing this limitation. To address your concern, we added 2 real-world datasets, the COVID-19 Image Data Collection [1] and Credit Card Fraud Detection [5], to further validate our work. VFL setup: 2 clients, each using ResNet-18 for COVID-19 and an MLP for Credit, and one server employing a 1-layer classifier. DecVFAL achieves a 3.72× and a 2.35× speedup respectively compared to standard PGD, while maintaining superior robustness.

Table 1: Results of Robust Training on COVID-19 dataset [1]

| Methods | Clean | FGSM | PGD | AA | Time (min) ↓ | Speedup (vs PGD) |
|---|---|---|---|---|---|---|
| None | 98.19 | 0.36 | 0.00 | 0.00 | - | - |
| PGD | 69.20 | 67.02 | 67.39 | 66.30 | 29.64 | 1.00× |
| FreeAT | 77.53 | 49.27 | 50.54 | 40.76 | 11.65 | 2.54× |
| YOPO | 77.90 | 66.30 | 66.49 | 58.51 | 8.73 | 3.39× |
| DecVFAL | 88.22 | 67.39 | 62.86 | 44.75 | 7.97 | 3.72× |

Table 2: Results of Robust Training on Credit dataset [5]

| Methods | Clean | FGSM | PGD | AA | Time (min) ↓ | Speedup (vs PGD) |
|---|---|---|---|---|---|---|
| None | 92.45 | 0.00 | 2.86 | 3.38 | 5.19 | - |
| PGD | 88.28 | 40.63 | 42.19 | 49.48 | 36.02 | 1.00× |
| FreeAT | 92.44 | 14.58 | 40.36 | 5.72 | 27.16 | 1.33× |
| YOPO | 90.10 | 54.68 | 32.29 | 22.39 | 17.98 | 2.00× |
| DecVFAL | 89.58 | 68.22 | 47.13 | 47.39 | 15.32 | 2.35× |

W2&Q2-Impact of the number of modules:

  1. The trade-off between the number of modules and accuracy is inherent to module decoupling and must be carefully balanced. Our theoretical analysis (Remark 1) and empirical results (Table 5) consistently demonstrate that increasing the number of modules $\mathcal{M}_K$ degrades performance. This outcome is consistent with prior decoupled training work [2-4].

  2. Practical strategies to mitigate this include (1) avoiding excessive module partitioning, and (2) aligning the execution time of all partitioned modules to maximize parallelization benefits.

W3-Novelty:

Thank you for recognizing the non-trivial contributions of our framework. We want to emphasize the unique challenges our work addresses.

  1. Technical Innovation: While we build upon fundamental techniques, applying decoupling to communication-efficient VFL presents distinct challenges. Our approach uniquely combines decoupling with AT and VFL-specific methods like Zero-Order Optimization and compression [8]. The key innovation lies in decoupling adversarial sample generation with frozen parameters - a departure from prior work [2-4] that only decouples parameter updates. This enables true concurrent forward/backward propagation and addresses the computational bottleneck in adversarial sample generation, achieving substantial 3-10× acceleration for VFAL.

  2. Theoretical Contribution: The simultaneous presence of multiple gradient approximations creates analytical complexity not addressed in existing literature. Our framework must account for three interacting approximation sources: delayed gradients (from decoupling), compressed gradients (from embedding compression), and estimated gradients (from ZOO). We provide the first convergence analysis that captures how these approximation errors propagate and interact within VFAL systems, requiring careful treatment of their cumulative effects.

Q3-Asynchronous Settings Impact:

We think that fully asynchronous VFL updating presents significant convergence challenges. AT operates through a minimax optimization that alternates between adversarial sample generation and model parameter updates. This requires freezing model parameters during adversarial sample generation, followed by synchronized parameter updates using the generated samples. Asynchronous settings would disrupt this minimax execution order, potentially leading to scenarios where model parameters are updated before optimal adversarial samples are identified.

[1] MAGUOLO, Gianluca; NANNI, Loris. A critic evaluation of methods for COVID-19 automatic detection from X-ray images. Information fusion, 2021, 76: 1-7.

[2] JADERBERG, Max, et al. Decoupled neural interfaces using synthetic gradients. In: International conference on machine learning. PMLR, 2017. p. 1627-1635.

[3] HUO, Zhouyuan, et al. Decoupled parallel backpropagation with convergence guarantee. In: International Conference on Machine Learning. PMLR, 2018. p. 2098-2106.

[4] HUO, Zhouyuan; GU, Bin; HUANG, Heng. Training neural networks using features replay. Advances in Neural Information Processing Systems, 2018, 31.

[5] CARCILLO, Fabrizio, et al. Combining unsupervised and supervised learning in credit card fraud detection. Information sciences, 2021, 557: 317-331.

[6] CROCE, Francesco, et al. RobustBench: a standardized adversarial robustness benchmark. In: Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2).

[7] LIU, Yang, et al. Vertical federated learning: Concepts, advances, and challenges. IEEE transactions on knowledge and data engineering, 2024, 36.7: 3615-3634.

[8] WANG, Ganyu, et al. A unified solution for privacy and communication efficiency in vertical federated learning. Advances in Neural Information Processing Systems, 2023, 36: 13480-13491.

Comment

Dear Reviewer ikWZ,

Thank you for your thorough review and constructive feedback on our paper. We have addressed your concerns regarding real-world datasets, bias term trade-offs, and asynchronous settings in our detailed rebuttal.

With the discussion phase ending in two days, we'd greatly appreciate any follow-up comments you might have.

Thank you for your time and valuable insights.

Best regards,

The authors of Paper 10820

Comment

Dear ACs and Reviewers,

As the rebuttal period is concluding for Paper 10820: "Accelerated Vertical Federated Adversarial Learning through Decoupling Layer-Wise Dependencies," we sincerely thank you for your thorough reviews and constructive feedback. To facilitate your final evaluation, we summarize our rebuttal responses below:

Key Contributions

Our DecVFAL framework introduces the first application of decoupling to adversarial sample generation, fundamentally different from prior work that focused on parameter updates. This innovation enables true parallel processing with frozen parameters, achieving 3-10× speedup while maintaining O(1/√K) convergence rate. We provide the first theoretical analysis for VFAL, rigorously characterizing the interaction of three approximation sources (delayed, compressed, estimated gradients) within distributed adversarial training.

Response to Main Reviewer Concerns

  • Novelty (Reviewer 2ipB): We clarified that decoupling adversarial sample generation presents fundamentally different challenges compared to parameter update decoupling. Additional centralized experiments demonstrated 2.91× speedup even without VFL, proving standalone algorithmic value.

  • Real-world Evaluation (Reviewers ikWZ, yT3W): We added COVID-19 medical imaging (3.72× speedup) and credit card fraud detection (2.35× speedup) datasets, validating performance in industry-relevant VFL scenarios beyond public benchmarks.

  • Architecture Scalability (Reviewer 2ipB): Extended validation to WideResNet-70-16 with 70 layers, achieving 2.46× speedup. Additional analysis of server model complexity (1/4/16-layer configurations) demonstrated architectural flexibility.

  • Client Scale (Reviewers 2ipB, Ecn8): Expanded experiments to 14 clients (4.35× speedup) and 28 clients (4.92× speedup) on MNIST, maintaining consistent robustness across larger federated scenarios.

  • Experimental Rigor (Reviewer 2ipB): Provided consistent iteration settings across datasets, integrated recent baselines (DOM, MART) with competitive results, and quantified communication costs.

  • Theoretical Depth (Reviewers ikWZ, yT3W): Characterized module number trade-offs with practical mitigation strategies, confirmed equivalent privacy preservation to standard VFL, and provided explicit parameter selection guidance.

DecVFAL addresses the critical computational bottleneck in VFAL by dramatically reducing adversarial training time. Our expanded experimental validation across real-world datasets, deep architectures, and large-scale scenarios demonstrates broad practical applicability. The rigorous theoretical framework provides both correctness guarantees and actionable design principles for users.

We deeply appreciate the constructive scholarly discussion, which has been invaluable for improving our work.

Best regards,

The Authors of Paper 10820

Final Decision

This paper proposes a novel method that accelerates the adversarial training process in Vertical Federated Learning (VFL) by introducing a decoupling strategy with lazy sequential back-propagation, achieving significant speedups. It also provides rigorous convergence guarantees and competitive robustness. One weakness is the marginal performance gain on bigger datasets, as pointed out by reviewer 2ipB.

During the rebuttal process, the authors addressed most of the concerns from the reviewers, and no additional concerns were raised. After the AC-reviewer discussion, the reviewers reached a consensus that the paper can be accepted.

Considering the consensus of the reviewers and the nontrivial technical contribution of this work, which targets a timely and important challenge (the efficiency bottleneck of VFL), I recommend the acceptance of this paper.