Vertical Federated Feature Screening
We propose the vertical federated feature screening (VFS) algorithm, which effectively reduces computational, communication, and encryption costs.
Abstract
Reviews and Discussion
This paper introduces a feature selection method to help eliminate unnecessary features in vertical federated learning (VFL). The benefit is reduced communication overhead without sacrificing the accuracy of the trained model. This is possible because, in ultra-high dimensional settings, only a small number of features truly contribute to the model prediction, and the other features can be pruned. The idea is to first ask the passive participants to group the features they own and calculate a group-wise score for the server to remove the surely unimportant groups. Then, a similar process is applied to the individual features of the remaining groups. Theoretical studies and experiments on simulated and real-world datasets have been conducted to show the effectiveness of the proposed method.
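The coarse-to-fine idea described above can be sketched in plain Python. Note that `score` below is a hypothetical stand-in for a marginal association measure and `vfs_sketch` is an illustrative name; the paper's actual statistic is model-free and computed under encryption:

```python
# Hypothetical sketch of two-stage (group -> individual) screening.

def score(feature, labels):
    """Toy marginal score: absolute difference of class-conditional means."""
    n1 = sum(labels)
    n0 = len(labels) - n1
    m1 = sum(v for v, y in zip(feature, labels) if y == 1) / max(1, n1)
    m0 = sum(v for v, y in zip(feature, labels) if y == 0) / max(1, n0)
    return abs(m1 - m0)

def vfs_sketch(groups, labels, keep_groups, keep_features):
    # Stage 1: coarse screening -- rank feature groups, drop the rest.
    group_scores = {g: max(score(f, labels) for f in feats)
                    for g, feats in groups.items()}
    kept = sorted(group_scores, key=group_scores.get, reverse=True)[:keep_groups]
    # Stage 2: fine screening on individual features of surviving groups.
    candidates = [(g, i, score(f, labels))
                  for g in kept for i, f in enumerate(groups[g])]
    candidates.sort(key=lambda t: t[2], reverse=True)
    return candidates[:keep_features]

labels = [1, 0, 1, 0]
groups = {
    "A": [[5, 1, 6, 0]],                 # informative group
    "B": [[2, 2, 2, 2], [1, 1, 1, 1]],   # uninformative group
}
print(vfs_sketch(groups, labels, keep_groups=1, keep_features=1))
# [('A', 0, 5.0)]
```

Only the groups surviving Stage 1 incur per-feature work in Stage 2, which is where the computational and encryption savings come from.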
Strengths and Weaknesses
Strengths
- VFS supports secure VFL and extremely imbalanced datasets. These are practical considerations that are commonly overlooked in the VFL literature.
- The hierarchical pruning (i.e., group->individual) is sound. Encryption overhead is often a concern in practical deployment of FL systems. The reduction in such overheads can be of broad impact.
- Theoretical studies have been conducted to analyze the properties of VFS.
Weaknesses
- The paper writing could be improved. Technical details are insufficient. For instance, how Y_{enc} is being used by the clients to compute the encrypted statistic is not clearly articulated.
- The choice of real-world datasets is not clearly justified. While those datasets are not simulated, it is unclear whether they are suited to VFL. As shown in prior work [1], VFL datasets have different characteristics from HFL datasets, and randomly distributing features to different "clients" can create an artificial dataset that does not represent VFL.
[1] Wu, Zhaomin, Junyi Hou, and Bingsheng He. "Vertibench: Advancing feature distribution diversity in vertical federated learning benchmarks." ICLR 2024.
Questions
- Could you clarify how to compute the statistic? It is unclear how Y_{enc} is used to compute the statistic for pruning.
- Could you elaborate on the choice of the real-world datasets used in the experiments? How are the features partitioned and distributed across the different parties? The reviewer is particularly concerned about whether this partitioning is natural or artificial, and, if artificial, whether it can be representative of real-world VFL problems.
Limitations
The authors discussed the limitations of existing methods. However, they did not discuss the limitations, and their potential impact, of the method they propose.
Final Justification
The authors addressed my concerns in the rebuttal. I believe that it is a valuable work on VFL. Hence, I maintain my score.
Formatting Concerns
No concern
We sincerely thank Reviewer 1Em4 for the positive evaluation of our work and thoughtful comments. We are especially grateful for the recognition of the practical value of our method in supporting secure VFL under extreme class imbalance, the effectiveness of the hierarchical pruning strategy, and the theoretical rigor of our analysis. These aspects are indeed the core motivations behind our proposed VFS framework. We have carefully addressed the concerns as detailed in our point-by-point responses below.
1. Writing Improvement and Technical Details for Y_{enc} and the Statistic Calculation. Per your advice, we have revised the writing to ensure that all notations and algorithm details are clearly defined in the revision. In the submitted version (Lines 128–137), we briefly described how to compute Y_{enc} and the screening statistic, but the specific form may vary depending on the choice of the screening statistic. For example, in the case of the distance covariance used in our implementation, Y_{enc} denotes the ciphertext of Y. It is worth noting that the same plaintext may correspond to different ciphertexts after encryption; therefore, the ciphertexts are no longer binary and cannot be easily decrypted. The encrypted statistic for each feature group is then computed by aggregating these ciphertexts with the corresponding feature group of each user.
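To illustrate how an additively homomorphic scheme lets a passive party combine the encrypted labels with its local features without ever decrypting them, here is a toy Paillier sketch with deliberately small parameters (illustration only; the scheme and statistic actually used in the paper may differ):

```python
import math
import random

def lcm(a, b):
    return a * b // math.gcd(a, b)

class ToyPaillier:
    """Minimal additively homomorphic Paillier scheme (toy parameters only)."""
    def __init__(self, p, q):
        self.n = p * q
        self.n2 = self.n * self.n
        self.g = self.n + 1                  # standard simplification g = n + 1
        self.lam = lcm(p - 1, q - 1)
        self.mu = pow(self.lam, -1, self.n)  # modular inverse (Python 3.8+)

    def enc(self, m):
        r = random.randrange(2, self.n)      # fresh randomness each call
        while math.gcd(r, self.n) != 1:
            r = random.randrange(2, self.n)
        return pow(self.g, m, self.n2) * pow(r, self.n, self.n2) % self.n2

    def dec(self, c):
        u = pow(c, self.lam, self.n2)
        return (u - 1) // self.n * self.mu % self.n

    def add(self, c1, c2):                   # Enc(m1) * Enc(m2) = Enc(m1 + m2)
        return c1 * c2 % self.n2

    def smul(self, c, k):                    # Enc(m) ** k = Enc(k * m)
        return pow(c, k, self.n2)

# Active party encrypts the binary labels; the passive party never sees plaintext y.
he = ToyPaillier(p=2357, q=2551)
y = [1, 0, 1, 1, 0]
y_enc = [he.enc(yi) for yi in y]

# Passive party combines its (integer-coded) feature column with the ciphertexts.
x = [3, 7, 2, 5, 4]
acc = he.enc(0)
for ci, xi in zip(y_enc, x):
    acc = he.add(acc, he.smul(ci, xi))

# Only the active party can decrypt the aggregated quantity.
print(he.dec(acc))  # 3 + 2 + 5 = 10
```

Because encryption is randomized, encrypting the same label twice yields different ciphertexts, which matches the point above that the ciphertexts are no longer binary.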
2. Justification of Real-World Datasets. Thank you for the insightful comment. We have carefully studied the concerns raised by Wu et al. (2024), cited the paper in the revision, and agree that preserving real-world VFL characteristics is critical. In the revision, we have included a new empirical study based on a real VFL scenario from a third-party payment company, which is also the empirical motivation of this work. The features are naturally vertically partitioned, covering 189,236 attributes for 10,000 merchants (financial product users). This dataset is analyzed in an actual VFL pipeline in a secure sandbox environment. See also our reply to Reviewer ekrS. The results below show that VFS substantially reduces computational cost, to less than 10% of the runtime of baseline methods. We believe that the inclusion of this real-world application significantly strengthens the practical relevance of our work.
| # Features | AUC | Screening Time | Modeling Time | Total Time |
|---|---|---|---|---|
| 100 | 0.937 | 29.098 | 100.313 | 129.411 |
| 500 | 0.941 | 30.876 | 125.343 | 156.219 |
| 1000 | 0.936 | 31.699 | 151.941 | 183.640 |
| All | 0.901 | - | 1911.829 | 1911.829 |
3. Impact and Limitation of VFS. The VFS framework tackles a key bottleneck in vertical federated learning: performing effective feature screening under the dual challenges of ultrahigh-dimensional data and severe class imbalance, while preserving data privacy. It offers broad adaptability to different federated architectures and screening criteria, and substantially reduces the computational and communication burden in downstream tasks, as supported by both theoretical analysis and empirical validation. Nevertheless, we recognize certain limitations of the proposed method. First, the overall efficiency of VFS depends on the specific statistic employed for screening. Second, due to its marginal nature, VFS may retain features that are correlated, which could be undesirable in scenarios where independence is required for model interpretability or classical statistical inference. These considerations have been clearly discussed in the revised version.
4. Elaboration on the Partitioning of Real-World Datasets. Thank you for raising this important concern. We have carefully studied, cited, and acknowledged the insight of Wu et al. (2024), which emphasizes the unique characteristics of VFL datasets. In real-world VFL applications, features are often naturally partitioned due to their storage across different clients or servers. In the newly added real-world case study with a third-party payment company, the data are naturally stored on different clients and divided into different groups. Typically, in this dataset, (1) merchant profile attributes and (2) transaction-level behavioral features are stored on separate clients, and each client maintains its own groups of features. Together with the active party, this setup constitutes a genuine VFL scenario without artificial partitioning. We have clarified this point in the revised manuscript.
Thank you for the response. Most of my concerns have been addressed.
Thank you for your thoughtful review and for acknowledging that most of your concerns have been addressed. We sincerely appreciate the time and effort you have dedicated to reviewing our work. We hope that our clarifications and additional results have fully addressed your questions and further demonstrated the novelty, theoretical contributions, and practical impact of our study. We would be happy to address any further questions or suggestions you may have.
This paper proposes a Vertical Federated Feature Screening (VFS) algorithm designed to filter out spurious features that may arise in Vertical Federated Learning (VFL) settings. The authors support the effectiveness of their method through both theoretical results and empirical evaluation.
Strengths and Weaknesses
Strengths
- The proposed VFS algorithm can be used as a preprocessing step before existing VFL algorithms such as LESS-VFL to improve overall efficiency. Notably, VFS appears to be more efficient than classical screening methods under class-imbalance conditions.
- The authors propose a model-free VFS statistic inspired by generalized U-statistics, and they prove theoretical properties of the proposed method. (Due to time constraints, I was unable to fully verify the theoretical proofs provided in the appendix. I assume that the proofs are correct and base my review on the theoretical results presented in the main text.)
Weaknesses
- While feature screening based on generalized U-statistics has been studied in the statistics literature, it may be unfamiliar to a general machine learning audience, including myself. I suggest providing a brief background on this topic in the main text or appendix, which would help clarify the novelty of the proposed method.
- Given recent trends in the ML community, more extensive experiments on real-world datasets are expected.
- In the experiments, the authors still apply an additional federated feature selection method such as LESS-VFL. Is it difficult for the proposed method to perform sufficiently accurate feature screening to enable the use of a vanilla VFL algorithm? If possible, it would be helpful to include results of applying vanilla VFL directly after VFS.
Questions
- In Section 6, is there a communication cost difference between using LESS-VFL alone and using VFS + LESS-VFL?
- Please clearly define m_0 and m_1 in Equation (1).
- The statement of Theorem 1.1 does not appear sufficient to justify the claim in line 203. In particular, tilde{C_2}>1/C_1 is required where C_1 is defined as C_1 = liminf n_1^gamma/log n > 0 (the positivity comes from assumption (C3)). Such a result would be needed to ensure that the second term on the left-hand side of Theorem 1.1 tends to zero as n_1\to\infty.
- Are the datasets used in Section 6 class imbalanced?
Limitations
No. I would encourage the authors to include a dedicated section that discusses the limitations of their approach.
Final Justification
The authors have addressed most of the issues I raised through adequate explanations and additional experiments. Accordingly, I will maintain my positive score.
Formatting Concerns
No
We thank Reviewer yBdV for the positive feedback and for recognizing the theoretical contributions of our work. We appreciate your comment that VFS improves efficiency especially under class imbalance and serves well as a preprocessing step for VFL feature selection algorithms. Below, we address your comments and questions in detail.
1. Background on Feature Screening Based on Generalized U-Statistics. We appreciate the reviewer's suggestion. While feature screening is an active research topic in statistics and generalized U-statistics serve as a classical and powerful tool in theoretical analysis, the use of generalized U-statistics to handle data imbalance together with subsampling techniques has not been well explored. We would like to kindly remark that this is one of the key contributions of our work. Per your kind advice, we have added more background on generalized U-statistics in Section 2.1 of the revision. Furthermore, we have clarified the novelty of the proposed VFS method by emphasizing how generalized U-statistics are leveraged to construct new screening statistics tailored to VFL scenarios, enhancing both statistical efficiency and computational feasibility. Thank you for helping clarify our contribution.
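As background, a classical instance of a two-sample generalized U-statistic is the Mann–Whitney statistic, which averages the kernel h(x0, x1) = 1{x1 > x0} over all pairs of one class-0 and one class-1 sample (i.e., m_0 = m_1 = 1 per kernel evaluation; the kernels used in the paper are more general):

```python
from itertools import product

def mann_whitney_u(class0, class1):
    """Two-sample U-statistic with kernel h(x0, x1) = 1 if x1 > x0 else 0.

    Each kernel evaluation uses one class-0 and one class-1 sample;
    the statistic averages h over all such pairs.
    """
    pairs = list(product(class0, class1))
    return sum(1 for x0, x1 in pairs if x1 > x0) / len(pairs)

# Feature values split by class label: class-1 values tend to be larger,
# so the statistic is close to 1 (strong marginal association).
print(mann_whitney_u([1.0, 2.0, 3.0], [2.5, 3.5, 4.0]))  # 8/9
```

A value near 1/2 would indicate no marginal association between the feature and the binary label, which is what makes such statistics natural screening scores.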
2. More Experiments on Real-World Datasets with a Vanilla VFL Algorithm. We thank the reviewer for emphasizing the importance of real-world validation. Our study is indeed directly motivated by a practical credit risk modeling task subject to privacy constraints. Following your suggestion, we now include this case study based on a collaboration with a third-party payment platform. The data contain 189,236 features across 10,000 merchants (financial product users), covering diverse aspects such as merchant profiles, transaction behaviors, and temporal dynamics. See also our reply to Reviewer ekrS. Using the proposed VFS method, we filtered out redundant features and employed SplitNN (Vepakomma et al., 2018) as an example of a vanilla VFL algorithm. The results below confirm that VFS reduces runtime by over 90% while improving prediction accuracy. This further supports the scalability and practical value of our method, whether the downstream tasks are trained with federated feature selection algorithms or vanilla VFL algorithms.
| # Features | AUC | Screening Time | Modeling Time | Total Time |
|---|---|---|---|---|
| 100 | 0.937 | 29.098 | 100.313 | 129.411 |
| 500 | 0.941 | 30.876 | 125.343 | 156.219 |
| 1000 | 0.936 | 31.699 | 151.941 | 183.640 |
| All | 0.901 | - | 1911.829 | 1911.829 |
3. Discussion on Communication Cost. Thank you for raising this important point. We already provided the expression for the order of the communication cost in Table 1 of the submitted version. Indeed, the VFS procedure introduces an initial communication cost, which is inevitable for most federated screening or selection methods. However, this cost accounts for only a small fraction of the overall cost and does not affect the iterations of joint modeling. For example, under the default simulation settings in Appendix E, the time spent on communication accounted for an average of 4.26% of the total runtime. In real-world applications, the actual communication cost can be further influenced by factors such as network conditions and hardware configurations (McMahan et al., 2017). Thus, the communication overhead introduced by VFS is relatively small when weighed against the significant reduction in per-party computation cost.
4. Limitations of VFS. Per your kind advice, we have included a discussion of the limitations of VFS in the revision. First, although VFS is a flexible framework that accommodates a wide range of screening statistics, its computational efficiency may depend on the specific statistic used. Second, as VFS performs marginal screening, the selected feature set may still include redundant features. In some scenarios, a subsequent feature selection step may therefore be required to remove such redundancy. This is also why we originally adopted feature selection methods as downstream tasks, to mitigate feature redundancy and correlation. However, for downstream models that are robust to correlated and partially redundant features, such as SplitNN, VFS can be directly integrated without additional selection steps.
5. Other Issues.
(5.1) Notations for m_0 and m_1. We sincerely apologize for the confusion caused by the notations. In the VFS statistic, m_0 and m_1 denote the numbers of class 0 and class 1 samples used by the kernel function, respectively. We have clarified these definitions in Section 4.1 of the revised manuscript for better readability.
(5.2) Condition Clarity. According to Condition (C3), we have C_1 = liminf n_1^gamma / log n > 0, which implies that the quantity you mentioned, 1/C_1, is finite, and hence tilde{C}_2 > 1/C_1 can be ensured. Therefore, we would like to kindly remark that (C3) in the submitted version is sufficient to ensure that the second term on the left-hand side of Theorem 1.1 tends to zero as n_1 tends to infinity.
(5.3) Class Imbalance in Datasets. The real-world datasets include both balanced and imbalanced ones, with case ratios of 45.36% (Activity), 49.16% (Gina), 37.45% (RNA-Seq), 0.48% (p53), and 5.57% (newly added credit dataset). As shown in Table 3 in our submitted version and supported by the newly added experiments, VFS consistently demonstrates robust performance across a wide range of case ratios.
References
[1] Vepakomma, P., Gupta, O., Swedish, T., & Raskar, R. (2018). Split learning for health: Distributed deep learning without sharing raw patient data. arXiv preprint arXiv:1812.00564.
[2] McMahan, B., Moore, E., Ramage, D., Hampson, S., & y Arcas, B. A. (2017). Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics (pp. 1273-1282). PMLR.
Thank you for the response. Most of my concerns have been adequately addressed. I will maintain my score.
We appreciate your careful assessment and the acknowledgement that our responses have addressed your earlier concerns. Thank you for the time and effort devoted to evaluating our work. We remain happy to discuss any further questions or suggestions you may have.
The authors address the problem of high computational, communication, and encryption costs in Vertical Federated Learning (VFL) under ultra-high dimensional feature spaces and imbalanced (rare event) response variables. Existing federated feature selection methods struggle to scale in these settings due to the enormous feature dimension and the resource-intensive secure multi-party computations involved. To solve this, the authors propose the Vertical Federated Feature Screening (VFS) algorithm, a two-stage coarse to fine screening procedure that first eliminates irrelevant feature groups and then performs refined screening on individual features, all under secure encryption. They combined subsampling technique to address class imbalance with a model-free statistic compatible with encrypted data, they significantly reduce the resource burden for downstream federated learning tasks. The authors achieve theoretical guarantees of consistency and asymptotic properties for their method, and demonstrate through both simulations and real-world experiments that VFS can maintain high predictive performance while reducing encryption and computation costs by an order of magnitude compared to classical approaches.
优缺点分析
Strengths:
- The authors present a strongly motivated problem. In practice, datasets with very high feature dimensionality and severe class imbalance can create serious challenges for federated learning, especially when participating devices have limited computational resources.
- The application of U-statistics to define a model-free VFS statistic is an elegant choice, providing an asymptotically consistent way to measure associations between features and rare-event labels without relying on restrictive parametric assumptions.
- The paper offers a solid theoretical analysis, rigorously establishing convergence, screening consistency, and bounded model size properties of the proposed method.
Weaknesses:
- The authors did not implement a full vertical federated learning algorithm (such as SplitNN or SecureBoost) to directly verify how their proposed feature screening would impact downstream VFL model training.
- The paper does not adequately investigate how robust the feature selection is when coupled with an actual VFL training pipeline, leaving open questions about its real-world effectiveness.
- The paper does not study the scalability of the algorithm across a VFL framework with N clients.
- The work does not analyze potential privacy leakage risks from the intermediate screening statistics themselves, which might expose sensitive label information to adversaries under certain attack models.
Questions
- In Stage 2, the fine screening depends on the features retained from Stage 1. Could cascading errors accumulate if Stage 1 removes too many features? Did the authors investigate how robust the method is to Stage 1 mistakes? Since the group-level screening uses a hypothesis-testing framework, how are the type I and type II errors controlled across potentially many groups, and is there a multiple testing correction strategy built in?
- Would it be feasible to skip Stage 2 altogether if the initial grouping was fine-grained enough? In other words, is there a principled trade-off between group granularity and the necessity of a second stage?
Limitations
- One limitation of the work is that the authors have not tested their method in a VFL setup with multiple clients holding a diverse set of features and classes (multi-class). Therefore, it is difficult to judge at this moment the applicability of the algorithm in a real scenario.
Final Justification
The authors gave justifications for my queries, and the answers are satisfactory.
- The authors empirically tested their VFS approach on a VFL framework.
- Error analysis and the necessity of stage 2 in VFS are addressed.
Therefore, I think it is a good paper. Consequently, I raised the score.
Formatting Concerns
NA
We sincerely thank Reviewer xnKo for the thoughtful and detailed comments. We truly appreciate your recognition of the practical challenges addressed by our work, especially the computational burden in VFL under ultra-high dimensional and imbalanced settings. Your acknowledgment of the novelty of using model-free U-statistics for rare-event feature screening and the rigor of our theoretical analysis is highly appreciated. We have carefully addressed your suggestions and revised the manuscript accordingly, as detailed below.
1. Comparison with Full VFL Algorithms. Thank you for bringing these excellent works to our attention. We would like to kindly emphasize that the examples considered, SplitNN (Vepakomma et al., 2018) and SecureBoost (Cheng et al., 2021), target low-dimensional cases, with a maximum feature dimension much smaller than the corresponding sample size. This is because the computational cost can become prohibitive in high-dimensional settings (Ceballos et al., 2020), which can also lead to non-robust estimation (Li et al., 2017). More specifically, our experiments show that SplitNN is infeasible to train at this feature scale on a machine with a 16GB GPU and 62GB of RAM. Therefore, from both empirical and theoretical perspectives, feature reduction is considered necessary in high-dimensional VFL settings (Cassará et al., 2022), which is why we only considered feature selection methods as baselines for high-dimensional features in the original submission. However, we fully agree that comparisons with full VFL algorithms can more effectively demonstrate the efficacy of VFS. Following your advice, we have compared against a full VFL method (i.e., SplitNN) in an actual VFL training pipeline (described in the next point) based on the dataset with 10,000 samples and 189,236 features. The results below demonstrate that incorporating VFS reduces the computational cost to less than 10% of that incurred by using SplitNN alone, while simultaneously improving the prediction performance, consistent with the results in the submitted version. See also our reply to Reviewer ekrS. In addition, we have carefully compared against and cited both of the works you brought to our attention in the revision.
| # Features | AUC | Screening Time | Modeling Time | Total Time |
|---|---|---|---|---|
| 100 | 0.937 | 29.098 | 100.313 | 129.411 |
| 500 | 0.941 | 30.876 | 125.343 | 156.219 |
| 1000 | 0.936 | 31.699 | 151.941 | 183.640 |
| All | 0.901 | - | 1911.829 | 1911.829 |
2. Robustness Demonstration in an Actual VFL Training Pipeline. Thank you for this insightful suggestion. This work is directly motivated by a real-world VFL application in collaboration with a leading third-party payment platform in China. Due to privacy constraints, the data cannot be publicly released and were not included in the initial submission. Following your advice, we now report results based on this dataset in a secure sandbox containing 189,236 features for 10,000 merchants, covering key feature groups such as merchant attributes, transaction activity, revenue, and business growth. See also our reply to Reviewer ekrS for more background on the dataset. We adopt SplitNN as the downstream model per your advice and present the results above. This application-based evaluation further demonstrates the practical relevance and robustness of VFS.
3. Scalability to Multiple Clients and Classes. As discussed in Remark 1 and Appendix C, our method can readily be extended theoretically to the multi-party setting. Empirically, we have added simulation results in Appendix E of the revision. Specifically, we investigate the performance of the proposed method in scenarios with 1, 2, and 5 passive parties, and the resulting PSRs indicate no statistically significant difference. In the newly added real-world case, merchant profile data and transaction data are naturally stored across different institutions, which are treated as separate clients. The results remain stable, further supporting the scalability of VFS to multiple clients. For scalability to the multi-class scenario, the theoretical property is already guaranteed by Corollary 1. Empirically, we have now validated the correctness of the proposed theory through additional simulations in the revision. Taking Model 1 as an example with a three-class label variable, the resulting PSRs are very close to those obtained in our binary classification simulations (see Table 6 in our submitted version). We have made this point clear in the revision.
4. Privacy Leakage from Intermediate Statistics. First, we would like to kindly clarify that the main focus of this work lies in the statistical and computational aspects of feature screening in VFL, rather than in the cryptographic guarantees of homomorphic encryption (HE). The privacy of intermediate statistics is protected by adopting appropriate HE schemes, and the proposed VFS framework is compatible with any suitable HE methods. The classical privacy-preserving properties of HE are established by Gentry’s foundational work (Gentry, 2009). While some studies have investigated the potential vulnerability of HE to inference attacks, more advanced variants of HE can offer enhanced protection (Falcetta and Roveri, 2022), and these methods remain fully compatible with VFS. Moreover, combining HE with differential privacy (DP) is a promising direction for strengthening privacy guarantees, as suggested in recent studies (Wang et al., 2023). We have clarified this point and added a corresponding discussion in the revised manuscript. Thank you again for this valuable suggestion.
5. Difference from Multiple Testing and Cascading Error Analysis. Thank you for this insightful comment. Our focus differs from classical multiple testing, which typically controls the Type I error (e.g., the Bonferroni correction divides the significance level by the number of tests to reduce false positives, but at the cost of increased Type II error [Dunn, 1961; Dai et al., 2023]). In contrast, we fix the threshold regardless of the number of groups to maintain power; that is, we aim to decrease the Type II error rather than control the Type I error as in a classical multiple testing problem. We have already provided theoretical guarantees on the Type II error in Theorem 1(2) of the submitted version. Moreover, we can quantify the cascading error in Stage 1 directly from the result of Theorem 1. Assume that each iteration involves l groups and a total of r iterations are conducted; then, under conditions similar to those of Theorem 1(2), the probability that any relevant feature is mistakenly removed after Stage 1 satisfies
$$\mathrm{P}\left\{ \mathcal{F} \not\subset \hat{\mathcal{F}} \right\} \le O\left( lr \left[ \exp\left( - \tilde{C}_1 n_1^{1 - 2\gamma - 2\kappa} \right) + n \exp\left( - \tilde{C}_2 n_1^{\gamma} \right) \right] \right) \to 0,$$
ensuring that the error does not accumulate uncontrollably across stages. We have made both of these points clear in the revision with a newly added remark.
6. Necessity of Stage 2 in VFS. We would like to kindly remark that, as already pointed out in Theorem 2 of the submitted version, the statistic is asymptotically normally distributed when the group size is relatively large. Specifically, the asymptotic normality holds only under the condition in line 217, which is satisfied only when each feature group in Stage 1 contains a large number of features (empirically, larger than 5 [Zhang and Zhu, 2024]). Therefore, Stage 1 alone is not fine-grained enough, and the retained feature groups may still contain a substantial number of irrelevant features.
References
[1] Vepakomma, P., Gupta, O., Swedish, T., & Raskar, R. (2018). Split learning for health: Distributed deep learning without sharing raw patient data. arXiv preprint arXiv:1812.00564.
[2] Cheng, K., Fan, T., Jin, Y., Liu, Y., Chen, T., Papadopoulos, D., & Yang, Q. (2021). Secureboost: A lossless federated learning framework. IEEE intelligent systems, 36(6), 87-98.
[3] Ceballos, I., Sharma, V., Mugica, E., Singh, A., Roman, A., Vepakomma, P., & Raskar, R. (2020). SplitNN-driven vertical partitioning. arXiv preprint arXiv:2008.04137.
[4] Li, J., Cheng, K., Wang, S., Morstatter, F., Trevino, R. P., Tang, J., & Liu, H. (2017). Feature selection: A data perspective. ACM computing surveys (CSUR), 50(6), 1-45.
[5] Cassará, P., Gotta, A., & Valerio, L. (2022). Federated feature selection for cyber-physical systems of systems. IEEE Transactions on Vehicular Technology, 71(9), 9937-9950.
[6] Gentry, C. (2009). A fully homomorphic encryption scheme. Stanford university.
[7] Falcetta, A., & Roveri, M. (2022). Privacy-preserving deep learning with homomorphic encryption: An introduction. IEEE Computational Intelligence Magazine, 17(3), 14-25.
[8] Wang, B., Chen, Y., Jiang, H., & Zhao, Z. (2023). Ppefl: Privacy-preserving edge federated learning with local differential privacy. IEEE Internet of Things Journal, 10(17), 15488-15500.
[9] Dunn, O. J. (1961). Multiple comparisons among means. Journal of the American statistical association, 56(293), 52-64.
[10] Dai, C., Lin, B., Xing, X., & Liu, J. S. (2023). False discovery rate control via data splitting. Journal of the American Statistical Association, 118(544), 2503-2520.
Thank you for your answers. Most of my queries are properly addressed by the authors. Therefore, I would like to raise my scores.
We sincerely appreciate your positive re-evaluation and willingness to increase the score. Your recognition of our contributions and the time and effort spent reviewing our work are greatly valued. We would be pleased to address any further questions or suggestions you may have.
The proposed method attempts to address the ultrahigh dimensionality of features and the sparse data structures inherent in large-scale datasets in a VFL context. The gist of the method lies in a two-stage approach by which features are selected in a coarse-to-fine manner: first by groups then by individual features.
Strengths and Weaknesses
Strengths:
- The idea, based on the generalized U-statistic, seems straightforward and effective.
- The theoretical analysis elucidates the effectiveness of the method. It also provides a kind of performance estimate that seems to agree with the empirical results.
Weaknesses:
- Empirical results on real datasets are limited in terms of the number of features tested (50K features are only considered moderate in real-life applications).
- Feature selection methods are not necessarily limited to VFL. It would be good to compare the proposed method with more baselines, e.g., those referred to in Sect. 2.
Questions
- How would different options in the Stage 1 grouping method affect the overall performance?
- In terms of scalability, what happens if the method is applied to more than 100K features?
- How would privacy-preserving mechanisms, e.g., DP or HE, affect the proposed method?
Limitations
yes
Formatting Concerns
no concerns
We sincerely thank Reviewer ekrS for recognizing the strengths of our paper, specifically highlighting the straightforward and effective idea based on Generalized U-statistics and our rigorous theoretical analysis. We highly appreciate these insightful comments and constructive suggestions, which help clarify and enhance our contribution. Below we address every question and comment raised by the reviewer in detail.
1. Empirical Results with More Features. We would like to kindly note that this work was originally inspired by a real-world credit modeling task. While privacy concerns initially prevented us from including the real-data results, we have now incorporated them following your suggestion. The data provider has approved their inclusion under confidentiality agreements, and the results were obtained in a secure sandbox environment (Intel(R) Xeon(R) Gold 5320 CPU @ 2.20GHz, 62 GB RAM). The goal is to build a model using merchant risk labels as the response and a large number of transaction-based features as inputs; the model outputs are further used to inform financial product recommendation strategies for merchants. Specifically, the total sample size is and the feature dimension is (far larger than 100K). The features are generated from transaction-level payment records to describe each merchant's business behaviors, including but not limited to merchant attributes, transaction activity, revenue scale, business growth, customer structure, payment channel distribution, transaction stability, and temporal transaction patterns. Applying the VFS method, we remove redundant features to efficiently detect risky merchants. We validated the efficiency gains of VFS in downstream VFL tasks, taking SplitNN (Vepakomma et al., 2018), a neural-network-based VFL method suggested by Reviewer xnKO, as an example. As shown in the table below, VFS achieves a substantial reduction in computational cost, with the runtime reduced to less than 10% of that required by the original method. Furthermore, VFS achieves improved out-of-sample AUC while using significantly fewer features. Other competing methods are also considered in the revision, and the results are similar. This further demonstrates the scalability of our method, as you kindly asked, highlighting its superior performance even in scenarios involving more than 100K features.
| # Features | AUC | Screening Time | Modeling Time | Total Time |
|---|---|---|---|---|
| 100 | 0.937 | 29.098 | 100.313 | 129.411 |
| 500 | 0.941 | 30.876 | 125.343 | 156.219 |
| 1000 | 0.936 | 31.699 | 151.941 | 183.640 |
| All | 0.901 | - | 1911.829 | 1911.829 |
2. Additional Baseline Comparisons. Thank you very much for your insightful comment regarding the baselines. Indeed, previous studies have already discussed feature selection within federated learning (FL), but feature screening in FL has not yet been explored. Hence, in Section 2.1, we have only reviewed feature selection methods in VFL as well as feature screening methods in non-FL scenarios. In response to your suggestion about the performance of non-FL feature selection methods, we have now additionally included classical feature selection methods, such as Lasso (Tibshirani, 1996), SCAD (Fan and Li, 2001), and Elastic Net (Zou and Hastie, 2005), in subsequent analyses. Using the p53 dataset as an example, the following table demonstrates that VFS quickly removes irrelevant noise, thereby enhancing both the efficiency and performance of feature selection, as evidenced by our theoretical analyses.
| Method | AUC | Total Time |
|---|---|---|
| Lasso | 0.824 | 498.0 |
| VFS + Lasso | 0.890 | 17.4 |
| ElasticNet | 0.824 | 522.8 |
| VFS + ElasticNet | 0.890 | 17.5 |
| SCAD | 0.885 | 1227.6 |
| VFS + SCAD | 0.923 | 42.2 |
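The screen-then-select pipeline behind the table can be sketched in plain NumPy. This is a toy, centralized stand-in (marginal-correlation screening followed by a hand-rolled coordinate-descent Lasso), not the authors' encrypted VFS implementation; the names and defaults (`screen_then_lasso`, `keep`, `lam`) are illustrative.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Plain coordinate-descent Lasso; assumes a standardized design matrix."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual excluding feature j, then soft-thresholding.
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0)
    return beta

def screen_then_lasso(X, y, keep=20, lam=0.1):
    """Marginal screening first, then Lasso on the reduced design only."""
    n, p = X.shape
    scores = np.abs(X.T @ (y - y.mean()) / n)   # marginal association scores
    idx = np.argsort(scores)[-keep:]            # survivors of screening
    beta = np.zeros(p)
    beta[idx] = lasso_cd(X[:, idx], y, lam)
    return beta

# Toy data: 300 features, only the first two are active.
rng = np.random.default_rng(2)
X = rng.normal(size=(400, 300))
X = (X - X.mean(0)) / X.std(0)
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(size=400)
beta = screen_then_lasso(X, y)
```

Because the penalized solver only ever sees the `keep` surviving columns, its cost no longer scales with the full dimension, which is the source of the runtime reductions reported above.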
3. Grouping Strategy. The grouping strategy in Stage 1 has minimal impact on the overall performance. Theoretically, we have established the asymptotic normality of the group-based VFS statistics in Theorem 2. Empirically, to address your concern, we conducted an additional experiment in which the feature rankings were randomly permuted in each replication to create a new grouping strategy. The results yielded a PSR of , which is very similar to the original result of , with no statistically significant difference. We have made this point clear in the revision per your kind advice.
4. Effect of Privacy-Preserving Mechanisms. Our method is compatible with any suitable homomorphic encryption (HE) scheme; only one appropriate HE method needs to be available. We would like to kindly remark that the primary focus of this work is on the statistical and computational aspects of feature screening in VFL, rather than the cryptographic guarantees of HE. Nevertheless, per your kind advice, we have added a discussion of the effect of privacy-preserving mechanisms in the revision with a newly added remark. Specifically, the privacy-preserving property of HE is underpinned by Gentry's foundational work (Gentry, 2009). While HE could be vulnerable to certain inference attacks, more advanced variants of HE offer stronger protection (Bost et al., 2014; Falcetta and Roveri, 2022). Furthermore, combining HE with differential privacy (DP) is a promising direction for enhancing privacy guarantees, as suggested in recent studies (Li et al., 2022; Wang et al., 2023), which we have now cited in the revised manuscript. Thank you again for this excellent point.
References
[1] Vepakomma, P., Gupta, O., Swedish, T., & Raskar, R. (2018). Split learning for health: Distributed deep learning without sharing raw patient data. arXiv preprint arXiv:1812.00564.
[2] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology, 58(1), 267-288.
[3] Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456), 1348-1360.
[4] Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society Series B: Statistical Methodology, 67(2), 301-320.
[5] Gentry, C. (2009). A fully homomorphic encryption scheme. Stanford University.
[6] Bost, R., Popa, R. A., Tu, S., & Goldwasser, S. (2014). Machine learning classification over encrypted data. Cryptology ePrint Archive.
[7] Falcetta, A., & Roveri, M. (2022). Privacy-preserving deep learning with homomorphic encryption: An introduction. IEEE Computational Intelligence Magazine, 17(3), 14-25.
[8] Li, B., Micciancio, D., Schultz-Wu, M., & Sorrell, J. (2022). Securing approximate homomorphic encryption using differential privacy. In Annual International Cryptology Conference (pp. 560-589). Cham: Springer Nature Switzerland.
[9] Wang, B., Chen, Y., Jiang, H., & Zhao, Z. (2023). Ppefl: Privacy-preserving edge federated learning with local differential privacy. IEEE Internet of Things Journal, 10(17), 15488-15500.
The effect of privacy-preserving mechanisms on the proposed screening method is important, especially for financial use cases such as real-world credit modeling. No bank will adopt a new technique unless it is fully assured.
What I was suggesting is to show empirically, e.g., if Gaussian noise were added to the features, whether the screening performance would degrade gracefully or severely.
Thank you for your point. We sincerely appreciate this insightful comment on the importance of evaluating privacy-preserving mechanisms, and we fully agree that this is an important consideration. We would like to kindly remark that in this work, our focus is on the statistical and computational aspects of feature screening in VFL, leveraging standard HE schemes for privacy protection rather than DP. We do not focus on designing HE or DP schemes.
However, to address your concern, we conducted additional simulations to empirically assess the potential impact of privacy-preserving mechanisms by adding Gaussian noise to each feature. Under the simulation settings in Appendix E, we added zero-mean Gaussian noise with standard deviations of 0, 1, 2, 5, and 10. The corresponding PSR (Positive Selection Rate) values were , , , , and , respectively, showing gradual performance degradation as expected.
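A toy version of this noise experiment can be run in a few lines, assuming a simple correlation-based screening statistic in place of the paper's VFS statistic, and defining PSR here (for illustration only) as the fraction of the truly active features ranked in the top-s positions:

```python
import numpy as np

def psr_under_noise(noise_sd, n=300, p=200, s=5, reps=20, seed=0):
    """PSR proxy: fraction of the s truly active features that land in the
    top-s positions of a correlation-based screening after noise is added."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        X = rng.normal(size=(n, p))
        y = X[:, :s].sum(axis=1) + rng.normal(size=n)      # first s are active
        Xn = X + rng.normal(scale=noise_sd, size=X.shape)  # privacy noise
        Xc = (Xn - Xn.mean(0)) / Xn.std(0)
        yc = (y - y.mean()) / y.std()
        scores = np.abs(Xc.T @ yc / n)
        hits += len(set(np.argsort(scores)[-s:]) & set(range(s)))
    return hits / (reps * s)

for sd in [0, 1, 2, 5, 10]:
    print(sd, round(psr_under_noise(sd), 2))
```

Since the noise attenuates each marginal correlation by roughly a factor of 1/sqrt(1 + noise_sd^2), the ranking degrades gradually rather than abruptly, matching the graceful degradation reported above.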
From a practical perspective, transmitting encrypted -dimensional is infeasible in ultra-high-dimensional settings (), as the communication cost could be , which is prohibitive. Moreover, the passive party (data holder) may be reluctant to share all features even under secure protocols. Our framework therefore transmits under HE to enable secure computation. HE allows operations to be performed directly on encrypted data, producing decrypted results identical (or nearly identical) to those obtained from plaintexts. While computational efficiency may vary across HE schemes, for completeness we compared two representative schemes, Paillier and CKKS. The results are nearly identical, with Paillier requiring slightly less computation time. Thank you again for your important comment.
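As a concrete illustration of how an additively homomorphic scheme lets a passive party aggregate encrypted labels without ever decrypting them, here is a minimal textbook Paillier sketch. The primes are small demo values, not secure parameters, and this is not the paper's implementation.

```python
import math
import random

def keygen(p=999983, q=999979):
    """Textbook Paillier with generator g = n + 1 (demo primes, not secure)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; valid because g = n + 1
    return (n, n * n), (lam, mu)

def encrypt(pub, m):
    n, n2 = pub
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    n, n2 = pub
    lam, mu = priv
    return (pow(c, lam, n2) - 1) // n * mu % n

# Multiplying ciphertexts adds the underlying plaintexts, so the sum of
# encrypted binary labels is computed without the labels ever being seen.
pub, priv = keygen()
enc_y = [encrypt(pub, yi) for yi in [1, 0, 1, 1]]  # encrypted labels
enc_sum = math.prod(enc_y) % pub[1]
print(decrypt(pub, priv, enc_sum))  # → 3
```

Note that the ciphertext sum is formed entirely on the encrypted side; only the final aggregate is decrypted by the key holder, which is the property that makes transmitting encrypted labels (rather than all raw features) workable.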
This paper presents an approach for making vertical federated learning more efficient via a two-stage approach involving a preprocessing step ("screening"). The reviewers praised the applicability of the method and its theoretical analysis but raised strong concerns about the scope and real-world applicability of the empirical results. In the discussion they indicated that these concerns were addressed, but maintained borderline ratings, with remaining concerns relating mainly to an even broader applicability of the method. Therefore, I tentatively recommend accepting the paper.