PaperHub
6.0/10
Rejected · 4 reviewers
Ratings: 3, 3, 5, 4 (min 3, max 5, std 0.8)
Confidence: 3.8
Novelty: 2.5 · Quality: 2.3 · Clarity: 2.3 · Significance: 2.3
NeurIPS 2025

Wavelet Scattering Transform and Fourier Representation for Offline Detection of Malicious Clients in Federated Learning

Submitted: 2025-05-09 · Updated: 2025-10-29
TL;DR

We propose a pre-training client anomaly detection method for Federated Learning using compressed signal representations and a lightweight external detector.

Abstract

Keywords
Federated Learning · Malicious Client Detection · Representation Operators

Reviews and Discussion

Review (Rating: 3)

This paper addresses a critical challenge in Federated Learning (FL): detecting anomalous or malicious clients with faulty or non-representative data, without accessing raw data. The proposed solution, Waffle (Wavelet and Fourier representations for Federated Learning), introduces a novel offline detection algorithm that uses compressed spectral representations (Wavelet Scattering Transform (WST) and Fourier Transform (FT)) to identify malicious clients before training. These representations are task-agnostic, low-dimensional, and robust to data perturbations, offering stability against deformations. The Waffle method is lightweight, relying on locally computed statistics, and uses a pre-trained classifier on a public distilled dataset for detection, minimizing computational and communication overhead.

Strengths and Weaknesses

Strengths

  1. Privacy-Preserving: Waffle is designed to work in a privacy-preserving manner, using low-dimensional, compressed representations of client data, which enhances scalability and reduces computational overhead. This is particularly important in federated settings where preserving client privacy is a priority.
  2. Low Overhead: The method reduces the communication and computational burden on both the client and server side, as it avoids real-time detection during training and relies on pre-computed spectral features. This lightweight design is well-suited to large-scale FL deployments.

Weaknesses

The paper's structure is unclear, with Figure 2 placed in the appendix and inconsistencies between experimental results. While claiming 100% precision under 90% attack, the method's recall rate is only 60%, indicating incomplete detection. The use of Fourier transforms raises privacy concerns, and the lack of comparison with advanced methods limits the evaluation. Additionally, the reference list is outdated, and the claim of a "lightweight" approach lacks empirical validation.

Questions

  1. The layout of the paper could be improved. For instance, Figure 2, mentioned in the experimental section, is not presented in the main text but is relegated to the appendix. Additionally, there is an inconsistency between the results for FedAvg in the absence of data attacks presented in the experimental section and those in the text's table, although they align with the results in the appendix.
  2. The paper claims that WST-based features achieve 100% precision under 90% attack scenarios. However, the recall rate is around 60%, indicating that the method does not effectively detect all malicious data. This raises concerns about the completeness of the detection.
  3. As mentioned in the paper, the Fourier Transform can potentially reconstruct the data. This contradicts the privacy-preserving nature of Federated Learning. It is unclear whether the use of this transform is fully justified in terms of privacy protection.
  4. The paper lacks a comprehensive comparison with other advanced anomaly detection methods. This makes it difficult to evaluate the true performance and advantages of the proposed approach. Moreover, the claim of the method being "lightweight" needs further empirical validation. No specific experiments are provided to substantiate how Waffle's approach reduces computational or communication overhead in a significant way.
  5. The reference list is somewhat limited, with only seven references from 2024. Moreover, these references are not compared against the methods proposed in this work, which diminishes the paper’s contribution to the research landscape.

Limitations

yes

Formatting Concerns

Nope

Author Response

Q1:

We will improve the layout of the paper as requested and we will try to either make Figure 2 fit in the main text, or refactor the experimental section in a more suitable way to refer properly to the appendix. Regarding FedAvg results in the absence of attacks, could the reviewer kindly specify which inconsistency is being referred to—namely, whether it concerns a mismatch between the text and Table 1, or a different part of the main paper? We have double-checked and found that the results reported in the main table and the appendix are aligned, but it’s possible that the phrasing in the text caused confusion. We would be happy to correct or clarify this point accordingly.

Q2:

We thank the reviewer for this comment on the precision-recall trade-off. The choice to prioritize precision, especially in high-contamination scenarios, was a deliberate strategic decision designed to guarantee the integrity of the final model. In an extreme setting with 90% malicious clients, the primary risk is not the failure to detect every adversary, but the catastrophic possibility of misclassifying one of the few benign clients as malicious (a false positive). Doing so could lead to a scenario where the model is trained exclusively on data from attackers. Therefore, achieving 100% precision, as Waffle-WST does across all datasets in the 90% attack scenario, is an important feature. This guarantees that any client allowed to participate in training is genuinely benign, ensuring a trusted foundation for the global model. This focus on high precision is particularly critical in the hostile environments we test, which go far beyond the assumptions of most existing literature. Many robust aggregation methods, such as Krum, are designed for scenarios where malicious clients constitute a minority (i.e., less than 50%). Most importantly, our downstream results in Table 2 validate this strategy. On CIFAR-100, the high-precision Waffle-WST variant combined with FedAvg yields a final model accuracy of 17.12%. In contrast, the Waffle-FT variant, which has a higher recall (88.1% vs 67.86%) but lower precision (88.1% vs 100%) in this specific scenario, achieves a significantly lower accuracy of 11.58%. This demonstrates that it is more effective to train on a smaller, guaranteed-clean set of clients than on a larger, but still contaminated, set. The ultimate performance of the global model confirms the value of our high-precision approach.

Q3:

We acknowledge that the Fourier Transform is invertible, and its direct application to private data would indeed be a privacy risk. However, as discussed in Appendix D, our framework is safe because it uses a multi-stage pipeline specifically designed to prevent this, ensuring that the information sent to the server is a heavily processed, non-invertible summary from which the original data cannot be reconstructed by any external party. The process, performed entirely on the client's device, is as follows (a minimal sketch is given after the list):

  1. The client first computes the principal components of its entire local dataset $\{x_k^i\}$. It then creates a single, low-dimensional vector $\hat{x}_k$ by taking a weighted sum of its top principal components, as defined in Equation (6). This vector is an aggregated statistic representing the client's whole dataset, not any single data point. Reconstructing an individual sample from this aggregated vector is already computationally infeasible for an external party.
  2. The Fourier Transform is applied to this single, low-dimensional aggregate vector $\hat{x}_k$, not to the raw data.
  3. Finally, the client sends $\varphi_k = |\Phi[\hat{x}_k]|$ to the server, which is the magnitude of the Fourier Transform output. Taking the modulus discards all phase information, which is essential for inversion, making the reconstruction of $\hat{x}_k$—let alone the original data—practically impossible for an attacker.
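For concreteness, the sketch below illustrates this client-side pipeline in Python. It is a minimal sketch under stated assumptions, not the paper's implementation: the Equation (6) weighting is replaced by explained-variance weights, and the function and parameter names are ours.

```python
# Minimal, illustrative sketch of the client-side embedding pipeline described above.
# The exact weighting of Equation (6) is not reproduced; explained-variance weights
# are used as a stand-in assumption, and all names here are hypothetical.
import numpy as np

def client_embedding(X: np.ndarray, n_components: int = 8) -> np.ndarray:
    """X: local dataset of shape (n_samples, n_features), one flattened sample per row."""
    Xc = X - X.mean(axis=0)                             # center the local data
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)   # principal directions of the dataset
    w = s[:n_components] ** 2
    w = w / w.sum()                                      # assumed surrogate for the Eq. (6) weights
    x_hat = w @ Vt[:n_components]                        # single aggregate vector (step 1)
    phi_k = np.abs(np.fft.rfft(x_hat))                   # FT magnitude; phase discarded (steps 2-3)
    return phi_k                                         # only this summary leaves the client
```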

Regarding invertibility, we clarify that this is a localized capability only available to the client itself. An external attacker or the server, who only sees the final embedding $\varphi_k$, cannot reverse these steps. However, the client, having access to its own private data and principal components, could theoretically train a generative model (like a Variational Autoencoder) to learn a mapping from its internal representations back to its data. This inversion capability is entirely confined to the client's local environment and does not constitute a privacy leak within the federated system.

Q4 until "[...] proposed approach":

We recognize that a direct comparison with advanced anomaly detection methods was not included. Most existing detection methods in the literature, such as FLDetector and VAEDetector, are online techniques that require analyzing model updates across multiple training rounds to identify malicious behavior. A direct comparison is therefore not straightforward, as they operate at different stages of the FL lifecycle. Our work is one of the first to propose a dedicated solution for an offline filtering task.

For completeness, we implemented FLDetector in the case of 60% benign clients (a similar attack setting to Table 1). Here we report the results alongside Waffle's, which are already present in the paper:

Dataset       | FedAvg w/o Detector | FedAvg w/ WAFFLE-WST | FedAvg w/ WAFFLE-FT | FedAvg w/ FLDetector
FashionMNIST  | 73.33               | 76.18                | 73.38               | 71.44
CIFAR-10      | 48.75               | 49.70                | 46.95               | 45.38
CIFAR-100     | 16.35               | 17.12                | 11.58               | 16.43

For the considered setup, FLDetector appears unable to correctly detect the vast majority of the attackers, consistently across datasets. We plan to present the results with VAEDetector before the end of the discussion phase.

Q4 from "moreover, the claim [...]":

All significant computations required by our method, specifically the offline training of the Waffle detector, are performed entirely on the server-side. This design choice aligns with the standard FL framework, where the central server is assumed to be a powerful entity with access to significant computational resources, and is therefore not considered a bottleneck (Kairouz et al., 2021). The clients, in contrast, are only required to perform a single, one-time lightweight computation to generate their embedding. This one-time server-side training cost is a practical and efficient approach to establishing a robust defense before the collaborative training even begins.

Computation Overhead:

  • Client-Side: The computation required from each client is a one-time, offline cost consisting of performing PCA and a single spectral transform (FT or WST) on their data statistics. These are standard and efficient operations.

  • Server-Side: The server is tasked with training a simple MLP detector offline. This is a minor task, especially considering the common assumption in FL that the central server possesses significant computational resources; e.g., in Bao et al. (2023) they even train a different discriminator neural network for each pair of clients.

Communication Overhead:

With Waffle, each client transmits its embedding $\varphi_k$ to the server only once before training begins. This embedding is a small, fixed-size vector. In contrast, standard FL protocols require clients to transmit their entire model weight updates—which can consist of millions of parameters—at every communication round. The one-time cost of sending a small vector is negligible compared to the cumulative communication burden of a typical FL training process, empirically validating our claim that the approach is lightweight, particularly in communication-constrained environments.
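As a rough illustration of this argument, the back-of-the-envelope comparison below uses assumed sizes; the embedding dimension, model size, and round count are hypothetical, not values from the paper.

```python
# Back-of-the-envelope communication comparison with assumed, illustrative sizes.
embedding_dim = 256            # hypothetical Waffle embedding dimensionality
model_params = 10_000_000      # hypothetical model size (parameters)
rounds = 100                   # hypothetical number of FL communication rounds
bytes_per_float = 4            # float32

waffle_once = embedding_dim * bytes_per_float            # one-time upload per client (~1 KB)
fl_updates = model_params * bytes_per_float * rounds      # cumulative per-client update traffic (~4 GB)
print(waffle_once, fl_updates, waffle_once / fl_updates)  # ratio on the order of 1e-7
```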

We will update the manuscript by including a detailed cost complexity analysis in a stand-alone Appendix.

Q5:

We will expand our Related Work section in the camera-ready version to include and discuss state-of-the-art literature from 2024. Here are some references that we will investigate and discuss, in order to better position our work and to clarify the novelty of our contribution in contrast to more recent works:

  • Mu, X., Cheng, K., Shen, Y., Li, X., Chang, Z., Zhang, T., & Ma, X. (2024). FedDMC: Efficient and robust federated learning via detecting malicious clients. IEEE Transactions on Dependable and Secure Computing, 21(6), 5259-5274.
  • Zeng, H., Li, J., Lou, J., Yuan, S., Wu, C., Zhao, W., ... & Wang, Z. (2024). Bsr-fl: An efficient byzantine-robust privacy-preserving federated learning framework. IEEE Transactions on Computers, 73(8), 2096-2110.
  • Zhou, T., Liu, N., Song, B., Lv, H., Guo, D., & Liu, L. (2024). RobFL: Robust federated learning via feature center separation and malicious center detection. In 2024 IEEE 40th International Conference on Data Engineering (ICDE) (pp. 926-938). IEEE.
  • Sharma, A., & Marchang, N. (2025). Detection of malicious clients in federated learning using graph neural network. IEEE Access.
  • Licciardi, A., Leo, D., Fanì, E., Caputo, B., & Ciccone, M. (2025). Interaction Based Gaussian Weighting Clustering for Federated Learning. In Forty-second International Conference on Machine Learning.
  • Allouah, Y., Guerraoui, R., & Stephan, J. (2025) Towards Trustworthy Federated Learning with Untrusted Participants. In Forty-second International Conference on Machine Learning.

References

  • Zhang, Z., Cao, X., Jia, J., & Gong, N. Z. (2022, August). Fldetector: Defending federated learning against model poisoning attacks via detecting malicious clients. In Proceedings of the 28th ACM SIGKDD conference on knowledge discovery and data mining (pp. 2545-2555).
  • Kairouz, P., McMahan, H. B., Avent, B., Bellet, A., Bennis, M., Bhagoji, A. N., ... & Zhao, S. (2021). Advances and open problems in federated learning. Foundations and trends® in machine learning, 14(1–2), 1-210.
  • Bao, W., Wang, H., Wu, J., & He, J. (2023, July). Optimizing the collaboration structure in cross-silo federated learning. In International Conference on Machine Learning (pp. 1718-1736). PMLR.
Comment

Following up on our previous response, and in direct response to the reviewer's concern about the lack of a comprehensive comparison with other advanced anomaly detection techniques: aside from FLDetector, already presented in our rebuttal, we have implemented and evaluated another common baseline, the VAE-based detector proposed by Li et al. ("Learning to detect malicious clients for robust federated learning").

We evaluated this method under the same 40% attack setting used in Table 1 of the paper. Below, we report the final model accuracies for each configuration:

Dataset       | FedAvg w/o Detector | FedAvg w/ WAFFLE-WST | FedAvg w/ WAFFLE-FT | FedAvg w/ FLDetector | FedAvg w/ VAE Detector
FashionMNIST  | 73.33               | 76.18                | 73.38               | 71.44                | 73.06
CIFAR-10      | 48.75               | 49.70                | 46.95               | 45.38                | 47.15
CIFAR-100     | 16.35               | 17.12                | 11.58               | 16.43                | 15.77

The VAE-based detector, much like FLDetector, is an online detection method designed to identify malicious model updates during the training process. However, our setting focuses on offline client filtering before training begins, which these methods are not designed for.

As the table shows, the VAE detector fails to significantly improve final model accuracy and in some cases degrades it further. This is due to its poor detection performance: across all datasets, the VAE detector achieved very low recall and F1 score, indicating that it was unable to distinguish malicious clients from benign ones under the considered data-level corruption attacks.

We will integrate these new results into the experimental section of the final manuscript.

Review (Rating: 3)

This paper proposes Wavelet Scattering Transform and Fourier Representation for the federated learning scenario to detect data attacks that can degrade model performance. They show that their method achieves the best accuracy performance on several datasets such as FashionMNIST and CIFAR-10/100.

Strengths and Weaknesses

Strengths

s1: This paper is well written and easy to follow.

s2: They apply the previous detection algorithms for federated learning to detect malicious clients.

Weaknesses

w1: The non-invertibility itself is not enough for preserving privacy.

w2: The authors mentioned they compare their method with the previous FL anomaly detection algorithms, but I cannot find this.

w3: I recommend authors to do more experiments with larger datasets and models.

w4: It's not clear why Waffle-FT works worse than w/o detector in many cases in Table 2.

Questions

  1. Could you explain why FedAvg works better than other methods in Table 2?

  2. Could we apply this method to the NLP, LLM domains?

Limitations

N/A

Final Justification

I raise my score to 3. But my concern about whether this method preserves privacy by itself is still not resolved. The information loss itself is not enough to claim a privacy guarantee. I think it should be elaborated more, with some theoretical or experimental results showing that this method is robust to some privacy attacks.

Formatting Concerns

N/A

Author Response

w1: The non-invertibility itself is not enough for preserving privacy.

We thank the reviewer for this comment. As discussed in Appendix D, our framework incorporates several inherent privacy-enhancing steps and can be integrated with state-of-the-art privacy-preserving protocols in FL (Bonawitz et al., 2017).

In our case, the vector $\varphi_k$ that a client sends to the server is not a simple transform of raw data, but the result of a multi-stage pipeline that obscures raw information, embedding it in a low-dimensional statistic. Specifically:

  • Aggregation: The first step of the process consists in computing $\hat{x}_k$, a single, low-dimensional vector representing an aggregated statistic of the client's entire local dataset. This aggregation is the first and most critical step, as it breaks the link to any individual data point.
  • Information Loss: The subsequent spectral transform (FT or WST) and modulus operation further discard information, making reconstruction of even the aggregated statistic $\hat{x}_k$ computationally infeasible.

w2: The authors mentioned they compare their method with the previous FL anomaly detection algorithms, but I cannot find this.

We recognize that a direct comparison with advanced anomaly detection methods was not included. Most existing detection methods in the literature, such as FLDetector (Zhang et al., 2022) and VAEDetector, are online techniques that require analyzing model updates across multiple training rounds to identify malicious behavior. A direct comparison is therefore not straightforward, as they operate at different stages of the FL lifecycle. Our work is one of the first to propose a dedicated solution for an offline filtering task.

For completeness, we implemented FLDetector in the case of 60% benign clients (a similar attack setting to Table 1). Here we report the results alongside Waffle's, which are already present in the paper:

Dataset       | FedAvg w/o Detector | FedAvg w/ WAFFLE-WST | FedAvg w/ WAFFLE-FT | FedAvg w/ FLDetector
FashionMNIST  | 73.33               | 76.18                | 73.38               | 71.44
CIFAR-10      | 48.75               | 49.70                | 46.95               | 45.38
CIFAR-100     | 16.35               | 17.12                | 11.58               | 16.43

For the considered setup, FLDetector appears unable to correctly detect the vast majority of the attackers, consistently across datasets. We plan to present the results with VAEDetector before the end of the discussion phase.

w3: I recommend authors to do more experiments with larger datasets and models.

We agree that the current version lacks evaluation on more realistic datasets. To address this, we are in the process of integrating experiments on word embedding datasets, in particular the 50-dimensional GloVe embeddings (Pennington et al., 2014) in its recent updated version (Carlson et al., 2025). We are confident that we will be able to submit results on this NLP dataset before the end of the discussion phase, in order to provide a first assessment of scalability and applicability in this setting. Moreover, adopting such a dataset necessarily requires the use of larger models.

w4: It's not clear why Waffle-FT works worse than w/o detector in many cases in Table 2.

Waffle-FT can underperform because the FT is a global tool that struggles to detect localized or non-uniform attack artifacts. This can lead the detector to misclassify and remove valuable benign clients, which ultimately harms the final model's performance. In contrast, the WST is a more robust solution for this task: it is stable against minor, non-malicious data variations, preventing the incorrect removal of good clients; it analyzes data at multiple scales and locations, making it highly effective at detecting the kind of localized structural disruptions that FT misses. Hence, WST provides a more robust representation, leading to a more accurate detector that preserves benign clients. This phenomenon, where an imperfect defense is more harmful than no defense, is not unique to our FT variant. As seen in Table 2, some robust aggregation methods like Krum also underperform relative to standard FedAvg.

q1: Could you explain why FedAvg works better than other methods in Table 2?

FedAvg, when combined with our Waffle method, outperforms the other robust aggregation baselines (Krum, mKrum, etc.) in Table 2 because those methods are unable to detect and mitigate the type of data-level attacks we introduced.

The baseline methods like Krum and Trimmed Mean were developed to be resilient against Byzantine-style attacks, where malicious clients send arbitrary and harmful gradient updates. These methods operate by identifying statistical outliers in the model's parameter space—for example, by discarding the updates that are furthest from the others. However, our attacks (noise and blur) corrupt the data features themselves before training begins. A malicious client therefore produces gradient updates that, while detrimental to the model's convergence, do not necessarily appear as extreme statistical outliers in the parameter space. Consequently, the defense mechanisms of Krum and other baselines often fail to identify them, and end up aggregating harmful contributions that degrade the global model's performance.

Our approach, in contrast, is specifically designed for this scenario. Waffle analyzes the characteristics of each client's data before training starts, effectively filtering out the corrupted clients. Once these harmful clients are removed, the remaining pool of clients is "clean." At this stage, the simple FedAvg algorithm is the most effective choice because it no longer has to contend with malicious updates.

The results in Table 2 confirm this dynamic: FedAvg on its own (in the "w/o detector" row) performs poorly due to the malicious clients. The robust baselines (Krum, mKrum, etc.) fail to significantly improve the situation because they do not effectively detect this type of attack.

When Waffle (specifically the WST variant) is applied to filter clients, FedAvg's performance is drastically boosted, approaching the accuracy achieved on a perfectly clean dataset.

q2: Could we apply this method to the NLP, LLM domains?

As addressed above, we are in the process of testing our method on an NLP dataset, specifically the 50-dimensional GloVe embeddings (Pennington et al., 2014) in its recent updated version (Carlson et al., 2025). Given the computational resources at our disposal, we are not able at this stage to scale up to LLM datasets. However, the NLP scenario we are considering is already more than a mere proof of concept in that direction.

References

  • Zhang, Z., Cao, X., Jia, J., & Gong, N. Z. (2022, August). Fldetector: Defending federated learning against model poisoning attacks via detecting malicious clients. In Proceedings of the 28th ACM SIGKDD conference on knowledge discovery and data mining (pp. 2545-2555).
  • Bonawitz, K., Ivanov, V., Kreuter, B., Marcedone, A., McMahan, H. B., Patel, S., ... & Seth, K. (2017, October). Practical secure aggregation for privacy-preserving machine learning. In proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (pp. 1175-1191).
  • Pennington, J., Socher, R., and Manning, C. D. 2014. GloVe: Global Vectors for Word Representation.
  • Carlson, R., Bauer, J., and Manning, C. D. 2025. A New Pair of GloVes.
Comment

Following up on our previous response and our commitment to provide further comparisons, we have now implemented and evaluated another common anomaly detection baseline, the VAE-based detector proposed by Li et al. ("Learning to detect malicious clients for robust federated learning").

We present the results for the final model accuracy below, in the same 40% attack setting as our previous table.

Dataset       | FedAvg w/o Detector | FedAvg w/ WAFFLE-WST | FedAvg w/ WAFFLE-FT | FedAvg w/ FLDetector | FedAvg w/ VAE Detector
FashionMNIST  | 73.33               | 76.18                | 73.38               | 71.44                | 73.06
CIFAR-10      | 48.75               | 49.70                | 46.95               | 45.38                | 47.15
CIFAR-100     | 16.35               | 17.12                | 11.58               | 16.43                | 15.77

The VAE-based detector, much like FLDetector, struggles to handle the data-level corruptions used in our experiments. As shown in the table, its application leads to final model performance that is nearly identical to—or in some cases, worse than—the baseline with no detector at all.

The reason for this is the detector's failure to identify the malicious clients. Across all datasets, the VAE detector achieved an extremely low recall and F1 score, indicating that it was unable to distinguish the attacked clients from the benign ones in this setting.

We will include these new baselines in our experimental section.

Comment

Thank you for the clarification and additional experiments. I raise my score to 3. But my concern about whether this method preserves privacy by itself is still not resolved. The information loss itself is not enough to claim a privacy guarantee. I think it should be elaborated more, with some theoretical or experimental results showing that this method is robust to some privacy attacks.

Comment

We thank the reviewer for their time and their comment. We agree that non-invertibility alone is not sufficient to ensure formal privacy guarantees. Our intent was to present it as one component within a broader privacy-enhancing pipeline, not as a standalone guarantee. We will revise the manuscript accordingly, rewording the claim that "non-invertibility enhances privacy" to more accurately reflect this position.

Specifically, the vector $\varphi_k$ transmitted by each client is not a direct transform of raw data; it is the output of a multi-stage pipeline. The first stage is aggregation via PCA, which compresses the client's entire local dataset into a low-dimensional vector summarizing dominant directions of variance; the second is spectral encoding via WST or FT; and the third is the modulus operation, which further discards phase information.

These operations jointly ensure that $\varphi_k$ is not easily invertible, does not correspond to any single data point, and is task-agnostic—making it substantially harder to exploit for privacy attacks. This aligns with existing approaches in federated learning that rely on sending aggregate statistics instead of raw data to enhance privacy (Geyer et al., 2017; Bonawitz et al., 2017).

Moreover, recent works have shown that even sharing model updates (e.g., in FedAvg) can lead to privacy leakage, such as reconstruction or membership inference attacks (Elkordy et al., 2022), further motivating our approach. While we do not claim formal privacy guarantees, our method is compatible with differential privacy or secure aggregation and can be extended with them, as discussed in Appendix D. We plan to incorporate empirical privacy attack evaluations in future work.

  • Geyer, R. C., Klein, T., & Nabi, M. (2017). Differentially private federated learning: A client level perspective.
  • Bonawitz, K., Ivanov, V., Kreuter, B., Marcedone, A., McMahan, H. B., Patel, S., ... & Seth, K. (2017, October). Practical secure aggregation for privacy-preserving machine learning.
  • Elkordy, A. R., Zhang, J., Ezzeldin, Y. H., Psounis, K., & Avestimehr, S. (2022). How much privacy does federated learning with secure aggregation guarantee?
Comment

As addressed above, we are in the process of testing our method on an NLP dataset, specifically the 50-dimensional GloVe embeddings (Pennington et al., 2014) in its recent updated version (Carlson et al., 2025). Given the computational resources at our disposal, we are not able at this stage to scale up to LLM datasets. However, the NLP scenario we are considering is already more than a mere proof of concept in that direction.

Following up on our promise in the rebuttal, we have completed our initial NLP experiments. We implemented a composite Shift-and-Noise Attack: (1) applying random permutations to the embeddings and (2) adding Gaussian noise to the 50-dimensional GloVe embeddings, on 40% of the 100 clients in the federation.

Waffle (WST) detection performance:

  • Accuracy: 0.83

  • Precision: 1.0

  • Recall: 0.58

  • F1-Score: 0.73

Our Waffle-WST method demonstrated strong detection capabilities, achieving an F1-score of 0.73 and, notably, a perfect precision of 1.0, ensuring no honest clients were penalized. This successful detection directly translated to a significant performance recovery in the global model:

FedAvg w/ Waffle-WST (Under Attack): 42.71%

FedAvg w/o Detector (Under Attack): 38.53%

FedAvg w/o malicious clients (No Attack): 44.81%

The Waffle detector is able to raise the final test accuracy from a compromised 38.53% (without our detector) to 42.71%. This result brings the model's performance remarkably close to the ideal, attack-free scenario of 44.81%. These outcomes are entirely consistent with the extensive experiments in our main paper, and we will provide a full and detailed analysis of these NLP results in the camera-ready submission.

Review (Rating: 5)

This paper proposes to use Wavelet and Fourier representations to detect malicious clients before the beginning of federated learning. The developed approaches can provide low-dimensional, task-agnostic embeddings suitable for unsupervised client separation. Experiments show that the proposed method strengthens malicious client detection and improves the downstream task performance.

Strengths and Weaknesses

pros:

  1. offline malicious client detection is relatively new in FL and could be an initial defense layer.

  2. The WST and FT based spectral methods are theoretically sound and interpretable.

  3. The structure of the paper is clear and easy to follow, with detailed explanation of the proposed framework and procedure.

cons:

  1. Though the proposed framework could be useful in detecting feature distribution shift, malicious clients could hide themselves by acting honestly before FL or only modifying the training data / gradients during the online training phase.

  2. Only image-related tasks are considered, as the feature shift is easy to construct; it is not clear how other types of data or application domains can apply this framework.

Questions

  1. The studied two types of attacks may be easily exposed under spectral methods as they are either generated by adding Gaussian noise or perturbed with Gaussian kernels.

  2. For heterogeneous FL with multi-domain data over clients, will the detection phase remove valuable clients instead?

  3. Although noisy data might be harmful for model performance, if the level of corruptions is limited, this set of training data may also contribute to the convergence of model training, how to balance those two aspects?

Limitations

N/A

Final Justification

The authors have provided a detailed rebuttal and additional experiments to support the claims.

Formatting Concerns

N/A

Author Response

Though the proposed framework could be useful in detecting feature distribution shift, malicious clients could hide themselves by acting honestly before FL or only modifying the training data / gradients during the online training phase.

We thank the reviewer for this insightful comment, which addresses the important scenario of adaptive clients who begin their attacks during the online training phase. While the primary contribution of our work is a novel offline, pre-training filter—designed to enhance efficiency by removing compromised clients at the outset—the core Waffle framework is extensible to an online setting to counter such adaptive threats. For instance, at round 0, the initial Waffle detector could assign each client a trust score based on its predicted probability of being malicious. The server would then use these scores to create a weighted client selection probability distribution for training. To counter adaptive attacks, clients would be required to re-communicate their Waffle statistic, $\varphi_k$, every $D$ training rounds. The server would then update the selection probabilities, dynamically down-weighting or excluding clients that become malicious mid-training. This highlights that Waffle, while currently focused on pre-training data integrity, offers a modular foundation that complements—rather than replaces—defenses designed for other adversarial scenarios.

This online verification process directly addresses attacks on the feature distribution. While Waffle does not operate on model gradients, it can detect the root cause of many gradient-based attacks that stem from corrupted data. For pure gradient manipulation attacks, our framework would serve as a complementary defense to be used alongside gradient-based robust aggregation methods. We consider this a promising direction for future work, which would involve empirically determining the optimal verification frequency to balance security guarantees with communication overhead.

Only image-related tasks are considered, as the feature shift is easy to construct; it is not clear how other types of data or application domains can apply this framework.

While our experiments focused on image datasets for their intuitive and clear demonstration of feature-shift attacks, the underlying principles of our framework are fundamentally domain-agnostic.

The core of our method relies on the Fourier Transform and the WST, which are not computer vision-specific tools. WST and FT are operators designed for general signal analysis. As defined in our paper (Section 2.2), these transforms operate on signals in general function spaces (e.g., $L^1(\mathcal{X})$). An image is one example of a 2D signal, and any data that can be represented as a signal or a vector in a high-dimensional space is a potential candidate for our framework.

The Waffle framework can be readily applied to other data modalities where feature-level integrity is a concern. For instance, in NLP, text is converted into high-dimensional numerical representations, such as word or sentence embeddings (e.g., from Transformer models). These embedding vectors can be treated as signals. A malicious client could perturb these embeddings by adding noise or applying other transformations to degrade model performance. Waffle could detect such attacks by analyzing the statistical distribution of a client's collection of embeddings. The PCA and spectral analysis steps would identify anomalous shifts in the structure of the embedding space, just as they do for image data.

To validate this, we are in the process of integrating experiments on word embedding datasets, in particular the 50-dimensional GloVe embeddings (Pennington et al., 2014) in its recent updated version (Carlson et al., 2025). We are confident that we will be able to submit results on these NLP datasets before the end of the discussion phase.

The studied two types of attacks may be easily exposed under spectral methods as they are either generated by adding Gaussian noise or perturbed with Gaussian kernels.

We agree that perturbations based on Gaussian processes have distinct spectral characteristics. Our choice of Gaussian noise and blur was intended to model common and practical data corruption scenarios, such as sensor noise, providing a clear testbed for our framework.

However, the Waffle framework is not limited to detecting these types of attacks. To validate its robustness against more complex, non-Gaussian structural attacks, we conducted further experiments. In this new scenario, malicious clients apply a random dropout attack on part of the image, in particular where 50% of the image pixels, grouped into small random blocks, are set to zero. This introduces sharp, non-Gaussian artifacts that are structurally different from simple noise. Waffle-WST obtained an almost perfect detection performance, as it is summarised here:

Waffle-WST detector metrics:

  • Accuracy: 0.99
  • Precision: 1.0
  • Recall: 0.98
  • F1-Score: 0.99

As expected from our theoretical result, if the detector is perfect we reach a performance of the federated training that is comparable with the situation without malicious clients:

  • FedAvg w/o malicious: 50.24 (present in caption of Table 2)
  • FedAvg w/ malicious: 46.62
  • FedAvg w/ Waffle-WST detector: 49.77

These results represent a validation of the wider applicability of our method, beyond Gaussian attacks.

The WST variant is particularly effective, as it is designed to capture local structural information and textures. The random dropout attack fundamentally disrupts these local patterns, creating a strong and detectable signal for our framework. This demonstrates that Waffle is a robust solution capable of identifying a broader class of feature-level data integrity attacks beyond simple Gaussian perturbations.

For heterogeneous FL with multi-domain data over clients, will the detection phase remove valuable clients instead?

Our framework is designed to be inherently robust to this challenge because it operates on the structural properties of the data in the spectral domain, rather than on the semantic data distribution itself. The Waffle detector is trained to distinguish between the spectral signatures of clean and corrupted data. Attacks like noise and blur introduce fundamental signal-level artifacts that are distinct from the spectral characteristics of natural data heterogeneity. A clean, well-formed signal from a different domain will still be recognized as "clean."

In scenarios with extreme heterogeneity where the distinction might be less clear, our framework can employ an adaptive "soft" filtering mechanism instead of a hard "remove-or-keep" decision. The detector's output probability can be interpreted as a trust score, allowing the server to implement a weighted client selection strategy. A valuable but statistically different client might receive a marginally lower selection probability, while a clearly malicious client would be assigned a probability near zero, thus retaining valuable data while mitigating risk. While our experiments on FashionMNIST, CIFAR-10, and CIFAR-100 suggest this robustness, we agree that a targeted study on multi-domain data is an excellent direction for future work.

Although noisy data might be harmful for model performance, if the level of corruptions is limited, this set of training data may also contribute to the convergence of model training, how to balance those two aspects?

While certain types of noise can act as a regularizer, the attacks in our work are designed to model data corruption that degrades quality and harms model convergence. Our results in Table 2 provide direct evidence that these clients are detrimental, not beneficial, to the training process.

The results clearly show that removing the corrupted clients via Waffle consistently improves model performance, moving it closer to the ideal performance of training on purely clean data.

To address the reviewer's question about balancing these aspects in a more nuanced way, our framework can be extended to an online, probabilistic variant that uses a "soft" filtering approach instead of a binary keep/discard decision.

We could implement a probability selection structure where the server uses the Waffle detector's output to define a client's selection probability for each training round. For instance, if Waffle predicts a probability $p_k$ that client $k$ is malicious, the server can set its selection probability to be proportional to $(1 - p_k)$.

A client with limited, potentially harmless noise might receive a low but non-zero $p_k$, allowing it to contribute to training, albeit less frequently. A client with severe corruption would receive a $p_k$ close to 1, effectively removing it from the training pool. Furthermore, this selection structure can be updated online. By having clients re-submit their Waffle embeddings every $D$ rounds, the server can dynamically adjust the selection probabilities, thus adapting to changes in client behavior over time.
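A minimal sketch of this soft, probability-weighted selection is shown below. The detector outputs, the number of selected clients, and the concrete values are illustrative assumptions, not values from the paper; the refresh-every-$D$-rounds step is indicated only in a comment.

```python
# Illustrative sketch of probability-weighted ("soft") client selection.
import numpy as np

def select_clients(p_malicious: np.ndarray, n_select: int,
                   rng: np.random.Generator) -> np.ndarray:
    """Sample client indices with probability proportional to (1 - p_k)."""
    trust = 1.0 - p_malicious
    probs = trust / trust.sum()
    return rng.choice(len(p_malicious), size=n_select, replace=False, p=probs)

rng = np.random.default_rng(0)
p = np.array([0.05, 0.10, 0.95, 0.02, 0.60])   # hypothetical detector outputs per client
for rnd in range(3):
    # every D rounds the server could refresh p from re-submitted embeddings (not shown)
    print(rnd, select_clients(p, n_select=3, rng=rng))
```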

Comment

Following up on our previous discussion, we have extended our preliminary analysis on the "random block" attack to provide a more comprehensive comparison across all three benchmark datasets.

We are pleased to share these new results, which further validate the robustness of our method against non-Gaussian, structural attacks. This complete comparison against the robust aggregation baselines will be included in the camera-ready version of the manuscript.

Dataset       | Krum  | MultiKrum | TrimmedMean | GeoMed | Waffle-WST
CIFAR-10      | 44.47 | 47.75     | 47.93       | 48.22  | 49.77
CIFAR-100     | 9.41  | 15.33     | 16.20       | 15.14  | 17.82
Fashion-MNIST | 74.27 | 71.65     | 76.31       | 71.40  | 76.11

The benefit of Waffle (WST variant) is especially pronounced on more realistic datasets such as CIFAR-10 and CIFAR-100, where the structured dropout visibly disrupts chromatic and textural features—precisely the kind of local patterns that WST is designed to capture. For Fashion-MNIST, which contains grayscale images, the corruption is less visually distinct; nonetheless, Waffle-WST still performs near the clean-case baseline, confirming its robustness across modalities.

Comment

To validate this, we are in the process of integrating experiments on word embedding datasets, in particular the 50-dimensional GloVe embeddings (Pennington et al., 2014) in its recent updated version (Carlson et al., 2025). We are confident that we will be able to submit results on these NLP datasets before the end of the discussion phase.

Following up on our promise in the rebuttal, we have completed our initial NLP experiments. We implemented a composite Shift-and-Noise Attack: (1) applying random permutations to the embeddings and (2) adding Gaussian noise to the 50-dimensional GloVe embeddings, on 40% of the 100 clients in the federation.

Waffle (WST) detection performance:

  • Accuracy: 0.83

  • Precision: 1.0

  • Recall: 0.58

  • F1-Score: 0.73

Our Waffle-WST method demonstrated strong detection capabilities, achieving an F1-score of 0.73 and, notably, a perfect precision of 1.0, ensuring no honest clients were penalized. This successful detection directly translated to a significant performance recovery in the global model:

FedAvg w/ Waffle-WST (Under Attack): 42.71%

FedAvg w/o Detector (Under Attack): 38.53%

FedAvg w/o malicious clients (No Attack): 44.81%

The Waffle detector is able to raise the final test accuracy from a compromised 38.53% (without our detector) to 42.71%. This result brings the model's performance remarkably close to the ideal, attack-free scenario of 44.81%. These outcomes are entirely consistent with the extensive experiments in our main paper, and we will provide a full and detailed analysis of these NLP results in the camera-ready submission.

Review (Rating: 4)

This paper aims to detect anomalous or corrupted clients without accessing raw data under federated learning settings. To this end, the authors propose the framework Waffle to label malicious clients before training. Waffle trains a classifier on a distilled public dataset and asks local clients to provide compressed representations of samples derived from either the Wavelet Scattering Transform or the Fourier Transform. The paper also provides theoretical results showing that the WST is non-invertible and stable to local deformations. Finally, the paper conducts several experiments to demonstrate the performance of Waffle.

Strengths and Weaknesses

Strengths:

  • The paper is easy to follow.
  • The paper has a clear problem formulation and provide background knowledge on Fourier Transform and Wavelet Scattering Transform.
  • The complete code is provided.

Weaknesses:

  • The paper only considers corrupted but benign clients as the attacker, which significantly limits the attack power. Besides, the paper does not include attack modeling, e.g., what information the attacker knows and the resources they can access.
  • The paper only considers noise and blur attacks, while in practice, attacks can be adaptive or specific to scenarios, e.g., backdoor attacks. Also, for the noise attack, what if the noise is added to a subset of the image rather than the whole image?
  • The offline detector training step is vulnerable to adaptive attacks if the attacker knows the training schema and gets access to the auxiliary dataset.
  • Section 4 is not well-written. There are no explanations on what each lemma/proposition intuitively means.
  • The datasets adopted in the experiments are simple and limited. More advanced datasets should be adopted.

Questions

  • The noise attack is defined by the Wiener processes in Definition 1; do the theoretical results or the framework design rely on this specific type of noise?
  • In the offline detector training step, how to find the auxiliary dataset?

Limitations

Yes

Final Justification

I raise my score from 3 to 4 as the author addressed most of my concerns in the rebuttal. However, I still think the attack the paper considers is a little bit impractical.

Formatting Concerns

No

Author Response

W1 until "[...] the attack power".

We acknowledge the reviewer's perspective on our chosen attacks (noise and blur) but respectfully argue that their power and relevance stem from their practical nature and their ability to circumvent existing defenses. Our work focuses on a critical class of data integrity attacks, which can be intentionally malicious or arise from faulty hardware like compromised IoT sensors, a scenario we motivate in our introduction. The strength of these attacks is demonstrated by their success in degrading model performance where defenses designed for more complex adversarial behavior fail. Our experiments provide direct evidence of this critical vulnerability gap; robust aggregation methods such as Krum and Trimmed Mean, which were specifically designed to be resilient to powerful Byzantine adversaries, fail to mitigate the impact of our data-level attacks, often performing worse than standard FedAvg.

W1 from "Besides, the paper [...]".

We define our adversary as a data-level attacker whose capability is confined to manipulating their own local dataset before training begins, with the goal of degrading final model performance. This is formalized in Definitions 1 and 2, where attackers perturb their data with noise or blur. This attack model requires minimal information, as the adversary does not need knowledge of the global model's architecture or other clients' data, making the threat both practical and widespread. While Waffle is specialized for these attacks, its strength lies in its modularity. As stated in the manuscript, Waffle operates as an offline step that can be combined with other online defenses to create a more robust, multi-layered security framework. We will add a remark summarizing the capabilities of the attacker in the final version of the manuscript.

W2:

We thank the reviewer for these insightful questions regarding the scope of attacks our method can handle. Regarding localized attacks, such as adding noise to a subset of an image, our approach is inherently well-suited to detect them, particularly the WST variant. To validate its robustness against more complex, non-Gaussian structural attacks, we conducted further experiments. In this new scenario, malicious clients apply a random dropout attack on part of the image, in particular where 50% of the image pixels, grouped into small random blocks, are set to zero. This introduces sharp, non-Gaussian artifacts that are structurally different from simple noise. Waffle-WST obtained an almost perfect detection performance, as it is summarised here:

Waffle-WST detector metrics:

  • Accuracy: 0.99
  • Precision: 1.0
  • Recall: 0.98
  • F1-Score: 0.99

As expected from our theoretical result, if the detector is perfect we reach a performance of the federated training that is comparable with the situation without malicious clients:

  • FedAvg w/o malicious: 50.24 (present in caption of Table 2)
  • FedAvg w/ malicious: 46.62
  • FedAvg w/ Waffle-WST detector: 49.77

These results represent a validation of the wider applicability of our method, beyond Gaussian attacks.

Regarding more complex threats like backdoor or adaptive attacks, these represent a different class of problems beyond the scope of our current offline, pre-training detection framework. However, Waffle is designed to be extensible to an online setting that can address adaptive adversaries. For example, at round 0, the Waffle detector could assign each client a trust score based on the predicted probability of maliciousness. The server would use these scores to weight client selection probabilities during training. To counter adaptive attacks, clients would report their Waffle statistic $\varphi_k$ every $D$ rounds, allowing the server to update these probabilities dynamically—down-weighting or excluding clients who turn malicious mid-training. This demonstrates that Waffle, while currently focused on pre-training data integrity, provides a modular foundation that complements existing defenses for other adversarial threats.

W3:

We acknowledge the reviewer’s concern about a “white-box” attacker aware of the defense protocol. However, such a powerful adversary challenges nearly all Federated Learning defenses. As shown by Shejwalkar et al. (2022), even advanced methods like Krum and Trimmed Mean can fail when the attacker adaptively crafts updates with full knowledge of the aggregation rule. Our work assumes a more practical scenario where clients are unaware of the server’s internal defenses, and Waffle’s server-side secrecy offers robust protection. Importantly, even if a malicious client accesses the public auxiliary dataset $\mathcal{D}^{aux}$, our protocol remains secure due to critical information asymmetry. Attack simulations (Algorithm 1) run exclusively and privately on the server. The attacker does not know the types of attacks simulated, the distributions of blur/noise parameters $\beta$ and $\sigma$, nor the detector architecture, making it infeasible to poison training or craft targeted evasions. To address adversaries who change strategies over time, Waffle can be extended to an online adaptive defense. The server initializes client trust scores at round 0 using Waffle, then periodically requires clients to communicate their Waffle embeddings $\varphi_k$ every $D$ rounds. This enables dynamic adjustment of client selection probabilities, down-weighting or removing clients that become malicious during training.

W4:

We thank the reviewer for the valuable feedback and agree that clarifying the intuition behind our theoretical results will enhance the paper. We will update the final version accordingly.

Section 4 provides a general statistical justification for pre-emptive filtering in Federated Learning, independent of specific attacks or detectors. The key insights are listed below, followed by an illustrative sketch:

  • Lemma 1 (Bias): If malicious clients aim to shift the model objective, their inclusion biases the FedAvg estimator. Filtering them restores unbiasedness with respect to the benign distribution.
  • Lemma 2 (Variance): Even if malicious clients have aligned objectives, high-variance updates destabilize training. Removing them lowers the variance of the estimator.
  • Proposition 1: Combining the above, filtering malicious clients yields an estimator that is both unbiased and more stable, justifying the core principle of Waffle.
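For intuition only, a generic version of this bias–variance argument can be written as follows. This is an illustrative decomposition under the assumptions of independent client estimators and benign clients unbiased for the target $\theta^\star$; it is not the paper's exact statements or notation. Let $B$ and $M$ denote the benign and malicious client index sets and $w_k$ the aggregation weights:

```latex
% Illustrative, generic decomposition; not the paper's exact lemmas.
\hat{\theta} = \sum_{k \in B \cup M} w_k \,\hat{\theta}_k, \qquad
\mathbb{E}\big[\hat{\theta}\big] - \theta^\star = \sum_{k \in M} w_k \,\big(\mu_k - \theta^\star\big), \qquad
\operatorname{Var}\big[\hat{\theta}\big] = \sum_{k \in B \cup M} w_k^2 \,\operatorname{Var}\big[\hat{\theta}_k\big].
```

Dropping the set $M$ (and renormalizing the weights over $B$) removes the bias term and the typically dominant variance contributions, which is, intuitively, the effect the lemmas and proposition capture.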

W5:

We agree that the current version lacks evaluation on more realistic datasets. To address this, we are in the process of integrating experiments on word embedding datasets, in particular the 50-dimensional GloVe embeddings (Pennington et al., 2014) in its recent updated version (Carlson et al., 2025). We are confident that we will be able to submit results on this NLP dataset before the end of the discussion phase, in order to provide a first assessment of scalability and applicability in this setting.

Questions

Q1:

We thank the reviewer for their question. Our theoretical results are fundamentally independent of the specific noise model. The analysis in Section 4 is general because it concerns the statistical properties of the aggregated estimator, specifically its bias and variance. These theoretical results hold for any data-level attack that causes a client's updates to be biased or exhibit high variance, regardless of the underlying process used to generate the perturbation. The use of Wiener processes in Definition 1 serves as a concrete, formal example of one such process, but the conclusions of Lemmas 1, 2, and Proposition 1 are not confined to it.

Similarly, the Waffle framework itself is not limited to detecting a single type of noise. Its purpose is to learn the spectral signatures of data anomalies. We have experimentally verified this flexibility; as seen from further results, Waffle maintains high detection performance against other data attacks, including non-Gaussian noise applied to a random subset of an image's pixels.

Q2:

We appreciate the reviewer’s question on the practicality of an auxiliary dataset. Using a server-side dataset is common in Federated Learning to handle heterogeneity or enable knowledge distillation (e.g., Zhu et al., 2021; Cao et al., 2021). This dataset need not match clients’ private data exactly; public datasets representing the general domain (e.g., CIFAR-100, ImageNet subsets) or even small synthetic datasets via Dataset Distillation (Wang et al., 2018) suffice.

Importantly, our detector is task-agnostic: it detects spectral corruption artifacts rather than semantic features, making it robust across domains. This is supported by strong results using the same detector trained offline on FashionMNIST, CIFAR-10, and CIFAR-100.

References:

  • Pennington et al. (2014). GloVe: Global Vectors for Word Representation.
  • Carlson et al. (2025). A New Pair of GloVes.
  • Shejwalkar et al. (2022). Back to the drawing board: A critical evaluation of poisoning attacks on production federated learning. IEEE S&P, 1354–1371.
  • Zhu et al. (2021). Data-free knowledge distillation for heterogeneous federated learning. ICML, 12878–12889.
  • Lin et al. (2020). Ensemble distillation for robust model fusion in federated learning. NeurIPS, 33, 2351–2363.
  • Chao et al. (2021). Rethinking ensemble-distillation for semantic segmentation-based unsupervised domain adaptation. CVPR, 2610–2620.
  • Chen & Chao (2020). FedBE: Making Bayesian model ensemble applicable to federated learning. arXiv:2009.01974.
  • Wang et al. (2018). Dataset distillation. arXiv:1811.10959.
Comment

I thank the authors for their detailed response. I think it would be beneficial to incorporate some of this discussion into the updated version of the paper.

Comment

We thank the reviewer for their time, and we will include all the further considerations and experiments in the camera ready version of the manuscript.

As a direct follow-up to our earlier response on localized perturbations, we report below the results of the random block attack across CIFAR-10, CIFAR-100, and Fashion-MNIST, including standard robust aggregation baselines. The attack consists of randomly zeroing out a rectangular region of each image.

Dataset       | Krum  | MultiKrum | TrimmedMean | GeoMed | Waffle-WST
CIFAR-10      | 44.47 | 47.75     | 47.93       | 48.22  | 49.77
CIFAR-100     | 9.41  | 15.33     | 16.20       | 15.14  | 17.82
Fashion-MNIST | 74.27 | 71.65     | 76.31       | 71.40  | 76.11

The benefit of Waffle (WST variant) is particularly prominent on color images (CIFAR-10/100), where the attack disrupts chromatic and texture patterns that WST is well-suited to detect. In contrast, Fashion-MNIST consists of grayscale images, where the attack is more subtle and less disruptive to local statistics. Nonetheless, Waffle-WST still achieves performance very close to the clean-case baseline, confirming its robustness across modalities.

Comment

Dear Reviewers,

This is a kind reminder for the reviewers to check the authors' rebuttal and engage in the discussion.

Thanks, AC

Final Decision

This paper proposed to use Wavelet and Fourier representations to detect malicious client before federated learning. The paper is easy to follow with a clear problem formulation. Offline malicious client detection is also an interesting problem that is worthy of more studies. However, the reviewers have raised several concerns regarding the paper. For instance, the threat model is not very clear. It seems that the paper mainly considers corrupted benign clients as the adversary, but malicious clients are usually used to refer to active adversaries. The recall rate is around 60%, so the completeness of the detection is questionable. (There are also other concerns). I have read the authors' rebuttal and feel that the rebuttal does not fully address these concerns. Considering the high bar of the conference, I regret to recommend reject.