Robust Graph Neural Networks via Unbiased Aggregation
Abstract
Reviews and Discussion
The paper addresses the adversarial robustness of Graph Neural Networks (GNNs) against adaptive attacks. The authors propose a unified framework for robust estimation in GNNs to understand and improve their robustness. Specifically,
- The paper provides a unified view of $\ell_1$-based robust graph signal smoothing, and tries to reveal the estimation bias leading to performance degradation under attacks.
- The authors propose a robust and unbiased graph signal estimator (RUGE) to mitigate the estimation bias.
- They develop an efficient Quasi-Newton Iterative Reweighted Least Squares (QN-IRLS) algorithm, with theoretical guarantees, to solve the non-smooth and non-convex estimation problem.
Strengths
- The topic of the paper, robust GNNs, is of interest, and a way to alleviate the bias is proposed.
- A unified perspective on existing robust GNN works is built.
- A new optimization algorithm, QN-IRLS, is proposed, together with the corresponding complexity analysis.
Weaknesses
- The manuscript is difficult to follow due to poor writing. Specifically,
1a. The phrase “without the false sense of security” in Sections 1 and 2 is unclear. What does it mean, and why is it significant in this context?
1b. The figures are hard to understand without additional details (either in captions or main text). For instance, in Figure 1, what does the x-axis represent, and how is 200% defined? How is an attack conducted? In Figure 2, how are "clean sample" and "outliers" defined? Similar issues apply to Figures 5 and 7.
1c. Some of the equations are unclear. For example, in Equation (1), $\tilde{f}$ should be formatted similarly to Equation (2).
- The contribution is unclear.
2a. Regarding the unified perspective: it seems inconsistent with the experimental results in Figures 1 and 2. As shown in Figure 2, the estimator deviates further from the true mean, while in Figure 1, SoftMedian performs the worst, which is a special case of ElasticGNN with $p=1$.
2b. Regarding the optimization: the distinction and connection between QN-IRLS and other standard QN methods is missing.
- The experiments can be further improved.
3a. Only two datasets, Cora and CiteSeer, are included.
3b. While it is understood that space constraints may limit the inclusion of all interesting details in the main text, lines 304-317 occupy significant space without providing primary conclusions or insights.
Questions
It would be great if the authors could respond to concerns mentioned in the Weaknesses.
Limitations
The authors addressed the limitations.
Thanks for your recognition of the novelty and effectiveness of our method. We are glad to address your concerns and answer your questions as follows.
W1a: The phrase "without the false sense of security" in Sections 1 and 2 is unclear. What does it mean, and why is it significant in this context?
Answer: We would like to clarify that the "false sense of security" is a common phenomenon in adversarial robustness [1][2]. It means that robustness can be overestimated when the adversarial attack is weak for various reasons, such as not directly targeting the victim model. For example, in this paper, "transfer attacks" denote attacks that first target a surrogate GCN model and then transfer the perturbations to the specific victim model; such attacks can be much weaker. Therefore, we need to evaluate models under "adaptive attacks" that directly target the victim model to avoid a "false sense of security".
Ref:
[1] Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. ICML, 2018.
[2] Mujkanovic, Felix, et al. "Are defenses for graph neural networks robust?" NeurIPS, 2022.
W1b: Lack of detailed caption in Figures.
Answer: Let us clarify them as follows:
- Fig 1: The x-axis represents the attack budget, i.e., the portion of edges allowed to be perturbed. We conduct an adaptive local attack in Figure 1, where "200%" means the number of perturbed edges is up to 200% of the target node's degree. For example, if the target node has 5 neighbors, a budget of 200% means the attacker can change up to 5 × 200% = 10 edges.
- Fig 2: As mentioned in the main paper, the detailed settings, including how we construct clean samples and outliers, are in Appendix A. In Figure 2, we define "clean samples" as the majority of data points drawn from a Gaussian distribution, while "outliers" are data points far away from the major data pattern, drawn from a different Gaussian distribution.
- Fig 5: The x-axis represents the layer/iteration number of the algorithm, and the y-axis represents the objective value. This figure aims to validate the fast convergence of our QN-IRLS algorithm.
- Fig 7: This figure aims to explain the better performance of RUNG by examining the distribution of node feature differences across attacked edges for different models. A smaller difference (as with our RUNG) means the attacker tends to perturb edges connecting more similar nodes, indicating that RUNG improves robustness by down-weighting or pruning malicious edges that connect distinct nodes.
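To make the Figure 7 analysis concrete, below is a minimal sketch of how such a distribution of feature differences over attacked edges could be computed; the function and variable names are hypothetical illustrations, not the code used in the paper.

```python
import numpy as np

def attacked_edge_feature_gaps(F, clean_edges, perturbed_edges):
    """Feature differences over edges inserted by the attacker (illustrative).

    F               : (n, d) array of node features (or hidden representations)
    clean_edges     : set of (i, j) tuples in the clean graph
    perturbed_edges : set of (i, j) tuples in the attacked graph
    """
    # Edges present after the attack but not before, i.e., adversarial insertions.
    inserted = perturbed_edges - clean_edges
    # For each inserted edge, measure how dissimilar its two endpoints are.
    return np.array([np.linalg.norm(F[i] - F[j]) for i, j in inserted])
```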
W1c: Some of the equations are unclear. For example, in Equation (1), $\tilde{f}$ should be formatted similarly to Equation (2).
Answer: We have given a detailed and clear definition of $\tilde{f}$. Please kindly let us know if there is any further confusion.
W2a: Regarding the unified perspective: it seems inconsistent with the experimental results in Figures 1 and 2. As shown in Figure 2, the estimator deviates further from the true mean, while in Figure 1, SoftMedian performs the worst, which is a special case of ElasticGNN with $p=1$.
Answer: Figures 1 and 2 actually present the results of different experiments: Figure 1 gives the empirical robustness results on real graph datasets, while Figure 2 offers simulation experiments on synthetic Gaussian data. As for the inconsistent results of SoftMedian and ElasticGNN: although the two models share common underlying characteristics, their detailed architectural designs, such as the numerical solvers, differ, which may cause the difference in the reported empirical results.
W2b: Regarding the optimization: the distinction and connection between QN-IRLS and other standard QN methods is missing.
Answer: The proposed QN-IRLS differs from standard QN methods in two respects. First, QN-IRLS iteratively constructs smooth quadratic upper bounds of the objective function so that it can handle non-smooth objectives such as the robust penalties. Second, the proposed Quasi-Newton algorithm approximates the Hessian matrix of the quadratic upper bounds with a specifically chosen diagonal matrix, which makes the implementation simpler and the memory cost much lower than standard QN methods such as BFGS and L-BFGS, which do not exploit the specific structure of the problem.
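To make these two points concrete, here is a minimal, hypothetical sketch of one QN-IRLS-style iteration for a generic edge-wise robust smoothing objective; the reweighting rule and the simple diagonal Hessian bound are illustrative simplifications, not the exact update derived in the paper.

```python
import numpy as np

def qn_irls_step(F, edges, rho_grad, eps=1e-8):
    """One illustrative quasi-Newton IRLS step (simplified sketch).

    F        : (n, d) node feature matrix being smoothed
    edges    : iterable of (i, j) index pairs
    rho_grad : derivative rho'(y) of the robust penalty for y >= 0
    """
    grad = np.zeros_like(F)
    diag_h = np.full(F.shape[0], eps)  # diagonal Hessian approximation
    for i, j in edges:
        diff = F[i] - F[j]
        dist = max(np.linalg.norm(diff), eps)
        # IRLS reweighting: w defines a smooth quadratic upper bound of rho
        # at the current iterate, so non-smooth penalties pose no difficulty.
        w = rho_grad(dist) / dist
        grad[i] += w * diff
        grad[j] -= w * diff
        # The Hessian of the quadratic bound is a weighted graph Laplacian;
        # keep only its diagonal (the weighted degrees) for cheap inversion.
        diag_h[i] += w
        diag_h[j] += w
    # Quasi-Newton update: elementwise scaling instead of a dense inverse,
    # unlike BFGS/L-BFGS, which maintain dense (or low-rank) approximations.
    return F - grad / diag_h[:, None]
```

With `rho_grad` set to the derivative of a penalty like MCP, edges whose feature difference exceeds the penalty's saturation threshold receive zero weight, which is one way to see the edge down-weighting/pruning behavior discussed in this rebuttal.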
W3a: Only two datasets, Cora and CiteSeer, are included.
Answer: To further validate the effectiveness, we include 2 heterophilic datasets and multiple datasets from various domains in the global response.
W3b: While it is understood that space constraints may limit the inclusion of all interesting details in the main text, lines 304-317 occupy significant space without providing primary conclusions or insights.
Answer: For the experiments presented in lines 304-317, the primary conclusions and insights are as follows:
- For "Transfer Attack", we can observe that all the transfer attacks from various surrogate models are weaker than the adaptive attacks. That indicates the necessity to evaluate the strongest adaptive attack to avoid the false sense of security emphasized in this paper.
- For "Hyperparameters", we observe that the γ in MCP has an intricate impact on the performance of RUNG. Specifically, larger γ makes RUNG closer to ℓ1-based model. While smaller γ encourages more edges to be pruned, which help RUNG to remove more malicious edges and improve the robustness, but small γ may sacrifice the clean performance a little bit.
- For "Robustness under various scenarios", we try to include more datasets or attacks further validate the consistent effectiveness of our proposed RUNG.
We believe we have fully addressed all the concerns. Please kindly let us know if there are any remaining concerns, and we are happy to further clarify.
Thanks for the responses. However, some of the answers do not fully address the questions I raised. Below, I have included further questions calling for clarification.
To your Response to W1a (RW1a for short):
I can understand the concept and agree with your arguments in the rebuttal. However, I believe the manuscript, especially the main text, should be self-contained by including important concepts like these.
To RW1b:
Similar to RW1a, the main text should be self-contained. In the current manuscript, you show some figures without sufficient captions/explanations and directly draw conclusions such as "our method is the best", etc.
Again, although you can explain here that Weakness 1 can be resolved, the confusion/missing details in the writing raise concerns about the completeness/self-containedness of the submitted version.
To RW2a:
I notice your notations defined in Lines 71-72, and I do not think defining the $\ell_1$ norm as $\|\cdot\|_2$ and the $\ell_2$ norm as $\|\cdot\|_2^2$ is suitable. Do you think an $\ell_2$ norm of a vector could be the square of its $\ell_1$ norm?
To RW2b:
Are these statements and comparisons shown in the manuscript?
To RW3a:
I appreciate your added experimental results.
In sum, there are two key questions (or weaknesses, I think) concerning the submission: first, the paper is not self-contained, and many statements can only be found in the rebuttal (or appendix); second, it would be appreciated if the question concerning Lines 71-72 could be answered.
Thanks for your feedback. We are glad to address your further concerns as follows.
RW1a: Thanks for agreeing with our rebuttal. We would like to clarify that "the false sense of security" is a common and basic concept in the field of adversarial robustness [1][2]. More importantly, we believe our paper is indeed self-contained and has already included explanations of "the false sense of security" in the submission.
In Lines 28-32 of Section 1, we highlight that most defenses suffer from "a false sense of security" under transfer attacks, i.e., attacks transferred from surrogate models; they encounter catastrophic performance drops when confronted with adaptive adversarial attacks that directly attack the victim models themselves.
In Section 2.1 (Lines 73-91), we conduct a preliminary experiment on existing popular robust GNNs against strong adaptive attacks. As observed in the experiments, all the selected robust GNNs experience a severe performance decrease and even underperform MLPs, further validating the aforementioned "false sense of security".
We agree with the reviewer that the more detailed and direct explanation provided in our rebuttal will enhance clarity. We will include it in our final version.
[1] Are defenses for graph neural networks robust? NeurIPS, 2022.
[2] Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. ICML, 2018.
RW1b: Thanks for your valuable comments. We believe our figures are equipped with the necessary information and appropriate explanations beside the figures. Specifically, we provided detailed explanations for Figure 1 in Lines 74-91, Figure 2 in Lines 142-149, Figure 3 in Lines 167-180, and Figure 4 in Lines 230-235. For Figures 5-7 in the ablation study, rather than directly drawing conclusions, we briefly discussed the phenomena and provided explanations to support the statements and conclusions made in the main content. We greatly appreciate your feedback and will carefully polish our paper following your suggestions.
RW2a: Thanks for pointing this out. Our definition follows the traditional definition in classic papers in this domain to keep the notations consistent. For instance, in ElasticGNN [1], $\ell_1$ is defined as $\|\cdot\|_1$ or $\|\cdot\|_{21}$, and $\ell_2$ is defined as $\|\cdot\|_F^2$ in the special context of graph smoothing. The reason this domain uses this notation is to emphasize the fact that $\|\cdot\|_1$ and $\|\cdot\|_{21}$ show similar characteristics by exerting a linear effect on the magnitude of the vector difference, while $\|\cdot\|_F^2$ exhibits a quadratic effect.
We agree with the reviewer that this notation may cause potential misunderstanding. We will replace "$\ell_1$ norm" with "$\ell_1$-based graph smoothing penalty" to avoid misunderstanding.
[1] Elastic graph neural networks. ICML, 2021.
RW2b: Thank you for your comments. In Lines 182-190, we have indeed included discussions and comparisons of related Newton-type and other algorithms [1][2][3] used to solve the problem in this paper. By highlighting the excessive computation and memory requirements of those algorithms, we have provided sufficient motivation and explanation for the design of the developed QN-IRLS algorithm in Section 3.2 (especially Lines 208-219), which makes the paper self-contained.
Additionally, we would like to highlight that the Quasi-Newton method is a broad concept: an alternative to Newton's method that approximates the inverse Jacobian or Hessian when they are unavailable or too expensive to compute (from Wikipedia: https://en.wikipedia.org/wiki/Quasi-Newton_method). We did not specifically discuss standard QN methods such as BFGS and L-BFGS because our QN method is very different from those methods, and there is no direct and strong connection between them. While we agree that including such a discussion and comparison in our revision can be helpful, we respectfully disagree that this affects whether the paper is self-contained.
We appreciate the valuable comments and will incorporate the content from the rebuttal into the revised version based on your feedback.
[1] Fan, Jianqing, and Runze Li. "Variable selection via nonconcave penalized likelihood and its oracle properties." 2001.
[2] Varma, Rohan, et al. "Vector-valued graph trend filtering with non-convex penalties." 2019.
[3] Zhang, Cun-Hui. "Nearly unbiased variable selection under minimax concave penalty." 2010.
We would like to thank you once again for the constructive feedback. We hope we have addressed the two major concerns mentioned in your further response. These justifications can greatly enhance the clarity of our paper, and we will carefully incorporate them into our revision. We would really appreciate it if you could take our clarifications in the rebuttal into consideration.
To RW1b:
For Figure 1: As mentioned in the original review, Lines 74-91 indeed include the dataset, the tested baselines, and the goal. However, the details are missing, and the definition of the budget on the x-axis is confusing (without your rebuttal).
For Figure 2: Due to the page limit, it is understandable that the details are moved to the appendix.
For Figure 3: Even if the writing were complete and smooth: given that $\gamma$ appears only once in the figure, how can one conclude from Figure 3 that "When $\gamma$ approaches infinity, the penalty reduces to the $\ell_1$ norm. Conversely, when $\gamma$ is very small, the 'valley' of $\rho_\gamma$ near zero is exceptionally sharp, so $\rho_\gamma$ approaches the $\ell_0$ norm and becomes a constant for a slightly larger $y$."?
To RW2a:
... In ElasticGNN [1], $\ell_1$ is defined as $\|\cdot\|_1$ or $\|\cdot\|_{21}$, and $\ell_2$ is defined as $\|\cdot\|_F^2$ in the special context of graph smoothing. ...
Quoting the definitions from Sec 2 (Preliminary) of ElasticGNN [1]:
We define the Frobenius norm, $\ell_1$ norm, and $\ell_{21}$ norm of matrix $\mathbf{X}$ as $\|\mathbf{X}\|_F = \sqrt{\sum_{ij} X_{ij}^2}$, $\|\mathbf{X}\|_1 = \sum_{ij} |X_{ij}|$, and $\|\mathbf{X}\|_{21} = \sum_i \sqrt{\sum_j X_{ij}^2}$, respectively.
It seems that there is no statement such as "$\ell_2$ is defined as $\|\cdot\|_F^2$".
RW1b:
Thank you for your feedback.
Regarding the x-axis of Figure 1: in the field of adversarial robustness in GNNs, "budget" is a basic and common notion, so we primarily noted that the experimental setup in this figure follows the details provided in Section 4.1. Thanks for highlighting this; we have now included all the details for the figures to accommodate readers from diverse backgrounds.
For Figure 3, we have provided a clear definition and plotted the curve of the MCP penalty $\rho_\gamma$. From the definition and the curve, it is not difficult to observe that $\rho_\gamma$ approaches the $\ell_1$ norm as $\gamma \to \infty$, and that $\rho_\gamma$ approaches the $\ell_0$ norm as $\gamma \to 0$. We initially avoided plotting multiple values of $\gamma$ because our primary focus was to illustrate the differences between the $\ell_1$, $\ell_2$, and MCP penalties. However, thanks to your suggestion, we may include 2-3 different $\gamma$ values to better demonstrate the approximation properties of the MCP penalty. (A sketch of such a plot is given below.)
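A minimal sketch of such a plot, assuming the standard MCP parameterization (hypothetical code, not from the paper):

```python
import numpy as np
import matplotlib.pyplot as plt

def mcp(y, lam=1.0, gamma=1.0):
    """Standard MCP penalty; the paper's parameterization may differ."""
    return np.where(y <= gamma * lam,
                    lam * y - y ** 2 / (2 * gamma),
                    gamma * lam ** 2 / 2)

y = np.linspace(0.0, 3.0, 300)
for gamma in (0.3, 1.0, 3.0):  # small gamma: l0-like; large gamma: l1-like
    plt.plot(y, mcp(y, gamma=gamma), label=f"MCP, gamma={gamma}")
plt.plot(y, y, "--", label="l1 penalty (gamma -> infinity limit)")
plt.xlabel("y")
plt.ylabel("penalty")
plt.legend()
plt.show()
```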
RW2a:
Thank you for your comment. In the Methods section (Section 3) on elastic GNNs,
- the $\ell_2$-based graph smoothing refers to methods (e.g., APPNP) that include the objective term $\sum_{(i,j)\in\mathcal{E}} \|f_i - f_j\|_2^2$,
- while the $\ell_1$-based graph smoothing unifies the methods that include the objective term $\sum_{(i,j)\in\mathcal{E}} \|f_i - f_j\|_2$, where $f_i$ denotes the representation of node $i$ and $\mathcal{E}$ the edge set.
This is how we define the terms in our paper (a compact summary is given below). As mentioned in our previous response, we will update the terminology to "$\ell_1$-based graph smoothing penalty" to avoid any misunderstanding.
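For clarity, the two objective terms referred to above can be written compactly as follows (illustrative notation, matching the definitions in this rebuttal):

```latex
\text{$\ell_2$-based smoothing: } \sum_{(i,j) \in \mathcal{E}} \lVert f_i - f_j \rVert_2^2,
\qquad
\text{$\ell_1$-based smoothing: } \sum_{(i,j) \in \mathcal{E}} \lVert f_i - f_j \rVert_2 .
```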
Thanks again for your detailed feedback. Let us know if you have further questions.
Thank you for your response. I will raise my score — best of luck.
This paper uncovers that while $\ell_1$-based models exhibit robustness against moderate adaptive attacks, their performance significantly degrades under strong adaptive attacks. The authors investigate the root cause of this phenomenon, identifying estimation bias as the culprit. To address this, they propose a robust and unbiased graph signal estimator. Extensive experiments and ablation studies confirm that their method significantly enhances GNN robustness without compromising clean accuracy.
Strengths
- This paper provides compelling motivating experiments, observations, and insights. Furthermore, the authors rigorously analyze the root cause of their findings, offering a persuasive and logical explanation.
- Based on these discoveries, the authors propose a simple yet effective method, well-supported both theoretically and empirically.
- Extensive experiments offer thorough analysis to demonstrate the effectiveness of the proposed method.
- The proposed method significantly outperforms other baselines under both local and global adaptive attacks.
Weaknesses
- The datasets considered by the authors are limited to only three citation networks. While I acknowledge that implementing adaptive attacks on new datasets and baselines is significantly cumbersome and requires considerable effort, demonstrating that the proposed methods work well on datasets from other domains would greatly enhance the applicability and recognition of this work.
- There is no available source code for reproduction.
Questions
See the Weaknesses
Limitations
The authors addressed the limitations.
Thanks for your recognition of the novelty and effectiveness of our method. We are glad to address your concerns and answer your questions as follows.
W1: The datasets considered by the authors are limited to only three citation networks. While I acknowledge that implementing adaptive attacks on new datasets and baselines is significantly cumbersome and requires considerable effort, demonstrating that the proposed methods work well on datasets from other domains would greatly enhance the applicability and recognition of this work.
Answer: To further validate the effectiveness, we include 2 heterophilic datasets and multiple datasets from various domains in the global response.
W2: There is no available source code for the reproduction.
Answer: We have sent the anonymized link to the AC in a separate comment.
I appreciate the authors' thorough response to my concerns. All issues have been addressed, leading me to increase my score.
The authors point out that some of the successful defenses for GNNs are related to $\ell_1$-based graph smoothing. Motivated by the bias of $\ell_1$-based estimators, the authors develop an unbiased variant (RUGE), rooted in high-dimensional statistics. To optimize under such a non-smooth objective, the authors devise a Quasi-Newton Iterative Reweighted Least Squares algorithm. In the empirical evaluation, the authors compare with a rich set of baseline GNNs and defenses on graphs of varying sizes. The authors report the performance on both non-adaptive and adaptive attacks. Moreover, the authors study the influence of hyperparameters and further complement their method with adversarial training.
Strengths
- The authors provide a unified view of some of the important defenses in the GNN domain
- The proposed method is well-motivated and empirically successful
- The empirical evaluation is extensive and convincing
Weaknesses
- The paper only studies homophilic datasets
- The adversarial training is performed in the biased transductive evasion setting of Xu et al [A]. It is better to follow an inductive evasion setting as in [B]. The authors should at least note that the studied setting has limitations.
- Minor: The references in some cases do not mention the respective venue/journal (e.g., [36])
[A] Xu et al. "Topology Attack and Defense for Graph Neural Networks: An Optimization Perspective" NeurIPS 2019. [B] Gosch et al. "Adversarial Training for Graph Neural Networks: Pitfalls, Solutions, and New Directions" NeurIPS 2023.
Questions
- How does RUGE perform on heterophilic graphs?
Limitations
The limitations are sufficiently addressed.
Thanks for your recognition of the novelty and effectiveness of our method. We are glad to address your concerns and answer your questions as follows.
W1: The paper only studies homophilic datasets
Answer: We want to point out that our RUNG (Robust Unbiased Aggregation) is a plug-in robust aggregation block that can be readily incorporated into various GNN backbones. In this paper, we comprehensively evaluate the robustness on homophilic graphs with the APPNP backbone, but RUNG can be generalized to heterophilic graphs by incorporating it into heterophilic GNNs. We include the experimental results and validate the effectiveness of RUNG on heterophilic datasets in the global response.
W2: The adversarial training is performed in the biased transductive evasion setting of Xu et al [A]. It is better to follow an inductive evasion setting as in [B]. The authors should at least note that the studied setting has limitations.
[A] Xu et al. "Topology Attack and Defense for Graph Neural Networks: An Optimization Perspective" NeurIPS 2019.
[B] Gosch et al. "Adversarial Training for Graph Neural Networks: Pitfalls, Solutions, and New Directions" NeurIPS 2023.
Answer: We evaluate RUNG under the inductive evasion setting in the following table. The results show that the robustness of RUNG can also be further improved with adversarial training under the inductive setting. We will mention this setting difference in the revision, following your suggestion, and provide these additional results as an ablation study.
Table 7: Adversarial Training vs Normal Training on RUNG in the inductive evasion setting.
| Budget | Clean | 5% | 10% | 20% |
|---|---|---|---|---|
| Normal Training | 83.2 ± 0.2 | 73.4 ± 0.3 | 69.8 ± 0.5 | 65.5 ± 0.3 |
| Adversarial Training | 82.4 ± 0.9 | 76.3 ± 0.2 | 74.3 ± 0.4 | 71.4 ± 0.6 |
W3: Minor: The references in some cases do not mention the respective venue/journal (e.g., [36])
Answer: We initially cited the most-referenced version of each paper, but we will update the citations to the respective venue/journal in the revised version for a proper citation format.
Q1: How does RUGE perform on heterophilic graphs?
Answer: Please refer to W1.
I thank the authors for the additional experiments on heterophilic graphs and adversarial training in an inductive setting. I think the paper is convincing and I vote for acceptance.
Thank you to all the reviewers for recognizing the novelty and effectiveness of our method. Since all reviewers share concerns about the effectiveness of RUNG on heterophilic graphs and other domain-specific datasets, we include the experimental results for several datasets in this global response as follows.
Heterophilic datasets
To evaluate the robustness on heterophilic datasets, we conduct experiments following the settings in [1]. We build our model by replacing the message passing aggregation block with RUNG in FSGNN [2], which is designed for heterophilic graphs. The results summarized below showcase the advantage of our RUNG among all the compared models.
Table: Robustness on heterophilic datasets.
| Method & Dataset | Chameleon | Squirrel |
|---|---|---|
| MLP | 48.84 ± 1.66 | 30.31 ± 1.25 |
| GCN | 49.93 ± 0.70 | 31.16 ± 2.19 |
| EGCNGuard | 45.34 ± 2.80 | 27.34 ± 0.90 |
| H2GCN | 51.42 ± 1.31 | 28.41 ± 1.08 |
| FAGCN | 49.98 ± 1.27 | 33.64 ± 1.10 |
| GPRGNN | 50.42 ± 0.83 | 32.47 ± 1.36 |
| EvenNet | 52.87 ± 1.88 | 33.21 ± 0.96 |
| FSGNN | 53.64 ± 8.39 | 34.86 ± 4.48 |
| RUNG (Ours) | 58.47 ± 7.85 | 42.07 ± 2.94 |
References:
[1] Runlin Lei, Zhen Wang, Yaliang Li, Bolin Ding, and Zhewei Wei. Evennet: Ignoring odd-hop neighbors improves robustness of graph neural networks. Advances in Neural Information Processing Systems, 35:4694–4706, 2022.
[2] Maurya, Sunil Kumar, Xin Liu, and Tsuyoshi Murata. "Improving graph neural networks with simple architecture design." arXiv preprint arXiv:2105.07634 (2022).
Additional datasets
To further validate the effectiveness of our RUNG, we evaluate the robustness under adaptive global attacks on additional datasets, including 2 blog networks (BlogCatalog and PolBlogs), 1 photo-sharing network (Flickr), and 2 citation networks (ACM and DBLP). The results demonstrate the consistent and significant advantage of our RUNG over other models; we include the tables in the attached PDF.
The paper examines the adversarial robustness of Graph Neural Networks (GNNs) against adaptive attacks and proposes a unified framework for robust estimation. It finds that $\ell_1$-based models, while resilient to moderate attacks, degrade under strong attacks due to estimation bias. To address this, the authors introduce an unbiased graph signal estimator and connect their work to existing defenses, creating an unbiased variant based on high-dimensional statistics. The proposed approach is both well-justified and empirically successful. The reviewers were satisfied with the responses provided during the rebuttal and found the paper interesting. The authors are encouraged to incorporate the comments and feedback received to strengthen their work.