Propagate and Inject: Revisiting Propagation-Based Feature Imputation for Graphs with Partially Observed Features
For graphs with missing features, we identify a critical limitation of existing propagation-based imputation methods and propose a novel scheme that overcomes this limitation.
Abstract
Reviews and Discussion
This paper identifies the problem of low-variance channels arising after diffusion when most feature values are missing, which happens when the available values are very similar. The authors propose injecting random features into those channels and restarting the diffusion process with both the synthetic features and the original low-variance features, which yields higher-variance channels. Experimental results show improvements on downstream tasks.
Questions To Authors
Claims And Evidence
Claims and Evidence are generally fine.
Methods And Evaluation Criteria
- There is no clear motivation for injecting random node states into existing channels.
- A channel has some meaning, so why would a random feature at a random node make sense?
- To me, the method is more like dropping low-signal channels and then adding new channels with your proposed diffused signal.
- As your method puts a higher influence on the synthetic features for diffusion, this seems to already be in the same direction, so you do not need the original low-variance features.
- If the authors are convinced that the low-variance features are beneficial, an experiment should be conducted to compare against the case of dropping the low-variance features.
- A diffusion process with random features will provide structural information, e.g., which nodes are closely connected and which are far apart. This is no longer related to the original features.
- Consequently, the proposed method can be seen as a positional encoding that adds the diffused features as additional channels to a graph.
- Thus, I would not see your method as an imputation method but as adding structural information from which any imputation method can benefit.
- Your method is not permutation equivariant since a random node is chosen; this should be noted but is generally fine for the applications that are considered.
- Experiments should, therefore, compare first and foremost with methods for positional encoding, not with imputation methods.
Theoretical Claims
Theoretical claims only concern the convergence of diffusion processes, which is always given.
Experimental Designs Or Analyses
- To me, the experiments do not evaluate the interesting parts of this method.
It would be interesting to:
- Apply FISF to other imputation methods as additional channels.
- Compare the additional channels to other positional encodings.
- Evaluate whether removing low-variance channels matters and how many additional channels with synthetic features improve results.
Supplementary Material
I checked the theoretical part of the Appendix.
Relation To Broader Scientific Literature
The paper relates nicely to literature on imputation methods. Connections to positional encodings are essential but are missing.
Essential References Not Discussed
References to positional encodings are missing, e.g., the following:
- Laplacian positional encoding: Dwivedi et al., "Benchmarking graph neural networks," JMLR 2023.
- Random-walk positional encoding: Dwivedi et al., "Graph neural networks with learnable structural and positional representations," ICLR 2022.
Other Strengths And Weaknesses
I like the idea of adding structural information to tasks with missing features. There seems to be a lot of potential; I just wish that this paper went a bit deeper.
Other Comments Or Suggestions
We greatly appreciate the reviewer’s detailed and perceptive comments.
First of all, we would like to clarify that propagation-based imputation methods for graph learning with missing features are designed to assign values to missing features in a way that improves downstream task performance. Accordingly, these imputation methods preserve the original dimensionality of the feature matrix in their output. In this context, our work focuses on addressing a limitation of current propagation-based imputation methods identified in this study.
Q1. A channel has some meaning, so why would a random feature at a random node make sense?
Yes, each channel has its own meaning. However, when a channel is filled with nearly identical values, such a low-variance channel contributes little to downstream tasks. Our goal is to make that channel, which has become uninformative for downstream tasks, useful. As the reviewer pointed out, while the synthetic feature may not preserve the original meaning of the channel, it helps to restore distinctiveness within the channel, leading to performance improvements in downstream tasks.
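To make this concrete, here is a minimal numerical illustration (ours, not from the paper; the values are arbitrary) of why a near-constant channel adds almost no discriminative signal:

```python
# Minimal numerical illustration (ours, not from the paper): a nearly
# constant channel barely changes the pairwise distinctions available
# to a downstream classifier.
import numpy as np

rng = np.random.default_rng(0)
n = 5
informative = rng.normal(size=(n, 1))                     # channel with spread-out values
low_variance = 0.7 + rng.normal(scale=1e-4, size=(n, 1))  # nearly constant channel

X = np.hstack([informative, low_variance])
print(X.var(axis=0))  # e.g., [~1.0, ~1e-8]: the second channel is almost constant

# Pairwise node differences are driven almost entirely by the first channel,
# so the low-variance channel contributes almost nothing downstream.
diff = X[:, None, :] - X[None, :, :]
print(np.abs(diff[..., 0]).mean(), np.abs(diff[..., 1]).mean())
```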
Q2-1. If the authors are convinced that the low-variance features are beneficial, an experiment should be conducted to compare against dropping the low-variance features.
Q2-2. Evaluate whether removing low-variance channels matters.
To show that observed features within a low-variance channel are beneficial in addition to the synthetic feature, we conduct additional experiments comparing FISF to FISF-OSF—a variant that performs the final diffusion process in low-variance channels using only synthetic features, without observed features. Table 33 presents the results; the superior performance of FISF highlights the importance of using low-variance observed features. FISF leverages the fact that the observed features within a low-variance channel have nearly identical values by preserving and diffusing them during the diffusion process, thereby making use of the remaining feature information.
Table 33: https://anonymous.4open.science/r/ICML12446-AF4E/Table%2033.png
Q3-1. I would not see your method as an imputation method but as adding structural information from which any imputation method can benefit.
Q3-2. Apply FISF to other imputation methods as additional channels.
Q3-3. Compare the additional channels to other positional encodings.
We agree with the reviewer’s perspective that diffusion with synthetic features can be viewed as a way of adding structural information, and that this approach can also be applied to other imputation methods as additional channels. To address the reviewer’s concerns, we conduct additional experiments in which FISF is applied to other imputation methods as additional channels (denoted as FISF). We further compare FISF with Laplacian Positional Encoding (LPE) [1] and Random Walk Positional Encoding (RWPE) [2]. Table 34 shows the results. As shown in the table, while LPE and RWPE generally improve the performance of existing imputation methods by providing structural information, FISF consistently achieves the largest performance improvements. Unlike positional encodings, our approach can use the feature information preserved in low-variance channels.
Table 34: https://anonymous.4open.science/r/ICML12446-AF4E/Table%2034.png
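For completeness, here is a hedged sketch of the two positional-encoding baselines, following their standard definitions in [1, 2]; the dense-matrix implementation and the choice of k are illustrative assumptions, not our experimental configuration:

```python
import numpy as np

def lpe(A, k=4):
    """Laplacian PE: eigenvectors of L = I - D^{-1/2} A D^{-1/2} for the k
    smallest non-trivial eigenvalues, appended to nodes as extra channels."""
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    return eigvecs[:, 1:k + 1]       # skip the trivial first eigenvector

def rwpe(A, k=4):
    """Random-walk PE: node i receives [M_ii, (M^2)_ii, ..., (M^k)_ii] with
    M = D^{-1} A, i.e., the return probabilities of 1..k-step random walks."""
    d = A.sum(axis=1)
    M = A / np.maximum(d, 1e-12)[:, None]
    P = np.eye(len(A))
    diags = []
    for _ in range(k):
        P = P @ M
        diags.append(np.diag(P).copy())
    return np.stack(diags, axis=1)   # shape: (num_nodes, k)
```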
Q4. Evaluate how many additional channels with synthetic features improve results.
In FISF, the number of channels into which synthetic features are injected is controlled by a hyperparameter, and its effect was analyzed in Appendix C.8. However, for the additional-channel variant of FISF introduced above, we conduct additional experiments to evaluate how many additional channels with synthetic features improve results. Table 35 shows the results. As shown in the table, even a small number of additional channels using FISF leads to substantial performance improvements. We further observe that, for each dataset, the optimal number of additional channels tends to lie near the number of channels into which the original FISF injects synthetic features.
We will cite the insightful references provided by the reviewer [1, 2] and include all the experimental results presented above in the revised manuscript.
Table 35: https://anonymous.4open.science/r/ICML12446-AF4E/Table%2035.png
Q5. Your method is not permutation equivariant as a random node is chosen, which should be noted but is generally fine for the applications that are considered.
We agree that, while our method is not permutation equivariant due to the random node selection, this has not posed any practical issues in the applications considered. We will explicitly note this property in the revised manuscript for clarity.
[1] "Benchmarking graph neural networks." JMLR 2023.
[2] "Graph neural networks with learnable structural and positional representations." ICLR 2022.
I thank the authors for their detailed rebuttal and for conducting additional ablation studies. These seem to be very valuable in confirming the effectiveness of the proposed approach. It is a promising tool for cases when permutation equivariance is not required. I now support accepting this paper and have increased my score to 3.
We are grateful to the reviewer for taking the time to share this thoughtful rebuttal comment. Your insightful suggestions allowed us to further validate the effectiveness of our proposed approach from a different perspective. We sincerely appreciate your decision to raise your score and your support for the acceptance of our paper.
This paper targets missing-data imputation for graph data. The authors highlight that existing propagation-based methods produce nearly identical values within each channel, which contribute little to graph learning. To resolve this limitation, the authors propose a propagation-based imputation scheme that consists of two diffusion stages. First, the method imputes the data using existing propagation-based methods, which yields low-variance channels. Then the method removes all the imputed features in the low-variance channels and generates a synthetic feature by injecting random noise into a randomly selected node. Finally, the method diffuses both the observed and synthetic features to produce the final imputed features, which have distinct imputed values in those channels. The experiments show that the method increases the variance of imputed values across channels and thereby improves graph learning tasks such as semi-supervised node classification and link prediction.
Questions To Authors
No
Claims And Evidence
The claims in the paper are supported by clear and convincing evidence.
Methods And Evaluation Criteria
The proposed methods and evaluation criteria make sense for the target problem.
Theoretical Claims
The overall proofs in the supplementary are sound.
Experimental Designs Or Analyses
The experimental design is comprehensive, with different datasets and methods included, mainly for semi-supervised node classification and link prediction. Effects of hyperparameters and an ablation study are also included and detailed in the supplementary material.
Supplementary Material
I have reviewed the supplementary material for the theoretical proofs and the experimental parts.
Relation To Broader Scientific Literature
The study focuses on graph missing-data imputation, which is applicable to any domain with graph data that suffers from missingness, such as social networks containing many missing features for the people in the network.
Essential References Not Discussed
No
Other Strengths And Weaknesses
Strengths:
1. The writing, visualization, and organization of the paper are clear, and the comprehensive experiments include many datasets and methods. In particular, an ablation study, the effect of hyperparameters, and scalability analyses are included, making the work more solid.
2. Theoretical proofs are derived to show the convergence of the diffusion stages and why channel-wise inter-node diffusion produces similar imputed values for channels where the known features have similar values.
Other Comments Or Suggestions
No
We sincerely appreciate the reviewer’s positive evaluation of our work and the absence of noted weaknesses. We thank the reviewer for recognizing that the claims in our paper are supported by clear and convincing evidence, and for highlighting the clarity of the writing, the soundness of the theoretical analysis, and the comprehensiveness of the experiments, including ablation studies and scalability. That said, we noticed that the reviewer’s overall recommendation was “Weak Accept (i.e., leaning towards accept, but could also be rejected).” If there are any remaining concerns or suggestions for improvement that we may have overlooked, we would greatly appreciate your feedback and are ready to address them promptly.
Throughout the rebuttal period, we have conducted the following additional discussions on our proposed method to further strengthen our paper, all of which will be included in the revised manuscript:
- Generalizability across downstream GNN architectures (see Table 30: https://anonymous.4open.science/r/ICML12446-AF4E/Table%2030.png)
- Ablation study on the use of synthetic features (see Table 32: https://anonymous.4open.science/r/ICML12446-AF4E/Table%2032.png)
- Description of the algorithm (see Algorithm 1: https://anonymous.4open.science/r/ICML12446-AF4E/Algorithm%201.png)
- Applicability to existing imputation methods as additional channels (see Table 34: https://anonymous.4open.science/r/ICML12446-AF4E/Table%2034.png)
- Effect of the number of additional channels with synthetic features on performance (see Table 35: https://anonymous.4open.science/r/ICML12446-AF4E/Table%2035.png)
We hope that these additions and clarifications help address any remaining uncertainties and reinforce your confidence in the significance of our work. If the reviewer finds our responses satisfactory, we would be sincerely grateful if you would consider revisiting your overall recommendation.
This paper addresses the issue of missing features in graph data, which hinders the effectiveness of Graph Neural Networks (GNNs). Existing diffusion-based imputation methods often result in low-variance channels, where feature values across nodes are nearly identical, leading to poor performance in downstream tasks. The paper proposes Feature Imputation with Synthetic Features (FISF), a novel imputation scheme that mitigates the low-variance problem by introducing synthetic features. FISF consists of two diffusion stages: pre-diffusion and diffusion with synthetic features. Pre-diffusion identifies low-variance channels using existing methods like PCFI. Then, FISF injects synthetic features into randomly chosen nodes in these channels, followed by a second diffusion stage that spreads the synthetic features to increase variance and improve node distinctiveness.
Questions To Authors
See Other Strengths And Weaknesses.
Claims And Evidence
This paper introduces the low-variance issue in feature imputation and addresses it with synthetic features.
Methods And Evaluation Criteria
Yes.
Theoretical Claims
No theoretical claims.
Experimental Designs Or Analyses
Yes.
Supplementary Material
Yes. The scalability parts.
Relation To Broader Scientific Literature
This may benefit missing-data handling in graph-structured data. However, missing-data issues have been well explored before.
Essential References Not Discussed
I did not find any missing essential references.
Other Strengths And Weaknesses
Pros:
- FISF introduces a new perspective on feature imputation by addressing the low-variance issue with synthetic features.
- This approach shows promising results in experiments.
Cons:
- This work seems to show its advantage especially at large missing rates, such as 0.995 and 0.999. However, such large missing rates are impractical in real applications.
- There is no comparison regarding scalability and efficiency in the main body.
Other Comments Or Suggestions
It would be better to provide a description of the algorithm in the paper.
It is highly recommended to move the analyses in the appendix into the main body. Reorganizing the paper in this way would greatly enhance it.
We sincerely thank the reviewer for the thoughtful questions and valuable suggestions to further improve our work.
Q1. This work seems to show its advantage especially at large missing rates, such as 0.995 and 0.999. However, such large missing rates are impractical in real applications.
Our FISF consistently demonstrates superiority across various missing rates, including low ones, as shown in Figure 3 in the manuscript. As the reviewer mentioned, the performance gain obtained with FISF diminishes as the missing rate decreases. However, this is natural, since a smaller missing rate means fewer missing features to impute, making it difficult to achieve a significant improvement solely through the superiority of the imputation method. Nevertheless, FISF consistently shows its effectiveness even at low missing rates.
Furthermore, addressing high rates of missing features is an important issue in real-world scenarios. As data sources become more diverse and abundant, the prevalence of highly incomplete data is also increasing. Consequently, handling large missing rates has drawn significant attention across various domains, including semiconductors [1], healthcare [2], and transportation [3], where datasets often exhibit extreme missing rates of 97.5%, 99.98%, and 99.99%, respectively.
[1] "Bayesian nonparametric classification for incomplete data with a high missing rate: an application to semiconductor manufacturing data." IEEE Transactions on Semiconductor Manufacturing (2023).
[2] "Temporal Belief Memory: Imputing Missing Data during RNN Training." IJCAI 2018.
[3] Li, Jinlong, et al. "Dynamic adaptive generative adversarial networks with multi-view temporal factorizations for hybrid recovery of missing traffic data." Neural Computing and Applications 35.10 (2023): 7677-7696.
Q2-1. There is no comparison regarding scalability and efficiency in the main body.
Q2-2. It is highly recommended to put the analysis in the appendix to the main body. The reorganization of the paper will greatly enhance this paper.
We agree that scalability and efficiency are important considerations when evaluating imputation methods. To demonstrate the effectiveness and validity of FISF, we conducted extensive and in-depth analyses. However, due to the strict 8-page limit, it was challenging to include these analyses in the main body in addition to presenting the core results. As noted in Appendix C.5 and C.6, we have already provided a complexity analysis and empirical results demonstrating the scalability of FISF. Since the final versions of accepted papers are allowed one additional page, we will reorganize the paper by incorporating the analyses from Appendix C.5 and C.6 into the main body in response to the reviewer’s suggestion. We sincerely appreciate the reviewer’s insightful suggestion and believe that this revision will further strengthen the presentation of our paper. If there are any remaining concerns or suggestions for improvement, we would be happy to receive your constructive feedback and are fully prepared to address any remaining points promptly.
Q3. It is better to provide a description of the algorithm in the paper.
We agree that providing a description of the algorithm is helpful for improving the clarity of the proposed method. In response to the reviewer’s suggestion, we have written Algorithm 1 and will include it in Section 4 of the revised manuscript.
Algorithm 1: https://anonymous.4open.science/r/ICML12446-AF4E/Algorithm%201.png
In this paper, the authors introduce FISF, a novel approach for graph feature imputation. FISF effectively mitigates the low-variance channel problem by strategically injecting synthetic features, thereby enhancing performance in both semi-supervised node classification and link prediction tasks across a wide range of missing rates.
Questions To Authors
- The authors have supplemented a lot of experiments. Have they considered publishing this research in a journal?
- The generation methods for missing features only consider two situations: structural missing and uniform missing. Are these two situations common in real-world scenarios?
- The authors use grid search to determine the hyperparameters. Since there are quite a lot of hyperparameters and multiple experiments are needed, how long does it actually take to train the model once?
Claims And Evidence
The claims made in the submission are supported by clear and convincing evidence.
Methods And Evaluation Criteria
Yes
Theoretical Claims
Yes
Experimental Designs Or Analyses
Yes
Supplementary Material
Yes. All parts.
Relation To Broader Scientific Literature
The key contributions of the paper are important to the broader scientific literature.
Essential References Not Discussed
No
Other Strengths And Weaknesses
- Through numerous experiments, FISF was evaluated on multiple benchmark datasets. The results show that FISF significantly improves the performance of semi-supervised node classification and link prediction tasks, proving the effectiveness of the method.
- The convergence of the diffusion stage was theoretically proven.
- The method is novel as it is the first research to apply synthetic features to imputation.
Other Comments Or Suggestions
- All the compared baselines are from before 2023. Since I'm not familiar with this field, are there any other more advanced baselines?
- Only GCN is used as the backbone. Other graph neural networks, such as GIN, can be considered to verify the structural generalizability of the method proposed by the author.
We sincerely thank the reviewer for the detailed feedback and insightful questions that help us further improve our work.
Q1. Since I'm not familiar with this field, are there any other more advanced baselines?
We appreciate the reviewer’s thoughtful question. Before submitting the paper, we conducted a thorough investigation of recent methods and included all state-of-the-art baselines relevant to graph learning with missing features. We have closely followed recent advances and carefully examined the literature in this area. Propagation-based feature imputation methods, including FP and PCFI, have demonstrated exceptional performance, and researchers have recently focused on extending the existing methods to new applications [1, 2]. In contrast, our study identifies a key limitation of current propagation-based methods and proposes a solution that effectively addresses this issue, achieving significant performance improvements. We will incorporate very recent work on propagation-based feature imputation, thereby ensuring that our paper reflects the most up-to-date developments in the field.
[1] "Gene-Gene Relationship Modeling Based on Genetic Evidence for Single-Cell RNA-Seq Data Imputation." NeurIPS 2024
[2] "Relation-Aware Diffusion for Heterogeneous Graphs with Partially Observed Features." ICLR 2025.
Q2. Other graph neural networks, such as GIN, can be considered to verify the structural generalizability of the method proposed by the author.
To verify the structural generalizability of the proposed method, FISF, we conduct additional experiments using GIN as the downstream network for the imputation methods. Table 30 presents the semi-supervised node classification results at a missing rate of 0.995. As shown in the table, FISF consistently outperforms state-of-the-art methods across all datasets and missing settings, demonstrating its strong generalizability across datasets, missing settings, and downstream network architectures. We will include this table in Section 5 of the revised manuscript to emphasize this important discussion.
Table 30: https://anonymous.4open.science/r/ICML12446-AF4E/Table%2030.png
Q3. The author has supplemented a lot of experiments. Haven't they considered publishing this research in a journal?
We greatly appreciate the reviewer’s encouraging comment. We submitted this work to ICML to receive timely feedback and to make a prompt contribution to the long-standing topic of missing values in the machine learning community. To demonstrate the generalizability and validity of our method, we conducted extensive experiments and thorough analyses in the submitted paper.
Q4. Are structural missing and uniform missing common in real-world scenarios?
Structural missing and uniform missing, in which the features of randomly selected nodes and randomly selected entries of the feature matrix are removed, respectively, are categorized as Missing Completely At Random (MCAR) among missingness mechanisms. MCAR is the most commonly assumed setting in the missing data community [3, 4]. Our proposed FISF consistently demonstrates its effectiveness under both missing settings.
To further validate the effectiveness of FISF beyond MCAR settings, we also conducted experiments under Missing Not At Random (MNAR) scenarios in Appendix C.3. In MNAR, the probability of missingness depends on the unobserved values themselves. For these experiments, we designed two MNAR settings: MNAR-I and MNAR-D. In MNAR-I, the probability that a feature is missing increases as the feature value increases; in MNAR-D, it increases as the feature value decreases. Table 5 in the manuscript shows classification accuracy in semi-supervised node classification on the OGBN-Arxiv dataset under the MNAR settings. The results reveal that FISF consistently outperforms the baselines across both MNAR settings, thereby demonstrating its effectiveness even in MNAR scenarios.
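To illustrate, here is a sketch of how such MNAR masks could be generated (our illustrative reading of the settings above; the exact protocol in Appendix C.3 may differ, and `target_rate` is a hypothetical knob):

```python
import numpy as np

def mnar_mask(X, target_rate=0.5, increasing=True, seed=0):
    """MNAR-I (increasing=True): larger feature values are more likely to be
    missing; MNAR-D (increasing=False) flips the direction."""
    rng = np.random.default_rng(seed)
    # Rank each entry within its channel; map ranks to missingness probabilities
    ranks = X.argsort(axis=0).argsort(axis=0) / (len(X) - 1)  # per-channel ranks in [0, 1]
    p = ranks if increasing else 1.0 - ranks                   # MNAR-I vs. MNAR-D
    p = np.clip(p * 2 * target_rate, 0.0, 1.0)                 # mean probability ≈ target_rate
    return rng.random(X.shape) < p                             # True marks a missing entry
```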
[3] "Gain: Missing data imputation using generative adversarial nets." ICML 2018.
[4] "Handling missing data via max-entropy regularized graph autoencoder." AAAI 2023.
Q5. How long does it actually take to train the model once?
To address the reviewer's concern, we report the average training and hyperparameter tuning time of the FISF model, measured on a single NVIDIA GeForce RTX 2080 Ti GPU and an Intel Core i5-6600 CPU at 3.30 GHz. As shown in Table 31, training a single FISF model takes only a few minutes. Despite using grid search, the efficiency of FISF enables the model for OGBN-Arxiv, which contains 169,343 nodes, to complete hyperparameter tuning in less than a day, while other FISF models require only a few hours. We provide a detailed discussion of the complexity and scalability of FISF in Appendix C.5 and Appendix C.6, respectively. We will add this discussion and table to Appendix C.5 in the revised manuscript.
Table 31: https://anonymous.4open.science/r/ICML12446-AF4E/Table%2031.png
Thank you for your responses.
My questions have all been addressed. However, since I'm completely unfamiliar with this field, I will temporarily keep my score unchanged to avoid interfering with the decisions of other reviewers.
We are pleased that our responses have addressed all of the reviewer’s concerns. We sincerely appreciate the reviewer’s insightful feedback, which has contributed to further improving our paper, and their continued support for its acceptance.
This work identifies a limitation of previous works on learning over graphs with missing features: the output channels of feature imputation have low variance. To solve this problem, the authors diffuse the observed features together with injected random noise to produce the final imputed features. Their method, FISF, is compared to several baselines on standard node classification tasks.
Questions To Authors
(1) Is there any theory to corroborate the reason that these low-variance channels are negatively impacting downstream node classification tasks? On a related note, why does the injection of synthetic features improve downstream tasks? Theory on the “why” would make this work incredibly strong.
(2) Why perform channel-wise diffusion? What if all channel features are determined by the synthetic features only? Are there any studies/ablations?
Claims And Evidence
The empirical results support the claims that FISF reduces the number of channels with low-variance as compared to other methods such as FP and PCFI. Furthermore, on the task setting where node features are removed at high rates, FISF demonstrates superior performance on node classification tasks.
Methods And Evaluation Criteria
The methods are sound and evaluation is standard for this problem.
Theoretical Claims
The paper does not emphasize any theoretical claims. However, the work provides sufficient support for its claim in Sec. 4.3 within the appendix.
Experimental Designs Or Analyses
The experimental design is appropriate for the graph learning tasks.
Supplementary Material
I read through Sec. A and B of the appendix, and briefly looked through Sec. C.
Relation To Broader Scientific Literature
Works investigating graph learning using data with missing node features largely depend upon explicitly/implicitly predicting the values of the missing data. This work identifies one of the issues with a naive implementation of feature imputation, where many output channels have low-variance. This understanding will be important in future works.
Essential References Not Discussed
To my knowledge, the relevant literature is discussed.
Other Strengths And Weaknesses
The empirical results are very strong. The addition of theoretical justifications would make this work even stronger. See questions.
Other Comments Or Suggestions
See questions.
We greatly appreciate the reviewer’s positive feedback on the strength of our empirical results and theoretical justifications. We also appreciate the insightful questions, which provide valuable guidance for further enhancing our paper.
Q1. Is there any theory to corroborate the reason that these low-variance channels are negatively impacting downstream node classification tasks? On a related note, why does the injection of synthetic features improve downstream tasks? Theory on the “why” would make this work incredibly strong.
We would like to respectfully clarify that our claim is not that low-variance channels produced by propagation-based imputation methods negatively impact downstream node classification tasks, but rather that they contribute little to them. From a theoretical perspective, a zero-variance channel—corresponding to the first eigenvector of the graph Laplacian—is regarded in the literature as an example of zero expressiveness, as it is not useful for discriminating between nodes [1, 2]. Based on this, we identify the issue that existing propagation-based imputation methods produce channels with near-zero variance in their output. We experimentally confirm that these low-variance channels contribute very little to downstream tasks, as shown in Figure 1b and Appendix C.7 of the manuscript.
We theoretically prove that propagation-based feature imputation produces low-variance channels when all observed features in a given channel have the same value. To prevent a group of nearly identical values within a channel from diffusing only themselves and consequently forming a low-variance channel, we inject a synthetic feature with a randomly sampled value that is likely to differ from the existing known values, allowing it to participate in the diffusion process. As a result, our FISF effectively increases the variance of low-variance channels in its output matrix, as shown in Figure 1a, Figure 11, and Figure 12 in the manuscript. This increase in channel variance leads to significant performance improvements across various downstream tasks and diverse domains, as shown in Figure 3, Table 1, Table 2, and Table 4 in the manuscript. In summary, synthetic feature injection enables low-variance channels to overcome their lack of distinctiveness and recover expressiveness.
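To make the fixed-point argument concrete, here is a schematic restatement (assuming, for illustration only, a row-stochastic propagation matrix $\bar{A}$ with $\bar{A}\mathbf{1} = \mathbf{1}$; the precise operator in the paper may differ). For one channel with observed value $c$ on the observed nodes, the interleaved propagate-and-reset iteration is

$$
x^{(t+1)}_u \;=\;
\begin{cases}
c, & u \in \mathcal{V}_{\mathrm{obs}},\\[2pt]
\big(\bar{A}\, x^{(t)}\big)_u, & u \in \mathcal{V}_{\mathrm{miss}}.
\end{cases}
$$

If every observed value equals $c$, the constant vector $x^\star = c\,\mathbf{1}$ satisfies both cases, since $\bar{A}\,c\mathbf{1} = c\mathbf{1}$, so under the usual convergence conditions the iteration settles on a zero-variance channel. Clamping a synthetic value $s \neq c$ at a single node breaks this fixed point, and the limit instead interpolates between $s$ and $c$ according to the graph structure.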
Appendix D.1 in the manuscript provides justification for synthetic feature injection, including conceptual explanation and in-depth analysis on channel variance. We will add this discussion to Appendix D.1 of the revised manuscript, interpreting the low-variance problem through the lens of spectral graph theory.
[1] Chung, Fan RK. Spectral graph theory. Vol. 92. American Mathematical Soc., 1997.
[2] Von Luxburg, Ulrike. "A tutorial on spectral clustering." Statistics and computing 17 (2007): 395-416.
Q2. Why perform channel-wise diffusion? What if all channel features are determined by the synthetic features only? Are there any studies/ablations?
In Appendix C.1 of the manuscript, we conducted an ablation study by adjusting each hyperparameter of FISF and the number of synthetic features injected per channel to analyze the effectiveness of its components. In addition to this study, and to address the reviewer’s concern, we further conduct an additional ablation study on the use of synthetic features. We compare the performance of FISF and three of its variants, depending on how synthetic features are injected and utilized within a low-variance channel.
- FISF-A (only synthetic feature injection): A synthetic feature is injected but not used in the diffusion process.
- FISF-B (fully synthetic features): All missing features are directly replaced with randomly sampled synthetic values without any diffusion.
- FISF-C (diffusion only with a synthetic feature): Diffusion is performed using only the synthetic feature, with known features removed.
Table 32 presents the results of semi-supervised node classification. As shown in the table, simply injecting synthetic features, or performing diffusion using only the injected synthetic feature without any observed features within the channel, results in significantly worse performance compared to the original FISF. The reason for performing channel-wise diffusion using both a synthetic feature and the observed features within a channel is that it enables the output channel to capture both structural information and the feature information from the observed features simultaneously. First, since the diffusion process is based on the graph structure, the diffusion of the synthetic feature can encode structural information. In addition, by preserving the original values of the observed features during the diffusion process, the feature information can also be retained.
We will include this extended ablation study in Appendix C.1 of the revised manuscript.
Table 32: https://anonymous.4open.science/r/ICML12446-AF4E/Table%2032.png
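For clarity, here is a hedged sketch of channel-wise diffusion with a clamped synthetic feature as described above (variable names, the fixed iteration count, and the uniform sampling of the synthetic value are our assumptions, not the exact implementation in the paper):

```python
import numpy as np

def diffuse_channel(A, x_obs, obs_mask, inject=True, n_iters=100, seed=0):
    """Impute one channel: clamp observed values, optionally inject a
    synthetic value at a random missing node, then propagate."""
    rng = np.random.default_rng(seed)
    d = A.sum(axis=1)
    M = A / np.maximum(d, 1e-12)[:, None]   # row-stochastic propagation matrix

    clamp_mask = obs_mask.copy()
    x = np.where(obs_mask, x_obs, 0.0)
    if inject and (~obs_mask).any():
        node = rng.choice(np.flatnonzero(~obs_mask))  # random missing node
        x[node] = rng.uniform()                        # synthetic feature value
        clamp_mask[node] = True                        # hold it fixed, like an observation
    clamp_vals = x[clamp_mask]                         # values kept fixed during diffusion

    for _ in range(n_iters):
        x = M @ x
        x[clamp_mask] = clamp_vals                     # reset observed + synthetic entries
    return x
```

With `inject=False` and identical observed values, this loop converges (on a connected graph) to a constant vector, reproducing the low-variance failure mode; the clamped synthetic value is what breaks that collapse.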
Thank you to the authors for their clarifications. I would agree now with the understanding that low-variance channels provide little contribution, and synthetic features would enable these low-variance channels to have more discrimination. I suppose a natural follow-up is if there is a better strategy to choose these synthetic features than sampling from a random distribution, but this would probably be outside the scope of the claims for this work.
We sincerely thank the reviewer for taking the time to provide this thoughtful rebuttal comment. We are glad to hear that our clarifications have addressed the reviewer's concerns.
As the reviewer insightfully suggested, exploring alternative strategies for generating synthetic features is indeed a meaningful direction. To this end, we conduct an extensive ablation study comparing various value assignment strategies for synthetic features. Since FISF does not involve a learning process during imputation, statistical approaches may serve as the most reasonable alternatives to random sampling. Specifically, we compare the performance of FISF variants using different value assignment strategies, including the max, min, mean, and median of the observed features. We also evaluate a variant called Channel-wise Mean + Std, which statistically determines synthetic feature values on a per-channel basis. The results are presented in Table 36. As shown in the table, the original FISF consistently achieves the best performance. We believe this performance gain stems from the increased diversity across feature channels in the imputed matrix, facilitated by the use of randomly sampled values. We will include this important discussion and the corresponding experimental results on value assignment strategies for synthetic features in the revised manuscript. If the reviewer has any suggestions for a potentially more promising strategy, we would warmly welcome them.
Additionally, we explored the effects of the magnitude of synthetic feature values, as reported in Appendix C.9 of the manuscript.
If the reviewer's follow-up question has been fully addressed, as well as the previous concerns, we would sincerely appreciate your reconsideration of the overall recommendation.
Table 36: https://anonymous.4open.science/r/ICML12446-AF4E/Table%2036.png
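For reference, a sketch of the value-assignment variants compared in Table 36 (illustrative only; `observed` denotes the vector of known values within a channel):

```python
import numpy as np

def synthetic_value(observed, strategy="random", rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    if strategy == "random":        # original FISF: randomly sampled value
        return rng.uniform()
    if strategy == "max":
        return observed.max()
    if strategy == "min":
        return observed.min()
    if strategy == "mean":
        return observed.mean()
    if strategy == "median":
        return np.median(observed)
    if strategy == "mean+std":      # the Channel-wise Mean + Std variant
        return observed.mean() + observed.std()
    raise ValueError(f"unknown strategy: {strategy}")
```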
Post-rebuttal, all reviewers have converged on an acceptance recommendation, with their primary concerns addressed. After reviewing the paper and rebuttal, I agree that the work makes a meaningful contribution to missing data imputation for graph data. I recommend acceptance. Please ensure that the necessary revisions are incorporated into the final version.