[W2] Additionally, a category of related work on non-iid data, particularly doubly robust estimators, IPW, and representation-based methods, appears to be missing from the review. Including this literature [1, 2, 3] could provide a more comprehensive view of the field.

Response: We sincerely thank the reviewers for pointing us to three doubly robust estimators for non-iid data. We will restructure the related work section to discuss these studies (changes marked in blue). A common feature of these works is that they assume the interference graph coincides with the known social network. For example, DRLearner (Leung et al., 2022) explicitly models interference within a social network, while Net-TMLE (Ogburn et al., 2024) and GDML (Khatami et al., 2024) rely on pre-specified exposure mappings and assume interference is limited to direct neighbors. These strong assumptions often fail to hold in real-world applications. To address these limitations, we propose CauGramer, an interference-agnostic Causal Graph Transformer framework designed to enable robust causal estimation without assuming a known interference structure.

Additionally, we discuss related work on reweighting-based (IPW) and representation-based (IPM) methods in Lines 120–140. In Table 1, we summarize the characteristics of representative algorithms for non-iid data under the “Reweighting,” “Representation,” and “Attention” columns, which allows a direct comparison across the different methodological approaches.

[W3] Since doubly robust estimators represent the state-of-the-art for causal parameter inference and have advantages over IPW-based methods, adding at least one of these approaches as a baseline would further strengthen the comparisons [1, 2, 3].

[Comparison Baselines: DRLearner, GDML, GDML w/o Focal Set] Our main experiments (Tables 2 and 3) already include a doubly robust estimator (DRLearner) as a baseline. As stated in Lines 471–473, “For the sake of fairness, we modify all non-interference methods by incorporating neighbors’ treatment and social networks as additional inputs.” The modified doubly robust estimator used in our experiments closely follows the DRLearner proposed by Leung et al. (2022); the key difference is that we use a GCN instead of a GNN for information aggregation, a modification we believe is an improvement.

We also include GDML and GDML w/o FSet (Focal Set) from Khatami et al. (2024) as two further baselines. We do not include Net-TMLE (Ogburn et al., 2024) as a baseline because it is a semiparametric approach whose strong assumptions are often difficult to satisfy in real-world applications.
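To make the modification quoted above concrete, the following is a minimal sketch of how neighbors' treatments can be supplied to a generic doubly robust (AIPW) estimator as an additional input. It is an illustration under stated assumptions, not the implementation used in our experiments: the function names (`neighbor_exposure`, `augment_features`, `aipw_ate`) are hypothetical, the neighbor summary is simply the fraction of treated neighbors rather than a GCN aggregation, and the nuisance estimates `mu0`, `mu1`, and `e` are assumed to come from models fitted separately on the augmented features.

```python
import numpy as np


def neighbor_exposure(adj, t):
    """Fraction of treated neighbors for each unit (0 for isolated units).

    adj: (n, n) adjacency matrix of the social network (assumed known here).
    t:   (n,) binary treatment vector.
    """
    adj = np.asarray(adj, dtype=float)
    t = np.asarray(t, dtype=float)
    deg = adj.sum(axis=1)
    treated_neighbors = adj @ t
    return np.divide(
        treated_neighbors, deg, out=np.zeros_like(deg), where=deg > 0
    )


def augment_features(x, adj, t):
    """Append the neighbor-exposure summary to the covariates.

    The augmented matrix is what a non-interference method would receive
    as input; fitting the nuisance models is outside this sketch.
    """
    return np.column_stack([np.asarray(x, dtype=float), neighbor_exposure(adj, t)])


def aipw_ate(y, t, mu0, mu1, e, eps=1e-3):
    """Standard AIPW (doubly robust) estimate of the average treatment effect.

    mu0, mu1: predicted outcomes under control / treatment.
    e:        estimated propensity scores, clipped away from 0 and 1.
    """
    y, t = np.asarray(y, dtype=float), np.asarray(t, dtype=float)
    mu0, mu1 = np.asarray(mu0, dtype=float), np.asarray(mu1, dtype=float)
    e = np.clip(np.asarray(e, dtype=float), eps, 1 - eps)
    psi = mu1 - mu0 + t * (y - mu1) / e - (1 - t) * (y - mu0) / (1 - e)
    return psi.mean()
```

The estimator is consistent if either the outcome models or the propensity model is correctly specified, which is the property that motivates including a doubly robust baseline. The sketch still relies on a known adjacency matrix, which is exactly the assumption CauGramer is designed to avoid.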

Table: Heterogeneous Treatment Effects Estimation on BlogCatalog (BC)

Method	AME	APE	ATE	IME	IPE	ITE
DRLearner
GDML
GDML w/o FSet
CauGramer

Table: Heterogeneous Treatment Effects Estimation on Flickr

Method	AME	APE	ATE	IME	IPE	ITE
DRLearner
GDML
GDML w/o FSet
CauGramer