PaperHub

NeurIPS 2024 · Poster · 4 reviewers
Overall rating: 6.3 / 10 (ratings: 7, 6, 6, 6; min 6, max 7, std 0.4)
Confidence: 3.0 · Correctness: 3.0 · Contribution: 3.0 · Presentation: 2.8

Iteratively Refined Early Interaction Alignment for Subgraph Matching based Graph Retrieval

OpenReview · PDF
Submitted: 2024-05-16 · Updated: 2024-11-06
TL;DR

Improved neural interaction modelling of graph pairs for subgraph matching based retrieval.

Abstract

Graph retrieval based on subgraph isomorphism has several real-world applications such as scene graph retrieval, molecular fingerprint detection and circuit design. Roy et al. [35] proposed IsoNet, a late interaction model for subgraph matching, which first computes the node and edge embeddings of each graph independently of paired graph and then computes a trainable alignment map. Here, we present $IsoNet++$, an early interaction graph neural network (GNN), based on several technical innovations. First, we compute embeddings of all nodes by passing messages within and across the two input graphs, guided by an *injective alignment* between their nodes. Second, we update this alignment in a lazy fashion over multiple *rounds*. Within each round, we run a layerwise GNN from scratch, based on the current state of the alignment. After the completion of one round of GNN, we use the last-layer embeddings to update the alignments, and proceed to the next round. Third, $IsoNet++$ incorporates a novel notion of node-pair partner interaction. Traditional early interaction computes attention between a node and its potential partners in the other graph, the attention then controlling messages passed across graphs. We consider *node pairs* (not single nodes) as potential partners. Existence of an edge between the nodes in one graph and non-existence in the other provide vital signals for refining the alignment. Our experiments on several datasets show that the alignments get progressively refined with successive rounds, resulting in significantly better retrieval performance than existing methods. We demonstrate that all three innovations contribute to the enhanced accuracy. Our code and datasets are publicly available at https://github.com/structlearning/isonetpp.
Keywords

Graph Neural Networks · Graph Retrieval

Reviews and Discussion

Review

Rating: 7

The paper introduces EINSMATCH, an early interaction graph neural network for subgraph matching and retrieval that maintains and refines explicit alignments between query and corpus graphs. EINSMATCH has several novelties: computing embeddings guided by injective alignments between graphs, updating alignments lazily over multiple rounds, and using node-pair partner interactions. The model learns to identify alignments between graphs despite only having access to pairwise preferences during training, without explicit alignments. Experiments on several datasets show that EINSMATCH significantly outperforms existing methods.

Strengths

  1. The analysis of GMN’s worse performance compared to IsoNet (in Section 1) is intriguing.
  2. Extensive experimental results are presented with ample baselines and std errors included as well for most numbers reported.
  3. Source code with data splits is released for ease of reproducing the results.
  4. A large amount of ablation and parameter sensitivity studies are performed.

Weaknesses

  1. The datasets used in the experiments are relatively small, with at most ~20 nodes.
  2. It would be interesting to see the transfer ability of the learned model across datasets, e.g. training on PTC and testing on AIDS, to see how the proposed model would handle such challenging but realistic scenarios.

Questions

N/A

Limitations

N/A

Author Response

We thank the reviewer for their insightful review. The weaknesses raised in the review are addressed below.

The datasets used in the experiments are relatively small, with at most ~20 nodes.

In early experiments, we found that the relatively small size of query graphs actually presents more of a challenge to the learner: given our distant supervision, there is a larger number of promising alignments between the query and corpus graphs that could account for the relevance label.

During the rebuttal, we performed experiments with graph pairs generated from the Cox2 dataset. Here, we considered 100 query graphs with $|V|\in[25,35]$, and 500 corpus graphs with $|V|\in[40,50]$. The following table shows the results.

| Models | Test mAP score on Cox2 |
| --- | --- |
| NeuroMatch | 0.5036 |
| EGSC | 0.4727 |
| GREED | 0.1874 |
| GEN | 0.6590 |
| GMN | 0.6818 |
| IsoNet (Node) | 0.7641 |
| IsoNet (Edge) | 0.7356 |
| EinsM. (Node) Lazy | 0.7291 |
| EinsM. (Node) Eager | 0.7393 |
| EinsM. (Edge) Lazy | 0.8302 |
| EinsM. (Edge) Eager | 0.7949 |

We observe that our method EinsMatch (Edge) Lazy outperforms the baselines significantly. The node variant of our model is marginally outperformed by IsoNet (Node).

It would be interesting to see the transfer ability of the learned model across datasets, e.g. training on PTC and testing on AIDS, to see how the proposed model would handle such challenging but realistic scenarios.

During the rebuttal period, we performed experiments evaluating the transfer ability of the learned models across datasets. In particular, we chose the weights for each model trained using the AIDS and Mutag datasets respectively and evaluated them on all six datasets. The results are shown below. We observe that despite a zero-shot transfer from one of the datasets to all others, variants of EinsMatch show the best accuracy.

Models Trained on AIDS

| Test Datasets → | AIDS | Mutag | FM | FR | MM | MR |
| --- | --- | --- | --- | --- | --- | --- |
| GraphSim | 0.356 | 0.225 | 0.192 | 0.198 | 0.210 | 0.215 |
| GOTSim | 0.324 | 0.275 | 0.370 | 0.339 | 0.314 | 0.361 |
| SimGNN | 0.341 | 0.264 | 0.374 | 0.344 | 0.331 | 0.383 |
| EGSC | 0.505 | 0.255 | 0.473 | 0.451 | 0.447 | 0.499 |
| H2MN | 0.267 | 0.272 | 0.319 | 0.281 | 0.262 | 0.297 |
| NeuroMatch | 0.489 | 0.287 | 0.442 | 0.403 | 0.386 | 0.431 |
| GREED | 0.472 | 0.307 | 0.477 | 0.452 | 0.436 | 0.49 |
| GEN | 0.557 | 0.291 | 0.445 | 0.427 | 0.437 | 0.496 |
| GMN | 0.622 | 0.342 | 0.569 | 0.544 | 0.532 | 0.588 |
| IsoNet (Node) | 0.659 | 0.459 | 0.612 | 0.562 | 0.588 | 0.640 |
| IsoNet (Edge) | 0.690 | 0.468 | 0.620 | 0.568 | 0.624 | 0.627 |
| EinsM. (Node) Multi Layer | 0.756 | 0.685 | 0.825 | 0.767 | 0.781 | 0.794 |
| EinsM. (Edge) Multi Layer | 0.795 | 0.683 | 0.800 | 0.751 | 0.792 | 0.785 |
| EinsM. (Node) Multi Round | 0.825 | 0.702 | 0.828 | 0.777 | 0.800 | 0.825 |
| EinsM. (Edge) Multi Round | 0.847 | 0.741 | 0.846 | 0.799 | 0.833 | 0.836 |

Models Trained on Mutag

| Test Datasets → | AIDS | Mutag | FM | FR | MM | MR |
| --- | --- | --- | --- | --- | --- | --- |
| GraphSim | 0.188 | 0.472 | 0.190 | 0.193 | 0.205 | 0.198 |
| GOTSim | 0.194 | 0.272 | 0.185 | 0.192 | 0.202 | 0.182 |
| SimGNN | 0.206 | 0.283 | 0.203 | 0.209 | 0.220 | 0.195 |
| EGSC | 0.296 | 0.476 | 0.391 | 0.333 | 0.309 | 0.355 |
| H2MN | 0.209 | 0.276 | 0.204 | 0.207 | 0.223 | 0.197 |
| NeuroMatch | 0.275 | 0.576 | 0.368 | 0.304 | 0.304 | 0.325 |
| GREED | 0.328 | 0.567 | 0.388 | 0.335 | 0.356 | 0.37 |
| GEN | 0.278 | 0.605 | 0.359 | 0.308 | 0.312 | 0.33 |
| GMN | 0.299 | 0.71 | 0.434 | 0.361 | 0.389 | 0.394 |
| IsoNet (Node) | 0.458 | 0.697 | 0.503 | 0.456 | 0.446 | 0.486 |
| IsoNet (Edge) | 0.472 | 0.706 | 0.499 | 0.438 | 0.467 | 0.489 |
| EinsM. (Node) Multi Layer | 0.601 | 0.810 | 0.695 | 0.611 | 0.628 | 0.614 |
| EinsM. (Edge) Multi Layer | 0.527 | 0.805 | 0.558 | 0.507 | 0.560 | 0.563 |
| EinsM. (Node) Multi Round | 0.645 | 0.851 | 0.679 | 0.626 | 0.652 | 0.655 |
| EinsM. (Edge) Multi Round | 0.625 | 0.858 | 0.639 | 0.598 | 0.634 | 0.650 |
Comment

Thank you for your rebuttal.

Review

Rating: 6

This work proposes a GNN architecture for the graph matching problem. The innovations mainly lie in the early interaction between the query graph and the corpus graph, the node-pair matching setting, and the lazy update technique in each round. Empirical results show that this work significantly outperforms the baselines on all the datasets.

Strengths

  • The writing of the paper is clear and good.
  • The methodology is solid, and the illustration is informative.
  • The results are very promising, both in predictive performance and runtime. And the ablation experiments are well-designed.

Weaknesses

  • Some background parts are not well explained, for example the graph matching problem and the optimal transport problem. Readers may need external resources to understand them. Of course, it is hard to explain everything in detail in the main paper.

Questions

  • Any related work indicating early interaction is good? As mentioned in line 36.
  • In line 124 the notation $A_q \leq P A_c P^{\top}$ is a bit confusing.
  • In equation 5, what is the sign of $\tau$? If it's positive then you're encouraging the entropy to be higher, which means the entries of $P$ are more uniform? That is a bit counter-intuitive.
  • What is the complexity of the Sinkhorn-Knopp algorithm and Gumbel-Sinkhorn?
  • Is the K-layer GNN in each round t shared?
  • Any intuition why lazy updates works better? Is it easier to optimize or other aspects?
  • I don't quite understand why the work is not differentiable on nodes with different classes, as mentioned in appendix A.
  • Is there another measure between $P$ and $P^*$? I feel $\mathrm{Tr}(P^\top P^*)$ does not capture all the information of the matrices.

Limitations

As mentioned by the authors, the quality of the matching relies on the distance function, which is a general problem for graph matching works. Also, I am a little dubious about the time complexity of the $P$ matrix update; in the experiments it is more expensive than the baseline GMN.

Author Response

We thank the reviewer for their insightful feedback. The questions raised in the review are addressed below.

Any related work indicating early interaction is good?

The GMN paper itself provides some evidence: GMN-match (early cross-graph interaction) is slower but convincingly superior in terms of accuracy to GMN-embed (late interaction between whole-graph representations). Other domains provide strong supporting evidence [a,b]. E.g., in textual entailment (NLI) tasks [b], where we need to classify if sentence s1 implies sentence s2, late comparison of (say) transformer-based embeddings of the two sentences, computed separately, is convincingly beaten by injecting s1 concat s2 into a comparable transformer and fine-tuning a classification head for the NLI task.

[a] A. Lai and J. Hockenmaier. Learning to predict denotational probabilities for modeling entailment.

[b] Entailment as Few-Shot Learner. Sinong Wang, Han Fang, Madian Khabsa, Hanzi Mao, Hao Ma. arXiv:2104.14690v1

In line 124, $A_q \le P A_c P^{\top}$ is a bit confusing.

By $A_q \le P A_c P^{\top}$, we mean that the constraint holds true for each entry, i.e., $A_q[u,v] \le (P A_c P^{\top})[u,v]$ for all $u,v$.

Suppose $A_q$ is the node-node adjacency matrix of the query graph and $A_c$ is the node-node adjacency matrix of the corpus graph. Suppose we could find a permutation $P$ of the node numbering (i.e., rows and columns) of $A_c$; the re-numbered adjacency matrix is easily verified to be $P A_c P^{\top}$. Then $G_q \subseteq G_c$ implies that every edge in $A_q$ is also present in $P A_c P^{\top}$. In other words, if $A_q[u,v]=1$ then $(P A_c P^{\top})[u,v]$ must also be 1. This leads to the stated entry-wise inequality.
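
As a toy check of this entry-wise condition, here is a minimal NumPy sketch (our own example graphs and names, not taken from the paper or its released code):

```python
import numpy as np

# Query graph: a path 0-1-2 (3 nodes); corpus graph: a triangle with a pendant node (4 nodes).
A_q = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]])
A_c = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]])

# Pad A_q to the corpus size so the matrices are comparable (as typically done when |V_q| < |V_c|).
n = A_c.shape[0]
A_q_pad = np.zeros((n, n), dtype=int)
A_q_pad[:3, :3] = A_q

# Candidate injective alignment: query nodes 0,1,2 -> corpus nodes 0,2,3 (P[i, j] = 1 maps query i to corpus j).
P = np.zeros((n, n), dtype=int)
P[0, 0] = P[1, 2] = P[2, 3] = P[3, 1] = 1   # last row assigns the padding node

# Subgraph isomorphism under this alignment holds iff A_q <= P A_c P^T entry-wise.
holds = np.all(A_q_pad <= P @ A_c @ P.T)
print(holds)  # True: query edges (0,1) and (1,2) map onto corpus edges (0,2) and (2,3)
```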

Eq 5: the sign of $\tau$

Yes, we indeed want to make the entropy of $P$ a bit higher, as this allows us to convert the hard permutation matrix into a smooth doubly stochastic matrix. To gain basic intuition, consider the argmax-to-softmax conversion. The selection $\max_i a_i$, given a vector of real numbers $\vec{a}$, can be rewritten as $\max_{\vec{z}\in \Delta} \vec{z} \cdot \vec{a}$, where $\Delta$ is the unit simplex ($\sum_i z_i = 1$, $z_i \ge 0$). Adding a maximizer of the entropy of $\vec{z}$ readily yields the softmax function. Just as adding an entropy maximizer turns a vector of logits into a softmax distribution, adding an entropy maximizer turns a matrix of logits into a doubly stochastic soft permutation. For technical details, see Cuturi's classic paper and related works [10, 26].

[10] M. Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. Advances in neural information processing systems, 26:2292–2300, 2013.

[26] G. Mena, D. Belanger, S. Linderman, and J. Snoek. Learning latent permutations with gumbel-sinkhorn networks. arXiv preprint arXiv:1802.08665, 2018.
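
For completeness, a compact worked version of the argmax-to-softmax step (a standard derivation, included here for convenience rather than quoted from the paper):

$$\vec z^{\,*} = \operatorname*{argmax}_{\vec z \in \Delta}\ \vec z \cdot \vec a + \tau H(\vec z), \qquad H(\vec z) = -\sum_i z_i \log z_i .$$

Setting the gradient of the Lagrangian (with multiplier $\lambda$ for $\sum_i z_i = 1$) to zero gives $a_i - \tau(1 + \log z_i) + \lambda = 0$, hence $z_i^{*} \propto \exp(a_i/\tau)$, i.e., the softmax; as $\tau \to 0$ this recovers the hard argmax. Replacing the simplex with the Birkhoff polytope of doubly stochastic matrices yields the entropy-regularized (Sinkhorn) relaxation of a hard permutation.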

What is the complexity of the Sinkhorn-Knopp algorithm and Gumbel-Sinkhorn?

As mentioned in Appendix D.5, both these algorithms consist of a number of alternating row and column scalings. If the input matrix is $N\times N$, then each scaling takes $O(N^2)$ time. The number of scaling iterations is usually a small fixed constant (20 in our experiments).
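
A minimal log-domain sketch of this alternating scaling (our own illustration, not the authors' implementation); each iteration touches all $N^2$ entries, and the iteration count is a small constant:

```python
import numpy as np

def sinkhorn(log_alpha, n_iters=20, tau=0.1):
    """Turn an N x N score matrix into an (approximately) doubly stochastic matrix."""
    log_P = log_alpha / tau
    for _ in range(n_iters):                                                # fixed, small number of iterations
        log_P = log_P - np.logaddexp.reduce(log_P, axis=1, keepdims=True)   # row scaling, O(N^2)
        log_P = log_P - np.logaddexp.reduce(log_P, axis=0, keepdims=True)   # column scaling, O(N^2)
    return np.exp(log_P)

P = sinkhorn(np.random.randn(5, 5))
print(P.sum(axis=0), P.sum(axis=1))  # both close to all-ones
```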

Is the K-layer GNN in each round t shared?

Yes. The same GNN is shared across rounds; it is re-run (and trained) after each refinement applied to the proposed (soft) permutation.

Any intuition why lazy updates works better?

Please refer to the global response.

why the work is not differentiable on nodes with different classes.

We meant that, at present, EinsMatch is not coded to account for node or edge labels. Our matching process is purely structural. If there are hard constraints that red nodes in the query graph can match only red nodes in the corpus graph, or there is an additional cost in the form of the difference in their degree of redness, we need to investigate whether Sinkhorn-style updates will survive zeroing out parts of $P$. The problem seems complicated enough for a separate piece of work.

Is there another measure between $P$ and $P^*$? I feel $\mathrm{Tr}(P^\top P^*)$ does not capture all the information of the matrices.

Indeed, other measures like the Frobenius distance $\Vert P-P^*\Vert_F$ may be used. In Figure B of the global rebuttal PDF, we updated the histograms, which reveal similar observations as with the trace metric. Note that $\Vert P-P^*\Vert_F^2 = 2|V| - 2\,\mathrm{Tr}(P^\top P^*)$ if $P$ and $P^*$ are both hard permutation matrices, but not necessarily if they are soft permutation matrices.
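
The quoted identity follows from expanding the squared norm, using $\|P\|_F^2 = \|P^*\|_F^2 = |V|$ for hard permutation matrices:

$$\|P - P^*\|_F^2 = \|P\|_F^2 + \|P^*\|_F^2 - 2\,\mathrm{Tr}(P^\top P^*) = 2|V| - 2\,\mathrm{Tr}(P^\top P^*).$$

For soft (doubly stochastic) matrices, $\|P\|_F^2 \le |V|$ in general, so the two quantities are no longer equivalent, which is why the Frobenius-based histograms serve as a complementary check.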

Also I am a little dubious about the time complexity of the P matrix update, in the experiments it is more expensive than the baseline GMN.

L716–L720 give an idea of the clock time needed to train EinsMatch variants. Appendix G.6 gives an idea of the inference times involved, particularly, contrasted against IsoNet (edge) — we find these times within the same order of magnitude (usually within 10% and 40% of each other). Figures 6 and 17 show that, at any fixed level of inference time investment, EinsMatch (lazy) has the most favorable MAP envelope.

The following table shows the time split between GNN embedding computation and matrix $P$ updates, as the percentage of time spent performing each step.

| Models | % Embedding Computation | % Matrix $P$ updates |
| --- | --- | --- |
| EinsMatch (Node) Multi Round | 34.1 | 47.8 |
| EinsMatch (Node) Multi Layer | 13.7 | 68.3 |
| EinsMatch (Edge) Multi Round | 54.9 | 39.9 |
| EinsMatch (Edge) Multi Layer | 19.7 | 76.3 |

These percentages don't add up to 100% because of contributions from other steps like node encoding and score computation. The summary is that yes, updating $P$ has a non-negligible but non-overwhelming cost, with the benefit of better accuracy.

As mentioned by the authors, the quality of the matching relies on the distance function.

We absolutely agree with you. The same is the case for matching passages, images, video and audio. Good distance function design, mirroring human judgment, is critical to applications.

Comment

Thank you for the detailed reply and your effort. I would like to raise my score to 6.

Review

Rating: 6

This paper proposes an early interaction network for subgraph matching. The proposed method enables (1) early interaction GNNs with alignment refinement at both node and edge levels, where the alignments are refined by GNNs; (2) eager or lazy alignment updates, and (3) node-pair partner interaction, instead of node partner interaction, which improves the model efficiency and embedding quality. The paper includes extensive experiments on real-world datasets, demonstrating the superiority and effectiveness of the proposed approach.

Strengths

  1. The paper is well-written and easy to understand. The notations are clearly clarified and explained.
  2. The idea of iteratively early interaction between graphs for more accurate subgraph retrieval is novel and reasonable to me, which is also verified by the experimental results.
  3. The performance superiority of the proposed method against other baselines is significant and promising.

Weaknesses

  1. Considering the high complexity of the proposed method, an ablation study (or a hyperparameter sensitivity study) would be beneficial to provide a clearer understanding of the success of the proposed method. In addition, how did the authors tune hyperparameters (apart from T and K) for their approach and the competitors?
  2. Experiments are only conducted on small graph datasets (with an average of 11 nodes). It would be beneficial to discuss or evaluate the applicability of the proposed approach to larger graph datasets.

Questions

  • What is the motivation and intuition behind lazy updates? In my understanding, "lazy" means updating the alignment map after every K GNN layers, and "eager" means updating the alignment map and letting the map affect the embedding at each layer. Is there any explanation for the consistently better performance of "lazy" updates?
  • What do the multiple points with the same color in Figure 6 represent? Do they indicate variants with different hyperparameters?

Limitations

Yes. This paper discussed the limitations in terms of efficiency and

Author Response

We thank the reviewer for their constructive feedback. The questions raised in the review are addressed below.

ablation study (or a hyperparameter sensitivity study).. how did the authors tune hyperparameters?

We used hyperparameters from the code of IsoNet, whose authors, thankfully, also made the baseline implementations public. In IsoNet's code, the authors suggest that they already tuned the margin in the ranking loss and other hyperparameters for all the baselines; hence, we used them as-is. We did tune the margin in the ranking loss of three baselines (EGSC, GREED, H2MN) which are not included in the IsoNet codebase. More details are given from line 703 onwards in Appendix F.4. We run all methods for an unlimited number of epochs with a patience parameter of 50, i.e., training is stopped only if the performance on the validation set does not improve for 50 epochs.

We would like to highlight that, for EinsMatch, we did not tune any hyperparameters beyond those of IsoNet. We performed a hyperparameter sensitivity study (not tuning) for $T$ and $K$, which shows that for wide ranges of $T$ and $K$, EinsMatch performs better than the baselines. We report results for $T=3, K=5$ in the main table, because its running time is comparable to most baselines. Increasing $T$ and $K$ improves accuracy at more computation cost.

We did perform ablations on the node pair partner interaction component in Table 4 and the effect of lazy updates in Table 3 in the main paper. We write the results here.

The following table compares the performance of node partner interaction with EinsMatch, which involves node pair partner interaction. Node pair partner interaction consistently outperforms node partner interaction for both eager and lazy variants of EinsMatch (Node).

| Model | AIDS | Mutag | FM | FR | MM | MR |
| --- | --- | --- | --- | --- | --- | --- |
| Node partner (Lazy) | 0.776 | 0.829 | 0.851 | 0.819 | 0.844 | 0.840 |
| Node pair partner (Lazy) | 0.825 | 0.851 | 0.888 | 0.855 | 0.838 | 0.874 |
| Node partner (Eager) | 0.668 | 0.783 | 0.821 | 0.752 | 0.753 | 0.794 |
| Node pair partner (Eager) | 0.756 | 0.810 | 0.859 | 0.802 | 0.827 | 0.841 |

The following table compares the performance of the eager and lazy variants of EinsMatch (Node) and EinsMatch (Edge). Lazy multi-round updates consistently outperform eager multi-layer updates across all datasets, for both Node and Edge models.

| Model | AIDS | Mutag | FM | FR | MM | MR |
| --- | --- | --- | --- | --- | --- | --- |
| EinsM. (Node, Eager) | 0.756 | 0.810 | 0.859 | 0.802 | 0.827 | 0.841 |
| EinsM. (Node, Lazy) | 0.825 | 0.851 | 0.888 | 0.855 | 0.838 | 0.874 |
| EinsM. (Edge, Eager) | 0.795 | 0.805 | 0.883 | 0.812 | 0.862 | 0.886 |
| EinsM. (Edge, Lazy) | 0.847 | 0.858 | 0.902 | 0.875 | 0.902 | 0.902 |

We observe that our method performs better in the presence of the node pair partner interaction component, and the lazy update generally outperforms the eager update.

applicability of the proposed approach to larger graph datasets.

In early experiments, we found that the relatively small size of query graphs actually presents more of a challenge to the learner: given our distant supervision, there is a larger number of promising alignments between the query and corpus graphs that could account for the relevance label.

During the rebuttal, we performed experiments with graph pairs generated from the Cox2 dataset. Here, we considered 100 query graphs with $|V|\in[25,35]$, and 500 corpus graphs with $|V|\in[40,50]$. The following table shows the results.

| Models | Test mAP score on Cox2 |
| --- | --- |
| NeuroMatch | 0.5036 |
| EGSC | 0.4727 |
| GREED | 0.1874 |
| GEN | 0.6590 |
| GMN | 0.6818 |
| IsoNet (Node) | 0.7641 |
| IsoNet (Edge) | 0.7356 |
| EinsM. (Node) Lazy | 0.7291 |
| EinsM. (Node) Eager | 0.7393 |
| EinsM. (Edge) Lazy | 0.8302 |
| EinsM. (Edge) Eager | 0.7949 |

We observe that our method EinsMatch (Edge) Lazy outperforms the baselines significantly. The node variant of our model is marginally outperformed by IsoNet (Node).

motivation and intuition behind lazy updates?

Please refer to the global response.

What do the multiple points with the same color in Figure 6 represent?

We vary the number of rounds $T$ and layers $K$. Each point corresponds to a $(T,K)$ combination. The color, as the legend shows, corresponds to an algorithm family. The goal is to study the tradeoff between inference speed and ranking accuracy (MAP).

Comment

Thanks for your response, which addressed my questions. I will maintain current score.

Review

Rating: 6

The paper proposes EinsMatch, a neural approach for graph retrieval, i.e. the problem of finding the best candidates of a corpus of graphs that contain a subgraph isomorphic to a given query graph. EinsMatch is based on a Graph Neural Network architecture and introduces technical improvements over existing methods: (i) message-passing iterations across query and candidate graphs which are guided by an injective alignment; (ii) the iterations are lazy in the sense that multiple of them are computed with a fixed alignment before refining it; (iii) node pairs are considered as potential partners instead of single nodes in the message-passing steps. In addition, the authors propose a variant which works at the level of edges and natively leverages edge matchings. This variant seems to be the most competitive one in the experiments the authors run. In general, EinsMatch significantly outperforms a variety of other methods. The authors also report insightful ablation studies on the number of alignment rounds and message passing layers.

Strengths

  • The proposed architecture nicely interpolates between late and early aggregation approaches, whilst ensuring injectiveness of the candidate mapping. In this sense, it appears as a sensible generalisation of other approaches.

  • The proposed model is highly performant compared with other methods and, in particular, offers a competitive accuracy-inference time tradeoff compared with approaches such as GMN.

  • The reported ablation studies are insightful in that they shed light on the impact of important hyperparameters, namely the number of layers and rounds.

Weaknesses

  • I found the paper somewhat hard to follow. The contributions are rather technical, and only a single figure in the manuscript illustrates the approach. I think it would be particularly helpful if the authors improved on this aspect.

  • Still on the presentation side, I found it hard to clearly understand the specific contributions w.r.t. previous works. I believe the paper would increase in quality if the authors would state these more clearly in the main corpus.

  • The authors only provide a high-level intuition behind some specific technical contributions, e.g. the choice of working with node-pair partners. Would it be possible to refer to more formal arguments around those?

Questions

Please see Weaknesses. Also:

  • The authors refer to oversmoothing problems in some of the previous methods. Can they expand on how their approach side-steps this?

  • Can the authors better explain lines 333 — 336? I found them hard to understand.

  • How does the computational complexity compare with that of IsoNet?

Limitations

The authors do not seem to explicitly discuss them in the main corpus, but mention two aspects in Appendix A.

Author Response

We thank the reviewer for their detailed and insightful review.

paper hard to follow...only one single figure

In the global rebuttal PDF, we include another diagram (Figure A) to distinguish between node-pair interaction (earlier work) vs node-pair partner interaction, and also contrast them with IsoNet. If our paper gets accepted, we will include this diagram in the paper.

contributions w.r.t. previous works.

Early interaction models are conventionally more powerful in other domains [a,b], yet in the context of subgraph retrieval, GMN (early interaction) is outperformed by IsoNet (late interaction).

[a] A. Lai and J. Hockenmaier. Learning to predict denotational probabilities for modeling entailment.

[b] Entailment as Few-Shot Learner. S Wang, H Fang, M Khabsa, H Mao, Hao Ma. arXiv:2104.14690v1

In this work, we build a new early interaction model, which addresses the limitations of prior work, also using lessons from IsoNet and GMN. Here, we further elaborate on our contributions.

  1. GMN does not use any explicit alignment structure. They use a non-injective attention mechanism, which changes from layer to layer. In our work, we model inter-graph alignments as relaxed bijective permutation maps that are proposed, then $K$ layers of GNN score the proposal, leading to a refined proposal. Thus, the node embeddings of one graph become dependent on the paired graph, implementing early interaction. To the best of our knowledge, this is the first early interaction model for subgraph matching where proposed alignments and GNN layers help refine each other.
  2. When we compute the message of node pair $(u,v)$ in $G_q$, we also use the pair $(u',v')$ in $G_c$ that matches with $(u,v)$. Therefore, for $u \in G_q$, the node embedding $\mathbf{h}^{(q)}_u$ depends not only on the nodes in $G_c$ that align with $u$, but also on the nodes in $G_c$ that are aligned with the neighbors $\mathrm{Nbr}(u)$ of $u$. In terms of a formal argument, we have: $$\mathbf{h}^{(q)}_u = f\big(\mathbf{h}^{(q)}_u,\ \{\mathbf{h}^{(q)}_v : v \in \mathrm{Nbr}(u)\},\ \{\mathbf{h}^{(c)}_{v'} : v' \in G_c \text{ aligns with some } v \in \mathrm{Nbr}(u)\}\big) \quad (1)$$ Existing early interaction models like GMN do not capture signals from $\{v' \in G_c : v' \text{ aligns with some } v \in \mathrm{Nbr}(u)\}$; they only capture the signal from the $u' \in G_c$ that matches with $u$. Hence, in GMN, we have: $$\mathbf{h}^{(q)}_u = f\big(\mathbf{h}^{(q)}_u,\ \{\mathbf{h}^{(q)}_v : v \in \mathrm{Nbr}(u)\},\ \{\mathbf{h}^{(c)}_{u'} : u' \in G_c \text{ aligns with } u\}\big) \quad (2)$$ As our experiments suggest, approach (1) gives better performance than (2); see the schematic sketch below this list. We provide a more detailed description in lines 146-161 of the main paper.
  3. We propose two protocols for updating node embeddings and alignments (lazy and eager). In the lazy update, we first perform $K$ GNN message passing steps followed by a single alignment update; in the eager update, we update the alignment at each of the $K$ message passing steps.

We will include these elaborations in the additional page, if our paper gets accepted.
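
As referenced in item 2 above, the following schematic (PyTorch-style; tensor shapes and function names are ours and deliberately simplified, not the paper's released code) contrasts the two kinds of cross-graph signal:

```python
import torch

def node_partner_message(h_c, P):
    """GMN-style signal (cf. Eq. 2): node u in G_q only sees corpus nodes aligned with u itself."""
    # h_c: [n_c, d] corpus embeddings, P: [n_q, n_c] soft alignment
    return P @ h_c                          # row u mixes h_c[u'] weighted by P[u, u']

def node_pair_partner_message(h_c, P, adj_q):
    """EinsMatch-style signal (cf. Eq. 1): node u also sees corpus nodes aligned with its neighbors."""
    aligned = P @ h_c                       # what each query node is aligned to in G_c
    return adj_q @ aligned                  # aggregate over Nbr(u): partners of u's neighbors

n_q, n_c, d = 4, 6, 8
h_c = torch.randn(n_c, d)
P = torch.softmax(torch.randn(n_q, n_c), dim=1)
adj_q = (torch.rand(n_q, n_q) > 0.5).float()
print(node_pair_partner_message(h_c, P, adj_q).shape)   # torch.Size([4, 8])
```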

how their approach side-steps oversmoothing? explain lines 333 — 336.

The usually symmetric and parameter-free aggregation of messages from a node's neighbors is known to weaken graph representations computed by GNNs [c]. L334 points out that scaling $\mathbf{h}(u')$ with a relaxed (evolving) permutation entry $P[u,u']$ breaks this symmetry, and conjectures that this reduces the oversmoothing effect through task-driven cross-attention between the two graphs (as opposed to self-attention in GAT [d]).

[c] A Survey on Oversmoothing in Graph Neural Networks; T. K. Rusch, M. M. Bronstein, S Mishra.

[d] Graph attention networks, P Veličković, G Cucurull, A Casanova, A Romero, P Liò, Y Bengio.

How does the computational complexity compare with that of IsoNet?

Let's consider three node models — IsoNet (Node), Eager EinsMatch (Node), and Lazy EinsMatch (Node) with $T$ rounds, each with $K$ propagation steps.

IsoNet (Node)

Initial node featurization takes $O(N)$ time. Node representation computation aggregates node embeddings over all neighbors, leading to a complexity of $O(E)$ for each message passing step, and the computation of $P$ takes $O(N^2)$ time. The overall complexity across all steps becomes $O(N^2 + KE)$, where $K$ is the number of message passing steps.

Eager Multi-layer EinsMatch (Node)

  1. Initial node featurization takes $O(N)$ time.
  2. EinsMatch first computes intermediate embeddings $z$ (Eq. 33, Page 16), which admits a complexity of $O(N)$ per node per layer, since it computes $\sum_{u'\in V_c} \mathbf{h}^{(c)}_{k}(u')\, P_{k}[u,u']$ for each node in each layer. The total complexity over $K$ steps is $O(KN^2)$.
  3. Next, we compute $\mathbf{h}_{k+1}$ (Eq. 34), which gathers the messages $z$ from all neighbors, per node per layer. The total complexity contributed by this equation is $O(KE)$.
  4. Next, we update $P_k$ for each layer, which has a complexity of $O(KN^2)$.

Hence, the total complexity is $O(KN^2 + KE + KN^2) = O(KN^2)$.

Lazy Multi-round EinsMatch (Node)

Here, the key difference from the eager version is that the doubly stochastic matrix $P_t$ from round $t$ is used to compute $z$, and the $K$-step GNN runs over $T$ rounds. This increases the complexity of steps 2 and 3 to $O(KTN^2 + KTE)$. The matrix $P_t$ is updated a total of $T$ times, which changes the complexity of step 4 to $O(TN^2)$. Hence, the total complexity is $O(KTN^2 + TN^2 + KTE) = O(KTN^2)$.

Hence, the complexity of IsoNet is $O(N^2 + KE)$. EinsMatch-Eager has complexity $O(KN^2)$ and EinsMatch-Lazy has complexity $O(KTN^2)$. However, this increased complexity comes with the benefit of a significant accuracy boost, as our experiments suggest.
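
For intuition on where the $O(KN^2)$ term comes from, here is a minimal NumPy sketch of the dense cross-graph aggregation (our own illustration; shapes and names are hypothetical, not from the released code):

```python
import numpy as np

N, d, K = 32, 16, 5                    # nodes per graph, embedding width, layers per round
H_c = np.random.randn(N, d)            # corpus-graph node embeddings
H_q = np.random.randn(N, d)            # query-graph node embeddings
P = np.full((N, N), 1.0 / N)           # current soft (doubly stochastic) alignment

for k in range(K):                     # K propagation steps with P held fixed (one lazy round)
    Z = P @ H_c                        # cross-graph signal for every query node: O(N^2 * d)
    H_q = H_q + Z                      # stand-in for the real update, which also adds intra-graph messages, O(E * d)
```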

Comment

I acknowledge reading the authors' rebuttal, which I indeed found very helpful in better appreciating their contributions.

I strongly believe that the additional figures they proposed, the point-by-point discussion of contributions wrt previous works, as well as the complexity analyses should be included in the next revision of the manuscript.

I also appreciate the clarification on oversmoothing. I believe that, should the authors have space, they could also make these passages less cryptic in the paper. After all, they are mostly (interesting) conjectures.

Author Response (Global)

We are thankful to all the reviewers for taking the time to provide us with detailed reviews. Through this rebuttal, we aim to address questions that came up across multiple reviews. Figures have also been shared through the rebuttal PDF attached herewith.

Why is Lazy update better? (Reviewers 1Kfh, 65Ze)

In L125-142 and L164-L167, we provide justification for lazy updates. The underlying optimal alignment between the graphs remains the same across each of the $K$ steps (layers) of GNN message passing. Hence, while the GNN performs its operation, the alignment proposal (matrix $P$) should be held fixed. In the eager update, this matrix keeps changing across the message passing steps, whereas in the lazy update, $P$ is updated (and improved) only after $K$ layers of GNN, which enhances the inductive bias.

Elaborating, as we discussed in L125-142, we start with a combinatorial formulation and its solution. Define

$$\text{cost}(P; A_q, A_c) = \sum_{u\in [n],\, v\in[n]} \big[\big(A_q - P A_c P^{\top}\big)_+\big][u,v].$$

This cost is minimized through a Gromov-Wasserstein (GW) approach, where $P$ is updated as

$$P_t \leftarrow \operatorname{argmin}_{P}\ \mathrm{Trace}\Big(P^{\top}\,\nabla_{P}\,\text{cost}(P; A_q, A_c)\big|_{P = P_{t-1}}\Big) + \tau \sum_{u,v} P[u,v]\,\log P[u,v].$$

One can view each GW update step as a lazy update step. In the above, $P_{t-1}$ is used to compute the cost; in our method, we use it to compute the GNN embeddings, and effectively those embeddings approximate the $P_{t-1}$-dependent cost. Then $P_t$ is updated by solving an entropy-regularized OT problem: we update $P_t$ using Sinkhorn iterations, which effectively optimize an entropy-regularized, GNN-guided cost.
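
To connect this to the overall procedure, here is a high-level sketch of the lazy (multi-round) protocol described above; `gnn_layer`, `alignment_scores`, and `sinkhorn` are placeholder callables of our own, not names from the released code:

```python
import torch

def lazy_refinement(x_q, x_c, adj_q, adj_c, gnn_layer, alignment_scores, sinkhorn, T=3, K=5):
    """T rounds; within a round the alignment P stays fixed while K GNN layers run from scratch."""
    n_q, n_c = x_q.shape[0], x_c.shape[0]
    P = torch.full((n_q, n_c), 1.0 / n_c)                     # uninformative initial soft alignment
    for t in range(T):
        h_q, h_c = x_q, x_c                                   # restart the K-layer GNN each round
        for k in range(K):
            h_q, h_c = gnn_layer(h_q, h_c, adj_q, adj_c, P)   # messages within and across graphs; P fixed
        P = sinkhorn(alignment_scores(h_q, h_c))              # single (lazy) alignment refinement per round
    return P, h_q, h_c
```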

Final Decision

The paper proposes a GNN-based neural architecture for graph retrieval, i.e., finding, from a corpus of graphs, the best candidates that contain a subgraph isomorphic to a given query graph. The architecture proposes several innovations: message-passing iterations are guided by an injective alignment; a lazy update technique is used in each round, computing with a fixed alignment before refining it; and pairs of vertices, rather than single vertices, are considered in the message-passing steps. Moreover, the authors propose an extension to edge-level matching. The paper is complemented by a rather extensive empirical study, suggesting that the proposed architecture outperforms competing approaches.

After an extensive rebuttal by the authors, all reviewers vote to accept the present paper. I follow this assessment.