PaperHub
Rating: 7.0/10 (Poster; 3 reviewers)
Individual ratings: 7, 6, 8 (min 6, max 8, std 0.8)
Confidence: 1.7
Soundness: 3.0 · Contribution: 3.0 · Presentation: 3.0
NeurIPS 2024

Improving Temporal Link Prediction via Temporal Walk Matrix Projection

OpenReview · PDF
Submitted: 2024-04-30 · Updated: 2024-11-06
TL;DR

This paper unifies existing pairwise information injection methods for temporal link prediction into a function of temporal walk matrices and introduces an efficient method for maintaining temporal walk matrices

摘要

Temporal link prediction, aiming at predicting future interactions among entities based on historical interactions, is crucial for a series of real-world applications. Although previous methods have demonstrated the importance of relative encodings for effective temporal link prediction, computational efficiency remains a major concern in constructing these encodings. Moreover, existing relative encodings are usually constructed based on structural connectivity, where temporal information is seldom considered. To address the aforementioned issues, we first analyze existing relative encodings and unify them as a function of temporal walk matrices. This unification establishes a connection between relative encodings and temporal walk matrices, providing a more principled way for analyzing and designing relative encodings. Based on this analysis, we propose a new temporal graph neural network called TPNet, which introduces a temporal walk matrix that incorporates the time decay effect to simultaneously consider both temporal and structural information. Moreover, TPNet designs a random feature propagation mechanism with theoretical guarantees to implicitly maintain the temporal walk matrices, which improves the computation and storage efficiency. Experimental results on 13 benchmark datasets verify the effectiveness and efficiency of TPNet, where TPNet outperforms other baselines on most datasets and achieves a maximum speedup of $33.3 \times$ compared to the SOTA baseline.
Keywords

Temporal Link Prediction · Dynamic Graph Learning · Graph Neural Network

Reviews and Discussion

Official Review (Rating: 7)

The paper introduces a framework for analyzing relative encodings as a function of random walk matrices, and a new model for temporal link prediction. The new model offers SOTA performance on multiple link prediction datasets, and achieves this performance more efficiently than the current best models. The design of the model is explained in detail.

Strengths

This is an overall good paper which proposes and evaluates a new SOTA approach for temporal link prediction. It also introduces an original framework for unifying a range of existing methods.

  1. The provided code is sufficiently commented, and detailed instructions for setup are provided, along with utility functions to start the training process.
  2. Extensive evaluation has been done of the proposed model, and SOTA performance across a range of datasets has been demonstrated.
  3. Current methods have been posed in the newly proposed framework, which is an interesting analytical contribution.

Clearly presented and significant contribution to the field of temporal link prediction.

Weaknesses

  1. The limitations of the approach are only briefly discussed in the appendix.
  2. Although the code is well presented, it did not run out of the box for me. Some effort was required to install additional dependencies not listed in the requirements.txt, and to fix a runtime error in the sampling algorithm of the DataLoader.

Minor points / text linting suggestions:

Line 81: What is a "unified formation", did you mean "formulation"?

Line 94: Style "all the interactions happen" -> "all the interactions that happen"

Line 128: Doesn't A^(i) aggregate the i-step temporal walks, instead of k-step? Typo?

Line 187: "The value of score function.." -> "The value of the score function".

Line 202: What is d? The node degree or dimensionality? Please define. Also in line 2, Algorithm 1.

Line 203: Typo enumerating the H vectors. The index 1 is repeated, whereas it should be followed by 2.

Line 205: Style "Pseudocode code" -> "Pseudocode"

Line 211: Having the integer "i" in the exponent of e could be misleading for a reader who is skimming to get an overview of the paper. This letter is usually reserved as the imaginary unit when exponentiating e, so ideally pick another letter as the index.

Line 261: "and obtained" -> "and obtain"

Questions

  1. Line 265: Why does ReLU reduce estimation error? Is this an empirical result, or an architectural consideration specific to your data? Please clarify.

  2. Line 43: The second argument sounds a bit vague. Could specific examples be mentioned for methods that use and don't use temporal information in constructing their embeddings?

  3. Minor point on code: distutils has been removed in Python 3.12 and above. To run the code I've had to install the following dependencies in addition to the ones listed: setuptools, numba, sklearn. I've also had to fix the following issue.

set(random.sample(test_node_set, int(0.1 * num_total_unique_node_ids)))

Had to be changed to:

set(random.sample(sorted(test_node_set), int(0.1 * num_total_unique_node_ids)))

This happened installing your code in a new Python 3.12 environment.
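For context, the behavior behind the reviewer's fix can be reproduced in a few lines. This is a minimal sketch with hypothetical node IDs, not the project's actual data:

```python
import random

random.seed(0)
test_node_set = {"u3", "u1", "u2", "u4"}  # hypothetical node-ID set

# Since Python 3.11, random.sample() rejects a set with a TypeError;
# sorting first yields a sequence with a deterministic order, so the
# draw is also reproducible under a fixed seed.
sampled = set(random.sample(sorted(test_node_set), 2))

assert sampled <= test_node_set and len(sampled) == 2
```

Using `sorted(...)` rather than `list(...)` also avoids depending on Python's unordered set iteration, which can differ between runs.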

Limitations

The limitations are briefly discussed in the appendix.

Author Response

Thank you for your valuable comments. In response, we have clarified how ReLU can reduce estimation error and discussed the utilization of temporal information in existing methods. We have also revised the paper and code according to your suggestions. We hope this addresses your concerns and are happy to answer any further questions you may have.

Q1 Line 265: Why does ReLU reduce estimation error? Is this an empirical result, or an architectural consideration specific to your data? Please clarify.

It is an architectural consideration specific to our construction of temporal walk matrices. Recall that our score function $s(\cdot)$ for a temporal walk $W=[(w_0,t_0),(w_1,t_1),\cdots,(w_k,t_k)]$ is $s(W)=\prod_{i=1}^{k} e^{-\lambda(t-t_i)}$, which is always larger than zero. Therefore, each element of the corresponding temporal walk matrices $A_{u,v}^{(k)}(t) = \sum_{W \in \mathcal{M}_{u,v}^{k}} s(W)$ is non-negative. From Theorem 2, we know that the inner product of the node representations is approximately equal to the inner product of rows from the temporal walk matrices, i.e., $\langle \bar{\boldsymbol H}_u^{(i)}(t), \bar{\boldsymbol H}_v^{(j)}(t) \rangle \approx \langle \boldsymbol A_u^{(i)}(t), \boldsymbol A_v^{(j)}(t) \rangle$ (appears in line 225). In line 265, each element of the raw pairwise feature $\tilde{\boldsymbol f}_{u,v}$ is the inner product of the node representations of $u$ and $v$, which can be viewed as an estimate of the inner product between the $u$-th and $v$-th rows of the temporal walk matrices. Since each element of the temporal walk matrices is non-negative, the inner product of two rows is also non-negative. Thus we feed the raw pairwise feature into ReLU to reduce the estimation error. Specifically, consider the $l$-th element $\tilde{f}_{u,v}[l]$ of the raw pairwise feature and assume it is the inner product of $\bar{\boldsymbol H}_u^{(i)}(t)$ and $\bar{\boldsymbol H}_v^{(j)}(t)$, i.e., an estimate of $\langle \boldsymbol A_u^{(i)}(t), \boldsymbol A_v^{(j)}(t) \rangle$.
If $\tilde{f}_{u,v}[l] < 0$, then, using $|\cdot|$ to denote the absolute value, we have $\left| \langle \boldsymbol A_u^{(i)}(t), \boldsymbol A_v^{(j)}(t) \rangle - \tilde{f}_{u,v}[l] \right| > \left| \langle \boldsymbol A_u^{(i)}(t), \boldsymbol A_v^{(j)}(t) \rangle - 0 \right|$ since $\langle \boldsymbol A_u^{(i)}(t), \boldsymbol A_v^{(j)}(t) \rangle \geq 0$. Therefore, setting negative values to zero by feeding $\tilde{\boldsymbol f}_{u,v}$ through ReLU yields a lower estimation error than using the original values.
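The argument can be checked with a small numerical sketch on synthetic values (hypothetical noise level, not the paper's actual features):

```python
import numpy as np

rng = np.random.default_rng(0)

# True quantities are inner products of rows of a temporal walk matrix,
# which are non-negative by construction (all walk scores are positive).
true_vals = rng.uniform(0.0, 2.0, size=1000)

# Noisy estimates from the random-feature approximation can dip below zero.
estimates = true_vals + rng.normal(0.0, 0.5, size=1000)

# Clamping negative estimates to zero (ReLU) can only move them closer
# to the non-negative ground truth, never further away.
err_raw = np.abs(true_vals - estimates)
err_relu = np.abs(true_vals - np.maximum(estimates, 0.0))

assert np.all(err_relu <= err_raw)
assert err_relu.mean() < err_raw.mean()
```

The first assertion holds elementwise: clamping changes an estimate only when it is negative, and in that case moving it up to zero strictly shrinks its distance to a non-negative target.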

Q2 Line 43: The second argument sounds a bit vague. Could specific examples be mentioned for methods that use and don't use temporal information in constructing their embeddings?

Sure. We list the score function $s(\cdot)$ of the methods analyzed in Section 2.2.2 in the following table, where $W=[(w_0,t_0),\ldots,(w_k,t_k)]$ denotes a temporal walk and $Z_i = \sum_{((w',w),t') \in \mathcal{E}_{w_i, t_i}} \exp(-\alpha(t_i - t'))$ denotes the normalization term of CAWN.

| DyGFormer | PINT | NAT | CAWN |
| --- | --- | --- | --- |
| $s(W)=1$ | $s(W)=1$ | $s(W)=1$ | $s(W)=\prod_{i=0}^{k-1}\frac{\exp(-\alpha(t_i-t_{i+1}))}{Z_i}$ |

As shown in the above table, only CAWN considers the temporal information carried by a temporal walk. The score functions of the other three methods always yield one, so the element of their temporal walk matrices merely counts the number of temporal walks, ignoring the temporal information. For more discussion about existing methods, please refer to the conclusion starting from line 163 of the paper.
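As a small illustration of the contrast between counting-based and time-decayed score functions (hypothetical walks and a hypothetical decay rate, not the paper's learned parameters):

```python
import math

# Counting-based methods (DyGFormer, PINT, NAT) assign every temporal walk
# a score of 1, while a time-decayed score weights recent walks more.
def constant_score(walk):
    return 1.0

def decay_score(walk, t_now, lam=0.1):
    # walk: list of (node, timestamp) pairs; product of e^{-lam*(t_now - t_i)}
    # over i = 1..k, matching TPNet's score function s(W).
    return math.prod(math.exp(-lam * (t_now - t_i)) for _, t_i in walk[1:])

recent = [("u", 9.0), ("v", 9.5)]
stale = [("u", 1.0), ("v", 2.0)]

# Both walks count the same under the constant score...
assert constant_score(recent) == constant_score(stale) == 1.0
# ...but the decayed score favors the more recent walk.
assert decay_score(recent, t_now=10.0) > decay_score(stale, t_now=10.0)
```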

Q3 & W2: Although the code is well presented, it did not run out of the box for me. Some effort was required to install additional dependencies not listed in the requirements.txt, and to fix a runtime error in the sampling algorithm of the DataLoader.

Sorry for the incomplete information regarding the required environment. The Python version used in our experiments is Python 3.9. We will add this information to the requirements.txt file and review our dependencies to ensure that no necessary packages have been omitted. Thanks for your efforts in adapting our code to the latest environment, and we will revise the code according to your suggestions.

W1 The limitations of the approach are only briefly discussed in the appendix.

We will move the limitation section into the main body of our paper and expand the discussion.

W3 Minor points / text linting suggestions

We have revised the paper according to your suggestions for lines 81, 94, 187, 203, 205, 211, and 261.

For line 128: Yes, it is a typo, and we will correct it.

For line 202: $d$ indicates the dimensionality, and we will replace it with $d_R$.

Comment

Thank you for your response and clarifications.

As it stands, I have no more outstanding concerns, and am happy to give this work an Accept.

Comment

Thanks again! We sincerely appreciate your timely reply and support for our work.

Official Review (Rating: 6)

Based on the analysis of traditional methods, this paper proposes a unified framework for relative encoding and introduces a new temporal neural network, TPNet. This model not only addresses the high time complexity issues of traditional methods but also enhances relative encoding by incorporating factors such as time decay effects.

Strengths

1. The authors propose a unified framework for relative encodings, treating them as a function of temporal random walk matrices.

2. The temporal random walk matrix not only takes into account temporal and structural information but also incorporates the effect of time decay.

3. To reduce computational complexity, the authors propose an approximation scheme for the temporal random walk matrix, detailed in Algorithm 1.

4. The authors provide thorough proofs, offering solid theoretical support for the effectiveness of the TPNet network.

Weaknesses

1. If the authors could provide a comparison of memory usage, it would make the model more convincing.

2. This paper does not provide an analysis of the impact of different functions g on relative encoding.

3. The authors did not compare with the latest models (published in AAAI 2024 or WWW 2024) in Table 1.

Questions

Please see the weaknesses.

Limitations

n/a

Author Response

Thanks for your helpful reviews. In response, we have reported the memory usage of different methods, discussed the construction of our decoding function $g(\cdot)$, and compared TPNet with a new temporal link prediction method. We hope this addresses your concerns and are happy to answer any further questions you may have.

W1. If the authors could provide a comparison of memory usage, it would make the model more convincing.

We report the peak GPU memory usage in the following table, measured in MB. The last line of the table reports the relative memory usage, calculated by dividing each method's memory usage by TCL's (i.e., the method with the smallest average memory usage) and then averaging across datasets.

| Dataset | TCL | JODIE | DyRep | TGN | TPNet (ours) | NAT | TGAT | GraphMixer | DyGFormer | CAWN | PINT |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Wikipedia | 171 | 247 | 253 | 271 | 299 | 430 | 570 | 714 | 279 | 633 | 2278 |
| Reddit | 510 | 838 | 839 | 898 | 663 | 1265 | 909 | 1049 | 645 | 972 | 3381 |
| MOOC | 336 | 446 | 459 | 464 | 470 | 835 | 735 | 877 | 673 | 1279 | 1405 |
| LastFM | 911 | 1022 | 1038 | 1039 | 1044 | 1960 | 1309 | 1450 | 1643 | 2818 | 1113 |
| Enron | 145 | 113 | 136 | 138 | 254 | 361 | 542 | 684 | 483 | 606 | 862 |
| Social Evo. | 1439 | 1406 | 1430 | 1431 | 1504 | 2938 | 1838 | 1980 | 1547 | 2382 | 2154 |
| UCI | 102 | 90 | 111 | 111 | 205 | 224 | 502 | 643 | 210 | 1045 | 890 |
| Flights | 1335 | 2190 | 2192 | 2259 | 1503 | 2989 | 1733 | 1875 | 1673 | 2277 | 5532 |
| Can. Parl. | 110 | 63 | 85 | 86 | 211 | 239 | 509 | 651 | 3150 | 2017 | 818 |
| US Legis. | 101 | 62 | 85 | 86 | 201 | 297 | 500 | 642 | 440 | 563 | 810 |
| UN Trade | 394 | 376 | 399 | 400 | 514 | 811 | 793 | 935 | 733 | 1337 | 1124 |
| UN Vote | 741 | 735 | 759 | 760 | 870 | 1592 | 1140 | 1282 | 926 | 1684 | 774 |
| Contact | 1653 | 1791 | 1813 | 1814 | 1794 | 3428 | 2052 | 2195 | 1761 | 2596 | 2544 |
| Rel. Memory | 1.00 | 1.08 | 1.16 | 1.18 | 1.46 | 2.31 | 2.64 | 3.23 | 3.97 | 4.73 | 5.12 |

As shown in the above table, TPNet has lower relative memory usage than the other link-wise methods (i.e., DyGFormer, NAT, CAWN, and PINT), showing its efficient memory usage. Additionally, on average, TPNet's memory usage is 1.46 times that of the most memory-efficient method (i.e., TCL). Considering TPNet's SOTA performance and superior computational efficiency, this level of memory usage is satisfactory.

W2. This paper does not provide an analysis of the impact of different functions g on relative encoding.

The function $g(\cdot)$ introduced in Equation (3) can be considered a decoding function that extracts pairwise information from the constructed temporal walk matrices. Unlike existing methods that use a predefined $g(\cdot)$, we design it as a learnable function. Specifically, we feed the raw pairwise feature into an MLP (line 265 of the paper), which serves as our decoding function $g(\cdot)$. By optimizing the learnable parameters of the MLP, we can adaptively determine the most suitable function $g(\cdot)$ for different graphs, better modeling graph dynamics.
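A minimal sketch of this design, with hypothetical dimensions and untrained random weights standing in for learned parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hypothetical dimension of the raw pairwise feature

# A learnable decoding function g: the raw pairwise feature (inner products
# of node representations) passes through ReLU, then a small MLP whose
# weights would be trained jointly with the rest of the model.
W1, b1 = rng.normal(size=(d, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 1)), np.zeros(1)

def g(raw_pairwise_feature):
    h = np.maximum(raw_pairwise_feature, 0.0)  # ReLU: clamp negative estimates
    h = np.maximum(h @ W1 + b1, 0.0)           # hidden layer
    return h @ W2 + b2                         # scalar pairwise score

f_uv = rng.normal(size=d)  # stand-in for the raw pairwise feature
score = g(f_uv)
assert score.shape == (1,)
```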

W3. The author did not compare with the latest models (published in AAAI 2024 or WWW 2024) in Table 1

After reviewing the papers published in AAAI 2024 and WWW 2024, we did not find any that were closely related to our work. However, we found an ICLR 2024 paper that proposes a new temporal link prediction model called FreeDyG [1], which adopts a Fourier Transformer to enhance the node representation learning process. We report AP under random negative sampling for performance comparison.

| Setting | Method | Wikipedia | Reddit | MOOC | LastFM | Enron | Social Evo. | UCI |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Transductive | FreeDyG | 99.26 ± 0.01 | 99.48 ± 0.01 | 89.61 ± 0.19 | 92.15 ± 0.16 | 92.51 ± 0.05 | 94.91 ± 0.01 | 96.28 ± 0.11 |
| Transductive | TPNet | 99.32 ± 0.03 | 99.27 ± 0.00 | 96.39 ± 0.09 | 94.50 ± 0.08 | 92.90 ± 0.17 | 94.73 ± 0.02 | 97.35 ± 0.04 |
| Inductive | FreeDyG | 98.97 ± 0.01 | 98.91 ± 0.01 | 87.75 ± 0.62 | 94.89 ± 0.01 | 89.69 ± 0.17 | 94.76 ± 0.05 | 94.85 ± 0.10 |
| Inductive | TPNet | 98.91 ± 0.01 | 98.86 ± 0.01 | 95.07 ± 0.26 | 95.36 ± 0.11 | 90.34 ± 0.28 | 93.24 ± 0.07 | 95.74 ± 0.05 |

As shown in the above table, TPNet significantly outperforms FreeDyG on MOOC, LastFM, and UCI, while exhibiting similar performance on other datasets, demonstrating its superiority. We also report the inference time (s/epoch) to compare efficiency.

| Method | Wikipedia | Reddit | MOOC | LastFM | Enron | Social Evo. | UCI |
| --- | --- | --- | --- | --- | --- | --- | --- |
| FreeDyG | 12.37 | 61.11 | 32.19 | 102.73 | 9.75 | 163.94 | 4.75 |
| TPNet | 2.51 | 11.10 | 6.38 | 22.71 | 1.95 | 33.34 | 0.93 |
| Speedup | 4.93× | 5.51× | 5.05× | 4.52× | 5.00× | 4.92× | 5.11× |

As shown in the above table, TPNet is more computationally efficient than FreeDyG, achieving at least a 4.52× speedup.

[1] Tian, Yuxing, Yiyan Qi, and Fan Guo. "FreeDyG: Frequency Enhanced Continuous-Time Dynamic Graph Model for Link Prediction." The Twelfth International Conference on Learning Representations. 2024.

Comment

Thanks to the authors for the rebuttal; I have no more concerns.

Comment

Thank you for your reply. We are pleased to hear that you have no further concerns.

Official Review (Rating: 8)

The article investigates the application of relative encodings to the task of link prediction over temporal networks. The authors first formally unify previously used relative encodings, such as those of DyGFormer, PINT, NAT, and CAWN, within a single framework. They then propose a novel model, TPNet, and conduct comparative evaluations across several datasets and competitors.

Strengths

The provided framework offers a clearer and broader perspective on existing approaches based on temporal walks. The source code is well-commented and likely to be highly usable. The experimental setup investigates various competitors, datasets, and negative sampling techniques.

Weaknesses

A fundamental concept in the article is "relative encoding." However, the intuition behind this concept is not introduced until line 119. I recommend presenting the meaning of "relative encoding" earlier in the article for better clarity and understanding.

Questions

If I understand correctly, the authors introduce a weight decay mechanism to exponentially decrease the weight of older interactions. While this decision is well-motivated by previous studies, it is well known that the recurrence of interactions is fundamentally important in social networks (e.g., sociopatterns.org). I am curious whether the temporal periodicity of interactions is maintained with this approach. Could the authors discuss this aspect in more detail?

Limitations

The authors discussed the limitations of the model in the appendix.

Author Response

Thanks for your insightful comments. In response, we have discussed the temporal periodicity modeling ability of our score function and moved the motivation of the relative encoding earlier according to your suggestion. We hope this addresses your concerns and are happy to answer any further questions you may have.

Q1. If I understand correctly, the authors introduce a weight decay mechanism to exponentially decrease the weight of older interactions. While this decision is well-motivated by previous studies, it is well known that the recurrence of interactions is fundamentally important in social networks (e.g., sociopatterns.org). I am curious whether the temporal periodicity of interactions is maintained with this approach. Could the authors discuss this aspect in more detail?

Although the time decay score function (defined in line 187 of the paper) is not specifically designed for modeling temporal periodicity, it can still capture periodic patterns to some extent. To illustrate this, consider a simple example where two nodes, $u$ and $v$, interact once every $\Delta t$ time interval. Then, we can divide the time into equal-size durations $(0,\Delta t),(\Delta t,2\Delta t),\ldots$, and examine how the corresponding element of the temporal walk matrices changes within each duration. Specifically, assuming the current time $t \in (n\Delta t, (n+1)\Delta t)$ with $n>0$, the timestamps of the historical interactions between $u$ and $v$ will be $[\Delta t, 2\Delta t, \ldots, n\Delta t]$, and the element of the one-step temporal walk matrix will be $A_{u,v}^{(1)}(t) = \sum_{k=1}^{n} \exp(-\lambda(t - k\Delta t))$ (i.e., the sum of the scores of the historical interactions). Since $\lambda > 0$, $A_{u,v}^{(1)}(t)$ decreases as $t$ increases within $(n\Delta t, (n+1)\Delta t)$, and its value lies within $\left(\sum_{k=1}^{n} \exp(-\lambda k\Delta t), \sum_{k=0}^{n-1} \exp(-\lambda k\Delta t)\right)$ (obtained by setting $t$ to $(n+1)\Delta t$ and $n\Delta t$, respectively). The maximum value $\sum_{k=0}^{n-1} \exp(-\lambda k\Delta t)$ is at least 1, since $\exp(-\lambda \cdot 0 \cdot \Delta t) = 1$ and all other terms are non-negative. For the minimum value $\sum_{k=1}^{n} \exp(-\lambda k\Delta t)$, if $\lambda$ is large enough, it will be less than 1. For example, if $\exp(-\lambda\Delta t) < \frac{1}{4}$, we have $\sum_{k=1}^{n} \exp(-\lambda k\Delta t) < \sum_{k=1}^{n} \frac{1}{4^k} < \frac{1}{2}$.
Based on the above analysis, $A_{u,v}^{(1)}(t)$ exhibits the following behavior over time: it jumps to a value no less than 1 at the beginning of each duration and decays to a value below 1 by the end of the duration, acting like a periodic function. We can then infer that $u$ and $v$ are likely to interact when $A_{u,v}^{(1)}(t)$ is low and unlikely to interact when it is high, capturing this periodic evolution pattern. We show a visual illustration in Figure 1 of the attached PDF file for better understanding.
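The sawtooth behavior described above can be checked numerically. This is a sketch with hypothetical parameters chosen so that $\exp(-\lambda\Delta t) < 1/4$:

```python
import math

lam, dt, n = 2.0, 1.0, 5  # hypothetical λ and Δt; exp(-λΔt) ≈ 0.135 < 1/4

def A_uv(t):
    # One-step matrix entry: sum of decayed scores over the historical
    # interactions at k·Δt for k = 1..floor(t/Δt).
    return sum(math.exp(-lam * (t - k * dt)) for k in range(1, int(t // dt) + 1))

eps = 1e-9
start = A_uv(n * dt + eps)      # just after the n-th interaction
end = A_uv((n + 1) * dt - eps)  # just before the next one

assert start >= 1.0                     # the fresh interaction alone contributes ~1
assert end < 0.5                        # decayed below 1/2 since exp(-λΔt) < 1/4
assert A_uv((n + 1) * dt + eps) > end   # jumps back up at the next interaction
```

Plotting `A_uv` over `t` would show exactly the periodic jump-and-decay pattern the response describes.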

W1. A fundamental concept in the article is "relative encoding." However, the intuition behind this concept is not introduced until line 119. I recommend presenting the meaning of "relative encoding" earlier in the article for better clarity and understanding.

We have added the motivation for relative encoding to the second section of the Introduction and improved the explanation of Figure 1 for better clarity and understanding. The revised part of the second section is as follows (changes highlighted in bold):

Relative encodings have become an indispensable module for effective temporal link prediction [6-9]: without them, node representations computed independently by neighbor aggregation fail to capture pairwise information. As shown in the toy example in Figure 1, A and F will have the same node representation because they share the same local structure. Thus, it cannot be determined from their representations whether D will interact with A or F at $t_3$. However, by assigning nodes relative encodings (i.e., additional node features, detailed in Section 2.2) specific to the target link before computing the node representations, we can highlight the importance of each node and guide the representation learning process to extract pairwise information specific to the predicted link from the subgraph. For example, in Figure 1, we can infer from the relative encoding of E (in the red circle) that D is more likely to interact with F than with A, since D and F share a common neighbor, E.
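The toy example can be sketched in code. The graph below is a hypothetical stand-in for Figure 1, and the common-neighbor count is one simple instance of link-specific pairwise information:

```python
import numpy as np

# Hypothetical toy graph mirroring the Figure 1 setup: A and F have
# identical local structure, but F shares neighbor E with the query
# node D while A does not.
feat = {n: np.ones(2) for n in "ABCDEF"}  # identical raw node features
neighbors = {"A": ["B", "C"], "F": ["E", "C"], "D": ["E"]}

def represent(node):
    # Plain neighbor aggregation, independent of the predicted link.
    return np.mean([feat[m] for m in neighbors[node]], axis=0)

# Without relative encodings, A and F are indistinguishable to the decoder.
assert np.allclose(represent("A"), represent("F"))

# A relative encoding w.r.t. the query node D (here: common-neighbor
# count) breaks the tie in favor of F.
def common_neighbors(u, v):
    return len(set(neighbors[u]) & set(neighbors[v]))

assert common_neighbors("F", "D") > common_neighbors("A", "D")
```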

Comment

Thank you for the detailed response. I’ve read it and you’ve clarified my doubt, so I’ve increased my score.

Comment

Thank you for your timely reply! We sincerely appreciate your support for our work.

Author Response

We sincerely thank all reviewers for their time and valuable comments. We are pleased that the reviewers acknowledge the value of our work in providing a cohesive perspective on existing methods (Reviewers vApy, SMMz, YNEN), proposing a novel method (Reviewer vApy) with solid theoretical support (Reviewer SMMz), and conducting extensive experiments to verify the effectiveness and efficiency of the proposed method (Reviewers vApy, YNEN).

To the best of our efforts, we have provided thorough responses to address the issues raised by each reviewer, which mainly consist of:

  • Clarification on method design and analysis of existing methods
    • Discussion on the temporal periodicity modeling capability of the score function.
    • Discussion on the construction of the function $g(\cdot)$.
    • Clarification of why ReLU can reduce estimation error.
    • Clarification on the use of temporal information in constructing temporal walk matrices.
  • Additional experimental results
    • Reporting memory usage of different methods.
    • Comparison with a new temporal link prediction method.
  • Reorganization of the paper.
    • Move the motivation of the relative encoding earlier.
    • Move the limitation section to the main body of the paper.
Final Decision

The study explores the use of relative encodings for temporal link prediction. It first provides a unified view of relative encodings as functions of temporal walk matrices. It then incorporates a time decay effect into these matrices, yielding a model named TPNet. Finally, the authors provide both theoretical and empirical analyses to demonstrate the effectiveness of the proposed approach.

All the reviewers are quite positive about this submission. Concerns raised by reviewers were addressed properly.

To sum up, it is a solid paper. I’d like to recommend its acceptance.