Rethinking Time Encoding via Learnable Transformation Functions
Generalizing time encoding in diverse domains through learnable transformation functions
Summary
Reviews and Discussion
The paper proposes a time encoding method that can work as a plug-and-play module to capture diverse patterns in the real world. The method is motivated by the observation that existing time encoding approaches struggle to capture non-periodic and mixed patterns. To capture such complex patterns, the paper proposes transforming timestamps into representations via a combination of Fourier series and spline functions. Owing to their inherent inductive biases, Fourier series are more capable of capturing periodic patterns, whereas spline functions excel at modeling non-periodic patterns. The paper assesses the effectiveness of the proposed method by replacing the time encoding modules in various time series tasks with the proposed one. The experimental results confirm the superiority of the proposed model.
update after rebuttal
The authors have partially addressed my concerns, and I am now leaning toward acceptance. I suggest incorporating the newly conducted experiments and the comparison with prior work (TIDER) in the revised version to better highlight the novelty. Additionally, the current experiments focus only on methods that model temporal patterns; the assessment of LeTE's efficacy in time series forecasting would be more comprehensive if it also included comparisons with approaches designed to capture spatial correlations, such as iTransformer [2] and Sumba [3].
[2] iTransformer: Inverted transformers are effective for time series forecasting, ICLR 2024.
[3] Structured matrix basis for multivariate time series forecasting with interpretable dynamics, NeurIPS 2024.
Questions For Authors
Please refer to the questions listed above regarding hyperparameter sensitivity, reproducibility, and novelty clarification.
Claims And Evidence
Most of the claims are supported by experiments. In the introduction, the paper argues that existing time encoding methods cannot capture the complex patterns caused by holidays, but no evidence is provided to verify whether the proposed method can capture such patterns. It would be better to provide experiments supporting this claim.
Methods And Evaluation Criteria
The proposed methods and evaluation criteria seem reasonable to me.
Theoretical Claims
The proofs seem correct.
Experimental Designs Or Analyses
In the method section, the authors suggest setting the hyperparameter to 0.5, but in the experiments they adjust it across different datasets and methods. The reviewer is curious whether the efficacy of the proposed method is sensitive to this hyperparameter, because its applicability would be significantly limited if the performance hinges on careful tuning in practice. The paper should present the results in Table 1 with the hyperparameter set to its default value of 0.5.
Supplementary Material
I checked the appendix and the code repository. The code files in the repository are invalid; clicking them shows the message "The requested file is not found."
Relation To Broader Scientific Literature
The proposed method can work as a plug-and-play module for various time series modeling approaches.
Essential References Not Discussed
The idea of adopting Fourier series to learn representations for time series has been explored in [1]. The paper should discuss its connection and distinction from the prior work to clarify its novelty.
[1] Multivariate Time-series Imputation with Disentangled Temporal Representations, ICLR 2023.
Other Strengths And Weaknesses
Strengths:
- The proposed method is invariant to time rescaling.
- The paper is well-written and easy to follow.
Weaknesses:
- It is not clear whether the performance gain is sensitive to the hyperparameter.
- The provided code repository is invalid, and this raises a concern regarding reproducibility.
- The novelty of the proposed method should be further clarified by comparing it with prior work [1].
[1] Multivariate Time-series Imputation with Disentangled Temporal Representations, ICLR 2023.
Other Comments Or Suggestions
- The paper should present the results in Table 1 with the hyperparameter set to its default value of 0.5.
- It would be better to provide experiments supporting the claim of capturing the complex patterns caused by holidays.
- Please update the code repository to eliminate the concern regarding reproducibility.
Sensitivity to the hyperparameter
Indeed, the performance can be influenced by this hyperparameter. However, as we analyze in the experiments on dynamic graphs (please refer to Appendix G.2, "Comparative Analysis of Different Variants of LeTE," and Table 9 for details), the performance remains robust across different values. Specifically, as shown in Table 9, regardless of whether the hyperparameter is set to 0, 0.5, or 1, our method almost consistently outperforms the baselines, although the downstream results are somewhat affected by its value.
For the time series forecasting experiments, we also choose the hyperparameter only from the set {0, 0.5, 1}, so there is no concern regarding careful tuning in practice. We have also uploaded a result table where it is set to 0.5 (Tab. 1 in the new anonymous repository, prepared especially for the rebuttal). As seen in the newly uploaded tables, our method still achieves high win rates for both MSE and MAE. Intuitively, slight tuning within the set {0, 0.5, 1} can lead to even better results.
Code repository
We have updated the original anonymous repository, and we hope it works now. We have also added the running logs and the requirements file to the repository. If you still face the "The requested file is not found" problem, you may try downloading the repository and checking it locally. Alternatively, you may download our code from the OpenReview Supplementary Material; we uploaded a copy of the same code when we submitted our manuscript.
Comparison with [1]
Thank you for pointing this out. We acknowledge that [1] (TIDER) also incorporates Fourier series to model temporal patterns in multivariate time-series data. However, our work differs from TIDER in terms of motivation, model design, and application scope.
Motivation: TIDER employs Fourier series to model the seasonal component of time-series data as part of a decomposed temporal structure (trend + seasonality + local bias) within a matrix factorization framework, specifically for the task of missing value imputation.
In contrast, LeTE is a task-agnostic, general-purpose time encoding module, designed to replace previous temporal encodings with fully learnable transformations. Our goal is to provide a unified and flexible time encoding mechanism that can be applied across different temporal modeling scenarios.
Model Design: TIDER is specialized for a specific task and evaluated exclusively on imputation. Its architecture is tightly coupled with that objective. In TIDER, Fourier series are used internally to model a latent factor, and only for capturing periodicity, within a low-rank matrix factorization design.
In LeTE, Fourier-based functions are used to encode timestamps into embeddings, which are then used across multiple downstream tasks. The encoding serves as a plug-and-play module and is jointly trained with downstream objectives.
Application scope: Our experiments cover time-series, dynamic graph, event-based tasks, and real-world applications, demonstrating that LeTE is not only expressive but also highly adaptable across domains.
Moreover, LeTE introduces a unified framework for time encoding via deep function learning, encompassing not only Fourier-based functions but also spline-based functions, and even hybrid combinations (Combined LeTE).
In summary, while both works utilize Fourier series, TIDER focuses on a specific modeling component for a single task, whereas LeTE introduces a general-purpose, extensible, and theoretically grounded time encoding framework that supports a wide range of tasks in temporal modeling.
Holidays
From the perspective of a long time window, holidays can be regarded as a type of periodic pattern. As we demonstrate that our method can effectively capture periodic patterns, the patterns caused by holidays can also be captured. Please also refer to Appendix G.4, "Capturing Periodic, Non-Periodic, and Mixed Patterns in Data," and Figs. 11 and 12 for more details.
As a brief recap, we demonstrate that our method enables models to capture periodic, non-periodic, and mixed time patterns. Specifically, the holiday-related periodicity can be simulated by the low-frequency signal shown in Fig. 11 (synthetic periodic data).
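The kind of synthetic mixture described above can be sketched in a few lines. The periods and amplitudes below are illustrative assumptions, not the actual values used for Fig. 11:

```python
import math

def mixed_signal(t, holiday_period=365.0, daily_period=1.0):
    # A low-frequency "holiday-like" yearly component plus a higher-frequency
    # daily component, mimicking the periodic/mixed patterns discussed above.
    low = math.sin(2 * math.pi * t / holiday_period)       # yearly cycle
    high = 0.3 * math.sin(2 * math.pi * t / daily_period)  # daily cycle
    return low + high
```

Since both components are periodic with their own periods, the combined signal repeats every 365 days (the least common multiple here), which is the sense in which holiday effects appear as low-frequency periodicity over a long window.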
We kindly invite you to review our responses to Reviewer 1gph and Reviewer ozTs, where we illustrate the interpretability of our proposed method and explain how it effectively captures different temporal patterns.
This paper proposes a learnable time representation framework—referred to as LeTE—that aims to improve upon prior time encoding methods which rely on fixed or narrow inductive biases (e.g., purely sinusoidal functions). The authors introduce two learnable approaches for modeling time: one based on Fourier series expansions and another on B-splines. Additionally, they propose a combined version that leverages both. By making the transformation functions learnable, the approach can, in principle, encompass various existing time encodings (like Time2Vec) as special cases. The authors further claim invariance to time rescaling and better interpretability over prior methods.
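For concreteness, one dimension of such a Fourier-based learnable encoding might look like the following sketch. The function name, parameter layout, and phase handling are illustrative assumptions, not the paper's exact parameterization:

```python
import math

def fourier_time_encoding(t, sin_coefs, cos_coefs, freqs, phase=0.0):
    # One output dimension as a truncated Fourier series with learnable
    # sine/cosine coefficients and frequencies (all names illustrative).
    return sum(a * math.sin(w * t + phase) + b * math.cos(w * t + phase)
               for a, b, w in zip(sin_coefs, cos_coefs, freqs))
```

Setting a single unit sine coefficient and zeroing the cosine terms recovers a fixed sinusoidal encoding of the Time2Vec flavor, which is the sense in which the learnable transformations subsume earlier encodings as special cases.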
Questions For Authors
Nature of the Learned Curve: From Figure 2, it looks like the time embedding is effectively learning some function of t through various basis functions. Beyond the direct representation of time, do you see these learned curves capturing other hidden phenomena (e.g., seasonalities, abrupt events)? How should readers interpret or use these curves in practice?
Role of the Scaling Factor [s]: You introduce a learnable scaling factor after the layer normalization in each dimension. Given that you already learn coefficients within the B-spline or Fourier expansions, can you clarify why this additional scaling is necessary?
Connection to the Kolmogorov–Arnold Theorem: The use of B-splines and references to function superposition remind me of current lines of work involving Kolmogorov–Arnold Networks (KANs). Could you elaborate on whether LeTE is conceptually related to KAN-based approaches?
Claims And Evidence
Time Rescaling Invariance: The authors assert that the proposed method is invariant to time rescaling (e.g., changes in units from days to hours). While they provide a theoretical argument, the empirical validation of this property is less explicit in their experiments.
Enhanced Interpretability: The paper claims better interpretability over previous methods by allowing direct reconstruction of the learned transformation functions. However, the evidence provided (largely visualizations in the appendix) may not conclusively show that these representations are more interpretable than, for example, a straightforward sinusoidal basis. Additional qualitative or quantitative assessments would strengthen this claim.
Methods And Evaluation Criteria
Yes. The paper’s chosen methods and evaluation criteria appear well-matched to the goal of improving time encodings and testing them in realistic contexts.
Theoretical Claims
I checked the correctness of their equations in the main text.
Experimental Designs Or Analyses
Overall, the experimental designs are broadly sound, using appropriate metrics and mainstream baselines for time-series and dynamic graph tasks.
Supplementary Material
I checked the KAN section when trying to figure out their relationship, but the supplementary material does not demonstrate it.
Relation To Broader Scientific Literature
Overall, the paper’s key contributions fit naturally into—and extend—existing streams of research on time-encoding and functional approximation within neural models, offering a flexible drop-in alternative that maintains or improves upon the advantages of earlier fixed-basis (sinusoidal) approaches.
Essential References Not Discussed
Not to my knowledge.
Other Strengths And Weaknesses
- Strength
Comprehensive Literature Review: The authors provide a clear, concise overview of earlier time encoding methods, illustrating how LeTE builds upon and generalizes them.
Well-Organized Composition: The paper's structure, with clear figures, tables, and references, helps the reader grasp the proposed approach and its variants (Fourier-based, B-spline-based, combined).
Generalization to Prior Methods: By demonstrating that prior time-encoding approaches (like Time2Vec) are particular cases of LeTE, the authors show strong potential for "plug-and-play" deployment. This could be attractive for practitioners seeking simple drop-in enhancements.
- Weakness
Overstated Claims While the paper provides theoretical proofs for invariance and interpretability, the experimental demonstrations of these claims are not as strong. For instance, readers would benefit from direct empirical evidence or metrics that validate rescaling invariance.
Clarity on How Time Embeddings Are Used It would help to explain more concretely how these learned embeddings tie into the final predictions, and clarify why this learnable embedding is superior to the previous ones.
Interpretability Remains Nuanced Although LeTE can reconstruct learned transformations, “interpretability” is not necessarily obvious. Additional evidence (beyond raw visualizations) would make the case more convincing.
Other Comments Or Suggestions
Please see weakness.
Rescaling
Thank you for the insightful comment. We agree that empirically demonstrating rescaling invariance is important.
To directly support this, we conducted an additional tiny experiment by applying the Combined LeTE to Wikipedia/TGN, using two different time input scales: t=t/60 (interpreted as minutes), and t=t/3600 (interpreted as hours), whereas previous experiments used Unix timestamps (in seconds).
As shown in the results of this experiment (please see "Rescaling" in the anonymous repository), both versions achieve similar performance and outperform the baselines. Minor differences may be attributed to other factors in the training environment.
While we provide proofs of rescaling invariance in Appendix C.3, we would also like to clarify that our existing experimental settings indirectly verify this property:
In Sec. 4.2 (Time-series), we use absolute Unix timestamps, which are large-scale values.
In Sec. 4.3 (Dynamic Graph), we use relative time differences, typically much smaller in scale.
Despite the difference in time magnitude across these settings, LeTE consistently outperforms the baselines, illustrating strong robustness to changes in time scale. This further supports the claim that LeTE is inherently invariant to time rescaling, thanks to its learnable transformations.
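The intuition behind rescaling invariance can also be checked numerically: a change of time units can be absorbed by the learnable frequencies, leaving the encoding unchanged. A minimal sketch with illustrative names (not the authors' code):

```python
import math

def encode(t, freqs, phase=0.0):
    # Minimal sinusoidal encoding; learnable frequencies absorb unit changes.
    return [math.sin(w * t + phase) for w in freqs]

# Rescale time from seconds to hours (t -> t/3600). Multiplying every learned
# frequency by 3600 keeps each product w * t, and hence the encoding, identical.
seconds = 7200.0
hours = seconds / 3600.0
freqs = [0.001, 0.01]
rescaled = [w * 3600.0 for w in freqs]
assert all(abs(a - b) < 1e-9
           for a, b in zip(encode(seconds, freqs), encode(hours, rescaled)))
```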
Interpretability
Please refer to our responses to Reviewer 1gph and Reviewer ozTs. We made a detailed analysis of the learned curves and demonstrate how these curves can be interpreted in practice.
How TEs Are Used
In practice, the learned LeTE embeddings are used in one of two ways: added to feature embeddings (in time series models), or concatenated with node and edge features (in dynamic graphs).
In this way, LeTE provides temporal signals to the model, enabling it to modulate attention weights or node interactions based on temporal context. By contrast, prior time encoding methods either use hand-crafted encodings or fixed sine functions, limiting their ability to represent complex time patterns (periodicity, non-periodicity, and mixed patterns).
Moreover, LeTE leverages learnable non-linear transformations, both Fourier-based and spline-based, allowing it to learn time patterns directly from data in a flexible, data-driven manner and capture a richer set of time patterns. As shown in our experiments (Sec. 4), replacing prior TEs with LeTE consistently improves performance across a diverse set of downstream tasks, demonstrating that the learned embeddings are not only more expressive, but also more generalizable.
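The two integration modes described above can be sketched as follows. This is a toy illustration with plain Python lists; the actual models operate on tensors:

```python
def apply_time_encoding(feature_vec, time_vec, mode="add"):
    # Two common ways a time embedding enters a backbone (per the reply):
    # "add"    -> added to feature embeddings (time-series models);
    # "concat" -> concatenated with node/edge features (dynamic-graph models).
    if mode == "add":  # requires matching dimensions
        return [f + z for f, z in zip(feature_vec, time_vec)]
    return feature_vec + time_vec  # concatenation: dimensions sum up
```

The "add" mode preserves the model width, while "concat" widens the input and lets downstream layers weight temporal and structural information separately.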
Scaling weight
The reasons for adding the learnable scaling weight after LayerNorm can be summarized as follows.
We first considered adding this learnable scaling weight after conducting experiments comparing performance with and without it; we found that including the scaling leads to better performance.
Upon further reflection, we believe the performance gain can be attributed to the following reasons:
We apply LayerNorm to each dimension of the transformed signal to stabilize optimization and ensure comparable scales across dims. However, this normalization step removes the original scale information that might have been encoded by the learned function coefficients. The scaling factor reintroduces flexible amplitude control after normalization.
In practice, adding the scaling leads to slightly better performance, as it allows each dim to adjust its impact during learning. Without this factor, some dims may become under- or over-represented.
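A minimal sketch of this normalize-then-rescale step, assuming a standard LayerNorm followed by a learnable per-dimension factor (the names are illustrative, not the paper's exact implementation):

```python
import math

def layernorm_with_scale(x, scales, eps=1e-5):
    # Normalize the encoded vector to zero mean / unit variance, then apply a
    # learnable per-dimension factor that reintroduces the amplitude control
    # which normalization removed.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    normed = [(v - mean) / math.sqrt(var + eps) for v in x]
    return [s * v for s, v in zip(scales, normed)]
```

With all scales equal to 1 this is plain LayerNorm; training the scales lets each dimension adjust its impact, which matches the explanation above.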
Relationship with KAN
You are correct that the use of B-splines in LeTE is conceptually connected to the ideas behind the Kolmogorov–Arnold Theorem (KAT).
The motivation behind LeTE arises from a limitation observed in existing time encoding methods: they typically employ fixed non-linear functions, which constrains their ability to model diverse time patterns. To overcome this, we adopt a deep function learning perspective, introducing learnable transformations, either through Fourier series or B-splines. This enables LeTE to flexibly encode complex mixed time patterns.
While this design philosophy aligns with the spirit of KANs, there are some differences:
- LeTE is designed specifically to address limitations in time encoding, serving as a lightweight, plug-in module for downstream tasks. In contrast, KANs are proposed as general network architectures.
- LeTE includes not only spline-based functions but also introduces a Fourier-based one, particularly suited for modeling periodic patterns, whereas KAN primarily relies on splines.
- KANs are typically layered architectures, while LeTE acts as a plug-and-play TE that maps time to vector embeddings, which are then fed into larger models.
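For readers unfamiliar with the spline side of this comparison, the local bases that both KANs and spline-based LeTE build on can be evaluated with the standard Cox–de Boor recursion. This is a generic sketch of that textbook construction, not the paper's implementation:

```python
def bspline_basis(i, k, t, knots):
    # Cox-de Boor recursion: the i-th B-spline basis of degree k over `knots`.
    # Spline-based dimensions combine such local bases with learnable
    # coefficients, giving the local adjustment ability discussed above.
    if k == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = right = 0.0
    if knots[i + k] != knots[i]:
        left = ((t - knots[i]) / (knots[i + k] - knots[i])
                * bspline_basis(i, k - 1, t, knots))
    if knots[i + k + 1] != knots[i + 1]:
        right = ((knots[i + k + 1] - t) / (knots[i + k + 1] - knots[i + 1])
                 * bspline_basis(i + 1, k - 1, t, knots))
    return left + right
```

On uniform knots the bases form a partition of unity inside the valid interval, so a learnable weighted sum of them can shape the encoding locally without affecting distant time values.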
I thank the authors for the comprehensive reply. Most of my concerns were addressed, and I realize I had a misunderstanding about scaling invariance. I will adjust my rating accordingly.
We are sincerely grateful for your positive assessment and for raising your score! Thank you for your thoughtful recognition and constructive feedback. We will carefully revise our paper based on your suggestions to improve its clarity.
Once again, thank you for your valuable insights. Your detailed comments have been incredibly helpful to us!
This paper proposes LeTE (Learnable Transformation-based Generalized Time Encoding), a flexible and learnable time encoding framework that generalizes existing methods (e.g., Time2Vec, Functional Time Encoding). By parameterizing nonlinear transformations via Fourier series and B-spline functions, LeTE provides a more expressive representation of temporal information, capable of modeling periodic, non-periodic, and mixed time patterns. Extensive experiments on time series forecasting, dynamic graph representation learning, and real-world applications demonstrate its superior performance and generalizability.
Questions For Authors
N/A
Claims And Evidence
Most of the key claims in the submission are well supported by empirical evidence and theoretical analysis.
Supported Claims:
- LeTE is a generalization of existing time encoding methods (e.g., Time2Vec, FTE): This claim is backed by formal derivations and proofs (e.g., Proposition 3.1), showing how specific parameter settings in LeTE reduce to previous methods. The argument is mathematically sound and clearly presented.
- LeTE is capable of capturing a wider range of time patterns (periodic, non-periodic, mixed): The authors support this through both the construction of learnable nonlinear transformations (Fourier- and spline-based) and comprehensive empirical evaluations across tasks that exhibit different temporal dynamics. The wide range of tasks (forecasting, dynamic graphs, financial modeling) provides convincing evidence that LeTE handles complex patterns beyond those captured by fixed-function encodings.
- LeTE achieves better performance with fewer dimensions (higher dimensional efficiency): This is substantiated via ablation experiments (Section 4.5, Figures 5–7), showing that LeTE with 2/8/16 dimensions can outperform traditional FTE with 100 dimensions, providing strong empirical support for the efficiency claim.
- LeTE is invariant to time rescaling: This is theoretically demonstrated (Proposition 3.2), similar to prior work on Time2Vec and FTE. The formulation and proof are clear and align with expectations for such encoding schemes.
Questionable Claims:
- LeTE offers enhanced interpretability: While the authors argue that the learned functions are interpretable (due to their basis in Fourier/spline components), interpretability is only briefly mentioned and weakly demonstrated via some visualization (Appendix G.3). The paper could be strengthened by providing concrete examples of how the learned functions reflect real-world temporal patterns, perhaps via visualization or case studies.
Methods And Evaluation Criteria
Yes, both the proposed method and the evaluation protocol are appropriate and well-aligned with the problem of time representation learning in machine learning models.
Theoretical Claims
The theoretical claims are actually extremely trivial; there is no need to verify their correctness.
Experimental Designs Or Analyses
Yes, I have reviewed the experimental design and find it generally sound and reasonable. The authors evaluate their proposed time encoding method on a wide set of tasks that represent standard and widely accepted benchmarks in the field, including time series forecasting, dynamic graph learning, and a real-world classification application.
Supplementary Material
Yes, I reviewed part of the appendix, mainly Section G.3.
Relation To Broader Scientific Literature
The paper's contributions build meaningfully on a well-established line of research in time encoding for temporal machine learning tasks, such as time series analysis and temporal graphs.
Essential References Not Discussed
N/A
Other Strengths And Weaknesses
N/A
Other Comments Or Suggestions
N/A
Thank you for your valuable feedback and suggestions. As Reviewer 1gph also raised similar concerns, and due to space limitations, we have addressed some of these points in our response to Reviewer 1gph. Could you kindly review the first part of our reply there? Below is the continuation of our response following the reply to Reviewer 1gph.
Different Datasets
We reconstruct and plot the non-linear functions for a 4-dim LeTE trained on MOOC/TGN (shown in Fig. 3 in the anonymous repository). By comparing these results to those from Wikipedia (Fig. 1), it can be seen that dim 0 exhibits a lack of periodicity. In the reconstructed equation of dim 0, the higher-frequency terms are generally small; for instance, their coefficients are relatively small (e.g., -0.0134 and -0.0136), suggesting that their contribution is minimal and insufficient to generate significant fluctuations. As a result, the overall function primarily exhibits slow oscillations, making the plot appear mainly non-periodic within a certain input window.
This observation aligns with the findings in our original paper (Appendix G.1 and Fig. 8, the figure in the original paper), where the spectral entropy statistics also show that Wikipedia exhibits stronger periodicity than MOOC. Thus, by comparing the plots of LeTE across different datasets, we can indirectly explore the periodic or non-periodic nature of the data.
Different Backbones
We provide plots of the same dataset trained with TGN and DyGFormer, shown in Fig. 4 and Fig. 5, with the y-axes set to the same level for each backbone to facilitate a direct comparison. As the figures show, despite using different backbone models, the learned functions exhibit similar trends and shapes for each dimension. This illustrates the stability of our TE and makes the interpretability process more reliable.
Of course, there may be some detailed differences between LeTEs trained on different models. This is intuitively due to various influencing factors, such as the model architecture, the interaction of the TE with other modules, the optimization process, etc. However, we can validate the idea by inspecting the plots in a simplified manner.
Comparing lower- and higher-dim LeTE:
We further compare the lower- and higher-dim LeTE by reconstructing the non-linear functions (please refer to and compare Fig. 1 and Fig. 6 in the repo.). Intuitively, the higher-dim representation provides more information. As seen from the plots, dim 2 in Fig. 6 is dominated by the basis function, partially losing the information captured by dim 3 in Fig. 1.
From the perspective of the reconstructed functions, for the Fourier-based dims, the LeTE with only 1 Fourier-based dim has a single input transformation, and all frequency components are computed based on it. This means the LeTE encodes on a broader time scale (reminder: we use Wikipedia here) and models the time-difference variations of editing activities without distinguishing patterns at different scales. Since there is only 1 dim, it is harder for the TE to interpret editing patterns at different time scales. In contrast, for the LeTE with 2 Fourier-based dims, each dim has a different input transformation, enabling the model to capture more detailed editing behaviors at different scales. For example, dim 0 may rely on a transformation with a larger scaling factor, focusing on short-term fluctuations (high-frequency), while dim 1 may rely on one with a smaller scaling factor, focusing more on long-term trends (low-frequency). Thus, higher dims allow the model to handle behaviors at different time scales, providing higher interpretability.
Similarly, the LeTE with 1 Spline-based dim primarily focuses on adjusting a single level, potentially describing how time affects behaviors. However, relying on just 1 dim makes it difficult to capture more complex time dynamics. For the LeTE with 2 Spline-based dims, the weights of the coefficients are more distributed, granting the overall LeTE stronger local adjustment capabilities. Moreover, since a dim may be dominated by either the basis function or the Spline functions, higher dims naturally have stronger expressive power.
Although higher-dim LeTEs offer stronger performance and better explain the information captured by the model, the interpretability analysis of higher-dim LeTEs becomes more complex and may require a dim-by-dim analysis.
Summary
We thank the reviewer for the suggestions on the interpretability of our method. We will consider adding this discussion to the appendix. Additionally, we have prepared code to process the learned LeTE parameters, reconstruct key non-linear transformations, and visualize them. This code will be updated in our publicly available repo after the review process is complete.
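A sketch of what such reconstruction code might look like for a Fourier-based dimension, assuming learned sine/cosine coefficients over integer harmonics. The parameter layout is an assumption for illustration, not the authors' actual storage format:

```python
import math

def reconstruct_fourier_dim(sin_coefs, cos_coefs, base_freq=1.0):
    # Rebuild one Fourier-based dimension from its learned coefficients so the
    # curve can be sampled and plotted for inspection.
    def f(t):
        return sum(a * math.sin((k + 1) * base_freq * t) +
                   b * math.cos((k + 1) * base_freq * t)
                   for k, (a, b) in enumerate(zip(sin_coefs, cos_coefs)))
    return f

curve = reconstruct_fourier_dim([1.0, 0.2], [0.0, 0.0])
# Sampling `curve` over a time window recovers the learned transformation for
# visual inspection, e.g. points = [curve(t / 10.0) for t in range(100)].
```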
This paper proposes Learnable Transformation-based Generalized Time Encoding, a new approach to encoding time in machine learning tasks. LeTE generalizes popular functional time encoding strategies and makes the non-linear transformation fully learnable. The authors use techniques from Fourier expansions and spline functions to parameterize these transformations and this flexibility allows for better modeling of periodic, non-periodic, and complex mixed temporal patterns compared to methods that rely on a fixed function. They demonstrate LeTE’s effectiveness on a variety of tasks: event-based image classification, time-series forecasting, dynamic graph link prediction, and a real-world financial risk control application.
Questions For Authors
Are the learned time encoding functions interpretable in real data scenarios?
Claims And Evidence
The claims are supported by the experiments on various tasks: event-based image classification, time-series forecasting, dynamic graph link prediction, and a real-world financial risk control application.
Methods And Evaluation Criteria
Yes
Theoretical Claims
N/A
Experimental Designs Or Analyses
The experimental designs are sound.
Supplementary Material
Yes
Relation To Broader Scientific Literature
Time encoding is very important for time-series analysis, sequence modeling, and graph representation learning.
Essential References Not Discussed
None
Other Strengths And Weaknesses
Strengths
- Clear motivation and novelty: The paper addresses a known limitation in existing time-encoding methods: most rely on fixed periodic assumptions (e.g., sine or cosine), making them less effective for mixed or non-periodic time dynamics.
- Comprehensive experimental evaluation: The experiments span multiple domains.
Weaknesses / Concerns
Interpretability discussion could be expanded: detailed demonstrations of how domain experts might interpret the learned curves (especially in high-dimensional time encodings) could strengthen the real-world applicability argument. More ablations or visualizations of learned functions in real data scenarios would further highlight interpretability.
Other Comments Or Suggestions
N/A
Thank you for your valuable feedback and suggestions. We are happy to further discuss the interpretability of our method in real data scenarios. We would also like to clarify that demonstrating interpretability for very high-dim encodings is challenging, so we choose a 4-dim Combined LeTE to present our analysis. The training process and settings are consistent with those used in the main experiments of our paper. We demonstrate the interpretability of our model from the following perspectives:
- Reconstructing the learned non-linear transformation functions and plotting them to provide an intuitive analysis.
- Analyzing each dim to interpret what information it represents.
- Comparing different datasets under the same backbone.
- Comparing different backbones' LeTE under the same dataset.
- Comparing the low- vs. high-dim LeTE to assess the impact of dimensionality on interpretability.
Reconstructing
As discussed in the paper, previous TEs exhibit a degree of interpretability by using fixed sine functions, which reflect periodic patterns. However, this strong inductive bias also limits their expressiveness and generalization to complex or non-periodic patterns. In contrast, LeTE is fully learnable, and its non-linear functions can still be reconstructed and visualized from the learned parameters, allowing interpretability through function inspection.
We demonstrate this interpretability using a 4-dim Combined LeTE, trained on the Wikipedia/TGN. Fig. 1 (see anony. repo.) shows the learned functions for each dim. The first 2 dims are Fourier-based, and the last 2 are Spline-based.
Analysing each dim
Fourier-based: The Fourier coefficients explicitly encode frequency components, offering an intuitive view of the captured patterns. Compared to fixed sine functions, our learnable Fourier-based TE captures periodic patterns with finer granularity and greater flexibility, enabling the representation of both periodicity and subtle non-periodicity within specific ranges.
For a single dim, low-frequency components capture long-term trends, while high-frequency components focus on short-term fluctuations. As an example, consider the Wikipedia dataset, which records editing activities, where nodes represent users or pages and edges with timestamps capture editing events (frequency magnitude spectra are shown in Fig. 2; note that the inputs are time differences in this case):
Dim 0 shows a strong high-frequency response, with the learned coefficients placing substantial weight on high-frequency components. This suggests that dim 0 is sensitive to short-term repetitive edits, i.e., high-frequency editing behavior.
Dim 1 captures low- to mid-frequency patterns, with large coefficients on the frequency-1 and frequency-4 components. These reflect longer-term periodic behaviors. For example, frequency-1 may correspond to daily or weekly editing cycles, while frequency-4 may capture sub-daily repeated interactions. This dim may reflect user habits or regular community editing patterns.
Thus, Fourier-based dims not only retain the periodic interpretability of sine functions but also exhibit richer frequency composition, allowing it to simultaneously capture both short-term bursts and long-term rhythms. Moreover, this approach could be extended to analyze more complex patterns. As our goal here is to present the underlying idea, we will not go deeper here.
Spline-based: The Spline functions offer complementary advantages, particularly for non-periodicity. In Spline-based dims, where we applied a basis function (Tanh), if its weight is higher, it may dominate a specific dim, such as dim 2 in Fig. 1. However, there are other dims where Splines dominate, e.g., dim 3. We explain using the specific case of Wikipedia:
Dim 2: The output increases monotonically with time difference, indicating a time-decay-like effect — the longer the time since last edit, the stronger the encoding response. This may suggest the TE has learned that re-activation after long inactivity is a significant event in this specific case.
Dim 3: The function exhibits sharp peaks and local bumps, indicating that the TE assigns particular importance to certain time intervals. These may correspond to known active editing windows or reaction delays. The sharpness of some coefficients suggests the TE has captured rare but important temporal phenomena, such as one-off campaigns or anomaly spikes.
The Spline functions inherently capture local time features, indicating specific time intervals that the model considers critical. Sharp peaks within the curves suggest the occurrence of sudden events or anomalies. This local characteristic is advantageous for identifying rare phenomena.
Due to space limitations, and since Reviewer ozTs raised similar concerns, please kindly check the remaining parts in our response to Reviewer ozTs.
This paper proposes LeTE, a new time encoding method for ML models. This method parameterizes nonlinear transformations via Fourier series and B-spline functions, providing a more expressive representation of temporal information, and thus is capable of modeling complex temporal patterns. The method is shown to be effective in experiments on time series forecasting, dynamic graph representation learning, and other real-world applications.
Reviewers give generally positive feedback to this paper, with concerns being resolved by the rebuttal.
I recommend to accept this paper, but strongly encourage the authors to include the key points of their rebuttal (e.g., new results, arguments about novelty) in the camera-ready.