7.3

/10

Poster3 位审稿人

最低7最高8标准差0.5

3.0

置信度

正确性3.0

贡献度3.0

表达3.0

NeurIPS 2024

Almost-Linear RNNs Yield Highly Interpretable Symbolic Codes in Dynamical Systems Reconstruction

Manuel Brenner,Christoph Jürgen Hemmer,Zahra Monfared,Daniel Durstewitz

OpenReview PDF

提交: 2024-05-15更新: 2025-01-15

TL;DR

We introduce Almost-Linear Recurrent Neural Networks (AL-RNNs) to derive highly interpretable piecewise-linear models of dynamical systems from time-series data.

摘要

关键词

recurrent neural networksdynamical systemschaosattractorsinterpretability

评审与讨论

审稿意见

评分: 7置信度: 42024-07-09

This paper introduces a new recurrent neural network (RNN) architecture called Almost-Linear (AL)-RNN for reconstructing nonlinear dynamical systems from time-series data. The key innovation of AL-RNN is training parsimonious piecewise linear (PWL) representations of dynamical systems. By combining linear units with a small number of rectified linear units (ReLUs), AL-RNNs can effectively capture the dynamics of complex systems while maintaining interpretability. The authors demonstrate the effectiveness of AL-RNNs on benchmark datasets (Lorenz and Rössler systems) and real-world data (ECG and fMRI), showing that they can discover minimal PWL representations that accurately capture the dynamics of these systems.

优点

Novelty: The AL-RNN architecture is a novel contribution to the field of dynamical systems reconstruction. It addresses the limitations of existing methods (PLRNNs and SLDS) that often result in overly complex models.

Interpretability: The AL-RNN's structure, with its minimal use of ReLU units, naturally leads to a symbolic encoding of the dynamics, making the model more interpretable and facilitating mathematical analysis.

Empirical Effectiveness: The paper demonstrates the effectiveness of AL-RNNs on both benchmark and real-world datasets, showing that they can accurately capture the dynamics of these systems.

缺点

Usefulness of symbolic dynamics seems to be limited and can be misleading: Even when the underlying dynamics is deterministic, the extracted symbolic transition dynamics is probabilistic, which can be misleading (Figure 6d).

Lack of guidance on model size: There is no discussion on how to determine the minimum number of ReLU units (P) needed, as well as the number of linear units (M). The authors explored a range of (P) and chose an arbitrary value. The method would be much more useful in practice if it could be regularized to automatically find the minimal (P).

It is crucial that the model dynamics is partly driven by the observation data, e.g. teacher forcing, but this is not mentioned in the main text equations (eq1~5). Teacher forcing is mentioned only as a training method, not for testing.

The paper lacks details on the training process and hyperparameter selection, which could hinder reproducibility.

Theorem 1 seems to be incorrect. As a counterexample, consider a linear dynamical system with a stable orbit (i.e., with 1 subregion, 0 ReLU units). In this case, the symbolic state remains constant, but not all states are fixed points.

Writing needs to be improved. Section 3.2 on symbolic dynamics seems unnecessary. The symbolic partitioning is intuitive to understand in terms of the activation of ReLU units, but the formal definitions of Section 3.2 do not seem to add any further understanding. The theory section also seems unnecessary and could be moved to the appendix. Many of concepts introduced in these sections don't seem to be mentioned afterwards, e.g. shift operator.

Undefined Terms: The paper does not properly define (N), which makes it confusing to understand. Additionally, the term "hyperbolic AL-RNN" is used without a clear definition.

Excessive Use of Acronyms: The paper uses too many acronyms, which can hinder smooth reading. It would be good to reduce their usage. Here is a list of the acronyms used in the paper: AL-RNN: Almost-Linear Recurrent Neural Network BPTT: Backpropagation Through Time DH: Hellinger Distance DS: Dynamical System DSR: Dynamical Systems Reconstruction DST: Dynamical Systems Theory ECG: Electrocardiogram fMRI: Functional Magnetic Resonance Imaging FP: Fixed Point id-TF: Identity Teacher Forcing KL: Kullback-Leibler LDS: Linear Dynamical System MSE: Mean Squared Error ODE: Ordinary Differential Equation PDE: Partial Differential Equation PLRNN: Piecewise-Linear Recurrent Neural Network PWL: Piecewise Linear RADAM: Rectified Adaptive Moment Estimation ReLU: Rectified Linear Unit RC: Reservoir Computer SEM: Standard Error of the Mean SINDy: Sparse Identification of Nonlinear Dynamics SLDS: Switching Linear Dynamical System SOTA: State-of-the-Art STF: Sparse Teacher Forcing TF: Teacher Forcing STSP: ?

问题

How does the performance of AL-RNNs compare to other state-of-the-art DSR methods on a wider range of benchmark and real-world datasets?

How does the choice of the number of linear units (M) affect the performance and interpretability of the AL-RNN model?

Is there a principled way to determine the optimal number of ReLU units (P) for a given dataset?

Can the symbolic dynamics approach be modified to better handle deterministic systems, avoiding the misleading probabilistic representation of transitions?

局限性

The paper does mention some limitations, such as the challenge of determining whether a topologically minimal and valid reconstruction has been achieved from empirical data.

作者回复

2024-08-04

We appreciate the referee’s overall positive assessment and the valuable feedback provided!

Weaknesses

W1 (usefulness of symbolic dynamics): First, please note that the symbolic encoding itself is not probabilistic, i.e. the symbolic seq. (as shown in Figs. 13, 14 or 17) are as deterministic as the underlying system itself. As new Fig. R5 in the provided PDF now further highlights, quantities like the topological entropy obtained from the symbolic encoding correlate highly with quantities obtained directly from the system dynamics, like the max. Lyapunov exp., but are much easier to compute. This further confirms that important topological properties of the underlying system can be inferred from the symbolic encoding we used. In general, symbolic dynamics has led to many powerful insights and results about the dynamics of certain systems (certain proofs about chaos or the number of unstable periodic orbits could only be derived symbolically, for instance; see Wiggins 1988, Guckenheimer & Holmes, 1983).

Hence, we think the referee’s point concerns more the representation of symbolic dynamics in the form of transition graphs. This type of graph represent. is quite standard in symbolic dynamics (see, e.g., textbook by Lind & Marcus 2021), where arrows between nodes usually represent admissible transitions, just like in graph representations of finite state machines or formal languages, for example. The graphs are meant to represent the set of all possible seq. (or ‘syntactically correct’ sentences). We agree, however, that one needs to be clear about the semantics of these graphs and their interpretation, which we will clarify.

W2 (guidance on model size): We chose $P$ according to a grid search as the min. value at which performance started to plateau (because we wanted to obtain topologically minimal representations), so its choice is not arbitrary but related to the kinks (or humps) in the curves in Fig. 3. Likewise, $M$ was determined by systematic grid search, see Appx. Fig. 9. However, we like the referee’s idea of determining an optimal number of linear subregions by regularization. We now implemented this, and find that the numbers of subregions determined this way agree well with those obtained by our prev. crit., see Fig. R6 in PDF.

W3 (teacher forcing): Sparse teacher forcing is indeed only applied during training, not during testing. Hence it is, correctly, not included as part of those equations. This is in fact crucial: DSR models are supposed to be generative models that, after training, can generate new data with the same geometrical and topological structure as those produced by the observed system. During test time, therefore, the once trained model cannot rely on actual observations, but follows solely its own dynamics. While TF is a broad term, sparse TF is different (Brenner et al. 2022, Mikhaeil et al. 2022) and has been introduced specifically in the context of DSR where it is SOTA (see also Tab. R1). We will clarify this.

W4 (reproducibility): Please note that we provided code for reproducibility with the org. submission here: https://anonymous.4open.science/r/SymbolicDSR-1955/README.md

The hyperpar. settings for all runs were further listed under A2: Training Method. In the rev. we will further clarify how exactly these optimal hyperpar. were determined.

W5 (Th.1 incorrect): The th. is correct, but we see where this misunderstanding comes from: For the referee’s example, the system would need to be non-hyperbolic, i.e. would need to have a center subspace. However, this case (req. conjugate pairs of eigenvalues to lie exactly on the unit circle) we explicitly ruled out in our theorems (it is a 0-measure set in par. space). We mentioned this in the pg. above the th. (since common to all of them), but for clarity will make explicit in the theorems themselves.

W6 (theoretical sects.): We are happy to reduce sect. 3.2 and move it partly to the Appx., the ref. is right that some of the concepts are not followed up upon. Others, like that of a shift operator, shift space, or topolog. partition, are however directly used in the theorems in sect. 4 (for the sake of the theorems, we prefer to be formally precise about certain concepts even if they may be intuitively clear).

W7 (undefined terms): N is the observation dim. of the data, and by ‘hyperbolic AL-RNN’ we mean an AL-RNN which is hyperbolic in each of its linear subregions, i.e., such that the transition matrices defined in eq. 2 have no eigenvalues on the unit circle. Will be clarified.

W8 (use of acronyms): Agreed, we will remove all acronyms which are not standard or are only rarely used.

Questions

Q1: Please note that our goal was not to introduce a novel SOTA method for DSR, but rather to introduce an approach for retrieving topologically minimal and symbolically interpretable representations from data. Still, one may ask whether the AL-RNN is at least on par with other SOTA methods for DSR. In Table R1 in the rebuttal PDF we answer this Q. We also included human EEG data & Lorenz-96 as further benchmarks. As can be seen, the AL-RNN even outperforms most current SOTA methods, which may be due to its simple design making training much more stable and robust (cf. also new Figs. R1-R3).

Q2: The number of linear units makes no difference to interpretability, since it does not affect the total number of linear subregions (hence neither the symbolic encoding nor the computation of fixed points or cycles). As shown in Fig. 9, more linear units can, however, still improve performance up to a certain level.

Q3: See W2 above.

Q4: See response to W1 above. We also would like to note that esp. in math. chaos theory probabilistic approaches to deterministic systems are indeed commonplace, e.g. in the def. of invariant measures (which are probability ms.) and ergodic theory (see, e.g., textbooks by Katok & Hasselblatt 1995, Alligood et al. 1996).

评论- Thank you for your response

2024-08-13

The authors have sufficiently addressed the concerns, and the updated result looks great. I'm raising my score to 7.

评论- Thank you!

2024-08-13

We are glad to hear the referee likes our update on results. We very much appreciate the referee's feedback which helped us to see parts of the paper which needed further clarification and support.

审稿意见

评分: 7置信度: 22024-07-13

The paper proposes to limit the number of non-linear units in a RNN to facilitate the analysis and hence understanding of inferred dynamical systems. The authors show that even with a limited number of non-linear units, the model is able to explain a large portion of the data for the Rössler and Lorentz systems. Furthermore the proposed model is related to the notion of symbolic codes and it is shown theoretically how these can be used to further analyse properties of the dynamical system under study. Finally, the authors analyse insights obtained by applying the model to real-world data.

优点

(Disclaimer: this is not really my area of expertise and although I like the paper a lot and have found no obvious objections in the method or the evaluation, I can not judge the novelty of the paper.)

Exceptionally well written
Well motivated and comprehensive introduction and problem motivation
Evaluation on both simulated data (eg Lorenz 63 and Rössler system) as well as real-world measurements (ECG, fMRI data)
Theoretical contribution that allows one to infer properties of the underlying dynamical system from the model via symbolic codes

缺点

A potential weakness of the paper is the lack of comparison to other methods. On the other hand, the paper addresses a relatively niche topic so that I am not sure whether (accessible) baseline methods to compare to are available?

问题

What is $\phi$ in eq (1)?
I don't quite understand the notation in equation (2): where does the \phi from eq (1) go? What does the subscript $\Omega(t)$ mean, and how can there be $2^M$ configurations for $D_{\Omega(t)}$ if $D_{\Omega(t)}$ is a diagonal matrix? Doesn't the fact that $D_{\Omega(t)}$ is a diagonal matrix imply that there is only a single configuration (modulo different values that they diagonal may take)?
In equation (6) as well as the lines of text just above it, there is a dot, which I believe may be a decimal point. Is this a common notation? I was at first thrown of by this (but I was also not familiar with the symbolic codes idea before.) If it is not standard (or maybe in general to facilitate understanding), it might be useful to add a note on this notation.

局限性

Limitations are well described in a dedicated section in the main text.

作者回复

2024-08-04

We thank the referee for the supportive and positive feedback, we are happy to hear the referee liked our work!

Weaknesses

In a sense this is the first study of its kind. We are not aware of any other work in the DSR field (and beyond) making this link to symbolic dynamics, and attempting to reduce model complexity in a way that allows for easy translations into topologically minimal representations. This approach to model interpretability is the major contribution of this work, so it is hard to compare to other methods.

However, one may still ask whether AL-RNNs can at least compete with other DSR methods in terms of the quality of the DS reconstructions they achieve (our examples demonstrate that they are good enough even with a very low number of linear subregions, which in itself is a major advantage for obtaining mechanistic insight into dynamics). New Table R1 which we now added to the rebuttal PDF confirms AL-RNNs are at least on par with - in fact even outperform - most other SOTA methods (which lack our method’s interpretability). We also added new benchmarks to this Table.

Questions

Q1: $\phi$ is the ReLU nonlinearity, sorry for this oversight. We will clarify this in the updated manuscript.

Q2: The ReLU ( $\phi$ ) was absorbed into the $D_{\Omega(t)}$ matrix: Note that we can rewrite eq. 1 equivalently into eq. 2 by placing a 1 on the diagonal of $D_{\Omega(t)}$ for each state $z_{p,t}>0$ at time $t$ , and a 0 otherwise (just by definition of the ReLU as $\max[0,z_{p,t}]$ ). This matrix depends on time $t$ , however, because it is determined by the values of all the states at that time. $\Omega(t)$ denotes the subset of states for which we have $z_{p,t}>0$ at time $t$ . This should also answer the referee’s last question: Since each entry on the diagonal of the $M \times M$ matrix $D_{\Omega(t)}$ can be either 0 or 1 at any time $t$ , we have a total of $2^M$ possible configurations for $D_{\Omega}$ . This may admittedly have been a bit hard to follow, especially as we didn’t make clear the meaning of $\phi$ in the equation above! We will clarify this whole section accordingly.

Q3: Yes, its interpretation is a bit similar to a decimal point (but note it’s not a decimal system of course): The symbolic sequences correspond to theoretically infinite length trajectories, and the point “separates past from future”, i.e. indicates which symbol corresponds to the current time $t$ along the trajectory. We will clarify this type of notation in the revision (it's indeed standard in symbolic dynamics), valid point, also for the parts above, thank you for your feedback on this!

评论- Thank you for the clarification

2024-08-12

I appreciate the author's response which has clarified my questions. Concidering this as well as the responses to the other reviewers, I recommend acceptance of this paper.

评论- Thank you!

2024-08-13

We are happy to hear we could satisfactorily clarify your points. Thank you for your support!

审稿意见

评分: 8置信度: 32024-07-17

This paper addresses the broad problem of learning interpretable dynamical systems from data; it builds upon existing approaches that use piecewise linear RNNs (PLRNNs), with one interesting twist: constraining the number of linear subregions to be much smaller than the "usual" $2^N$ . This is achieved by allocating a small number P of units to have a ReLu activation function, every other unit being linear. This leads to $2^P$ linear subregions. The paper begins by proving a couple of theorems showing that these dynamical systems are amenable to interpretable symbolic analysis (though these results are generic to PLRNNs, not specific to the new version). The authors then fit AL-RNNs (this new breed of PLRNNs) to chaotic attractors, demonstrating that good dynamical reconstructions can indeed be obtained with relatively small P, providing a post-hoc justification for the approach. Finally, the approach is applied to ECG and fMRI data; in the ECG case, symbolic analysis returns a highly interpretable graph linking the different linear subregions.

优点

The core idea is neat; although the difference between existing PLRNNs and these new AL-RNNs is more a difference of degree than a difference of nature (just changing the number of units having a ReLu activation), it's great to have thought of taking PLRNNs into that regime and to show empirically that (i) it doesn't compromise DSR quality too much, yet (ii) it gives dynamics that are computationally more amenable to symbolic analysis.

Another strength (for me at least) is that this paper contributes to exposing an audience that has historically primarily cared about dynamical systems reconstruction (e.g. me) to the concept of symbolic dynamics -- indeed I'm glad I reviewed this paper and thus got a useful (even if rudimentary) primer on SD.

缺点

I would very much like to see model recovery experiments; how easy are those amost-linear RNNs to fit, and to fit consistently? I could imagine that the model might settle into a suboptimal set of linear subregions early on during training and then have a hard time snapping out of it. I have to admit I don't have a good intuition for this, but this is something the authors could substantiate numerically by running simple model recovery experiments. On this note, what hard degeneracies do we expect here due to a majority of the state dimensions being unobserved? Can the authors use the symbolic dynamics grounding of section 3 (currently a little disconnected from the rest, I have to say) to derive meaningful measures of how well the ground truth system's topology / symbolic dynamics are recovered despite those degeneracies? (e.g. for small enough P it might be possible to look at all permutations of the transition matrix between linear subregions and conclude that the ground truth has been recovered?).

Re consistency, could the authors comment on whether the error bars (across training runs) in e.g. Figure 5d-f are to be considered small or large? The figure caption says "shows close agreement among different training runs" but judging from these whisker plots, the underlying coefficient of variation seems quite high (which actually triggered the concern I articulated above concerning potentially inconsistent recovery of ground truth PWL dynamics).

问题

You omitted to say that $\phi(\cdot)$ in Equation 1 is the ReLu function -- this is pretty critical!
In equation 4, why are you over-parameterizing the linear part? It seems that the first $M-P$ columns of $W$ can be absorbed into the first $M-P$ columns of $A$ . So you really just have the last $P$ columns of $W$ to learn.
In Figure 2, why did you write " $P^2$ subregions / symbols"? Did you mean $2^P$ ? (and accordingly, should $P^4$ in fact read $2^{2P}$ )?
l.185: can you please define "hyperbolic AL-RNNs" (AL-RNNs in which fixed points are all hyperbolic?) and what that implies concretely for A and W?
Theorems 1-3 make intuitive sense but appear a little ill-phrased to me -- for example, in Theorem 1, the specific $z^\star$ that appears in the first clause of the iif statement is not even referred to in the other clause (and indeed it cannot be uniquely identified from knowing “the corresponding symbolic sequence $a^\star$ ” -- you don't really say what you mean by “corresponding”, btw; do you mean the symbolic sequence associated with any state space trajectory that contains $z^\star$ ?). Perhaps this theorem could be rephrased as "if there exists a fixed point $z^\star$ in $U_e$ , then $e^\infty$ is a fixed point of the shift map; conversely, if $e^\infty$ is a fixed point of the shift map for some $e$ , $U_e$ must contain a fixed point of the $F_\theta$ map.” (?) Same concern in theorem 2 and 3. In theorem 2, I think the $k+p$ subscript should be modulo $p$ ?
Should we be concerned by the fact that your measures of DSR accuracy in Figure 3 do not decrease mononotonically with the number of PWL units? Local minima in the teacher-forcing loss?
For the ECG and fMRI datasets, I couldn't see a quantitative assessment of model performance; in particular, for the fMRI dataset, I think the authors should discuss whether and how much the addition of a categorical task-stage decoder impairs relevant DSR performance metrics (or perhaps even improves consistency across training runs?).

局限性

Perhaps the authors could discuss the extent to which it really is easier to analyze $2^P$ subregions rather than $2^N$ subregions -- quantitatively I understand that this is a lot fewer subregions, but when $2^P$ is beyond a handful, whether it's 50 or 5 million, it's unclear to me how "easy" it is to analyze/understand these things (or indeed what that even means...).

post-rebuttal EDIT:

Having read the rebuttals to all 3 revieweres, I am raising my score to an 8; strong paper likely to have an impact.

作者回复

2024-08-04

We thank the referee for the enthusiastic support and appreciation of our work!

Weaknesses

W1 (consistency of fits/ model recovery): One crucial advantage of AL-RNNs is that they indeed consistently deliver the same model over many training repetitions. The errors in Figs. 5d-f are in fact very small: To put these numbers into context, we now normalized all 3 graphs by proper reference values (Fig. R1 in rebuttal PDF), with normalized numbers substantially below 1. Moreover, we now compared the consistency across training runs to those obtained with vanilla PLRNNs by evaluating the agreement in trajectory point distributions across linear subregions, revealing a much higher agreement among AL-RNN solutions (Fig. R2). The parsimony of AL-RNNs compared to other models leaves them with much less ‘wiggle room’ in finding different solutions, as can also be appreciated from the direct comparison in Fig. R3 (see also Figs. 20 and 21). Finally, as suggested, we also did model recovery experiments, finding that the recovered solutions are virtually identical across repetitions (original: $D_{stsp}=3.14$ , $D_H=0.28$ ; recovered: $D_{stsp}=3.38+/-0.18$ , $D_H=0.28+/-0.03$ ; 3 linear subregions in all cases).

W2 (degeneracies): While these models indeed do have the capacity to capture unobserved dimensions in their latent space (e.g. Brenner et al. 2024), we usually still would use delay embedding for this (as for the ECG). This, both, eases the training process itself and enables evaluation of agreement in attractor geometry. While it is possible to harvest the symbolic representation to measure the similarity in 2 reconstructions (overlap in symbolic codes and their graphs, e.g. agreement in adjacency matrices), for empirical data the topologically minimal representation is usually unknown, of course, and is precisely what we would like to infer through the AL-RNN (see Discussion). Hence, empirically, measures like $D_{stsp}$ or $D_H$ remain the methods of choice for initially evaluating the quality of DSR in delay-embedding spaces. But we checked this idea now for evaluating the consistency across training runs, and the symbolic graphs in fact remain identical across different runs.

Questions

Q1: Absolutely, thanks for pointing out this oversight!

Q2: Yes, true, the full $\mathbf{W}$ matrix is kind of a ‘historical quirk’ from previous formulations of the PLRNN and the resp. codebase. For model training and performance it makes no difference ( $D_{stsp}$ : $t(19)=-1.58, p=0.13$ , $D_{H}$ : $t(19)=0.68, p=0.51$ ), but for parsimony the diagonal of $\mathbf{W}$ should be removed for the linear units. We will comment on this in the revision.

Q3: Yes, thanks for catching!

Q4: With “hyperbolic AL-RNN” we mean the system is hyperbolic in each of its linear subregions, implying that none of the Jacobian matrices $\mathbf{A}+\mathbf{W} \mathbf{D}_{\Omega}$ has eigenvalues exactly on the unit circle (a measure-0 set in parameter space). Will be clarified!

Q5: Thanks for pointing out this source of misunderstandings, we will rephrase the theorems for clarity. The key is the term “corresponding” by which we meant the mapping from trajectories of the original system onto symbolic seq.. We will precisely define this mapping and the term “corresponding” in our rev..

Re Th.2: The notation here is correct, this is the standard def. for a p-cycle (otherwise it would just read $\mathbf{z}_k=\mathbf{z}_k$ , but we wish to indicate the map returns after p iterations).

Q6: Some of the wiggling up and down of some of the curves after the initial kink (which we are looking for) is likely to be just noise across different training runs with different minima achieved. Also note that we are not directly optimizing for $D_{stsp}$ and $D_{H}$ , so while in STF based algorithms MSE remains a good proxy in general (e.g. Hess et al. 2023), there is no guarantee the relation is strictly monotonic. However, the hump in the figure for the Lorenz-63 is a bit more suspicious, and so we dug a bit deeper: It seems that in this case the first minimum at P=2 indeed indicates the topologically minimal representation (see also Fig. R6), and that when surpassing this optimal point in fact performance first decays again. It is thus more a feature rather than a bug, and would make identifying the optimal representation even easier.

Q7: For the ECG data, the quantitative results were not included in Fig. 7 but in Fig. 3, so perhaps easy to miss, will be made clearer. For the fMRI data, as requested we now have produced the same type of graph, included as Fig. R4 in the rebuttal PDF. That an additional categ. decoder significantly improves DSR quality on these same fMRI data has been shown before in (Kramer et al. 2022; Brenner et al. 2024).

Limitations

Since the number of subregions grows exponentially with $P$ and $M$ , $P<<M$ eases model analysis profoundly. One major purpose of DSR in our minds is to provide mechanistic insight into system dynamics (rather than just forecasting). The thorough understanding of the dynamical mechanisms of chaos in Fig.5, e.g., is only possible because we have less than a handful of linear subregions (likewise for the empirical examples). Besides this important visualization aspect facilitating human interpretation, there are also clear numerical benefits: For instance, in a brute-force search algorithm for fixed points and cycles, the number of iterations we would need to find such objects also increases exponentially with $M$ . Hence reducing this to $P<<M$ is always a huge benefit that enables to dig much deeper into model mechanisms. Of course it is difficult to put an exact threshold on this. However, that we were able to capture even rather complex ECG and fMRI dynamics with just a few linear subregions ( $\leq 8$ ) we find encouraging. Whether this will always or mostly be the case in empirical settings, is an interesting and open question, which will be discussed.

2024-08-12

Thanks for the thorough response. I think that (i) the inherent interpretability of the model, (ii) the model recovery experiments showing consistency of training, as well as (iii) the response to Reviewer 5E4v with favourable comparison to SOTA, show that this is a very practical model likely to have a substantial impact in the field. I am raising my score to an 8.

评论- Thank you!

2024-08-13

Thank you very much for the appreciation of our paper and rebuttal, and for engaging so constructively and thoughtfully with our work!

作者回复

2024-08-05

General reply

We thank all three referees for their thorough reading and the constructive and helpful feedback on our manuscript. We are happy to see that all referees provided a generally supportive and positive assessment of our work. We hope we could address the remaining concerns in the detailed point-by-point replies below in the individual rebuttals, and the additional new material provided in the rebuttal PDF.

In brief, the rebuttal PDF contains the following new results and figures:

Table R1 provides a systematic comparison of AL-RNN performance to many other state-of-the-art DS reconstruction models, incl. additional benchmark systems for testing (human EEG and high-dim. chaotic Lorenz-96 system). While the idea behind the AL-RNN was not to provide a new SOTA for DSR, it does indeed outperform most other techniques when trained with sparse teacher forcing (potentially due to its simple and parsimonious design).
Figs. R1-R3 highlight that a particular feature of the AL-RNN, especially in comparison to the standard PLRNN, is the consistency in inferred models across multiple training runs, i.e. very similar or same model solutions are obtained across many different parameter intializations.
Fig. R4 is the same as Fig. 3 from the paper for the fMRI data.
Fig. R5 shows that topological properties can be cheaply computed from the symbolic encoding.
Fig. R6 illustrates selection of the optimal number of piecewise-linear units through a regularization approach.

Cited References

K. T. Alligood, T. D. Sauer, and J. A. Yorke, Chaos: An Introduction to Dynamical Systems, Springer-Verlag, New York, 1996.

M. Brenner et al. 2022, Tractable Dendritic RNNs for Reconstructing Nonlinear Dynamical Systems, Proceedings of the 39th International Conference on Machine Learning (ICML 2022)

M. Brenner et al. 2024, Integrating Multimodal Data for Joint Generative Modeling of Complex Dynamics, Proceedings of the 41st International Conference on Machine Learning (ICML 2024)

J. Guckenheimer and P. Holmes, Nonlinear Oscillations, Dynamical Systems, and Bifurcations of Vector Fields, Springer-Verlag, New York, 1983.

F. Hess et al. 2023, Generalized Teacher Forcing for Learning Chaotic Dynamics, Proceedings of the 40th International Conference on Machine Learning (ICML 2023)

Katok, A., & Hasselblatt, B. (1995). Introduction to the Modern Theory of Dynamical Systems. Cambridge: Cambridge University Press

D. Kramer et al. 2022, Reconstructing Nonlinear Dynamical Systems from Multimodal Time Series, Proceedings of the 39thth International Conference on Machine Learning (ICML 2022)

Lind & Marcus, 2021, An Introduction to Symbolic Dynamics and Coding.. H. Cambridge University Press, 2nd edition

J. Mikhaeil et al. 2022, On the difficulty of learning chaotic dynamics with RNNs, Advances in Neural Information Processing Systems 35 (NeurIPS 2022)

S. Wiggins, Global Bifurcation and Chaos, Springer-Verlag, New York, 1988.

评论- comments on our rebuttal?

2024-08-12

Dear Referees,

The discussion period is coming to a close, and we wondered whether we could satisfactorily address your points, or whether there are any issues remaining that may need further clarification? Thank you very much again for reviewing our work, and for your supportive and positive feedback so far!

Kindly, Authors

评论- Thank you!

2024-08-13

Dear Referees,

We are glad to hear we were able to satisfactorily address all issues raised, your feedback is much appreciated. The additional results and clarifications surely improve the paper, thank you once again for engaging so constructively and thoughtfully with our work!

最终决定Accept (poster)

2024-09-25

The reviewers agreed on the premise of the paper of introducing a new architecture for discovering interpretable partitions on dynamical systems. The supporting results included applications to both toy analytical systems and real world data. The reviewers all concur with an accept recommendation, all mentioning the method’s simplicity and capacity for broader impact.

The authors attached one extra page of result tables, figures, and plots, which convinced one reviewer to raise their score. The AC thinks that making the suggested corrections and stylistic edits and including the additional results would be feasible within the revision, and not represent a dramatic change from the submitted version. The AC agrees with the reviewers that the paper is an accept.