Neurons as Detectors of Coherent Sets in Sensory Dynamics
Neurons can cluster sensory dynamics into coherent sets by projecting inputs onto a singular vector and rectifying the result
Abstract
Reviews and Discussion
This paper proposes a theoretical framework conceptualizing sensory neurons as detectors of coherent sets in high-dimensional stochastic dynamical systems. For Ornstein-Uhlenbeck (OU) processes, they analytically derive that the subdominant singular function of the stochastic Koopman operator reduces to a linear projection followed by rectification, naturally explaining neuronal temporal receptive fields and the ubiquity of rectification in neural responses. The framework predicts complementary neuronal classes specialized for prediction (contracting coherent sets) versus retrospection (expanding coherent sets), which the authors connect to known functional dichotomies like tufted/mitral cells and lagged/non-lagged cells.
Strengths and Weaknesses
Strengths:
- The connection between coherent set detection and neural computation is original and mathematically rigorous.
- Analysis of real neuronal data from multiple sensory systems supports theoretical predictions.
Weaknesses:
- Oversimplified dynamics: The analytical results are restricted to linear OU processes; real sensory dynamics are likely far more complex.
- Incomplete nonlinear treatment: The extension to nonlinear dynamics via Galerkin projections is mentioned but not thoroughly developed or validated.
- Limited scope of data analysis: The empirical analysis focuses on relatively simple temporal receptive field patterns and doesn't address more complex neuronal response properties.
Questions
- The analytical results are limited to OU processes, but the authors claim the framework extends to nonlinear dynamics. Could authors provide concrete examples with analytical or numerical solutions for specific nonlinear systems relevant to sensory processing?
- How does the proposed framework account for the extensive feedback connections in sensory systems? Would feedback fundamentally alter the coherent set structure or the neuronal computations?
- What are the computational requirements for neurons to perform the proposed coherent set detection? How does this scale with the dimensionality of the sensory input space?
- How do authors distinguish the proposed coherent set framework from other established theories like efficient coding, sparse coding, or predictive coding that also explain temporal receptive fields and rectification?
Limitations
N/A
Final Justification
The rebuttal addressed my concerns.
Formatting Issues
Could the author consider re-formulating the Related Work section (currently the Relationship to Other Work section) and putting it right after the Introduction section?
We thank the reviewer for taking the time to provide thoughtful feedback on our manuscript. Below, we address each of the reviewer’s comments in turn.
1. Oversimplified dynamics: The analytical results are restricted to linear OU processes; real sensory dynamics are likely far more complex.
Exactly! (1) This is why we propose a data-driven approach that can be applied to almost any dynamics. (2) Any local neighborhood of a nonlinear system can be approximated as a linear system through Taylor expansion (see the sketch below); in many cases, the behavior of a nonlinear system is largely captured by the local linear behavior around its critical points. (3) Critical points, and the invariant manifolds linking them, can serve as a skeleton of the global nonlinear system. In future work, we plan to propose a neural network in which each critical point corresponds to a group of neurons and the information from different groups of neurons is aggregated by the next layer of the network. This is, however, beyond the scope of this manuscript.
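To illustrate point (2), here is a minimal sketch of local linearization around a critical point (our own illustration using a finite-difference Jacobian, not code from the manuscript):

```python
import numpy as np

def jacobian_linearization(f, x_star, eps=1e-6):
    """Finite-difference Jacobian A of dx/dt = f(x) at a critical point
    x_star (where f(x_star) is approximately 0), giving the local linear
    model dx = A (x - x_star) dt near that point."""
    d = len(x_star)
    A = np.zeros((d, d))
    for j in range(d):
        e = np.zeros(d)
        e[j] = eps
        A[:, j] = (f(x_star + e) - f(x_star - e)) / (2 * eps)
    return A

# Example: a 1-D double-well drift f(x) = x - x^3, linearized at x* = 1.
A = jacobian_linearization(lambda x: x - x**3, np.array([1.0]))
print(A)  # approximately [[-2.0]]: locally contracting, hence OU-like
```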
2. Incomplete nonlinear treatment: The extension to nonlinear dynamics via Galerkin projections is mentioned but not thoroughly developed or validated.
We will add a section to the Supplementary Material providing a short pedagogical introduction to the Galerkin method using a time-delay embedding basis.
3. Limited scope of data analysis: The empirical analysis focuses on relatively simple temporal receptive field patterns and doesn't address more complex neuronal response properties.
We restricted our analysis to neurons close to the sensory periphery, where we could closely link the neural responses to the stimuli. We suspect that more complex neuronal response properties to sensory stimuli, for instance in cortical areas, may be described by similarly simple receptive fields with respect to intermediate (upstream) neural activity. If such a dataset were available, we would be able to apply our approach to it.
Our analyses were restricted by our use of previously published data. As noted elsewhere, neural data were supplied in the form of temporal receptive fields for mitral/tufted cells and spatiotemporal receptive fields for RGCs, without a record of individual stimulus/response trials. RGC receptive fields were estimated by the original authors using reverse correlation of white noise; receptive fields for mitral/tufted cells were estimated by the original authors from the best responses to brief odor pulses. More complex neuronal response properties would require purposefully designed stimuli and experimental conditions.
4. The analytical results are limited to OU processes, but the authors claim the framework extends to nonlinear dynamics. Could authors provide concrete examples with analytical or numerical solutions for specific nonlinear systems relevant to sensory processing?
Earlier work by Froyland (2013) and others shows the efficacy of this method for nonlinear systems. We are not sure that there is an obvious explicit dynamical-system model 'relevant to sensory processing' that we could solve analytically or numerically. Perhaps the results in our Fig. 2 come closest to answering this question.
5. How does the proposed framework account for the extensive feedback connections in sensory systems? Would feedback fundamentally alter the coherent set structure or the neuronal computations?
We focus on the feedback-free setting, noting that certain latent features can be extracted from stimulus streams without feedback. While there is feedback in the vertebrate retina and the invertebrate lamina, the processing is primarily feedforward. Previously, many feedforward network models were proposed to capture retinal computation accurately [Maheswaranathan & Baccus, 2023]. We will clarify these points in the future version of the manuscript. The potential impact of feedback would be interesting to explore in future work.
6. What are the computational requirements for neurons to perform the proposed coherent set detection? How does this scale with the dimensionality of the sensory input space?
What makes our approach so enticing is its biological plausibility. Currently, we focus on neurons processing one-dimensional stimuli, such as Drosophila lamina neurons and sensory and relay neurons of the olfactory bulb. So long as one uses a lag-vector basis, the computation required of each neuron is simply a linear temporal filter followed by a spiking nonlinearity, both of which could be implemented by ion channels with different time constants (see the sketch below). In future work, we will consider spatiotemporal stimuli, where the spatial component of the spatiotemporal filter could be implemented by a vector of synaptic weights.
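To illustrate how modest this requirement is, here is a minimal sketch of the proposed per-neuron computation (our own illustration; the stimulus, lag length, and filter shape are placeholders, not quantities from the manuscript):

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder 1-D stimulus stream (temporally smoothed noise).
T, L = 10_000, 40
stimulus = np.convolve(rng.standard_normal(T), np.ones(5) / 5, mode="same")

# Stand-in temporal filter; in the proposed framework this would be the
# learned subdominant singular vector.
t = np.arange(L)
w = np.exp(-t / 10.0) * np.sin(2 * np.pi * t / 20.0)
w /= np.linalg.norm(w)

# Per-neuron computation: one inner product with the lag vector per time
# step (a linear temporal filter), followed by rectification.
lag_vectors = np.lib.stride_tricks.sliding_window_view(stimulus, L)
projection = lag_vectors @ w[::-1]           # linear temporal filtering
on_response = np.maximum(projection, 0.0)    # detects one coherent set
off_response = np.maximum(-projection, 0.0)  # detects its complement
```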
7. How do authors distinguish the proposed coherent set framework from other established theories like efficient coding, sparse coding, or predictive coding that also explain temporal receptive fields and rectification?
Efficient coding is a general principle that aims to maximize the mutual information between stimuli and neural responses. In contrast, the aim of our work, like that of the information bottleneck method, is to extract the information in past stimuli about future stimuli most effectively. We can show in the Supplementary Material that the linear projection (but not the rectification) in our method is equivalent to the information bottleneck method for Gaussian distributions and linear systems; they may not be equivalent in more complex situations. In predictive coding, neurons output the prediction error, i.e., the difference between the optimal prediction and the actual signal. None of the above methods gives a principled derivation of rectification. Finally, sparse coding is typically static and discards the dynamical properties of the stimuli. As suggested by the reviewers, we will rewrite, expand, and rename the Relationship to Other Work section and place it after the Introduction.
Froyland, G. (2013). Physica D: Nonlinear Phenomena, 250, 1-19.
Maheswaranathan, N., McIntosh, L. T., Tanaka, H., Grant, S., Kastner, D. B., Melander, J. B., ... & Baccus, S. A. (2023). Neuron, 111(17), 2742-2755.
Thanks for the comprehensive response. Since the authors have well addressed my concerns, I will raise my score.
This paper proposes that sensory neurons function as detectors of coherent sets, which are defined here as maximally stable binary partitions of the input state space. The paper first formalizes coherent sets in terms of the subdominant singular vectors of the stochastic Koopman operator. It then shows that for OU dynamics the optimal detector is a linear projection onto the dominant singular vector, followed by positive or negative rectification (for detecting one set or its complement). For unknown and possibly nonlinear dynamics it then proposes a data-driven method based on past-future cross-covariance. Coherent sets for prediction are defined by thresholding in subspaces with contracting dynamics, while coherent sets for retrospection are defined by thresholding in subspaces with expanding dynamics. Analysis of temporal receptive fields of biological neurons shows how this framework can classify them into predictive versus retrospective cells.
Strengths and Weaknesses
Strengths:
Sophisticated mathematical approach. A clever and novel bridge between inference over stochastic processes and the temporal characteristics of real neurons.
Weaknesses:
Evidence is largely circumstantial. The theoretical framework offers an organization of existing data based on qualitative measures (i.e., the sign of the time constant of the orthogonal exponential function), but does it make quantitative new predictions?
Questions
I'm confused about the association of expanding regions (i.e., directions in state space) with retrospection and contracting directions with prediction. For example take the 2d OU process with A = [[1,0],[0,-1]] and D = [[1,0],[0,1]]. The dynamics are expanding along x_1 and contracting along x_2. IIUC, the coherent sets for prediction are {x: x_2 < 0} and {x: x_2 ≥ 0} while the coherent sets for retrospection are {x: x_1 < 0} and {x: x_1 ≥ 0}. This seems backwards. Isn't it much easier to predict sign(x_1) than sign(x_2)? For example if x_1(0) = 2 then there's a high probability that sign(x_1(t)) > 0 for any t > 0, because of the repulsive dynamics. In contrast, if x_2(0) = 2 then we can't reliably predict x_2(t) for t > 0, because the attractive dynamics will erase the initial conditions and the future trajectory will bounce back and forth around x_2 = 0.
What happens when the dynamics are repulsive (or attractive) in all directions? I believe B_tau (F_tau) is still well-defined. If a neuron projects to its top singular vector does the neuron just fail to make meaningful predictions (retrospections)?
I was convinced by the proof on Wikipedia but an academic citation would be more appropriate. Maybe the Bhatia & Rosenthal (1997) reference there would work.
I may not have been the ideal reviewer for this paper but I had to do a lot of secondary reading to understand the mathematical properties of several of the constructions (Perron-Frobenius operator, forward-backward and backward-forward operators, Galerkin projections). Some intuitive guidance in secs 3-4 would probably help other readers.
Limitations
yes
Final Justification
Sophisticated and well-conceived theory offering a novel perspective on neural data
Formatting Issues
none
We thank the reviewer for taking the time to provide thoughtful feedback on our manuscript. Below, we address each of the reviewer’s comments in turn.
1. Evidence is largely circumstantial. The theoretical framework offers an organization of existing data based on qualitative measures (i.e., the sign of the time constant of the orthogonal exponential function), but does it make quantitative new predictions?
Based on our OU analysis in >3-dimensional spaces, we expect to see neuronal temporal filters with more than two phases. We think that such observations are difficult because they are often obscured by noise and limited amounts of data. We predict that, as recording techniques improve and the amount of data grows, many more such multiphasic (>2) filters will be seen.
2. I'm confused about the association of expanding regions (i.e., directions in state space) with retrospection and contracting directions with prediction. For example take the 2d OU process with A = [[1,0],[0,-1]] and D = [[1,0],[0,1]]. The dynamics are expanding along x_1 and contracting along x_2. IIUC, the coherent sets for prediction are {x: x_2 < 0} and {x: x_2 ≥ 0} while the coherent sets for retrospection are {x: x_1 < 0} and {x: x_1 ≥ 0}. This seems backwards. Isn't it much easier to predict sign(x_1) than sign(x_2)? For example if x_1(0) = 2 then there's a high probability that sign(x_1(t)) > 0 for any t > 0, because of the repulsive dynamics. In contrast, if x_2(0) = 2 then we can't reliably predict x_2(t) for t > 0, because the attractive dynamics will erase the initial conditions and the future trajectory will bounce back and forth around x_2 = 0.
We are afraid that the provided example, as described, is backwards: {x: x_2 < 0} and {x: x_2 ≥ 0} should be the expanding coherent sets for retrospection, while {x: x_1 < 0} and {x: x_1 ≥ 0} should be the contracting coherent sets for prediction. When you focus on a set where x_2 has a fixed sign, the dynamics expand along the positive and negative directions of x_1. When you focus on a set where x_1 has a fixed sign, the dynamics contract toward the x_1 axis while remaining on one side of the x_2 axis.
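A quick Euler-Maruyama simulation illustrates the forward half of this point (our own sketch; all parameters are arbitrary): the sign of the expanding coordinate x_1 persists under the dynamics, which is what makes sets thresholded on x_1 predictive, while the sign of the contracting coordinate x_2 is quickly randomized.

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[1.0, 0.0],
              [0.0, -1.0]])            # expanding along x_1, contracting along x_2
dt, n_steps, n_traj = 0.01, 100, 5000  # horizon tau = 1

# Euler-Maruyama simulation of dx = A x dt + dW (D = I).
x0 = rng.standard_normal((n_traj, 2))
x = x0.copy()
for _ in range(n_steps):
    x = x + (x @ A.T) * dt + np.sqrt(dt) * rng.standard_normal((n_traj, 2))

# Fraction of trajectories whose coordinate keeps its initial sign at tau.
print("sign(x_1) preserved:", np.mean(np.sign(x[:, 0]) == np.sign(x0[:, 0])))
print("sign(x_2) preserved:", np.mean(np.sign(x[:, 1]) == np.sign(x0[:, 1])))
```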
3. What happens when the dynamics are repulsive (or attractive) in all directions? I believe B_tau (F_tau) is still well-defined. If a neuron projects to its top singular vector does the neuron just fail to make meaningful predictions (retrospections)?
If a critical point is purely unstable, it will not be visited by the autonomous dynamics. Hence it will not be physically relevant. Moreover, applying this approach to nearly isotropic attractive fixed points gives partitions that are not ‘distinguished’, to use Froyland’s (2013) terminology. These partitions do not represent a genuine clustering of states which could describe qualitatively different parts of the phase space. We will add this paragraph to the manuscript.
4. I was convinced by the proof on Wikipedia but an academic citation would be more appropriate. Maybe the Bhatia & Rosenthal (1997) reference there would work.
Thanks! We will adjust the reference.
5. I may not have been the ideal reviewer for this paper but I had to do a lot of secondary reading to understand the mathematical properties of several of the constructions (Perron-Frobenius operator, forward-backward and backward-forward operators, Galerkin projections). Some intuitive guidance in secs 3-4 would probably help other readers.
We will add a section to the Supplementary Material providing a short pedagogical introduction to transfer operators in the context of drift-diffusion models, as well as the Galerkin projections.
Froyland, G. (2013). Physica D: Nonlinear Phenomena, 250, 1-19.
Thanks for your replies. I'm having trouble with your answer 2 because I can't tell which part of my question you disagree with. Given A = [[1,0],[0,-1]] and D = [[1,0],[0,1]], do you agree that x_1 is an expanding direction and x_2 is a contracting direction? Do you agree that the coherent sets for prediction (e.g., as identified by eq 4) should be {x: x_1 < 0} and {x: x_1 ≥ 0} while the coherent sets for retrospection are {x: x_2 < 0} and {x: x_2 ≥ 0}? Maybe the confusion is about the terms "expanding coherent set" and "contracting coherent set". Would you call {x: x_2 < 0} expanding or contracting? I realize now that either answer is defensible (and I honestly don't know which you intend) because it's expanding in one direction and contracting in the other.
Yes, we agree that x_1 is an expanding direction and x_2 is a contracting direction. We also agree that {x: x_1 < 0} and {x: x_1 ≥ 0} are predictive coherent sets and {x: x_2 < 0} and {x: x_2 ≥ 0} are retrospective coherent sets. We would call {x: x_2 < 0} an expanding coherent set. You are right that for any coherent set in this example both an expanding direction and a contracting direction exist; thus, the expanding and contracting coherent sets are not distinguished by the radial trajectory relative to the origin. In our terminology, we call a coherent set expanding if the same past state can expand into different future states, and contracting if different past states can converge to the same future state. Thus, identifying contracting coherent sets enables prediction, while identifying expanding coherent sets enables retrospection. We will clarify this terminology in the future version.
Good to see we're in agreement about the important points. I would advise dropping expanding/contracting terminology for sets, but now that you see what had me so confused I'll leave it to you. Thanks for the very nice paper.
Stochastic processes are typically described by stochastic differential equations (SDEs) or (via the Fokker-Planck equation) a deterministic PDE for the probability density function. However, they can also be characterized in terms of "transfer operators" (Koopman, Perron-Frobenius, etc.) for some fixed amount of time evolution tau (as opposed to the infinitesimal evolution given by the PDEs). The advantage of these transfer operators is that they can be estimated directly from data: pick a finite set of basis functions; evaluate them at a set of samples X as well as their time-evolved (by tau) counterparts Y; and then compute all second-order statistics. The approximate Koopman operator, e.g., corresponds to the normal equations (the optimal linear description of the evolution of x in this basis); the forward-backward operator amounts to applying the normal equations once in the forward and once in the backward directions. Furthermore, the discrete-time, discrete-state counterpart to the SDE is a random walk on a graph; and consequently, spectral clustering methods have counterparts for SDEs, which are derivable from the transfer operators. Intuitively, the eigenfunction with the largest eigenvalue corresponds to the most coherent set of points under the trajectory. Of interest is the second eigenfunction, which corresponds to a partition of the state space into the two most coherent regions (the first eigenfunction includes the entire state space, I think).
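Concretely, this recipe might look like the following (my own generic EDMD-style sketch with an arbitrary ridge regularizer, not code from the MS; X and Y hold the basis functions evaluated at the samples and at their tau-evolved counterparts):

```python
import numpy as np

def estimate_operators(X, Y, reg=1e-6):
    """X, Y: (n_samples, n_features) basis evaluations at times t and t + tau."""
    n, d = X.shape
    Cxx = X.T @ X / n + reg * np.eye(d)  # second-order statistics (regularized)
    Cyy = Y.T @ Y / n + reg * np.eye(d)
    Cxy = X.T @ Y / n
    K_fwd = np.linalg.solve(Cxx, Cxy)    # forward normal equations (approx. Koopman)
    K_bwd = np.linalg.solve(Cyy, Cxy.T)  # backward normal equations
    return K_fwd, K_bwd

def coherent_partition(K_fwd, K_bwd):
    """Subdominant eigenvector of the forward-then-backward composition; its
    sign splits the state space into the two most coherent sets.
    (Operator-ordering conventions vary.)"""
    vals, vecs = np.linalg.eig(K_fwd @ K_bwd)
    order = np.argsort(-vals.real)
    return vecs[:, order[1]].real
```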
The present MS shows that, for Ornstein-Uhlenbeck (OU) processes ("a reasonable model of summed input to a neuron"), the second eigenfunctions of the forward-backward (F) and backward-forward (B) operators are linear projections of the state, which can be derived explicitly from the equations for the dynamics. The F eigenfunction corresponds to expanding coherent sets, and therefore tells us about low-dimensional structure in the past ("retrospective"), whereas the B eigenfunction corresponds to contracting coherent sets and tells us about low-dimensional structure in the near future ("predictive").
For neurons to extract this structure, they must estimate the dynamics directly from the data, along the lines sketched above. The authors propose that neurons project incoming data onto the second singular vector (as shown to be optimal for OU processes) via their vector of synaptic weights, and then rectify the result (or its additive inverse). The magnitude of this scalar output indicates how strongly the input belongs to one (or the other) of the two coherent sets. Neurons that project onto the F eigenvector are predictive; onto the B eigenvector, retrospective.
Finally, the authors consider neural data, letting the basis functions be simply a 1D variable at a set of discrete delays. A standard model of luminance input yields singular vectors that resemble the temporal filters of retinal cells in Drosophila. ON and OFF cells in the early visual system are explained as the rectified responses to the positive and negative projections of luminance. Tufted and mitral cells in the olfactory bulb are interpreted as predictive and retrospective, respectively. Consistent with this claim, temporal receptive fields for such cells are shown to be (in many cases) orthogonal to either growing or decaying exponentials (or, somewhat confusingly, both). Something similar can be said of retinal ganglion cells, although almost all of these are predictive.
Strengths and Weaknesses
This is a refreshingly different take on neural coding (at least to this reviewer's knowledge), and provides a powerful framework for interpreting the temporal filters of neurons that fits nicely with the electrophysiology of early sensory areas. The explanation of rectification is also compelling (although perhaps more of a precising of existing intuition than a brand new idea). The approach has the potential to be a very general explanatory principle for temporal receptive fields. Nevertheless:
Weaknesses:
- I am not wholly convinced that the sets of points that cohere in state space are the right low-dimensional signal. There is not much intuition provided about why (e.g.) the smoothed derivatives of Fig. 2B are good for extracting what is presumably the feature of interest, the luminance itself. (How would the filters change for a different model of luminance?)
- It would be nice to make a stronger connection to the historical theoretical work on this topic, especially the literature on efficient coding (as well as the information bottleneck in refs 1-4), particularly in the retina where the authors provide their own interpretation. (See e.g. the section on "Efficient Coding in Retina and Thalamus" in https://www.frontiersin.org/journals/computational-neuroscience/articles/10.3389/fncom.2022.929348/full). I realize the authors are short on space.
- The MS is not exactly self-contained (this reviewer had to read ref. 10 first, but perhaps that's my fault).
Questions
- How are we to interpret cells with filters orthogonal to both a decaying and a growing exponential?
- The purpose of the analytical result is (I take it) to show that the procedure is optimal for at least one plausible kind of stochastic dynamics. But in the data-driven case, wouldn't one use the linear projection onto the second singular vector even without this proof?
- The basis used in section 5 is sensible, but in general how does one choose the basis? It seems that it would be difficult to compare against electrophysiological data without having first nailed this down.
- Why use CCA in Section 5.2? (It seems that would be slightly different from finding the eigenvectors of the matrices in Eqn. 22.)
Limitations
yes
Final Justification
The authors propose a refreshingly different take on neural coding (at least to this reviewer's knowledge), and a powerful framework for interpreting the temporal filters of neurons that fits nicely with the electrophysiology of early sensory areas. The explanation of rectification is also compelling (although perhaps more of a precising of existing intuition than a brand new idea). The approach has the potential to be a very general explanatory principle for temporal receptive fields. My remaining reservation is that the manuscript may not reach all of its intended audience (computational neuroscientists) in part because of its novelty; but the authors have proposed to rectify some of this in the supplement with a "short pedagogical introduction to transfer operators."
Formatting Issues
none
We thank the reviewer for taking the time to provide thoughtful feedback on our manuscript. Below, we address each of the reviewer’s comments in turn.
1. I am not wholly convinced that the sets of points that cohere in state space are the right low-dimensional signal. There is not much intuition provided about why (e.g.) the smoothed derivatives of Fig. 2B are good for extracting what is presumably the feature of interest, the luminance itself. (How would the filters change for a different model of luminance?)
What we show with the example OU process is that, for a locally linear dynamical system, this algorithm extracts the coherent sets of the full state of the system. The instantaneous luminance itself is not the full state of the system. Rather, it is the local dynamics of the luminance (encoded in the correlation of the lag vectors) that the neuron has to cluster. For instance, the same luminance value can initiate an ON edge if the luminance is locally increasing or an OFF edge if the luminance is locally decreasing. In fact, the two coherent sets in our example correspond to ON and OFF edges, which are important latent variables as they indicate the passing of an object boundary over the photoreceptors. The smoothed derivatives of Fig. 2B are optimal for extracting that signal when restricted to the linear setting; their exact shape adapts to the statistics of whatever (locally linear) model of luminance is provided. We will add a paragraph with this clarification to the text and plot the responses of the model ON and OFF neurons to the luminance trace in Figure 2 to illustrate this point.
2. It would be nice to make a stronger connection to the historical theoretical work on this topic, especially the literature on efficient coding (as well as the information bottleneck in refs 1-4), particularly in the retina where the authors provide their own interpretation. (See e.g. the section on "Efficient Coding in Retina and Thalamus" in https://www.frontiersin.org/journals/computational-neuroscience/articles/10.3389/fncom.2022.929348/full). I realize the authors are short on space.
Thanks for the reference! We will cite it in the revised manuscript. Efficient coding is a general principle that aims to maximize the mutual information between stimuli and neural responses. In contrast, the aim of our work, like that of the information bottleneck method, is to extract the information in past stimuli about future stimuli most effectively. We can show in the Supplementary Material that the linear projection (but not the rectification) in our method is equivalent to the information bottleneck method for Gaussian distributions and linear systems; they may not be equivalent in more complex situations. As suggested by the reviewers, we will rewrite, expand, and rename the Relationship to Other Work section and place it after the Introduction. In particular, we will address the work of Dong and Atick (1995) in that rewrite. Briefly, while their explanation of the lagged and non-lagged cells relies on nonlinearity, ours does not.
3. The MS is not exactly self-contained (this reviewer had to read ref. 10 first, but perhaps that's my fault).
We will add a section to the Supplementary Material providing a short pedagogical introduction to transfer operators in the context of drift-diffusion models.
4. How are we to interpret cells with filters orthogonal to both a decaying and a growing exponential?
Such filters could correspond to a saddle point in a >2-dimensional space with 2 unstable directions. These filters would have to be orthogonal to the other unstable direction and a stable direction.
To better explicate this analysis in the manuscript, we plan to run an analysis of a known OU process in the lag vector coordinates, compute the CCA filters, and repeat the orthogonality analysis. The insights gained from this analysis will be added to the revised main text.
5. The purpose of the analytical result is (I take it) to show that the procedure is optimal for at least one plausible kind of stochastic dynamics. But in the data-driven case, wouldn't one use the linear projection onto the second singular vector even without this proof?
Solving the OU process analytically relates the subdominant singular function of the SKO (whose sign partitions the phase space into coherent sets, so-called spectral clustering) to the subdominant singular vector, which can be extracted from data. Specifically, we show that the subdominant singular function is an inner product of the subdominant singular vector with the state. The Galerkin projection onto a feature basis, such as lag vectors, generalizes this relation to nonlinear dynamics.
6. The basis used in section 5 is sensible, but in general how does one choose the basis? It seems that it would be difficult to compare against electrophysiological data without having first nailed this down.
For a partially observed OU system, time-delay embedding with a lag-vector length greater than or equal to the order of the system fully captures its dynamics (De Persis & Tesi, 2019). Furthermore, we analytically proved that the subdominant singular function of an OU process lies in the space spanned by linear functions. When applying our method to early sensory processing, we compute such a basis from sensory stimuli and find it satisfactory. If the reviewer is asking about choosing a correct basis of stimulus features for more central neurons, we agree that this is a difficult problem. Instead, we anticipate that lag vectors constructed from the activity of upstream neurons would provide an appropriate basis.
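For illustration, here is a minimal sketch of the lag-vector (time-delay embedding) construction, with hypothetical window length L and horizon tau (our own illustration, not code from the manuscript):

```python
import numpy as np

def past_future_lag_vectors(signal, L, tau):
    """Pair each past lag vector (samples t-L+1..t) with the future lag
    vector starting tau steps later (samples t+tau..t+tau+L-1). L and tau
    are modeling choices; for a partially observed linear system, L should
    be at least the order of the system."""
    windows = np.lib.stride_tricks.sliding_window_view(signal, L)
    shift = L - 1 + tau  # offset between paired past/future windows
    return windows[:-shift], windows[shift:]

# Example with a white-noise stand-in for a stimulus stream.
signal = np.random.default_rng(2).standard_normal(1000)
past, future = past_future_lag_vectors(signal, L=20, tau=5)
assert past.shape == future.shape
```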
7. Why use CCA in Section 5.2? (It seems that would be slightly different from finding the eigenvectors of the matrices in Eqn. 22.)
The eigenvectors of the matrices in Eqn. 22 are equivalent to the canonical directions, just as the eigenvectors of the Gramian of a matrix equal the singular vectors of that matrix. We will clarify this connection in the revised manuscript.
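This equivalence is straightforward to check numerically. Below is a toy sketch with random stand-in data; the whitened cross-covariance plays the role of, but is not taken from, the matrices in Eqn. 22:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((5000, 6))  # stand-in past features
Y = 0.5 * X @ rng.standard_normal((6, 6)) + rng.standard_normal((5000, 6))

Cxx, Cyy, Cxy = X.T @ X / 5000, Y.T @ Y / 5000, X.T @ Y / 5000

# CCA route: SVD of the whitened past-future cross-covariance.
Lx, Ly = np.linalg.cholesky(Cxx), np.linalg.cholesky(Cyy)
W = np.linalg.solve(Lx, np.linalg.solve(Ly, Cxy.T).T)  # Lx^{-1} Cxy Ly^{-T}
U, s, Vt = np.linalg.svd(W)

# Gramian route: eigendecomposition of W W^T (composing the forward and
# backward directions).
vals, vecs = np.linalg.eigh(W @ W.T)

# Eigenvalues of the Gramian are the squared canonical correlations, and the
# top eigenvector matches the top left singular vector up to sign.
assert np.allclose(np.sort(vals), np.sort(s**2))
assert np.allclose(np.abs(vecs[:, -1]), np.abs(U[:, 0]), atol=1e-6)
```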
Dong, D. W., & Atick, J. J. (1995). Network: Computation in Neural Systems, 6(2), 159.
De Persis, C., & Tesi, P. (2019). IEEE Transactions on Automatic Control, 65, 909-924.
Thank you for the thorough replies (and I apologize for responding so late). The authors have answered most of my questions and I retain my recommendation of "accept." (I refrain from a "strong accept" since the official description is very strong.)
The authors propose a novel perspective on the functional role of sensory neurons, as "coherent set detectors" that identify regions of the stimulus space that evolve cohesively over time. By modelling neurons as encoding latent variables of the stochastic dynamical systems that govern sensory stimuli, the authors show that neurons can detect coherent sets through spectral clustering of the stochastic Koopman operator underlying the dynamical system. For OU processes, the authors show that the subdominant singular basis functions reduce to linear projections. The framework can additionally be extended to nonlinear dynamics with Galerkin projections, with CCA used for interpreting real neural data.
Strengths and Weaknesses
Strengths.
- The authors propose a novel theoretical framework connecting dynamical systems theory to neural computation through coherent set analysis, which is an under-explored question in understanding functions of sensory neurons.
- The derivation for OU processes is technically sound, with clear progression from the Koopman operator to implementable neural operations.
- The paper is well-written, with clear demonstrations and easy-to-follow derivations, despite the mathematical complexity.
Weaknesses.
- The extension to non-linear dynamics is largely data-driven and largely based on a stationarity assumption, whereas real neural dynamics almost always violate this assumption.
- Neuron classification based on orthogonality to exponentials lacks rigorous statistical testing, with no comparison to null hypothesis/model.
Questions
- How could neurons biologically learn the required singular vectors? The whitening step requires global covariance - how is this reconciled with local learning rules?
Limitations
See above.
Final Justification
Raised scores post-rebuttal as the authors have provided additional clarifications.
Formatting Issues
N/A
We thank the reviewer for taking the time to provide thoughtful feedback on our manuscript. Below, we address each of the reviewer’s comments in turn.
1. The extension to non-linear dynamics is largely data-driven and largely based on a stationarity assumption, whereas real neural dynamics almost always violate this assumption.
We agree with the reviewer’s point: non-stationarity presents a significant challenge for algorithmic models of neural activity. A common workaround is to assume local stationarity, that is, the distribution of inputs changes slowly compared to the timescale of the dynamics. In this case, we can continuously update the estimate of the dynamics using exponential forgetting (see, e.g., B. Widrow and S. D. Stearns, Adaptive Signal Processing, Prentice Hall, 1985). For example, visual input from ambient light may remain relatively stable while an agent is indoors, only shifting to a different distribution once the agent steps outside. Our method can accommodate this type of non-stationarity by computing Eqs. 19-20 locally in time, using a temporal filter that discounts older observations. This allows the estimated covariances to track gradual changes in the underlying distribution without assuming global stationarity. We will add this clarification to the manuscript.
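For concreteness, one such running update could look like this (a sketch of the exponential-forgetting idea; the forgetting factor is a free parameter setting the adaptation timescale, and Eqs. 19-20 would then be evaluated with the running estimates):

```python
import numpy as np

def update_running_covariances(C_pp, C_pf, past, future, forget=0.999):
    """One exponentially forgetting update of the past-past and past-future
    covariance estimates; the effective memory is roughly 1 / (1 - forget)
    samples, so older observations are gradually discounted."""
    C_pp = forget * C_pp + (1 - forget) * np.outer(past, past)
    C_pf = forget * C_pf + (1 - forget) * np.outer(past, future)
    return C_pp, C_pf
```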
2. Neuron classification based on orthogonality to exponentials lacks rigorous statistical testing, with no comparison to null hypothesis/model.
We agree that more rigorous statistical validation is important and aim to incorporate such testing in future work. However, our analyses were restricted by our use of previously published data. Neural data were supplied in the form of temporal receptive fields for mitral/tufted cells and spatiotemporal receptive fields for RGCs, without a record of individual stimulus/response trials. RGC receptive fields were estimated by the original authors using reverse correlation of white noise; receptive fields for mitral/tufted cells were estimated by the original authors from the best responses to brief odor pulses. Given the structure of our current dataset, it is unclear what form of statistical testing would be both valid and informative. In particular, we lack measures of variability and cannot construct a meaningful null model. If the reviewer has specific suggestions for how to proceed despite these limitations, we would be happy to apply the proposed tests to our data.
To better validate this hypothesis, we plan to run the same analysis on filters generated using a lag-vector based Galerkin expansion of a simple OU process. We expect to see similar classes of filters that are orthogonal to exponentials at specific timescales set by the dynamics of the OU process. The insights gained from this analysis will be added to the revised main text.
3. How could neurons biologically learn the required singular vectors? The whitening step requires global covariance - how is this reconciled with local learning rules?
An online CCA algorithm implemented by neural networks with biologically plausible local learning rules was previously derived by optimizing a similarity matching objective (Lipshutz et al., 2021). This approach jointly determines both the neuronal activity dynamics and synaptic update rules, while implicitly performing whitening. Future work will aim to extend this framework—originally developed for static CCA between two concurrent data streams, where sufficient statistics are stored in synaptic weights—to the setting of past-future CCA, in which the sufficient statistics take the form of spatiotemporal filters. We anticipate that incorporating temporal structure will not pose significant challenges, as it can be implemented locally within each neuron through the use of distinct ion channels with different time constants. Another difference from the original CCA formulation is that the neuronal output corresponds to a projection of only past inputs onto the canonical direction, with future inputs used exclusively for learning. However, a similar configuration has been addressed in the static case (Golkar et al., 2020), and we therefore do not expect it to present major difficulties.
Lipshutz, D., Bahroun, Y., Golkar, S., Sengupta, A. M., & Chklovskii, D. B. (2021). Neural Computation, 33(9), 2309-2352.
Golkar, S., Lipshutz, D., Bahroun, Y., Sengupta, A., & Chklovskii, D. (2020). Advances in Neural Information Processing Systems, 33, 7283-7295.
The authors' reply has provided additional clarification for the paper, I have raised my scores accordingly.
This paper proposes a novel method for understanding early sensory encoding in terms of transfer operators in nonlinear dynamics. In contrast with efficient or predictive coding, which focus on information compression or future prediction of stimuli, this work views populations of early sensory neurons as detectors of snippets of dynamics. By assuming that summed sensory inputs to neurons follow a multivariate Ornstein-Uhlenbeck process, the authors establish that the optimal solution to the general problem -- projection onto the subdominant singular vector of the stochastic Koopman operator -- can be implemented by linear projection and threshold rectification. They then apply this method to luminance and olfactory data, finding close matches to the structure of coding in the retina and olfactory bulb, respectively.
Reviewers found the approach highly novel and clearly written, though the material may be quite technical for many neuroscientists. The method presents some problems for a biologically plausible implementation, though the authors provide some suggestions by which the required operations (learning dynamics from data, learning the subdominant eigenfunction) could be carried out. Reviewers also suggested that the authors clarify the relationship between their proposed approach and others, like efficient coding, that also explain many aspects of early sensory coding.
During discussion, the authors made several clarifications on the method and its potential implementation, largely satisfying reviewer concerns from the first round.
Overall, this is a well-executed and highly novel contribution to the understanding of early stage sensory neurons as encoders of complex dynamical stimuli.