Binding in hippocampal-entorhinal circuits enables compositionality in cognitive maps
Abstract
Reviews and Discussion
The paper proposes a model for spatial representation in the hippocampal formation using a residue number system (RNS) to encode positions as high-dimensional vectors. These vectors are combined into a single representation through vector binding and maintained by a modular attractor network. The model demonstrates robustness to noise, high-resolution encoding, and effective path integration.
Strengths
- Solid theory behind every addition to the RNS model to capture HF functionality
- Testable hypotheses for experiments
- The limitation regarding the bio-plausibility of the proposed model is acknowledged in the discussion
Weaknesses
- There are several mentions of compositionality as the motivation, but there is no direct analysis showing the model's effectiveness at compositionality
- There is virtually no comparison to other models in terms of coding range, robustness to noise and compositionality
- Code is not provided
Questions
- How does the time-scale of unit responses compare to that of actual neurons? Is the attractor model fast enough to track changes in the environment?
- The potential prediction for encoding of episodic memory in this framework is not clear to me.
Limitations
- Comparison to existing methods, or an ablation study on the proposed method to verify the intended computational role for each component
- Test of the cognitive map aspect of HF
Thank you very much for your thoughtful review and accurate summary of the paper's main points. We concur that the strengths you list, namely theory (proofs and experimental validation) for every part of the RNS model, along with testable hypotheses for experimental neuroscience, are the core results of the paper. We would like to clarify a few points regarding the weaknesses/limitations of the paper and answer your questions.
Code is not provided
We would like to politely mention that code to replicate experiments was included in the Supplementary Material zip of the submission.
There are several mentions of compositionality as the motivation, but there is no direct analysis showing the model's effectiveness at compositionality
Thank you for raising this issue. We'd like to use this opportunity to more clearly define what we mean by compositionality and how the model achieves it.
We use the term 'compositionality' to denote a design principle of the HF model. Compositionality refers to complex representations that are composed from simpler building blocks according to formation rules [A11]. This means that even a small number of primitives can be richly expressive, because they have a large combinatorial range. The representations we posit in the hippocampal formation meet this definition: they are composed of modules representing residue numbers and of contextual tags, which are combined by binding operations. These design choices result in a large and robust combinatorial range, as we show theoretically and empirically.
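For concreteness, the following minimal sketch (illustrative only, not the code from our Supplementary Material; the dimension and moduli are arbitrary) shows this construction: residue vectors from a few small modules are bound into a single position code, and distinct positions yield nearly orthogonal composites.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 1000                 # number of units per vector
moduli = [3, 5, 7]       # pairwise-coprime periods; coding range = 3*5*7 = 105

# One random phasor codebook base per module. An integer residue r is
# encoded by raising the base to the r-th power (rotating each phase r times).
bases = [np.exp(1j * rng.uniform(-np.pi, np.pi, D)) for _ in moduli]

def encode(x):
    """Compose a position code by binding (element-wise multiplying)
    the residue vectors of x across all modules."""
    vecs = [b ** (x % m) for b, m in zip(bases, moduli)]
    return np.prod(vecs, axis=0)

# Distinct positions map to nearly orthogonal composite vectors:
print(abs(np.vdot(encode(10), encode(10))) / D)  # 1.0   (same position)
print(abs(np.vdot(encode(10), encode(11))) / D)  # ~0.03 (different position)
```

The point of the sketch is the combinatorics: periods of only 3, 5, and 7 already yield 105 distinct, mutually dissimilar position codes, and binding with an additional context vector extends the same construction to contextual tags.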
At the same time, we recognize that the term 'compositionality' has different meanings across machine learning, cognitive science, and neuroscience. We are also aware that there are interesting cases of compositionality that our model does not address. These include compositional generalization to out-of-distribution test sets, and compositional generation of novel strategies, as studied in program induction.
To make these points explicit, in the final version we will include this definition of compositionality when the term is introduced (around lines 19-20 in the current PDF), and we will discuss these other related senses of compositionality under limitations in the discussion (around lines 304-315).
There is virtually no comparison to other models in terms of coding range, robustness to noise and compositionality
Please see our discussion in the general rebuttal on model comparison. In that section, we have explicitly aimed to address the coding range, robustness to noise, and compositionality points of comparison.
We should also note that no other related work attempts to measure coding range (of modular attractor networks implementing RNS), robustness, and compositionality for a single model of the hippocampal formation. Consequently, we've done our best to indicate points of comparison when appropriate for individual experiments. We also feel that the theoretical foundations and interpretability of the model provide potential advantages over other models that could be proposed in the future.
How does the time-scale of unit responses compare to that of actual neurons? Is the attractor model fast enough to track changes in the environment?
In our model, the unit responses are on the order of 100 ms. We believe this is a biophysically reasonable parameter: a 100 ms update corresponds to roughly one cycle of a 10 Hz rhythm, within the theta band (4-12 Hz) believed to be important for neural computation in HF. Empirically, we find that this timescale is sufficient to track changes quickly enough for reliable path integration.
The potential prediction for encoding of episodic memory in this framework is not clear to me.
Thank you for raising this point. Page constraints made it a bit difficult to discuss this point in the main text, but we will gladly add this information to Sec. 4.2 and the Discussion in a revised version.
The larger picture is that many neuroscientists believe that the two functions of the hippocampus -- in episodic memory and in spatial navigation -- are supported by the same neural circuits and principles (e.g., [A12]). Put more starkly, spatial navigation and episodic memory are two sides of the same coin: one is navigation in real physical space, the other is navigation in a more abstract, conceptual space.
If this unified picture of navigation and memory is correct, then our model would have implications for how memories are structured and stored in HF. The experiments on sensory recall (Sec. 4.2, Fig. 7) are designed to explain how imperfect memories corresponding to sensory patterns can be denoised by the hippocampal-entorhinal loop. Further, in the Appendix (Sec. C.3 and Fig. S5), we test the model's ability to recall sequences of concepts, even in the presence of neural noise.
Regarding limitations:
Comparison to existing methods, or an ablation study on the proposed method to verify the intended computational role for each component
Please see our discussion of model comparison in the general rebuttal.
The ablation study that you mention is a helpful suggestion. In the PDF attached to the general rebuttal, we have conducted an additional ablation experiment (Fig. X3) to quantify the impact of individual components on performance.
[A11] Szabó, Z.G. (2022). Compositionality. The Stanford Encyclopedia of Philosophy.
[A12] Buzsáki, G., & Moser, E. I. (2013). Memory, navigation and theta rhythm in the hippocampal-entorhinal system. Nature neuroscience, 16(2), 130-138.
This paper proposes a model for spatial representations in the hippocampal formation. The model relies on a residue number system for encoding spatial positions and uses complex-valued vectors to represent individual residues. These vectors are then combined into a unified vector representing spatial position through a conjunctive vector-binding operation that preserves similarities. The model ensures consistency between individual residues and overall position representation through a modular attractor network, which corresponds to the grid cell modules observed in the entorhinal cortex.
Strengths
While there has been an ample amount of work addressing the computations in hippocampal formation, this paper introduces several interesting ideas and combines them into a comprehensive framework. This model integrates principles of optimal coding, such as maximizing coding range and spatial information per neuron, with an algebraic framework for computation in distributed representation.
Weaknesses
While theoretically valuable, the approach remains relatively high-level without a realistic evaluation and comparison with behavioral or neural data.
Questions
Can you clarify what you mean by "carry-free" hexagonal coding?
Can you reflect on scalability in terms of the number of neurons? For example, regarding the statement "In particular, we require that distinct integer values are represented with nearly orthogonal vectors": how does this requirement affect the scalability of the approach?
What is the numerical precision required for numerical stability in terms of the neural activity and the synaptic weights? Can you discuss how realistic this approach is in the context of real neurons with firing rates under 100 Hz?
Limitations
The authors have adequately addressed the limitations.
Thank you for your accurate summary of our work and fair assessment of its strengths, weaknesses, and limitations. We appreciate that you found the ideas to be interesting and comprehensive.
We agree that our modeling approach is "relatively high-level", and that such an approach comes with strengths and weaknesses. We would emphasize that our motivation for this abstraction is to include all of the components of the HF needed to have a minimal working algorithm, and no more. In other words, we can justify the functional relevance of each region and computational step, without introducing "bells and whistles" that might dilute that message.
In the general rebuttal to all reviewers, we have included some additional ways in which our model can be evaluated relative to neural data. These are meant to contextualize the predictions for neuroscience that are outlined in the discussion.
Regarding your specific questions:
Can you clarify what you mean by "carry-free" hexagonal coding?
Good question. Upon reflection, we will revise "carry-free hexagonal encoding" (lines 213-4) to "carry-free implementation of a triangular frame", in order to make the meaning of this statement clearer. For completeness, we also provide some further clarification below.
In general, carry-free means that the components of the representation can be updated in parallel, i.e., without dependence on the results of other updates. For example, binary representations are not carry-free with respect to addition, since the final state of a bit depends on two components and the results of computation ("carry-over") on lower-order bits.
For the triangular frame discussed in Section 3.5 and Appendix A.3, each 2-D position is represented with three coordinates, each taking integer values modulo the module period. To illustrate why these computations are not carry-free in general, suppose we update the coordinate [1, 1, 0] by [0, 0, 1]: the resulting coordinate is [1, 1, 1]. This is actually equivalent to the 2D position expressed by the coordinate [0, 0, 0], but we wouldn't know that without further computation. Thus, we'd either need further "carry-over" operations to reduce states to one member of an equivalence class, or we make equality testing cumbersome.
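To make the contrast concrete, here is a toy sketch (illustrative only, not from our supplement) of why residue arithmetic itself is carry-free: each module's update depends only on its own state.

```python
import numpy as np

moduli = np.array([3, 5, 7])   # coprime module periods; coding range 105

def rns_add(a_res, b_res):
    # Carry-free: every module updates independently and in parallel;
    # no module waits on the result of another module's computation.
    return (a_res + b_res) % moduli

a, b = 38 % moduli, 25 % moduli      # residues [2, 3, 3] and [1, 0, 4]
print(rns_add(a, b))                 # [0, 3, 0]
print((38 + 25) % moduli)            # [0, 3, 0] -- residues of 63 agree

# By contrast, adding two binary or decimal numbers digit-by-digit needs a
# carry chain: a high-order digit is not final until the carries from all
# lower-order digits have propagated.
```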
Can you reflect on scalability in terms of the number of neurons? For example, regarding the statement "In particular, we require that distinct integer values are represented with nearly orthogonal vectors": how does this requirement affect the scalability of the approach?
We appreciate the question and opportunity to clarify.
In a $D$-dimensional space, linear algebra dictates that there are only up to $D$ exactly orthogonal vectors. Our method relies on the fact that there are many more vectors that are almost orthogonal; that is, they have a non-zero, but still small, inner product. This idea is fundamental to the theory of dimensionality reduction in machine learning, and indeed, the kinds of random codes we employ here are commonly used for this purpose (e.g., [A7]).
Our analysis in Theorem 1 (Appendix A.1) implies that, to represent a set of $K$ distinct states using vectors whose inner product is (with high probability) at most $\epsilon$, it suffices to take $D = \mathcal{O}(\log K / \epsilon^2)$. Note that the dependence of the dimension on $K$ (the total size of the universe) is just logarithmic! This result is consistent with a wide body of other work in the machine learning literature that obtains similar rates for coding schemes of this nature [A7, A8].
From a practical perspective, this means that a relatively modest number of neurons (as we use in our experiments) can achieve a large dynamic range.
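As a small illustration of this concentration effect (a sketch with arbitrary illustrative sizes, not our experiment code): even with twice as many random phasor codewords as dimensions, all pairwise similarities stay small, on the order of $\sqrt{\log K / D}$.

```python
import numpy as np

rng = np.random.default_rng(1)
D, K = 1000, 2000     # dimension and number of states; note K > D

# K random phasor codewords, scaled so that self-similarity equals 1
V = np.exp(1j * rng.uniform(-np.pi, np.pi, (K, D))) / np.sqrt(D)

G = np.abs(V @ V.conj().T)      # matrix of pairwise |inner products|
np.fill_diagonal(G, 0.0)
print(G.max())                  # ~0.12: small, even though K exceeds D
print(np.sqrt(np.log(K) / D))   # ~0.087: the sqrt(log K / D) scale above
```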
What is the numerical precision required for numerical stability in terms of the neural activity and the synaptic weights? Can you discuss how realistic this approach is in the context of real neurons with firing rates under 100 Hz?
This is an important question. We have a few comments:
- The von Mises noise experiments (in Figs. 3 and 4) provide an implicit answer, because adding phasor noise directly limits the precision that can be reliably maintained.
- To answer your question more explicitly, we have conducted a follow-up experiment testing the effect of quantizing to a small number of bits per synaptic weight. Please see Figure X2 in the PDF attached to the general rebuttal (we will also add it to the supplement). We've found that a) 5 bits is nearly as good as full precision, b) even 3 bits still performs well, and c) higher precision faces diminishing returns, in a way that higher dimensionality does not (a minimal sketch of this effect follows this list). This last point is consistent with prior work on quantization in theoretical computer science (e.g., [A9, A10]), and with the observation that biological neural networks have high dimension but low-precision components.
- Nothing in the model requires high firing rates to function.
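The sketch below (hypothetical, not the Figure X2 experiment itself) illustrates the flavor of this result by rounding each codebook component's phase to a small number of bits and measuring similarity to the unquantized vector.

```python
import numpy as np

rng = np.random.default_rng(2)
D = 1000
v = np.exp(1j * rng.uniform(-np.pi, np.pi, D))  # a full-precision phasor vector

def quantize_phase(z, bits):
    """Round each component's phase to one of 2**bits evenly spaced levels."""
    step = 2 * np.pi / 2 ** bits
    return np.exp(1j * step * np.round(np.angle(z) / step))

for bits in (2, 3, 5, 8):
    fidelity = np.abs(np.vdot(v, quantize_phase(v, bits))) / D
    print(bits, round(fidelity, 3))   # ~0.90, ~0.97, ~0.998, ~1.0:
                                      # rapid saturation beyond a few bits
```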
[A7] Rahimi & Recht (2007), NeurIPS
[A8] Dasgupta & Gupta (2003), Rand. Struct. Algor.
[A9] Clarkson & Woodruff (2009), ACM STOC
[A10] Zhang, May, Dao, Ré (2019), AISTATS
The paper proposes a computational model that incorporates a number of properties of the encoding of spatial representation in the system. The mathematical framework appears to be well-justified and to carry the desired properties. These properties are related to some of the observations made about the properties of space encoding in the hippocampus; however, I've found that the paper conflates conceptual similarity (a certain mechanism seems to have certain properties) and computational/mechanistic similarity (a certain brain mechanism is actually computing like the model suggests), but more about this later.
The way I see this work is that it presents an elegant case for encoding of information in a mathematical/algebraic sense, but I struggle to connect these properties, and the way the system is evaluated, to biology. Put another way: I am not sure why this particular computational model is a good model for HC. How do we even evaluate if it's close to what the brain is doing? Or maybe this aspect is actually not important to the authors, and the main contribution of this work lies elsewhere? I must admit I might have misunderstood the motivation and the goal of this work, and I will reflect this in my confidence level. To do a better job in the subsequent round of evaluation, I would like the authors to explain how they understand the importance of this work. What is the main thing that it brings to the table, apart from mathematical elegance?
The evaluation of the model is based on simulated trajectories, but the comparison is done "within" the proposed model and does not provide external points of reference to allow the reader to understand whether the model is better than some other ones, and in what regard. Is it empirically better or worse at explaining known biological quirks of the HF, or was the goal only to model conceptual similarity on an abstract level?
Strengths
- The math is rigorous and there is a clear sense that the constructed mathematical framework is a good match to desired properties.
Weaknesses
- The task on which the model is tested (inputs, outputs, goals) was not clearly defined in the paper. From Section 4 we know that it is about path integration, and there are simulated trajectories (generated according to behavioural rules of animals) that the model is compared against, but
- I think there is a dissonance between the claims of the paper regarding the neuroscientific impact of this model and the actual comparisons between the model and biology that are brought forward in the paper. If I am correct that these are actually pretty loose, then from here we logically move on to the next question: if the importance is not in that, then what is it in?
- The evaluation of simulated trajectories is not too informative, because it is unclear how trivial or non-trivial it is to show the match between simulated trajectories and the model following them.
Questions
(1) The proposed model is based on arithmetic, element-wise operations over vectors, and modulo operations... Is the claim here that computations similar to these ones are happening in HC, or are those just some operations that satisfy a number of properties? Basically, do you want to say that the mechanism of the model is close to HC, or that just some of the observed characteristics of the model are close to HC?
(2) Following up on (1): the closeness between the model and HC is, as far as I can tell, only conceptual, right? There was no comparison made against actual empirical measurements of a biological HC during some task?
122: Could you please elaborate on what you mean by "grid modules"? I understand that this is different from grid-like spatial activation patterns. Are you referring to grid cells representing different scales, each scale being a "module"?
(3) Why is the comparison between the trajectories done using simulated animal trajectories and not actual ones? I understand that it's impossible to model such a chaotic system as a real mouse running in a grid... but if that is the case, what is the benefit of trying to predict trajectories at all, and of using simulated trajectories based on some rules of animal behaviour? I guess I am confused about the chosen way to compare the model with biology.
Figure 6A (1): What exactly is the baseline model that is marked as "without attractor dynamics"?
Figure 6A (2): How come the decoded trajectory matched the true one so closely? Is this result impressive or is it trivial in the context of how the model works and how true trajectories were generated?
(4) The predictions listed on lines 316-323 - are they true in the biological system? (A) Multiplicative interactions between dendritic inputs providing the conjunctive binding operation - this one seems to be at the core of the potential achievements of the model, but it is very superficially explained. I think it would be great to have a more extensive explanation of what "multiplicative interactions between dendritic inputs providing conjunctive binding operation" actually is, how we can see it manifest in biology, and, after that, how your model achieves it. (B) "Binding between MEC modules" - what is specifically meant by conjunctive composition and binding in a neuroscience context as it pertains to your work? Because if we only mean to say that the brain combines inputs of the modules, then sure, that's trivial, and the fact that a model also does that is kind of expected. If you mean some specific mechanism or form of conjunctive composition - then what is it? How does it manifest in the biological HF? Does your model do it in the same way? How can we assess that? (C) "Relatively fixed attractor weights, plastic HC->sensory weights" - while these are properties of the model, are they properties of the brain?
Limitations
The authors extensively address the limitations of this work, and this provides valuable context for understanding its significance.
Thank you for your thoughtful review. We appreciate your agreement about the rigor and suitability of the mathematical framework. Apologies if responses seem terse; we've tried to give clear explanations within the word count.
To do a better job ... What is the main thing that it brings to the table, apart from mathematical elegance?
Our work gives a theory of computations (representations and algorithms) in the HF. We posit that the HF instantiates our model's recurrent dynamics, population codes, and binding mechanisms.
These lead to testable predictions for experimental neuroscience and ascribe function to existing observations about neural data. The model's robustness (Figs. 3 & 4) could help explain the brain's robustness. We also list core contributions in the general rebuttal.
the paper conflates conceptual ... and computational/mechanistic similarity
We agree that the two are distinct. Our contributions to each are also distinct:
- Our model is conceptually similar because the five principles of spatial representation (outlined in Sec. 2.1) are the same for our model and (we believe) for HF. However, the intention is not to stop at mere conceptual similarity, but rather to motivate our proposed mechanisms.
- More significantly, the model has computational/mechanistic similarity. The variables and computations of the modular attractor network (detailed in Sec. 2.2) map onto specific neural populations and circuit-level dynamics. Sec. 2.3 maps the model's parts to neuroanatomy.
Binding and its realization in HF circuits is "at the core of potential achievements". We discuss binding further in the general rebuttal and re: question (4).
Thus, we don't think the paper conflates the two, nor is it exhibiting only conceptual similarity.
The evaluation ... better than some other ones?
Please refer to our discussion of model comparison in the general rebuttal.
Weaknesses:
The task ... was not clearly defined
These were defined in Appendix B: Experimental details. We run four categories of empirical tests, each described in a subsection of the appendix. Path integration is just one evaluation.
Still, we anticipate that the spirit of this comment is to make these clearer in the main text. We will definitely revise the main text accordingly.
a dissonance ... if the importance is not in that, then what is it in?
We respectfully disagree that the connections to biology are loose. Please refer to the discussion of computational/mechanistic similarity and response to question (2).
The evaluation of simulated trajectories...
Please refer to our responses for Question (3) / Figure 6A.
Questions: Re (1): The claim is the former: that the computations resemble those in the HF.
Re line 122: A grid module is a population of grid cells in MEC that have approximately the same "scale" -- the spacing between the firing field peaks that form the hexagonal lattice. Grid modules are seen as functionally significant since (a) scales appear discretized in experimental data, and (b) a neuron's scale correlates strongly with its anatomical location [9].
Re (2): There are two comparisons. We give examples of grid response fields from our model that resemble neural data (Fig. 6C, Fig. S1). Our model also recreates experimental data regarding global remapping (Appendix C.2, Figure S2). We would be excited to compare our model to further neural data, but we also believe that it is beyond the scope of this paper.
Re (3): a) It is a standard dataset used in computational neuroscience; recent examples include [37, A6]. b) There are significant practical advantages: synthetic data gives more control over room sizes, trial lengths, and numbers of trials. c) The dataset is realistic: the authors used statistics of actual rodent trajectories and validated simulation quality [36].
Re Fig. 6A (1): For the model without attractor dynamics, the denoising update in Equation 7 is removed, so the state is propagated by path integration alone.
Re Fig. 6A (2): Performance comes from the model's robustness to noise (Fig. 3, Sec. 3.3) and ability to interpolate between integer values (Fig. 4, Sec. 3.4). The task isn't trivial since noise accumulates over time and since the attractor network denoises sub-integers without using additional resources.
Re (4):
The predictions ... true in biological system?
Excitingly, we don't know yet! These predictions are offered as new hypotheses to test model similarity to the brain.
it would be great to have a more extensive explanation of what "multiplicative interactions ...
We appreciate the opportunity to clarify. Please refer to the general rebuttal section on binding.
(B) "Binding between MEC modules" - what is specifically meant ...
To clarify, in our paper: binding is just another name for conjunctive composition (please see line 67).
In our model, the state of each grid module depends on binding the states of the other grid modules and of the hippocampus (per Eqs. 6 and 7). Our claim in Line 319 is that sigma-pi neurons in MEC implement this binding.
Because if we only mean ... then sure, that's trivial ... If you mean some specific mechanism ... then what is it?
We mean multiplication, implemented by sigma-pi neurons with nonlinearities in dendritic compartments [57].
How does it manifest ... How can we assess that?
We don't know yet, but it's biologically plausible, since nonlinear computations on EC inputs within HC dendrites are important for assigning contexts to place cells [26]. It remains to test if a) these computations implement binding, and b) similar operations occur within MEC grid modules.
(C) ... are they properties of the brain?
To test this, there are methods (structural and functional imaging) for measuring the timescale and persistence of synaptic plasticity.
[A6] George et al. (2023), eLife
Thank you for your replies. My main reason for the lower score was that connections to biology are potential, but not yet tested or realised, making this work a model, but a bit lacking on the side of explaining why it could be the model. I think this is a solid and beautiful model, but at this stage of neuroscience I am not sure that's enough anymore.
My confidence score for my marks is low, so hopefully it will not hurt your chances too much :)
This paper introduces a normative model for spatial representation within the hippocampal formation, integrating optimality principles with an algebraic framework. Spatial positions are encoded using a residue number system (RNS) and represented by high-dimensional, complex-valued vectors. These vectors are combined into a single vector representing position through a similarity-preserving, conjunctive vector-binding operation. The model incorporates a modular attractor network, mirroring the grid cell modules in the entorhinal cortex, to ensure self-consistency among these vectors. The paper showcases the model’s robustness, sub-integer resolution, and path integration capabilities through both theoretical analysis and experimental validation.
Strengths
The use of RNS for spatial representation is a novel approach that maximizes coding range and spatial information per neuron. In addition, the model integrates principles from neuroscience, cognitive science, and artificial intelligence, providing a holistic view of spatial representation in the hippocampal-entorhinal circuits. The authors also provide rigorous theoretical analysis and empirical experiments to support the model’s claims, demonstrating noise robustness and precise spatial representation. The model makes several testable predictions about neural mechanisms, which can guide future experimental research.
Weaknesses
The model’s complexity might pose challenges for practical implementation and experimental validation in biological systems. While the model is comprehensive, it remains a high-level abstraction of spiking neural circuits, potentially overlooking finer neurobiological details.
Questions
- How biologically plausible is the RNS as a coding mechanism in the hippocampal-entorhinal circuits? Are there any existing biological structures that directly support this model?
- What specific experiments could be designed to empirically test the predictions made by the model? How feasible are these experiments with current technology?
- The model suggests encoding contexts as vectors in the entorhinal cortex. How does it manage the vast diversity and complexity of possible contexts in real-world environments?
- While the model shows robustness to noise in simulations, how would it perform under the more complex and varied types of noise encountered in biological systems?
Limitations
The model abstracts away many neurobiological details, focusing on high-level representations and processes. This could overlook important aspects of the hippocampal-entorhinal circuitry, such as specific neuronal firing patterns and synaptic plasticity mechanisms. While the modular attractor network is theoretically scalable, it is unclear how well this scalability translates to biological systems. The actual implementation of such a network in the brain might face limitations due to resource constraints and other biological factors.
Thank you for accurately summarizing our study and capturing the core strengths of the paper. We appreciate your questions, as we think they get at the fundamental context surrounding the paper. We've done our best to address each of them fully and concisely below.
The model’s complexity might pose challenges for practical implementation and experimental validation in biological systems.
- We provided code, which serves as a proof of concept for practical implementation and lowers barriers to experimental validation.
- In addition, we envision that the model can be implemented with spiking neural networks, as recent works (e.g., [A4]) have done for complex-valued vectors. Such implementations could also help bridge the gap to practical implementation.
- How biologically plausible is the RNS as a coding mechanism in the hippocampal-entorhinal circuits?
Prior work [4, 5, A5] suggests that the RNS is not only biologically plausible but indeed realized by the brain. This idea is consistent with the striking organization of grid cells into discrete modules along the dorsoventral axis [9].
Are there any existing biological structures that directly support this model?
Yes. In addition to the partitioning of grid cells into discrete scales [9], there are the complementary roles of the medial and lateral entorhinal cortex in processing spatial and non-spatial relations, respectively [39], and the role of the hippocampus as an index to many possible patterns [40]. Throughout the paper we have also tried to highlight the consistency with biological structures and experiments (e.g., lines 124-6, 241-3, 260-3).
- What specific experiments could be designed to empirically test the predictions made by the model? How feasible are these experiments with current technology?
Important questions. We address these points in the general rebuttal under suggestions for experimental neuroscience. We believe the experimental methods and analyses outlined there are feasible with current technology.
- The model suggests encoding contexts as vectors in the entorhinal cortex. How does it manage the vast diversity and complexity of possible contexts in real-world environments?
Good question. Assigning contexts to vectors provides a simple but explicit way of measuring the similarity between two contexts (computing their inner product). Indeed, such vector embeddings are widely adopted in machine learning (e.g., word2vec). The situation we consider explicitly in the paper (e.g., in Appendix C.2, Figure S2) is that of discrete contexts, in which no two contexts are similar, since this mirrors global remapping in the hippocampus.
We should caveat that there is an extensive literature on contextual remapping in the hippocampal formation, and we do not attempt to explain every finding in detail. However, an interesting direction for future work is to consider contexts that have manifold structure (for example, in which environmental boundaries have continuously varying color) and applying our model to such cases.
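To illustrate the discrete-context case above (a toy sketch with illustrative names and sizes, not the paper's implementation): binding the same position code to two random context vectors yields dissimilar composite codes, the analogue of global remapping.

```python
import numpy as np

rng = np.random.default_rng(3)
D = 1000
phasors = lambda: np.exp(1j * rng.uniform(-np.pi, np.pi, D))
pos, ctx_a, ctx_b = phasors(), phasors(), phasors()  # one position, two contexts

# Bind the same position code to each context tag (element-wise multiply)
h_a, h_b = pos * ctx_a, pos * ctx_b

print(abs(np.vdot(h_a, h_b)) / D)      # ~0: same place, dissimilar codes
print(abs(np.vdot(ctx_a, ctx_b)) / D)  # ~0: discrete contexts are dissimilar
```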
- While the model shows robustness to noise in simulations, how would it perform under the more complex and varied types of noise encountered in biological systems?
To more fully answer this question, we have conducted some additional experiments under biologically motivated kinds of noise: synaptic failure (dropout), limited synaptic precision (bounded synapses), and ablations/lesions (cell death). The results are summarized in the PDF attached to the general rebuttal in Figures X1, X2, and X3, respectively. We hope that they demonstrate how the model would fare in the face of "resource constraints" and other disturbances in the weights.
[A4] Orchard & Jarvis (2023), ICON Proceedings
[A5] Stemmler et al. (2015), Science Advances
Thanks for the response. My concerns have been addressed. I would like to keep my rating.
We are grateful to all reviewers for their thorough reviews and critical feedback. In addition to addressing each review individually, we'd like to discuss a few common points.
Core contributions: There are many fundamental yet unresolved questions about the function of the hippocampal formation. We propose a computational model that unifies many postulated functions of the HF and formulates neurally plausible mechanisms that achieve them. The resulting normative model of a cognitive map in the HF yields insights into the functional significance of many experimental phenomena. For example, it helps explain:
- Why are grid cells organized into discrete modules? (Because modular attractor networks exhibit superlinear scaling in dimension.)
- Why do hexagonal lattices appear in grid cells? (Because they improve spatial resolution by ~3x.)
Other theoretical results include the scaling of coding range according to Landau's function, the concentration inequality for kernel approximation, and information-theoretic analysis of superlinear scaling with dimension. Overall, reviewers seem to agree that these were strengths, finding the theoretical analysis to be "rigorous" and the mathematical assumptions to be "well-justified."
The role of abstraction: A few reviewers commented on the relevance to biology. The art of theoretical neuroscience is to find the right level of abstraction that allows one to elucidate system-level computational function while still making testable predictions about the brain. Our model aims to be concrete about postulated computations and associated mechanisms, including population-level representations, dynamics, and instantiations in specific brain regions, while also providing useful suggestions for experimental neuroscience.
Relative to some other HF models, our model captures the circuits at an abstract level, and we have tried to make the consequent limitations explicit. But this abstraction also has benefits. As reviewer aqEg says, it allows us to provide "[s]olid theory behind every addition to the RNS model to capture HF functionality" (emphasis added). It lets us be mathematically precise enough to run full simulations of the system, use results from high-dimensional statistics to provide rigorous theory, and separate assumptions from predictions.
Connections to experimental neuroscience: The model's circuit mechanisms lead to testable predictions for experimental neuroscience. Possible evaluations include:
- Representational Similarity Analysis [A1] compares models to neural data by comparing similarity matrices whose entries are the pairwise similarity of representations of two conditions. Model representations (vectors) could then be compared to publicly available experimental datasets, e.g., [46].
- More detailed biophysical models of neurons in HC and MEC, or high-resolution recordings of single units, could help analyze the plausibility of our binding mechanism.
- Neuroanatomical tracing experiments can determine if the direct connections predicted by our model (e.g., between different modules in the attractor network) exist.
Model comparison: A couple of reviewers asked for comparisons to other models. We would highlight a few points of comparison:
- Coding range (1): The superlinear scaling of the capacity of the modular attractor network with dimension (Sec. 3.2) is better than Hopfield/Noest associative memories [21, 22], for which capacity scales at best linearly in the dimension.
- Coding range (2): The triangular frame yields roughly three times as many states per module as other models [e.g., 32, 33] (lines 231-234).
- Compositionality: The heteroassociative memory can accurately recover multiple patterns from a single input (Fig. 7C). This improves over other recent heteroassociative models [32, 41] which can recover at most one pattern per input, no matter how large the dimension. We will add a line in Sec. 4.2 to state this comparison explicitly.
Regarding robustness vs. other models, we found it tricky to avoid "apples-to-oranges" comparisons -- in part since the formulation of an attractor neural network for residue number systems is relatively novel. Instead, we focus on experiments showing model robustness to types of noise commonly postulated in neural systems.
Realization of the binding operation: In our model, the binding operation is implemented by element-wise vector multiplication. Thus, the model posits that HF has neurons that can multiply their inputs. There are many biologically plausible neuron models of multiplication [A2, sec. 21.1.1]; a recent study gives a concrete example [A3].
In our case, the neuron model for multiplication that makes most sense is the sigma-pi neuron [57, 58]. It is so-called since groups of inputs are first multiplied together (in dendritic compartments), then the outputs of multiplication are summed (in the soma). It maps onto our model: multiplication implements binding, while the sums then compute inner products (similarity).
Additional experiments: To strengthen our results, we have run a few more experiments, based on reviewers' suggestions to consider biologically motivated sources of noise and ablation studies. Please see the attached PDF. We find that:
- The model is robust to varying levels of synaptic noise, also commonly known as "dropout" (Fig. X1).
- The model also requires fairly limited synaptic precision (Fig. X2) -- analogous to bounded storage capacity at synapses in real neurons.
- The model can handle some lesions of weights (Fig. X3). Focused lesions of entire columns or modules are more severe than distributed lesions.
We look forward to the discussion period and are happy to clarify further. We will also use each reviewer's feedback to improve the final version of the manuscript.
[A1] Kriegeskorte et al. (2008), Front. Sys. Neuro.
[A2] Koch, C. (2004), OUP
[A3] Groschner et al. (2022), Nature
This paper proposes a normative model for spatial representation in the hippocampal formation that combines optimality principles, including maximizing coding range and spatial information per neuron, with an algebraic framework for computations in distributed neural representation. The model uses a residue number system to encode positions as high-dimensional vectors. These vectors are combined into a single representation through vector binding and maintained by a modular attractor network. The model demonstrates robustness to noise, high-resolution encoding, and effective path integration. The results and concepts are well explained and potentially impactful for computational neuroscience.