PaperHub
Average rating: 5.7/10 (Poster; 3 reviewers; min 5, max 7, std 0.9)
Ratings: 7, 5, 5 · Confidence: 3.0 · Correctness: 3.3 · Contribution: 2.3 · Presentation: 3.0
NeurIPS 2024

The motion planning neural circuit in goal-directed navigation as Lie group operator search

OpenReview · PDF
Submitted: 2024-05-14 · Updated: 2024-11-06
TL;DR

A neural circuit model of motion planning and its formulation as a Lie group operator search problem.

Abstract

Keywords
Lie group equivariance · motion planning · goal-directed navigation · ring attractor network

Reviews and Discussion

Review
Rating: 7

This paper describes how goal-directed navigation in a group-structured space can be framed as group operator search, describes some neural network implementations of this search, and relates one of these networks to the central complex of the famous fruit fly.

Strengths

The exposition is clear and, for the most part, well-thought through. The figures especially were nice.

The relationship between goal-directed navigation in continuous group-structured spaces (hopefully that's the right wording), group operator search, and neural implementations of it as found in the fly is, to the best of my knowledge, novel, and definitely interesting.

The emphasis on removing assumptions, specifically on the structure of the neural response and the shape of the neuronal nonlinearity, through clever analysis was nice.

The analysis was high quality and thorough, with good knowledge of the biology.

Deriving the linear response result (equation 30) was cool, though I admit, I stepped through the maths without checking the calculations.

I think the results are of definite interest to the neuroscience community.

Weaknesses

I think there were three main areas of weakness, of variable importance.

First, it is very puzzling that the model predicts the speed of the orienting response is minimal when the heading direction and the goal are anti-aligned (fig 3D; it's not puzzling why the model does this, it's puzzling because a reasonable model shouldn't do this). Westeinde et al., whom the authors cite, show that the animals don't behave like this, which they convincingly attribute to the behaviour of the PFL2 neurons. These neurons are not included in the authors' framework. It would make the paper stronger if they were included; would this be possible? At the very least this failure mode of the algorithm (not re-aligning when you're 180 degrees off, or even re-aligning very slowly, seems like a definite fault) should be discussed, and the resolution that PFL2 neurons provide should be mentioned, even if it cannot be fit into the framework.

Second, I don't agree with the claim that $u(x|s) = u(x-s)$ is equivalent to the representation being equivariant. As a simple example, imagine a 2-neuron population coding for a 1D angle, $\mathbf{u}(\theta) \in \mathbb{R}^2$, with tuning properties $u_1(\theta) = \cos(\theta)$, $u_2(\theta) = a\sin(\theta)$, i.e. the neural activity traces out an ellipse as $\theta$ varies. There is a consistent rotation matrix that can perform the role of $\hat{R}(\theta)$, so satisfying the equivariance condition (equation 5), but each neuron does not have a tuning curve that is just a translated version of some canonical curve. [To show this generalises to many neurons, just add many neurons with translated but also amplitude-shifted tuning; the same point holds.] So it seems that assuming $u(x|s) = u(x-s)$ is a strict specialisation of the equivariance assumption, which on its own is very reasonable. Do you agree with this, or have I misunderstood some of the claims?

Third, the contribution is, as the authors say, limited by the existence of many models of the same type. The main novelty with regards to fly neuroscience seems to be the mathematical framing and analysis of an already well-established model. The general framing of the group operator search, before specialising to the left-or-right-only case to match the fly, also seems novel (though I haven't read everything hehe), and is interesting. That said, it is a shame that the more elegant set of results, on pure operator search, are not the ones that match the biology. Are there settings in which you might expect the representation space algorithm to be implemented? Either way, I like the paper, even if large parts of it are simply an elegant mathematical reframing of existing ideas.

Small point: figure 4A shows E-PG responses going to P-EN, then another arrow leading to PFL3, yet I think in the model E-PG goes directly to PFL3? If that is correct, could such an arrow be added?

Then finally two related classes of suggestions, on related literature and typos.

There seemed to be a few bodies of literature it would be interesting to know whether the authors consider relevant. First, there are many existing models of goal-to-motor circuits for angles, see references 7-14 of Westeinde et al. It seems worth pointing them out? Second, there are two recent papers on how artificial neural networks empirically perform a related computation, modular arithmetic, which includes comparison of pairs of cosines to produce the correct output (Nanda et al. 2023, Zhong et al. 2023). I think there might be a similarity in the algorithmic details, but I'm not sure. Most interestingly, the two cited papers find that ANNs use their nonlinearities to perform an approximation to multiplication-type calculations; I wonder if the way nonlinearities are used in this work is somewhat similar? Third, there's been a seam of papers that use group and representation theory to understand grid cells (Gao et al. 2018, Gao et al. 2021, Dorrell et al. 2023, Xu et al. 2024); how closely related do you think these approaches are (and they also suggest that equivariant ideas have made it into neuroscience beyond the sensory stage, contrary to what is sometimes suggested)? Does this suggest you can directly port your algorithms into 2D navigational problems, and does this match any known cell types? Finally, you could use these normative approaches (or ones like Sengupta et al. 2018), and copious experimental evidence, to justify what I see as the $u(x|s) = u(x-s)$ ansatz (see weakness 2).

Finally, a few typos: Line 171 'computs' Line 202 'outputs' -> to output Line 156/7 SI verb conjugation incorrect, similar in 158 Line 297 gains doesn't makes sense, 'it adds to' perhaps? Line 313 Missing 'needs to be...'

Questions

There's a few questions in the weakness section above. Additionally, how much do these results hold for discrete groups?

Limitations

Limitations were discussed, though a few more discussion points could be added based on the weaknesses above.

Author Response

Thank you for your positive review of our work! The following are detailed replies to your comments.

"First, it is very puzzling ... it cannot be fit into the framework."

Thank you for mentioning PFL2 neurons; we are happy to include more discussion in the revised manuscript. Let's discuss some issues here first.

  • The behavior of animals in the anti-aligned situation is still debated. While Westeinde et al., Nature, 2024 observed maximum speed near the anti-aligned direction, Green et al., Nature Neuroscience, 2019 reported minimal turning velocity, which agrees with our model's predictions.
  • We do acknowledge that a reasonable model should not have the 'false nulling' problem (settling at the opposite direction; Westeinde et al., Nature, 2024), and the necessity of introducing PFL2 neurons. Essentially, in Westeinde's model, PFL2 only amplifies the velocity near the opposite direction and relies on noise to escape the exact opposite direction. Our model similarly allows random turning when the goal is directly behind, mimicking natural behavior.
  • Our model can be revised easily to include PFL2 neurons. We only need to speed up the turning velocity near the opposite direction by using the PFL2 activities as a gain factor to modulate the P-EN to E-PG feedback. In principle, the inclusion of PFL2 will only change the $\lambda$ in our theoretically defined objective function.
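As a hypothetical sketch of this proposed revision (the sin-shaped baseline drive, the bump-shaped PFL2-like gain, and all parameter values are our illustrative assumptions, not the fitted model):

```python
import math

def turn_drive(delta, beta=2.0, kappa=4.0):
    """Turning drive as a function of the goal-heading offset delta (radians).

    Baseline drive ~ sin(h - s); a PFL2-like gain, peaked at the anti-aligned
    direction delta = pi, speeds up turning near the opposite direction.
    """
    base = math.sin(delta)
    pfl2_gain = 1.0 + beta * math.exp(kappa * (math.cos(delta - math.pi) - 1.0))
    return pfl2_gain * base

# The drive is still exactly zero at delta = pi (escape relies on noise),
# but it is amplified near (not at) the opposite direction.
```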

"Second, I don't agree with ... some of the claims?"

Thank you for this thoughtful comment. We have carefully checked your example and found that it is also consistent with our conclusion $u(x|s) = u(x-s)$. Based on your example, we define

$$u(x|s) = [u_1(s), u_2(s)]^\top = (\cos s,\ a\sin s)^\top = [\cos(0-s),\ a\cos(\pi/2 - s)]^\top$$

The preferred directions of the two neurons correspond to $x_1 = 0$ and $x_2 = \pi/2$. Then we define a $2 \times 2$ rotation matrix $R(\theta)$ with rotation angle $\theta$ as,

$$R(\theta) = \begin{pmatrix} \cos\theta & -a^{-1}\sin\theta \\ a\sin\theta & \cos\theta \end{pmatrix}, \quad \text{where } a \neq 0.$$

Applying the rotation matrix $R(\theta)$ to $u(x|s)$ yields,

$$R(\theta)\, u(x|s) = \begin{pmatrix} \cos(\theta + s) \\ a\sin(\theta + s) \end{pmatrix} = u(x|s + \theta).$$

Comparing $u(x|s) = (\cos s,\ a\sin s)^\top$ with $u(x|s+\theta) = [\cos(\theta + s),\ a\sin(\theta + s)]^\top$, it can be checked that $u(x|s) = u(x-s)$.

Moreover, it can be checked that the above rotation matrix $R(\theta)$ satisfies the definition of the rotation group (Eq. 4 is an equivalent definition), including

$$\det(R(\theta)) = 1, \qquad R(\theta_1)R(\theta_2) = R(\theta_1 + \theta_2), \qquad R(\theta)^{-1} = R(-\theta).$$

We need to point out that our theory and model correspond to the special case $a = 1$, where the magnitude or sum of neural responses is invariant over the stimulus direction $s$. Moreover, we feel the cases where $a \neq 1$ can be used to explain the heterogeneity of neural responses to the stimulus feature $s$, e.g., the summed firing rate of V1 neurons depends on the stimulus orientation (the cardinal effect).
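These identities are straightforward to confirm numerically; below is a minimal check with the illustrative choice $a = 2$ (a sketch, not part of the paper's derivation):

```python
import numpy as np

a = 2.0  # illustrative a != 1

def u(s):
    # "elliptic" population response u(s) = (cos s, a sin s)
    return np.array([np.cos(s), a * np.sin(s)])

def R(theta):
    # the rotation operator defined above
    return np.array([[np.cos(theta), -np.sin(theta) / a],
                     [a * np.sin(theta), np.cos(theta)]])

theta, s = 0.7, 1.1
assert np.allclose(R(theta) @ u(s), u(s + theta))       # equivariance (Eq. 5)
assert np.isclose(np.linalg.det(R(theta)), 1.0)         # det R = 1
assert np.allclose(R(0.3) @ R(0.4), R(0.7))             # R(t1) R(t2) = R(t1 + t2)
assert np.allclose(np.linalg.inv(R(theta)), R(-theta))  # R^{-1} = R(-theta)
```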

"Third, the contribution is, ... an elegant mathematical reframing of existing ideas. "

We agree with the reviewer that the recurrent circuit model is not brand new; it is based on circuit models developed via bottom-up approaches in recent Drosophila studies (Refs. 19-20). The contribution of this paper is taking a top-down approach and marrying the abstract operator search with the circuit model, while providing strong analytical results. On the dynamics side, we also mathematically explain why PFL3 neurons need to have a nonlinear activation function, while such a mathematical understanding is absent in Refs. 19-20.

Yes, shifting the sensory representation to the left/right directions, followed by a nonlinear activation function and pooling, can be the basic circuit motif to implement a more complicated group operator search. The basic idea is that we need to shift the sensory representations along each dimension of the group space. Please see the example of the 2D case in the Global Rebuttal.
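As a toy numerical sketch of this motif (the bump shape, shift angle, threshold, and sign convention are illustrative assumptions, not the paper's fitted circuit):

```python
import numpy as np

N = 360
x = 2 * np.pi * np.arange(N) / N  # preferred directions of the population

def bump(center, kappa=4.0):
    # smooth activity bump centered at `center`
    return np.exp(kappa * (np.cos(x - center) - 1.0))

def steering(s, h, gamma=0.5, thresh=1.2):
    """Shift the sensory bump left/right, add the goal bump, threshold, pool."""
    g = bump(h)
    left = np.sum(np.maximum(g + bump(s + gamma) - thresh, 0.0))
    right = np.sum(np.maximum(g + bump(s - gamma) - thresh, 0.0))
    return left - right  # > 0: turn toward increasing angle

# The sign of the pooled difference points toward the goal, and the drive
# vanishes when heading and goal are aligned or exactly anti-aligned.
```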

"On figure 4A ... such an arrow be added?"

Yes, the E-PG neurons send their axons to both P-EN and PFL3 neurons (Eqs. S13a and S14a in the Supplementary). This is actually the meaning we intended to convey with Figure 4A: the long arrow should be treated as a single axon emanating from E-PG neurons and targeting both PFL3 and P-EN neurons, rather than two axons, one from E-PG to P-EN and another from P-EN to PFL3 neurons. We will revise Figure 4A accordingly to avoid this confusion.

"There seemed to be a few bodies ... does this match any known cell types?"

Thank you for pointing us to these valuable works!

  • Certainly, we will discuss and compare these models in the revised manuscript.
  • Second, we went through the two papers you mentioned. Nanda et al. 2023 showed that an ANN performs modular addition by projecting inputs onto a circle and working in Fourier space, which is somewhat similar to our model. This could be strong support for the application potential of our model. As for Zhong et al. 2023, we are not sure if you are referring to 'Goal Driven Discovery ...'. To our knowledge, it is not related to our model.
  • Third, we are glad that you bring up grid cells. Very recently, the Mosers' lab posted a bioRxiv paper, Vollan et al. 2024, which demonstrated a 2D counterpart of our model in the rodent grid cell system. Please see strategy 2 in the 2D example in the Global Rebuttal.

"How much do these results hold for discrete groups?"

Unfortunately, our results do not apply to discrete groups. Our model relies on the continuity of Lie groups (Eq. 17), which does not hold for discrete groups.

"A few typos: ..."

Thank you for pointing out the typos; we really appreciate your meticulousness. We will correct them in our revised version.

Comment

Thank you for your response, in return:

On PFL2 neurons, I see, I think adding them into the model, or at the very least a discussion of them into the paper, would significantly improve it.

I agree, my counter-example was in fact just stupid, thank you for pointing that out with nicer language. I was confused between each neuron having a tuning curve that is a translated copy of a universal curve (which the example shows is not implied) vs. each neuron having a response that is a function of only the angular difference, but you are absolutely right. If you additionally assume the length of the neural encoding vector is constant over stimuli (as you do), do you recover the result that all neural tuning curves are translations of a shared curve, or am I again confused?

I think your other comments all make sense, the paper I was referencing under Zhong et al. 2023 was actually "The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks", apologies for the missing title. I would be very interested to read a direct comparison of the intricacies of computation in your model vs. these two (Zhong & Nanda).

Finally, on the 2D grid cell example: Vollan et al. is indeed a very relevant and exciting paper, but I disagree with, or do not understand, how you are currently linking its results to your model. They find conjunctive grid-heading cells, much like the conjunctive angle-speed L or R cells in the fly, the P-EN ones (if I've got it right). But these seem like evidence for a similar mechanism of path integration, not of action choosing, and your paper is about the latter. Further, are you using the left-right alternating sweeps as evidence of considering right vs. left angular turns? Then this is somehow dividing over time what is in your model divided between different populations?

As you can tell, I am unconvinced that there is currently evidence for similar action choosing mechanisms to grid cells, from Vollan et al. or elsewhere. Certainly, the circuits are suggestively similar, but that seems to be it. Have I missed something?

Either way, don't feel you have to respond if you are busy. I already think the paper should be accepted, and I continue to think this. But clarifying these points in the paper would make it better, as you have largely suggested you will do.

Comment

Thank you for your follow-up comments! We will add the discussion about PFL2 neurons in our revised manuscript.

I was confused between each neuron having a tuning curve that is a translated copy of a universal curve (which the example shows is not implied) vs. each neuron having a response that is a function of only the angular difference?

Actually, the two descriptions are equivalent in our model.

  • Tuning curve: the tuning curve of a neuron corresponds to fixing the preferred direction $x$ in the expression $u(x-s)$ while varying the stimulus direction $s$. Since the preferred directions $x$ of all neurons are uniformly distributed, and all neurons have the same peak firing rate, the tuning curves of the neurons are just translated copies of one another.
  • Population response: when we present a particular stimulus direction $s$, the population responses of all neurons obey $u(x-s)$, now treated as a function of $x$. Therefore, the response of all neurons evoked by a stimulus $s$ is a function of only the angular difference $x-s$.

Our study assumes the above structure $u(x-s)$ in the theoretical derivation (lines 81-83). The neural dynamics presented in Eq. S12 can then generate the theoretically assumed $u(x-s)$ (Eq. S16). We plan to add the above details to the Supplementary Material to help readers understand the concepts here.
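A minimal numerical illustration of this equivalence (the bump profile is an illustrative stand-in for Eq. S16):

```python
import numpy as np

N = 64
grid = 2 * np.pi * np.arange(N) / N  # shared grid for x (neurons) and s (stimuli)

def u(d):
    # canonical curve evaluated at the difference d = x - s
    return np.exp(4.0 * (np.cos(d) - 1.0))

U = u(grid[:, None] - grid[None, :])  # U[i, j] = u(x_i - s_j)

# Fixing a neuron (row) and varying s gives translated copies of one tuning curve;
assert np.allclose(U[5], np.roll(U[0], 5))
# fixing a stimulus (column) and varying x gives translated population responses.
assert np.allclose(U[:, 7], np.roll(U[:, 0], 7))
```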

Comparing with modular arithmetic in recent studies

Thank you for sharing these valuable references. We compare our model with them across three aspects:

  1. Encoding of angles: In Nanda et al. 2023, input numbers are encoded as one-hot vectors, where the number's value is indexed by the position of the '1' element. In contrast, our model represents the angle as the position of a Gaussian bump (the $s$ in Eq. S16 is the bump position representing an angle). Loosely speaking, the one-hot vector is an extreme case of a Gaussian bump with a width of 0 ($a \rightarrow 0$ in Eq. S16). This makes the two models receive inputs with similar encoding mechanisms.
  2. Outputs: Both Nanda et al. 2023 and Zhong et al. 2023 aim to calculate modular additions by outputting a scalar value of the angle $a+b$. In contrast, the motion planning circuit in our model doesn't explicitly output the modular addition angle $a+b$, but outputs the sine of the modular addition, i.e., $\sin(a+b)$ in Eq. (18), which then drives "muscles" to rotate the heading direction sequentially. Therefore, the output of our motion planning circuit corresponds to the intermediate output in Nanda et al. 2023 before unembedding (the middle block of equations shown in Fig. 1 of Nanda et al. 2023). Additionally, Nanda et al. 2023 and Zhong et al. 2023 perform the Fourier transformation explicitly during the embedding process, while our model does so implicitly.
  3. Scalar arithmetic: To explicitly output the angle of modular addition/subtraction (i.e., the angular difference $h-s$) in a neural circuit, we could 1) normalize the sensory and goal inputs so that they have the same population response, i.e., $\sum_j u(x_j - s) = \sum_j u(x_j - h)$; then 2) sum the goal input and the sensory input rotated to the opposite direction, i.e., $u_{new}(x_j) = u(x_j - h) + u(x_j - s + \pi)$, as used in, e.g., Zhang et al., NeurIPS, 2016 ('congruent' and 'opposite' neurons; see its Fig. 2). The position of $u_{new}(x_j)$ will then be the modular angular difference $h-s$. Finally, we could use $u_{new}(x_j)$ to drive, e.g., the superior colliculus to generate an eye movement towards the direction $h-s$, which can be regarded as a motor read-out of angles in the brain. In this way, the sum of two neural population inputs is similar to a vector sum, and hence similar to the 'Pizza' algorithm in Zhong et al. 2023.

Though we would like to compare the computational complexity, we find this challenging because they did not use network models with specific structures.

Discussion of Vollan's study with the 2D generalization

Regarding grid cells, you are correct that the conjunctive grid-heading cells closely resemble the P-EN cells. To our knowledge, there is still no direct evidence in the literature for the 2D counterparts of PFL cells. Nevertheless, based on the angular representation in grid-heading cells (the 2D case of P-EN), we reason that the 2D counterpart of PFL may also have an angular representation rather than a four-direction representation (north, south, east, and west); otherwise, the four-direction representation in 2D PFL may not have sufficient resolution to provide gain modulation to the 2D P-EN with angular representation. At least, we believe that Vollan et al. 2024 offers significant insight, suggesting that the gain signals fed back from DN to P-EN might also form a continuous ring manifold.

BTW, we conducted simulations extending our model to the 2D case, following a request from Reviewer Wau7. Please check our reply to Reviewer Wau7.

Review
Rating: 5

The paper presents a formulation of the problem of motor planning as a Lie group operator search. The authors define a 2-layer neural network implementing the transformation from sensory input to rotation direction, which presents similarities to the neural circuits of the Drosophila responsible for goal-directed navigation.

Strengths

The paper's theoretical approach to describing motor planning is interesting. A strength of this work is the attempt to find a plausible implementation of the 1D rotation operator search as a neural network, which could be realistically implemented in the Drosophila. The mathematical formulation of the problem seems sound, and the authors exhibit good knowledge of the tools employed in the study.

Weaknesses

The potential of this work is largely limited by the poor quality of the exposition. The paper is difficult to read, because the design choices are not well motivated and the results are presented in a confused manner.

For instance, the introduction of the theory states: "Suppose the motor system generates the same kind of actions from a Lie Group G (e.g., translation or rotation) to transform the world state s." Why does this assumption make sense? Why should the motor system generate actions through translation and rotation of the sensory input?

Results are often presented with vague sentences, such as "And the computational complexity of the feedforward circuit can be even lower than the common signal processing algorithms in certain conditions." Which signal processing algorithms? Why are we comparing a solution to a motion planning problem to a signal processing algorithm? Which are these "certain conditions"? Or, "The group operator search formulation of motion planning can provide a normative approach to generalize existing motion planning algorithms into different transformations." Which algorithms? Which transformations? How can this generalization be done?

The description of the numerical simulations is limited to 7 lines (279-285), and visualized in half of figure 4. In my opinion, it is impossible to evaluate the correctness of the experiments and the sensibility of the results, given the minimal amount of information provided.

I recognize that the study might describe meaningful results, and that the mathematical derivations of a neural circuit to transform sensory input into rotations might be relevant for this research field. However, I do not think the paper provides sufficient elements to understand the model, the theoretical derivations and the experiments. Hence, I doubt the paper can have an impact in its current form.

Questions

  • This paper seems to propose a general theory of motion planning, but 1D rotation is the only example provided. How can this theory be generalized to generic trajectory planning?

  • The authors mention that this is "one of the first studies formulating motion planning as the group operator search problem", but cite none. Which other studies do the authors refer to? How does this work fit in the literature?

  • What is the meaning of the following sentence: "This distinction is reflected by the fact that although the operator’s parameter θ∗ = h − s is arithmetic subtraction, the neural operator Rˆ(θ) by no means of arithmetic subtraction between neural responses u(h) and u(s)." (lines 91-93)?

Limitations

The limitations of this work are adequately addressed. However, lines 318-323 are unclear to me. What do the authors mean when they say that "group representation theory (...) guarantees to find such an operator and deriving corresponding circuit model"?

Author Response

Thank you for your review! We first want to clarify the writing:

"Suppose the motor system ... through translation and rotation of the sensory input?"

In the heading direction example, the fly's brain could use its wings to rotate its heading (flying) direction. Here, the direction $s$ corresponds to the world state, and all rotations constitute a rotation group. Flapping the wings to change the heading direction (world state) corresponds to the upper-left green arrow shown in Fig. 1A. Meanwhile, the fly's sensory system keeps monitoring its heading direction and updates instantaneously (bottom-left green arrow, Fig. 1A), corresponding to the motor system transforming the sensory inputs. The sensory-action loop depicted in Fig. 1A is canonical for brains and embodied robots.

A similar principle also works for other transformations: e.g., walking/running in a 2D plane constitutes a 2D translation group, the 3D rotations of our head constitute the 3D rotation group SO(3), and the 3D translations and rotations of our elbow constitute an SE(3) group.

The present study only considers the simple 1D rotation group, and eventually lands on concrete neural circuits in flies.

"And the computational complexity ... these 'certain conditions'?"

The common signal processing algorithm refers to the Fourier transformation (Eqs. 13 and 15; detailed algorithmic steps in Table S2 of the Supplementary Information; computational complexity in Table 1 and lines 138-149). We find the complexity of finding the group operator in the neural circuit ($\mathcal{O}(N \log|h-s|)$) is lower than that of the Fourier transformation ($\mathcal{O}(N \log N)$) when the stimulus direction $s$ is close enough to the goal direction $h$, i.e., $|h-s| < N$. This is explained in the text in lines 220-223.

Of course, we will revise our text accordingly to make it less vague.

The comparison between the complexity of the FFT-based algorithm and that of the neural circuit is a natural one between artificial algorithms and neural circuit algorithms. We hope this comparison shows the advantage of neural circuit computation, and its potential to be a new building block for goal-directed navigation tasks in the future.
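As a concrete sketch of such an FFT baseline (a generic circular cross-correlation recipe, not the exact algorithm of Eqs. 13 and 15; the bump shape and sizes are illustrative):

```python
import numpy as np

N = 360
x = 2 * np.pi * np.arange(N) / N

def bump(center, kappa=4.0):
    # bump-shaped population encoding of a direction
    return np.exp(kappa * (np.cos(x - center) - 1.0))

def fft_operator_search(u_s, u_h):
    """Recover theta* = h - s as the argmax of the circular cross-correlation
    of the two encodings, computed with FFTs in O(N log N)."""
    corr = np.fft.ifft(np.fft.fft(u_h) * np.conj(np.fft.fft(u_s))).real
    return x[int(np.argmax(corr))]

# e.g. fft_operator_search(bump(x[30]), bump(x[120])) recovers x[90] = pi/2
```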

"The group operator search formulation ... this generalization be done?"

We will revise our manuscript and discuss/explain in more detail, building on the current discussion in lines 312-324. Specifically, the theoretical derivation presented in Sec. 2 is a systematic way that guarantees finding the required operators for more complex group transformations, e.g., the 3D rotation group SO(3) or the SE(2) group. In addition, our Global Rebuttal lists an example of 2D translation operators. We can incorporate this discussion in the revised manuscript.

Here are our answers to the questions:

  1. "This paper seems to propose ... generic trajectory planning?"

Due to the page limit and the amount of experimental support, we only showed the 1D rotation case, which can be linked to the recently discovered Drosophila heading direction system. However, we provided a general framework in Section 2, which, to quote reviewer Wau7, 'helps the reader grasp the theoretical framework of group operator search'. Besides, a recent work (Vollan et al., 2024, from the Mosers' lab) indicates that there exists a 2D version of our model in rodent brains, which supports the generality of our model in higher-dimensional cases. Please see the Global Rebuttal for its generalization to generic trajectories.

  2. "The authors mention that ... this work fit in the literature?"

To the best of our knowledge, no previous literature has linked the motion planning computation (group operator search), concrete neural circuits, and analytically solved recurrent neural dynamics together, so we did not insert a citation here. The text "One of the first studies …" was intended as modest rhetoric, and we are sorry the wording confused you. We will revise the text accordingly.

In addition, the goal of the present study is to build a theoretical framework of goal-directed navigation with both mathematical and biological grounding, and we have provided a thorough comparison with other works in 'Comparison to other work' (lines 300-311). Reviewer a5Tr acknowledges our paper as 'an elegant mathematical reframing' of other models, and we believe this mathematical reframing endows it with the potential to extend to more complicated situations. Regarding citations, we provided references in the 'Comparison to other work' paragraph (Refs. 19-21), whereas all these cited works only built the circuit model and did not connect it to group operator search. In addition, Reviewer a5Tr also provided a few references in his/her review (references 7-14 of Westeinde et al.), which will be included in the revised manuscript.

  3. "What is the meaning ..."

We intended this sentence to motivate the challenge of developing a neural circuit that searches for the required operators, which is not as simple as computing the arithmetic subtraction $\theta^* = h-s$. This is because the stimulus direction $s$ and goal direction $h$ are distributedly embedded in two neuronal responses $u(s)$ and $u(h)$ respectively, where each $u$ is a vector whose elements denote the firing rates of individual neurons, with dimension determined by the number of neurons in the population. The challenge is that the subtraction of angles $\theta^* = h-s$ doesn't imply we can find the operator by naively subtracting $u(h) - u(s)$; otherwise the problem would be trivial.

In principle, finding the neural operator involves the steps below:

  1. Decode the embedded directions $s$ and $h$ from the neuronal responses.
  2. Compute the angular difference, i.e., $\theta^* = h-s$.
  3. Substitute $\theta^*$ into the expression of the rotation operator (Eq. 8).

The above steps amount to much more than calculating the subtraction $h-s$. The present work links the biological neural circuitry to this abstract group operator problem.
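In principle, the three steps could be made concrete with a standard population-vector decoder (an illustrative sketch of the abstract procedure, not the circuit the paper derives):

```python
import numpy as np

N = 360
x = 2 * np.pi * np.arange(N) / N

def bump(center, kappa=4.0):
    # bump-shaped population encoding of a direction
    return np.exp(kappa * (np.cos(x - center) - 1.0))

def decode(u):
    # step 1: population-vector estimate of the encoded direction
    return np.angle(np.sum(u * np.exp(1j * x)))

def operator_for(u_s, u_h):
    theta = decode(u_h) - decode(u_s)  # step 2: theta* = h - s
    # step 3: substitute theta* into the 2x2 rotation operator
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
```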

Comment

I thank the authors for their response and, after reading the other reviews, I acknowledge that my poor understanding of the paper is only partially due to the quality of the exposition. I have read the paper once again and found that my evaluation was biased by the fact that I find the introduction excessively vague and that the paper has several typos / unclear sentence constructions which gave me the impression of poor care in the exposition. I list some of them here, but I might have missed many others:

  • line 84, is -> are
  • line 84, "concrete profile"?
  • line 146, this procedure are -> this procedure is
  • line 163, gradient accent -> gradient ascent
  • fig. 3 caption, "neuron are" -> neurons are
  • line 173, "two issues to be resolved for v_t computation in feedforward circuits". Missing verb
  • line 200, "we used the summed neuronal activities doesn't depend on the represented direction", doesn't -> do not, "we used the fact that..."
  • line 202, outputs -> to output
  • line 233, "of the depends" -> depending?
  • line 253, "emerge" seems to be wrong in this sentence

However, this does not prevent the paper from presenting an interesting connection between Lie groups and computational neuroscience. For this reason, and after considering the positive reviews of the other reviewers, I increased my score.

Comment

We appreciate your meticulousness and thank you for re-evaluating our work! We will correct these typos and carefully go through our manuscript.

Comment

We hope our reply helps answer your questions, that the contribution of our work can be acknowledged, and that you might reconsider the rating.

Review
Rating: 5

Intelligent systems, both biological and artificial, need to leverage sensory information and sensory-motor feedback to plan their (motor) actions and interact with their environment. A specific behavioral scenario is goal-directed navigation, wherein an agent needs to plan its actions based on its current state in order to move towards a specified goal state. The authors formulate this goal-directed navigation behavior as a Lie group operator search problem. Specifically, they argue that given a group-equivariant representation of the stimulus space, the agent needs to search for the appropriate operator from a Lie group (corresponding to the agent's action space) that transforms its current state observation to the goal state observation. They demonstrate the utility of their framework by considering a 1D rotation group of actions in a goal-directed navigation problem, as in the Drosophila head-direction system. The authors demonstrate that their framework of operator search can be implemented by a 2-layer feedforward circuit that mimics the biological circuit motif in Drosophila and is computationally simpler than commonly used signal processing algorithms, such as fast Fourier transform-based strategies. Moreover, the authors also propose a complete model of the Drosophila head-direction system by combining the sensory and motor circuit modules, which together implement the goal-directed navigation behavior. This study has the potential of connecting group-theoretical perspectives of representation learning with the development of planning systems in artificial agents, as well as explaining the circuit motifs in biological systems that implement such goal-directed navigation behavior.

Strengths

  1. The study is well structured and connects group-theoretic principles to the desiderata of planning systems. The paper also provides a thorough introduction to the theoretical framework and the related ansatz required to grasp the contributions of the proposed framework.

  2. The paper is quite well written and easy to read. Although there are a few typos and minor referencing issues, I was able to read the paper and follow the key arguments presented by the authors.

  3. This work presents a theoretical neuroscience framework for goal-directed navigation that is strongly grounded in the biological circuitry of the Drosophila head-direction system. Therefore, it presents a strong link between theoretical and experimental neuroscience.

  4. The proposed framework is interesting and relevant to the larger AI community, especially in grounding concepts in equivariant representation learning and reinforcement learning-based planning systems.

Weaknesses

  1. The key weakness is the disconnect between the abstract/introduction of the paper and the rest of it. Although the authors argue that their framework is general and present theoretical arguments that can be extended to abstract group operators, the key results and demonstrations hinge specifically on the 1D head-direction system of Drosophila. Therefore, it is not immediately clear how applicable the Lie group operator search theory can be to other planning systems, including navigation in 2D environments.

  2. The current demonstration in the 1D system assumes an equivariant representation of the stimulus space, such that the changes in representation of head direction follows the rotation of the animal. It is unclear how this assumption can be applied to other abstract planning systems, and how the proposed framework can be applied in case the equivariant assumption no longer holds.

  3. Related to the previous point, the current study also assumes uniform coding of the stimulus space, i.e. each head direction is represented with equal fidelity. This assumption certainly doesn't hold for the visual system or even 2D space, wherein representations of certain stimuli or locations are more sensitive to changes than others. It is unclear how the group operator search can be implemented in this case. These issues, I feel, limit the applicability of the proposed framework to very specific systems.

  4. The numerical simulations/demonstration presented in this work are limited to 1D stimulus case. It would be interesting to see how the framework extends to higher dimensional stimulus cases, even if that requires reduced faithfulness to the underlying biological circuitry. Adding these experiments will definitely improve the relevance of the proposed framework for the NeurIPS community, and the larger AI community.

Questions

  1. Given the current framing of the introduction and the rest of the paper, I am wondering if Section 2 (albeit useful to have for completeness) is slightly distracting from the core contributions of the work. Specifically, Section 2 helps the reader grasp the theoretical framework of group operator search but isn't really necessary for appreciating the rest of the paper. Perhaps, this section would be more relevant if the authors could demonstrate the utility of their framework in more abstract group operator settings. I do acknowledge that the authors touch on this aspect in their limitations section as well.

  2. In Eq. 19, should it be $dx'$ instead of $dx$?

  3. Why did the authors assume equal norm and equal sum of responses for all stimuli? Preserving the 1st and 2nd moments of responses seems to require a layernorm-like operation. Is there experimental evidence for this statistic in the Drosophila system? Would the authors suspect a role of inhibitory cells in maintaining this sort of "balance"?

  4. The authors assume uniform coding, i.e. all stimuli are represented with equal fidelity/accuracy. How true is this in the Drosophila head-direction system? Can this assumption be removed for the proposed framework to be applicable to other systems?

  5. Section 4, Page 8 introduces a new notation $r_{s+}$, which isn't grounded in or related to quantities introduced before. Can the authors clarify why this was required and what it corresponds to in terms of $u(s)$? As such, why is there a need for this extra population of neurons in the proposed model, when you already have access to $u(s)$ from the ring attractor? Perhaps I am missing something in the explanation here.

  6. How is $u(h)$ obtained from the ring attractor? Is there a separate memory system that stores the goal representation?

  7. Minor typo: Line 171 should start as "computes". There are also a few referencing issues, wherein the correct equations are not referenced in text. Apart from these minor typos, the paper is very well written.

Limitations

There are no immediate negative societal impacts of this work. However, it would be nice to add an impact statement (per NeurIPS guidelines) in the Conclusions.

Author Response

Thank you for your positive comments on the strengths of our work, especially on how it links the theoretical framework to concrete neural circuits. Please check our Global Rebuttal for our replies to your concerns about the generalization of the proposed framework, and the equivariant assumption.

Replies to the reviewer's questions:

  1. “Given the current framing ... the authors touch on this aspect in their limitations section as well.”

We really appreciate the reviewer's suggestion to improve our writing. Section 2 may appear unnecessary, or even excessive, in the current case because the 1D case is intuitive (Eq. 6); nevertheless, Sec. 2 provides some necessary understanding of the neural circuit computation, including

  • The geometric understanding of motion planning circuit computation (Fig. 3C).
  • The comparison of computational complexity (Table 1). Specifically, we need to refer to the workflow in Sec. 2 (Eqs. 13 and 15) to determine the computational complexity of finding operators by using FFT (lines 140-143; Table 1, middle column). This is necessary to show that the feedforward circuit can have lower complexity (lines 215-223; Table 1, right column).

Certainly, we cannot agree more that Sec. 2 would be more relevant if we demonstrated it on a more abstract group operator search problem (we discuss this in lines 315-323). For example, when searching for operators in the 3D rotation group SO(3), we can use spherical harmonics as basis functions to derive the representation of operators, the Wigner D rotation matrices. We will include this in the Discussion of our revised manuscript.
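As a concrete numerical illustration of operator search in the Fourier basis (a toy sketch of our own, not the paper's circuit model): for the 1D rotation group, a shift operator is diagonal in the Fourier basis, so the rotation taking the current representation to the goal can be read off from the phase of the cross-spectrum.

```python
import numpy as np

# Toy sketch (our illustration, not the paper's circuit): for the 1D rotation
# group, a shift operator acts on Fourier coefficients as a phase factor, so
# the operator (rotation amount) can be read off from the cross-spectrum.
N = 128
theta = np.linspace(0, 2 * np.pi, N, endpoint=False)

def bump(center):
    """Von Mises-like population activity bump centered at `center`."""
    return np.exp(4 * np.cos(theta - center))

u_now = bump(0.5)    # current heading representation u(s)
u_goal = bump(2.0)   # goal representation u(h)

F_now, F_goal = np.fft.fft(u_now), np.fft.fft(u_goal)
# The phase of the first Fourier mode encodes the rotation from current to goal.
shift_est = np.angle(F_now[1] * np.conj(F_goal[1]))
print(shift_est)  # ≈ 1.5, the rotation taking u_now to u_goal
```

This is the FFT-based strategy whose complexity is compared against the feedforward circuit in Table 1.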

  2. “In Eq. 19, should it be $dx'$ instead of $dx$?”

Yes, thank you for pointing it out; it should be $dx'$ instead of $dx$.

  3. “Why did the authors assume equal norm ... maintaining this sort of ‘balance’?”

The invariant norm/sum of neuronal responses has been widely observed in neural data, and is evident in Drosophila's heading direction system (E-PG neurons) (e.g., Fig. 1G&H in Ref. [31] and Fig. 1 in Ref. [32]) as well as in other areas, such as orientation-tuned neurons in the primary visual cortex and motion-direction-tuned neurons in the medial temporal (MT) area. This invariance assumption is also widely adopted in theoretical modeling, e.g., Fig. 3.8 in the textbook Dayan & Abbott 2001; Zhang, J. Neurosci., 1996; Jazayeri, Nat. Neurosci., 2006; Wu et al., Neural Computation, 2008; Zhang, NeurIPS, 2022. Specifically, a recent work (Ref. 27) demonstrated that this invariance is necessary for a translationally/rotationally equivariant representation.

Yes, the invariant sum of neural responses results from inhibitory neurons, which is modeled as divisive normalization (DivNorm) in our model. DivNorm has been linked to PV inhibitory neurons in various experiments (e.g., Niell, Annual Rev. Neurosci., 2015).

In addition, despite the invariance to certain stimulus features, experiments show that the responses can be modulated by factors such as stimulus intensity, which affects the gain of neuronal responses (Salinas & Sejnowski, 2001).
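A minimal numerical sketch of this point (our own illustration, with our own labels): divisive normalization yields a summed population response that is invariant across stimuli, while a gain factor scales the drive before normalization.

```python
import numpy as np

# Minimal sketch (our illustration): divisive normalization yields a summed
# population response that is invariant across stimuli and input gains.
theta = np.linspace(0, 2 * np.pi, 64, endpoint=False)

def normalized_response(stimulus, gain=1.0):
    drive = gain * np.exp(2 * np.cos(theta - stimulus))  # feedforward drive
    return drive / drive.sum()                           # divisive normalization

r1 = normalized_response(0.3, gain=1.0)
r2 = normalized_response(4.0, gain=5.0)  # different stimulus and intensity
print(r1.sum(), r2.sum())  # both 1.0: the summed response is invariant
```

The bump's position still tracks the stimulus; only the total activity is pinned by the normalization, consistent with the invariance assumption above.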

  4. “The authors assume uniform coding, ... be applicable to other systems?”

First, the uniform coding results from the symmetry of the translation/rotation groups (see the paragraph above Eq. 7 in Ref. 27), and is approximately true in the neuroanatomy of the heading system of Drosophila (Refs. [31-32]). Removing the uniform coding will break the symmetry, and thus the equivariant representation, in the neural circuit. It is possible that under other group structures the neuronal distributions on the attractor manifold are not uniform; indeed, a scaling group transformation would require logarithmic coding (Esteves et al., ICLR, 2018).

Besides systematic non-uniformity, the system can deviate from uniformity due to randomness. It has been suggested that the Continuous Attractor Network (CAN) is robust to noise (Deneve, Nat. Neurosci., 1999; Wu, Neural Computation, 2002), so an approximately equivariant representation can still be maintained. Certainly, heterogeneity will break the perfectly equivariant representation. One possibility is that neural heterogeneity and noise help the brain implement sampling-based Bayesian inference (e.g., Orban, Neuron, 2016; Echeveste, Nat. Neurosci., 2020; Zhang, Nat. Comms., 2023).

Finally, uniform coding (neurons distributed uniformly on the attractor manifold) is not incompatible with non-uniform coding of stimuli (such as cardinal effects in visual systems, where more neurons prefer horizontal and vertical bars). One possibility is that the recurrent weights remain translation-invariant (ensuring a uniform attractor manifold), while the feedforward weights connecting external inputs and network neurons are non-uniform, which yields non-uniform neuronal responses residing on a uniform attractor manifold.

  5. “Section 4, Page 8 introduces a new notation $r_{s+}$ ... I am missing something in the explanation here.”

Sorry for the ambiguity; we will revise the text to clarify the notation. $r_{s\pm}$ here denotes the firing rates of P-EN neurons representing heading rotation speed (Figure 4A), which are driven by the DNs $r_{v\pm}$ (the output neurons in Figure 3A). As indicated in the diagram, P-EN neurons receive copies of the heading direction (E-PG) signal (Eq. S14 in the Supplementary) and project leftward- and rightward-shifted feedback to E-PG neurons (last term of Eq. S12a in the Supplementary), which shifts the heading direction representation according to the velocity signal generated by PFL3 and DN neurons (suggested by Refs. [27, 35]), assembling an intact sensory-action loop.

  6. “How is $u(h)$ obtained ... the goal representation?”

We used a separate neuronal population to directly represent the goal signal $u(h)$ (last term in Eq. S13a in the Supplementary). In Drosophila, this signal has been discovered to be represented by FC2 neurons (Ref. [11], Fig. 1i).

  7. “Minor typo: ....”

Thanks for pointing them out! We will correct them in the revised version.

Comment

We hope our reply addresses your concerns and potentially leads to a reassessment of our manuscript.

Comment

I would like to thank the authors for their rebuttal and providing clarification to my questions. I would also like to thank the authors for the additional explanation of how the proposed framework can be extended to higher dimensional systems, e.g. 2D navigation.

To summarize, my three main points of contention were the following:

  1. Extensions to higher dimensional systems.
  2. Incorporating approximate equivariance instead of perfect equivariance.
  3. Assumptions regarding equal norm of neural responses and uniform coding.

For the 2nd and 3rd points, the authors provided specific justifications and evidence from the systems+theoretical neuroscience literature. While I take the point about equal norm (based on activity of inhibitory cells), I find it harder to agree with the argument about uniformity of neural coding, i.e. each stimulus is encoded with equal fidelity or neurons are uniformly distributed around an attractor manifold. From the explanation provided by the authors, it is not immediately clear to me that their proposed framework of planning as a Lie group operator search can be applied to settings where such non-uniformities exist. Nevertheless, I agree with the authors that in the drosophila head direction system, this might not be an issue. However, the authors claim that their framework is more general than just the Drosophila head direction system.

For the 1st point, I like the authors' argument in the general comment. My only concern is the claim or prediction that there need to be $2M$ populations of neurons (where $M$ is the intrinsic dimensionality of the action space) to encode the directional derivatives of the objective function. It is interesting to see some preliminary evidence pointing to this. But given that the authors claim that their framework is a general solution for goal-directed navigation, I think it would be important to demonstrate that 2D navigation can be solved using their computational framework.

Based on this, I would like to maintain my score for the time being. However, I am happy to reconsider my evaluation based on the authors response.
Thank you again for your hard work. I sincerely believe that the proposed framework is promising and could be useful for the wider community, if its merits hold for higher dimensional systems.

Regards,
Reviewer wau7

Comment

- Extension to 2D scenarios

We simulated a 2D counterpart of the present neural model, which achieves motion planning in a 2D environment. We followed Strategy 1 outlined in the Global Rebuttal, using four additional groups of neurons (targeting north, south, east, and west) to calculate the velocity. These results are consistent with the $2M$ prediction. Specifically, we represented the location and goal using two separate 2D continuous attractor networks, each containing 128×128 neurons, referred to as EPG and FC2 neurons as in the main text. The rest of the model remains largely unchanged, except for the inclusion of four groups (north-, south-, east-, and west-ward) of PENs, PFL3s, and DNs instead of two (left- and right-ward). Once we initialize the self-location and the goal location, the EPG-equivalent neurons generate an activity bump at the self-location, which then moves toward and ultimately terminates at the goal location. By plotting the DN neuron activities, we can verify that the movement velocity along the north-south axis is proportional to the firing rate difference between DN-North and DN-South neurons, and similarly for the east-west axis.

PS: we realize in the author-reviewer discussion period, the OpenReview closed the channel of uploading a one-page PDF and hence we don't have a way to show you the figure of 2D simulations. If you like, we are happy to include the new results as a figure in either the main text or Supplementary Material, whichever you think is appropriate.

- Non-uniform distribution of neurons on the stimulus manifold

We believe the reviewer has raised a critical question regarding the reconciliation of translation symmetry with the non-uniform distribution of neurons on the stimulus manifold. On the one hand, as we previously noted, uniform distribution of neurons is required by translation symmetry, analogous to having a regular grid covering the stimulus axis. On the other hand, we recognize that real systems are not perfect; for example, neurons in the brain are often distributed non-uniformly along the stimulus manifold, corresponding to an irregular grid.

At the level of neural circuit dynamics, the same neural dynamics (Eqs. S12-S15 in the Supplementary) can still approximately facilitate motion planning and the rotation of heading representations on the ring attractor if we consider a non-uniform distribution of neurons (the $x$ in Eq. S12a being irregular). To realize exact computation in the non-uniform case, the recurrent weights (line 85 in the Supplementary) between neurons need to be fine-tuned numerically, using a technique similar to that of Noorman et al., bioRxiv 2022.

Overall, we think the non-uniform distribution does not alter the neural circuit implementation but it requires new theoretical insights to explain why the system continues to function effectively. Below we give our explanations at the conceptual and algorithmic levels.

At the conceptual level, one way to reconcile these two aspects is through symmetry breaking (SB). Specifically, while translation symmetry (uniform distribution of neurons) represents an idealized scenario, real systems likely exhibit some degree of SB, shifting the system from a high-symmetry state (uniform distribution) to a lower-symmetry state (non-uniform distribution). This SB does not imply that translation symmetry is entirely inoperative. For instance, in spontaneous SB, the governing dynamics can still obey translation symmetry, even if the solutions themselves do not exhibit the same symmetry. Thus, spontaneous SB could potentially reconcile the conceptual contradiction between translation symmetry and non-uniform neuron distributions. However, whether this is indeed the case remains an open question, since the connection between group symmetry and neural circuits is still in its early stage of development.

At the algorithmic level, the challenge lies in translating neural responses defined on an irregular neural grid (non-uniform neuron distribution), especially when the translation destination does not align with a grid point. Recent advances in graph neural networks and spectral graph methods offer promising solutions to this issue (e.g., Defferrard, NeurIPS 2016; with mathematical analysis in Shuman et al., Applied and Computational Harmonic Analysis, 2016). In brief, translation is a special case of convolution: $f(x-a) = \int f(x')\,\delta(x - a - x')\,dx'$, i.e., $f$ convolved with a delta function shifted by $a$. The convolution can then be expressed in the Fourier domain, where the Fourier basis serves as the eigenfunctions of the Laplacian operator defined on a regular Euclidean grid (uniform neuron distribution). For an irregular grid, the graph Laplacian eigenfunctions can replace the Fourier basis, allowing translation to be defined on an irregular grid.
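The regular-grid case of this construction can be sketched numerically (our own illustration, following the spectral graph-signal-processing recipe): on a ring, the cycle-graph Laplacian is diagonalized by the Fourier basis, and translation is a phase multiplication in that basis; on an irregular graph, the graph Laplacian eigenvectors would play the same role.

```python
import numpy as np

# Numerical sketch (our illustration of the spectral recipe): the cycle-graph
# Laplacian is diagonalized by the Fourier basis, and translation is a phase
# multiplication in that basis. On an irregular graph, the graph Laplacian
# eigenvectors replace the Fourier modes.
N = 16
I = np.eye(N)
L = 2 * I - np.roll(I, 1, axis=0) - np.roll(I, -1, axis=0)  # cycle Laplacian

k = np.arange(N)
V = np.exp(2j * np.pi * np.outer(k, k) / N) / np.sqrt(N)  # DFT basis (columns)
lam = 2 - 2 * np.cos(2 * np.pi * k / N)                   # Laplacian spectrum
assert np.allclose(L @ V, V * lam)  # each DFT mode is a Laplacian eigenvector

# Translation by `a` nodes = phase multiplication in the eigenbasis.
a = 3
f = np.random.default_rng(0).random(N)
f_hat = V.conj().T @ f
f_shifted = (V @ (np.exp(-2j * np.pi * k * a / N) * f_hat)).real
assert np.allclose(f_shifted, np.roll(f, a))
print("spectral translation matches np.roll on the regular ring")
```

On an irregular grid one would replace `V` and `lam` by the eigendecomposition of the corresponding graph Laplacian; the phase-multiplication step is then the generalized translation of Shuman et al.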

Finally, we intend to include a condensed version of this discussion in the revised manuscript.

Comment

Dear Reviewer wau7,

The author-reviewer discussion period ends today. We would like to know whether our reply addresses your concerns, and whether there is anything else you would like us to clarify. Thanks!

Best,
The Authors

Comment

I would like to thank the authors once again for providing clarifications to my questions.

Extension to 2D scenarios: Thank you for the explanation. I must admit it would be easier to comprehend the 2D simulations with figures, but the textual description does a decent job. From your description, I understood that you were able to demonstrate that the proposed framework can be applied to infer velocities proportional to the differential firing of DN neurons (for a specific axis). While this is promising, I had a different use case in mind. I was wondering if you could leverage your proposed framework to solve goal-directed navigation in a 2D maze (T-maze or grid world). I apologize if my question was not clear. From your description, I am convinced that the current framework can be extended to low-level planning actions, wherein the goal is to decide the immediate action rather than a sequence of actions to arrive at a goal. Perhaps the authors can add a statement along these lines to clarify the scope of their framework in the introduction. Also, adding the 2D simulations in the appendix would strengthen the proposal.

Non-uniform distribution of neurons on the stimulus manifold: Thank you for a detailed description of the different factors involved in extending the proposed framework to the case of non-uniform distribution of neurons on the stimulus manifold. I agree with the authors that new non-trivial theoretical extensions are required to extend the current circuit framework to this case. Given the ubiquitous nature of non-uniform manifold coding in biological systems, I believe it is a crucial blocker for the proposed system to apply to different systems. However, the authors have already cited previous literature that this is not an issue in the Drosophila system, and therefore I feel the proposed framework presents a useful contribution to the field of systems neuroscience.

Taken together, I feel this work is promising and has several merits. My concerns arise from the generalizability of this framework to multiple biological systems (as claimed by the authors in the paper) and the relevance to the wider NeurIPS audience. E.g., I am not completely convinced that the proposed framework can be leveraged to readily improve planning in artificial agents. However, I am inclined to recommend an accept rating for the paper based on the discussions we have had during the rebuttal period, which I believe has led me to better understand the authors' perspective and claims around their framework. Also, this work opens up unique opportunities/avenues for people interested to work at the intersection of group symmetry and neural circuits.
Therefore, I have increased my score to recommend borderline accept and will defer to discussion with other reviewers and (S)ACs for assessing the fit to NeurIPS.

Comment

Thank you for your valuable comments! We sincerely appreciate your support of our work. Your insightful suggestions are greatly appreciated, and we will certainly consider demonstrating our model on 2D mazes, artificial agents, and non-uniform coding systems in our future research.

Author Response

Global Rebuttal

Generalizability to complex scenarios

All three reviewers asked how the proposed framework can generalize to complex scenarios. Yes, the proposed framework can generalize to complex scenarios in theory, but the challenge comes from the limited amount of experimental evidence in complex scenarios.

We will use the 2D translation group as an example to explain the generalizability of our framework. An important step in the motion planning neural circuit is approximating the derivative of the objective function with respect to the transformation amount (Eq. 18), which is realized by a spatial difference in the neural circuit (Eq. 20), i.e., the sensory representation is rotated in the positive and negative directions ($\theta_t + \Delta\theta$ and $\theta_t - \Delta\theta$ in Eq. 20). This spatial-difference strategy can also be used in the 2D case. In principle, there are two different but equivalent strategies.

  1. In an allocentric representation with an x-y coordinate, the sensory representation is shifted along four directions, i.e., $\pm x$ and $\pm y$, and we then expect to have four populations of neurons (the 2D counterparts of PFL3 left $r_{\theta-}$ and PFL3 right $r_{\theta+}$). The pooling operation will be similar to the 1D case, while we will have four output neurons (2D counterparts of DN left $r_{v-}$ and DN right $r_{v+}$) which generate actions along four directions in the 2D plane. This kind of shift was also considered in Burak, PLoS Comp. Biol. 2009, although that paper did not study the motion planning circuit. Overall, for groups with intrinsic dimension $M$, we need $2M$ populations of neurons, where in each dimension two neuronal populations shift the sensory representation along the positive and negative directions.
  2. In an egocentric representation with $r$-$\theta$ polar coordinates, assume direction is represented at a $2\pi/2n$ resolution; then each sensory representation has $2n$ copies shifted along $2n$ different directions. Copies along opposite directions can be used as counterparts of PFL3 left $r_{\theta-}$ and PFL3 right $r_{\theta+}$ to calculate the derivative along that direction.

No matter which of the strategies above is used to compute the spatial derivative, the pooling operation will be similar; the difference is just how neurons are arranged on the stimulus feature manifold to compute the spatial derivative.
Both strategies are plausible in the neural system. Vollan et al. (2024) from the Mosers' lab suggest that the rodent entorhinal cortex probably contains a system similar to the 2nd strategy above. Their results show that a type of grid cell conjunctively coding location and heading direction projects to pure grid cells, and that this projection is shifted by the cell's preferred heading direction.
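Strategy 1 can be sketched numerically (a toy illustration with our own variable names, not the fly circuit): four "populations" evaluate the goal-overlap objective under unit shifts of the current representation along the four directions, and their differences give a finite-difference directional derivative, i.e., the velocity command.

```python
import numpy as np

# Toy sketch of Strategy 1 (our own variable names, not the fly circuit):
# four "populations" evaluate the goal-overlap objective under unit shifts
# along +-x and +-y; their differences form a finite-difference directional
# derivative, i.e., the velocity command.
N = 64
grid = np.arange(N)
X, Y = np.meshgrid(grid, grid, indexing="ij")

def bump2d(cx, cy, w=4.0):
    """2D activity bump with circular boundary conditions."""
    dx = (X - cx + N // 2) % N - N // 2  # signed circular distance
    dy = (Y - cy + N // 2) % N - N // 2
    return np.exp(-(dx**2 + dy**2) / (2 * w**2))

u_now = bump2d(10, 20)   # current location representation
u_goal = bump2d(30, 20)  # goal representation

def overlap(sx, sy):
    """Objective: similarity between the shifted current representation and the goal."""
    return np.sum(np.roll(np.roll(u_now, sx, axis=0), sy, axis=1) * u_goal)

v_x = overlap(+1, 0) - overlap(-1, 0)  # DN-East minus DN-West analogue
v_y = overlap(0, +1) - overlap(0, -1)  # DN-North minus DN-South analogue

print(v_x > 0, abs(v_y) < 1e-9)  # goal lies along +x: eastward drive, no y drive
```

Extending from two to four shift-and-pool populations is the only structural change relative to the 1D circuit, consistent with the $2M$ scaling discussed above.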

We are happy to include a compressed version of the above discussion in the revised manuscript, which has an extra page, if the reviewers encourage us to do so. The reason for not including it in the submitted manuscript was simply the page limit.

The significance and contribution of the present study

The main contribution of the present study is building an overarching connection between the abstract Lie group theoretical framework and the fly's concrete motion planning neural circuits in goal-directed navigation, providing a deep, novel understanding of neural computation, as acknowledged by reviewer a5Tr. Even if the 1D rotation appears simple (as mentioned by reviewer wau7), we would like to point out that no previous study has linked a concrete 1D goal-directed circuit to the Lie group operator search problem, which resulted from two challenges:

  1. There was a lack of analytical results on complex nonlinear recurrent circuit dynamics, and thus it was very hard to analytically identify the operator representation in the circuit dynamics.
  2. There was not a sufficient amount of experimental observations to build the link between data and theory.

We believe the present study successfully addressed these two challenges by providing analytical results and linking them to the recently discovered fly circuits.

Equivariant representation

Equivariant representation is a general framework of neural representation and has been well studied in much previous literature. Regarding the reviewer's concern about its application to other planning systems, many methods have been developed to build representations equivariant to various transformation groups, including very general ones, such as discrete rotation and reflection groups (Cohen et al., ICML, 2016) and the E(2) group (Weiler et al., NeurIPS, 2019; Gao et al., NeurIPS, 2021). With all these methods, we are not worried about the equivariance condition in other planning systems.

Final Decision

The paper proposes a novel approach to motion planning in the brain by framing it as a Lie group operator search problem. The majority of reviewers found the paper to be well-written, clear, and providing the necessary theoretical background to understand the proposed framework. The formulation of navigation as a Lie group operator search is particularly interesting.

Based on my reading of the reviewers' feedback and the active engagement between the reviewers and authors in the discussion (which I highly encourage readers to review), most of the comments and criticisms have been addressed or do not prevent the paper from meeting the NeurIPS acceptance criteria.