Feedback control guides credit assignment in recurrent neural networks
Feedback control may enable biological recurrent neural networks to achieve accurate and efficient credit assignment, facilitating real-time learning and adaptation in behavior generation.
Abstract
Reviews and Discussion
The authors explore the relationship between feedback control and learning with recurrent neural networks (RNNs). Specifically, they enforce a control signal onto an RNN that is used to generate a trajectory for a reaching task, and then propose to use local learning rules on the neurons in the RNN. They show that, with feedback control, the network can adapt faster to perturbations of the task, and that the local (in time) gradients are better aligned with the global ones.
Strengths
The claims are all very reasonable and well illustrated. I believe this is the first time such feedback-based learning has been used in proper control settings, which is surprising given that it is based on control theory.
Weaknesses
Main problem: My main concern is that the task chosen consists of bringing a system to a desired static target, so it is possible that there are no "out of equilibrium dynamics"; rather, the learning simply consists of bringing the "arm" to the required target, and it just so happens that the shortest trajectory aligns with the velocity profile. While it could be that the trajectory is indeed learned (and with some implicit or explicit regularization it should be the case), the current task is not conclusive. If the point is to really learn a trajectory, the authors should have picked a task where the trajectory is a bit more complex than going to equilibrium. Maybe a limit cycle? Otherwise the work might be a minor modification of Meulemans et al. Also, I fail to see the "biological circuits". If we are talking about recurrent neural networks, this is fine, but usually when we talk about circuits in biology we refer to cell types (and this has a lot of constraints). In fact, the authors themselves state that they are agnostic to the biological implementation, which is hardly compatible with the title. I would replace it with recurrent neural networks.
Other issues:
- The key findings are not clear in the introduction. The term "inference learning" is only used there and in one of the figures, but it is not clearly defined. If the authors mean that feedback control can train an RNN, then this has already been shown. For the second finding, "increased accuracy of approximate local learning rules", it would be better stated as increased accuracy WITH local learning rules (or something similar). For the third, the second-order gradient is not really injected (this would suggest that the gradient is imposed on purpose); rather, the feedback control is implicitly related to second-order optimization methods.
- Line 142: it seems natural that if the network is perturbed from its trajectory, the feedback would be stronger to compensate for the perturbation. I don't see why this is "suggested". Also, the sentence is badly written ("suggest that the during task... activity is increasingly by feedback").
- Lines 164 and 165: The authors say that "using a local learning rule without feedback control show an increasing performance gap compared to those trained with feedback control and BPTT". The sentence could be interpreted as if the network is trained with feedback control AND BPTT (combined). A better wording would replace AND with OR.
- Section 3.4 is a bit hard to follow. It seems as if the authors are using an eligibility trace to train the RNN through BPTT. But this intermediate step might not be real BPTT as it is commonly used.
Literature issues:
- The work of Meulemans et al. 2022b is credited with alleviating the temporal storage required by BPTT. While they did do that (and it is a good paper), I think that they based the memory-load decrease on previous work (Bai et al., Deep Equilibrium Models, 2019), which, if memory serves, does use BP. The logic of my comment is that by training the equilibrium point of the network one can avoid the memory load, regardless of the training method.
- The connection between feedback-based learning and second-order optimization is very closely related to Meulemans et al., "A theoretical framework for target propagation" (2020). That paper addresses target propagation, but it is very similar to feedback-based learning (as the authors probably can infer).
- This is a personal opinion, and the authors do not need to take it into consideration: the biological plausibility claims seem to rely on the locality of the learning rules. While it is a requirement that learning rules should not break the laws of physics (or, in this case, basic biological knowledge), learning rules should at least have some basis in biology, which I am missing here. A brief mention of why one would think that the learning rules are close to biological ones would be welcome. My guess for this feedback-based work would be something with a temporal component, such as temporal Hebbian rules (e.g., Aceituno et al., "Learning cortical hierarchies with temporal Hebbian updates," 2020).
Questions
I am not 100% sure, but I think that the work of Gilra and Gerstner had a very similar architecture. Could you mention what the main differences are?
Limitations
They did mention some limitations, but the key issue of whether this is general motor control (or shaping attractors) is not addressed. Also, if the paper is about how feedback control guides credit assignment in biological circuits, being agnostic to the biological circuit is a problem rather than a strength. To make a biological statement, there should be some non-trivial biological predictions, or some mention of what exactly this brings to neuroscience that wasn't already known.
Thank you for taking the time to review our work.
My main concern is that the task chosen consists of bringing a system to a desired static target...
Thank you for flagging this - we also trained RNNs on an additional, commonly used task, where the network has to generate both the sine and cosine wave given some input frequency (see cartoon in extra figure E). Here, the relevant perturbation is a frequency shift. Our adaptation to perturbation results remain qualitatively the same (extra figure F and G), expanding the generality of our method. We will include these results as an additional supplementary figure in the final manuscript.
Also, I fail to see the "biological circuits".
Thank you for flagging this. In the discussion, we state that we are agnostic to the exact source and implementation of the feedback signal. As for the RNN itself, while it does not correspond one-to-one to a single brain region, it has been used in the literature as a useful model of a non-linear dynamical system such as the brain. Hence, our study is more about the brain as an integrated system rather than a single brain region or circuit per se.
.. The term "inference learning" is only used there and in one of the figure, but it is not clearly defined.
Thank you for flagging this. We are referring to the inference learning (IL) definition from Alonso et al. 2022, which is based on the predictive coding (PC) literature. In IL, neuron activities are modified first using some gradient, and only after this step are the weights modified. This contrasts with backpropagation, where only the weights are modified (see the toy sketch after the citation below). As defining this term in Figure 2/introduction may not be ideal, we have omitted it in our updated manuscript and instead included the relevant citation in the discussion. We now instead refer simply to "approximate learning in the activity space", due to approximate gradient alignment (which was not explicitly shown in previous work).
Alonso, Nick, et al. "A theoretical framework for inference learning." Advances in Neural Information Processing Systems 35 (2022): 37335-37348.
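To make the ordering concrete, here is a toy, self-contained sketch (hypothetical names and toy loss, not our actual implementation) of the IL update order: activities are nudged along their gradient first, and only afterwards are the weights updated, using the already-modified activities.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 3))       # toy weight matrix
h = rng.normal(size=3)            # neuron activities
target = rng.normal(size=3)
lr_h, lr_w = 0.1, 0.01

def loss(W, h):
    return 0.5 * np.sum((target - W @ h) ** 2)

print("before:", loss(W, h))
e = target - W @ h
h = h + lr_h * (W.T @ e)          # step 1: modify the activities (descent on dL/dh)
e = target - W @ h                # recompute the error at the new activities
W = W + lr_w * np.outer(e, h)     # step 2: only then modify the weights
print("after: ", loss(W, h))
```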
For the second finding ...
Thank you for flagging this - here, we refer to accuracy in terms of approximation of the gradient, not performance accuracy, as the referenced section 3.4 outlines.
For the third, the second order gradient is not really injected ...
Thank you for flagging this - you are right! We have now changed this to read: "Feedback control enables more efficient weight updates during task adaptation due to the (implicit) incorporation of adaptive, second-order gradient information into the network dynamics."
Line 142: it seems natural that ...
Thank you for flagging this - we now changed this to simply read: "Thus, during the task perturbation, the network activity is increasingly driven by the feedback signal."
Lines 164 and 165. The authors say that "using a local learning rule ...
Thank you for flagging this - we corrected this and we thank the reviewer again for these important detailed comments!
Section 3.4 is a bit hard to follow. It seems as if the authors are using an eligibility trace to train the RNN through BPTT. But this intermediate step might not be real BPTT as it is commonly used.
Thank you for flagging this! At the beginning of Section 3.4, our main goal is to briefly outline why biologically plausible learning is particularly hard in RNNs, due to the need for recursive computation of the past network states using the network Jacobian. This, and the problems with the currently proposed biologically plausible solutions, was explored in more detail in Marschall et al. 2020, where RFLO was shown to be quite a severe approximation of RTRL. This mainly serves as context for the main contribution of this section: that with feedback control, RNNs are driven more by their present than their past states, which simplifies the learning problem and allows for more accurate local learning.
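To make the nature of the approximation concrete, here is a small, self-contained sketch (not our code; the network form and all names are illustrative assumptions) contrasting RTRL's Jacobian-recursive sensitivity update with RFLO's local eligibility trace, which drops the recurrent-Jacobian term:

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, alpha = 5, 20, 0.1
W = rng.normal(scale=1.0 / np.sqrt(n), size=(n, n))
phi, dphi = np.tanh, lambda a: 1.0 - np.tanh(a) ** 2

h = np.zeros(n)
M = np.zeros((n, n, n))   # RTRL sensitivity dh_k / dW_ij, indexed [k, i, j]
P = np.zeros((n, n))      # RFLO eligibility trace for W_ij

for t in range(T):
    x = rng.normal(size=n)                  # external input at this step
    a = W @ h + x
    J = (1 - alpha) * np.eye(n) + alpha * dphi(a)[:, None] * W   # Jacobian dh_t/dh_{t-1}

    # RTRL: propagate all past sensitivities through the full Jacobian (O(n^3) memory)
    direct = np.zeros((n, n, n))
    direct[np.arange(n), np.arange(n), :] = alpha * dphi(a)[:, None] * h[None, :]
    M = np.einsum('kl,lij->kij', J, M) + direct

    # RFLO: keep only the local, leak-driven part of the recursion (O(n^2) memory)
    P = (1 - alpha) * P + alpha * np.outer(dphi(a), h)

    h = (1 - alpha) * h + alpha * phi(a)

# RFLO's trace matches only the "diagonal" slice of the full sensitivity; the mismatch
# grows with how strongly past states (via the recurrent Jacobian) drive the dynamics.
print(np.linalg.norm(M[np.arange(n), np.arange(n), :] - P))
```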
The work of Meulemans et al 2022b is credited with alleviating the temporal storage required by BPTT.
Thank you for flagging this. For simplification, as we are not exploring equilibrium systems in this work, we removed this specific claim in the updated manuscript.
The connection between feedback-based learning and second order optimization ...
Thank you for flagging this citation omission! We now cite it in our updated section on related work.
A brief mention of why one would think that the learning rules are close to biological ones would be welcome.
Thank you for this suggestion and citation, which has now been added to the discussion section of our updated manuscript.
I am not 100% sure, but I think that the work of Gilra and Gerstner had a very similar architecture. Could you mention what the main differences are?
Thank you for flagging this! Our work is indeed similar to that of Gilra and Gerstner 2017 (eLife), with the main differences being: 1) we use a rate instead of a spiking network, which simplifies our analysis (e.g., gradients and Hessians); 2) in their work, the feedback loop is trained separately from the recurrent network, while we pretrain them together; 3) in their work, in contrast to ours, the network output is close to the reference from the very start of learning due to "tighter" feedback control, which further eases the learning problem.
I appreciate the clarifications and extra work; I will upgrade my score.
However, a key concern is that the title is still misleading. The general view, which is nicely explained here [1], is that RNNs are useful models for generating hypotheses about biological circuits, but they are not biological circuits themselves. Generally, there is a crucial difference between proposing a hypothesis/model and implying (in the title!) that the hypothesis is true. As an example, the works of Gilra and Gerstner use the words recurrent neural networks, not biological circuits.
Furthermore, quoting from your reply: "..., it has been used in the literature as a useful model of a non-linear dynamical system such as the brain". The fact that RNNs are general models for non-linear systems makes it even more strange to use the words "biological circuits" in the title, as this reply would indicate that the results are valid for any non-linear system even beyond neuroscience. I know that this is not what the authors intended, but it highlights the point that RNNs are not biological circuits.
[1] Neural circuits as computational dynamical systems
Thank you so much for reviewing our rebuttal. We acknowledge your concern with our title. We had a couple of working titles, one of which was "recurrent neural networks" instead of "biological circuits", and we are happy to change it back to this. We initially chose "biological circuits" because the motivation for our project was/is on the biological learning side, but we agree that revising it is a good idea.
Feedback controllers are ubiquitous in neuroscience but their functions are not fully understood. This paper studies how feedback control interplays with biologically plausible online learning on a standard motor control task. The authors show that:
- feedback control enables adaptation to task variations without any plasticity, by approximating the gradient of the loss with respect to the hidden states.
- it makes tractable approximations to RTRL more reliable by shrinking the recurrent Jacobian eigenvalues.
- it incorporates some second-order information in the weight updates, leading to faster learning.
Strengths
The paper studies an important understudied question, that is the interplay between feedback and learning. The paper is overall well-written and is easy to read. The message is clearly delivered. The experiments are carefully designed and well executed, and support the claims of the paper.
Overall, the paper will be an insightful read to the community.
Weaknesses
While the experiments are overall well executed, there are a few points that should be improved to make the paper's claims more robust:
- In the appendix, it is written that the learning rate is taken to be constant. To make claims about e.g. learning speed, the optimizer, in particular its learning rate, has to be tuned.
- Figure 5b: it is not clear from this experiment that RFLO-c contains some second-order information. The alignment with the 2nd-order gradient is not convincing, as the estimated gradient is more aligned with the first-order gradient than with the second-order one. This experiment needs to be improved for it to support its claim. The BPTT-c baseline that I mention below may be a good starting point for further analysis, as it yields the gradient of a "controlled loss" (which is not the case for RFLO).
- A BPTT-c/RTRL-c baseline would be an interesting add to disambiguate between the role of feedback control and approximate gradient estimation through RFLO. This baseline would include feedback control in the recurrent neural network dynamics and optimize for the MSE loss at the output. This would be useful in e.g. Fig3b and Fig5b.
Questions
- l98-99: can the author clarify the link between the use of a local learning rule and the rapid adaptations shown in neuroscience studies?
- Fig1: a, b, c legends are missing in the figure.
- l140-141: "that" missing after "feedback control,"? typo "outout".
- Fig2: "approximate inference learning": what do the authors mean by inference learning? I could not find any definition.
- l167-168: "does" missing + typo for "adaptatation".
- Appendix A.3: the authors mention that they use Adam with weight decay. The standard practice is to use AdamW instead (c.f. the AdamW paper for more detail). Can the author confirm that they are using AdamW?
Limitations
The paper is theoretical and its limitations have been properly addressed.
Thank you for taking the time to review our paper.
In the appendix, it is written that the learning rate is taken to be constant. To make claims about e.g. learning speed, the optimizer, in particular its learning rate, has to be tuned.
Thank you for pointing this out. To address your concern, we have included a learning rate sweep for Figure 3b in the extra figures page (see figure B).
Figure 5b: it is not clear from this experiment that RFLO-c contains some second-order information.
Thank you for raising this important point. To clarify, we are not claiming that our learning method performs exact 2nd-order optimization. Instead, it effectively interpolates between a 1st and 2nd order method, with the gradient showing a significant projection in both directions. In the extra figure A, we show the measured alignment between the 1st and 2nd order gradient as a baseline (yellow line). This baseline alignment is lower than the overlap of our gradient with both 1st and 2nd order directions, indicating that the observed alignment is not accidental. Additionally, having any curvature information allows the optimizer to better navigate the learning landscape and improve learning efficiency.
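For reference, here is a toy sketch of the kind of comparison described above (hypothetical quadratic loss; we assume the alignment is measured as a cosine similarity, which may differ from the exact quantity plotted in Fig. 5b). It also illustrates the baseline alignment between the first- and second-order directions themselves:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
A = A @ A.T + np.eye(5)                 # positive-definite Hessian of a toy quadratic loss
h = rng.normal(size=5)
grad = A @ h                            # first-order direction
newton = np.linalg.solve(A, grad)       # second-order (Newton) direction
estimate = 0.5 * grad + 0.5 * newton    # a vector interpolating between the two

print("vs 1st order:", cosine(estimate, grad))
print("vs 2nd order:", cosine(estimate, newton))
print("baseline (1st vs 2nd):", cosine(grad, newton))
```

An interpolated direction of this kind overlaps more strongly with both the first- and second-order directions than those two directions overlap with each other, which is the pattern we describe above.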
A BPTT-c/RTRL-c baseline would be an interesting add to disambiguate between the role of feedback control and approximate gradient estimation through RFLO...
Thank you for the suggestion. We have added the BPTT-c baseline to Figure 3b in the extra figures page (see D) to help disambiguate the roles of feedback control and approximate gradient estimation through RFLO. It seems that BPTT+c performs worse than BPTT (recall that the test loss is reported in the open-loop setting, i.e., no feedback is provided to the network regardless of whether it was trained with it or not). This suggests that BPTT+c overfits to feedback control here, unlike RFLO+c. However, note extra figure B, which suggests that our choice of learning rate for BPTT+c may simply not be ideal. For Figure 5b, however, this baseline is not applicable because at that stage there is no plasticity in the weight matrix yet, and both gradients are computed in the activity space. While we could perform the same analysis in weight space, calculating the Hessians of the full weight matrix is intractable due to poor scaling.
l98-99: can the author clarify the link between the use of a local learning rule and the rapid adaptations shown in neuroscience studies?
Long-term skill learning is known to involve synaptic changes in the motor cortices (e.g. Roth et al. Neuron 2020). However, we don’t know how rapid learning is implemented in the brain, nor where it happens. In the neuroscience study that motivated our work (Perich et al. 2018), plasticity was not directly measured, due to the extreme difficulty of such measurements. Nevertheless, animals progressively learn to adapt better to persistent perturbations, which requires some form of rapid learning along the sensorimotor pathways. Thus, in the fine-tuning phase of this work, we enable plasticity in the recurrent layer of our RNNs, which serve as a model of a nonlinear dynamical system doing closed-loop control. Additionally, we use a learning rule that could feasibly be implemented in the brain, as it respects the locality constraint. Thank you for flagging this important point - we have now edited the relevant section in the main text to clarify this reasoning better.
Fig1: a, b, c legends are missing in the figure. l140-141: "that" missing after "feedback control,"? typo "outout". l167-168: "does" missing + typo for "adaptatation".
Thank you for catching these errors!
Fig2: "approximate inference learning": what do the authors mean by inference learning? I could not find any definition.
Thank you for flagging this. We are referring to the inference learning (IL) definition from Alonso et al. 2022, which is based on the predictive coding (PC) literature. In IL, neuron activities are modified first using some gradient, and only after this step are the weights modified. This contrasts with backpropagation, where only the weights are modified. As defining this term in Figure 2 may not be ideal, we have omitted it in our updated manuscript and instead included the relevant citation in the discussion.
Alonso, Nick, et al. "A theoretical framework for inference learning." Advances in Neural Information Processing Systems 35 (2022): 37335-37348.
Appendix A.3: the authors mention that they use Adam with weight decay. The standard practice is to use AdamW instead (c.f. the AdamW paper for more detail). Can the author confirm that they are using AdamW?
We use Adam, and we explicitly add the regularization terms (i.e., weight decay) directly to the cost function. This approach is functionally different from the decoupled weight-decay correction in AdamW. While we did not use AdamW, this does not impact our results, as we do not report any performance gains with Adam and instead use SGD during fine-tuning. We now make this explicit in the detailed methods section.
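For concreteness, here is a hypothetical PyTorch sketch (illustrative model, names, and values; not our training code) contrasting the three variants in question:

```python
import torch

model = torch.nn.Linear(10, 2)
x, y = torch.randn(32, 10), torch.randn(32, 2)
mse = torch.nn.MSELoss()
lam = 1e-4  # regularization strength (assumed value)

# (1) As described above: the L2 penalty is written into the cost itself,
#     and the total cost is optimized with plain Adam.
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = mse(model(x), y) + lam * sum((p ** 2).sum() for p in model.parameters())
opt.zero_grad(); loss.backward(); opt.step()

# (2) Adam's built-in weight_decay: adds lam * p to each gradient, so the decay
#     is coupled into the adaptive moments (essentially equivalent to (1) up to a factor).
opt_coupled = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=lam)

# (3) AdamW: decay applied directly to the weights, decoupled from the adaptive moments.
opt_decoupled = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=lam)
```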
I have read the author's rebuttal and keep my score as it is.
Regarding the choice of the learning rate, it seems that the authors do not fully follow the standard ML practice:
- The optimal learning rate seems to be at the border of the learning-rate range considered; this should not be the case.
- The figures comparing different methods should use the optimal learning rate for each method. I trust that the authors will update their results accordingly.
Thank you! We will update Figure 3b in the results accordingly given the learning rate sweep.
Recent work has shown that feedback signals can be critical for rapid adaptation in control tasks, and may explain how biological intelligence can make rapid adjustments when solving such tasks. This paper studies how feedback control achieves this. To do so, the authors train an RNN enhanced with feedback control on a common control task, and study how the feedback signal leads the network to achieve more rapid adjustments when perturbations are introduced. The 3 main findings are that the feedback signals align well with the optimal global gradient of the error, that they help the network better weigh current information (vs. less relevant past information) during perturbations, and that they indirectly inject second-order information into the RNN.
Strengths
- This work focuses on improving the theoretical understanding of an important method. Given that our understanding of many deep learning methods is woefully inadequate, such work is critically important for the field's development.
- The method and the results are clearly presented, the figures are excellent, and the writing is easy to follow.
Weaknesses
I am not familiar with feedback control and motor tasks; hence, I ask the AC to please take this into consideration to appropriately weigh my review. My remarks on the methods could be wrong or trivial. That said, I'll do my best to provide feedback.
- Several sections of the paper seem to just present results from previous work, including section 3.1 and the entirety of the methods section. This makes the contributions of this paper seem rather thin.
- I may be missing something, but some of the results seem minimally surprising. For example, in section 3.2, the authors state "...the feedback contribution to the overall network output increases during perturbation." But how could it not increase during perturbation? Isn't the network explicitly trained to use the feedback information to make corrections during perturbation? The same goes for the alignment between the feedback signal and the optimal global gradient, and the indirect introduction of second-order information: is it not by design that the network uses feedback to make corrections, and thus the larger the correction needed (i.e. the larger the optimal gradient), the larger the feedback signal? And is it not by design that second-order information gets introduced via the recurrent connections that enable the network to "save" information from previous timesteps in the hidden state?
- The authors claim that feedback control guides credit assignment in biological circuits, but use BPTT during the pretraining phase of the RNN, which they acknowledge is not biologically plausible. It seems to me that backprop is still doing much of the heavy lifting in terms of solving credit assignment, so I'm not sure this claim is sufficiently justifiable. A more defensible claim given the current results may be that feedback control may guide motor adaptation in biological circuits. Similarly, some parts of the intro and abstract strongly suggest that the presented method would perform credit assignment without suffering from the biological implausibilities of backpropagation (e.g. the abstract sets up the problem as "backpropagation is known to perform accurate credit assignment of error, how a similarly powerful process can be realized within the constraints of biological circuits remains largely unclear"), yet the actual method relies heavily on backpropagation.
- The experiments are performed on a single task, using a small single-layer RNN with 400 hidden units, and therefore it is unclear whether the findings would scale to other tasks and larger architectures. Given that the primary goal of this paper is to improve understanding of an existing learning algorithm, and most of the analyses are performed via empirical testing, I believe it is important for the authors to demonstrate that their conclusions are robust over a wider range of tasks and hyperparameters/architectures.
Questions
- How does this work relate to hierarchical predictive coding, and to the feedback connections introduced by Hinton in [1] (and further explored by [2])?
- The learning setting presented in this work seems very similar to the setting of reinforcement learning, which also deals with control tasks and shifting distributions. Do you foresee these same results (i.e. feedback control improves performance) carrying over to some RL tasks? If not, what are the differences that limit these results from applying there?
[1] Hinton, G. (2022). The forward-forward algorithm: Some preliminary investigations. arXiv preprint arXiv:2212.13345.
[2] Ororbia, A., & Mali, A. (2023). The predictive forward-forward algorithm. arXiv preprint arXiv:2301.01452.
Limitations
I'd like to see an expanded discussion in the limitations section regarding the remaining aspects of the feedback-enhanced RNN that remain biologically implausible. Particularly, the usage of BPTT in a work that aims to explain how biological credit assignment is performed is quite troubling for me, given its significant biological implausibility. Ideally, I think the authors should show that their results hold on a network trained using a biologically-plausible learning rule enhanced with feedback control.
Thank you for taking the time to review our paper.
Several sections of the paper seem to just present results from previous work ...
While our work builds upon the task adaptation findings of previous work (Feulner et al. 2022), it offers a substantially distinct focus and set of contributions, as outlined in our introduction. Our primary goal is to investigate the mechanistic interpretability of the model published by Feulner et al. (2022), and to connect their results to the broader literature on online learning in recurrent neural networks. Specifically, we address why rapid adaptation occurs without any plasticity and how this benefits approximate, local learning algorithms, whose limitations have been previously studied by Marschall et al. (2020). Beyond learning accuracy, we also analyse learning efficiency and show significant differences between learning with and without feedback during the fine-tuning stage. Thus, despite the similarity in the problem setup and the methods presented in the main text, our analysis is distinct from Feulner et al. (2022), where the emphasis was on modeling primate observations and data. Here, we ask a different question: based on what we know about local learning in recurrent neural networks, why does this model work so well?
I may be missing something, but ...
Thank you for your insightful comments. While it may seem intuitive that the feedback contribution to the overall network output would increase during perturbation, it is important to note that this behaviour is not explicitly designed in our networks. The same applies to the incorporation of second-order information. The key contribution of our work lies in empirically showing that these mechanisms naturally emerge from our model. Even if some aspects may seem anticipated, demonstrating this behaviour in a practical and previously unexplored context is a significant step forward in understanding online learning in recurrent neural networks, and specifically how it relates to biologically plausible learning.
The authors claim that feedback control guides credit assignment in biological circuits ...
Thank you for raising this important question. Yes, this approach reflects principles commonly observed in biological learning systems, specifically the different time scales of learning:
- Foundational Development: Certain core circuits and functions may develop on a slower timescale, guided by an innate basis shaped over evolutionary time. This provides a foundational framework for subsequent learning (analogous to BPTT pretraining).
- Rapid, Contextual Adaptation: Building upon this foundation, biological systems exhibit rapid adaptability to specific environments and tasks. This occurs on a much faster timescale, and is thought to occur through fine-tuning of pre-existing circuits (analogous to fine-tuning with a local learning rule).
For example, infants possess a general capacity for movement, learn to walk relatively slowly, but quickly adapt to different terrains like ice. While BPTT plays a significant role in credit assignment during pretraining, the rest of this work focuses on alternative, biologically plausible credit assignment during rapid, contextual adaptation to perturbations not seen during pretraining.
Given that the primary goal of this paper is to improve understanding of an existing learning algorithm ...
Thank you for raising this important point. We now include extra figures with this rebuttal that show the final adaptation to perturbation for different learning algorithms as a function of learning rate and hidden size (extra figures B+C). We also include results from an additional, commonly used synthetic task (where the network has to produce a sine and a cosine wave given some frequency, see cartoon in extra figure E), showing that our results remain qualitatively the same (extra figures F+G), expanding the generality of our method. We will add these results to the supplementary figures of the final paper.
How does this work relate to hierarchical predictive coding, and to the feedback connections introduced by Hinton in [1] (and further explored by [2])?
In the FF network architecture, the activity vectors from layer l+1 at the previous timestep contribute to the activity vectors at layer l, which is a feature in common with our work. However, the nature of and reason for this contribution are different. In our case, the information communicated between the layers is a global error used to control the activity space online. In FF, the information communicated consists of the layer activities themselves and serves to synchronize local learning between different layers. The follow-up work links FF to the predictive coding framework, where the error neurons are explicitly represented. We link our work to other relevant literature on predictive coding in our discussion.
The learning setting presented in this work seem very similar to the setting of reinforcement learning...
Even though, in principle, the feedback control studied here resembles a reinforcement learning (RL) problem due to the interaction with the environment, there are some major differences from classic RL problems. For instance, unlike in many RL problems, the feedback here is an explicit function of the network's prediction. Moreover, during pretraining, we are backpropagating through the environment itself, which is generally assumed to be impossible in most RL problems. However, our results may be relevant to certain RL subfields, such as movement control in robotics, and we briefly mention this in the "Impact Statement" section of our manuscript. Finally, it may be interesting to expand our work to settings where feedback is less explicit (as we note in our limitations section), which would more closely link it to RL problems.
I'd like to see an expanded discussion ...
Thank you for this suggestion. As explained above, here we use BPTT for pre-training of the networks only.
Thank you for the detailed response.
I still hold the opinion that this work, while interesting, offers limited novelty over previous works it builds upon, and the findings are not sufficiently surprising given the context to be considered a major contribution. In addition, even though BPTT is used only for pre-training, there are no guarantees that the representations learned via BPTT and via biological learning rules are similar, and thus no guarantees that the findings during fine-tuning would equally apply to biologically learned representations.
However, many of my concerns have been addressed. Having read the reviews of other reviewers and the subsequent discussions, I now believe this work passes the bar for acceptance. I am changing my score from 3 to 5 to reflect this.
The paper studies the effect of feedback control on motor learning in recurrent neural networks, finding that feedback control improves learning performance and better aligns with the true gradient w.r.t. the task.
Strengths
- Alignment with the true gradient is an interesting result and helps explain why feedback works
- The authors study alignment from different perspectives (e.g. step-wise/full gradients, Newton method)
- The task the authors consider is widely used in monkey experiments, therefore it should be possible to adapt the conclusions to real data or use them to guide new experiments
Weaknesses
- The training setup is rather limited; it would be interesting to see training done for other tasks and architectures (or RNN sizes).
- The paper might benefit from some theoretical analysis of why the feedback signal aligns with the true gradient, although it's not clear if that can be easily done.
Questions
What is the difference between RFLO and RFLO+c? Does the first lack the feedback term in Eq. 1? This should be clearly stated within Sections 2.2-2.4.
Line 141: “outout” Line 141: “ is increasingly by”
Limitations
The authors have addressed the limitations.
Thank you for taking the time to review our paper.
The training setup is rather limited; it would be interesting to see training done for other tasks and architectures (or RNN sizes).
Thank you for raising this important point. We also trained RNNs on an additional, commonly used task, where the network has to generate both a sine and a cosine wave given some input frequency (see cartoon in extra figure E). Here, the relevant perturbation is a frequency shift. Our adaptation-to-perturbation results remain qualitatively the same (extra figures F and G), expanding the generality of our method. We also show results for different RNN sizes (extra figure C, final adaptation loss as a function of learning algorithm and hidden layer size), where we mostly see the expected increase in accuracy post-adaptation with increased network capacity. We will include both of these results as additional supplementary figures in the final manuscript.
The paper might benefit from some theoretical analysis of why the feedback signal aligns with the true gradient, although it's not clear if that can be easily done.
Given that the feedback weights approximately align with the transpose of the forward weights, and that the feedback is a linear projection of the loss function derivative (the error), the (approximate) alignment of the feedback with the gradient of the network activities is somewhat expected. However, why this would aid local learning is not entirely clear. In Section 3.4 and Figure 4 of our work, we address this question empirically by demonstrating that the output of networks with feedback control relies less on past and more on present states. This reduces the bias introduced by the severe Jacobian approximation of the eligibility trace, as studied by Marschall et al. (2020), making local learning more accurate. We acknowledge that further theoretical analysis could provide additional insights and leave this exploration for future work.
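As a worked one-liner under simplifying assumptions (linear readout and squared-error loss; this is a sketch for this reply, not an equation from the paper):

```latex
% Assumptions: readout y = W_out h, loss L = 1/2 ||y* - y||^2, error e = y* - y, feedback weights B.
\frac{\partial L}{\partial h} = -W_{\text{out}}^{\top}(y^{*} - y) = -W_{\text{out}}^{\top} e,
\qquad
B e \;\approx\; W_{\text{out}}^{\top} e \quad \text{when } B \approx W_{\text{out}}^{\top}.
```

Under these assumptions, the feedback drive Be points (approximately) along the descent direction -dL/dh, which is why the observed alignment is expected.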
What is the difference between RFLO and RFLO+c? Does the first lack the feedback term in Eq. 1? This should be clearly stated within Sections 2.2-2.4.
Thank you for highlighting the lack of clarity in Figure 3(b). You are correct in that "+c" indicates the presence of feedback in Eq. 1 during adaptation to the persistent perturbation via local or non-local learning. We now clarify this in both the figure caption and methods section 2.4:
"Note that when the same weight matrix is used for both control and learning, we denote this by adding to the respective learning rule."
Line 141: “outout” Line 141: “ is increasingly by”
Thank you for catching these typos!
Thank you for the clarifications and additional experiments! Having read other reviews and responses, I'm keeping the score of 7 as I think it's an interesting work.
We thank all the reviewers for their careful review of our manuscript and all the insightful comments! Here, we attach a single page with extra figures to support our individual rebuttals below.
This paper presents a recurrent neural network model focused on feedback control-based motor learning. The authors use a motor learning task commonly used in monkey electrophysiology to investigate the role of feedback control in learning. The results are well-presented, the paper is well-written, and the figures are clear and effective.
The reviewers unanimously support the paper's acceptance, although their levels of enthusiasm vary. Nonetheless, they all acknowledge the paper’s significant contributions. Based on my reading of the discussions, most of the reviewers' comments have either been addressed or are not seen as barriers to the paper's acceptance at NeurIPS. The comments and discussions also raise several interesting points that I encourage readers to consider.
Given these factors, I recommend this paper for acceptance.