Neural Synaptic Balance
The paper presents a theory of neural synaptic balance based on systematic relationships between the input weights and the output weights of neurons.
Abstract
Reviews and Discussion
This paper aims to study and explain the phenomenon of neural synaptic balance, where a neuron is balanced when the total norm of its input weights equals the total norm of its output weights. In particular, the authors study why and when randomly initialized balanced models (i.e., models whose neurons are all balanced) tend to remain balanced at the end of training. The study takes into account many different components of neural networks (activations, layer types, regularisers).
Strengths
The study is very comprehensive, and sheds light on some interesting properties of deep neural networks.
Weaknesses
While it is true that, as the authors state in the conclusion, neural synaptic balance is a theory that is interesting on its own, I would encourage the authors to expand the discussion on possible application domains of this theory. Why is it interesting? What are the advantages that a complete understanding of such phenomena could bring to the table?
Questions
Backpropagation is not biologically plausible, so does it really make sense to state that the methods proposed by the authors are, if they are then applied to backprop-based models? I would suggest either removing such a discussion, or expanding on it by showing, even empirically on small models, that the results extend to different kinds of neural networks where both neural activities and synapses are updated locally in a bio-plausible way (e.g., predictive coding). A third way of addressing this would be to add a discussion of the issue without running the experiments.
Limitations
No concerns here
We thank reviewer VcLX for the positive review of this work and insightful comments.
While we have focused here on developing the theory of neural synaptic balance, neural synaptic balance has practical applications. It can be viewed as an additional, complementary method of regularization, on par with other methods such as dropout. It is based on a rigorous theory that connects it to convex optimization. Finally, it may have additional applications in biological or neuromorphic systems, due to the locality of the balancing operations. The interesting fact about neural balance is that, while balancing a single neuron may ruin the balance of adjacent neurons, iterated stochastic balancing of all the neurons in a network leads to a unique, stable configuration of the weights (the globally balanced state).
Regarding the biological implausibility of backpropagation, we will add a discussion to the final version. Note that the balancing algorithm presented in our work can be applied to a network after training with any learning rule. In other words, Theorem 5 does not depend on the training algorithm, and balancing can be applied to any set of weights, at any time, during or after learning, and with any cost function. [For example, one could train a network with L2 regularization and apply L1 balancing to the weights after the training is complete.]
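To make the balancing operation concrete, here is a minimal sketch (our own illustration, not code from the paper) of balancing a single neuron under a generic Lp cost, assuming a fully connected neuron with a homogeneous (BiLU-type) activation and ignoring biases; the function name `balance_neuron` is hypothetical.

```python
import numpy as np

def balance_neuron(w_in, w_out, p=2):
    """Rescale one neuron's weights so its incoming and outgoing Lp costs match.

    For a neuron with a homogeneous (BiLU-type) activation, multiplying the
    incoming weights by lambda and the outgoing weights by 1/lambda leaves the
    network's input-output function unchanged.  Minimizing
    lambda^p * ||w_in||_p^p + lambda^(-p) * ||w_out||_p^p over lambda gives
    lambda = (||w_out||_p^p / ||w_in||_p^p)^(1 / (2p)).
    """
    c_in = np.sum(np.abs(w_in) ** p)
    c_out = np.sum(np.abs(w_out) ** p)
    lam = (c_out / c_in) ** (1.0 / (2 * p))
    return lam * w_in, w_out / lam

# Example: L1 balancing applied to weights obtained with any training procedure.
rng = np.random.default_rng(0)
w_in, w_out = rng.normal(size=20), rng.normal(size=5)
w_in_b, w_out_b = balance_neuron(w_in, w_out, p=1)
assert np.isclose(np.abs(w_in_b).sum(), np.abs(w_out_b).sum())
```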
The authors present a theory of neural synaptic balance, defined as the condition in which the total cost of the input weights to a neuron equals the total cost of its output weights. This is different from the well-studied E/I balance in the neuroscience and machine learning literature. The authors show mathematical derivations of how to balance a neuron without affecting the output of the network, and show that balancing a network is a convex optimization process.
Strengths
The paper is overall clear and detailed, the mathematical proofs are sound, and the paper is structured well, moving from straightforward claims to less trivial points.
Weaknesses
The paper is about neural synaptic balance, but the authors do not provide convincing motivation for why we should care about such balancing. As they mention, adding a simple L2 regularizer will balance the network naturally (in a distributional sense, not necessarily each neuron individually) during training and has other well-known benefits, so the elaborate mathematical derivations on the general balancing process seem redundant. In addition, in the authors' own plots, unbalanced networks sometimes outperform the balanced networks (e.g., Fig. 3E), which just emphasizes the point. One of the mentioned motivations is biological neurons, but the authors claim that biological neural data about synapses do not exist. However, they could test their hypothesis against the currently available connectomes, e.g., the Drosophila fly brain connectome. They mention spiking networks, but the notion of input-output homogeneity is unclear in spiking networks. Finally, physical neurons' energy consumption is mentioned without details.
Questions
Why is the energy consumption of physical neurons lower when they are balanced? Why not just have a regularizer to keep the overall activation low and weights small? Why does each neuron need to be balanced separately?
Limitations
The whole framework is specific to BiLU neurons or perhaps to other power-law functions. The relevance to spiking neurons is therefore questionable. It is also questionable as a general principle for machine learning.
We thank reviewer QTyq for the positive review of this work and insightful comments.
"Why is the energy consumption of physical neurons lower when they are balanced?" Because the balancing algorithm also decreases the norm of weights.
"Why not just have a regularizer to keep the overall activation low and weights small?" The balancing algorithm is indeed another way to achieve a balanced state while keeping the overall activation low and the weights small; it shows that a regularizer is not the only route to such a state.
"Why does each neuron need to be balanced separately?" It is more elegant and biologically (or neuromorphically) more plausible to be able to achieve a global balanced state through local rules that each neuron can apply independently of all the other neurons in the network, at any point in time, in a completely asynchronous way. In other words, neurons do not need to exchange information with each other in order to achieve a global balanced state. Global order emerges from local order.
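As an illustration of this point, below is a minimal sketch (our own, assuming a two-hidden-layer ReLU MLP with L2 costs and no biases, not the paper's code) of the stochastic balancing procedure: each step picks one hidden neuron at random and rescales only its own fan-in and fan-out, yet the per-neuron balance gaps shrink toward zero, consistent with the convergence result stated in the paper.

```python
import numpy as np

def l2_lambda(w_in_row, w_out_col):
    """L2-optimal rescaling factor for one neuron's fan-in/fan-out."""
    return (np.sum(w_out_col ** 2) / np.sum(w_in_row ** 2)) ** 0.25

def stochastic_balancing(W1, W2, W3, n_steps=20000, seed=0):
    """Iterate local L2 balancing over randomly chosen hidden neurons.

    Network: input -> W1 -> ReLU -> W2 -> ReLU -> W3 -> output.
    Balancing a first-layer neuron rescales a row of W1 and a column of W2;
    balancing a second-layer neuron rescales a row of W2 and a column of W3,
    so the two layers interact through W2.  Each rescaling leaves the
    network's input-output function unchanged, and no neuron needs any
    information about the other neurons.
    """
    rng = np.random.default_rng(seed)
    h1, h2 = W1.shape[0], W2.shape[0]
    for _ in range(n_steps):
        k = rng.integers(h1 + h2)                 # pick any hidden neuron
        if k < h1:                                # first hidden layer
            lam = l2_lambda(W1[k, :], W2[:, k])
            W1[k, :] *= lam
            W2[:, k] /= lam
        else:                                     # second hidden layer
            j = k - h1
            lam = l2_lambda(W2[j, :], W3[:, j])
            W2[j, :] *= lam
            W3[:, j] /= lam
    return W1, W2, W3

rng = np.random.default_rng(1)
W1, W2, W3 = rng.normal(size=(32, 10)), rng.normal(size=(16, 32)), rng.normal(size=(3, 16))
W1, W2, W3 = stochastic_balancing(W1, W2, W3)
gap1 = np.abs((W1 ** 2).sum(axis=1) - (W2 ** 2).sum(axis=0)).max()
gap2 = np.abs((W2 ** 2).sum(axis=1) - (W3 ** 2).sum(axis=0)).max()
print(gap1, gap2)   # both approach zero as n_steps grows
```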
This paper provides a thorough characterization of regularizers which lead to synaptic balance (when the "cost" of input weights to a neuron or pool of neurons is tied to the cost of output weights) in trained neural networks. Their results apply to many different activation functions and architectures.
Strengths
The paper is very well-written and easy to follow. I was able to read everything, including the math, smoothly. The mathematical arguments themselves are crisp and correct, which I really appreciated.
Weaknesses
The paper is strongly lacking in motivation. I never really understood why I should care about synaptic balance. Also, it is clear from the numerical experiments that synaptic balance only emerges in networks when it is enforced via a regularizer (except in the case of an infinitely small learning rate), but why is this surprising? It seems obvious that adding a regularizer for some property tends to result in that property. It would be shocking if synaptic balance occurred without some regularization towards the property. Thus, while the "what" and "how" of the paper are nicely addressed, I feel the paper is missing the "why". I believe if the authors could address this from the outset, it would make the paper much stronger, and I would of course be willing to increase my score.
Questions
- It is claimed throughout the paper that "network balance can be used to assess learning" progress. I do not really understand how. If my total loss is the sum of a task loss and a regularizer, $\mathcal{L} = \mathcal{L}_{\text{task}} + \mathcal{R}$, then there is nothing preventing a situation where I get $\mathcal{R} \to 0$ while $\mathcal{L}_{\text{task}}$ remains large, meaning that the task loss is decoupled from the network balance loss. If the authors could clarify this point, that would be great.
Small typos:
- Line 128: alpha is not rendered in LaTeX
- Figure 4 caption, subplot (D-F) "CFAR10" -> "CIFAR10"
Limitations
Yes.
We thank reviewer cT2m for the positive review of this work and insightful comments.
Synaptic balance does not necessarily emerge in networks trained with a regularizer (unless they are trained very carefully, e.g., with very small learning rates). Our work shows that one can obtain synaptic balance without a regularizer, simply by applying the balancing algorithms described in the paper during training or just at the end of training. However, reviewer cT2m is right that we could have provided a clearer motivation. In addition to the theoretical motivations, there are also practical motivations, as discussed in the overall rebuttal. In particular, balancing can be viewed as an alternative way of regularizing networks, in the same way that dropout is viewed as an alternative or complementary way of regularizing networks. This will be made clear in the revised version.
The surprising result in our work is that without any regularization, if each neuron tries to balance its input and output synapses independently (without any coordination with any other neurons), the network reaches a unique, stable, and globally balanced state. Thus, a unique global order emerges from local, independent balancing operations.
By the term “network balance can be used to assess learning” we mean that if a network trained by regularized SGD is in a balanced state and does not move from it, then the gradient must be zero and the learning must have converged. Conversely, if the state is not globally balanced, then learning has not fully converged.
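For instance, one could monitor a per-neuron balance gap during training as a rough convergence diagnostic. The following is a minimal sketch (our own illustration, assuming a plain fully connected MLP with biases left out of the cost, and under the paper's assumptions on activations and regularizers); the helper name `balance_gap` is hypothetical.

```python
import numpy as np

def balance_gap(weights, p=2):
    """Maximum per-neuron |Lp cost(fan-in) - Lp cost(fan-out)| over all hidden layers.

    `weights` is the list [W1, ..., WL] of an MLP's weight matrices, with
    W_l of shape (fan_out, fan_in).  A persistently large gap under
    Lp-regularized SGD indicates that training has not fully converged,
    since at a stationary point of the total (regularized) loss every
    hidden neuron must be balanced.
    """
    gaps = []
    for W_in, W_out in zip(weights[:-1], weights[1:]):
        c_in = (np.abs(W_in) ** p).sum(axis=1)    # incoming cost per hidden neuron
        c_out = (np.abs(W_out) ** p).sum(axis=0)  # outgoing cost per hidden neuron
        gaps.append(np.abs(c_in - c_out))
    return max(g.max() for g in gaps)
```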
All the minor points are fixed in the revised version.
I thank the authors for their reply. I have increased my score to a borderline accept.
I am still confused by this point:
"By the term “network balance can be used to assess learning” we mean that if a network trained by regularized SGD is in a balanced state and does not move from it, then the gradient must be zero and the learning must have converged. Conversely, if the state is not globally balanced, then learning has not fully converged."
Could the authors please be more precise as to which gradient they are talking about? The total gradient (i.e., including the regularizer), or the gradient of the "task" component of the overall loss function?
We thank this reviewer for appreciating our reply. By "regularized SGD" we refer to the "total gradient". In any case, we will revise the text to remove any confusion.
The authors provide a theoretical approach to the analysis of balanced neurons and networks. Their theoretical work includes proof of the convergence of stochastic balancing. In addition, they investigate the effect of different regularizers and learning rates on balance, training loss, and network weights, including practical simulations for two classification problems.
Strengths
The paper tries to reveal the inner structure of neural networks during the training phase. This is a very important but difficult problem; its solution could provide new insights for developing better training algorithms. The work proposed can ultimately be an important step toward more transparent networks as opposed to their current black box character.
Weaknesses
The paper has some weaknesses, most notably how the material is presented and part of the evaluation.
Theorem 5.1, dealing with the convergence of stochastic balancing, is arguably the central piece of the paper. However, its formulation is bulky and should be reduced to a shorter, more manageable size, potentially with the help of lemmata. This becomes apparent when seeing that its proof contains the proof of another proposition.
In Figure 4, the authors say that these panels are not meant for assessing the quality of learning. However, measuring not only the training loss but also the accuracy on a test set will give important insights. How does the classification performance relate to the degree of balancing? Why did the authors not include this analysis? It could give important insights into the relationships between overtraining, generalization capability, balance, and accuracy.
The authors should discuss the consequences of their work for network training. They do not discuss the immediate practical consequences or any recommendations they can make based on their results.
Questions
It would help the paper's clarity if the authors answered their own questions in a brief summary at the end of the paper, as concise as possible:
Why does balance occur? Does it occur only with ReLU neurons? Does it occur only with L2 regularizers? Does it occur only in fully connected feedforward architectures? Does it occur only at the end of training? And what happens if we balance neurons at random in a large network?
Limitations
The authors could be more specific about the consequences of their work, including limitations. For example, can they recommend any specific learning rate, network structure, or other features for optimal training?
We thank reviewer TDzF for their positive review of this work and insightful comments.
Regarding Theorem 5.1, the reviewer raises a fair point. In the revised version, we will shorten Theorem 5.1 and move Proposition 5.4 and its proof outside of the proof of Theorem 5.1.
Regarding Figure 4, for a fixed set of weights, synaptic balancing does not change the input-output function of the network, as shown by the theory. Thus, for a fixed set of weights, we do not expect to see any change in performance after applying the balancing algorithm. The new figure attached to our rebuttal does what this reviewer is asking for, which is to show the regularizing effect of balancing throughout learning.
As explained in our general response, we will add text on the application of synaptic balancing to regularization and cite additional work.
We will add a brief summary at the end of the revised version to improve the paper's clarity.
Thanks for adding the figure, which should improve the quality of the presentation. I have upgraded this part of my grading. Compared to the other reviewers, I am more easily convinced that this work could lead to a better understanding of how neural networks operate. However, I agree with the other reviewers that this needs to be better motivated. The feeling is that something important is missing. If we only knew what.
We thank the reviewers for appreciating our work and for their insightful comments. We have provided a separate response to each reviewer. The primary goal of our paper is to present the theory of synaptic balancing in neural architectures and the main theorem (Theorem 5.1) connects synaptic balancing to convex optimization. The simulations included are meant to corroborate the theory.
Overall, the main criticism is that we should have included additional information regarding the regularization value of synaptic balancing in the motivation section or in the conclusion. This is a fair point. The reason we did not make this point as clear as we should have is that we focused primarily on the main result (Theorem 5.1) establishing the properties of the balancing algorithm. Although we discuss the applications of the balancing algorithm, we should have given more space to this. While Theorem 5.1 remains the cornerstone of synaptic balancing, in the revised version we will make space for additional text and a new figure describing the regularization applications of synaptic balancing. We will free up space primarily by shortening the proof of Theorem 5.1 [as described below], since the complete proof is available anyway in the supplementary material. The new figure is attached to this rebuttal.
We will add a few sentences on regularization in the motivation section, and in the new conclusion we will make very clear that:
- synaptic balancing is a novel approach to regularization;
- synaptic balancing is very general in the sense that it can be applied with all usual cost functions, including all L_p cost functions;
- synaptic balancing can be carried out fully or partially (due to the convexity property in Theorem 5.1);
- full or partial synaptic balancing can be applied effectively at any time during the learning process: at the start of learning, at the end of learning, or during learning, by alternating balancing steps with stochastic gradient steps (a sketch of such an interleaved schedule is given after this list);
- simulations show that these approaches can improve learning in terms of speed (fewer epochs), accuracy, or generalization ability (see examples in the new figure). Thus, in short, balancing is a novel, effective approach to regularization that can be added, alongside dropout and other tools, to the set of methods available for regularizing networks.
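As an illustration of the interleaved schedule mentioned above, here is a minimal PyTorch sketch (our own, not the paper's code) that alternates ordinary SGD steps with a full L2 balancing pass over a single hidden ReLU layer every K steps; the model, the toy data, and the interval K are placeholders.

```python
import torch
import torch.nn as nn

# Toy model: one hidden ReLU layer.  Balancing rescales each hidden unit's
# fan-in (weights and bias) by lambda and its fan-out by 1/lambda, which
# leaves the network's input-output function unchanged.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

def balance_hidden_layer(fc_in: nn.Linear, fc_out: nn.Linear) -> None:
    """Full L2 balancing of every hidden unit between two Linear layers."""
    with torch.no_grad():
        c_in = fc_in.weight.pow(2).sum(dim=1) + fc_in.bias.pow(2)  # bias counted with fan-in
        c_out = fc_out.weight.pow(2).sum(dim=0)
        lam = (c_out / c_in).pow(0.25)
        fc_in.weight.mul_(lam.unsqueeze(1))
        fc_in.bias.mul_(lam)
        fc_out.weight.div_(lam.unsqueeze(0))

K = 100  # balance every K SGD steps
for step in range(1000):
    x = torch.randn(32, 20)                   # placeholder batch; use real data here
    y = torch.randint(0, 2, (32,))
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()
    if (step + 1) % K == 0:
        balance_hidden_layer(model[0], model[2])
```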
We hope the reviewers will agree that this addresses their main concern and that synaptic balance is a novel theoretical and practical topic worthy of being presented at the NeurIPS conference.
This paper studies neural synaptic balance, which occurs when the total cost of a neuron's input weights is equal to the total cost of its output weights. The paper presents a clear and mathematically solid study of when this phenomenon arises. However, the motivation for studying neural synaptic balance is not convincingly articulated.