Brain-inspired Geometry Constrain on Represention for Compositional Generalization
Abstract
Reviews and Discussion
This paper takes inspiration from neuroscience, specifically from how the human brain employs compositionality, and attempts to translate this into the space of artificial neural networks. The authors propose a method called “Minimal Distances Variance” (MDV), a regularization technique whose goal is to guide a classifier towards a better-organized representation space (one that resembles the organization of representations in the human brain).
Strengths
Figure 1 looks cool. I appreciate the effort that the authors put into this Figure. It probably conveys their approach better than any of their written text.
Weaknesses
- What does “Constrain on Representation” in the title mean? Is there a typo here? While typos can generally happen, I think there should not be any typo in the title of a submitted conference paper.
- In the Abstract, you mention “neural representations” in the second sentence. I am not sure I can follow. What does “neural representations” mean here? The representations of an artificial neural network? The representations in a human brain? This does not come across clearly. The term needs to be defined before it is used.
- I tried really hard to read this paper, but it is incredibly poorly written. I cannot follow it most of the time.
- I am not really sure I understand what is going on here in general. I have never heard of CG, PS, or any of the datasets they evaluated their method on. Most settings seem to be contrived.
- On page 6, the authors claim that “[...], PS offers a potential approach to understanding deep neural networks, with mechanisms resembling those of the human brain, which may lead to more interpretable and transparent AI.” I highly doubt that. It is not even clear to me how PS could help “understand neural nets with mechanisms that resemble the human brain”.
- In Figure 2, the standard deviations are incredibly large and overlap between their regularization technique and vanilla logistic regression. So, what this figure shows is that there is no difference between their method and logistic regression. I am not sure what else I should take away from this figure.
- What is the baseline? They compare their method against a baseline, but it is unclear to me which method the baseline is. It does not seem to be explained anywhere. Is the baseline logistic regression? Or am I missing something crucial?
Questions
This paper needs a major revision. The revision required for it to become a high-quality paper that I would be comfortable accepting to ICLR would take much more time than we all have during the rebuttal period.
The authors propose to investigate the parallelism score of Bernardi et al. 2020 (a measure of representational geometry related to abstraction ability) in the context of artificial neural networks.
To do this, they define a parallelism score between the centroids of representations for each class with respect to different conditioners. They then validate this score by creating datasets with varying levels of parallelism (and variance) and measuring the correlation between the accuracy of downstream linear classifiers trained on these datasets and the datasets' known parallelism. They indeed show that their metric is correlated with compositional generalization (accuracy on unseen combinations of inputs requiring abstract generalization) on these synthetic datasets. Further, they test this correlation with a wide variety of pre-trained models and three separate test datasets, again showing a highly robust correlation and validating their metric.
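For concreteness, here is a rough sketch of how I understand such a centroid-based parallelism score to be computed (my own paraphrase with hypothetical names, loosely following Bernardi et al. 2020 rather than the authors' code; binary class labels are assumed for simplicity):

```python
import itertools
import numpy as np

def parallelism_score(features, labels, conditions):
    """Rough sketch: average cosine similarity between the class-coding
    vectors (centroid differences) computed separately in each condition."""
    coding_vectors = []
    for cond in np.unique(conditions):
        mask = conditions == cond
        # centroid of each class within this condition
        centroids = {c: features[mask & (labels == c)].mean(axis=0)
                     for c in np.unique(labels[mask])}
        classes = sorted(centroids)
        if len(classes) < 2:
            continue
        # coding vector between the two classes in this condition (binary case)
        v = centroids[classes[1]] - centroids[classes[0]]
        coding_vectors.append(v / np.linalg.norm(v))
    # PS = mean pairwise cosine similarity of the normalized coding vectors
    sims = [u @ w for u, w in itertools.combinations(coding_vectors, 2)]
    return float(np.mean(sims))
```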
They then introduce a version of their parallelism score that can be used as a regularization term, and prove that it is an unbiased estimator, allowing for robust optimization. They improve this regularization term with a sort of maximum-margin loss they call ‘distance variance’ to avoid certain failure cases. Finally, they test models trained with this regularization term on four datasets: Shapes3D, PACS, Office-Home, and NICO. They show that their model achieves the highest accuracy on these datasets when compared to simple baselines, and further ablation experiments validate the positive impact of their additional terms.
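As an illustration of how such terms could enter training (again my own reading of the described setup, not the authors' implementation; `ps` is assumed to be a differentiable parallelism-score estimate, and `lambda_ps`/`lambda_dv` are hypothetical weights):

```python
import torch
import torch.nn.functional as F

def regularized_loss(logits, embeddings, labels, ps,
                     lambda_ps=0.1, lambda_dv=0.1):
    """Sketch: cross-entropy plus a parallelism penalty (1 - PS) and a
    'distance variance' penalty on per-sample minimal centroid distances."""
    ce = F.cross_entropy(logits, labels)
    # class centroids in the current embedding space
    centroids = torch.stack([embeddings[labels == c].mean(dim=0)
                             for c in labels.unique()])
    dists = torch.cdist(embeddings, centroids)   # (N, num_classes)
    min_dists = dists.min(dim=1).values          # distance to nearest centroid
    dv_penalty = min_dists.var()
    return ce + lambda_ps * (1.0 - ps) + lambda_dv * dv_penalty
```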
Strengths
- The paper addresses the issue of representational geometry from the exciting perspective of the parallelism score introduced by Bernardi et al. This perspective has to date not received significant attention in the machine learning community; however, it has been demonstrated to be highly relevant in biological systems, so the increased attention is very welcome. This also adds to the originality and significance of the paper.
- The paper's methodology appears fairly rigorous and sound: the authors perform a fairly extensive evaluation to validate that their proposed score correlates with generalization on both synthetic and real data, across a suite of pretrained models and datasets.
- The paper achieves the goal of improving compositional generalization through an induced parallelism (although the results on that front are quite limited).
Weaknesses
- The writing of the paper is very poor, and there are many typos throughout. In its current state, the paper is not fit for publication. The authors should carefully proofread the text and potentially request the assistance of others if necessary. If the authors fix this, the remainder of the paper would be above the marginal acceptance threshold in my opinion.
- The experiments which demonstrate the benefit of the proposed regularization term are quite limited, and the paper would benefit from an expansion of this section, as it appears to be the most impactful contribution.
- There is no code released for the work.
- Model details and baselines should be explained in the main text, not the appendix. (Baselines are actually never defined, as far as I can tell.)
Questions
- Can the authors explain the negative correlation between PS-Class and PS-Domain again? I understand the models are trained to disregard domain-related features, but I would expect this to lead to no correlation between PS and CG for domain, not a negative correlation.
- Can the authors explain what the baseline models are for each experiment?
This submission proposes new objective functions to learn representations for compositional generalization. Some of the techniques are inspired by recent neurophysiological observations in context-dependent decision-making tasks.
There seems to be a modest amount of innovation in this paper. The Introduction motivates the use of the parallelism score from neuroscience studies, but it turns out that incorporating this score into the training objective does not fix the problem. The authors thus additionally propose a regularization based on “minimal distance variances”.
While the results show some improvements, their significance is unclear. In addition, the presentation and writing need major improvements.
Strengths
The attempt to use insights from neurobiology to constrain objective functions in machine learning is interesting.
The paper combines some theoretical reasoning with empirical evaluation.
Weaknesses
The study seems to make some implicit assumptions about the underlying data structure that make the parallelogram-like latent representation helpful for generalization. It is unclear what these assumptions are.
The writing and presentation would need major improvements before this paper can be published.
The proposed objective seems to be conceptually similar to ones proposed previously for visual analogical reasoning (not cited in the paper), e.g., Reed, Scott E., et al. "Deep visual analogy-making." Advances in Neural Information Processing Systems 28 (2015); a rough sketch of that objective is given below.
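For reference, the core additive analogy objective from that line of work is, roughly, the following (my own sketch from memory, not the paper's exact notation):

```python
import torch

def additive_analogy_loss(encoder, decoder, a, b, c, d):
    """Rough sketch of the additive analogy objective in Reed et al. (2015):
    decode f(c) + (f(b) - f(a)) and compare the result to the target image d."""
    fa, fb, fc = encoder(a), encoder(b), encoder(c)
    d_hat = decoder(fc + (fb - fa))
    return torch.mean((d_hat - d) ** 2)
```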
Questions
Many things could be improved. I would suggest starting with improving the writing and presentation.
How well the model generalizes likely depends on the structure of the data. I feel there is a lack of discussion of the underlying assumptions about the data distribution that would make the proposed geometrical configurations ideal for generalization.
The limitation of the approach should also be acknowledged.
It would be useful to discuss how the proposed method differs from the objective functions used in deep visual analogy-making, as mentioned above.
The authors apply the Parallelism Score (Bernardi et al., 2020) to artificial neural networks. They find that networks trained to optimize the Parallelism Score perform better at compositional generalization.
Strengths
There seems to be a consistent improvement from optimizing the Parallelism Score.
Weaknesses
The paper is poorly written and very difficult to read. It needs a major overhaul of the writing.
Questions
N/A