PaperHub
Rating: 7.5/10 (Poster)
4 reviewers, scores 8, 10, 6, 6 (min 6, max 10, std 1.7)
Average confidence: 3.3
ICLR 2024

Estimating Shape Distances on Neural Representations with Limited Samples

OpenReview · PDF
Submitted: 2023-09-24 · Updated: 2024-03-16
TL;DR

Novel estimator of geometric similarity with tunable bias-variance tradeoff, outperforms standard estimators in high-dimensional settings.

Keywords

representational geometry, shape metrics, dissimilarity metrics

Reviews & Discussion

Review
8

This paper extends work from shape analysis into the space of neural networks. That is, how does one measure the distance between two NNs, where a NN is represented as a mapping h: features → R^N? The main idea is to regard this mapping as living in the space of point clouds, so one can use a Kendall shape-space-type distance. The key contribution I see is that the point clouds have stochastic properties, so the shape distance measures the distance between expectations. The authors spend a great deal of effort considering the bounds of these distances.

Strengths

A key component of this paper is the bounds on the distances when using an estimator for the cross-covariance. The authors present Lemmas 1 and 2 and Theorems 1 and 2 to do this. The authors explain this bound well; it makes sense that the higher the dimension, the more landmarks need to be observed.

The authors show how their estimator performs under a variety of scenarios. The variety is suitable.

Weaknesses

When referencing the shape distances such as (2) and (3) the authors cite Williams 2021 but these are simply common shape distances which have been around for much longer. Unless this specific formulation is novel to Williams 2021 I don't see how it's different than just measuring the distance on a circle.

When considering the "failure modes of plug-in..." there is the missing component of mentioning scale. If the h_i differ in scale, their ρ distance can be large but their θ distance can be very small, as it rescales in the denominator.

Some of the figure axes are impossible to read.

Questions

When setting up the background, the authors define h_i: Z → R^n as a function representing each neuron. I have no problem with this definition. My question is, why do the authors limit their writing to NNs when this is true for a much wider class of functions? I may have missed something in an assumption, but it seems these functions can be quite generally considered.

For (5) isn't the last equation enough?

Comment

We thank the reviewer for their positive and thoughtful comments on our work.

When referencing the shape distances such as (2) and (3) the authors cite Williams 2021 but these are simply common shape distances which have been around for much longer. Unless this specific formulation is novel to Williams 2021 I don't see how it's different than just measuring the distance on a circle.

We now clarify that the shape metric estimator is a common formulation and include older citations in the introduction. Specifically, we emphasize in a new paragraph 3 of the introduction that although shape distances are drawn from an older, established literature, they were, to the best of our knowledge, first proposed as a method to measure similarity in neural network representations by Williams et al. (2021). We then cite this paper frequently due to its fairly self-contained appendix, which explains these concepts using notation similar to ours.

When considering the "failure modes of plug-in..." there is the missing component of mentioning scale. If the h_i differ in scale, their ρ distance can be large but their θ distance can be very small, as it rescales in the denominator.

The reviewer’s understanding is correct. We have now added a sentence to convey this explicitly to the reader under the definitions in equations (2) and (3).
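To make the scale behavior concrete, here is a small numpy sketch (our own illustration, not code from the paper) contrasting the two distances on a pair of configurations that differ only by an overall rescaling:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))  # system i: 50 stimuli x 5 neurons
Y = 3.0 * X                       # system j: identical shape, 3x the scale

def procrustes_dist(A, B):
    # rho: Frobenius distance after the optimal rotation/reflection of B onto A
    U, _, Vt = np.linalg.svd(B.T @ A)
    Q = U @ Vt
    return np.linalg.norm(A - B @ Q)

def angular_shape_dist(A, B):
    # theta: angle between configurations after optimal rotation/reflection;
    # the norms in the denominator cancel any overall rescaling
    s = np.linalg.svd(A.T @ B, compute_uv=False).sum()
    return np.arccos(np.clip(s / (np.linalg.norm(A) * np.linalg.norm(B)), -1.0, 1.0))

print(procrustes_dist(X, Y))    # large: 2 * ||X||_F, grows with the scale mismatch
print(angular_shape_dist(X, Y)) # ~0: scale-invariant
```

The Procrustes distance grows with the scale mismatch, while the angular distance is unchanged, which is exactly the reviewer's observation.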

Answers to questions:

When setting up the background, the authors define h_i: Z → R^n as a function representing each neuron. I have no problem with this definition. My question is, why do the authors limit their writing to NNs when this is true for a much wider class of functions? I may have missed something in an assumption, but it seems these functions can be quite generally considered.

We agree that the estimator is broadly applicable; we only need IID responses from a function whose responses have finite covariance. We have crafted our exposition to make the paper more accessible to practitioners studying neural networks. However, we have edited the text to clarify that the theory we develop generalizes considerably, to measuring shape distances between any two functional mappings (not only neural nets).

For (5) isn't the last equation enough?

We have notated the covariance matrices (for systems i and j) and cross-covariance matrices (between systems i and j) separately to make their distinction and definitions in equation (6) clear. We have also reformulated the text above equation (5) to make the definitions easier to parse for the reader.

Comment

Thanks for the comments, albeit a bit late in the process.

Since the estimator is more broadly applicable than NN's, it would be useful to mention other applications. I do not expect more experiments, simply stating other areas of application would suffice. My issue is that the paper almost handicaps itself into a very particular scenario which could hinder its impact.

Comment

Since the estimator is more broadly applicable than NN's, it would be useful to mention other applications. I do not expect more experiments, simply stating other areas of application would suffice. My issue is that the paper almost handicaps itself into a very particular scenario which could hinder its impact.

We thank the reviewer for their feedback on increasing the reach of our paper. We have added text to the intro that (1) emphasizes how shape metrics have broad applicability across scientific disciplines with references to papers on these applications and (2) explains the challenges that arise when estimating shape metrics for neural representations from biological and artificial neural networks in particular (i.e. noisy and high-dimensional data).

Below is the revised text from the introduction for easy reference:

Many approaches have been proposed to quantify similarity in neural network representations. Some popular methods include canonical correlations analysis (Raghu et al., 2017), centered kernel alignment (CKA; Kornblith et al., 2019), representational similarity analysis (RSA; Kriegeskorte et al., 2008a), and shape metrics (Williams et al., 2021). Each of these approaches takes in a set of high-dimensional measurements from two networks—e.g., hidden layer activations or measured biological responses—and outputs a (dis)similarity score. Shape distances additionally satisfy the triangle inequality, thus enabling downstream algorithms for clustering and nearest-neighbor regression that leverage metric space structure (Williams et al., 2021). Shape metrics have numerous applications in the physical sciences (Saito et al., 2015; Andrade et al., 2004; Rohlf & Slice, 1990) and while the use of similarity metrics in the study of both biological and artificial neural representations has grown in popularity (Kietzmann et al., 2019; Storrs et al., 2021; Kriegeskorte et al., 2008b; Maheswaranathan et al., 2019; Nguyen et al., 2021; Klabunde et al., 2023) this setting has introduced statistical issues that have not been adequately addressed. Specifically, shape metrics are often applied to low-dimensional noiseless measurements (e.g., comparing 3D digital scans of anatomy across animals (Rohlf & Slice, 1990)) whereas in the study of neural networks the applications have been high-dimensional (e.g., comparing neural activity between brain regions (Kriegeskorte et al., 2008b)). Here we demonstrate that in both biological and artificial neural networks the bias of these metrics can be significant because of the high noise and dimensionality inherent to the study of neural representations.

Despite the challenges of applying similarity metrics to high-dimensional, noisy measurements, there is little work on quantifying accuracy (e.g. through confidence intervals or characterizing bias) on estimators of representational similarity with the noteworthy exception of research on RSA (Cai et al., 2016; Walther et al., 2016; Schutt et al., 2023). This poses a serious obstacle to adoption of these methods, particularly in experimental neuroscience where there is a hard limit on the number of conditions that can be feasibly sampled (Shi et al., 2019; Williams & Linderman, 2021).

Review
10

The paper theoretically quantifies the performance of some typical shape distances between neural representations and how that performance depends on the number of samples. The authors find that typical methods have low variance but high bias, and instead propose a new method that enables a tunable bias-variance tradeoff.

Strengths

  • The paper contains significant insight into the performance of 'plug-in' estimators of shape distance between two neural representations and the performance dependence on the number of samples and the dimensionality of the ambient space.
  • The paper identifies the key component behind the non-ideal performance of these estimators (the ‖Σ_ij‖_* term) and proposes a simple but intuitive and effective estimator to allow a tunable bias-variance tradeoff.
  • There are adequate experiments on synthetic data to validate this new estimator.

Weaknesses

  • The paper should contain some more intuition-building sentences so that the mathematical formulation is more easily digestible. However, the authors have done a really nice job of making the mathematical foundations and derivations themselves clear.
  • One small experiment with VAEs (even if it is confined to the appendix) might be useful for quantifying the performance of stochastic networks.

Questions

NA

Comment

We thank the reviewer for their positive and thoughtful comments on our work. We agree that the mathematical details of this paper can be difficult to digest. To address the reviewer's point that our paper should contain more intuition-building sentences to help the reader digest the mathematical formulation, we have added the following:

  • Edited wording in front of eq. 1 to make clear that these are assumptions.
  • Beginning of appendix A: “We can intuitively think of the Procrustes distance as the Euclidean distance remaining between two vectors when the rotations and reflections have been “removed”. Similarly, the Riemannian shape distance can be thought of as the angle between two vectors after these rotations and reflections are removed. These definitions in eq. (2) and eq. (3) also make clear that the Procrustes distance, like Euclidean distance, is sensitive to the overall scaling of h_i or h_j, while the Riemannian shape distance, like the angle between vectors, is scale-invariant.”
  • Added wording above eq. (5) to improve clarity.
  • We have revamped the introduction and edited the text (in response to this and other reviewer’s comments) to improve clarity as much as possible while remaining within the page limit.

We have now included an experiment where we compare the representations of two neural networks trained on the same task but from different initialization points and with different training procedures (Appendix E: Applications to deep learning). Interestingly, we find that the plug-in estimator, as in our simulations, shows a high bias that converges slowly as we increase the number of samples, while the moment estimator maintains little bias regardless of the number of samples (Fig 5). We discuss how the plug-in estimator’s dependence on irrelevant nuisance variables, including the number of samples and the effective dimensionality, may lead to erroneous scientific conclusions about the similarity between networks.
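As a toy illustration of this convergence behavior (a synthetic sketch of our own, separate from the Appendix E experiment), one can watch the plug-in estimate of the nuclear norm ‖Σ_ij‖_* overshoot the truth and shrink only slowly with the number of sampled stimuli:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50                                     # number of "neurons" per system
# Hypothetical low-rank ground-truth cross-covariance between systems i and j
U = rng.standard_normal((n, 2)); V = rng.standard_normal((n, 2))
Sigma_ij = (U @ V.T) / n
true_nuc = np.linalg.svd(Sigma_ij, compute_uv=False).sum()

def plugin_nuclear_norm(m):
    Zi = rng.standard_normal((m, n))                   # responses of system i
    Zj = Zi @ Sigma_ij + rng.standard_normal((m, n))   # system j; cross-cov = Sigma_ij
    S_hat = Zi.T @ Zj / m                              # plug-in cross-covariance
    return np.linalg.svd(S_hat, compute_uv=False).sum()

for m in (100, 1000, 10000):
    bias = np.mean([plugin_nuclear_norm(m) for _ in range(20)]) - true_nuc
    print(m, bias)   # positive bias that shrinks only slowly as m grows
```

The upward bias arises because sampling noise inflates every singular value of the estimated cross-covariance, and the nuclear norm then sums all of them.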

Review
6

In this paper the authors study estimators of the so-called "shape distance", more precisely the Procrustes size-and-shape distance ρ and the Riemannian shape distance θ, which are distances defined on a data manifold. They measure the uncertainty (bias and variance) of the empirical estimates of these distances, relying on concentration inequalities, and design a new estimator whose bias/variance tradeoff can be controlled. Their theoretical results are illustrated on:

  • a synthetic experiment
  • measure of calcium in the neural activity of a mouse

Strengths

Clarity

The paper is well written, and the theoretical results are explained in a way that makes them accessible to a broad range of readers. The proof techniques are well explained.

Quality

Experiments are convincing and support the theoretical results.

Weaknesses

Out of scope: no link with neural networks

This work has nothing to do with artificial neural networks. In the statement of the problem, the authors assume that the observations are X = h_i(Z), with Z the input data and h_i the neural network. No hypothesis is made on h_i nor on Z. The whole paper could be rewritten in terms of X to get rid of the assumption that the measures are the outputs of a neural network. Even Theorem 2 mentions "neural" for no good reason: the authors only meant to talk about the existence of some r.v. X.

The only experiment linked with neural network is in Sec 4.2, and the "neural data" is actually calcium measurements from mouse primary visual cortex, which are typical tabular data.

Overall, a conference or journal focused on statistics seems to be a better match for the content of the article than ICLR.

Novelty

The novelty is poor: all the proofs are classical bias-variance decomposition and concentration inequalities.

Moreover, there exists previous work on the topic that are not covered in the literature review:

Kent, J.T. and Mardia, K.V., 1997. Consistency of Procrustes estimators. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 59(1), pp.281-290.

Goodall, C., 1991. Procrustes methods in the statistical analysis of shape. Journal of the Royal Statistical Society: Series B (Methodological), 53(2), pp.285-321.

Questions

Application to artificial neural networks

Which results does your method yield in the latent space of an artificial neural network?

Comment

We thank the reviewer for their feedback and for their positive comments regarding clarity ("the paper is well-written" and "proof techniques are well-explained") and quality ("experiments are convincing"). Our reading of their review shows two major concerns which we address below.

First, it seems the major concern is that the work is "out of scope" because there "is no link with neural networks." While we think this is overstated, this was useful feedback that helped us revise and re-frame the paper to appeal to a broader audience.

  • In our revision, we emphasize that Procrustes/shape distances are actively being applied to artificial neural networks in the literature (see: here, here, here, here, here, and here). More broadly, there is vast literature on comparing hidden layer representations in artificial networks (see 24 page review by Klabunde et al.). Notably, this interest is highly interdisciplinary between machine learning, neuroscience, and psychology as reviewed by Sucholutsky et al. This interdisciplinary interest is why we chose to submit this paper to the Neuroscience & Cognitive Science track of ICLR, and why our original submission contained an application to biological networks. Given all of this, plus the positive feedback from the other three reviewers, we think it is clear that our work is a good fit for ICLR.

  • To make our envisioned applications more clear and concrete for the ML community, we now include a new direct application of our method to artificial networks (Appendix E: Applications to deep learning ). We show that plug-in estimators of shape distance converge slowly, and our new method provides a viable alternative (Fig 5). The details of this new experiment are described in our global response to reviewers.

  • The reviewer says our paper has "nothing to do with artificial neural networks" because our theorems are applicable to a more general setting (i.e. they could apply to "any r.v. X"). In one sense this is true—our theorems apply to a fairly broad variety of problems. We argue that this generality is a strength not a weakness! We agree with the comments from reviewer emQE who suggested that we edit the text to explain that our analysis (and the shape distance) can be used to compare any two functions, not just neural networks. Such comparisons between functions may be useful in other contexts within machine learning (e.g. quantifying differences between functions sampled from a Gaussian Process).

Second, the reviewer brings up a concern about "novelty." However, the concern is less about novelty and more about the technical sophistication ("all proofs are classical bias-variance decomposition and concentration inequalities"). We show below this concern is based on a fundamental misunderstanding of what we proved; the techniques involved are considerably more complex than the reviewer suggests. The reviewer also requests that we add two citations, but does not explain why these prior references reduce novelty of our results. Overall, we think these concerns are not generous to our efforts.

  • First, simple but novel mathematical proofs should be viewed as a strength! Contrary to the reviewer, we believe that results are particularly valuable and accessible when the mathematical techniques are established, but the conclusion is novel. This is especially true for an applications-focused venue like ICLR. It would be fair to count novelty as a weakness if our theorems / practical guidance afforded practitioners were already widely known. But the reviewer does not make this case against the paper.

  • We put considerable effort into refining the proof arguments and explaining their intuition to a broad audience. But this clarity should not be confused with the proofs being obvious. Furthermore, we use a broad range of mathematical techniques rooted in several fields. We use random matrix theory (not mentioned by the reviewer) to prove Theorem 2. We use principles from convex optimization (epigraph reformulation) to arrive at a tunable bias-variance tradeoff in our method-of-moments estimator. We use the Matrix Bernstein inequality to prove theorem 1; this inequality is not a "classical concentration inequality" in the usual sense of the term---it was derived in contemporary literature by Tropp (2010), and the proof is quite involved (not just a simple application of Markov's inequality). Altogether, we believe the combination of approaches is non-trivial and merits the community's interest.

Comment
  • We thank the reviewer for the references, and we have added them to our literature review in section 2, ‘Background and Problem Setting’, as well as in paragraph 3 of our introduction. These classic papers both motivate the importance and heritage of the problem we consider here, but are largely concerned with the problem of estimating form and shape from noisy and nuisance-transformed data, as opposed to estimating shape distances, and in particular, the convergence properties and uncertainty of estimating shape distances. Specifically, Goodall 1991 covers various methods for mean shape estimation in special cases, but contains only a brief discussion of shape differences, and no discussion of the statistical estimation of shape distances in the same form as we do here. Kent 1997 studies the asymptotic consistency of Procrustes estimators, again of shape, not shape distances, and in the special, restricted case of 2-dimensional shapes (configurations of points in a plane). Our results are non-asymptotic and provide explicit rates of convergence. This is a significant advance over proving asymptotic consistency.

Responses to questions:

Which results would your method yield in the latent space of an artificial neural network?

In our original draft we did not apply the estimator to artificial networks. We have now run the requested experiments and shown in this setting the plug-in estimator shows a significantly larger bias than the moment estimator (see Appendix E and Fig 5).

Comment

Thank you for the additional experiment regarding neural networks.

As mentioned by reviewer emQE, the position of the paper is a bit weird because it narrows the contribution to neural networks (for no good reason), with no experiments on neural networks in the initial submission, whereas the nature of the contribution itself is more focused on statistics than neurosciences, despite the Neuroscience & Cognitive Science track you mention.

I'd like to clarify that I did not want to understate the efforts of the authors regarding the proofs, for which I praised the clarity, and the fact it was well explained in my initial review. My criticism had nothing to do with the technical sophistication, I believe this is a misunderstanding of the authors.

I am just wondering why a paper whose major contribution belongs to the field of statistics would position itself in the Neuroscience & Cognitive Science track of ICLR, while offering no new insights in neurosciences or neural networks.

It seems that the submission met at least partially its public (by judging the other reviews), but I will stand by my rating. I am lowering my confidence since I am not familiar with the field.

Comment

As mentioned by reviewer emQE, the position of the paper is a bit weird because it narrows the contribution to neural networks (for no good reason), with no experiments on neural networks in the initial submission, whereas the nature of the contribution itself is more focused on statistics than neurosciences, despite the Neuroscience & Cognitive Science track you mention.

Our intent was to focus the message of this paper on neural representations to reach a specific community that has a large presence at ICLR. This was not done for "no good reason" and is not meant to "narrow the contribution" of our work. Rather, our intent is to bring these results to the attention of researchers who may not see them if they were published in a theoretical journal. Our intended audience is still quite broad as it spans research on biological measurements of neural representations (including fMRI, electrophysiology, etc.) as well as artificial networks.

Nonetheless, we agree with the feedback from reviewer emQE that we could improve the paper by mentioning the broader implications of our results. We have now done so extensively in the introduction, bringing it to the immediate attention of readers. We have pasted the text below for easy reference.

Many approaches have been proposed to quantify similarity in neural network representations. Some popular methods include canonical correlations analysis (Raghu et al., 2017), centered kernel alignment (CKA; Kornblith et al., 2019), representational similarity analysis (RSA; Kriegeskorte et al., 2008a), and shape metrics (Williams et al., 2021). Each of these approaches takes in a set of high-dimensional measurements from two networks—e.g., hidden layer activations or measured biological responses—and outputs a (dis)similarity score. Shape distances additionally satisfy the triangle inequality, thus enabling downstream algorithms for clustering and nearest-neighbor regression that leverage metric space structure (Williams et al., 2021). Shape metrics have numerous applications in the physical sciences (Saito et al., 2015; Andrade et al., 2004; Rohlf & Slice, 1990) and while the use of similarity metrics in the study of both biological and artificial neural representations has grown in popularity (Kietzmann et al., 2019; Storrs et al., 2021; Kriegeskorte et al., 2008b; Maheswaranathan et al., 2019; Nguyen et al., 2021; Klabunde et al., 2023) this setting has introduced statistical issues that have not been adequately addressed. Specifically, shape metrics are often applied to low-dimensional noiseless measurements (e.g., comparing 3D digital scans of anatomy across animals (Rohlf & Slice, 1990)) whereas in the study of neural networks the applications have been high-dimensional (e.g., comparing neural activity between brain regions (Kriegeskorte et al., 2008b)). Here we demonstrate that in both biological and artificial neural networks the bias of these metrics can be significant because of the high noise and dimensionality inherent to the study of neural representations.

Despite the challenges of applying similarity metrics to high-dimensional, noisy measurements, there is little work on quantifying accuracy (e.g. through confidence intervals or characterizing bias) on estimators of representational similarity with the noteworthy exception of research on RSA (Cai et al., 2016; Walther et al., 2016; Schütt et al., 2023). This poses a serious obstacle to adoption of these methods, particularly in experimental neuroscience where there is a hard limit on the number of conditions that can be feasibly sampled (Shi et al., 2019; Williams & Linderman, 2021).

Overall, we don't think that it is fair to reject a paper on the grounds that it would also be interesting to a mathematically-oriented statistics community. The litmus test is whether the ICLR community would find this work useful. We think that our paper makes that case strongly and the other three reviews support this perspective.

I'd like to clarify that I did not want to understate the efforts of the authors regarding the proofs, for which I praised the clarity, and the fact it was well explained in my initial review. My criticism had nothing to do with the technical sophistication, I believe this is a misunderstanding of the authors.

The original review stated that “the novelty is poor” with only the explanation that “all the proofs are classical bias-variance decomposition and concentration inequalities.” We sought to demonstrate that this claim is untrue in our original rebuttal. From your response, our understanding is that novelty is not a major concern in your consideration of the paper.

Comment

The litmus test is whether the ICLR community would find this work useful

I agree; and in this regard, all reviewers (including myself) expressed their view on it. The ICLR community is quite broad.

I don't consider myself part of the "Neuroscience & Cognitive Science" sub-community, so I won't speak in its name. Defining the scope is a choice that I prefer to delegate to the AC.

Regarding technical correctness, I don't have much more to say: the paper is well written.

I lowered my confidence score to reflect this: I don't want to remain "grumpy reviewer 2" for too long if other readers found the paper useful.

Review
6

The paper proposes a novel estimator for distances between neural representations from both biological and artificial neural networks. The key technical contribution is a new moment-based estimator of the nuclear norm of the cross-covariance matrix which has significantly lower bias than the plug-in estimator. The authors provide both theoretical and empirical analysis to illustrate the benefits of the proposed estimator.

Strengths

  1. The problem is clearly identified and analysed theoretically, and the estimator proposed to solve it is principled and technically sound. The theoretical analysis in Section 3 is fairly detailed and appears to be novel. It clearly explains the flaws in the plug-in estimator, and the subsequent derivation of the new estimator is also well grounded in theory.

  2. Simulations and experiments match the intuition used to derive the estimator (Fig 2A) and clearly highlight the strengths of the estimator over the plug-in baseline.

Weaknesses

  1. The overall motivation of the problem is a bit weak. The experiments don't clearly illustrate the downstream benefits of this work. Further experiments showing tangible gains on at least one real world application would greatly strengthen the paper in my opinion.

  2. The main technical weakness of the approach appears to be its effect on the variance. While deriving the estimator the authors primarily focus on reducing the bias compared to the plug-in estimator, and acknowledge at the end of Section 3.3 that the bias is being bounded at the expense of the variance. This is also seen in the experiments (Fig 2), where the proposed estimator appears to have a significantly higher variance than the plug-in estimator. This raises questions about the reliability of the estimator; specifically, it is not clear if it can reduce the mean squared error for all distributions. This is further illustrated in Fig 4, where on real neural data it appears that, other than the case where similarity is set to 0 (A), the plug-in estimator performs comparably to (C, D) or better than (B) the proposed estimator.

Questions

  1. In the last equation on page 5, why is x ∈ [0, 1]?

  2. In Section 4.2 it is acknowledged that the proposed estimator is highly variable in low SNR regimes and so neurons with the highest SNR are selected. Can you provide the relative proportion of such neurons in the presented scenario and also (if possible) comment on their prevalence in general? This is important because if the relative proportion is small then it means that the proposed estimator can only be applied in very few cases.

Ethics Concerns

N/A

Comment

We thank the reviewer for their thoughtful comments about our work, in particular for highlighting our novel theoretical analysis of this problem.

We disagree with the reviewer’s comment that “experiments showing tangible gains on at least one real world application would greatly strengthen the paper” since our original submission did include an application to experimental neuroscience where data are noisy and sample-limited. Nonetheless we acknowledge that the importance of studying the uncertainty in estimating shape distances may not have come across clearly in our introduction. We have added paragraph 3 in the introduction to make clear that, although the Procrustes and shape distances brought to light by Williams et al. (2021) are currently being applied in this field to measure representational similarity between both artificial and biological neural networks (see: Schuessler et al. 2023 (https://arxiv.org/abs/2307.07654), Lange et al. 2022 (https://arxiv.org/abs/2206.10999), Boix-Adsera et al. 2022 (https://arxiv.org/abs/2210.06545), Alvarez-Melis et al. 2018 (https://arxiv.org/abs/1806.09277), and Ding et al. 2021 (https://arxiv.org/abs/2108.01661)), there has been little progress in understanding the theoretical and empirical behavior of the estimators for these distances. This is a major gap in understanding because, as we show, the naive plug-in estimators are biased and can converge slowly, and there are circumstances when less naive estimators, such as the one we propose, are advantageous. For instance, Figure 2B shows that in the fixed-dimensionality case, one may reduce the variance of an essentially unbiased estimate by increasing the number of stimuli sampled, faster than the bias of the plug-in estimator can be reduced in the same way.

In our additional experiments on an artificial neural network (introduced in the rebuttal period), we can see that the bias of the naive plug-in converges slowly with respect to increasing samples whereas the moment estimator maintains small bias (Fig 5, Appendix E). These findings are relevant to the active research being performed to compare representations in neural networks using Procrustes/shape distances (see citations in previous paragraph). Furthermore, the bias of the plug-in estimator may depend on nuisance properties of the populations being compared leading to confounds. We concretely demonstrate this by showing that the bias of the plug-in estimator depends on the effective dimensionality of the neural populations (Fig 6, Appendix E). Thus plug-in estimated differences across experimental conditions could simply result from differences in effective dimensionality and not the actual similarity. The quantification of bias in the moment estimator allows the practitioner to precisely determine if differences cannot be explained by estimator bias – a highly practical scientific benefit to the novel estimator.

Lastly, we would like to emphasize that the objective of this paper is not to provide a universally ‘correct’ choice of estimator for shape distances, and we have added wording to make this clearer (introduction, final paragraph). As the reviewer points out, there are cases when the plug-in estimator performs comparably to or better than the method-of-moments estimator. The method-of-moments estimator then gives the practitioner a way to study the uncertainty of shape estimates and to explicitly trade off bias and variance by imposing a bias constraint. The reviewer is correct that, for some choices of bias constraint, the variance can become so large that the estimator is not useful. However, this ability to trade off bias and variance with a tunable parameter is, we believe, a significant contribution. We also emphasize that this estimator is just one of the results of our paper. Theorems 1 and 2 theoretically characterize the plug-in estimator and provide worst-case performance guarantees, potentially guiding the design of biological experiments that use shape distances. We also provide intuition regarding the bias of the plug-in estimator, how the error decreases as a function of the number of neurons and input samples, and why this convergence can be slow. We believe that these results, together with the novel estimator methodology, are an important step toward a thorough understanding of the statistical challenges of estimating shape distances.

Comment

Answers to questions:

In the last equation on page 5, why is $x \in [0, 1]$?

$x$ is in $[0, 1]$ for mathematical convenience. To optimize the polynomial approximation, we must choose a domain; ideally this domain would span the minimum and maximum singular values of the cross-covariance matrix. Because both are unknown, we set the lower end of the domain to 0 (singular values are nonnegative), and we estimate the maximum singular value and then rescale the data so that it equals 1. This is the same approach taken in Adams et al. (2018).
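To illustrate this rescale-then-approximate step, here is a small sketch (an illustration under one common construction, not the paper’s implementation): after normalizing so the top singular value of the cross-covariance $C$ is 1, the eigenvalues of $C^\top C$ lie in $[0, 1]$, and the nuclear norm $\sum_i \sqrt{\lambda_i(C^\top C)}$ can be approximated by a polynomial in those eigenvalues, i.e., a linear combination of spectral moments $\mathrm{tr}\big((C^\top C)^k\big)$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the (unknown) cross-covariance matrix.
C = rng.standard_normal((20, 20)) / np.sqrt(20)

# Rescale so the top singular value is 1; the spectrum of C^T C then
# lies in [0, 1], the domain on which the polynomial is optimized.
C = C / np.linalg.norm(C, 2)

# Least-squares polynomial approximation of sqrt(x) on [0, 1], so that
# ||C||_* = sum_i sqrt(lambda_i(C^T C)) ~ sum_k a_k tr((C^T C)^k).
xs = np.linspace(0.0, 1.0, 1000)
coeffs = np.polyfit(xs, np.sqrt(xs), deg=6)  # highest degree first
a = coeffs[::-1]                             # ascending order: a_0, ..., a_6

G = C.T @ C
approx, Gk = 0.0, np.eye(G.shape[0])
for a_k in a:
    approx += a_k * np.trace(Gk)  # tr(G^k) are the spectral moments
    Gk = Gk @ G

exact = np.linalg.norm(C, "nuc")  # reference value for this toy matrix
```

The point of the substitution is that each moment $\mathrm{tr}\big((C^\top C)^k\big)$ admits an unbiased estimate from samples, which the nuclear norm itself does not.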

In Section 4.2 it is acknowledged that the proposed estimator is highly variable in low SNR regimes and so neurons with the highest SNR are selected. Can you provide the relative proportion of such neurons in the presented scenario and also (if possible) comment on their prevalence in general? This is important because if the relative proportion is small then it means that the proposed estimator can only be applied in very few cases.

These 40 neurons are the maximal-SNR units, so they are exceedingly rare: 40 out of 10,000 neurons recorded. This is a wide-field calcium imaging experiment, a modality known to be noisy. These results suggest that accurate shape metric estimation in large neural populations will require higher SNR, potentially from electrophysiological recordings. The confidence intervals of our moment estimator quantify this uncertainty, whereas naively interpreting the plug-in estimator would have led to biased conclusions.

Comment

Thank you for the detailed response. I appreciate the honest acknowledgement of the limitations of the proposed estimator and the modification of the wording in the introduction to reflect this and I do not think that this takes anything away from the merits of the work. I had already recommended accepting the paper and so I will keep my score unchanged.

Comment

We thank the reviewers for their feedback and for their positive comments regarding clarity (“the paper is well-written” and “proof techniques are well-explained”), quality (“experiments are convincing”) and impact (“Simulations and experiments … highlight the strengths of the estimator over the plug-in baseline.”).

Overall, three out of the four reviews were enthusiastic and offered good suggestions for improving the manuscript. We summarize our major edits below and we defer a detailed rebuttal to the one negative review to our individual responses. Importantly, the major concerns from this negative review (e.g. that we are "out of scope" for ICLR) are not substantiated by the other three reviews.

Most importantly, we have implemented the suggestion to include a demonstration of our theory and novel moment estimator on artificial deep neural networks. Specifically, we examine the performance of the plug-in and moment estimators on representations in the penultimate hidden layers of two ResNet-50 networks trained on ImageNet (Appendix E: Applications to deep learning). These results emphasize the direct applicability of shape metric estimators to studying representations in deep learning. As observed in the original experiments, the plug-in estimator has a positive bias, which we show is correlated with nuisance variables such as the effective dimensionality of the representations. This makes it difficult to form scientific conclusions on the basis of the plug-in estimator (unless one averages over a large number of sampled stimuli to remove the bias).
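For readers unfamiliar with the nuisance quantity mentioned above, a brief sketch of one common definition of effective dimensionality, the participation ratio $(\mathrm{tr}\,C)^2 / \mathrm{tr}(C^2)$ of the response covariance $C$ (shown here only as an illustration; the exact computation used in the appendix may differ):

```python
import numpy as np

def participation_ratio(X):
    """Effective dimensionality of a (samples x neurons) response matrix
    via the participation ratio (tr C)^2 / tr(C^2) of the covariance C."""
    C = np.cov(X, rowvar=False)
    eig = np.linalg.eigvalsh(C)
    return eig.sum() ** 2 / (eig ** 2).sum()

rng = np.random.default_rng(0)
# Roughly isotropic responses: effective dimensionality near 50.
iso = rng.standard_normal((500, 50))
# Responses confined to a 3-dimensional subspace: near 3.
lowd = rng.standard_normal((500, 3)) @ rng.standard_normal((3, 50))
```

A population whose variance is concentrated in a few dimensions has a small participation ratio even when the number of recorded neurons is large, which is why it can act as a confound for the plug-in estimator.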

The other changes to the manuscript are for increased clarity in response to remaining reviewer suggestions. We detail these in the individual response to each reviewer.

AC Meta-Review

This submission analyzes the behavior of estimators of shape distances, which measure the difference between the outputs of different neural representations in a transform-invariant fashion. Namely, one generates a large number of network outputs by applying the network to random inputs, and then computes the squared Procrustes distance between these point sets (or a similar invariant distance). This problem arises in computational neuroscience, where one would like to build (artificial) neural network models for the behavior of (biological) neural systems — there, an important issue is determining the extent to which two neural representations are equivalent.

The paper studies the behavior of plug-in estimators of the squared Procrustes distance, as well as a shape similarity measure that it terms the cosine shape similarity. It argues that although these estimators converge to the correct answers almost surely as the number of samples M increases, their finite-M performance is relatively poor, especially in high dimensions. This is explained in terms of a bias-variance tradeoff: although the plug-in estimator has low variance, its bias is large. The paper proves upper bounds on the bias that scale as N / M^{1/2}, where N is the dimension and M the number of samples, and lower bounds for particular examples that scale as N^{1/2} / M^{1/2}. It proposes an extension that reduces the bias, at the expense of larger variance, by replacing a nuclear norm term in the plug-in estimator with a linear combination of spectral moments, determined by bootstrapping. This modified estimator also provides confidence intervals on the shape distance.

Reviewers found the paper to be technically solid and well-written, with novel bounds on estimators of shape distances between neural representations. The main limitations of the submitted manuscript were: first, the proposed moment-based estimator is validated only empirically; the manuscript does not analyze its bias-variance tradeoff theoretically. Second, the submitted manuscript could have done more to motivate shape distance as a tool for studying artificial neural networks.

Why Not a Higher Score

The paper has sufficient scope and technical contribution to merit acceptance; at the same time, the problem addressed here is most interesting to a subset of the ICLR audience.

Why Not a Lower Score

The paper provides novel theory on shape estimation, helping to clarify the relatively large bias of the plug-in estimator, and proposes a novel replacement which, in experiments, exhibits reduced bias. These technical contributions are above the bar for acceptance.

Final Decision

Accept (poster)