Unsupervised Feature Learning with Emergent Data-Driven Prototypicality

Yunhui Guo,Youren Zhang,Yubei Chen,Stella X. Yu


Submitted: 2023-09-22, Updated: 2024-03-26

Abstract

Keywords

Representation Learning, Hyperbolic Space, Prototypicality, Unsupervised Learning

Reviews and Discussion

Review

Rating: 3, Confidence: 4, 2023-10-27

This paper proposes HACK for unsupervised learning that can arrange images in hyperbolic space. HACK optimizes image assignments to a fixed set of uniformly distributed particles in the hyperbolic space. It's found that the prototypicality property is emergent from such optimization: images similar to many training instances (more prototypical) are closer to the origin in hyperbolic space. The authors validate the effectiveness of HACK using synthetic data with natural and congealed images. They also test the method on the real MNIST and CIFAR datasets to reveal prototypicality. Lastly, the discovered prototypical and atypical examples are shown to reduce sample complexity and increase model robustness to some extent.
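The claim that more prototypical images sit closer to the origin can be made concrete with a small sketch. It assumes the unit Poincaré ball with curvature -1, where the hyperbolic distance from the origin to a point x is 2·artanh(‖x‖). The function names and the sample embeddings are hypothetical and are not taken from the paper; the review does not say how the authors score prototypicality beyond proximity to the origin.

```python
import math

def origin_distance(x):
    """Hyperbolic distance from the origin to x in the unit Poincare ball
    (curvature -1 is an assumption of this sketch)."""
    norm = math.sqrt(sum(c * c for c in x))
    if norm >= 1.0:
        raise ValueError("point must lie strictly inside the unit ball")
    return 2.0 * math.atanh(norm)

def rank_by_prototypicality(embeddings):
    """Indices sorted from closest to the origin (most prototypical under
    the reviewer's description) to farthest (most atypical)."""
    return sorted(range(len(embeddings)),
                  key=lambda i: origin_distance(embeddings[i]))

# Hypothetical 2-D embeddings, for illustration only.
embeddings = [(0.90, 0.10), (0.05, 0.02), (0.40, -0.30)]
order = rank_by_prototypicality(embeddings)
print(order)
```

Because the distance grows without bound as ‖x‖ approaches 1, points near the boundary are separated from the origin far more strongly than their Euclidean norm suggests.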

Strengths

The proposed unsupervised method HACK is clearly distinct from existing methods: unlike supervised learning, HACK allows an image to be assigned to any target (particle). Unlike existing unsupervised learning methods, HACK learns to match a predefined geometric organization in hyperbolic space (uniformly distributed particles).
The core instance assignment problem is cast as a bipartite matching problem and solved with the well-known Hungarian algorithm that has good convergence properties.
Besides validating the efficacy of HACK in learning prototypicality, the authors also explore its use for reducing sample complexity and improving model robustness.
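The bipartite matching step mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the cost of assigning an image to a particle is the Poincaré distance (curvature -1) between the image's current embedding and the particle, and it uses a standard O(n³) Hungarian routine written in plain Python. The particle positions and embeddings are hypothetical.

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincare ball."""
    sq = lambda x: sum(c * c for c in x)
    diff = sq([a - b for a, b in zip(u, v)])
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - sq(u)) * (1.0 - sq(v))))

def hungarian(cost):
    """Minimum-cost perfect matching for a square cost matrix.
    Returns a list where entry i is the column assigned to row i."""
    n = len(cost)
    INF = float("inf")
    u = [0.0] * (n + 1)          # row potentials
    v = [0.0] * (n + 1)          # column potentials
    p = [0] * (n + 1)            # p[j]: row matched to column j (1-indexed)
    way = [0] * (n + 1)          # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:                # flip the matching along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return assignment

# Hypothetical fixed particles and current image embeddings (2-D).
particles = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (-0.5, 0.0)]
embeddings = [(0.4, 0.1), (0.05, -0.02), (-0.45, 0.05), (0.1, 0.45)]
cost = [[poincare_distance(e, q) for q in particles] for e in embeddings]
assignment = hungarian(cost)     # assignment[i] = particle index for image i
print(assignment)
```

Each image receives exactly one particle and each particle is used once, which is the one-to-one constraint that makes this a matching problem rather than a nearest-neighbor lookup.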

Weaknesses

I think the presentation of this paper needs improvement. One main issue is that the authors keep describing how HACK works and how it can encode both visual similarity and prototypicality, without sufficiently explaining why. I suggest stating the intuitions up front, so readers are not left wondering why HACK is designed this way and why it works at all. Specifically,

The intuition for why images should be assigned to uniformly distributed particles is missing throughout. Not until Section 4.2 is it mentioned that this is meant to achieve maximum instance discrimination, as in (Wu et al., 2018).
Follow-up question: is such a uniform target the best option? Ablations on other targets would help.
Another intuition is missing: why does prototypicality emerge from optimizing for maximum instance discrimination? This is never explained but is very important.
Figs. 5, 6, and 8 are supposed to show evidence that HACK indeed captures 1) visual similarity. Unfortunately, I cannot make the same observations from the very small image examples. Clearer examples would help. An image retrieval experiment would also be an important alternative. 2) prototypical examples (in the center of the Poincaré ball) vs. atypical examples near the boundary. Again, this trend is not clear from the small image examples given.

Questions

Questions around reducing sample complexity:

Fig. 9(a) shows that models trained on atypical examples perform better than those trained on typical examples, especially when the number of training examples is small. This is somewhat counter-intuitive and differs from many other studies, where DNNs are shown to pick up regularities in typical data first and then further benefit from, or memorize, noisy/atypical data. Any comments?
Fig. 9(a) shows that, as the amount of data increases, training on either typical or atypical examples converges to a similar test accuracy. Is that close to the optimal accuracy, or will performance keep improving with more data? Another (perhaps more practical) way to demonstrate sample complexity reduction is to compare against the "best" model performance and measure how much less data is needed to reach it, rather than evaluating in the low-data regime where performance is far from ideal.

Questions around robustness:

Fig. 9(a) basically shows "more atypical data, better generalization accuracy", while Fig. 9(b) says that using less atypical data improves model robustness. These observations are somewhat contradictory, and it seems hard to strike a balance between accuracy and robustness. Any comments?

Review

Rating: 5, Confidence: 2, 2023-11-01

The paper introduces a novel approach to map images into a feature space that not only indicates visual similarity but also encodes the prototypicality of the image based on its location in the dataset. Instead of using Euclidean space, the authors utilize hyperbolic space for unsupervised feature learning. In this space, the proximity of a point to the origin signifies its prototypicality. They present an algorithm called HACK, which assigns each image to uniformly packed particles in hyperbolic space, optimizing the dataset's organization. The method grounds the concept of prototypicality in congealing, aligning images to appear more common and similar, which aligns with human visual perception. The paper's contributions include the first unsupervised feature learning method capturing both visual similarity and prototypicality, and the demonstration that identified prototypical and atypical examples can optimize sample complexity and model robustness.

Strengths

Strength:

Paper is well organized.
The use of hyperbolic space instead of Euclidean space is well-motivated.

Weaknesses

Weakness:

CIFAR and MNIST are toy datasets. An ImageNet experiment and a fair comparison with previous unsupervised learning methods (especially contrastive learning) are important but missing from this work.
LeNet is likewise too small a model for a fair comparison with the latest results in unsupervised learning. A ResNet-level model is a must.
Some related work on prototype learning is not cited, such as “Prototypical Contrastive Learning of Unsupervised Representations”.

Questions

Questions:

In terms of optimization, the proposed method also needs to alternately optimize the encoder (θ) and the assignment (π), which shows no advantage over previous prototypical contrastive learning work that likewise has to optimize both sample features and prototype assignments ("centroids").
“pack the particles into a two-dimensional hyperbolic space”: is it possible to expand the embedding space beyond two dimensions? I believe representing high-dimensional data in a two-dimensional space is too limited for practically useful embeddings.

Review

Rating: 5, Confidence: 4, 2023-11-02

In this paper, the authors propose an unsupervised feature learning algorithm, HACK, that captures visual similarity and prototypicality. Specifically, HACK first generates uniformly packed particles in the Poincaré ball model of hyperbolic space. It then optimizes the assignment of data to this uniformly distributed particle set by exploiting the properties of hyperbolic space, from which the prototypical and semantic structures of the data ultimately emerge.

Strengths

In this work, the authors propose the unsupervised feature learning method from a novel perspective that aims to capture both visual similarity and prototypicality.

Weaknesses

The motivation of the proposed method is not clear. The paper does not state what the shortcomings of existing methods that ignore prototypicality are, or why an unsupervised feature learning method needs to consider prototypicality. The motivation given in the first paragraph of Section 1 is too vague.
The work has limited motivation and seems to be a combination of existing techniques with existing concepts.
It is also necessary to analyze the unique points of the proposed method compared to existing related methods, so as to further clarify the motivation and novelty. However, the paper lacks concrete analyses of the difference between the proposed and existing related methods.
The writing of this paper needs to be improved. Some sentences include too many prepositions, which decreases readability.
The layout of the article needs to be improved, for example, there is too much white space on page 8.

Questions

Please see the Weaknesses.