PaperHub
Average rating: 5.5 / 10 · Decision: Rejected · 4 reviewers
Individual ratings: 5, 6, 6, 5 (min 5, max 6, std 0.5) · Confidence: 3.5
ICLR 2024

Covariance-corrected Whitening Alleviates Network Degeneration on Imbalanced Classification

OpenReview · PDF
Submitted: 2023-09-17 · Updated: 2024-02-19
TL;DR

We propose a covariance-corrected whitening framework that helps deep classification models escape the degeneration dilemma.

Abstract

Keywords
imbalanced classification, neural network, ZCA whitening, sampling

Reviews and Discussion

Official Review
Rating: 5

This paper proposes a normalization method with class-aware sampling to cope with class imbalance in long-tailed classification. By analyzing covariance matrices of features trained on an imbalanced dataset, the authors identify an issue in long-tailed learning: the learned feature components are heavily correlated with each other, degrading the rank of the feature representation. To mitigate this, the proposed method embeds a whitening module to decorrelate features and applies a sampling strategy based on the class distribution toward stable training. In experiments on long-tailed image classification, the method exhibits competitive performance with other approaches.

Strengths

  • The analysis of feature covariance is interesting and effectively motivates the authors to embed feature decorrelation into neural networks.
  • Performance is empirically evaluated on several benchmark datasets to demonstrate the efficacy of the method in comparison to the others.

Weaknesses

- Sampling technique.

While whitening is well introduced into long-tailed classification in Secs. 3.1-3.3, the sampling strategy (Sec. 3.4) is presented in a heuristic manner without detailed (theoretical) analysis or motivation.

It is unclear why classes are first divided into several groups (in Fig.4). The concept of "group" as a superset of classes is introduced in a procedural way, lacking discussion about its effect on sampling.

Then, an ad-hoc sampling rule is defined using many hyper-parameters in Eqs. (4,5). What is the key difference from standard class-balanced sampling? For realizing such class-aware sampling, it is more straightforward to control the class frequency in sampling between the uniform (class-balanced sampling) and the ratio of class samples (instance-balanced sampling), though it is hard to grasp the purpose of GRBS in this manuscript.

Besides, BET is just a simple technique to control the frequency of GRBS. As shown in Table 4, GRBS by itself degrades performance, while it is improved by BET. Following this direction, class-balanced sampling (CB) could also be improved by applying such ad-hoc control of sampling frequency. The pile of these ad-hoc techniques makes the method theoretically unclear.

- Feature analysis.

This paper lacks in-depth analysis about feature co-variance (Fig.2). Qualitative discussion/analysis is required to clarify why such a rank reduction happens in the scenario of class imbalance. SVD results in the bottom row of Fig.2 are less discussed; add more comments on it such as by clarifying what the two axes mean. There are also several works to cope with class imbalance by means of feature co-variance [R1,R2] and normalization [R3][Zhong+21]. The authors should discuss the proposed method in those frameworks for clarifying its novelty.

Fig. 3 provides a confusing analysis based on the trace norm of covariance matrices. The right-hand figure shows that the proposed method increases "instability", possibly leading to unfavorable training. This is a confusing result and makes it hard to understand the authors' claim. The trace norm depends on the feature scale (magnitude), which is less relevant to the instability of training. Thus, it does not seem to be a proper metric for measuring instability in this case; the authors should apply another metric that is invariant to feature scales.

- Experimental results.

The performance results in Sec. 4 are inferior to the SOTAs reported, e.g., in [R3]. For a fair comparison to the SOTAs, the method should be embedded into the popular backbone networks, e.g., ResNet-50 for ImageNet-LT and iNat18. It is also valuable to check whether discriminative features of deeper backbones behave in a similar way to Fig. 2 or not.

In Fig.4 right, it is meaningless to compare training losses among different sampling strategies since even an identical training dataset can be regarded as different ones by varying sampling rules.

[R1] Xiaohua Chen et al. Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification. In AAAI22.

[R2] Yingjie Tian et al. Improving long-tailed classification by disentangled variance transfer. Internet of Things 21, 2023.

[R3] Lechao Cheng et al. Compound Batch Normalization for Long-tailed Image Classification. In MM22.

- Minor comments:

In p.8: Table ?? -> Table 2

Questions

Please provide responses to the above-mentioned concerns about sampling technique and analysis about features.

Comment

Thank you for the review and constructive comments.

Q1: While whitening is well introduced into long-tailed classification in Secs. 3.1-3.3, the sampling strategy (Sec. 3.4) is presented in a heuristic manner without detailed (theoretical) analysis or motivation.

A1: As shown in Figure 3, the batch covariance statistics exhibit significant fluctuations, impeding the convergence of the whitening operation. Our proposed covariance-corrected modules are designed to obtain more accurate and stable batch statistics for whitening, avoiding non-convergence and reinforcing its capability in imbalanced scenarios. The results in Table 4 also verify their effectiveness.

Q2: It is unclear why classes are first divided into several groups (in Fig.4). The concept of "group" as a superset of classes is introduced in a procedural way, lacking discussion about its effect on sampling.

A2: We introduced the reason why categories are divided into different groups: "In order to make the categories in each group relatively balanced, we select from N sorted categories at equal intervals to form G groups." This also makes the distribution of samples in each batch relatively stable, because the samples of a batch are collected within one group. Please refer to Figure 4 for details.
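As an aside for concreteness, here is a small illustrative sketch (hypothetical helper, not the authors' released code) of the equal-interval rule quoted above, splitting N classes sorted by sample count into G groups so that each group spans head, medium and tail classes:

```python
# Illustrative sketch only: split N classes, sorted by descending sample count,
# into G groups by taking every G-th class along the sorted order.
import torch

def build_groups(samples_per_class, num_groups):
    """samples_per_class: 1-D tensor of per-class sample counts (length N)."""
    sorted_classes = torch.argsort(samples_per_class, descending=True)
    # The class at sorted position i goes to group i % num_groups, i.e. classes
    # are picked at equal intervals along the sorted order.
    return [sorted_classes[g::num_groups].tolist() for g in range(num_groups)]

# Toy long-tailed distribution over 8 classes, split into 2 groups.
counts = torch.tensor([500, 350, 200, 120, 60, 30, 12, 5])
print(build_groups(counts, num_groups=2))
# [[0, 2, 4, 6], [1, 3, 5, 7]] -- each group spans head-to-tail classes
```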

Q3: Then, an ad-hoc sampling rule is defined using many hyper-parameters in Eqs. (4,5). What is the key difference from standard class-balanced sampling? For realizing such class-aware sampling, it is more straightforward to control the class frequency in sampling between the uniform (class-balanced sampling) and the ratio of class samples (instance-balanced sampling), though it is hard to grasp the purpose of GRBS in this manuscript.

A3: 1) Our results in Table 4 demonstrate that class-balanced sampling can cause model performance to degrade, because class balancing can make the model overfit to classes with few samples; this is a well-known conclusion. 2) Our proposed GRBS controls the sampling probability of each category and, combined with the BET training strategy, prevents the model from overfitting to the tail classes.

Q4: Besides, BET is just a simple technique to control the frequency of GRBS. As shown in Table 4, GRBS by itself degrades performance, while it is improved by BET. Following this direction, class-balanced sampling (CB) could also be improved by applying such ad-hoc control of sampling frequency. The pile of these ad-hoc techniques makes the method theoretically unclear.

A4: 1) The results in Table 4 show that our GRBS performs better than CB both before and after applying the whitening operation. 2) The purpose of GRBS and BET is to let the tail classes participate in more iterations without affecting the representation learning of the head classes. More importantly, their combination yields stable batch statistics, thereby avoiding non-convergence of the whitening.

Q5: This paper lacks in-depth analysis about feature co-variance (Fig.2). Qualitative discussion/analysis is required to clarify why such a rank reduction happens in the scenario of class imbalance. SVD results in the bottom row of Fig.2 are less discussed; add more comments on it such as by clarifying what the two axes mean. There are also several works to cope with class imbalance by means of feature co-variance [R1,R2] and normalization [R3][Zhong+21]. The authors should discuss the proposed method in those frameworks for clarifying its novelty.

A5: 1) Thanks for your comment; we have introduced the meaning of the coordinate axes. 2) In Section 3.2, we discussed that the whitened features in the last hidden layer have more large singular values, which avoids feature concentration. It is a well-known phenomenon that highly correlated features can lead to model degradation; we discussed it in Section 3.1. We observed network degeneration in imbalanced classification and proposed an effective solution. 3) Our method is not directly related to [R1, R2, R3], but we have cited them in the updated version.
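As a purely illustrative companion to point 2, the following sketch, on made-up features, shows how correlated channels concentrate the singular-value spectrum and how ZCA whitening flattens it (dimensions and data are assumptions, not the paper's experiments):

```python
# Synthetic illustration: correlated channels concentrate energy in a few
# singular values; ZCA whitening flattens the spectrum.
import torch

torch.manual_seed(0)
B, C = 512, 64
latent = torch.randn(B, 8)                                      # low-rank factors
feats = latent @ torch.randn(8, C) + 0.05 * torch.randn(B, C)   # correlated features

def zca_whiten(x, eps=1e-5):
    x = x - x.mean(dim=0, keepdim=True)
    cov = x.t() @ x / (x.shape[0] - 1)
    eigvals, eigvecs = torch.linalg.eigh(cov)
    w = eigvecs @ torch.diag((eigvals + eps).rsqrt()) @ eigvecs.t()
    return x @ w

for name, f in [("raw", feats - feats.mean(dim=0)), ("whitened", zca_whiten(feats))]:
    s = torch.linalg.svdvals(f)
    print(f"{name}: largest / 10th singular value = {(s[0] / s[9]).item():.1f}")
```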

Comment

Q6: Fig. 3 provides a confusing analysis based on the trace norm of covariance matrices. The right-hand figure shows that the proposed method increases "instability", possibly leading to unfavorable training. This is a confusing result and makes it hard to understand the authors' claim. The trace norm depends on the feature scale (magnitude), which is less relevant to the instability of training. Thus, it does not seem to be a proper metric for measuring instability in this case; the authors should apply another metric that is invariant to feature scales.

A6: 1) In Section 3.3, we discussed that previous work proved that large stochasticity (i.e., instability) of the whitening matrix causes slow training and degenerated performance. We observe instability of the covariance matrix on imbalanced datasets, and it causes the whitening operation not to converge. Our modules obtain stable batch statistics to solve these problems. 2) E refers to the sum of the variances of all channels; we simply compute it as the sum of the diagonal elements of the covariance matrix.
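A minimal sketch of the quantity E described in point 2, assuming the last-hidden-layer activations are arranged as a (batch, channels) matrix: E is the trace of the batch covariance matrix, i.e. the sum of per-channel variances, and tracking it across batches gives the kind of fluctuation curve shown in Figure 3.

```python
# Hedged sketch (assumed (batch, channels) features): E is the trace of the
# batch covariance matrix, i.e. the sum of per-channel variances.
import torch

def covariance_trace(features):
    centered = features - features.mean(dim=0, keepdim=True)
    cov = centered.t() @ centered / (features.shape[0] - 1)
    # torch.trace(cov) equals centered.var(dim=0, unbiased=True).sum()
    return torch.trace(cov)

# Tracking E over batches: a larger spread indicates less stable batch
# statistics for the whitening step.
traces = [covariance_trace(torch.randn(128, 64) * (1.0 + 0.5 * i)) for i in range(3)]
print([round(float(t), 1) for t in traces])
```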

Q7: The performance results in Sec. 4 are inferior to the SOTAs reported, e.g., in [R3]. For a fair comparison to the SOTAs, the method should be embedded into the popular backbone networks, e.g., ResNet-50 for ImageNet-LT and iNat18. It is also valuable to check whether discriminative features of deeper backbones behave in a similar way to Fig. 2 or not.

A7: 1) [R3] uses AutoAugment to train the model, which makes the comparison unfair. 2) The visualization results in the appendix demonstrate that deeper backbones (ResNet-110, EfficientNet-B0 and DenseNet-121; Figures 7-9) and a model trained on the large-scale iNaturalist-LT dataset (Figure 10) also learn highly correlated features.

Q8: In p.8: Table ?? -> Table 2

A8: Thanks for your comment. We have revised it in the new version.

Comment

Dear Reviewer xPQZ,

We thank you for your time to review this paper.

We have tried hard to address all your concerns, and there are only three days left for our discussion. We kindly ask whether our responses address your questions well or whether there is anything further we can do. We really appreciate your feedback.

Authors

Comment

Thanks for the response. However, I keep my score because the sampling strategy is neither theoretically motivated nor validated.

Comment

Dear Reviewer xPQZ,

We thank you for your time to read our responses and your feedback.

We provide the following two clarifications to address your concerns about sampling strategy.

(1) Paper [1] concluded that whitening over batch data suffers significant instability in training DNNs and hardly converges. To address the issues of batch stochasticity and non-convergence of whitening, paper [1] introduced group whitening. However, discussions with Reviewer piR7 revealed that group whitening fails to be effective in imbalanced classification scenarios. When switching to our proposed channel whitening approach, we encountered whitening non-convergence during model training on imbalanced data. This led us to observe, as depicted in Figure 3, that the batch covariance exhibits extreme instability on imbalanced datasets. Consequently, we recognized the need to devise an alternative solution to stabilize the batch covariance.

Building upon this insight, we propose two covariance-corrected modules. The effectiveness of these modules is validated through qualitative and quantitative results presented in Figure 3 and Table 4. Notably, our proposed modules successfully stabilize batch covariance, and we no longer encounter whitening non-convergence during the training process.

We will include the above discussion in our paper for a better understanding of the motivation behind our proposed sampling strategy.

[1] Lei Huang, Lei Zhao, Yi Zhou, Fan Zhu, Li Liu, Ling Shao. An Investigation into the Stochasticity of Batch Whitening, 2020 CVPR.

(2) In the appendix, we give a simple example to demonstrate that our proposed GRBS sampler can reduce the sample variance between batches and thus improve the stability of batch statistics.
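The appendix example is not reproduced here, but the underlying point can be illustrated with a toy sketch (made-up class counts): when every batch draws a fixed quota of samples per class within one group, the per-batch class histogram stays constant, whereas uniform instance sampling lets it fluctuate, which in turn perturbs the batch covariance.

```python
# Toy sketch with made-up class counts (not the appendix example itself):
# fixing a per-class quota inside each batch keeps the per-batch class
# histogram constant, while uniform instance sampling lets it fluctuate.
import torch

counts = torch.tensor([500, 60, 12])                      # long-tailed toy dataset
labels = torch.repeat_interleave(torch.arange(3), counts)

def batch_histograms(batches):
    return torch.stack([torch.bincount(b, minlength=3).float() for b in batches])

uniform = [labels[torch.randint(len(labels), (30,))] for _ in range(200)]
quota = [torch.repeat_interleave(torch.arange(3), 10) for _ in range(200)]

print("per-class std, uniform sampling:", batch_histograms(uniform).std(dim=0))
print("per-class std, fixed quota:     ", batch_histograms(quota).std(dim=0))
```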

We hope the clarifications above address your concerns. If you have any additional suggestions or comments, please don't hesitate to share them. We are always happy to answer your questions and engage in further discussion.

Thank you for your dedicated efforts in helping us refine this paper.

Authors

Official Review
Rating: 6

This paper addresses the class-imbalance problem using DNNs trained end-to-end. It first finds that the highly correlated features fed into the classifier are a main factor in the failure of end-to-end training of DNNs on imbalanced classification tasks. It thus proposes Whitening-Net, which uses ZCA-based batch whitening to help end-to-end training escape from degenerate solutions, and further proposes two mechanisms to alleviate the potential batch statistic estimation problem of whitening in the class-imbalance situation. Experimental results on the benchmarks CIFAR-LT-10/100, ImageNet-LT and iNaturalist-LT demonstrate the effectiveness of the proposed approaches.

Strengths

  • This paper is well motivated and its solution is well supported. The paper addresses the class-imbalance problem with DNNs trained end-to-end, empirically finding that the highly correlated features fed into the classifier cause the failure of end-to-end training on imbalanced classification. Based on this, it uses batch whitening (termed channel whitening in this paper) to decorrelate the features before the last linear layer (classifier), and further proposes two mechanisms to alleviate the potential batch statistic estimation problem of whitening in the class-imbalanced situation.

  • The imbalanced classification problem is common in the learning and vision community, especially when using DNNs. The main line of methods is decoupled training. It is pleasing to see the proposed end-to-end trained Whitening-Net outperform the decoupled training methods, showing great potential.

  • The presentation of this paper is clear and easy to follow.

Weaknesses

1. The descriptions in this paper should follow the common terminology. This paper uses batch whitening (whitening over the batch dimension, analogous to batch normalization/standardization) [Huang CVPR 2018, Huang CVPR 2020], but terms it channel whitening. I understand that this paper wants to address "channel" decorrelation, but the method is commonly referred to as "batch" whitening (vs. batch normalization).

2. I am not confident about the novelty. Indeed, batch whitening is a general module proposed in [Huang CVPR 2018, Huang CVPR 2020], and it is also plugged in before the last linear layer to learn decorrelated representations for a normal class distribution. I recognize the novelty of this paper in using BW for imbalanced classification, but overall, the novelty does not seem significant.

Other minor comments:

- It is better to proofread the paper, e.g., "re-samplingPouyanfar et al" on page 2, "Aditya et al. Menon et al. (2020) propose" on page 3, "Table ??" on page 8.

- I am not sure whether this paper uses correct references, e.g., ". Cui et al. Cui et al. (2019)", "Cao et al. Cao et al. (2019)", etc. Besides, there are too many references in the first paragraph, and most of the words in the first paragraph are references. I personally suggest keeping only the representative references in the first paragraph and leaving the others to the related work for details.

Questions

Please proofread the paper carefully and respond to the weaknesses.

Ethics Review Details

NA

Comment

Thank you for the review and constructive comments.

Q1: The descriptions of this paper should follow the common specification. This paper uses the Batch whitening (whitening over the batch dimension, like its specification, batch normalization (standardization) ) [Huang CVPR 2018, Huang CVPR 2020], but it terms as channel whitening. I understand this paper want to address the “channel” decorrelation, but the method is commonly said as “batch” whitening (v.s., batch normalization).

A1: Thanks for your comment. "Batch" refers to the sample dimension, and we reduce the correlation between channels. We can revise it in the next version.
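To make the terminology concrete, below is a minimal sketch, assuming (batch, channels) features and a plain ZCA formulation (not the authors' exact implementation), of whitening computed over the batch dimension so that the channel activations entering the classifier become decorrelated:

```python
# Minimal sketch: statistics are estimated over the batch dimension; what gets
# decorrelated are the channels feeding the classifier. A practical module
# would also track running statistics for inference.
import torch

class ChannelWhitening(torch.nn.Module):
    def __init__(self, eps=1e-5):
        super().__init__()
        self.eps = eps

    def forward(self, x):                                   # x: (batch, channels)
        x = x - x.mean(dim=0, keepdim=True)
        cov = x.t() @ x / (x.shape[0] - 1)                  # (C, C) batch covariance
        eigvals, eigvecs = torch.linalg.eigh(cov)
        w = eigvecs @ torch.diag((eigvals + self.eps).rsqrt()) @ eigvecs.t()
        return x @ w                                        # decorrelated channels

feats = torch.randn(256, 64) @ torch.randn(64, 64)          # correlated toy features
white = ChannelWhitening()(feats)
resid = white.t() @ white / (white.shape[0] - 1) - torch.eye(64)
print("max deviation of whitened covariance from identity:", resid.abs().max().item())
```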

Q2: I am not confident to the novelty. Indeed, batch whitening is a general module proposed in [Huang CVPR 2018, Huang CVPR 2020], and is also plugged in before the last linear layer to learn decorrelated representation for normal class distribution. I recognize the novelty of this paper using BW for imbalance classification, but overall, the novelty seems not to be significant.

A2: Our contribution is not only the use of a whitening operation before the classifier. 1) Our extensive experimental results (Fig. 2, Figs. 6-10) show that the representations learned by neural networks on imbalanced datasets have higher correlations, causing the network to fall into a degenerate solution. We think this is an interesting observation; Reviewer iY3M also agrees that the findings are not limited to solving the imbalanced classification problem. 2) We propose to use a whitening operation before the classifier to remove the correlation between channels. We find that unstable batch statistics cause whitening not to converge, so we introduce two covariance-correction modules to obtain stable batch statistics, thereby reinforcing the capability of whitening. 3) Extensive experimental results on four imbalanced benchmarks demonstrate the effectiveness of our proposed method. 4) [Huang CVPR 2018] applied group whitening to avoid large stochasticity (i.e., instability) of the covariance matrix. In contrast, our proposed covariance-corrected modules obtain more accurate and stable batch statistics to avoid non-convergence of whitening in imbalanced scenarios. More importantly, if we replace channel whitening with group whitening, the results in the following Table A show that [Huang CVPR 2018] is ineffective on imbalanced classification. The results are obtained based on the codebase https://github.com/kaidic/LDAM-DRW.

Method                               |  Accuracy
-------------------------------------------------
ERM                                  |   66.4
Ours                                 |   76.4
-------------------------------------------------
[Huang CVPR 2018]                    |   66.6

Table A: Test accuracy on CIFAR-10-LT dataset with imbalance factor 200.

Q3: It is better to proofread the paper, e.g., "re-samplingPouyanfar et al" on page 2, "Aditya et al. Menon et al. (2020) propose" on page 3, "Table ??" on page 8.

A3: Thanks for your comments. We have revised them in the new version.

Q4: I am not sure whether this paper uses correct references, e.g., ". Cui et al. Cui et al. (2019)", "Cao et al. Cao et al. (2019)", etc. Besides, there are too many references in the first paragraph, and most of the words in the first paragraph are references. I personally suggest keeping only the representative references in the first paragraph and leaving the others to the related work for details.

A4: Thanks for your detailed review. 1) "Cui et al. Cui et al. (2019)" appears because we manually wrote the author's name before citing it, which resulted in duplication with the ICLR citation format. 2) We have revised the Introduction section and moved the references to the related work.

Comment

Dear Reviewer piR7,

Thank you very much for taking the time to review our paper and for your constructive comments.

We have made extensive clarifications and discussions as you indicated. We hope we have effectively addressed your concerns.

The key points in our rebuttal include:

  1. We summarized the contributions of this paper for your reconsideration. In addition, for the paper [Huang CVPR 2018] you mentioned, our experimental results show that it is ineffective on imbalanced classification problems. Our motivation and solution are different; in particular, we propose two covariance-corrected modules to avoid non-convergence of whitening.

  2. We improved the writing accordingly.

We are eager to hear your valuable opinion on the efforts we have made during the rebuttal period. If you have any further questions that you would like us to address, we are more than willing to discuss them in detail.

We eagerly await your feedback and look forward to engaging in a fruitful discussion.

Authors

Comment

I have read the response and thank the authors for their work on the responses. I understand the proposed method is better than (Huang CVPR 2018) for imbalanced classification, as noted in my initial review. Based on my understanding, the proposed whitening method (not including the two strategies introduced for imbalanced classification) in this paper is exactly the DBN method in (Huang CVPR 2018), am I right? So, based on the experiments in the initial version, your method is better than (Huang CVPR 2018). Note my concern is that the novelty is incremental. I think the authors should pay more attention to addressing the novelty and contribution in the rebuttal. That is why I am not confident enough to accept (or further champion) this paper.

Comment

Dear Reviewer piR7,

Really appreciate your time to read our responses. Below is our further clarification on your concerns.

We would like to highlight two key differences between our approach and the method proposed in the paper (Huang CVPR 2018). These differences contribute to the effectiveness of whitening in alleviating network degradation in imbalanced classification. We have conducted additional experiments to provide further evidence and insights into these differences, which we present below for your reference.

a) Selective Application of Whitening: In their work (Huang CVPR 2018), the authors replaced all batch normalization (BN) layers in ResNet with whitening, known as the Decorrelated Batch Normalization (DBN) approach. However, as shown in the experimental results presented in Table B, this approach proves to be ineffective (achieving 67.1% accuracy) for imbalanced classification tasks. In contrast, our proposed method utilizes whitening selectively, specifically in the last hidden layer, which helps alleviate the degenerate solution while significantly reducing both training and inference time. Our approach achieves improved performance with an accuracy of 72.3%.

b) Channel Whitening vs. Group Whitening: To address the computational complexity associated with the whitening operation, (Huang CVPR 2018) divided the channels into different groups, as described in Section 3.4 of their paper. However, the results in Table B reveal that this approach fails to effectively perform whitening for imbalanced classification (achieving 66.6% accuracy). In other words, group whitening does not adequately decorrelate each channel. This is precisely why our method is referred to as channel whitening, as it focuses on achieving channel decorrelation and overcoming the limitations of group whitening.

Method                                                           |  Accuracy
----------------------------------------------------------------------------
ERM                                                              |   66.4
Ours w/ Channel Whitening                                        |   72.3
Ours w/ Channel Whitening & GRBS & BET                           |   76.4
----------------------------------------------------------------------------
[Huang et al, CVPR 2018] - All layers                            |   67.1
[Huang et al, CVPR 2018] - Last layer w/ Group Whitening         |   66.6

 Table B: Test accuracy on CIFAR-10-LT dataset with imbalance factor 200.

c) We would like to provide further insight into the motivation behind our proposal of two covariance-corrected modules. Throughout our experiments, we encountered non-convergence when attempting to use channel whitening. This issue aligns with the findings of Huang et al. [CVPR 2020], who attribute the phenomenon to the stochasticity of batch statistics. To mitigate this, they introduced group whitening to reduce the impact of stochasticity. We discussed their conclusions in Section 3.3 of our paper.

However, the results presented in Table B demonstrate that group whitening is not effective in achieving channel decorrelation for imbalanced classification tasks. Upon closer investigation of the covariance statistics, we observed their instability on imbalanced classification. This observation inspired us to propose two modules dedicated to stabilizing the covariance statistics, thereby addressing the issue of non-convergence in whitening.

We hope the clarification above addresses your concerns. If you have any additional suggestions or comments, please don't hesitate to share them. We are always happy to answer your questions and engage in further discussion.

Authors

Official Review
Rating: 6

The paper addresses the problem of image classification and claims two-fold contributions: first, it identifies that in imbalanced problems high correlation between features is an indicator of poor performance; second, it proposes a whitening algorithm to address the problem. The method is evaluated on 4 datasets.

Strengths

  1. The observation about correlation is interesting and I believe it has applications beyond this paper
  2. The algorithm proposed is explained clearly although innovation is limited
  3. Evaluation is strong. I appreciated the evaluation on a dataset which is naturally imbalanced, as evaluation on synthetic sets has limitations in practice.

Weaknesses

  1. Some clarification in evaluation would be beneficial: It is not clear to me, given this form of the paper, if the improvement in the performance is on the frequent classes or the ones that have little representation. I believe that is a relevant question especially when focusing on imbalance problems.
  2. (Minor) Paper needs some revision:
    • page 8 "Table ??,"
    • what do "Many", "Medium", "Few", "All" refer to in Table 3

Questions

Please see Weaknesses

============================ Post rebuttal comment: I have read the other reviews and the authors' response. As mentioned in a message below, I appreciated that the issue I raised has been properly dealt with. Therefore I view the paper as being on the "acceptable" side and I am keeping my initial recommendation.

Ethics Review Details

None

Comment

Thank you for the review and constructive comments.

Q1: Some clarification in evaluation would be beneficial: It is not clear to me, given this form of the paper, if the improvement in the performance is on the frequent classes or the ones that have little representation. I believe that is a relevant question especially when focusing on imbalance problems.

A1: The results in Table 3 show that our method can greatly improve the test accuracy of the model on few-shot classes, while ensuring that the test accuracy on many-shot classes does not drop much.

Q2: Paper needs some revision: page 8 "Table ??,".

A2: Thanks for your comment. We have revised it in the new version.

Q3: What do "Many", "Medium", "Few", "All" refer to in Table 3?

A3: We follow the description of previous works and divide the categories in the dataset into many-shot, medium-shot and few-shot classes according to the number of samples. For example, the large-scale ImageNet-LT consists of 115.8K training images from 1000 classes, and the number of images per class decreases from 1280 to 5. Many-shot classes have more than 100 images, medium-shot classes have 20-100 images, and few-shot classes have fewer than 20 images.
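For concreteness, a small sketch of this split using the thresholds stated above (the helper name is hypothetical):

```python
# Small sketch: partition classes into many-/medium-/few-shot groups by their
# training-set frequency, so accuracy can be reported per split as in Table 3.
import torch

def split_by_frequency(samples_per_class, many_thresh=100, few_thresh=20):
    counts = torch.as_tensor(samples_per_class)
    many = (counts > many_thresh).nonzero(as_tuple=True)[0].tolist()
    few = (counts < few_thresh).nonzero(as_tuple=True)[0].tolist()
    medium = ((counts >= few_thresh) & (counts <= many_thresh)).nonzero(as_tuple=True)[0].tolist()
    return many, medium, few

print(split_by_frequency([1280, 400, 90, 25, 19, 5]))
# ([0, 1], [2, 3], [4, 5]) -- many-, medium-, few-shot class indices
```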

Comment

While the answer is appreciated, the question has not been answered.

Let me explain in more detail: the title of the paper contains "Imbalanced" Classification. This means that, of the total number of classes, some have more examples (frequent classes) and some have few examples (rare). The question is: does the increase in performance (the proposed method with respect to the baseline) reflect an increase in the recognition rate for frequent classes or for rare ones? This question basically requires some data from the confusion matrix.

The provided answer refers to the case where the number in question is about how many examples from the total available are taken as labelled. This is a different thing.

Thank you.

Comment

Dear Reviewer iY3M,

We appreciate your acknowledgment of our response. We are happy to elaborate further on the remaining concern.

a) We adhere to the approach outlined in [1] for reporting accuracy across three class splits: Many-shot (more than 100 images), Medium-shot (20-100 images) and Few-shot (fewer than 20 images). The many-shot classes are the frequent classes you mentioned, while the few-shot classes represent the rare ones.

b) Our previous A1 actually addressed the issue you are concerned about. For your reference, we provide results in Table B below. In comparison to the baseline method, our approach demonstrates significant improvements in preserving accuracy for many-shot classes while also enhancing accuracy for few-shot classes.

Method        |  Many-shot     Medium-shot    Few-shot   |   All
--------------------------------------------------------------------
ERM           |    55.7           45.5          40.6     |   44.6
LWS           |    44.3           51.0          52.9     |   51.1
Ours          |    49.3           53.4          53.8     |   53.2

               Table B: Top 1 accuracy on iNaturalist-LT.

[1] Liu, Ziwei and Miao, Zhongqi, et al. Large-Scale Long-Tailed Recognition in an Open World.

c) We hope this explanation addresses your concerns. If there are any further comments, please feel free to raise them and we will always be happy to answer your questions.

Authors

Comment

Thank you for your answer! My question has been answered.

Comment

Dear Reviewer iY3M,

Really appreciate your feedback. Many thanks for your time to read our responses carefully and for your efforts in helping to make this work better.

Lastly, if our responses have resolved your concerns, please consider raising the score.

Authors

Official Review
Rating: 5

The paper addresses the imbalanced image classification task with Whitening-Net. Specifically, the authors first show that the reason for model degeneration lies in the correlation coefficients among sample features before the classifier, i.e., large correlation coefficients lead to model degeneration. Thereby, they propose to use ZCA whitening before the classifier to remove or decrease the correlation coefficients between different samples. For stable training, they also present the Group-based Relatively Balanced Sampler (GRBS) to obtain class-balanced samples, and a covariance-corrected module.

Strengths

Good results on most of the datasets.

Weaknesses

The novelty is limited. The so-called Whitening-Net is more like a batch normalization before the classifier.

There are many works that use a BN layer before the classifier to address the imbalanced image classification task. Some found by searching Google are as follows. Please clarify whether they are similar or not, and provide some results with BN at least. [1] Improving Model Accuracy for Imbalanced Image Classification Tasks by Adding a Final Batch Normalization Layer: An Empirical Study. ICPR, 2020. [2] Consistent Batch Normalization for Weighted Loss in Imbalanced-Data Environment

There are some typos, e.g., "As show in Table ??" on page 8. "ERM" is not explained on page 1.

Questions

See the weaknesses.

Comment

Thank you for the review and constructive comments.

Q1: The novelty is limited. The so-called Whitening-Net is more like a batch normalization before the classifier.

A1: Our contribution is not only the use of a whitening operation before the classifier. 1) Our extensive experimental results (Fig. 2, Figs. 6-10) show that the representations learned by neural networks on imbalanced datasets have higher correlations, causing the network to fall into a degenerate solution. We think this is an interesting observation; Reviewer iY3M also agrees that the findings are not limited to solving the imbalanced classification problem. 2) We propose to use a whitening operation before the classifier to remove the correlation between channels. 3) We find that unstable batch statistics cause whitening not to converge, so we introduce two covariance-correction modules to obtain stable batch statistics, thereby reinforcing the capability of whitening. 4) Extensive experimental results on four imbalanced benchmarks demonstrate the effectiveness of our proposed method.

Q2: There are many works that use a BN layer before the classifier to address the imbalanced image classification task. Some found by searching Google are as follows. Please clarify whether they are similar or not, and provide some results with BN at least.

A2: 1) Paper [1] uses BN after the classifier (Fig. 2 of [1]), which is different from our use of whitening before the classifier to alleviate the degenerate solution. 2) Paper [2] proposes a Weighted Batch Normalization (WBN) that re-weights the batch statistics to solve the size-inconsistency problem. Their motivations and methods are completely different from ours. 3) The following experimental result (Table A) shows that the model performance is lower than the ERM baseline after replacing whitening with BN. The results are obtained based on the codebase https://github.com/kaidic/LDAM-DRW.

Method                              |  Accuracy
------------------------------------------------
ERM                                 |   38.6
Ours                                |   47.2
------------------------------------------------
ERM w/ BN                           |   37.4

Table A: Test accuracy on CIFAR-100-LT dataset with imbalance factor 100.

Q3: There are some typos, e.g., "As show in Table ??" on page 8. "ERM" is not explained on page 1.

A3: Thanks for your comments. We have revised them in the new version.

Comment

Dear Reviewer bhjW,

We appreciate your willingness to improve the rating score.

We kindly request your guidance on how we can address any remaining concerns to ensure their resolution. If you have any further suggestions, please don't hesitate to share them with us. We are always open to feedback and value your insights.

Thank you for your dedicated efforts in helping us refine this paper.

Authors

Comment

Thanks to the reviewers for your great efforts and time. We thank all reviewers for your valuable comments. In the process of discussions with you, we have answered the questions and revised the paper according to the reviewers' initial suggestions.

If there are any further suggestions or comments, please feel free to raise them and we will always be happy to answer your questions and resolve your concerns. In closing, thank you to all the reviewers.

Comment

Dear ACs and all reviewers:

We sincerely appreciate the time and efforts of the reviewers in providing their valuable feedback. We have incorporated the suggested modifications in our manuscript, which are highlighted in red.

The key points of our rebuttal can be summarized as follows:

  1. As suggested by Reviewer iY3M, we have added explanations for different shot classes in Section 4.2.

  2. We have added the results in A2 of Reviewer piR7 to the appendix and discussed them in Section 3.2.

  3. We fixed all typos and incorrect citations mentioned by reviewers one by one.

If there are any further suggestions or comments, please feel free to raise them and we will always be happy to answer your questions and resolve your concerns.

Authors

Comment

Dear ACs and all reviewers:

Based on the feedback from Reviewer xPQZ, we have added more discussions on the motivation of our proposed two covariance-corrected modules. They are mainly used to obtain the stable batch covariance through new sampling methods, thereby avoiding the non-convergence of whitening. Their effectiveness is validated through a comprehensive analysis of qualitative and quantitative results presented in Figure 3 and Table 4.

If there are any further suggestions or comments, please feel free to raise them and we will always be happy to answer your questions and resolve your concerns.

Authors

AC Meta-Review

This paper studies the imbalance classification problem of DNNs. The authors first provide empirical results to reveal that the high correlation phenomenon may cause poor performance, and then they propose a whitening technique to deal with this problem. The effectiveness of the method is validated on popular image datasets.

This is a borderline paper. On the one hand, the reviewers appreciate the conciseness and effectiveness of the method. However, on the other hand, some reviewers also have concerns about its technical novelty and the lack of theoretical analysis.

Reviewer bhjW and Reviewer piR7 pointed out that Whitening-Net is very similar to a batch normalization before the classifier. Reviewer xPQZ thinks that a detailed theoretical analysis is necessary to support the motivation of the paper. So they remain negative about the novelty and quality of this paper. The authors are encouraged to prepare for the next venue by considering the reviewers' comments.

Why Not a Higher Score

N/A

Why Not a Lower Score

N/A

Final Decision

Reject