PaperHub
NeurIPS 2024 · Poster
Overall rating: 5.0/10 (4 reviewers; scores 5/5/5/5; min 5, max 5, std 0.0)
Confidence: 3.5 · Correctness: 2.8 · Contribution: 2.3 · Presentation: 2.5

Sample Selection via Contrastive Fragmentation for Noisy Label Regression

Submitted: 2024-05-15 · Updated: 2024-11-06
TL;DR

To address the problem of regression with noisy labels, we propose the Contrastive Fragmentation framework to select clean samples by Mixture of Neighboring Fragments and curate four benchmark datasets along with a novel metric, Error Residual Ratio.

Abstract

Keywords
Noisy Labels, Regression

Reviews and Discussion

Review
Rating: 5

This paper targets the noisy label regression problem. Inspired by the observation that a classification loss helps learn good representations, the authors first propose to discretise the continuous label space into pieces, which divides the data into disjoint fragments. They then consider the F/2 maximally contrasting fragment pairs, on which binary classifiers are built. The main learning objective is a mixture of experts (MoE), where a neighborhood agreement is applied to enhance decision consistency. The representation and prediction are then used to select clean samples and train the regressor.
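As a rough illustration of the fragmentation and contrastive pairing described in this summary, a minimal sketch is given below; the quantile-based binning and the (i, i + F/2) pairing rule are assumptions for illustration, not necessarily the paper's exact procedure.

```python
import numpy as np

def fragment_and_pair(y, num_fragments=4):
    """Discretize continuous labels into fragments and form maximally
    contrasting fragment pairs (a sketch; the paper's binning rule may
    differ, e.g. equal-width rather than quantile bins)."""
    # Assign each sample a fragment index f in {0, ..., F-1} by label quantile.
    edges = np.quantile(y, np.linspace(0, 1, num_fragments + 1))
    frag = np.clip(np.searchsorted(edges, y, side="right") - 1, 0, num_fragments - 1)

    # Pair the most distant fragments: (0, F/2), (1, F/2 + 1), ...
    # so each of the F/2 binary classifiers sees maximally contrasting labels.
    half = num_fragments // 2
    pairs = [(i, i + half) for i in range(half)]
    return frag, pairs

# Example: 1,000 noisy age labels split into F=4 fragments -> pairs (0, 2), (1, 3).
y = np.random.uniform(1, 80, size=1000)
frag, pairs = fragment_and_pair(y, num_fragments=4)
```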

Strengths

  1. The presentation is overall good and the idea of constructing maximally contrasting pairs is interesting.
  2. The fundamental works in regression have been discussed and many of them are compared in detail.
  3. I think MoE is also a good choice to integrate all classification learners.

Weaknesses

  1. Some illustrations are not clear to me.
  2. Some baselines which take the order information into account should be discussed or compared, because binary classification ignores such information, which was thought important in previous regression works.
  3. The technical contributions and novelty are not well highlighted in the main body of the paper.

Questions

  1. When illustrating the motivation of contrastive fragment pairing, you pointed out the advantages could be better generalization and robustness. Regarding the generalization, were you referring to its performance on clean-label data training? Then is Fig. 2(c) on noisy or clean data? I felt confused when I read this part.
  2. It seems that the proposed approach to construct F/2 maximally contrasting pairs of fragments ignored the order relationship between fragments that has been thought crucial to regression tasks [Yang et al., 2022b]. Can you provide some evidence if you do not think so?
  3. The ratio of noise labels is important according to literature. Since the proposed method does not mention it in methodology, were you suggesting that the proposed method could be noisy ratio unaware and applicable to any level of noisy ratio?
  4. The clean sample selection strategy in lines 185-188 looks intuitive. You can certainly consider two views somehow complementary when necessary, but your methodology did not touch on representation learning.
  5. Zhang et al., 2023 in line 91 on page 3 seems a strong baseline, which computes distances on representations. May I know why it is not included?

Limitations

My major concern is that the proposed pair construction strategy is interesting but not well motivated by the noisy label topic, which makes me doubt which components contribute much to this research problem. Also, the connection with recent work lacks deep discussion.

Author Response

[Limitations, Q2]. The pair construction strategy is interesting but not well motivated by the noisy label topic, casting concerns regarding contribution. The maximally contrasting pairs of fragments ignore the order relation.

We would like to clarify that ConFrag is strongly motivated by the topic of noisy label regression.

Focus on noisy label regression. A method tailored for noisy regression problems must primarily focus on detecting noise with high severity, rather than treating all noise as equal. The components of our method such as maximally contrasting fragment pairing and fragment prior, which are based on the distance in continuous and ordered label space, are designed to prioritize filtering out such high severity noise.

Order Relation. Our framework leverages ordinal relations by training on contrastive fragmented pairs to learn robust features. These features are then aggregated and ordered. Clean sample selection is done through the mixture of neighboring fragments which ensures the integrity of the learned order relations.
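To make the "mixture of neighboring fragments" idea more concrete, a schematic sketch is given below; the scoring rule, the threshold, and the function names are illustrative assumptions, not the paper's actual mixture formulation.

```python
import numpy as np

def select_clean(frag_labels, expert_probs, pairs, num_fragments=4, threshold=0.5):
    """Schematic clean-sample selection by agreement among neighboring
    fragment experts (a hypothetical scoring rule, not the paper's equation).

    expert_probs: array of shape (N, F/2); expert_probs[n, k] is classifier k's
    probability that sample n belongs to the *upper* fragment of pairs[k]."""
    num_samples = expert_probs.shape[0]

    # Convert each expert's binary output into a per-fragment vote.
    votes = np.zeros((num_samples, num_fragments))
    for k, (lo, hi) in enumerate(pairs):
        votes[:, hi] += expert_probs[:, k]
        votes[:, lo] += 1.0 - expert_probs[:, k]

    # Neighborhood agreement: average the vote for the labeled fragment and
    # its adjacent fragments, so the ordered structure of the label space is respected.
    keep = np.zeros(num_samples, dtype=bool)
    for n in range(num_samples):
        f = frag_labels[n]
        neighbors = [g for g in (f - 1, f, f + 1) if 0 <= g < num_fragments]
        keep[n] = votes[n, neighbors].mean() >= threshold
    return keep
```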

W2. Discuss or compare baselines considering ordinality since binary classification ignores ordinality which however was thought important in previous regression works + Limitations state that “the connection with recent work lacks deep discussion.”.

We discuss additional works that consider order information in Appendix E.1. Furthermore, we evaluate noisy regression baselines that account for ordinality in the introduction (ln 35-42), specifically C-Mixup [1] and OrdRegr [10]. OrdRegr is a noise-transition-matrix-based loss correction method that requires noise rate estimation. Even when provided with the ground-truth noise in synthetic settings, the algorithm was highly ineffective. For example, in IMDB symmetric 40% noise experiments, OrdRegr performed at least 31.86% worse than other baselines in terms of MRAE. Therefore, although we have implemented both methods, we only report C-Mixup results. We will ensure that OrdRegr results are included in the manuscript as well.

To the best of our knowledge, the most recent published related work is "Robust Classification via Regression for Learning with Noisy Labels," which improves classification by reformulating it as a regression problem. We will include this in the references. If there are any other works we have overlooked, kindly let us know and we will make sure to review and include them.

Additionally, we would like to reemphasize that our framework utilizes ordinal relations by training binary classifiers on contrastive fragmented pairs to develop robust features. These features are then aggregated and ordered. Clean sample selection is performed through the mixture of neighboring fragments, ensuring the preservation of the learned order relations.

Q1. You pointed out the advantages could be better generalization and robustness. Regarding the generalization, were you referring to its performance on clean-label data training? Then is Fig. 2(c) on noisy or clean data?

Generalization in the motivation of contrastive fragment pairing refers to the generalization of feature extractors trained on noisy datasets, as obtaining robust and generalizable features is crucial for sample selection in noisy label settings. Fig. 2(c) shows that under symmetric 40% label noise, training expert feature extractors on contrastive fragment pairs is better for sample selection (and thus better regression performance) than training a single feature extractor on all fragments, because the experts are less prone to overfitting and learn more generalizable features. We will also make it clear that the results in Fig. 2(c) are on noisy data.

Q3. Is the proposed method noisy ratio unaware and applicable to any level of noisy ratio?

The reviewer correctly noted that knowing the ratio of noisy labels beforehand is beneficial but challenging to estimate in practice. We address this point at the start of Section 2, line 86, by stating, 'ConFrag is noise rate-agnostic unlike prior methods as it operates without knowing a pre-defined noise rate.' This highlights that our method does not require prior knowledge of the noise ratio.

Q4. The clean sample selection strategy in lines 185-188 looks intuitive. You can certainly consider two views somehow complementary when necessary, but your methodology did not touch on representation learning.

We use the term 'representation' to indicate that our fragment pair trained binary classifiers automatically learn data representations. This means they can be considered representational learners or feature extractors. However, we acknowledge that 'representation learning' often refers to techniques like self-supervised learning. We will clarify this distinction in the manuscript.

Q5. Zhang et al., 2023 in line 91 on page 3 seems a strong baseline, which computes distances on representations. May I know why it is not included?

OrdinalEntropy [8] proposes a regularizer that learns higher-entropy features by increasing the distance between representations while preserving ordinality via weighting of the representation and target space distances. Since it does not tackle the noisy regression problem directly on its own, it was not included as a baseline in the manuscript. However, it can certainly be combined with other baselines as well as our method to enhance ordinality and to better learn high-entropy features. Results in Table R3 show that combining OrdinalEntropy with the vanilla model, the baselines, and our method shows a slight drop in performance.
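Since OrdinalEntropy is only summarized here, a heavily hedged sketch of a regularizer in that spirit is shown below; it is not the official OrdinalEntropy implementation, and the label-distance weighting is an assumption made for illustration.

```python
import torch

def ordinal_entropy_like_reg(features, labels):
    """Schematic regularizer: encourage larger pairwise feature distances,
    weighted so that they roughly track pairwise label distances.

    features: (N, D) float tensor; labels: (N,) float tensor of continuous targets."""
    fdist = torch.cdist(features, features)                 # pairwise feature distances
    ldist = torch.cdist(labels[:, None], labels[:, None])   # pairwise label distances
    ldist = ldist / (ldist.max() + 1e-8)                    # normalize label distances
    # Reward large feature distances, emphasizing pairs whose labels are far apart,
    # so the ordinal structure of the label space is reflected in the features.
    return -(ldist * fdist).mean()
```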

W1. Some unclear illustrations

We will enhance Figure 2(a) to aid understanding. If there are any other illustrations that require attention, we would be happy to revise them!

W3. Clearly state technical contributions

We will highlight the technical contributions and novelty better in the manuscript, especially within the introduction.

Comment

I think most of my concerns are addressed. I have updated my score.

Review
Rating: 5

This paper aims at addressing noisy labels in real-world regression problems and proposes the ConFrag method. ConFrag transforms regression data into disjoint fragmentation pairs to enhance the distinction between clean and noisy samples. It leverages neighboring fragments and expert feature extractors to identify noisy labels through neighborhood agreement. The approach is validated through experiments on diverse datasets and introduces a new metric, Error Residual Ratio (ERR), under which it shows consistent superiority over fourteen existing methods.

Strengths

  1. This method leverages the inherent orderly relationship between the label and feature space to identify noisy labels.
  2. Four newly curated benchmark datasets and a metric are constructed for conducting experiments.

Weaknesses

  1. The motivation behind the idea should be stated more clearly in the introduction section. Since there is a close connection between labels and features, it is necessary to clarify why contrastive fragment pairing is introduced and whether the design of the data selection metric considers the connection.
  2. Certain statements in the paper require further clarity. For instance, the meaning of "f" in "all of the noisy labeled data (f…)" needs explicit clarification upon first mention, although the meaning of the symbol can be inferred in subsequent discussions. Additionally, the phrase "we employ neighboring relations in fragments to leverage the collective information learned" in the introduction warrants elaboration.
  3. It is novel to create four curated benchmarks. However, it is recommended to also validate your method on the existing noisy regression datasets used in other papers.

Questions

Since contrastive fragment pairing transforms some closed-set label noise into open-set noise, could this property be considered when designing the data selection metric to identify open-set noisy labels?

Limitations

Adequately addressed

Author Response

W1. The motivation behind the idea should be stated more clearly in the introduction section. Since there is a close connection between labels and features, it is necessary to clarify why contrastive fragment pairing is introduced and whether the design of the data selection metric considers the connection.

We appreciate the feedback and suggestions! To clarify the motivation behind our idea, we will elaborate on the rationale for contrastive fragment pairing. Our method addresses the challenges of learning from noisy data by employing distinctive feature matching, which improves generalization [12,13]. Additionally, this approach helps convert closed-set noise into open-set noise, which is less harmful to the learning process [14]. We will also explain that our data selection approach, the mixture of neighboring fragments (Section 2.3), considers the correlation between labels and features, namely that samples with close labels are likely to have similar features. This correlation is essential for effective fragmentation and sample selection using the mixture of neighboring fragments.

W2. Certain statements in the paper require further clarity. For instance, the meaning of "f" in "all of the noisy labeled data (f…)" needs explicit clarification upon first mention, although the meaning of the symbol can be inferred in subsequent discussions. Additionally, the phrase "we employ neighboring relations in fragments to leverage the collective information learned" in the introduction warrants elaboration.

We will make sure to clarify upon first mention that f denotes the index of the fragment to which the data point is assigned! We will also elaborate on the phrase "we employ neighboring relations in fragments to leverage the collective information learned," explaining that it refers to utilizing a mixture model [15] to achieve probabilistic consensus in both the prediction and representation spaces.

W3. The performance on existing noisy regression datasets?

We appreciate your recognition of our four curated benchmarks, covering age, price, and music production year prediction tasks. As suggested by the reviewer, we further evaluate the performance of our approach and baselines on two noisy regression datasets from existing noisy regression literature. The first dataset is from SuperLoss [9, 11], which injects 0.2, 0.4, 0.6, and 0.8 symmetric noise into the UTKFace dataset. The second dataset is the IMDB-org-B dataset, a real-world noise dataset studied in [16, 3, 17]. The results can be found in Table R4. We perform on par with or better than the baselines on both datasets as shown in the table below.

Q1. Could the transformation of closed-set noise into open-set noise be used advantageously in the data selection metric?

Thank you for the insightful question. As illustrated in Fig. 2(a), transforming closed-set noise into open-set noise can also be seen as converting it into anomalies or outliers. By leveraging the extensive literature on anomaly and outlier detection techniques, we could effectively handle open-set noise to enhance the data selection process.

Comment

The author has fully answered my question. I maintain my original evaluation of this paper.

Review
Rating: 5

This paper addresses the issue of label noise in regression tasks. The proposed method partitions the data and trains several binary classifiers for the most distant partition pairs. Noisy data samples are detected using the voted probability of all classifiers. The method outperforms other baselines on several public datasets across different domains, as measured by a newly proposed metric.

Strengths

The intuition behind the method is sound. It aims to identify samples whose y values are most misaligned with their x values, based on the weighted combined opinion of all classifiers.

Weaknesses

  1. The focus on regression tasks is limiting. Given that the proposed method transforms regression tasks into classification tasks, it could potentially be extended to address data noise in classification tasks using the same intuition.

  2. The paper lacks a detailed discussion of the types of noise detected and not detected. It's unclear whether this method can detect closed-set noise for individual classifiers or if it's more effective at identifying noisy samples at the center or boundaries of the fragments.

  3. There is no ablation study on the amount and type of noise. The paper doesn't address whether the method is effective for all noise ratios or explore the typical types and ratios of label noise in real-life scenarios. It's unclear how the method performs with varying levels of noise (e.g., 1%, 0.5%, 5% of noise data or noise strength).

  4. The paper doesn't consider scenarios where the noise degree is so small that most noisy samples cannot be transferred from closed-set to open-set noise. It's unclear whether increasing the number of fragments would be beneficial in such cases.

Questions

In addition to addressing the weaknesses mentioned above, I hope the authors can answer these questions:

  1. Is there a systematic way to determine the optimal number of fragments?

  2. The paper states, "If x is more likely to belong to fragment 2 than 5, then it should be more likely to belong to 1 than 4 and 3 than 6." Are there limitations to this statement? Would the proposed method still work if this relationship doesn't hold?

Limitations

Yes, the limitations and broader impacts are addressed.

Author Response

W1. Focus on regression is limiting. ConFrag transforms regression tasks into classification tasks; it could potentially be extended to address noise in classification

We believe that noisy regression is an important task on its own! However, most noisy label learning research focuses on classification tasks. Regression assumes a continuously ordered relationship between features and labels, a premise well-supported in existing research [5, 6, 1, 7], whereas classification is categorical rather than ordered. Our experiments demonstrate that many noisy classification approaches do not perform well in noisy regression tasks mainly due to the fundamental difference between regression and classification. We employ cross-entropy loss (classification) as a surrogate objective to enhance the robust feature extraction, which in turn improves sample selection used to solve the regression task. This approach has been both theoretically and empirically validated thanks to better feature learning [8], as detailed in Appendix D.1. Table 2 compares regression-trained experts (ConFrag-R) and classification-trained experts (ConFrag), showing the superiority of using surrogate classification objectives. The potential for extending our approach to classification exists, but it is out-of-scope of this work.
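To make the contrast between the two training objectives concrete, a minimal sketch is given below; the backbone, heads, and batch shapes are placeholders for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Illustrative contrast between direct regression training (ConFrag-R style)
# and the surrogate classification objective on a contrasting fragment pair.
backbone = nn.Sequential(nn.Linear(128, 64), nn.ReLU())
reg_head = nn.Linear(64, 1)   # direct regression head
cls_head = nn.Linear(64, 2)   # binary head for one contrasting fragment pair

x = torch.randn(32, 128)              # a mini-batch of features
y_cont = torch.rand(32, 1) * 80       # continuous (possibly noisy) labels
y_pair = torch.randint(0, 2, (32,))   # 0 = lower fragment, 1 = upper fragment

# Regression objective: fit the continuous label directly.
loss_regression = nn.functional.mse_loss(reg_head(backbone(x)), y_cont)

# Surrogate objective: cross-entropy on fragment membership, used only to
# learn robust features for sample selection, not to predict y directly.
loss_surrogate = nn.functional.cross_entropy(cls_head(backbone(x)), y_pair)
```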

W2. Detected noise analysis, specifically closed-set noise detection and noise at the center or boundaries of the fragments.

Closed-set noise detection. Fig. R1 shows the selection rate of closed-set noise on the IMDB-Clean-B dataset with symmetric 40% noise. As training progresses, closed-set noisy samples become less likely to be selected, showing ConFrag's ability to detect closed-set noise.

Boundary and center noise. We compare the selection rate and the Error Residual Ratio (ERR) between samples at the boundary and at the center of fragments. Table R1 shows that the average differences in selection rate and ERR between the two groups are 2.29% and 2.43%, respectively. These results confirm that ConFrag consistently performs robust sample selection regardless of a sample's position.

[W3,4] Do additional ablations on the amount and strength of noise (with emphasis on 0.5%, 1%, 5% noise) and also evaluate on real-life noise. Unclear whether increasing the number of fragments would be beneficial under extremely small noise.

We perform thorough experiments on the diverse amount and strength of noise following previous research on noisy regression [1, 9, 10, 11]. It involves the ablation on symmetric noise levels of 20/40/60/80% and Gaussian noise with a standard deviation of 30/50%. Notably, we are the only work that tests on both symmetric and Gaussian noise! The results presented in Section 4.3 show that ConFrag is effective across all of these noise ratios and strengths.

In Table R2, we analyze ConFrag using tiny noise amounts (0.5/1/5% symmetric noise) and strengths (Gaussian 2/5%) on IMDB-Clean-B.

For noise ratios of 0.5/1/5%, ConFrag achieves a very low ERR, indicating that it can detect noisy samples regardless of the noise ratio. However, when the noise strength is very small (Gaussian 2/5%), ConFrag is not able to filter them effectively. We acknowledge this as a limitation of our method. Since ConFrag primarily focuses on sample selection, it can easily be reinforced by other techniques during the training process to mitigate the effects of tiny noise strengths. Indeed, in the Gaussian 2/5% experiments, integrating C-Mixup or Co-teaching with ConFrag improves performance by 4.74% and 4.51%, respectively. Regarding the reviewer's question on whether increasing the number of fragments can be beneficial when the noise strength is small, please refer to our answer to Q1. Finally and most importantly, we evaluate the performance of ConFrag on real-world noise, which includes many types of noise with varying strengths. Table R4 reports the results on a version of the IMDB dataset with real-world noise [3], IMDB-org-B. The results show that the vanilla version of our method performs on par with the other best-performing baselines on real-world data. Importantly, since ConFrag only concerns the sample selection process, it can easily be integrated with other noisy label methods. By combining our approach with other techniques, ConFrag outperforms all baselines by a non-trivial margin, demonstrating the practical effectiveness of our approach.

Q1. A systematic way to determine the fragment number?

Analysis in Appendix G.2 shows that fixing the fragment number F to 4 yields the best performance for the IMDB-Clean-B and SHIFT15M-B datasets, as performance is not sensitive to this hyperparameter. This setting was subsequently used in all our experiments. The analysis further shows that, given a large enough dataset, the sensitivity to the number of fragments decreases.

Finer fragmentation can improve the detection of tiny-strength noise but increases the risk of overfitting due to fewer samples per task. In larger datasets, this risk is mitigated because the data size per fragment remains large enough, allowing ConFrag to benefit from finer fragmentation. Based on these insights, we recommend starting with F = 4 and incrementally increasing it until performance deteriorates, especially for larger datasets.
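A minimal sketch of this recommendation (start at F = 4 and grow F until validation performance deteriorates) is shown below; `train_eval_fn` is a hypothetical user-supplied routine that trains with a given F and returns a validation error, and is not part of ConFrag itself.

```python
def choose_num_fragments(train_eval_fn, start=4, max_fragments=16):
    """Sketch of the heuristic described above: start at F=4 and increase F
    until validation performance stops improving (lower error is better)."""
    best_f, best_err = start, train_eval_fn(start)
    f = start + 2  # keep F even so that F/2 contrasting pairs exist
    while f <= max_fragments:
        err = train_eval_fn(f)
        if err >= best_err:   # performance deteriorates -> stop searching
            break
        best_f, best_err = f, err
        f += 2
    return best_f
```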

Q2. Limitations to the statement, "If x is more likely to belong to fragment 2 than 5, then it should be more likely to belong to 1 than 4 and 3 than 6."? Can the method still work if this relationship doesn't hold?

The statement is highly related to the fundamental characteristic of regression: the continuous and ordered correlation between the label and feature space [5,6,1,7]. Since our approach is rooted in this characteristic, its effectiveness can be compromised if it does not hold. However, we reemphasize that this is a common characteristic found in most regression tasks, and prior works on regression with imperfect data build their methods on top of this characteristic [5,6,1].

Comment

Thank you for addressing my concerns. I have updated my rating accordingly.

Review
Rating: 5

The paper introduces an innovative approach for the collective modeling of regression data, grounded in the idea that similar labels often correspond to shared significant features. The authors convert the data into separate, yet juxtaposed fragment pairs, employing a combination of adjacent fragments to detect noisy labels. This detection is achieved via a mutual agreement within the prediction and representation spaces.

Strengths

  • The method seems technically solid and correct
  • To the best of my knowledge, this paper presents a novel method that has not been introduced in the past
  • The paper presents a robust evaluation setup

Weaknesses

I think that the paper is overwhelmed with its presentation of too many details, which could be moved to the appendix to let the reader focus on the important parts. That being said, this is the authors' decision, and I didn't consider it in my score.

Questions

  • If I understand correctly, you can use the method for generative tasks as well. Did you consider it? If it is not possible, can you please elaborate?
  • Did you study classification tasks? If yes, how did it perform?

Limitations

.

Author Response

W. The paper is overwhelmed with its presentation of too many details, which could be moved to the appendix to let the reader focus on the important parts.

We sincerely thank the reviewer for the suggestion. In order to balance clarity and detail, we will move the overly detailed procedure of contrastive fragment pairing in Section 2.1 (lines 107-116) to the Appendix, and make the motivation part (lines 117-144) more concise. To further improve clarity, we will include a figure illustrating the overall framework of ConFrag, highlighting the online nature of the filtering and training processes.

Q1. Extension to generative tasks

ConFrag learns to select samples (x, y) that are better aligned and uses them for training regression tasks (i.e., learning P(y|x)). As the reviewer suggested, the selected samples can also be used for conditional generation tasks (i.e., learning P(x|y)).

To verify this, we train a continuous conditional GAN model (SVDL+ILI) [4] for 40k steps using (1) the clean IMDB-Clean-B dataset, (2) IMDB-Clean-B with 40% symmetric noise, and (3) samples selected by ConFrag on the noisy dataset. To measure the condition alignment and the quality of generated samples, we use MAE and intra-fragment FID on 52,000 generated images. Specifically, intra-fragment FID is the average of FID values measured for images in each fragment. We use F = 4 as in ConFrag. For both evaluation metrics, lower is better. The results show that sample selection by ConFrag improves both the condition alignment and the quality of conditionally generated images.

| Training data       | Intra-fragment FID (F=4) | MAE    |
|---------------------|--------------------------|--------|
| Clean (0% noise)    | 14.42                    | 10.419 |
| Symmetric 40% noise | 15.998                   | 13.569 |
| ConFrag             | 15.244                   | 10.348 |
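For reference, a minimal sketch of the intra-fragment FID described above is given below; `fid_fn` is a placeholder for any standard FID implementation (e.g. torchmetrics' FrechetInceptionDistance) and is not defined here, while the per-fragment averaging follows the description in the text.

```python
import numpy as np

def intra_fragment_fid(real_images, fake_images, real_frag, fake_frag,
                       num_fragments, fid_fn):
    """Average of per-fragment FID values.

    real_frag / fake_frag: fragment index per image, in {0, ..., F-1}.
    fid_fn(real, fake): placeholder for a standard FID implementation."""
    scores = []
    for f in range(num_fragments):
        real_f = real_images[real_frag == f]
        fake_f = fake_images[fake_frag == f]
        scores.append(fid_fn(real_f, fake_f))  # FID within fragment f
    return float(np.mean(scores))              # average over all fragments
```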

Q2. Extension to classification tasks

As suggested by the reviewer, it is possible to apply our contrastive fragmentation approach to classification tasks by making the following adjustments:

  1. We set each class as a fragment (i.e., the number of fragments F = the number of classes).
  2. We measure the distance between fragments/classes. Possible metrics include the Euclidean distance between GloVe embeddings, the CLIP text cosine similarity of each class, or the CLIP image similarity using the samples of each class. The distance metric is used for constructing contrastive pairs and for defining the two neighboring fragments of each fragment (see the sketch after this list).
  3. We redefine the fragment prior (Equation 2) using the distance between classes.
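As referenced in step 2, a small sketch of constructing maximally contrasting class pairs from an embedding distance matrix is given below; the greedy most-distant-pair rule is an illustrative assumption, not necessarily the authors' exact construction.

```python
import numpy as np

def pair_classes_by_distance(class_embeddings):
    """Greedily form maximally contrasting class pairs from class embeddings
    (e.g. GloVe or CLIP text embeddings), most distant pairs first."""
    num_classes = class_embeddings.shape[0]
    # Pairwise Euclidean distances between class embeddings.
    dist = np.linalg.norm(
        class_embeddings[:, None, :] - class_embeddings[None, :, :], axis=-1)
    unpaired, pairs = set(range(num_classes)), []
    while len(unpaired) > 1:
        # Pick the most distant pair among the classes not yet paired.
        i, j = max(((a, b) for a in unpaired for b in unpaired if a < b),
                   key=lambda ab: dist[ab[0], ab[1]])
        pairs.append((i, j))
        unpaired -= {i, j}
    return pairs
```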

We evaluated the extended ConFrag on CIFAR-10 classification with 20/40/60% symmetric noise ratios using selection ratio and precision as metrics, where precision is defined as P(clean|selected). We found that samples selected by the modified ConFrag are cleaner than the original dataset (i.e., precision higher than 1 - noise ratio), showing the potential of extending ConFrag to classification tasks.

| Noise | Selection ratio | Precision |
|-------|-----------------|-----------|
| 20%   | 0.8191          | 0.8871    |
| 40%   | 0.7353          | 0.7359    |
| 60%   | 0.6702          | 0.5243    |

However, since ConFrag is designed to leverage the characteristics of noisy regression tasks as mentioned in the general response, and thus noisy classification is out of our main focus, its performance does not reach the level of state-of-the-art sample selection approaches designed for noisy classification tasks. Additionally, we found that the choice of the distance metric has a significant effect on sample selection performance, indicating that the design of a better distance metric is necessary for successful extension of our approach to noisy classification tasks.

Author Response

Global Response

We thank the reviewers for their insightful comments and acknowledgment. We appreciate the recognition that our approach is technically solid and correct (Reviewer T5Bx), our presentation is commendable (Reviewers T5Bx, 2i51), and our extensive comparison of regression works is thorough (Reviewer 2i51). Additionally, we value the positive feedback on our robust evaluation setup and metrics (Reviewers T5Bx, RwYcT) and the inclusion of four additional benchmark datasets (Reviewer RwYcT). We are grateful to all four reviewers for recognizing the novelty, soundness, intuition, and conceptual strength of our approach.

As a global response, we would like to highlight the following.

Characteristics of Noisy Regression

Noisy regression problems have two distinct characteristics that distinguish them from noisy classification tasks: a continuously ordered correlation between labels and features, and varying degrees of noise strength. We emphasize that these two characteristics are utilized in designing the key components of ConFrag.

Ordered relations

ConFrag leverages ordinal relations by training on contrastive fragmented pairs to learn robust features. These features are then aggregated and ordered. Clean sample selection is done through the mixture of neighboring fragments which ensures the integrity of the learned order relations.

Focus on noisy label regression

A method tailored for noisy regression problems must primarily focus on detecting noise with high severity, rather than treating all noise as equal. The components of our method such as maximally contrasting fragment pairing and fragment prior, which are based on the distance in continuous and ordered label space, are designed to prioritize filtering out such high severity noise.

Additional Experiments

In this rebuttal, we present the results of ConFrag on two additional noisy label regression datasets suggested by Reviewer RwYcT. Please refer to W3.

Additional Real-World Dataset

We’d like to emphasize that ConFrag deals with the sample selection process, and thus it can be easily integrated with other noisy label methods. Our approach combined with regularization methods such as C-Mixup [1] and Co-Teaching [2] significantly outperforms all baselines on the IMDB-Org-B [3] dataset. Specifically, our method achieves improvements of 3.3% and 4.82% in Mean Relative Absolute Error (MRAE), respectively. This result demonstrates the practical effectiveness of our approach.

We attach a pdf file containing figures and tables. For figures and tables whose index starts with ‘R’, please refer to the attached file.

Global References

[1] Yao et al., C-mixup: Improving generalization in regression. ICML 2022

[2] Han et al., Co-teaching: Robust training of deep neural networks with extremely noisy labels. NeurIPS 2018

[3] Lin et al., FP-Age: Leveraging Face Parsing Attention for Facial Age Estimation in the Wild. TIP 2022

[4] Din et al., Continuous Conditional Generative Adversarial Networks: Novel Empirical Losses and Label Input Mechanisms. TPAMI 2023

[5] Yang et al., Delving into deep imbalanced regression. ICML 2021

[6] Gong et al., RankSim: Ranking similarity regularization for deep imbalanced regression. ICML 2022

[7] Zha et al., Supervised contrastive regression. 2022

[8] Zhang et al., Improving deep regression with ordinal entropy. ICLR 2023

[9] Castells et al., SuperLoss: A Generic Loss for Robust Curriculum Learning. NeurIPS 2020.

[10] Garg and Manwani. Robust deep ordinal regression under label noise. ACML 2020

[11] Wu et al., discrimLoss: A Universal Loss for Hard Samples and Incorrect Samples Discrimination. TMM 2024.

[12] Grønlund et al., Margin-based generalization lower bounds for boosted classifiers. NeurIPS 2019

[13] Grønlund et al., Near-tight margin-based generalization bounds for support vector machines. ICML 2020

[14] Wei et al., Open-set label noise can improve robustness against inherent label noise. NeurIPS 2021

[15] Jacobs et al., Adaptive mixtures of local experts. Neural Computation 1991

[16] Dornaika et al., Robust regression with deep CNNs for facial age estimation: An empirical study. Expert Systems with Applications 2020

[17] Zha et al., Rank-N-contrast: Learning continuous Representations for Regression. NeurIPS 2023.

Final Decision

This paper proposes a novel algorithm for regression with label noise. The proposed method partitions the data and trains several binary classifiers for the most distant partition pairs. Noisy data samples are detected using the voted probability of all classifiers. The method outperforms other baselines on several public datasets across different domains, as measured by a newly proposed metric. Given that the robust representation learning idea makes intuitive sense and that the reviewers are generally happy with the evaluation, I suggest the paper be accepted. I have concerns about how the method would scale up to more regression targets, but that can be left for future work. The authors should incorporate the reviewers' feedback on the presentation in the next version.