PaperHub

NeurIPS 2024 (Poster). Overall rating: 5.0/10 from 4 reviewers (individual scores 5, 4, 7, 4; min 4, max 7, std 1.2). Confidence: 3.0 · Correctness: 2.5 · Contribution: 3.3 · Presentation: 2.8

CoSW: Conditional Sample Weighting for Smoke Segmentation with Label Noise

OpenReview · PDF
Submitted: 2024-04-28 · Updated: 2024-11-06

Abstract

Keywords
Smoke Recognition; Smoke Segmentation; Industrial Applications

Reviews and Discussion

Official Review (Rating: 5)

This paper tackles the problem of noisy labels in smoke segmentation by introducing an uncertainty measure in ambiguous areas, especially at the boundary between smoke and non-smoke. Noisy labels can be problematic for training stability. Entropy is used to measure uncertainty, and highly uncertain prototypes and pixels should not contribute too much during training.

Strengths

  1. Experiments using synthetic noise dataset and the ablation show the effectiveness of the proposed method.

  2. To the best of my knowledge, it is the first paper to tackle the noisy label problem in smoke segmentation.

Weaknesses

Overall I find the paper difficult to read. I believe this is caused by the following reasons:

  1. Not all of the notations are clearly defined, and some notations are defined very late (e.g., $N^k$ is mentioned from Section 3.3 onward but its definition is in Section 3.4). Not all vectors or matrices have their sizes clearly defined.

  2. No references to the equations/sections for each component of the ablation study in Table 3a.

Questions

  1. Will the annotation of the re-annotated validation set (the clean validation set) of SmokeSeg and SMOKE5K be released?

  2. Related to weakness #1: should $\Omega$ in line 137 be $\omega_\Omega$? Where is $\mathbf{p}(\mathbf{x}_n^k)$ in Eq. (4)?

  3. Any insight into why different entropies lead to different results?

  4. What are the $\lambda$, $\mu$, and $\gamma$ values used?

Limitations

No limitation is discussed as also mentioned in the paper checklist.

Author Response

We thank the reviewer for the constructive comments, which helped us improve the quality of our work. In the following, we provide a point-by-point response. We adopt different letters to represent the different parts of the questions raised: "W" represents "weakness" and "Q" represents "question".

W1(1). Not all of the notations are clearly defined or some notations are defined very late (e.g., $N^k$ has been mentioned since Sec. 3.3 but its definition is in Sec. 3.4). Not all vectors or matrices have their size defined clearly.

We thank the reviewer for the suggestions, and we will include a list of symbol definitions in the paper.

W1(2). About the notation and reference in Tab. 3a.

We thank the reviewer for the suggestion. "Proto" means changing the regular prediction head into a prototype-based one. "Sample Weight" refers to Eq. 7 and "Proto Update" refers to Eq. 8. We will also include the references in Tab. 3a.

Q1. Will the annotation of the re-annotated validation set (the clean validation set) of SmokeSeg and SMOKE5K be released?

Yes, we will release the re-annotated validation set of SmokeSeg and SMOKE5K soon.

Q2. Should $\Omega$ in line 137 be $\omega_\Omega$? Where is $\mathbf{p}(\mathbf{x}_n^k)$ in Eq. 4?

Yes, the $\Omega$ in line 137 should be $\omega_\Omega$. The $\mathbf{p}(\mathbf{x}_n^k)$ in lines 155 and 156 should be $\mathbf{p}^k$ in Eq. 4.

Q3. Any insight into why different entropies lead to different results?

The reason why different entropies lead to different results can be explained from two perspectives:

1. From the characteristics of the three entropies:

Kapur's entropy and Burg's entropy are both based on Shannon's entropy.

Shannon Entropy ($T(P)$)

  • Formula: $T(P) = -\sum_{i=1}^{N} p_i \ln p_i$
  • Characteristics:
    • Shannon entropy is a fundamental concept in information theory, used to measure uncertainty or information content.
    • The more uniform the probability distribution (i.e., the closer the $p_i$ are to each other), the higher the Shannon entropy, indicating greater uncertainty.
    • When an event's probability is 1 (a certain event), Shannon entropy is 0, indicating no uncertainty.

Burg Entropy ($T^B(P)$)

  • Formula: $T^B(P) = \sum_{i=1}^{N} \ln p_i$
  • Characteristics:
    • Burg entropy differs from Shannon entropy in that it directly sums the logarithms of the probabilities.
    • Since the logarithm function is 0 when the probability is 1 and approaches negative infinity as the probability approaches 0, Burg entropy can heavily penalize extremely low-probability events.

Kapur Entropy ($T^K(P)$)

  • Formula: $T^K(P) = -\sum_{i=1}^{N} p_i \ln p_i - \sum_{i=1}^{N} (1-p_i) \ln (1-p_i)$
  • Characteristics:
    • Kapur entropy is an extension of Shannon entropy, considering not only $p_i$ but also $(1-p_i)$.
    • This entropy measure is more comprehensive, taking into account the uncertainty associated with both the occurrence and non-occurrence of each event.
    • In extreme cases (i.e., when $p_i$ is 0 or 1), Kapur entropy still provides a reasonable measure, addressing the limitations of Shannon entropy in these scenarios.

By comparing these different entropies, we can see that while they all measure the uncertainty of a probability distribution, they have distinct applications and characteristics. Shannon entropy is the most basic and classic measure, while Burg entropy and Kapur entropy offer extensions and complements for different situations and needs.
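As a quick numerical illustration (our own minimal NumPy sketch, not code from the paper), the three entropies can be compared on toy distributions; note how Burg's entropy heavily penalizes near-zero probabilities, while Kapur's also accounts for the $(1-p_i)$ terms:

```python
import numpy as np

def shannon_entropy(p):
    # T(P) = -sum_i p_i ln p_i, with 0 * ln 0 treated as 0
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-np.sum(nz * np.log(nz)))

def burg_entropy(p):
    # T^B(P) = sum_i ln p_i; approaches -inf as any p_i -> 0
    return float(np.sum(np.log(np.asarray(p, dtype=float))))

def kapur_entropy(p):
    # T^K(P) = -sum_i p_i ln p_i - sum_i (1 - p_i) ln(1 - p_i)
    p = np.asarray(p, dtype=float)
    return shannon_entropy(p) + shannon_entropy(1.0 - p)

uniform, skewed = [0.5, 0.5], [0.99, 0.01]
print(shannon_entropy(uniform), shannon_entropy(skewed))  # uniform gives the larger value
print(burg_entropy(uniform), burg_entropy(skewed))        # skewed is penalized far more
```

All three measures rank the uniform distribution as more uncertain than the skewed one, matching the characteristics listed above.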

2. From the weighting derived:

Shannon's: $v_n^k = N^k \dfrac{\exp(-\gamma \| \mathbf{x}_n^k - \mathbf{p}^k \|_2)}{\sum_{n=1}^{N^k} \exp(-\gamma \| \mathbf{x}_n^k - \mathbf{p}^k \|_2)}$

Burg's: $v_n^k = N^k \dfrac{-\gamma \| \mathbf{x}_n^k - \mathbf{p}^k \|_2}{\sum_{n=1}^{N^k} \left(-\gamma \| \mathbf{x}_n^k - \mathbf{p}^k \|_2\right)}$

Kapur's: $v_n^k = N^k \dfrac{1}{1 + \exp\left(-\| \mathbf{x}_n^k - \mathbf{p}^k \|_2 - \lambda_k\right)}$

(where $\lambda_k$ in Kapur's is the solution of $\sum_{n=1}^{N^k} v_n^k = N^k$)
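The effect of the exponential term can be seen in a small sketch (our own illustration, not code from the paper; `dists` plays the role of $\|\mathbf{x}_n^k - \mathbf{p}^k\|_2$):

```python
import numpy as np

def shannon_weights(dists, gamma=0.8):
    # v_n^k = N^k * exp(-gamma * d_n) / sum_n exp(-gamma * d_n)
    w = np.exp(-gamma * np.asarray(dists, dtype=float))
    return len(dists) * w / w.sum()

def burg_weights(dists, gamma=0.8):
    # v_n^k = N^k * (-gamma * d_n) / sum_n (-gamma * d_n): no exponential term
    w = -gamma * np.asarray(dists, dtype=float)
    return len(dists) * w / w.sum()

# two samples close to the prototype, one far away (likely noisy)
dists = [0.1, 0.2, 2.0]
print(shannon_weights(dists))  # the far sample receives a small weight
print(burg_weights(dists))     # without the exponential, the far sample is not suppressed
```

Both schemes keep the total weight equal to $N^k$, but only the exponential (Shannon-derived) form concentrates weight on samples near the prototype.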

Compared to the weighting derived from Shannon's entropy (Eq. 7 in the paper), Burg's (Eq. 32 in the Appendix) lacks the exponential term. However, the exponential term increases the sensitivity of the model to noisy labels. As a result, in Table 4, the performance of Burg's entropy is slightly inferior to the others.

The relatively poor performance of the weighting derived from Kapur's entropy (Eq. 37 in the Appendix) can be attributed to the fact that $T_t^K = T_w^K + T_b^K$ does not hold for Kapur's entropy, whereas it does hold for Shannon's and Burg's entropy. Hence, for a given classification problem, maximizing the within-prototype Kapur's entropy is different from maximizing Kapur's entropy on the entire dataset, which means we cannot simply maximize the entropy of each prototype independently. For consistency and comparison, however, we still use the within-prototype Kapur's entropy to design the objective function, which may also be why its performance is not as good as Shannon's entropy. The derivations for the different entropy measures can be found in the Appendix.

Q4. What are the $\lambda$, $\mu$, $\gamma$ values used?

In our experiments, we set $\lambda = 0.6$, $\mu = 0.999$, and $\gamma = 0.8$. Since $\gamma$ represents the strength of the regularization term in RWE, we provide the performance of models with different values. The details can be seen in Tab. 1 of the global response PDF file.

| $\gamma$ | 0 | 0.2 | 0.4 | 0.6 | 0.8 | 1 |
| --- | --- | --- | --- | --- | --- | --- |
| $F_1$ | 68.17 | 70.14 | 71.22 | 72.30 | 72.32 | 72.04 |
| mIoU | 55.23 | 56.42 | 57.60 | 59.77 | 59.83 | 58.58 |
Comment

Thank you, authors, for the rebuttal. I am increasing my rating to borderline accept.

Comment

Thank you very much for the feedback.

Official Review (Rating: 4)

In order to solve the problems of the complex and blurred edges of non-rigid smoke in smoke segmentation, as well as the existence of noisy labels in large-scale pixel-level smoke datasets, this paper proposes a conditional sample weighting (CoSW) method. CoSW uses a multi-prototype framework, in which prototypes serve as prior information and different weighting criteria are used in different feature clusters to solve the problem of feature inconsistency. This paper also introduces a new regularized within-prototype entropy (RWE) to achieve stable updating of prototypes.

Strengths

  1. The article's expression and language are mostly accurate, and its structure (introduction of the problem, method, and experimental results) is well-organized. The paper has a reasonable overall organization and order.
  2. This paper proposes conditional sample weighting (CoSW) to handle smoke segmentation in the presence of noisy labels. CoSW is built on a multi-prototype framework, using prototypes as prior information to determine weighting criteria, and weighting each feature cluster with different criteria.
  3. This paper also introduces a new regularized within-prototype entropy (RWE) to obtain comprehensive information about samples.
  4. From the results and visualization, the proposed method improves the accuracy of smoke segmentation. The visualization results show that this method misses less high-transparency smoke than previous methods.

Weaknesses

  1. In Section 1, two challenges of current smoke annotation are presented in the second paragraph: "1) Smoke edges are complex and blurry, making it hard to distinguish smoke and background. 2) Smoke is non-rigid and lacks a fixed shape, making it difficult for annotators to become proficient through practice with the same shape." Nevertheless, the fifth paragraph, which proposes the CoSW approach, omits to clarify how and why exactly CoSW can resolve these two problems from a methodological perspective.
  2. In Section 3.4, there are two hyper-parameters in Eq. 5 and Eq. 11: $\gamma$ and $\mu$. The article's explanation of $\gamma$ and $\mu$ is too simplistic; there is no extensive explanation of how they are defined and how they affect the entire formula in the form of weights.
  3. In Section 3.4, the derivation of how Eq. 5 is produced is not reflected in Appendix D, which only covers the procedure of using Eq. 5 to derive later equations.
  4. Please explain the reason for choosing the value $\varepsilon = 10^{-5}$ in Section 4.4 and what effects a larger or smaller $\varepsilon$ would have.
  5. The layout and aesthetics of the figures and tables in this article need to be improved, such as the placement of Table 4 and the setting of the table size.

Questions

  1. The two datasets used in the experiments are SmokeSeg and SMOKE5K. Can you add their information, such as the number of images, noise label types, noise rates, resolutions, etc.?
  2. For experimental results in Table 1 and Table 2, only numerical descriptions are listed. Can you add corresponding explanations and discussions for less than optimal performance?
  3. How to understand "CoSW is concise and does not require data with clean labels during the training." in the Section Abstract? Can this method be applied to completely noisy labels? What mechanism is used to achieve this?

Limitations

  1. In Table 1, the results show that "Large" (with $\delta$, the smoke pixel ratio in an image, greater than 2.5%) and "Medium" ($\delta$ between 0.5% and 2.5%) have poor performance in real-time, but the reason is not analyzed. Does this mean that CoSW is not very effective in the case of a high smoke pixel ratio?
  2. In Section 5.3, the paper only reports the experimental phenomenon that Trans-BVM performs best at low noise rates and CoSW performs best at high noise rates, but does not explain why. Does this result mean that CoSW has limitations in low-noise-rate scenarios?

Author Response

We thank the reviewer for the constructive comments. Below, we provide a point-by-point response, where "W" stands for weakness, "Q" for question, and "L" for limitation.

W1. Relationship between the two smoke challenges and CoSW.

The two analyses explain why smoke tends to produce noisy labels. We then examine the characteristics of the noisy labels in smoke, finding feature inconsistency due to variable transparency. To address this problem, we propose conditional sample weighting (CoSW). CoSW employs different weighting criteria for the samples within different feature clusters by constructing the regularized within-prototype entropy (RWE).

W2. Further explanation of $\mu$ and $\gamma$.

$\mu = 0.999$ is the momentum coefficient; here we follow the setting in MoCo (CVPR 2020). It determines how much of the previous content is retained at each update. $\gamma$ represents the strength of the regularization term in the RWE: the larger the $\gamma$, the more sensitive the RWE is to the distance. We provide experiments with different values of $\gamma$ in Tab. 1 of the global response PDF file.
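For concreteness, a momentum update of this form can be sketched as follows (a generic illustration of the MoCo-style exponential moving average rule, not the paper's exact implementation):

```python
import numpy as np

def momentum_update(prototype, new_estimate, mu=0.999):
    # retain a fraction mu of the previous prototype and blend in
    # (1 - mu) of the estimate computed from the current iteration
    return mu * prototype + (1.0 - mu) * new_estimate

p = np.zeros(4)          # previous prototype
batch_est = np.ones(4)   # hypothetical prototype estimate from the current batch
p = momentum_update(p, batch_est)  # moves only 0.1% of the way toward the new estimate
```

With $\mu$ close to 1, each update changes the prototype very little, which keeps it stable across iterations.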

W3. About the Eq. 5.

Eq. 5 means combining WE (Eq. 4) and the constraint equation Eq. 2. The original objective without the regularization is to build a uniform assignment, but after incorporating the regularization term Eq. 4, the RWE can determine the noise level of the features.

W4. The explanation of $\varepsilon$.

The introduction of $\varepsilon$ prevents the occurrence of singular matrices when inverting. This is a commonly used technique in LDA. The typical value is $10^{-5}$. We also tested $10^{-4}$ and $10^{-6}$, and they worked for training, but setting $\varepsilon$ to 0 results in non-invertible matrices during training.
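The trick can be sketched in a few lines (a generic illustration of ridge-style regularization before inversion, assuming a scatter-like matrix `S`; not the paper's code):

```python
import numpy as np

def regularized_inverse(S, eps=1e-5):
    # add eps * I so that a (near-)singular scatter matrix becomes invertible
    return np.linalg.inv(S + eps * np.eye(S.shape[0]))

# rank-deficient example: inverting S directly raises numpy.linalg.LinAlgError
S = np.array([[1.0, 1.0],
              [1.0, 1.0]])
S_inv = regularized_inverse(S)  # finite, well-defined inverse
```

With `eps=0` the call reduces to a plain inverse and fails on singular inputs, mirroring the behavior described above.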

W5. About the layout and aesthetics of the figures.

We will revise the layout and aesthetics of the figures and tables.

Q1. Can you add details of SmokeSeg and SMOKE5K?

SMOKE5K is a mixed dataset (real + synthetic) and SmokeSeg is an entirely real dataset. The majority of images in SmokeSeg are early smoke. The details of the two datasets are shown below.

| Dataset | Number of Images | Noise Label Types | Noise Rates* | Training Resolution |
| --- | --- | --- | --- | --- |
| SMOKE5K | 1,360 real + 4K synthetic | real | 8.4% | 480x480 |
| SmokeSeg | 6,144 | real | 11.5% | 512x512 |

*The "Noise Rates" here is the ratio of noisy pixel labels to clean pixel labels (estimated by the validation set).

Q2. Can you add discussions for Tab. 1 and Tab. 2?

In Tab. 1: Our method has a clear advantage on small smoke. As small smoke mostly represents early smoke, recognizing early smoke accurately is significant for carrying out rescue work quickly in the real world. In addition, we find that CoSW is more suitable for transformer-based backbones: comparing Trans-BVM and CoSW, the gap is around 1% on ResNet-50 but widens to over 4% on MiT-B3. A similar phenomenon can be observed on SMOKE5K in Tab. 2a.

In Tab. 2b: The tests are divided into two settings: 1) noise ratio (the proportion of data with added noise); and 2) noise degree (the strength of the noise), detailed in Appendix C. The impact of the noise degree is sometimes greater than that of the noise ratio (e.g., 40% high vs. 60% low). Since Trans-BVM is specifically designed for smoke, it achieves better results in low-noise scenarios. As the noise continues to increase, the performance of other methods declines rapidly, but CoSW maintains its performance.

Q3. How to understand "CoSW does not require clean labels during the training."? Can this method be applied to complete noise? What mechanism?

Many previous noisy-label methods require a clean validation set during training to guide the model in identifying noise. 1) The cost of obtaining clean labels is high. 2) Incorporating a validation set into training makes the pipeline complex. The intuition is that under CoSW, the model can find the common characteristics of smoke (i.e., multiple prototypes) and then determine the noise level of each feature based on them; this process is carried out through RWE. CoSW employs different weighting criteria in different feature clusters to address the problem of feature inconsistency. Fig. 5b shows the CoSW formation. CoSW requires the assumption that the majority of pixels have clean labels. When the label mask is completely noisy, the model is unable to distinguish the noisy labels, because it cannot learn which features are common to smoke. To test the anti-noise ability of CoSW, we designed an extreme experiment by directly adding different levels of Gaussian noise to the original labels, until approaching complete noise. The results and examples of noisy images can be seen in Tab. 2 and Fig. 2 of the global response PDF file.
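As a hypothetical sketch of such a stress test (our own simplified version; the paper's actual noise generation is detailed in Appendix C), Gaussian noise can be added to a binary label mask and re-thresholded, so that increasing the noise scale drives the labels toward complete noise:

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt_mask(mask, sigma):
    # add Gaussian noise to a binary label mask and re-threshold at 0.5;
    # larger sigma flips more labels, approaching complete label noise
    noisy = mask.astype(float) + rng.normal(0.0, sigma, size=mask.shape)
    return (noisy > 0.5).astype(np.uint8)

mask = np.zeros((64, 64), dtype=np.uint8)
mask[16:48, 16:48] = 1             # a clean square "smoke" region
low = corrupt_mask(mask, 0.05)     # mild corruption, few flipped labels
high = corrupt_mask(mask, 5.0)     # severe corruption, near-random labels
print((low != mask).mean(), (high != mask).mean())
```

The flipped-label fraction grows with the noise scale, which is the regime the extreme experiment probes.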

L1. CoSW is not very effective at a high smoke pixel ratio.

Since the noisy labels mainly occur at the edges of the smoke, their impact is greater on small smoke, while on large smoke (high smoke pixel ratio) the impact is smaller. Small smoke is also more important and meaningful in the real world, as it can assist in timely rescue. Our CoSW achieves the best performance on small smoke, and Tab. 2b demonstrates that our method is robust against noise. At a high smoke pixel ratio, the impact of noisy labels is not as great, so the performance of CoSW is not outstanding, but it ranks second, with an $F_1$ score within 0.2 of the first.

L2. CoSW has limitations at low noise scenarios.

The reason why CoSW is slightly inferior to Trans-BVM under a low noise ratio is that CoSW uses the basic segmentation model, while Trans-BVM uses a model specially designed for smoke, with a Bayesian generative model and a Transmission module. When the noise rate increases, the performance of CoSW begins to emerge.

Comment

Thanks for the author's rebuttal, I will further consider the review comments.

Comment

Thanks for your response. Please feel free to let us know if you have any further questions. We are dedicated to further clarifying and addressing any remaining issues to the best of our ability.

Comment

Dear Reviewer,

Thank you for carefully reviewing our rebuttal. If you have any further questions, please let us know promptly so that we can resolve them in the remaining time. We hope you will reconsider our score. Thank you again.

Best regards,

The Authors

Official Review (Rating: 7)

Smoke segmentation is an important problem as it can be directly tied to health and safety. That being said, it is also a difficult problem as annotations for smoke segmentation datasets are noisy, sometimes leading to inconsistent or even poor segmentation performance. The authors address this issue by proposing a prototype-based clustering algorithm using different weighting criteria for feature clusters and prototypes via conditional sample weighting (CoSW) and regularized scatter metric learning (RSML). They also introduce regularized within-prototype entropy (RWE) to update prototypes using adaptive sample weighting. The proposed approach can be attached to existing segmentation models (e.g., SegFormer or MiT-B3) to produce state-of-the-art results on two real smoke segmentation datasets and on their new synthetically-noisy smoke segmentation dataset, NS-1K. This method is not only effective at dealing with noisy labels in smoke segmentation, but its general formulation has the potential to be adapted to other problems with noisy segmentation labels.

Strengths

  1. Provided a mathematical derivation or foundation for their proposed clustering method’s prototype and feature weighting (CoSW), the regularized scatter metric learning (RSML), and the prototype update method (RWE).
  2. Improved the quality of two existing real smoke segmentation datasets (SmokeSeg and SMOKE5K) by carefully re-annotating the validation set. The community can only progress if our benchmark datasets are reliable enough to believe the results are meaningful. Improving the reliability of the validation set for these datasets not only improves the proposed work's performance, but helps the community as a whole.
  3. Created a synthetic smoke noise dataset, NS-1K. The authors re-annotated 1000 images from SmokeSeg and added artificial noise to the labels to create a new dataset to target label noise problems with smoke segmentation.
  4. A thorough evaluation showing state-of-the-art performance on two existing benchmarks (SmokeSeg and SMOKE5K) and on their new dataset, NS-1K. The proposed approach surpasses the performance of existing methods in both F1 and mIoU with various backbones and even in real-time. The evaluation is thorough and comprehensive, featuring both quantitative and qualitative results compared with previous state-of-the-art methods. The ablation study is also very detailed and showcases the importance of each contribution to the method’s performance.

Weaknesses

  1. The writing could be more clear at times, in particular in some parts of the mathematical derivation. Overall, the writing is understandable but it could be improved for clarity.
  • L104: What diagrams are the authors referring to?
  • L127: This is somewhat unclear; what do the authors mean by varieties?
  • L137: What "classes" mean in the context of this problem could be made clearer with a simple example in parentheses.
  • L138 and L196: What is D? The pixel dimension, 3 for RGB?

Minor:

  • Strange grammar/wording: L104-105, L128, L282

Questions

  1. What classes are the authors referring to in L137? Smoke vs background?
  2. Could you give some details on the "Real-time" vs "Normal" column of Table 1? Also, what is the runtime or FPS of this approach compared to others, for real-time applications?
  3. This is not a limitation or a flaw, but more a question born from curiosity. Has this approach been applied to any other noisy-segmentation problem other than smoke segmentation? The formulation of the approach appears fairly general, meaning that it could potentially have broader use.

Limitations

  1. The authors identify a limitation of metric learning in that it does not perform as well as the baseline (proto) when optimized using the triplet loss with noisy labels. However, using CoSW with the triplet loss (or even better, with the scatter loss) noticeably improves performance, showcasing the significance of CoSW in dealing with noisy labels.
  2. The authors also note that RWE is based on Shannon entropy, which could potentially be inferior to Kapur's or Burg's entropy formulations. The authors address this by performing an experiment (see Table 4) with RWE using each of these entropy formulations, which showed that Shannon entropy resulted in the best performance.
  3. Since clustering approaches are iterative, they can take a performance hit in real-time applications compared to single-pass methods like neural networks. The authors provide some results pertaining to this problem (see Table 1) but don't directly address this issue. That being said, runtime performance is not the focus of this problem or paper, so the limitation is not a significant issue.
Author Response

We thank the reviewer for the positive feedback on our paper and the constructive suggestions. Here are our responses to the reviewer's comments. We adopt different letters to represent the different parts of the questions raised, where "W" represents weakness and "Q" represents question.

W1(1). L104: What diagrams are the authors referring to?

The “diagrams” here refers to the “research fields”.

W1(2). L127: This is somewhat unclear, what do the authors mean by varieties?

The “varieties” here mean “N random variables”.

W1(3). L137: What do classes mean in the context of this problem could be more clear with a simple example in parenthesis.

Our intuition is to provide a more general expression, where CoSW can also be applied to multi-class tasks.

W1(4). L138 and L196: What is D? The pixel dimension, 3 for RGB?

“D” represents the feature dimension after the pixels have gone through the encoder-decoder. At the same time, it is also the dimension of the prototype. In our experiments, the value of D is 256.

W1(5). Strange grammar/wording: L104-105, L128, L282.

Thank you for the reviewers' suggestions. We explain the issues below:

  • L104-105: The research fields include supervised learning, unsupervised learning, and self-supervised learning.

  • L128: Shannon entropy can be used to measure the uncertainty of the distribution: $T(\Pi) = -\sum_{i=1}^{N} \pi_i \ln \pi_i$ (with the constraint $\sum_{i=1}^{N} \pi_i = 1$).

  • L282: To investigate the reasons behind the effects of CoSW, we visualize the formation process of CoSW.

Q1. What classes are the authors referring to in L137? Smoke vs background?

Yes, the classes are smoke and background.

Q2. Could you give some details on the "Real-time" vs "Normal" column of Table 1? Also, what is the runtime or FPS of this approach compared to others, for real-time applications?

The "real-time" setting refers to using lightweight backbones, such as MiT-B0, AFFormer-B, SeaFormer-B, etc. "Normal" uses regular backbones, such as ResNet-50, MiT-B3, HRNet-48, etc. Here, we also provide an FPS comparison of the different methods.

Real-time:

| Method | Backbone | FPS |
| --- | --- | --- |
| AFFormer | AFFormer-B | 101.3 |
| SeaFormer | SeaFormer-B | 92.5 |
| SegFormer | MiT-B0 | 98.4 |
| SC | MiT-B0 | 88.4 |
| CleanNet | MiT-B0 | 89.5 |
| CoSW | MiT-B0 | 93.6 |

Normal:

| Method | Backbone | FPS |
| --- | --- | --- |
| DeepLabV3+ | ResNet-50 | 32.4 |
| OCRNet | HRNet-48 | 22.2 |
| SegNeXt | MSCAN-L | 20.1 |
| Trans-BVM | ResNet-50 | 30.3 |
| CoSW | ResNet-50 | 36.0 |
| SegFormer | MiT-B3 | 44.1 |
| Trans-BVM | MiT-B3 | 30.7 |
| SC | MiT-B3 | 27.5 |
| CMW-Net | MiT-B3 | 28.5 |
| CleanNet | MiT-B3 | 35.4 |
| CoSW | MiT-B3 | 36.7 |

*Input shape: 512x512; NVIDIA RTX 3090Ti GPU

Q3. This is not a limitation or a flaw, but more a question born from curiosity. Has this approach been applied to any other noisy segmentation problem other than smoke segmentation? The formulation of the approach appears fairly general, meaning that it could potentially have broader use.

We find that variable transparency also exists in skin lesions, so we supplement the experiments with skin lesion images. The dataset used is ISIC-2017, a well-known public benchmark of dermoscopy images for skin cancer detection. It contains 2,000 training and 600 test images with corresponding lesion boundary masks. For the noise setting, we follow the noise generation approach from the NS-1K dataset. The experimental results are listed below. The sample weighting visualization can be seen in Fig. 1 of the global response PDF file.

| Method | Trans-BVM | SC | CleanNet | CoSW |
| --- | --- | --- | --- | --- |
| Clean | 85.40 | 83.80 | 84.16 | 84.32 |
| Noise Ratio: 60%; Degree: High* | 69.23 | 71.98 | 72.80 | 74.57 |

*The specific meaning of the noise setting can be seen in the paper Sec. 5.1 NS-1K part.

Comment

I thank the authors for their detailed rebuttal, addressing our concerns, and providing additional experiments. In particular, I appreciate the additional experiment on ISIC-2017, it showed the utility of this approach on more than smoke segmentation.

I have reread the paper, all of the reviews, the rebuttal, and comments by the authors. It appears that 1bVm, KX4g, and my concerns were largely focusing on the work’s presentation clarity. After reading their concerns and the authors’ rebuttal, it appears that the authors have addressed our presentation-related concerns in a satisfactory manner.

Of all the review comments (including my own), T8tv's are perhaps the most concerning. If this submission is indeed a duplication of concurrently submitted work ("DSA: Discriminative Scatter Analysis for Early Smoke Segmentation" to ECCV 2024), that would dramatically reduce the value of this contribution to the community and could be in violation of the dual submission policy. However, I cannot find any trace of the DSA paper T8tv is referring to, not on ECCV's website or even on arXiv. If the work is in violation of the dual submission rules, the area chair should be notified. Since I have no evidence of that, I will assume this submission is not in violation of these rules. As a result, I think it is not fair to reject a work based on how similar it is to an unpublished work (even if the unpublished work has allegedly been accepted to a conference). According to the NeurIPS FAQ, "Papers appearing less than two months before the submission deadline are generally considered concurrent to NeurIPS submissions. Authors are not expected to compare to work that appeared only a month or two before the deadline." Since ECCV 2024 papers have not yet been officially published, it is not fair to require this submission to be compared against DSA. As such, I am disregarding T8tv's single concern.

As it stands, I maintain my initial rating of “accept” since most of the reviewers' concerns have been addressed. I am happy to reassess my review if other concerns are presented.

Comment

Thank you very much for the response.

Official Review (Rating: 4)

This work proposes a method for smoke segmentation with label noise, addressing the issue of noisy labels commonly found in non-rigid smoke annotations. This idea is meaningful and reasonable. The main contributions of the paper are the conditional sample weighting (CoSW) and the regularized within-prototype entropy (RWE).

Strengths

N/A

Weaknesses

However, during my recent review of ECCV 2024 papers, I gave positive feedback on a similar paper titled "DSA: Discriminative Scatter Analysis for Early Smoke Segmentation," which has been accepted by ECCV 2024. That paper applied scatter matrices to smoke segmentation, optimizing the objective function through the ratio-trace form of $S_w^{-1} S_b$. In my view, the regularized within-prototype entropy (RWE) in this submission is very similar to the method I previously reviewed. Therefore, I am currently unable to give a positive rating for this submission unless the authors can clarify the theoretical differences between the two works.

Questions

see Weaknesses

Limitations

see Weaknesses

Author Response

We understand the reviewer's confusion, but this paper is different from DSA. We would like to clarify the differences between our proposed method and the DSA point-by-point below.

  1. The key difference is that CoSW constructs conditional sample weighting to address the issue of noisy labels, while DSA formulates scatter matrices to handle the problem of hard samples. We introduce information-theoretic learning (ITL) to achieve CoSW, which determines sample weighting through uncertainty estimation. Furthermore, we experiment with different entropies, including Shannon's, Kapur's, and Burg's, and provide complete weighting derivations based on each entropy in the Appendix. CoSW only adopts scatter analysis in the loss part, and unlike DSA, we further incorporate the sample weighting into the scatter matrix to prevent metric learning from overfitting under noisy labels.

  2. In CoSW, we propose a concept of regularized within-prototype entropy (RWE). RWE can establish independent evaluation criteria for each feature group formed by multiple prototypes. This allows for a more fine-grained determination of the weighting, which is suitable for inconsistent smoke features.

  3. For the prototype update, CoSW is also different from DSA. DSA directly uses gradient descent, but gradient descent can easily be affected by noisy labels. We adopt a regularized nonparametric prototype update (Fig. 3 in the paper), in which the new prototype is weighted by all the features that matched the prototype in the previous iteration.

  4. The prediction head in DSA and CoSW is different as well. The essence of DSA is to integrate the scatter loss into the original model, still using the classic binary prediction head (2 neurons). But CoSW changes the prediction head to a prototype-based way, where the final output neurons are 2K, with K being the number of prototypes per class.

  5. In the experiment, we not only test on real datasets but also create a synthetic noise dataset NS-1K, which has two noisy hierarchies: the ratio of noisy labels to the total labels and the degree of noise. Under this setting, NS-1K can reflect the performance of various models under different noise levels.

Comment

Thanks for the reviewer's response. We are delighted to answer any other concerns you may have.

Comment

Dear Reviewer,

We sincerely thank you for taking the time to review our paper and for providing valuable comments. Despite some superficial similarities, our method differs significantly from DSA in fundamental ways. We have provided a detailed explanation in our rebuttal.

We cordially invite you to review our detailed rebuttal. If you have any questions, please feel free to contact us. We will do our best to clarify and eliminate the remaining concerns.

Best regards,

The Authors

Comment

Thanks for the author's rebuttal, I will carefully consider and compare the differences between these two works.

Global Author Response

We thank the reviewers for their careful reading of our paper and for helping us improve our manuscript.

We sincerely appreciate that you find our work:

  • It is the first paper to tackle the noisy label problem in smoke segmentation (Reviewer 1bVm).
  • Create a synthetic smoke noise dataset, NS-1K (Reviewer bKT9, Reviewer 1bVm, Reviewer KX4g).
  • Provide a mathematical derivation or foundation for their proposed method (RWE) (Reviewer bKT9).
  • Improving the reliability of the validation set helps the community as a whole (Reviewer bKT9).
  • The proposed approach surpasses the performance of existing methods in both $F_1$ and mIoU with various backbones and even in real-time (Reviewer bKT9).
  • The ablation study is also very detailed and showcases the importance of each contribution to the method’s performance (Reviewer bKT9).

In the subsequent sections, we aim to address the concerns and questions you raised, offering a comprehensive item-by-item response to each of your comments.

We have provided some additional experiments as reviewers suggest. Due to space limitations, we've displayed the results tables and figures in the global response PDF file for Reviewer bKT9 Q3, Reviewer KX4g W2, Q3, and Reviewer 1bVm Q4.

Final Decision

This paper received divergent reviews. R-T8tv is negative, with novelty concerns with respect to an unpublished paper previously reviewed. R-bKT9 is positive, noting technical novelty, dataset contributions, and strong evaluation. R-KX4g leans negative, primarily concerned with motivation (i.e., how the proposed method addresses the problems introduced in the intro) and clarity (reproducibility of the method). R-1bVm leans positive but also notes clarity issues (and indicates low confidence). The authors submitted an extensive rebuttal which did address some of the reviewers' concerns (e.g., differentiating from parallel work, adding details that improve clarity). During the discussion phase, the reviewers and AC concluded that there were no dual submission concerns and that the paper should be evaluated on its own merits. As the ratings remained split, the AC reviewed the paper in detail, finding the arguments of R-bKT9 the most compelling. The paper tackles an important problem, has novelty in the method formulation, contributes benchmark datasets which may positively impact the community, and has extensive experiments validating the approach. As such, the AC reached a decision to accept the paper. Please take the reviewer feedback into account when preparing the camera-ready version.