Semi-Supervised Semantic Segmentation via Marginal Contextual Information
Better semi-supervised semantic segmentation by using information about neighboring pixels to improve pseudo-labels
Abstract
Reviews and Discussion
This paper introduces a teacher-student paradigm for pseudo-labeling in the context of semi-supervised segmentation. The idea is to have two identical deep networks for the teacher and the student. The teacher is fed only the unlabeled data, while the student network takes in both labeled and unlabeled data; the threshold on the unlabeled-data pseudo-labels used to guide the student is set dynamically. Pseudo-labels are assigned by assessing an event-union probability over a group of neighboring pixels, i.e., the probability that at least one pixel belongs to a given class. Using the neighboring pixels introduces contextual cues that enhance pseudo-label propagation.
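The neighbor-based refinement described in this summary can be sketched as follows. This is a hypothetical minimal implementation, not the authors' code: it assumes per-class softmax probability maps, independence between pixels in the union bound P(A ∪ B) = P(A) + P(B) − P(A)P(B), a 3x3 neighborhood, and a single decay factor `kappa` weighting the neighbor's contribution.

```python
import numpy as np

def refine_confidence(probs, kappa=1.0):
    """Refine per-class confidence with an event-union rule,
    taking for each pixel and class the most confident pixel in its
    3x3 neighborhood (a sketch; names and schedule are assumptions).

    probs: (C, H, W) array of softmax probability maps.
    kappa: decay factor weighting the neighbor's contribution.
    """
    C, H, W = probs.shape
    padded = np.pad(probs, ((0, 0), (1, 1), (1, 1)), mode="edge")
    # Best neighboring probability per class over the 8 surrounding pixels.
    neigh = np.zeros_like(probs)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            shifted = padded[:, 1 + dy:1 + dy + H, 1 + dx:1 + dx + W]
            neigh = np.maximum(neigh, shifted)
    # Union bound under an independence assumption:
    # P(A or B) = P(A) + P(B) - P(A)P(B), with B down-weighted by kappa.
    return probs + kappa * neigh - kappa * probs * neigh
```

The refined confidence is never smaller than the original pixel-level confidence, which is why more pixels pass a fixed threshold after refinement.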
Strengths
The angle from which the pseudo-labeling problem is approached in this paper encompasses different concerns raised in the relevant literature. The different components that are put together as a unified module are interesting, may open up new perspectives for future research, and merit further investigation.
Weaknesses
Although the qualitative results show smooth segmentation in the internal parts of objects, artifacts are exaggerated in the boundary regions of the segmentation (compared against the baseline), despite the fact that neighboring pixels and the decaying distance-dependent factor are used in conjunction with each other and are supposed to refine the segmentation certainty. The paper needs to showcase more segmentation results in the qualitative section, because it is not quite clear what the performance is like from only two sets of samples (Fig. 3), especially when compared against the numerical results in the quantitative section. Most of the improvements are marginal (less than one percent) in the provided tables (and within the error range), and the correspondence of the qualitative results to the experiment "partition" is not obvious in this regard.
Questions
1. The inclusion of neighboring pixels may affect the segmentation at object boundaries and may cause artifacts, as can be seen in Fig. 3. The authors have not discussed how they would tackle/minimize this problem. 2. Most of the given samples in the figures contain one object and the background. How is the performance if a complicated background exists with multiple depth facades?
Thank you for your review and for finding our approach interesting. Let us address some of your concerns in this comment; we will update the paper accordingly during the discussion period.
Q1: Most of the improvements are marginal (less than one percent) in the provided tables (and within the error range)
A1: We are adding the empirical variance of our method and the baseline over PASCAL VOC 12; for 366 labeled examples, the standard deviation is 0.2 mIoU, meaning the improvements are significant compared to the error range. It is worth reiterating that our method improves without additional parameters or loss components. Previous works often show similarly-sized improvements, partly due to the saturation of benchmarks.
Q2: The paper needs to showcase more segmentation results in the qualitative section because it is not quite clear how the performance is like from only two sets of samples (Fig3).
A2: A larger set of qualitative results is displayed in Figure H.1 in the appendix. In addition, we will add qualitative results from predictions on COCO in a low annotation regime.
Q3: The correspondence of the qualitative result to the experiment "partition" is not obvious
A3: All qualitative results shown are from Pascal VOC 2012 with 366 labeled samples. Thank you for pointing out that issue; we will add clarification in the paper.
Q4: The inclusion of neighboring pixels may affect the segmentation in the boundary of the object, and it may cause artifacts, as can be seen in Fig 3. The authors have not discussed how they would tackle/minimize this problem.
A4: As reported in reply to reviewer P9tM, we indeed observe degradation of Boundary IoU while using S4MC, from 19.3 to 18.2 mIoU on FixMatch using 183 annotated images from PASCAL VOC 12. Possible solutions can include edge detection to give less weight to pixels near boundaries, which is part of our future work.
Q5: Most given samples in the figures contain one object and the background. How is the performance if a complicated background exists with multiple depth facades?
A5: Our method, with PLR in particular, aims to resolve the ambiguity between all possible classes at the pixel level. Figure H.1 in the appendix contains interesting showcases where objects from different classes are bunched together, such as the dog under the table. In addition, since PASCAL has only 20 object classes, we conducted experiments on the COCO dataset, which comprises 80 classes, meaning more finely annotated objects are densely placed. Besides the experiments, we will add more visual results on COCO to the appendix to qualitatively show the improvement from using S4MC in dense scenes.
We will happily answer additional questions; if we have addressed your concerns, we hope you will reflect that in your final rating.
Dear reviewer, since the deadline for discussions is approaching, we would appreciate it if you could review our personal and general responses and let us know if you have additional questions. If we have properly addressed your concerns, we would appreciate it if you could reflect that in your score.
The manuscript presents a technique for semi-supervised learning of semantic segmentation models. The technique extends previous approaches based on unsupervised consistency, where predictions of the teacher branch are used as targets for training the student. In order to avoid meaningless learning, these approaches train only on the most confident teacher predictions. The proposed technique extends this idea by expressing the confidence according to an upper bound on the probability that a small pixel neighbourhood contains more predictions of the same class. The technique changes the baseline method in two ways. First, the learning takes into account more unlabeled pixels, since the threshold \gamma_t is a quantile of the pixel-level confidence (5), which tends to be less than the proposed union-level confidence. Second, the consistency loss also works on union-level predictions instead of pixel-level predictions.
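The quantile-based threshold \gamma_t mentioned in this summary could be sketched as follows. This is a hedged illustration, not the paper's exact schedule: the linear annealing of the quantile and all names (`dynamic_threshold`, `q_start`, `q_end`) are assumptions made for the example.

```python
import numpy as np

def dynamic_threshold(confidences, step, total_steps, q_start=0.95, q_end=0.0):
    """Quantile-based threshold gamma_t that decays over training, so
    progressively more pseudo-labeled pixels pass the filter (a sketch).

    confidences: flat array of per-pixel max-class confidences.
    step, total_steps: current and total training iterations.
    """
    # Linearly anneal the quantile from q_start towards q_end.
    q = q_start + (q_end - q_start) * (step / total_steps)
    return np.quantile(confidences, q)
```

Because the threshold is a quantile of the pixel-level confidence while the union-level confidence is never lower, more pixels clear the bar as training progresses, until eventually all of them do.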
Strengths
S1 The proposed method can be combined with many existing techniques for semi-supervised segmentation (feature perturbation appears as a notable exception)
S2 The proposed method is conceptually simple and effective; Table 1 claims that it improves the CutMix-Seg mIoU by 4 percentage points on VOC aug with 1/1 training images.
Weaknesses
W1 Comparison with UniMatch is difficult due to different experimental setups. The authors do not explain the reasons for reproducing UniMatch performance instead of just copying the numbers from the original paper.
W2 Comparison with CutMix-Seg is difficult due to different backbone and different segmentation architecture.
W3 Experiments do not report variance across different subsets of labeled/unlabeled images.
Questions
Q1 Can you provide a comparison with previous work under their original experimental setups?
Q2 Can you confirm that the supervised baselines for all approaches in Tables 1-3 are equal?
Q3 Can you decouple improvement due to threshold \gamma_t being applied to union-level confidence from the improvement due to using union-level predictions in the loss (3)?
Q4 Why do the blue graph and the orange graph in Figure 4b converge at the end of training?
Q5 Report experiments with ResNet-50 in order to reduce environmental impact and to allow reproduction on modest hardware.
Q6 Explain the difference between the best numbers in Tables 4a/b and Table 5.
Q7 Report minimal hardware requirements (GPU RAM) and computational budget (GPU days) for reproducing the experiments
Suggestions
G1 Consider correcting "coarse PASCAL" as "augmented PASCAL"
G2 Consider rephrasing the term "information gain" since information gain is often considered a synonym for KL divergence
G3 Improve descriptions of the related work. For instance, the sentence with "unreliable prediction" fails to describe the gist of (Wang et al. 2022).
G4 Explain where the experimental performance of CutMix-Seg is taken from.
G5 Consider clarifying (3) by replacing \hat{y} with f_{\theta_t}(x_i^u)
Thank you for your review! We are happy that you appreciated the simplicity and effectiveness of the proposed method. Here, we will address the points you raised in your review, and we will update the paper accordingly later during the discussion period.
Q1: Experiments do not report variance across different subsets of labeled/unlabeled images.
A1: Semantic segmentation experiments are computationally heavy, and segmentation SSL methods do not report the variance. We agree that the variance may indeed be high, so we ran three random seeds for Pascal VOC 2012; for 366 labeled examples, the standard deviation is 0.2 mIoU. We will report it in the paper. Adding variance to every experiment is, unfortunately, beyond our computational resources, especially for large datasets.
Q2: Can you decouple improvement due to threshold \gamma_t being applied to union-level confidence from the improvement due to using union-level predictions in the loss (3)?
A2: The union-level predictions only affect pseudo-labels. The predictions used for the optimization process are not affected by the refinement mechanism. We will make sure to be clear in the paper about where we use the refinement module.
Q3: Comparison with UniMatch is difficult due to different experimental setups. The authors do not explain the reasons for reproducing UniMatch performance instead of copying the numbers from the original paper.
A3: We reproduced UniMatch results using the official implementation and the experimental configuration described in the paper; the results are close to those reported in the paper. In some experiments, e.g., PASCAL + SBD, not all data partitions are available, so for a fair comparison, we used the same random split for us and UniMatch. This may have caused a mismatch between our results and the reported ones.
Q4: Why do the blue graph and the orange graph in Figure 4b converge at the end of training?
A4: Fig. 4 is generated using a single training run that uses S4MC, i.e., the training is performed using S4MC; yet, at each iteration, we calculate the metrics displayed in Fig. 4 using two methods. This allows us to evaluate marginal improvements at each point in time rather than cumulative ones. As a result, when all pixels pass the threshold, the accuracy is the same. We will add clarification to the paper.
Q5: Explain the difference between the best numbers in Tables 4a/b and Table 5.
A5: These tables use different setups for ablation. Table 4 refers to ablation on neighborhood size and with CutMix-Seg+S4MC on PASCAL VOC 2012 with 366 labeled samples. Table 5 reports results on Fixmatch+S4MC on Pascal VOC 2012 with SBD and 2646 labeled examples. We added this information to the table captions.
Q6: Report minimal hardware requirements (GPU RAM) and computational budget (GPU days) for reproducing the experiments
A6: On 8 RTX A6000 GPUs, Pascal VOC 2012 with 366 labeled samples takes an average of 24:34 hours to train with DeepLabV3 with ResNet101; with ResNet50, it takes 12:11 hours to train on average. The minimum requirement is 48GB GPU memory for a batch size of 1.
Q7: Comparison with CutMix-Seg is difficult due to different backbones and different segmentation architecture.
A7: We are comparing to CutMix-Seg using DeepLabV3+ResNet101; the results of CutMix-Seg are from “Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels” (U2PL), which used identical architecture.
Q8: Can you provide a comparison with previous work under their original experimental setups?
A8: We have provided all compared methods using their original setup except for CutMix-Seg, for which we used results from U2PL to use the same architecture for the different baselines.
Q9: Can you confirm that the supervised baselines for all approaches in Tables 1-3 are equal?
A9: Yes, the supervised baseline for the different methods is the same.
Q10: Report experiments with ResNet-50 to reduce environmental impact and to allow reproduction on modest hardware.
A10: We will add results on Pascal VOC 2012 with ResNet50.
Q11: Consider correcting "coarse PASCAL" as "augmented PASCAL."
A11: Thank you for the suggestion. We will correct it.
Q12: Consider rephrasing the term "information gain" since information gain is often considered as a synonym to KL divergence
A12: Thank you for noting this; we will replace the term "information gain" with "information utilization."
Q13: Improve descriptions of the related work. For instance, the sentence with "unreliable prediction" fails to describe the gist of (Wang et al. 2022).
A13: We will rephrase the descriptions of the related work. In this example, we will use “unreliable pseudo-labels,” following Wang et al.
We will happily answer additional questions; if we have addressed your concerns, we hope you will reflect that in your final rating.
Dear reviewer, since the deadline for discussions is approaching, we would appreciate it if you could review our personal and general responses and let us know if you have additional questions. If we have properly addressed your concerns, we would appreciate it if you could reflect that in your score.
I would like to thank the authors for their exhaustive feedback. I still find it difficult to compare the present work with UniMatch since the tables report experiments with different versions of the model (UniMatch* in Tables 1,2 and 3, and UniMatch in Table 4).
Edit: The previous version of this message contained a mistake.
All our results involve the same version of the model. The star denotes that the results are from our local run of the reference code of UniMatch. The missing star in Table 4 was a typo, which we will fix. We hope this clarifies the issue and convinces you that the comparisons are all fair.
Additional clarification on Unimatch results: We used UniMatch's reference code from the official repository with the provided hyperparameters. We believe that providing these additional clarifications will resolve the issue of fair comparison to prior art.
This paper presents an approach for performing semi-supervised semantic image segmentation. To address the issue of a threshold-based filtering strategy prevailing in the semi-supervised field, the author proposes a pseudo-label refinement algorithm dedicated to the segmentation task. Specifically, the predicted pseudo-label of each pixel is improved by considering the predictions of that pixel's neighboring pixels using a proposed method. The method achieved state-of-the-art performance on both datasets.
Strengths
- The paper is easy to follow and well-structured.
- It is interesting to explore a refinement method for pseudo-labels, which has been rarely discussed in the literature. The proposed pixel-selection and propagation concept is simple yet intriguing to the reviewer, since this kind of refinement is somewhat novel, as far as I know.
- The experiments, including the appendix, are thorough and well-designed, providing comprehensive results across all settings.
- The listed performance demonstrates the effectiveness of the proposed method, significantly improving performance compared to the baseline.
Weaknesses
The main concern of the proposed work lies in its case analysis. According to the algorithm outlined in Section 3.2.1, the refinement process heavily relies on neighboring pixel predictions. However, we can identify two common failure scenarios in practice:
- The model may mispredict the majority of interesting regions (e.g., labeling a sofa as a chair or a car as a bus), rendering it unable to refine its predictions with neighboring pixels.
- In the case of boundary regions, the neighboring pixels may exhibit similar confidence values (lack of confidence). In such instances, the reviewer considers that the proposed method may not perform effectively in these areas.
Additionally, it would be beneficial to conduct another ablation study involving the propagation of pseudo-labels based on a k-NN (k-Nearest Neighbors) propagation algorithm with various pixel selection strategies, such as including all neighboring pixels (after filtering out those with low confidence) or other strategies.
Questions
- The proposed method is not working well for the evaluation set. What are the author's reasonable explanations for this?
- The refinement process appears to be ineffective in the later training period, as indicated by Figure 4-b. What is the reason for this phenomenon?
Thank you for your review and your positive feedback! In this answer, we will address the points you raised in your review, and we will update the paper according to all the reviews later during the discussion period.
Q1: The model may mispredict the majority of interesting regions (e.g., labeling a sofa as a chair or a car as a bus), rendering it unable to refine its predictions with neighboring pixels.
A1: Our method indeed relies on at least partial success of the refined semantic segmentation; it can’t and shouldn’t fix misclassifications but rather refines the probabilities to improve the pseudo-labeling. Thus, while the described failure mode is possible, we believe tackling it is outside of the scope of this work.
Q2: In the case of boundary regions, the neighboring pixels may exhibit similar confidence values (lack of confidence). In such instances, the reviewer considers that the proposed method may not perform effectively in these areas.
A2: The proposed method does not introduce improvements that tackle the problem of boundaries in particular; as such, the problem of low confidence in boundaries remains. Using the Boundary IoU [1] metric, we observe a degradation from 19.3 to 18.2 compared to FixMatch using 183 annotated images from PASCAL VOC 12. Importantly, our method capitalizes on the advantages of spatial continuity to enhance the performance as measured by the mIoU metric. We believe that the overall benefits our method brings to spatial coherence significantly outweigh the minor compromises in pseudo-label quality along boundaries.
Q3: Additionally, it would be beneficial to conduct another ablation study involving the propagation of pseudo-labels based on a k-NN (k-Nearest Neighbors) propagation algorithm with various pixel selection strategies, such as including all neighboring pixels (after filtering out those with low confidence) or other strategies.
A3: We appreciate your insightful observation. We are actively conducting experiments to explore the use of kNN with deep features or prediction probabilities to identify the optimal selection strategy for neighboring pixels. Since our current method chooses neighbors that enhance the confidence of each class, something we cannot do with kNN, the forthcoming results from our experiments may yield valuable insights, potentially uncovering new perspectives and avenues for improvement.
Q4: The proposed method is not working well for the evaluation set. What are the author's reasonable explanations for this?
A4: Could you clarify which evaluation set you have in mind and what you mean by “not working well”? S4MC consistently improves underlying baseline methods and shows state-of-the-art results in many settings. Indeed, for ’coarse’ PASCAL VOC 2012 val, S4MC results are worse than PCR, but this can be attributed to the significant advantage of PCR compared to methods used for S4MC and not the poor performance of S4MC itself.
Q5: The refinement process appears to be ineffective in the later training period, as indicated by Figure 4-b. What is the reason for this phenomenon?
A5: Since we utilize DPA during later training stages, the share of pixels used for pseudo-labeling increases until it achieves 100% at the end of training. This does not inherently improve classification accuracy; instead, it merely changes which examples make the cut for training. While there are fewer examples added by the refinement process in later stages, these examples, on average, carry more information since they were severely misclassified; it is hard to judge whether this is more or less efficient. Of course, when all pixels pass the threshold, refinement has no effect.
We will happily answer additional questions; if we have addressed your concerns, we hope you will reflect that in your final rating.
[1] Bowen Cheng, Ross Girshick, Piotr Dollár, Alexander C. Berg, Alexander Kirillov: Boundary IoU: Improving Object-Centric Image Segmentation Evaluation, CVPR 2021.
The authors have properly addressed my concerns; therefore, I'll keep my rating at 6, 'marginally above the acceptance threshold'.
Dear Reviewer,
We are happy that you are satisfied with our response.
Regards,
The authors
The paper tackles the problem of semi-supervised semantic segmentation by introducing the S4MC (Semi-Supervised Semantic Segmentation via Marginal Contextual Information) method, which enhances the use of pseudo-labels by considering the spatial correlation among neighboring pixels, rather than treating each pixel in isolation. The confidence-based pseudo-label refinement (PLR) module exploits neighboring pixels (3x3 grid) to adjust per-class predictions, whilst the Dynamic Partition Adjustment (DPA) module gradually lowers the threshold after each training iteration, increasing the number of propagated pseudo-labels (predictions on unlabeled data) without sacrificing quality. Extensive ablative studies justify the authors' design decisions and prove the effectiveness of the approach compared to other state-of-the-art SSL methods on popular benchmarks, such as PASCAL VOC 2012 and Cityscapes.
Strengths
Originality: The approach offers some degree of novelty - filtering low-confidence predictions by using the context around the pixel, rather than the pixel in isolation (current SOTA approaches). The contribution is relevant to an actual problem: it increases the use of unlabeled data.
Quality: The method is sound and thoroughly explained. Experiments prove the effectiveness of the approach when applied on top of state-of-the-art methods with a negligible added computational cost.
Clarity: The paper is an interesting read, well-structured, well-detailed, and very easy to understand (fairly enjoyed reading it).
Significance: The results offer marginal improvements only in some scenarios, compared to state-of-the-art methods.
Weaknesses
- I would suggest changing the main figure of the paper (the elements within the figure are way too small and hard to follow).
- The contribution is not groundbreaking. The +1.29 mIoU gain on PASCAL VOC 2012 and +1.01 mIoU improvement on Cityscapes declared at the beginning of the paper are not backed up by the numbers in the table (Table 1 and Table 3). The numbers in the tables show that the method is not robust enough to offer a consistent improvement in all tested scenarios.
- The biggest weakness of the paper is Section 4.3 (the ablation studies) and Tables 4 & 5. The text states that the experiments were conducted using the CutMix-Seg framework, but I could not find the numbers in the previous tables. Also in Table 5, the caption states that the numbers are for FixMatch. The text, the numbers, and the tables do not correspond, this part needs further clarification (or another check) because it confuses me the most.
- Low range of datasets, more experiments that include more varied and challenging scenarios to better understand the method's limitations.
Questions
- There are no insights as to why the best window for the used contextual information is in a 3x3 range - this actually suggests that the context is not used properly, or what is actually causing this degradation in performance when more neighboring pixels are used?
Thank you for your review! We are happy to know that you enjoyed reading the paper. We would like to answer some of the concerns and clarify the points raised in the weaknesses. We will update the paper according to all the reviews later during the discussion period.
Q1: I would suggest changing the main figure of the paper (the elements within the figure are way too small and hard to follow).
A1: We appreciate your feedback and have increased the size of the elements in the main figure for better clarity.
Q2: The contribution is not groundbreaking. The +1.29 mIoU gain on PASCAL VOC 2012 and +1.01 mIoU improvement on Cityscapes declared at the beginning of the paper are not backed up by the numbers in the table (Table 1 and Table 3). The numbers in the tables show that the method is not robust enough to offer a consistent improvement in all tested scenarios.
A2: We agree with this observation; however, we do not believe that being groundbreaking should be a criterion for acceptance. Our paper offers meaningful advancements by incorporating an auxiliary approach that enhances existing approaches to semi-supervised semantic segmentation. We appreciate you highlighting the typo regarding the mIoU gains on PASCAL VOC 2012; we will correct this and reaffirm that Table 3 reflects a +1.01 mIoU improvement on Cityscapes with our method.
The figure of +1.01 mIOU improvement on Cityscapes with 186 annotated images can be seen in Table 3 (75.99 mIoU for UniMatch and 77.0 for S4MC).
The figure of mIoU improvement on Pascal VOC 2012 with 366 annotated images, as shown in Table 1 (77.7 mIoU for UniMatch and 79.09 for S4MC), is +1.39; there was a typo in the abstract, and we thank you for pointing it out.
Q3: The biggest weakness of the paper is Section 4.3 (the ablation studies) and Tables 4 & 5. The text states that the experiments were conducted using the CutMix-Seg framework, but I could not find the numbers in the previous tables. Also, in Table 5, the caption states that the numbers are for FixMatch. The text, the numbers, and the tables do not correspond; this part needs further clarification (or another check) because it confuses me the most.
A3: We agree that this part may appear confusing, and we will improve the presentation in the paper. As reported in Section 4.3, ablations of Table 4 were performed on PASCAL VOC 2012 with 366 labeled examples (1/4) using CutMix-Seg+S4MC; the best-performing setting achieves 75.41 mIoU; the same number appears in Table 1. Table 5 ablations were performed on Pascal VOC 2012 with SBD examples and 5291 labeled examples; our result in Table 5 matches the corresponding result in Table 2. The vanilla FixMatch result (first row of Table 5) is indeed wrong; the correct number is 77.46 (as in Table 2), and we will update it. We will also update the captions of the referred table to avoid confusion.
Q4: Low range of datasets, more experiments that include more varied and challenging scenarios to better understand the method's limitations.
A4: We have extended our experiments to include the COCO dataset, reinforcing the robustness of our method with improvements in state-of-the-art comparisons across 4 out of 5 dataset partitions. Additional details regarding the performance variance will be provided in the revised manuscript. We would like to emphasize again that our method enhances existing SSL approaches. COCO is not an established benchmark for semi-supervised semantic segmentation due to its complexity: it has 80 classes over 118k images, and our approach achieves a gain of 1.5 mIoU using only 1/256 annotated data.
Q5: There are no insights as to why the best window for the used contextual information is in a 3x3 range - this actually suggests that the context is not used properly, or what is actually causing this degradation in performance when more neighboring pixels are used?
A5: Aggregating long-distance relations in a visual task is usually embedded into the architecture, such as dilated convolution and attention mechanism. Our approach tries to compensate mostly for the lack of spatial continuity in pseudo-labels. Since we resize PASCAL images to 321x321, looking at close neighbors is sufficient. It is likely that for higher resolutions, a larger neighborhood is required.
We will be happy to answer additional questions; if we have addressed your concerns, we hope you will reflect that in your final rating.
Dear reviewer, since the deadline for discussions is approaching, we would appreciate it if you could review our personal and general responses and let us know if you have additional questions. If we have properly addressed your concerns, we would appreciate it if you could reflect that in your score.
We thank the reviewers for their diligent assessment of our manuscript.
We appreciate your recognition of our contribution's novelty (xqtS) and your satisfaction with its clarity and comprehension (HMfh, P9tM).
We have tried to address the concerns and inquiries raised in this review. The valuable insights provided have played a pivotal role in enhancing the overall quality of our work. Specifically, new dataset experiments, experiment statistics, and insights about the quality of our method around the boundaries reflect our commitment to refining and advancing our research contributions.
We uploaded a modified version of our manuscript. Note that the table and figure numbering in the new paper version have changed due to the added table.
These modifications include:
- We evaluated S4MC on the more challenging COCO dataset, which comprises 80 object classes. We observed a modest yet consistent improvement, similar to the improvement on PASCAL and CityScapes. Notably, we gained 1.5 mIoU using only 1/256 annotated data (Table 4).
- We added noise variance over some experiments to show the statistical significance of our method (Table 1), and we conducted more experiments considering different backbones (Table A.1), showing consistent improvement to the prior art.
- We evaluated the influence of our method on boundaries using the boundary IoU metric and elaborated on our method's weaknesses related to that aspect (Table A.3)
- We added qualitative results of a model trained with S4MC on the COCO dataset with ½ of the labeled images, with Xception-65 as in Table 4, and visualizations in Figure H.2.
- We explored the alternative methodology of kNN to choose neighboring pixels for the refinement mechanism.
- Multiple improvements in clarity, extended descriptions, and larger fonts in Fig. 2.
We hope that these changes address reviewers’ concerns and will be happy to further clarify unclear parts. We would be grateful if reviewers whose concerns have been addressed consider revising their scores accordingly.
After the rebuttal, three reviewers still have negative comments. The major issues are: (1) the proposed model heavily relies on neighboring pixel predictions; (2) comparison with SOTA methods is difficult due to different experimental setups, backbones, and segmentation architectures; (3) the performance improvement is limited. After reading the comments, the AC cannot recommend accepting this paper and encourages the authors to take the comments into consideration for their future submission.
Why not a higher score
Please see the detailed comments
Why not a lower score
Please see the detailed comments
Reject