PaperHub
Overall rating: 5.5 / 10
Decision: Rejected · 4 reviewers
Ratings: min 5, max 6, std 0.5
Individual ratings: 5, 6, 5, 6
Confidence: 4.0
Correctness: 2.5
Contribution: 2.3
Presentation: 3.0
ICLR 2025

Seeking Flat Minima with Mean Teacher on Semi- and Weakly-Supervised Domain Generalization for Object Detection

OpenReview · PDF
Submitted: 2024-09-26 · Updated: 2025-02-05

Abstract

Keywords
object detection, domain generalization, semi-supervised learning, weakly-supervised learning

Reviews and Discussion

Review
Rating: 5

Object detectors struggle with domain gaps between training and testing data, but this issue can be mitigated using semi-supervised (SS-DGOD) and weakly-supervised domain generalizable object detection (WS-DGOD) approaches. These methods require labeled data from only one domain and utilize unlabeled or weakly-labeled data from multiple domains, reducing annotation costs. The authors demonstrate that the Mean Teacher learning framework, where a student network is trained using pseudo labels generated by a teacher network, effectively addresses both SS-DGOD and WS-DGOD settings. Additionally, the authors show that adding a simple regularization method to the Mean Teacher framework can lead to flatter minima in the parameter space, further improving detector performance across different domains.
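For readers unfamiliar with the framework, the sketch below illustrates the two ingredients this summary refers to: the student is trained on pseudo labels produced by the teacher, and the teacher is an exponential moving average (EMA) of the student. The detector interface, dictionary keys, and confidence threshold are hypothetical placeholders, not the authors' implementation.

```python
import torch

@torch.no_grad()
def ema_update(teacher, student, momentum=0.999):
    # Teacher weights track an exponential moving average of the student weights.
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(momentum).add_(s_p, alpha=1.0 - momentum)

def mean_teacher_step(student, teacher, labeled_batch, unlabeled_batch, score_thr=0.8):
    # Supervised detection loss on the single labeled source domain.
    sup_loss = student(labeled_batch["images"], labeled_batch["targets"])

    # Teacher predicts on the unlabeled (or weakly labeled) domains; only
    # confident detections are kept as pseudo labels for the student.
    with torch.no_grad():
        preds = teacher.predict(unlabeled_batch["images"])
    pseudo_labels = [
        [box for box in image_preds if box["score"] > score_thr]
        for image_preds in preds
    ]
    unsup_loss = student(unlabeled_batch["images"], pseudo_labels)

    loss = sup_loss + unsup_loss
    loss.backward()
    return loss
```

In this generic recipe, `ema_update` is applied after each optimizer step on the student so the teacher slowly tracks the student; the augmentation and pseudo-label choices specific to the paper differ from this sketch.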

Strengths

  1. This paper is well written and organized.

  2. The two settings SS-DGOD and WS-DGOD are reasonable.

  3. The performances are quite good.

Weaknesses

  1. Limited novelty: all of the adopted techniques, including Mean Teacher and the flat-minima interpretation, are existing techniques.

  2. The results seem strange: the proposed method outperforms DGOD, Oracle, and SOTA UDA-OD methods in Table 7. Can the authors provide more explanation? The authors could provide statistical significance tests for the performance differences and discuss potential reasons why their method outperforms DGOD and Oracle, which should theoretically have an advantage.

Questions

Please see Weaknesses.

Comment

Limited novelty

Our response to this point is given in our reply to Reviewer LbNL. We kindly ask Reviewer Poom to refer to that section.

The authors could provide statistical significance tests for the performance differences and discuss potential reasons why their method outperforms DGOD and Oracle

Because it takes nearly one day with four A100 GPUs to train the detector, it is not feasible to perform a statistically significant number of training trials. Instead, in this rebuttal, we trained the models with three different random seeds, as shown in the table below. Note that we fixed the random seed to seed1 throughout the previous experiments in our paper. GaussianFRCNN+EMA+PL+Regul. slightly outperforms DGOD and Oracle across all random seeds.

mAP on watercolor

| setting  | method                            | seed1 | seed2 | seed3 |
|----------|-----------------------------------|-------|-------|-------|
| WS-DGOD  | Gaussian FasterRCNN+EMA+PL+Regul. | 62.9  | 59.9  | 61.7  |
| DGOD     | Gaussian FasterRCNN               | 62.6  | 55.6  | 58.7  |
| Oracle   | Gaussian FasterRCNN               | 62.2  | 59.5  | 60.4  |

The potential reason is that DGOD and Oracle were trapped in sharp local minima due to simple supervised learning (i.e., ERM), while GaussianFRCNN+EMA+PL+Regul. reached flat minima. It has been shown that even when both the train and test sets are from the same domain, there is a slight shift between the train loss and test loss, and falling into a sharp valley can decrease performance [A]. Fig. 9 in Appendix A.5 of the revised manuscript shows a flatness comparison similar to that in Sec. 7.5. We observe that GaussianFRCNN+EMA+PL+Regul. on WS-DGOD reaches a flatter solution than DGOD and Oracle.

[A] Averaging Weights Leads to Wider Optima and Better Generalization, Izmailov et al., UAI, 2018.
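As an aside on how such a flatness comparison can be carried out, the generic recipe below perturbs the trained weights with Gaussian noise of increasing magnitude and re-evaluates the loss; a curve that grows slowly with the noise scale indicates a flatter minimum. This is only a rough sketch under that assumption, not the exact protocol behind Fig. 9 or Sec. 7.5, and `loss_fn(model, batch)` is a hypothetical interface.

```python
import copy
import torch

@torch.no_grad()
def flatness_profile(model, loss_fn, data_loader,
                     radii=(0.0, 0.005, 0.01, 0.02), n_samples=3):
    # Average loss after adding isotropic Gaussian noise of scale r to every weight.
    base_state = copy.deepcopy(model.state_dict())
    profile = {}
    for r in radii:
        trials = []
        for _ in range(n_samples if r > 0 else 1):
            model.load_state_dict(base_state)
            for p in model.parameters():
                p.add_(torch.randn_like(p) * r)
            trials.append(sum(loss_fn(model, batch).item() for batch in data_loader))
        profile[r] = sum(trials) / len(trials)
    model.load_state_dict(base_state)  # restore the original weights
    return profile
```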

Comment

Dear Reviewer poom,

We would like to gently remind you that there are approximately two days remaining for the discussion period. If possible, could you kindly confirm that you have reviewed the rebuttal and let us know if there are any remaining concerns regarding our work? We truly appreciate your time and effort in reviewing our paper and providing valuable feedback.

Review
Rating: 6

This paper delves into Semi-Supervised Domain Generalizable Object Detection and Weakly-Supervised Domain Generalizable Object Detection. It addresses these two tasks using the Mean Teacher framework and provides interpretations from the perspective of flat minima. Based on this interpretation, the paper introduces two tricks to achieve flatter minima: applying weak data augmentation to the student model during Mean Teacher training, and generating pseudo labels for the object detector without post-processing. Experiments validate the effectiveness of the proposed method.

Strengths

  • This paper is well motivated.
  • Theoretical analysis is provided.
  • The proposed method is simple yet effective.
  • The supplementary material is sufficient and abundant.

Weaknesses

  • Related work on Semi-Supervised Domain Generalization (SSDG): the previous SSDG works listed in Section 3.3 assume that there are labeled and unlabeled data in each training domain, whereas this paper assumes that there is only one labeled training domain while the other training domains are entirely unlabeled (or weakly labeled). The SSDG setting of this paper is more similar to [1].

[1] Semi-Supervised Domain Generalization with Evolving Intermediate Domain. PR 2023.

  • Interpretations of model generalization from the perspective of flat minima have appeared in many previous works, such as [2,3], to name a few.

[2] SWAD: Domain Generalization by Seeking Flat Minima. NeurIPS 2021.

[3] Exploring Flat Minima for Domain Generalization with Large Learning Rates. 2023.

  • Although the interpretation of Mean Teacher from the perspective of flat minima is inspiring, the proposed method and the corresponding two tricks somewhat lack technical novelty. As mentioned above, there have been many previous works providing tricks to improve generalization by seeking flat minima. Have you tried other existing techniques for finding flat minima, such as Sharpness-Aware Minimization (SAM) regularization?
  • Although Table 2 validates the effectiveness of the proposed method, have you tried any other semi-supervised or weakly-supervised learning methods in the proposed two settings, SS-DGOD and WS-DGOD, so as to set up more baselines and highlight the superiority of the proposed method?

Questions

See the weaknesses.

Comment

The SSDG setting of this paper is more similar to [1]

We agree with this comment. As discussed in Appendix D.1, there are two types of settings in semi-supervised domain generalization. The first setting assumes that only a portion of the samples in each domain are labeled, similar to the previous works listed in Section 3.3. The second setting assumes that only a portion of the source domains are labeled, as in [1]. In this paper, we followed the previous SS-DGOD work (i.e., CDDMSL) and tackled the second setting. We have added the citation of [1] and clarified this point in Sec. 2 of the revised manuscript. Tackling the first setting is part of our future work.

Interpretations of model generalization from the perspective of flat minima have appeared in many previous works, such as [2,3], to name a few.

As pointed out, the flat minima theory, which explains model generalization from the perspective of flat minima, has appeared in previous works. However, beyond the flat minima theory itself, we provide the novel finding that Mean Teacher learning leads to flat minima, together with an interpretation of the reasoning behind it; these constitute our main contribution. On the basis of the flat minima theory and our interpretations, we explain why detectors trained with the Mean Teacher learning framework achieve robustness to unseen test domains.

Have you tried other existing techniques for finding flat minima, such as Sharpness-Aware Minimization (SAM) regularization?

In this rebuttal, we conducted experiments to compare with SAM. The results are shown in Table 12 in Appendix A.9 of the revised manuscript. Comparing GaussianFRCNN+EMA+PL+SAM and GaussianFRCNN+EMA+PL+Regul., our regularization outperforms SAM. In addition, since SAM is an optimizer that can be used instead of SGD, it is compatible with our regularization. We can see that GaussianFRCNN+EMA+PL+Regul.+SAM achieves the best performance.
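For context, SAM performs a two-step update: it first ascends to an approximate worst-case point within a small L2 ball around the current weights, then applies the base optimizer using the gradient computed there. The sketch below is a generic illustration of that procedure, not the configuration used for Table 12, and assumes a hypothetical `loss_fn(model, batch)` interface.

```python
import torch

def sam_step(model, base_optimizer, loss_fn, batch, rho=0.05):
    # First forward/backward pass: gradient at the current weights.
    loss_fn(model, batch).backward()

    # Ascend to the approximate worst-case point within an L2 ball of radius rho.
    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm(p=2) for p in params]), p=2) + 1e-12
    perturbations = []
    with torch.no_grad():
        for p in params:
            e = p.grad * (rho / grad_norm)
            p.add_(e)
            perturbations.append(e)

    # Second pass: the gradient at the perturbed weights is the SAM gradient.
    base_optimizer.zero_grad()
    loss_fn(model, batch).backward()

    # Undo the perturbation, then step with the base optimizer (e.g., SGD).
    with torch.no_grad():
        for p, e in zip(params, perturbations):
            p.sub_(e)
    base_optimizer.step()
    base_optimizer.zero_grad()
```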

Have you tried any other semi-supervised or weakly-supervised learning methods in the proposed two settings, SS-DGOD and WS-DGOD?

Although there are previous semi-supervised domain generalization methods proposed for image classification, extending them to object detection is non-trivial and they cannot be directly applied to SS-DGOD. Therefore, we compared our method with CDDMSL in our experiments, which is the previous work specifically tailored for SS-DGOD.

Regarding weakly-supervised domain generalization, there are no previous methods, even for the image classification task, because weak labels cannot be defined in image classification.

Comment

Dear Reviewer gcu5,

We would like to gently remind you that there are approximately two days remaining for the discussion period. If possible, could you kindly confirm that you have reviewed the rebuttal and let us know if there are any remaining concerns regarding our work? We truly appreciate your time and effort in reviewing our paper and providing valuable feedback.

Comment

I have read the rebuttal, and tend to keep my original score.

Comment

Thank you very much for your comment. If you have any remaining concerns, we would greatly appreciate it if you could kindly share them with us. We would be delighted to engage in further discussion.

Review
Rating: 5

The paper focuses on domain generalization for object detection, where the domain gap between the training and testing domains is large. The authors introduce two new setups: semi-supervised domain generalizable object detection (SS-DGOD) and weakly-supervised DGOD (WS-DGOD), where labeled data from one domain and either unlabeled or weakly-labeled data from multiple domains are used for training. The authors suggest a Mean Teacher learning framework with a student-teacher model that uses pseudo-labels to find flat minima in the parameter space, which helps improve the model's ability to generalize. The proposed method shows better object detection performance on unseen domains through a simple regularization technique that creates flatter minima.

Strengths

  1. The paper introduces two new settings for domain generalizable object detection: semi-supervised DGOD (SS-DGOD) and weakly-supervised DGOD (WS-DGOD), which have not been actively explored before. These settings are novel approaches to domain generalization, which is not commonly applied in object detection research.

  2. The paper also provides theoretical reasoning for why the Mean Teacher learning framework works well in SS-DGOD and WS-DGOD settings, particularly in terms of finding flat minima.

  3. Experiments are conducted to validate the effectiveness of the proposed method, showing improved robustness across various domains.

Weaknesses

The major concern with this paper is its incremental novelty. The concept of domain-generalizable object detection is interesting, but the proposed method lacks significant novelty. The main idea of regularizing the two networks to produce more similar outputs, also known as consistency regularization, is a widely used framework in mean teacher-student approaches (e.g., FixMatch). Although the authors slightly modify the pipeline by using weakly augmented inputs for both networks, the reviewer cannot find any substantial technical innovation in the proposed method.

Questions

  1. What is the difference between recent mean-teacher based semi-supervised methods and the proposed method?

Comment

The proposed method lacks significant novelty

As written in Sec. 1, it is noteworthy that our aim is not to propose an entirely new method or surpass the state-of-the-art methods. Instead, our contributions are summarized as follows:

  • We show that object detectors can be effectively trained on the SS-DGOD and WS-DGOD settings with the same Mean Teacher learning framework.
  • We provide interpretations of why the detectors trained with the Mean Teacher learning framework achieve robustness to unseen test domains in terms of the flatness of minima in parameter space, based on our novel finding that the Mean Teacher learning framework leads to flat minima.
  • On the basis of the interpretations, we introduce a simple regularization method into the Mean Teacher learning framework to achieve flatter minima.
  • We are the first to tackle the WS-DGOD setting.

As Reviewer LbNL pointed out, regularizing the two networks to produce more similar outputs is a widely used framework. However, we have revealed why this framework works well and improves performance based on our novel interpretations, which is our main contribution. Since this framework is widely used, our novel interpretations are likely to have a broad impact.

As Reviewer poom pointed out, the Mean Teacher framework and the flat minima theory (the relationship between model generalization ability and the flatness of the solution in the loss landscape) are each already established concepts. In contrast, our main contribution lies in the novel finding that the Mean Teacher framework leads to flat minima and the interpretation of the reasoning behind it. On the basis of the flat minima theory and our interpretations, we explain why detectors trained with the Mean Teacher learning framework achieve robustness to unseen test domains. Because Mean Teacher has been used across various tasks, our novel interpretation is likely to have a broad impact across a wide range of tasks.

What is the difference between recent mean-teacher based semi-supervised methods and the proposed method?

Although the technical novelty of the regularization is not our main contribution, there are technical differences from recent Mean Teacher-based methods. For example, unlike Harmonious Teacher [Deng, CVPR 2023], which regularizes the consistency between the classification and localization scores, our regularization encourages consistency between the raw outputs from the teacher and student. Unlike SSDA-YOLO [Zhou, CVIU 2023], we use the raw output without post-processing to produce more similar outputs. More importantly, our contribution lies in providing a new interpretation of why such consistency enhances robustness to unseen domains.
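To make the distinction concrete, the snippet below shows one plausible form of consistency on raw detector outputs, taken before NMS or score thresholding: a KL term on per-proposal class distributions plus a smooth-L1 term on box regressions. The output keys and the particular loss choices are illustrative assumptions, not the exact regularization used in the paper.

```python
import torch.nn.functional as F

def raw_output_consistency(student_out, teacher_out):
    # Consistency on raw outputs, i.e., before NMS or confidence thresholding.
    # `student_out` / `teacher_out` are assumed to hold per-proposal class logits
    # and box regression deltas (hypothetical keys).
    cls_loss = F.kl_div(
        F.log_softmax(student_out["cls_logits"], dim=-1),
        F.softmax(teacher_out["cls_logits"], dim=-1),
        reduction="batchmean",
    )
    box_loss = F.smooth_l1_loss(student_out["box_deltas"], teacher_out["box_deltas"])
    return cls_loss + box_loss
```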

Comment

Dear Reviewer LbNL,

We would like to gently remind you that there are approximately two days remaining for the discussion period. If possible, could you kindly confirm that you have reviewed the rebuttal and let us know if there are any remaining concerns regarding our work? We truly appreciate your time and effort in reviewing our paper and providing valuable feedback.

Comment

Thank you for your response. Despite the clarifications and additional explanations provided by the authors, I still believe that the major issues with the paper have not been adequately addressed, and therefore, I am maintaining my original score of 5 at this point. Below, I provide a detailed explanation of my reasoning.

  • Lack of Technical Novelty: The authors have emphasized that their goal is not to introduce a new method or surpass state-of-the-art results but rather to offer an interpretation of the effectiveness of the Mean Teacher learning framework in SS-DGOD and WS-DGOD settings. However, I remain unconvinced that this interpretation provides a sufficiently novel contribution. The conclusion that "finding flat minima is advantageous for domain generalization" is already well-documented in the literature, and the connection between flat minima and generalization is well-established. While I understand the authors' intent to explore why Mean Teacher leads to flat minima, I am skeptical about whether this provides substantial new insight beyond the existing theoretical framework. The contribution seems largely limited to showing that this known effect also holds in the specific context of SS-DGOD and WS-DGOD, which may not be broadly impactful.

  • Limited Differentiation from Existing Methods: The authors have explained the technical differences between their approach and existing Mean Teacher-based methods, such as Harmonious Teacher and SSDA-YOLO. However, I still find these differences to be incremental rather than novel. For example, the modification of "omitting post-processing and using raw outputs" is not convincingly shown to be a decisive factor in improving performance. The proposed regularization strategy does not seem significantly distinct from existing consistency strategies to be considered an innovative advancement. I acknowledge that the authors argue that their primary contribution is the interpretation rather than the technique, but this makes the technical novelty less compelling as an independent contribution.

If there are any key points I may have misunderstood or misrepresented about your paper or your argument, please feel free to let me know.

Comment

Dear Reviewer LbNL,

Thank you very much for sharing your remaining concerns. We would like to address them below.

Lack of technical novelty

As Reviewer LbNL pointed out,

The conclusion that "finding flat minima is advantageous for domain generalization" is already well-documented in the literature, and the connection between flat minima and generalization is well-established.

However, how to find flat minima remains an area for further exploration. Although there are existing works, such as sophisticated weight averaging methods like SWAD and optimization techniques like SAM, we demonstrated that the Mean Teacher framework can be another viable approach for finding flat minima. Additionally, in Table 12, we showed that the Mean Teacher framework and our regularization are compatible with SAM because they are different types of approaches for achieving flat minima. Therefore, we believe there is substantial new insight in our work that extends beyond existing methods.

The contribution seems largely limited to showing that this known effect also holds in the specific context of SS-DGOD and WS-DGOD, which may not be broadly impactful.

Regarding this point, in Table 2, we showed that the simple regularization based on our interpretation works well even in UDA-OD, which is a popular task setting. Therefore, we believe that our interpretation has a broad impact.

Limited Differentiation from Existing Methods

the modification of "omitting post-processing and using raw outputs" is not convincingly shown to be a decisive factor in improving performance.

We disagree with this point. In Table 4, we showed that omitting post-processing and using raw outputs significantly improves performance. Although the technical difference may not be large, this modification, which is based on our interpretation, is key to achieving better performance.

Review
Rating: 6

This paper focuses on how to generalize object detectors from a source domain to a target domain when labels for the target domain are unavailable. Two settings, semi-supervised and weakly-supervised domain generalizable object detection (SS-DGOD and WS-DGOD), are defined. Based on these settings, the authors propose a Mean Teacher-based framework and provide a theoretical interpretation of how it works. A regularization term is added to the framework based on this interpretation, which improves performance in both SS-DGOD and WS-DGOD.

Strengths

  1. This paper is well-written and easy to understand.
  2. Theoretical interpretation is given to help readers better understand the Mean Teacher framework and the proposed approach.
  3. The proposed approach with the regularization term outperforms the SOTA and the baseline method. Ablation studies are sufficient.

Weaknesses

  1. The baseline method under SS-DGOD significantly outperforms the SOTA method CDDMSL (in Table 2). For “watercolor” and “clipart”, the margins are larger than 10% mAP50. It seems that “EMA” and “PL” play a vital role in the proposed framework, which, however, are common tricks in semi-supervised settings. Is it possible to compare the proposed method with “CDDMSL+EMA+PL”, or even “CDDMSL+EMA+PL+Regul.”? If so, the comparison would be more convincing, and the generalization ability of the proposed approach across different detection frameworks would also be validated.
  2. In Figure 4, both the input and output of the framework are modified to ensure that the teacher and student produce similar predictions. Does this modification increase the risk of network collapse, making the model more difficult or slower to train?

Questions

I think the quality of this paper is good, and I would be glad to see the authors' response to the weaknesses above.

Comment

Comparison with CDDMSL+EMA+PL

Thank you for the insightful comment. However, integrating the Mean Teacher framework into CDDMSL to train "CDDMSL+EMA+PL" is challenging due to CDDMSL's already complex network structure. It includes a detection backbone, a RegionCLIP backbone, a ClipCap-based vision-to-language module, and a detection head with an RPN, along with four types of loss functions: detection loss (classification and regression for the detection head and RPN), distillation loss via the v2l module, and instance- and image-level contrastive losses through the projection head and v2l module. Adding the Mean Teacher framework requires duplicating parts of these networks, at least the detection backbone and head, and deciding which parts to duplicate is not straightforward. Additionally, we need to introduce an unsupervised loss and fine-tune the hyperparameters for Mean Teacher. This complexity makes integrating the Mean Teacher framework into CDDMSL difficult.

It is important to reiterate that our goal is not to propose a new method that outperforms state-of-the-art methods. Therefore, instead of comparing with CDDMSL+EMA+PL, we validated the generalization ability of the regularization by conducting experiments with another detector that has a significantly different network design. In these experiments, we used a Transformer-based backbone (Swin-T) with the feature pyramid network, although the detection head was not changed. The results are shown in Table 11 in Appendix A.8 of the revised manuscript. We can see that the regularization improves the performance, which validates the generalization ability.

Does the regularization make it difficult or slow to train the model?

We have added Fig. 8 in Appendix A.4, which shows the mAP on the validation set during training. The regularization does not make the training process more difficult or slower. On the contrary, the regularization helps alleviate overfitting (i.e., less decrease in the validation mAP), stabilizing the training.

Comment

Dear Reviewer ePD2,

We would like to gently remind you that there are approximately two days remaining for the discussion period. If possible, could you kindly confirm that you have reviewed the rebuttal and let us know if there are any remaining concerns regarding our work? We truly appreciate your time and effort in reviewing our paper and providing valuable feedback.

Comment

Thank you for your response. The reply regarding Q2 is satisfactory, and I appreciate the authors' efforts to conduct additional experiments for Q2 within the tight rebuttal timeline. As claimed by the authors, if the goal of this paper is not to surpass the SOTA, then the proposed method needs more validation of its generalization capabilities. The experiment conducted during the rebuttal phase, where the authors replaced the detector/backbone and compared the proposed method with the baseline, is highly significant.

However, from the experimental results, the performance improvement after replacing the detector/backbone is relatively limited (+0.4 mAP50), which is much smaller than the gains reported in Table 2.

In summary, while I think the overall quality of the paper is commendable, the focus on methodology rather than achieving SOTA performance means the potential for generalization across detectors/backbones remains to be fully demonstrated. Therefore, I will maintain my original score.

Comment

We would like to thank Reviewer ePD2 for the constructive comment. The remaining concern of Reviewer ePD2, regarding the limited improvement with the Swin-T-based detector, is due to the fact that the hyperparameters were not tuned. For the experiments in Table 11, we used the same hyperparameters as the ResNet101-based detector and did not tune them at all. As a result, the performance of the baseline detector was inherently limited, which led to the limited improvement from the regularization.

To validate this claim, we conducted additional experiments with tuned hyperparameters. Specifically, we increased the batch size from 16 to 32, as we found that this change improves the baseline performance. Additionally, we used dropout instead of drop path in the SwinTransformer blocks to stabilize the training. The results are shown below. We can see that this hyperparameter change significantly improved the baseline performance (53.0 -> 53.9), and consequently, the regularization further boosted the performance (53.9 -> 55.0). This (+1.1 mAP50) improvement demonstrates the generalization of the regularization across different types of detectors. Although this improvement is a little smaller than that in Table 2 (+1.6 mAP50), we believe that further hyperparameter tuning will lead to even greater performance gains, as we only adjusted two hyperparameters in this experiment.

Before hyperparameter-tuning (Table 11)

| setting | method                           | backbone   | mAP50 on watercolor |
|---------|----------------------------------|------------|---------------------|
| SS-DGOD | GaussianFasterRCNN+EMA+PL        | Swin-T+FPN | 53.0                |
| SS-DGOD | GaussianFasterRCNN+EMA+PL+Regul. | Swin-T+FPN | 53.4                |

After hyperparameter-tuning

| setting | method                           | backbone   | mAP50 on watercolor |
|---------|----------------------------------|------------|---------------------|
| SS-DGOD | GaussianFasterRCNN+EMA+PL        | Swin-T+FPN | 53.9                |
| SS-DGOD | GaussianFasterRCNN+EMA+PL+Regul. | Swin-T+FPN | 55.0                |

Comment

We would like to thank all the reviewers for their valuable comments and efforts in reviewing this paper. We have revised the manuscript PDF and posted our response to each reviewer's comment. The revised points are highlighted in blue text. We appreciate the careful consideration of our responses and would be grateful if you could review them. If any aspects remain unclear or require additional discussion, we would be glad to engage in further dialogue.

Comment

Dear Reviewers,

We would like to kindly remind you that there is approximately one day left for Authors to update the PDF. Could you please inform us if there are any remaining concerns regarding our work?

We sincerely thank you for your time and effort in reviewing our paper.

Authors

AC Meta-Review

Summary of the paper: This paper considers domain generalization in object detection, where the goal is to generalize object detectors from a source domain to a target domain without using target domain labels. Two new settings are introduced: Semi-Supervised Domain Generalizable Object Detection (SS-DGOD) and Weakly-Supervised Domain Generalizable Object Detection (WS-DGOD), which utilize labeled data from one domain and either unlabeled or weakly-labeled data from multiple domains. The key idea is to leverage a Mean Teacher framework, where a student network is optimized using pseudo-labels generated by a teacher network, and incorporate a theoretical interpretation highlighting the role of flat minima in enhancing generalization. To further improve performance, a regularization term promotes flatter minima. Experimental results demonstrate the effectiveness of the proposed approach, achieving robust domain generalization while reducing annotation costs.

Strengths: The reviewers highlight the major strengths of the paper as 1) theoretical reasoning that strengthens the methodology; 2) strong experimental results demonstrating that the proposed approach consistently outperforms state-of-the-art methods and baselines across various domains; 3) a well-written and well-motivated presentation.

Weaknesses and missing points in the submission: There is a general consensus on the weaknesses and limitations of the paper among the reviewers, in particular limited technical novelty, unclear motivation, and insufficient experimental justification of the proposed method. While the settings of SS-DGOD and WS-DGOD are novel, the methodology primarily relies on existing techniques, such as the Mean Teacher framework and consistency regularization, which are widely used in semi-supervised learning (e.g., FixMatch). The added contributions, including weak augmentation and regularization for flat minima, are perceived as minor modifications that lack substantial technical innovation. Furthermore, the interpretation of model generalization through flat minima aligns with prior works such as SWAD and SAM, raising questions about the originality of the insights provided. Concerns were also raised regarding experimental validation, including the need for comparisons with stronger baselines and other semi-supervised or weakly-supervised learning methods in the proposed SS-DGOD and WS-DGOD settings, and a more in-depth analysis of the results, such as the potential risk of network collapse.

The discussion phase featured comprehensive exchanges between the authors and reviewers, with a particular focus on clarifying the contributions and providing additional evaluations. However, the reviewers remained unconvinced, and the concerns they raised were ultimately not resolved.

Additional Comments from the Reviewer Discussion

The reviewers maintained their initial assessment, citing weak experimental results and a lack of novelty in the proposed work. By the end of the discussion phase, the reviewers provided borderline ratings, with Reviewers Poom and LbNL remaining unconvinced.

In particular, during the rebuttal phase, the authors clarified that the primary goal of the paper was to provide an interpretation of the effectiveness of the Mean Teacher learning framework in SS-DGOD and WS-DGOD settings. However, reviewers (such as Reviewers Poom and LbNL) found this interpretation insufficiently novel, as many aspects of this idea have already been widely recognized and well-established in the literature. As a result, the work offers limited new insights and has a limited broad impact on the field. The proposed regularization strategy does not appear to be substantially distinct from existing consistency-based strategies to be considered an innovative advancement. Given this limited technical contribution, another way to strengthen the paper would be to demonstrate the broad applicability of the proposed strategy. Unfortunately, while the rebuttal provides some additional information, the argument for its general applicability remains underdeveloped and not compelling enough to address the concerns raised.

Given the reviewers’ comments and recommendation, the area chairs weigh the weaknesses raised by the reviewers over the current merits. The area chairs think the paper has great potential and implications for future research. By addressing the remaining concerns, particularly those related to weak experimental results and broad applicability, the paper would improve a lot in the next cycle.

Final Decision

Reject