RouGE: Learning Gated Experts for Segment Anything in the Wild
Abstract
We propose RouGE, a novel plug-in module that uses a parameter-efficient method to directly enhance the robustness of pre-trained SAM-based models against real-world degradation, enabling an all-in-one segmentation model.
Reviews and Discussion
This paper introduces RouGE, a Mixture-of-Experts (MoE) structure designed to enhance the robustness of SAM-based models in handling diverse degraded images, such as low-light, rainy, or blurred conditions. RouGE employs multiple probability gates to assess and adjust features as needed, using low-rank experts to handle different conditions without impairing SAM’s original capabilities. Experiments show that RouGE achieves state-of-the-art performance on both degraded and clean images while tuning only 1.5% of parameters.
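To make the summarized design concrete, below is a minimal sketch of how such a gated low-rank expert layer could be implemented in PyTorch. The class names, the sigmoid gates, the expert count, and the rank are illustrative assumptions for exposition, not the authors' exact architecture.

```python
# Hypothetical sketch of a gated low-rank expert layer; names and
# dimensions are illustrative assumptions, not the paper's design.
import torch
import torch.nn as nn

class LowRankExpert(nn.Module):
    """A LoRA-style expert: down-project, activate, up-project."""
    def __init__(self, dim: int, rank: int = 8):
        super().__init__()
        self.down = nn.Linear(dim, rank)
        self.up = nn.Linear(rank, dim)
        nn.init.zeros_(self.up.weight)  # start as an identity residual
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return self.up(torch.relu(self.down(x)))

class GatedExperts(nn.Module):
    """Probability gates weight each expert's low-rank correction."""
    def __init__(self, dim: int, num_experts: int = 4, rank: int = 8):
        super().__init__()
        self.experts = nn.ModuleList(
            LowRankExpert(dim, rank) for _ in range(num_experts))
        # One scalar gate per expert, conditioned on pooled features.
        self.gates = nn.Linear(dim, num_experts)

    def forward(self, x):  # x: (batch, tokens, dim), frozen SAM features
        probs = torch.sigmoid(self.gates(x.mean(dim=1)))  # (batch, E)
        out = x
        for i, expert in enumerate(self.experts):
            # Add each expert's correction, scaled by its gate probability.
            out = out + probs[:, i, None, None] * expert(x)
        return out
```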
Strengths
- Using a Mixture-of-Experts (MoE) to efficiently enhance the robustness of the foundation model is a sensible idea.
- The qualitative experiments in both the main text and the appendix are comprehensive, clearly demonstrating the performance improvements of the RouGE model.
Weaknesses
- Regarding Figures 2 and 3 in the motivation, does the qualitative and quantitative performance degradation of the SAM model occur only on synthesized degraded images? What would the results be if tested on degraded images from the real world? Can the SAM foundation model achieve good results on real-world degraded images?
- Following up on the first question, would it be possible to add all synthesized degraded images to the SAM foundation model's pretraining dataset to directly improve its performance on degraded images?
- During training, RouGE uses synthetic image pairs of degraded and clean images. Are the degraded images sourced from the real world or generated? Additionally, how does RouGE perform when real-world degraded images are used during testing?
- The design of the probability gates lacks novelty. This type of gate, which relies on image features for implicit decision-making, can be applied to various tasks and is not specifically designed for handling different types of degraded images.
- There is a lack of ablation studies, such as on the number of lazy experts, the number of trainable experts, the adapter design, and the choice of RouGE layer placement, among others.
Questions
Please refer to the weaknesses.
The paper introduces RouGE, a gated adapter module to enhance the robustness of SAM-based models for degraded images. By employing lightweight probability gates and low-rank experts, RouGE adapts to various degradation types with minimal parameter tuning.
Strengths
- Relevant Motivation: Addressing robustness in SAM-based segmentation models for real-world degraded conditions is a timely and important topic.
- Parameter Efficiency: The use of gated experts and low-rank modules offers an efficient approach to improving robustness with minimal parameter tuning.
- Broad Evaluation: The paper evaluates RouGE on multiple types of degradation and compares it with various parameter-efficient tuning methods, demonstrating its competitive performance.
Weaknesses
- Unclear Feature Definition: The image feature is not clearly defined, leaving ambiguity about its source and role. Please provide a detailed description of its generation process.
- Computational Cost of CLIP: If CLIP is used for feature extraction, the computational load could be significant. A full breakdown of pipeline steps is needed to assess efficiency.
- Unspecified Detection Model: The object detection model used for bounding boxes is not specified, leaving reproducibility issues and potential variability in results.
- Ambiguity in Baseline Comparisons: The paper uses a sequential adapter integration, whereas AdaptFormer has shown that parallel adapter configurations can yield better performance (see the sketch after this list). Without testing RouGE with a parallel MoE adapter setup, the paper does not convincingly demonstrate that its adapter design is optimal.
- Missing Adapter Details: Critical details like the adapter's bottleneck dimension are omitted, limiting the method's replicability.
- Incomplete Ablation Studies: The paper lacks comprehensive ablation studies for critical hyper-parameters, such as the number of experts and adapter bottleneck dimensions. Including these would strengthen the evaluation and provide insights into the effects of architectural choices on performance.
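To illustrate the fourth weakness above, here is a hypothetical contrast of the two adapter placements inside a transformer block (simplified PyTorch; not the paper's actual code):

```python
# Illustrative contrast between sequential and parallel adapter
# placement around a frozen MLP branch (names are assumptions).
import torch.nn as nn

def sequential_adapter(x, frozen_mlp: nn.Module, adapter: nn.Module):
    # Sequential: the adapter refines the frozen branch's output.
    return x + adapter(frozen_mlp(x))

def parallel_adapter(x, frozen_mlp: nn.Module, adapter: nn.Module):
    # Parallel (AdaptFormer-style): the adapter reads the same input,
    # and its correction is summed alongside the frozen branch.
    return x + frozen_mlp(x) + adapter(x)
```

AdaptFormer's reported finding is that the parallel form often adapts frozen backbones better, which is why testing RouGE in that configuration would strengthen the comparison.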
Questions
Why does the proposed method achieve a lower number of trainable parameters compared to previous methods, despite using a MoE structure? Could the authors provide a quantitative analysis to clarify this?
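For intuition on how an MoE built from low-rank experts can still be parameter-light, here is a back-of-envelope count under assumed sizes (all numbers are hypothetical, not taken from the paper):

```python
# Rough parameter count for one gated low-rank layer, assuming
# hidden dim d = 768, rank r = 8, and E = 4 experts (illustrative).
d, r, E = 768, 8, 4
expert_params = 2 * d * r + r + d   # down + up projections, with biases
gate_params = d * E + E             # one linear gate over E experts
layer_total = E * expert_params + gate_params
print(layer_total)  # ~55k per layer, vs ~4.7M for the block's full MLP
```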
This paper adapts the SAM-based model for degraded images. The main contribution is on the RouGE layer which has a MoE-like design. The suggested RouGE layer improves the feature modeling by activating the necessary gates. The final model improves the baseline model on different benchmarks.
Strengths
The proposed method can improve the baseline performance and show superior performance compared to other adapt/prompting methods.
Weaknesses
While the improved performance looks promising, the reviewer has several concerns:
- There are many typos in the current manuscript, and consistency could be improved. For example, the reviewer thinks that in Figure 1 it should be "Mask" instead of "Musk".
- Again in Figure 1, the clean image is the same as the rainy image, while the blurry and low-light ones have different content. This makes the figure hard to understand.
Some more major concerns:
- It seems that the proposed RouGE has all experts activated, which loses the inference-efficiency advantage of conventional sparse MoE (see the sketch after this list). Therefore, the current method looks more like a combination of different gating functions with a weighted sum.
- It is unclear which experts are activated for which kind of input and how the weighting function varies. In other words, the paper lacks in-depth analysis.
- The proposed method is only compared against a single baseline method, so it is unclear whether it generalizes well across SAM-based works. In addition, the authors replace the proposed adapting/prompting method with other counterparts. While this is a so-called fair comparison, it would be better to at least report the variance of each method; it might be that the authors take the best performance of the proposed method and compare it with a lower performance of the counterparts.
- From Table 1, it can be seen that LN alone can already hugely improve the baseline performance with only 10% of the learnable parameters of the RouGE block. Therefore, the contribution of the RouGE block becomes less convincing, since adding more LN might achieve similar performance. Moreover, the proposed RouGE in fact looks very much like a concatenation of LN with some dimension changes, so the novelty is limited.
- The reviewer is not sure whether the claim on real-world degradation is valid, since most of the datasets contain only synthetic degradation.
- There are many more advanced MoE works for image restoration, as surveyed in [1], but these are not discussed. The reviewer thinks that the MoE design in this work looks naive compared to these counterparts.
[1] A Survey on All-in-One Image Restoration: Taxonomy, Evaluation and Future Trends
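To make the dense-versus-sparse distinction from the first major concern concrete, here is a small sketch (shapes, names, and the sigmoid/top-k choices are illustrative assumptions; x is a single feature vector, experts a list of modules, gate an nn.Linear):

```python
import torch

def dense_gating(x, experts, gate):
    # All experts execute on every input; the gates only re-weight
    # their outputs, so no inference-time compute is saved.
    probs = torch.sigmoid(gate(x))                     # shape (E,)
    return sum(p * expert(x) for p, expert in zip(probs, experts))

def sparse_topk_gating(x, experts, gate, k=1):
    # Conventional sparse MoE: only the top-k experts execute.
    weights, idx = torch.softmax(gate(x), dim=-1).topk(k)
    return sum(w * experts[i](x) for w, i in zip(weights, idx.tolist()))
```

Under dense gating, inference cost grows linearly with the number of experts; the top-k routing is what gives conventional MoE its efficiency advantage.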
Questions
See the weaknesses.
The paper presents RouGE, a PEFT and MoE framework designed to enhance the robustness of SAM-based models. RouGE employs multiple probability gates to decompose image conditions and manages them with a set of low-rank experts in a PEFT manner. The results are promising and successfully reduce the number of learnable parameters.
Strengths
- The idea is simple and the paper is easy to follow.
- The article's approach shows promising results and robustness improvement on real-world unseen data.
Weaknesses
- Utilizing MoE to enhance robustness and performance is not a novel concept. The authors integrate MoE with low-rank Adapters in PEFT for the Segment Anything model, which may not provide more insights for the community.
- Some ablations or results are absent, including adapter dimensions and a comparison of training resources (e.g., GPU memory and additional computational overhead).
- Reporting on the performance of structures using MoE with full fine-tuning is essential to evaluate the effectiveness of the PEFT strategy.
Questions
See Weaknesses. Besides, there are a few other suggestions:
- Since the method appears to use Adapters, why does it have fewer trainable parameters than previous methods like AdaptFormer?
- The image feature is not clearly defined.
I have read and agree with the venue's withdrawal policy on behalf of myself and my co-authors.