Lightweight Frequency Masker for Cross-Domain Few-Shot Semantic Segmentation
We find that simply filtering frequency components significantly improves CD-FSS performance. We delve into this phenomenon for an interpretation, and further propose a lightweight frequency masker that adds only 2.5% extra parameters while improving performance by over 11% on average.
Abstract
Reviews and Discussion
The paper presents a novel approach to cross-domain few-shot semantic segmentation (CD-FSS) by introducing a lightweight frequency masker that aims to improve performance by filtering different frequency components for target domains. The authors propose an amplitude-phase-masker (APM) module and an adaptive channel phase attention (ACPA) module to reduce inter-channel correlations and enhance feature robustness against domain gaps.
Strengths
- The paper addresses a relevant and challenging problem in the field of few-shot semantic segmentation, particularly in cross-domain scenarios where performance typically suffers due to domain shifts.
- The proposed lightweight frequency masker, including the APM and ACPA modules, introduces a novel perspective on feature disentanglement in the frequency domain, which is a promising direction for improving generalization across domains.
- The authors provide a thorough interpretation of the phenomenon of frequency filtering and its impact on feature channel correlations, which is well-supported by mathematical derivations and empirical evidence.
- The paper includes extensive experiments on four target datasets, demonstrating the effectiveness of the proposed method in reducing domain gaps and improving segmentation performance.
Weaknesses
- The novelty of the approach relative to existing work could be better established. It would be better to give a more detailed comparison with state-of-the-art methods that also attempt to address domain shifts in few-shot segmentation, especially those based on frequency operations.
- The paper does not discuss the computational efficiency of the proposed method, which is an important consideration for practical applications. It would be beneficial to include details on the runtime and resource requirements of the approach.
- Since overfitting is an important issue in few-shot fine-tuning, I wonder how this method could benefit the model in reducing overfitting.
Questions
NA
Limitations
NA
1. Compare with other frequency-based methods
We answered this question in the global response. We hope this resolves your concerns.
2. The complexity analysis
We present the results of the complexity analysis, showing that our APM and ACPA are extremely lightweight modules with minimal parameters and computational overhead. Our experiments were conducted on a single NVIDIA RTX 4090 GPU. We also compared with a lightweight frequency-based method (DFF [1]), which further highlights our advantages in terms of computational overhead and parameters.
| | baseline (encoder + decoder) | APM-S | APM-M | ACPA | DFF [1] |
|---|---|---|---|---|---|
| Params (K) | 26174 (23600 + 2574) | 0.338 | 692 | 65.54 | 2100 |
| | baseline | ours (APM-S) | ours (APM-M) | DFF [1] |
|---|---|---|---|---|
| FLOPs (G) | 20.11 | 20.17 | 20.26 | 22.07 |
[1] Deep Frequency Filtering for Domain Generalization, CVPR 2023
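For reference, parameter counts in the unit above can be reproduced with a few lines of PyTorch; the helper below is a minimal sketch (the module variables are placeholders, and FLOPs would additionally require a profiler such as fvcore or thop):

```python
import torch.nn as nn

def count_params_k(module: nn.Module) -> float:
    """Trainable parameter count in thousands (K), the unit used in the table."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad) / 1e3

# Usage with any instantiated module, e.g. placeholder `apm` or `acpa` objects:
# print(f"{count_params_k(apm):.3f} K")
```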
3. How our method helps reduce overfitting
We believe that overfitting in few-shot fine-tuning arises from two main causes: 1) The extremely limited samples (few-shot) prevent the model from fully learning each feature, making it prone to fitting extreme features (noise) and to over-relying on feature correlations (e.g., if all training samples are red apples, the model binds the red color to the round shape and fails on a green apple at test time). 2) Intra-class variations (such as viewing angles, transparency, and distances) hinder the model's ability to recognize the same features accurately.
Our APM addresses the first issue by reducing the correlation between features and eliminating channel bias (eliminating extreme features). ACPA tackles the second issue by leveraging the phase's invariant information to minimize intra-class variations. Consequently, our approach effectively mitigates overfitting in few-shot fine-tuning.
Concerns addressed
Thanks for your response! If you have further questions, please feel free to tell us. We will continue to polish our work in the final version!
This paper discovers a phenomenon whereby simply filtering different frequency components for target domains leads to significant performance improvements. The paper then delves into this phenomenon for an interpretation, and proposes an approach based on it that achieves further performance improvements. The proposed method includes an amplitude-phase-masker (APM) module and an Adaptive Channel Phase Attention (ACPA) module, which are lightweight but effective as validated by experiments.
Strengths
- The paper identifies an intriguing phenomenon where frequency filtering leads to performance gains in CD-FSS, which is a novel contribution to the field.
- The proposed lightweight frequency masker introduces minimal additional parameters (0.01%) yet achieves significant performance improvements (over 10% on average), which is a strong practical contribution.
- The paper includes extensive experiments on four target datasets, demonstrating the effectiveness of the proposed method.
Weaknesses
- Can this method be applied to other domains or tasks such as cross-domain few-shot learning or domain generalization?
- It would be helpful to include a sensitivity analysis on the choice of frequency components to filter, to understand the robustness of the method to different filtering strategies.
- Some existing methods such as GFNet also apply filtering in the frequency domain; it would be better to compare with these methods, both in the related work and in the experiments.
Questions
- How can this method help other tasks such as cross-domain few-shot learning, domain generalization, or few-shot object detection?
- How does this work compare to other works applying frequency operations?
Limitations
This paper addresses the CDFSS task from the aspect of frequency analysis; however, frequency analysis alone is not a novel aspect, and the paper lacks discussion of the differences from previous works regarding frequency analysis. All in all, I still recognize the novelty and contribution of this paper.
1. Our method can be applied to other tasks
Our method can also be applied to cross-domain few-shot learning (CDFSL). Following BSCD-FSL [1], we implemented our method under this task setting (5-way 1-shot), and the experimental results show that our method is effective in CDFSL as well.
| | CropDisease | EuroSAT | ISIC | ChestX | Ave. |
|---|---|---|---|---|---|
| baseline [1] | 73.39 | 66.12 | 35.07 | 21.98 | 49.14 |
| baseline + ours | 82.01 | 68.95 | 38.86 | 24.07 | 53.47 |
[1] A Broader Study of Cross-Domain Few-Shot Learning
2. Sensitivity analysis on the choice of frequency components to filter
First, we visualized the average masker results for each domain to observe the filtered frequency components, as shown in the global rebuttal PDF. We found that the masker effectively adjusts to filter different frequency components according to different domains.
Then, we validated the robustness of APM by adding Gaussian noise during its adaptive process. Even with the added noise, APM could still dynamically adjust and filter out the frequency components detrimental to the current domain, demonstrating its robustness.
| | FSS | Deep | ISIC | Chest | Ave. |
|---|---|---|---|---|---|
| baseline | 77.54 | 33.19 | 32.65 | 47.34 | 47.68 |
| APM | 79.29 | 40.86 | 41.71 | 78.25 | 60.03 |
| APM + noise | 79.03 | 40.06 | 40.82 | 77.92 | 59.46 |
APM's initialization
We also explored different initialization strategies for APM. A value of 0 means no frequency components pass through, while a value of 1 means all frequency components pass through. "Rand" indicates random values uniformly distributed in [0,1], "gauss" indicates values drawn from a normal distribution, "clamp" indicates values clipped to [0,1], and "line" indicates values scaled linearly to [0,1]. The experimental results show that our APM is robust, quickly adjusting and adapting even with an initial value of all zeros. Our default initialization strategy is all ones, meaning all frequency components pass through initially, which also facilitates the dynamic adjustment and learning of APM. A sketch of these strategies follows the table below.
| | FSS | Deep | ISIC | Chest | Ave. |
|---|---|---|---|---|---|
| baseline | 77.54 | 33.19 | 32.65 | 47.34 | 47.68 |
| one (choose) | 79.29 | 40.86 | 41.71 | 78.25 | 60.03 |
| zero | 76.7 | 35.32 | 40.63 | 76.09 | 57.19 |
| rand | 78.93 | 40.74 | 41.49 | 77.56 | 59.68 |
| gauss (clamp) | 78.26 | 39.43 | 41.38 | 76.89 | 58.99 |
| gauss (line) | 78.82 | 40.46 | 41.54 | 77.85 | 59.67 |
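For concreteness, here is a minimal sketch of the five initialization strategies as we read them; the strategy names, the Gaussian parameters, and the rescaling details are our assumptions, since the rebuttal only names the strategies:

```python
import torch
import torch.nn as nn

def init_mask(shape, strategy: str = "one") -> nn.Parameter:
    """Build a learnable frequency mask with one of the strategies above."""
    if strategy == "one":             # default: all frequency components pass
        m = torch.ones(shape)
    elif strategy == "zero":          # nothing passes initially
        m = torch.zeros(shape)
    elif strategy == "rand":          # uniform in [0, 1]
        m = torch.rand(shape)
    elif strategy == "gauss_clamp":   # normal draw, clipped into [0, 1]
        m = torch.randn(shape).clamp(0.0, 1.0)
    elif strategy == "gauss_line":    # normal draw, rescaled linearly to [0, 1]
        m = torch.randn(shape)
        m = (m - m.min()) / (m.max() - m.min() + 1e-8)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return nn.Parameter(m)
```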
3. Compare with other frequency-based methods
We answered this question in the global response. We sincerely hope this could resolve your concerns.
Here, we provide a more detailed explanation of the differences between our work and GFNet (experimental results are in the global rebuttal table):
- The motivation of GFNet is to use global frequency filters to replace self-attention or MLPs, reducing computational overhead while removing inductive biases and maintaining a large receptive field (which helps capture long-term dependencies). This is reasonable because a spatial location in the frequency domain represents global information. In contrast, our work is motivated by the observation that different frequency components play different roles in different domains; a frequency component beneficial in domain A might be harmful in domain B. Therefore, we designed an adaptive masker to dynamically filter different frequency components according to different domains. We also explored and validated the relationship between frequency and feature correlation.
- GFNet's "filter" refers to a convolutional filter, which can be seen as a stack of multiple convolution operations (a multiplication operation in the frequency domain can be replaced by multiple convolution operations in the spatial domain), with values in the range (-∞, +∞). In contrast, our masker is used to filter frequency components, with values in the range [0,1]; a minimal sketch of this masking idea follows.
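To make the contrast concrete, here is a minimal sketch of the [0,1]-valued amplitude/phase masking idea. This is not the authors' exact APM; the per-channel mask shape, the clamping, and the FFT normalization are our assumptions:

```python
import torch
import torch.nn as nn

class FrequencyMasker(nn.Module):
    """Learnable masks in [0, 1] over amplitude and phase spectra (sketch)."""

    def __init__(self, channels: int, h: int, w: int):
        super().__init__()
        # one learnable mask per channel, on the half-spectrum of rfft2
        self.amp_mask = nn.Parameter(torch.ones(channels, h, w // 2 + 1))
        self.pha_mask = nn.Parameter(torch.ones(channels, h, w // 2 + 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        f = torch.fft.rfft2(x, norm="ortho")
        amp, pha = f.abs(), f.angle()
        amp = amp * self.amp_mask.clamp(0.0, 1.0)  # mask values live in [0, 1]
        pha = pha * self.pha_mask.clamp(0.0, 1.0)
        f = torch.polar(amp, pha)                  # recombine amplitude and phase
        return torch.fft.irfft2(f, s=x.shape[-2:], norm="ortho")
```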
Dear Reviewer H2xr,
Please be reminded that the Author-Reviewer discussion phase will end very soon (in ONE day). Please take a look at the authors' rebuttal, see if they addressed your concerns. If you have any further questions/concerns, please post them ASAP, so that the authors may have time to respond to them!
Thanks,
AC
This paper presents a novel approach to cross-domain few-shot semantic segmentation (CD-FSS) by introducing a lightweight frequency masker. This masker aims to enhance the robustness of models against domain gaps by filtering different frequency components during the testing phase. The authors claim that their method significantly improves performance, sometimes by as much as 14%, without the need for extensive retraining or parameter tuning.
Strengths
This paper introduces a novel frequency masker that is lightweight and does not require training on the source domain. The authors provide a clear explanation of the phenomenon where frequency filtering improves performance, supported by mathematical derivations and experiments. This paper demonstrates significant performance improvements on multiple target datasets, which is a strong empirical contribution. The proposed APM and ACPA modules are innovative and show promise in addressing the domain gap problem in few-shot segmentation.
Weaknesses
While the paper claims to reduce inter-channel correlation, why not directly constrain the model to reduce such correlation? How does this work compare with [27]? The paper could benefit from a more thorough comparison with state-of-the-art methods, particularly those that also employ frequency-domain techniques. The authors might consider providing more details on the experimental setup, including data preprocessing and model training procedures, to ensure reproducibility.
Questions
Could other frequency-based works also achieve the correlation reduction?
Limitations
An important part of the analysis is the correlation reduction. However, the paper only includes a comparison with the reduction method by mutual information. Many other methods can also achieve this goal. In my opinion, the mathematical deduction only proves that the frequency operation is able to reduce the correlation, but does not show its advantages in such reduction. Therefore, I would like to see the author provide more comparison and analysis regarding this problem, such as directly comparing with [27].
1. Compare with directly constraining the model
We answered this question in the global response. We sincerely hope this could resolve your concerns.
2. Comparing with [27] (Channel Importance Matters in Few-Shot Image Classification)
We compared with [27] in the global response, and here we provide a more detailed explanation.
We implemented the two transformation methods from [27] under our task setting. The performance slightly declines on the FSS dataset, which is similar to the source domain. However, on the other three datasets that are more distant from the source domain, there is a slight performance improvement. Nonetheless, our method has an advantage in terms of performance, with improvements significantly surpassing those of [27].
We further elaborate on the differences between our work and [27]:
1) [27] found that different channels recognize different patterns, and the channel bias present between channels can affect the model's recognition ability. They improved performance by eliminating this channel bias through feature transformation functions. In contrast, our work posits that different frequency components play different roles in different domains. We dynamically filter out detrimental frequency components based on the domain, thereby reducing channel correlation and improving performance.
2) Compared to [27]'s spatial operations, our frequency operations have the advantage of better representing global information. A spatial position in the frequency domain represents information from the entire spatial domain, giving frequency-domain operations a natural advantage in capturing long-term dependencies and maintaining a large receptive field. Additionally, compared to feature transformation and convolution operations in the spatial domain, frequency-domain operations remove inductive biases.
4. Compare with other frequency-based methods
We answered this question in the global response. We hope this could resolve your concerns.
5. Could other frequency-based works also achieve the correlation reduction?
We tested the MI of the aforementioned frequency-based methods, and not all of them were able to achieve correlation reduction. Those that did achieved it only in certain domains, because they are applied during training to enhance the model's generalization; due to the domain gap, feature-extraction patterns that perform well on the source domain may not benefit the target domain. For example, DFF explores frequency components beneficial for generalization during training, but its results show that it filters out many high frequencies and retains low frequencies, so its performance improvement might be due to filtering out high-frequency noise. We visualized how our masker filters frequency components across different domains (the global rebuttal PDF displays these results); the frequency components to be filtered differ among target domains. Additionally, phase and amplitude need to be considered separately rather than treated as a single entity. Hence, not all frequency-based methods can achieve correlation reduction.
| MI | FSS | Deep | ISIC | ChestX |
|---|---|---|---|---|
| baseline | 1.3736 | 1.3679 | 1.3789 | 1.3952 |
| DFF | 1.3742 | 1.3701 | 1.3702 | 1.3429 |
| GFNet | 1.3840 | 1.3682 | 1.3781 | 1.3605 |
| ARP-SP | 1.3705 | 1.3568 | 1.3713 | 1.3488 |
| DAC-SC | 1.3722 | 1.3557 | 1.3676 | 1.3526 |
| ours | 1.3501 | 1.2761 | 1.3139 | 1.2610 |
6. More details on the experimental setup
Here, we provide a more detailed explanation of our data processing. We adopt the same setup and data processing as PATNet [22]. For FSS-1000, the official split for semantic segmentation is used in our experiments; we report results on the official testing set, which contains 240 classes and 2,400 testing images. For Deepglobe, the images have a fixed resolution of 2448 × 2448 pixels. To increase the number of testing images and reduce their size, each image was cut into 6 pieces; this cutting has minimal effect on segmentation due to the irregular shapes of the categories. After filtering out single-class images and the 'unknown' class, we obtained 5,666 images, each with a resolution of 408 × 408 pixels, for reporting results. For ISIC, the images have a spatial resolution of around 1022 × 767; we downsize them to 512 × 512 pixels. For Chest X-ray, due to the large size of the images, we downsize them to 1024 × 1024 pixels.
7. Why are frequency operations more advantageous in reducing correlation?
The orthogonality constraints, whitening, and MMC mentioned above (global response, answer 2), as well as the MI loss discussed in the main text, all use spatial operations to reduce correlation. Here, we elaborate on the advantages of frequency operations over spatial operations.
1) The frequency domain inherently offers finer granularity than the spatial domain, facilitating more precise feature disentanglement. When a spatial-domain channel (feature) is transformed into the frequency domain, each point in the frequency domain represents global information of the feature, so one channel is refined into h × w individually adjustable frequency points.
2) The frequency domain inherently provides a more lightweight operation than the spatial domain. A simple multiplication in the frequency domain can be equivalent to multiple convolutions in the spatial domain. This makes modules operating in the frequency domain more lightweight, easier to adapt to different domains, and more advantageous when data is scarce (see the numerical check after this list).
3) The frequency domain inherently has a larger receptive field and better helps capture long-term dependencies, making it more effective for learning global information. This enables operations in the frequency domain to capture more independent channel patterns, leading to expanded activation regions and more generalized representations.
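Point 2 is an instance of the convolution theorem; the quick numerical check below (1-D for brevity) confirms that a single element-wise product in the frequency domain equals an explicit circular convolution in the spatial domain:

```python
import torch

N = 16
x, k = torch.randn(N), torch.randn(N)

# one element-wise product in the frequency domain...
y_freq = torch.fft.ifft(torch.fft.fft(x) * torch.fft.fft(k)).real

# ...equals an explicit global-support circular convolution in the spatial domain
y_circ = torch.stack(
    [sum(x[m] * k[(n - m) % N] for m in range(N)) for n in range(N)]
)

assert torch.allclose(y_freq, y_circ, atol=1e-4)
```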
Dear Reviewer 79M1,
Please be reminded that the Author-Reviewer discussion phase will end very soon (in ONE day). Please take a look at the authors' rebuttal, see if they addressed your concerns. If you have any further questions/concerns, please post them ASAP, so that the authors may have time to respond to them!
Thanks,
AC
This paper makes several notable contributions to the field of cross-domain few-shot segmentation (CD-FSS). The authors discover that filtering different frequency components for target domains can lead to significant performance improvements, attributing this to reduced inter-channel correlation in feature maps, which enhances robustness against domain gaps and expands activated regions for segmentation. Building on this insight, they propose a lightweight frequency masker comprising an Amplitude-Phase-Masker (APM) module and an Adaptive Channel Phase Attention (ACPA) module. These components effectively reduce channel correlations and further enhance segmentation performance. The proposed method demonstrates significant advancements over current state-of-the-art CD-FSS approaches, highlighting its potential impact on the field.
Strengths
This paper presents several notable advantages in the field of cross-domain few-shot segmentation (CD-FSS). It identifies a significant performance improvement by filtering different frequency components for target domains, which reduces inter-channel correlation in feature maps and enhances robustness against domain gaps. The proposed lightweight frequency masker, consisting of the Amplitude-Phase-Masker (APM) and Adaptive Channel Phase Attention (ACPA) modules, effectively reduces channel correlations and improves segmentation performance with minimal additional parameters. The authors also provide relevant mathematical derivations to support their findings. The method demonstrates substantial improvements over state-of-the-art CD-FSS methods, making it a significant contribution to the field.
Weaknesses
- Performing frequency domain filtering on features is likely to result in some loss of information, potentially damaging the original structure of the features. Moreover, the mask weights required for different domains should vary. Are the authors training the APM on the source domain and then directly testing it on different target domains?
- The novelty of the method is limited. The idea proposed by the authors is very similar to [1] and seems to merely apply cross-domain techniques to cross-domain few-shot segmentation.
- The method proposed by the authors shows very limited improvement on some datasets and even performs worse than the existing state-of-the-art (SOTA) methods.
[1] Deep Frequency Filtering for Domain Generalization CVPR2023
Questions
- The authors claim that filtering certain frequency components can lead to significant performance improvements. How sensitive is this improvement to the specific frequency components chosen? Is there a systematic way to determine the optimal frequency components for a given domain?
- The paper introduces the Amplitude-Phase Masker (APM) module. How does the initialization of the APM affect the final performance? Have the authors explored different initialization strategies?
- The Adaptive Channel Phase Attention (ACPA) module uses phase information for attention weights. What is the rationale behind using only phase information rather than both phase and amplitude? How would the results change if amplitude information were incorporated?
- The paper claims that the proposed method reduces inter-channel correlation in feature maps. How does this reduction in correlation compare to other feature decorrelation methods in the literature, such as those based on orthogonality constraints or whitening?
- The authors use mutual information to measure inter-channel correlation. Are there other metrics that could provide additional insights into the nature of the feature disentanglement achieved by this method?
Limitations
- Computational overhead: The paper does not adequately address the computational cost of the proposed method. Frequency domain operations and the additional modules (APM and ACPA) likely introduce significant computational overhead, which should be quantified and compared to existing methods.
- Ablation studies: The paper would benefit from more comprehensive ablation studies. For instance, the individual contributions of APM and ACPA are not clearly delineated, and the impact of different design choices within these modules is not thoroughly explored.
1. Does filtering on features damage the original feature structure?
Filtering certain frequency components does not damage the original feature structure; instead, it is beneficial. Since not all frequencies are advantageous for the current domain, we dynamically adjust the mask; when we take its inverse (1 - mask), performance decreases compared to the baseline, indicating that the frequencies we filter out are indeed detrimental components.
| | FSS | Deep | ISIC | Chest | Ave. |
|---|---|---|---|---|---|
| baseline | 77.54 | 33.19 | 32.65 | 47.34 | 47.68 |
| APM (w/o ACPA) | 78.98 | 40.81 | 38.99 | 77.73 | 58.86 |
| Inv. APM (w/o ACPA) | 77.25 | 30.26 | 31.23 | 47.07 | 46.45 |
2. Sensitivity analysis on the choice of frequency components to filter
Since different domains require different weights, our method adapts directly to the target domain without the need for source-domain training. We visualized the average masker results for each domain to observe the filtered frequency components, as shown in the global response PDF. We found that the masker effectively adjusts to filter different frequency components according to different domains.
3. Differences between our approach and Deep Frequency Filtering for Domain Generalization (DFF)
We compare with DFF in global response answer 1. Here we provide a more detailed explanation.
- Motivation: DFF aims to explore and retain frequency information beneficial for generalization during training, while filtering out frequencies that are not. However, we found that useful frequency information varies across different domains; frequencies beneficial to one domain may be harmful to others. Therefore, we focus on adaptively selecting beneficial information for different domains.
- Amplitude and phase: DFF does not distinguish between amplitude and phase, using attention mechanisms to filter out non-generalizable frequency components during training. However, amplitude and phase play different roles: amplitude contains domain-specific information, while phase contains domain-invariant information. Our APM independently adjusts amplitude and phase, filtering out detrimental frequency information separately. ACPA leverages the domain-invariant characteristic of phase to reduce intra-class variance between support and query.
- Effectiveness: DFF performs well when input distributions differ but the label space remains the same. However, its effectiveness is limited in our task, where both input distributions and label spaces differ. We implemented DFF in our task, and our method demonstrated superior performance (see Q1 in the global response).
4. Performance could be further improved with segmentation refinement
To highlight the effectiveness of our method, we did not employ techniques such as data augmentation or segmentation refinement in the main experiments. When we add the segmentation refinement of PANet [37], performance improves further. Our method already surpasses the existing SOTA and shows significant improvement over the baseline, even without any such additional techniques.
| Method | FSS 1-shot | FSS 5-shot | Deep 1-shot | Deep 5-shot | ISIC 1-shot | ISIC 5-shot | Chest 1-shot | Chest 5-shot | Avg 1-shot | Avg 5-shot |
|---|---|---|---|---|---|---|---|---|---|---|
| baseline | 77.54 | 80.21 | 33.19 | 36.46 | 32.65 | 35.09 | 47.34 | 48.63 | 47.68 | 50.10 |
| PATNet [22] (SOTA) | 78.59 | 81.23 | 37.89 | 42.97 | 41.16 | 53.58 | 66.61 | 70.20 | 56.06 | 61.99 |
| APM-M | 79.29 | 81.83 | 40.86 | 44.92 | 41.71 | 51.16 | 78.25 | 82.81 | 60.03 | 65.18 |
| APM-M refine | 80.02 | 82.35 | 41.23 | 45.57 | 42.56 | 53.69 | 78.76 | 83.22 | 60.64 | 66.21 |
5. The different initialization strategies of APM
We presented this experiment in our response to question 2 from reviewer H2xr; please refer to the APM initialization experiments there. We sincerely hope this could resolve your concerns.
6. Why does ACPA only use the phase information?
Previous interpretability studies have shown that phase is an invariant representation, while the amplitude varies between samples and contains specific information. To alleviate intra-class variations (such as viewing angles, transparency, and distances, which hinder the model's ability to recognize the same features accurately), we leverage the invariant nature of phase to align the feature spaces of support and query.
| | Ave. 1-shot | Ave. 5-shot |
|---|---|---|
| + amplitude | 58.32 | 63.25 |
| w/o amplitude | 60.03 | 65.18 |
7. Compare with other methods for reducing correlation / Compare with other frequency-based methods / More details about the individual contributions of APM and ACPA
We answered these questions in the global response. We sincerely hope this could resolve your concerns.
8. Other metrics validate that our method achieves feature disentanglement
We normalize the feature map channels with L2 normalization and then compute the L1 norm to measure their sparsity. A smaller value indicates higher sparsity. After masking certain frequency components, the sparsity value decreases, indicating sparser features. Sparse features imply lower feature redundancy, which benefits feature disentanglement and thereby enhances the model's generalization capability. A sketch of this measure follows the table below.
| sparsity | FSS | Deep | ISIC | ChestX |
|---|---|---|---|---|
| baseline | 31.85 | 32.41 | 32.4 | 31.86 |
| APM | 31.12 | 31.79 | 31.08 | 30.5 |
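Here is a minimal sketch of this sparsity measure as described; averaging the per-channel L1 norms is our assumption:

```python
import torch

def channel_sparsity(feat: torch.Tensor) -> float:
    """feat: (C, H, W). L2-normalize each channel, then average the L1 norms;
    smaller values indicate sparser, less redundant channels."""
    flat = feat.flatten(1)                                     # (C, H*W)
    unit = flat / (flat.norm(p=2, dim=1, keepdim=True) + 1e-8)
    return unit.norm(p=1, dim=1).mean().item()
```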
9. The complexity analysis
We answered this question in our response to reviewer upoB's question 2. We sincerely hope this could resolve your concerns.
Dear Reviewer jSMF,
Please be reminded that the Author-Reviewer discussion phase will end very soon (in ONE day). Please take a look at the authors' rebuttal, see if they addressed your concerns. If you have any further questions/concerns, please post them ASAP, so that the authors may have time to respond to them!
Thanks,
AC
1. Compare with other frequency-based methods
Here, we elaborate on the differences between our work and previous frequency-based methods.
DFF [1] explores and retains frequency information beneficial for generalization during training while filtering out frequencies that are not. GFNet [2] uses global frequency filters to replace self-attention or MLPs, reducing computational overhead while maintaining a large receptive field. ARP [3] proposes that a robust CNN should be resilient to amplitude variance and focus on the phase spectrum, thus introducing the Amplitude-Phase Recombination data augmentation method. DAC [4] proposes a novel normalization method that eliminates only the style (amplitude) while preserving the content (phase) through spectral decomposition. Although all these methods enhance the model's generalization ability, they do not effectively bridge large domain gaps.
Our motivation stems from the observation that filtering certain frequency components can significantly improve performance, while different frequency components have varying effects on different domains due to domain gaps. We delved into this phenomenon and discovered that operations in the frequency domain can reduce the correlation between channels, achieving feature disentanglement. Therefore, our method does not require training on the source domain. Instead, it adaptively masks components that are detrimental to the current target domain (at the feature level). Additionally, we consider amplitude and phase independently rather than treating them as a whole, and we leverage the invariant characteristics of phase to design a channel attention module that addresses intra-class variations. Experimental results demonstrate that our method outperforms existing frequency-based methods on the CDFSS task.
| | FSS | Deep | ISIC | Chest | Ave. |
|---|---|---|---|---|---|
| baseline | 77.54 | 33.19 | 32.65 | 47.34 | 47.68 |
| DFF [1] | 78.18 | 32.16 | 35.71 | 60.29 | 51.59 |
| GFNet [2] | 76.86 | 32.23 | 33.95 | 53.12 | 49.04 |
| ARP-SP [3] | 78.83 | 35.06 | 35.61 | 59.83 | 52.33 |
| DAC-SC [4] | 78.27 | 35.98 | 36.02 | 57.66 | 51.98 |
| ours | 79.29 | 40.86 | 41.71 | 78.25 | 60.03 |
[1] Deep Frequency Filtering for Domain Generalization, CVPR 2023
[2] Global Filter Networks for Image Classification, NeurIPS 2021
[3] Amplitude-Phase Recombination: Rethinking Robustness of Convolutional Neural Networks in Frequency Domain, ICCV 2021
[4] Decompose, Adjust, Compose: Effective Normalization by Playing with Frequency for Domain Generalization, CVPR 2023
2. Compare with other methods of reducing correlation
In the main text, we compared our method with MI Loss. Here, we further provide comparisons with orthogonality constraints, whitening, and MMC [27].
For methods that directly constrain the model (orthogonality, whitening): the few-shot setting means limited sample sizes, while existing models have a large number of parameters. Directly adjusting the model with constraints on such small datasets is not effective and, without careful tuning of hyperparameters, can lead to negative optimization. As shown below, the performance of the orthogonality-constraint and whitening methods is not satisfactory.
For feature transformation/augmentation methods like MMC: the stability is not guaranteed because they use specific feature transformation functions. Due to the domain gap, a transformation method effective for one domain may not be effective for others. For example, MMC's performance on the FSS dataset not only failed to improve but declined. The MMC paper also mentioned that this method might experience performance degradation on certain datasets.
In contrast, our method has the advantages of being 1) lightweight (allowing for quick adaptation in a few-shot setting) and 2) stable and robust (with adaptive adjustments for different domains). These benefits are well reflected in the performance results.
| MIoU | FSS | Deep | ISIC | Chest | Avg |
|---|---|---|---|---|---|
| baseline | 77.54 | 33.19 | 32.65 | 47.34 | 47.68 |
| MMC (Simple) [27] | 77.48 | 34.70 | 34.32 | 48.74 | 48.81 |
| MMC (Oracle) [27] | 77.45 | 35.12 | 34.59 | 50.27 | 49.36 |
| baseline + orthogonality [1] | 78.13 | 34.61 | 34.05 | 50.58 | 49.34 |
| baseline + whitening | 77.92 | 33.22 | 32.98 | 50.89 | 48.75 |
| ours | 79.29 | 40.86 | 41.71 | 78.25 | 60.03 |
| MI | FSS | Deep | ISIC | Chest |
|---|---|---|---|---|
| baseline | 1.3736 | 1.3679 | 1.3789 | 1.3952 |
| MMC (Simple) [27] | 1.3742 | 1.3601 | 1.3782 | 1.3629 |
| MMC (Oracle) [27] | 1.3740 | 1.3582 | 1.3751 | 1.3605 |
| baseline + orth [1] | 1.3695 | 1.3611 | 1.3758 | 1.3590 |
| baseline + whitening | 1.3702 | 1.3668 | 1.3783 | 1.3577 |
| ours | 1.3501 | 1.2761 | 1.3139 | 1.2610 |
[1] Can We Gain More from Orthogonality Regularizations in Training Deep CNNs?
[27] Channel Importance Matters in Few-Shot Image Classification
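The responses do not spell out the MI estimator behind the tables above; a common histogram-based estimate of the mutual information between two flattened feature channels, which we assume is representative, looks like this:

```python
import numpy as np

def channel_mi(a: np.ndarray, b: np.ndarray, bins: int = 32) -> float:
    """Histogram-based MI (in nats) between two flattened channel activations."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = joint / joint.sum()                    # joint distribution
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)    # marginals
    nz = pxy > 0                                 # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px[:, None] * py[None, :])[nz])).sum())
```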
3. More details about the individual contributions of APM and ACPA
APM filters out negative frequency components at the feature level within feature maps, yielding a feature map that is more robust and generalizable and that provides broader, more accurate representations. Adaptive Channel Phase Attention (ACPA) can be seen as a feature-selection process: building on the APM-optimized feature map, ACPA encourages the model to focus on more effective channels (features) while aligning the feature spaces of the support and query samples. A heavily hedged sketch of a phase-based channel attention in this spirit follows.
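As a rough illustration only (the paper's exact ACPA design may differ), a phase-based channel attention could look like the following; the MLP width, the mean-phase descriptor, and the weight sharing between support and query are all our assumptions:

```python
import torch
import torch.nn as nn

class PhaseChannelAttention(nn.Module):
    """Hypothetical phase-driven channel attention (names and widths are ours)."""

    def __init__(self, channels: int, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, hidden), nn.ReLU(), nn.Linear(hidden, channels)
        )

    def forward(self, support: torch.Tensor, query: torch.Tensor):
        # phase carries the domain-invariant content of the spectrum
        pha = torch.fft.rfft2(support, norm="ortho").angle()  # (B, C, H, W//2+1)
        desc = pha.mean(dim=(-2, -1))                         # per-channel descriptor
        w = torch.sigmoid(self.mlp(desc))[..., None, None]    # (B, C, 1, 1)
        # shared weights select channels in both features, aligning their spaces
        return support * w, query * w
```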
In this paper, the authors presented a method for cross-domain few-shot semantic segmentation. Based on the observation that filtering different frequency components in target domains can lead to performance gains, two modules, an amplitude-phase masker (APM) and an adaptive channel phase attention (ACPA), were proposed to reduce channel correlations. The key contributions of this paper are the observed phenomenon of frequency filtering with its impact on feature channel correlations and the proposed lightweight frequency masker, which could be of interest to the audience of NeurIPS and potentially inspire follow-up research.
The paper received consistent positive recommendations from four expert reviewers. All the reviewers acknowledged the above-mentioned contributions and the corresponding significant performance improvement, and believe it could be a good contribution to the field. There were some concerns around the detailed design of the filtering method, novelty (compared to related works), and experiments. Through the rebuttal, these major concerns were addressed with some additional provided evidence.
Considering the above and the novelty and soundness requirements for NeurIPS papers, the AC recommends Accept for this paper, but the authors are reminded to include the further evidence provided in the rebuttal (addressing those concerns) in their final version.