PaperHub
Score: 7.0 / 10 · Rejected · 4 reviewers
Ratings: 6, 8, 8, 6 (min 6, max 8, std 1.0)
Confidence: 4.0
ICLR 2024

Cooperative Hardware-Prompt Learning for Snapshot Compressive Imaging

OpenReview · PDF
Submitted: 2023-09-24 · Updated: 2024-02-11

Abstract

Keywords
snapshot compressive imaging, hyperspectral imaging, prompt learning, federated learning

Reviews and Discussion

Review
Rating: 6

The paper focuses on the robustness, efficiency, and accuracy of current snapshot compressive imaging reconstruction networks. The major contribution of the paper is designing a prompt network which automates the process of aligning a measurement based on its corresponding measurement model.

Strengths

The paper tackles an interesting distribution shift, that is a shift in the measurement model of the compressed sensing task.

The paper is overall well-written (although in some parts difficult to read).

The idea of the prompt network to tackle distribution shifts is very interesting. If I understood correctly, without a prompt network, fine-tuning is needed for new measurement models. Yet with the prompt network the process of measurement alignment with the measurement model is automated for any measurement model.

The experiments are interesting and carefully designed in the sense that reasonable baselines and datasets are chosen for evaluation.

Weaknesses

The major contribution of the paper is not well-justified, i.e., the crucial need for the proposed method (as opposed to training from scratch for every new measurement model) is not well-supported. E.g., the reviewer still finds it very convenient, from a practical perspective, to train a model for every new set of measurement models for a new organization of interest; i.e., all it takes is a few hours (days) of training for the new set (note that this automatically addresses the other concern raised by the authors regarding privacy constraints, in that each organization has access to its own data and device sets).

The comparisons are not fair in Fig. 1 (also please see my question regarding Fig. 1 below). Clearly, joint training should serve as an upper bound on the performance when the test set contains the same measurement models as training.

The results (especially the quantitative ones) do not yield the conclusion that FedAVG is outperformed by FedHP. The major advantage of FedHP seems to be its 4x more efficient training time compared to FedAVG. This is fine and improving the efficiency is valuable from a practical point of view, but the paper isn’t oriented around this conclusion; the paper emphasizes the value of prompt networks and FedHP in the form of accuracy and robustness gains, whereas FedAVG enjoys those traits, too!

Minor: On Tab. 2, FedHP is highlighted as the best-performing method in terms of SSIM (0.8481), whereas FedAVG should be highlighted (0.8496).

Questions

How’s Fig. 1 obtained? Is it evaluated on the same measurement model set used during training of each setup? Or are all models evaluated on the same predefined test set of measurement models?

As mentioned in the strengths section, the idea of the prompt network is interesting. However, the biggest question raised is whether that network induces another source of instability to the overall model. Specifically, what guarantees that the prompt network doesn’t do a terrible alignment for measurement models deviating from the training distribution?

What is the source of inconsistency between Tab. 1 and Fig. 3? What we see in Fig. 3 suggests a PSNR difference higher than 0.14 dB between FedAVG and FedHP… It’s understandable to argue that quantitative metrics such as PSNR or SSIM don’t perfectly capture the true quality, but the visual difference in Fig. 3 is too large not to be captured by those metrics.

Why isn’t the deep unfolding network included in all the results and only reported as a short paragraph at the end?

Any intuition on why FedAVG is so much slower to train than FedHP?

Comment

Weakness 4: Minor: On Tab. 2, FedHP is highlighted as the best-performing method in terms of SSIM (0.8481), whereas FedAVG should be highlighted (0.8496).

A4: Thanks for the useful comments! We have corrected the annotation typos in Table 1 of our manuscript.

Question 1: How’s Fig. 1 obtained? Is it evaluated on the same measurement model set used during training of each setup? Or are all models evaluated on the same predefined test set of measurement models?

A5: Thanks for the great question!

In Fig. 1, all results are evaluated by sampling from the same mask pool to test the trained models. Note that this mask pool contains masks from three different distributions, i.e., $P_1$, $P_2$, $P_3$, as plotted in Fig. 1. Since we sample non-overlapping masks, the masks used for testing are unseen to the models, so the evaluation can be regarded as zero-shot testing. We have added this illustration to the caption of Fig. 1. In computational imaging, previous work [1,2] has shown that changes of the mask (e.g., shift, perturbation) cause large performance degradation.
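As a minimal sketch of this protocol, with hypothetical Bernoulli mask generators standing in for the real distributions $P_1$–$P_3$ (which are not specified here):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_masks(p, n, shape=(64, 64)):
    """Draw n binary coded-aperture masks from a Bernoulli(p) distribution."""
    return (rng.random((n, *shape)) < p).astype(np.float32)

# Hypothetical mask pool mixing three distributions, standing in for P1, P2, P3.
pool = np.concatenate([sample_masks(p, 100) for p in (0.3, 0.5, 0.7)])
rng.shuffle(pool)

# Non-overlapping split: masks used at test time are unseen during training,
# so evaluation on them is effectively zero-shot.
train_masks, test_masks = pool[:240], pool[240:]
assert not any((t == m).all() for t in test_masks for m in train_masks)
```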

Question 2: The biggest question raised is whether that network induces another source of instability to the overall model. Specifically, what guarantees that the prompt network doesn’t do a terrible alignment for measurement models deviating from the training distribution?

A6: Thanks for the valuable insight!

We provide an ablation study in Table 3 to discuss the effect of the prompt network. FedHP experiences large performance degradation without the prompt network. One thing to note is that the prompt network is learned locally but aggregated at the server and then distributed back to the clients at each global round. This ensures that the prompt network does not deviate from the global learning objective.
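For concreteness, a rough sketch of this aggregation, assuming plain FedAvg-style averaging restricted to the prompt network's floating-point parameters (the attribute name `prompt_net` is hypothetical):

```python
import torch

def aggregate_prompt_state(client_states, weights=None):
    """Average each prompt-network parameter across clients (FedAvg-style).
    Assumes all state dicts hold float tensors with identical keys/shapes."""
    n = len(client_states)
    weights = weights if weights is not None else [1.0 / n] * n
    return {k: sum(w * s[k] for w, s in zip(weights, client_states))
            for k in client_states[0]}

# Usage: clients upload only the lightweight prompt network each global round.
# global_state = aggregate_prompt_state([c.prompt_net.state_dict() for c in clients])
# for c in clients:
#     c.prompt_net.load_state_dict(global_state)
```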

Question 3: What is the source of inconsistency between Tab. 1 and Fig. 3? What we see in Fig. 3 suggests a PSNR difference higher than 0.14 dB between FedAVG and FedHP… It’s understandable to argue that quantitative metrics such as PSNR or SSIM don’t perfectly capture the true quality, but the visual difference in Fig. 3 is too large not to be captured by those metrics.

A7: The difference of 0.14 dB is an averaged result over the ten testing scenes provided in Table 1. In Fig. 3, we select one example, Scene 7, to visually compare the different methods; it corresponds to a gap of 0.43 dB in PSNR and 0.0073 in SSIM. Besides, visual quality can vary across regions. In the supplementary material, we provide more visual comparisons of different scenes.
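To see how a single-scene gap can exceed the ten-scene average, a toy PSNR computation on synthetic data (all numbers illustrative):

```python
import numpy as np

def psnr(ref, rec, peak=1.0):
    """Peak signal-to-noise ratio in dB."""
    return 10 * np.log10(peak ** 2 / np.mean((ref - rec) ** 2))

rng = np.random.default_rng(0)
ref = rng.random((10, 64, 64))                        # ten synthetic test scenes
rec_a = ref + 0.050 * rng.standard_normal(ref.shape)  # method A reconstructions
rec_b = ref + 0.052 * rng.standard_normal(ref.shape)  # method B reconstructions

gaps = [psnr(r, a) - psnr(r, b) for r, a, b in zip(ref, rec_a, rec_b)]
print(f"mean gap: {np.mean(gaps):.2f} dB, scene 7 gap: {gaps[6]:.2f} dB")
```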

Question 4: Why isn’t the deep unfolding network included in all the results and only reported as a short paragraph at the end?

A8: Thanks for the insightful questions!

Both deep learning-based reconstruction methods and model-based methods (e.g., deep unfolding) demonstrate promising performance in the SCI literature. This work mainly focuses on deep learning-based methods.

Besides the performance comparisons in the main tables, we would also like to compare with a state-of-the-art deep unfolding method [3]. As shown in Table 4b, FedHP also brings a notable performance boost (+0.28 dB / +0.0038 in PSNR/SSIM) with a much smaller model size.

Question 5: Any intuition on why FedAVG is so much slower to train than FedHP?

A9: Thanks for the useful question!

There are two designs that make FedHP more efficient than FedAvg.

Firstly, FedHP does not need to train every local client model from scratch; as long as there is a pre-trained model, we can easily adapt it to new clients under FedHP. By comparison, for any new client, FedAvg is required to train the model from scratch, which can be computationally cumbersome.

The other reason is that the proposed method has a much lower communication burden. FedHP only needs to aggregate and distribute a lightweight hardware prompt network and adaptors. By comparison, FedAvg has to do full model training and communication to adapt to different masks; otherwise, the performance will be significantly degraded (e.g., <20 dB). In contrast, we introduce a hardware prompt to address this issue in a smart way.

[1] Modeling mask uncertainty in hyperspectral image reconstruction. ECCV 2022.

[2] Metasci: Scalable and adaptive reconstruction for video compressive sensing. CVPR 2021.

[3] Deep unfolding for snapshot compressive imaging. IJCV 2023.

Comment

We appreciate that Reviewer bAjN finds that our method tackles an interesting problem with an interesting prompt network, and that the experiments are carefully designed with reasonable baselines!

Weakness 1: The major contribution of the paper is not well-justified. i.e., the crucial need for the proposed method (as opposed to training from scratch for every new measurement model) is not well-supported. e.g., the reviewer still finds it very convenient to train a model for every new set of measurement models for a new organization of interest from a practical perspective. i.e., all it takes is a few hours (days) of training for the new set (note that this automatically addresses the other concern raised by the authors regarding privacy constraints, in that each organization has access to its own data and device sets).

A1: Thanks for this useful question!

The conventional setting, where each organization employs its local data and devices to perform independent training, has practical limitations for current SCI research and prevents further real-world applications, for the following reasons.

  • Data-starving challenge of the client. Some clients may have limited training data (e.g., only several scenes), so that training cannot converge.
  • Efficiency concern. We find that training a local model from scratch takes 3.54 days (10.62/3, as shown in Table 3) on our platform, while it takes less than 1 day (2.86/3 in Table 3) to adapt the model to a new device/client using FedHP. We have put this illustration into the manuscript. The training time can be even longer if the reconstruction backbone becomes larger.

In summary, considering the data-starving nature of clients, it may be impractical for a new client to train a model from scratch; and even when a client has enough data, there remains a training efficiency concern. Plus, a reconstruction model trained with a single well-calibrated hardware instance is hard to adapt to new hardware. This work proposes FedHP to practically enable the deployment of reconstruction models on new clients.

Weakness 2: The comparisons are not fair in Fig. 1 (also please see my question regarding Fig. 1 below). Clearly, joint training should serve as an upper bound on the performance when the test set contains the same measurement models as training.

A2: We appreciate the reviewer’s valuable insight in improving this work!

In general, joint training serves as an upper bound for federated learning when the test set contains the same measurement models as training.

However, in Fig. 1, the test set consists of random measurement models that are unseen during training. Specifically, all results (1 to 5) are evaluated by sampling from the same mask pool to test the trained models. Note that this mask pool contains masks from three different distributions, i.e., $P_1$, $P_2$, $P_3$, as plotted in Fig. 1. Since we always sample non-overlapping masks, the masks used for testing are generally unseen to the models. This is quite challenging and can well simulate a real-world scenario. We have added more illustration of the settings to the caption of Fig. 1. In this case, our experiments find that simply combining all data from different hardware for joint training does not work well (as shown in Fig. 1, over 0.6 dB lower than FedHP and 0.3 dB lower than FedAvg).

We appreciate the reviewer's help in distinguishing this work from general federated learning endeavors!

Weakness 3: The results (especially the quantitative ones) do not yield the conclusion that FedAVG is outperformed by FedHP. The major advantage of FedHP seems to be its 4x more efficient training time compared to FedAVG. This is fine and improving the efficiency is valuable from a practical point of view, but the paper isn’t oriented around this conclusion; the paper emphasizes the value of prompt networks and FedHP in the form of accuracy and robustness gains, whereas FedAVG enjoys those traits, too!

A3: We thank the reviewer bAjN for helping us summarize this contribution!

We find that FedAvg serves as a very strong baseline, even working better than recent FL methods such as SCAFFOLD and FedProx. Thus, it is non-trivial to achieve a further performance boost. By comparison, FedHP brings a consistent performance boost: for example, +0.14 dB in Table 1, +0.35 dB in Table 2, and +0.27 dB in Table 4(a) with more clients.

Besides, by jointly introducing the hardware prompt network and adapter, the proposed method also benefits from an efficiency advantage, as it only requires adapting the pre-trained models in the system to the new clients. We have also emphasized this point by highlighting the efficiency advantage in the revised manuscript.

Review
Rating: 8

This work develops a federated hardware-prompt learning (FedHP) method for the task of snapshot compressive imaging (SCI). Existing reconstruction methods generally consider a single well-calibrated hardware configuration for network learning, inducing a highly coupled relationship between the reconstruction model and hardware settings. Differently, this work adopts federated learning to coordinate multiple clients with variant hardware settings and proposes a hardware-oriented solution to mitigate heterogeneous data issues.

Strengths

• The motivation of this work is impressive, pointing out a very practical problem for snapshot compressive imaging. Both the hardware cooperation and hardware heterogeneity problems are underexplored. This work solves the heterogeneity issue while accounting for the special characteristics of SCI.

• The design of the hardware prompter bridges the hardware and software in a novel way, which could be easily incorporated into optimizing diverse set-ups in SCI.

• Experimental results are abundant and show a clear performance boost over previous methods. Extensive ablation studies and model discussions have also been provided.

Weaknesses

• It remains unclear if the proposed method can adopt a larger client number. A detailed discussion on the number of clients should be given to demonstrate the practicality of the proposed method and to enhance the soundness of the work.

• Is it possible to apply the proposed method to other hyperspectral image datasets?

• It seems that FedGST, a competitive method used for comparison, was originally a centralized learning strategy; is this a fair comparison, and what modifications were made to this method? Please provide more details.

Questions

• Is the dataset split of the centralized learning the same as that of the federated learning? Please provide more illustrations and details.

• There are some typos in the manuscript, for example, in the Fig. 3 caption.

Comment

We appreciate that Reviewer X7WS finds our method well motivated, solving a very practical problem, with abundant and clear experimental results.

Weakness 1: It remains unclear if the proposed method can adopt a larger client number. A detailed discussion on the number of clients should be given to demonstrate the practicality of the proposed method and to enhance the soundness of the work.

A1: We provide an experiment with more clients, e.g., C=8, as follows. We train the proposed FedHP under the same setting as Table 4a.

| #Clients | FedAvg | FedHP | Δ |
|---|---|---|---|
| 3 | 31.21 ± 0.10 / 0.8959 ± 0.0017 | 31.35 ± 0.10 / 0.9033 ± 0.0014 | 0.14 dB / 0.0074 |
| 4 | 31.06 ± 0.10 / 0.8955 ± 0.0018 | 31.33 ± 0.13 / 0.9023 ± 0.0018 | 0.27 dB / 0.0068 |
| 5 | 31.05 ± 0.10 / 0.9025 ± 0.0014 | 31.32 ± 0.19 / 0.9029 ± 0.0019 | 0.27 dB / 0.0004 |
| 8 | 31.17 ± 0.10 / 0.9033 ± 0.0014 | 31.42 ± 0.11 / 0.9043 ± 0.0010 | 0.25 dB / 0.0010 |

Table R1: Comparison between FedAvg and FedHP with different numbers of clients. Each cell reports PSNR (dB) / SSIM; the last column denotes the performance gap.

As shown in Table R1, the proposed method can achieve a consistent performance boost over FedAvg at a larger number of clients.

Weakness 2: Is it possible to apply the proposed method to other hyperspectral image datasets?

A2: We perform experiments on another hyperspectral dataset [1] with 24 spectral channels and the number of clients C=3. As Table R2 below shows, the proposed method enables a performance boost over FedAvg.

| Metrics | FedAvg | FedHP |
|---|---|---|
| PSNR | 29.97 ± 0.31 | 30.55 ± 0.20 |
| SSIM | 0.8442 ± 0.0026 | 0.8471 ± 0.0035 |

Table R2: Comparison between FedAvg and FedHP on a different hyperspectral dataset [1].

Weakness 3: It seems that FedGST, a competitive method used for comparison, was originally a centralized learning strategy; is this a fair comparison, and what modifications were made to this method? Please provide more details.

A3: Thanks for the valuable suggestion!

GST [2] is a centralized learning strategy for handling various hardware masks from the same distribution. In this work, we insert this model directly into the federated framework to enable hardware cooperation, and we have added more details about this method to the manuscript. Since all of the methods are compared under the federated learning framework and adopt the same hardware instances, the comparison is fair. In fact, since GST adopts a self-tuning network in addition to the reconstruction backbone, it requires more training cost to converge.

Question 1: Is the dataset split of the centralized learning the same as the federated learning? Please provide more illustrations and details.

A4: We split the training data according to the number of clients for federated learning. We keep the total amount of training data the same for both centralized learning and federated learning, for a fair comparison. We have put the above illustration into the manuscript.
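A minimal sketch of such a split, assuming a uniform random partition (the paper's exact scheme may differ):

```python
import numpy as np

def split_for_clients(num_samples, num_clients, seed=0):
    """Partition the full training set into disjoint client shards.
    Centralized training uses all indices at once; federated training uses
    the shards, so the total amount of training data is identical."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(num_samples), num_clients)

shards = split_for_clients(900, num_clients=3)
assert sum(len(s) for s in shards) == 900  # same total as centralized training
```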

Question 2: There are some typos in the manuscript, for example, in the Fig. 3 caption.

A5: Thanks for the useful comments! We have fixed the typos in the Fig. 3 caption.

[1] l-net: Reconstruct hyperspectral images from a snapshot measurement. ICCV 2019.

[2] Modeling mask uncertainty in hyperspectral image reconstruction. ECCV 2022.

Comment

Thanks for the authors' response; all my concerns have been well addressed. I have no further comments.

Comment

We appreciate the reviewer's approval for our response and recognition of our work!

Review
Rating: 8

This paper has studied a new problem for snapshot compressive imaging (SCI) by optimizing a cooperative network across different hardware configurations (coded apertures). A new hardware prompt learning module has been proposed and integrated into the FedAvg algorithm to enable co-optimizing multiple hardware instances and the global model for a computational imaging task. Extensive experimental results were provided on simulated and real data, compared with several federated baselines.

优点

1) It is interesting and practical to leverage a federated learning framework to address hardware shifts across different systems while preserving the privacy of each system’s local data. Plus, the paper has collected data from multiple real hardware systems to empirically validate the proposed method.

2) The proposed hardware prompt is a novel and efficient solution to mitigate data heterogeneity for developing deep SCI models in a federated learning framework, especially to enable co-optimizing multiple hardware instances and a global model across systems. A detailed ablation study has also been provided to clearly show the improvement given by this prompt design.

3) A multi-hardware dataset has been collected and built for this new problem, which could broadly benefit the SCI community. Extensive experimental results on multiple settings were provided in terms of both quantitative and qualitative evaluation.

4) Several state-of-the-art federated learning methods have been adapted to a computational imaging task and included in the experimental comparison.

Weaknesses

1) While federated learning is a good choice, it remains unclear if the proposed problem setting could be directly solved by other simple solutions, such as meta learning or deep ensembles.

2) Despite the large improvement given by the hardware prompt, the paper lacks further analysis of how this design works for different hardware. For example, will different hardware lead to different prompts? What do these “hardware prompts” look like? Is the prompt network only implemented by an attention block?

Questions

1) What are the benefits of introducing adaptors? Why not directly update the full model?

2) What’s the main reason for setting C=3 in the experiment?

3) In Eq. (9), is there any other way to impose a prompt on the measurements? For example, can a concatenation operation be applied?

4) It would be better to directly explain the settings of different hardware shifts in the captions of Tables 1/2.

Comment

We appreciate that Reviewer UbMX finds our method novel, addressing a practical problem and providing a new multi-hardware dataset!

Weakness 1: While federated learning is a good choice, it remains unclear if the proposed problem setting can be directly solved by some other simple solutions, such as meta learning or deep ensemble.

A1: One key reason we adopt federated learning is to address the privacy concern. To the best of our knowledge, both meta learning and deep ensembles are centralized learning strategies that require access to all of the data during training, which makes it hard to perform hardware cooperation among different institutions.

Besides, we additionally conduct a new experiment using meta learning. Specifically, we integrate MAML [1] into the federated learning framework, termed FedMAML. We perform experiments using the same setting as Table 1, such that #clients = 3. As shown in Table R1 below, FedMAML gives limited performance compared with FedAvg and the proposed FedHP.

| Metrics | FedMAML | FedAvg | FedHP |
|---|---|---|---|
| PSNR | 29.00 ± 1.44 | 31.21 ± 0.10 | 31.35 ± 0.10 |
| SSIM | 0.8532 ± 0.0304 | 0.8959 ± 0.0017 | 0.9033 ± 0.0014 |

Table R1: Comparison among FedMAML, FedAvg, and FedHP.
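For reference, one way MAML-style adaptation can be plugged into a federated round, sketched with the first-order (FOMAML) approximation; this reflects our reading of such a setup, not necessarily the exact FedMAML implementation used above:

```python
import copy
import torch
import torch.nn.functional as F

def fomaml_client_grads(global_model, support, query, inner_lr=1e-3):
    """One client's contribution in a federated MAML round (FOMAML sketch):
    adapt a copy of the global model on the support batch, then return the
    query-loss gradients at the adapted weights for server-side averaging."""
    model = copy.deepcopy(global_model)
    inner_opt = torch.optim.SGD(model.parameters(), lr=inner_lr)
    x_s, y_s = support
    inner_opt.zero_grad()
    F.mse_loss(model(x_s), y_s).backward()
    inner_opt.step()                        # inner adaptation step
    x_q, y_q = query
    model.zero_grad()
    F.mse_loss(model(x_q), y_q).backward()  # outer loss at adapted weights
    return [p.grad.detach().clone() for p in model.parameters()]

# Toy usage with a linear "reconstructor":
net = torch.nn.Linear(8, 8)
batch = lambda: (torch.randn(4, 8), torch.randn(4, 8))
grads = fomaml_client_grads(net, batch(), batch())
```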

Weakness 2: It lacks further analysis of how this design works for different hardware. For example, will different hardware lead to different prompts? What do these “hardware prompts” look like? Is the prompt network only implemented by an attention block?

A2: Thanks for the detailed suggestion!

There is only one prompt network, learned as a function that handles different input masks, as shown in Fig. 2. Thus, different input hardware will lead to different prompts, which have the same dimensionality as the input mask but different pixel values. The prompt network only contains one attention block, which, per our observation, can effectively cooperate among clients.
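A sketch of what such a single-attention-block prompt network could look like; the layer widths and embedding layout here are our assumptions, with only the one-attention-block design and mask-shaped output taken from the response:

```python
import torch
import torch.nn as nn

class HardwarePromptNet(nn.Module):
    """Map a coded-aperture mask to a prompt of the same spatial size
    using a single self-attention block (illustrative sizes)."""
    def __init__(self, dim=32, heads=4):
        super().__init__()
        self.embed = nn.Conv2d(1, dim, 3, padding=1)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.out = nn.Conv2d(dim, 1, 3, padding=1)

    def forward(self, mask):                                  # (B, 1, H, W)
        b, _, h, w = mask.shape
        tokens = self.embed(mask).flatten(2).transpose(1, 2)  # (B, HW, dim)
        attended, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + attended)
        feats = tokens.transpose(1, 2).reshape(b, -1, h, w)
        return self.out(feats)                                # (B, 1, H, W)

prompt = HardwarePromptNet()(torch.rand(2, 1, 32, 32))
assert prompt.shape == (2, 1, 32, 32)  # prompt matches the mask's dimensionality
```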

Question 1: What are the benefits of introducing adaptors? Why not directly update the full model?

A3: Directly updating the full model causes cumbersome computational and communication costs. As exemplified by Table 3, directly learning client backbones from scratch under a federated framework (FedAvg) can result in 14 times the training time. Together with the prompt network, the adapter helps with efficient fine-tuning of pre-trained backbones. As shown in Table 3, the adapter brings a 0.016 dB improvement in PSNR and a 0.0037 boost in SSIM.
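A minimal sketch of the adapter idea on a toy backbone: freeze the pre-trained weights and train only a small residual bottleneck, which is all that needs to be communicated. The bottleneck form is a common adapter convention and not necessarily the paper's exact design:

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Lightweight residual bottleneck adapter (illustrative)."""
    def __init__(self, dim, bottleneck=8):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))

backbone = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))
for p in backbone.parameters():
    p.requires_grad = False            # pre-trained backbone stays frozen

model = nn.Sequential(backbone, Adapter(64))
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable}/{total} parameters")  # only the adapter trains
```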

Question 2: What’s the main reason for setting C=3 in the experiment?

A4: Thanks for the valuable question!

We previously collected 5 different real hardware instances. Considering the computational cost and our limited resources, we adopt 3 clients in the main table. In Table 4a, we also report results under more clients, such as C=4 and C=5. We are still working on collecting more real hardware.

Question 3: In Eq (9), is there any other way to impose a prompt on the measurements? For example, can the concatenation operation be applied?

A5: Thanks for the insightful question!

Directly using concatenation would make the dimensionality inconsistent with the backbones, so it cannot be used. Besides, concatenation has no learnable module, which lacks the flexibility to handle different data distributions.
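A toy shape check of this argument (tensor sizes are hypothetical, and Eq. (9)'s exact form is not reproduced here):

```python
import torch

x = torch.rand(2, 28, 64, 64)        # measurement features: (B, C, H, W)
prompt = torch.rand(2, 1, 64, 64)    # mask-shaped prompt from the prompt net

# An elementwise imposition preserves dimensionality, so a pre-trained
# backbone expecting C=28 channels can consume the result unchanged.
aligned = x * prompt                 # broadcasts over channels: (2, 28, 64, 64)

# Concatenation changes the channel count and has no learnable module,
# so the backbone's input specification would be violated.
concat = torch.cat([x, prompt], dim=1)
print(aligned.shape, concat.shape)   # (2, 28, 64, 64) vs (2, 29, 64, 64)
```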

Question 4: It would be better to directly explain the settings of different hardware shifts in the captions of Tables 1/2.

A6: Thanks for the useful suggestion!

We have added more explanations of the settings of different hardware shifts to the captions of Tables 1 and 2. Specifically, for Table 1, we provide the explanation: for different clients, we sample non-overlapping masks from the same mask distribution to train the model and use unseen masks randomly sampled from all clients for testing. For Table 2: masks from each client are sampled from a specific distribution for training, and we randomly sample non-overlapping masks (unseen during training) from all distributions for testing.

[1] Model-agnostic meta-learning for fast adaptation of deep networks. ICML 2017.

Comment

Thanks for your responses. My concerns have been well addressed, and thus I have no further questions.

Comment

We appreciate the reviewer's recognition of our response and support for our work!

Review
Rating: 6

Motivated by the recent success of Federated Learning (FL) and Prompt Tuning, this paper proposes a deep neural network framework, named FedHP, that can take into account diverse sensor acquisitions for spectral snapshot compressive imaging (Spectral SCI). The primary distinction from existing FL methods lies in the inclusion of a measurement enhancement network that considers both the degraded observations and the physical forward-model pattern across clients. Experimental results demonstrate the effectiveness of FedHP on both a simulation dataset and a real-world SCI dataset.
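For readers outside SCI, a sketch of a standard CASSI-style forward model, which shows how tightly the measurement is coupled to the coded aperture (shear step, direction, and noise vary across real systems):

```python
import numpy as np

def cassi_forward(cube, mask, step=1):
    """Code each spectral band with the aperture mask, shear it spatially,
    and sum into a single 2D snapshot measurement (noise omitted)."""
    c, h, w = cube.shape
    meas = np.zeros((h, w + step * (c - 1)))
    for i in range(c):
        meas[:, i * step : i * step + w] += mask * cube[i]
    return meas

cube = np.random.rand(28, 64, 64)                    # 28-band hyperspectral scene
mask = (np.random.rand(64, 64) > 0.5).astype(float)  # binary coded aperture
y = cassi_forward(cube, mask)
print(y.shape)                                       # (64, 91): one snapshot
```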

Strengths

1) This paper is overall well written and easy to follow. It clearly introduces the motivation and problem formulation, making the method accessible to non-SCI experts.

2) Both simulation and real-world datasets are considered, making a better practical contribution.

3) The experimental comparison is comprehensive, and the baseline methods are up to date.

Weaknesses

1) The technical contribution to more general computational imaging seems to be limited, or at least not well supported by the paper’s current state.

2) Likewise, the main deep learning technique behind this proposal, FedAvg, is already well known, which makes the technical contribution to the deep learning community also marginal.

3) The idea of using another learning-based module that can consider additional forward-model settings seems not new to model-based deep learning methods for computational imaging. Moreover, it is difficult to evaluate whether the proposed “correction” module is indeed robust to distribution shift. At least, there is no clear evidence presented in this paper.

Questions

1) Figure 1: it is difficult to find differences between 4. FedAvg and 5. FedHP in the method illustration plot.

2) The authors did not discuss much about why their method is robust to the coded-pattern shift, both intuitively and theoretically. What if the new module $\phi$ cannot handle a very new coded aperture $\mathbf{M}$?

Comment

We appreciate that Reviewer 99wV finds our method easy to follow, well presented, and comprehensively compared!

Weakness 1: The technical contribution to more general computational imaging seems to be limited or at least not well supported by this paper’s current state.

A1: Thanks for the valuable comments!

Our key technical contribution is to provide a new multi-hardware optimization framework adapting to hardware shift by only accessing local data. The principle underlying the proposed FedHP can potentially be extended to broad SCI applications. However, due to the practical cost of data acquisition and building optics systems, this study explores one specific direction following previous works in the field, focusing on spectral SCI and collecting optical masks and real data from multiple hardware systems.

Exploiting the hardware collaboration of computational imaging systems is still at an early stage. This work serves as a proof of concept to inspire future endeavors in a more general scope. We list several related applications that might benefit from the proposed method, such as lensless cameras [1], LiDAR [2], HDR cameras [3], or CT reconstruction [4], cooperating multiple imaging systems by aligning their forward models. We have added this discussion to the related work.

Weakness 2: FedAvg, is already well known, which makes the technical contribution to deep learning community also marginal.

A2: Thanks for the useful comments!

We find that FedAvg serves as a very strong baseline, even working better than recent FL methods such as SCAFFOLD and FedProx. Thus, it is non-trivial to achieve a further performance boost. By comparison, FedHP provides encouraging performance.

We kindly summarize the technical contributions of this work as follows:

  • We introduce a hardware prompt network to capture hardware perturbations/replacements.
  • We improve the efficiency of federated learning in SCI, achieving a performance boost at a much lower training time cost.
  • We collect and will release a heterogeneous dataset covering multiple hardware configurations, which, to the best of our knowledge, is the first in SCI.

Weakness 3: The idea of using another learning-based module that can consider additional forward-model settings seems not new to model-based deep learning methods for computational imaging. Moreover, it is difficult to evaluate whether the proposed “correction” module is indeed robust to distribution shift. At least, there is no clear evidence presented in this paper.

A3: Thanks for the valuable suggestion!

Firstly, we compared with a state-of-the-art model-based method, GAP-Net [5], in Table 4b. We find that the proposed method brings a notable performance boost (+0.28 dB / +0.0038 in PSNR/SSIM) with a much smaller model size. Secondly, we provide empirical evidence in Table 2 that the proposed method can handle new masks from distinct distributions. During training, masks from different clients are sampled from different distributions; during testing, we randomly sample non-overlapping masks (unseen to the models) from the different distributions of all clients.

Question 1: Figure 1: it is difficult to find differences between 4. FedAvg and 5. FedHP in the method illustration plot.

A4: Thanks for the useful suggestion!

Fig. 1 aims to show the different settings of the different types of solutions. For federated learning methods, including FedAvg and FedHP, we keep the settings the same to avoid misunderstanding. We have clarified this by modifying the Fig. 1 caption.

Question 2: The authors did not discuss much about why their method is robust to the coded-pattern shift, both intuitively and theoretically. What if the new module $\phi$ cannot handle a very new coded aperture $\mathbf{M}$?

A5: Thanks for the valuable insight!

Intuitively, one of the key reasons why the proposed method can handle coded aperture shifts lies in the design of the hardware prompt learning model. The hardware prompt learning aligns the input data distributions, solving the heterogeneity rooted in the input data space. We have provided the above discussion in Section 3.3.

Besides, in Table 2, we provide results showing that FedHP can handle very new coded apertures $\mathbf{M}$. Specifically, mask distributions from different clients are drastically different (see the distributions shown in Fig. 1 and the supplementary material). The proposed method enables a significant performance boost over the compared methods; for example, FedHP improves over FedAvg by 0.35 dB (Table 2).

[1] A simple framework for 3D lensless imaging with programmable masks. CVPR 2021.

[2] LiDAR-in-the-loop Hyperparameter Optimization. CVPR 2023.

[3] End-to-end high dynamic range camera pipeline optimization. CVPR 2021.

[4] DOLCE: A model-based probabilistic diffusion framework for limited-angle ct reconstruction. ICCV 2023.

[5] Deep unfolding for snapshot compressive imaging. IJCV 2023.

Comment

Thank you to the authors for addressing my concerns regarding the general applicability of this proposal to computational imaging. I've no other questions. I'll increase my score by 1.

Comment

We appreciate the reviewer's valuable comments. We thank the reviewer's recognition of our rebuttal!

AC Meta-Review

This Meta-Review is written by the Program Chairs.

The paper proposes FedHP, a federated learning-based approach for spectral snapshot compressive imaging (SCI) that includes a measurement enhancement network. It has been reviewed by four individuals, receiving mixed assessments, especially after reviewer calibration and downweighting of multiple inflated and non-informative reviews.

There were numerous concerns:

  • Limited Generalizability: Reviewers expressed concerns about the paper's limited contribution to general computational imaging and the marginal novelty in the deep learning community.

  • Uncertainty in Methodology: There is skepticism about the novelty and effectiveness of the learning-based module proposed for model correction, with a lack of clear evidence for its robustness to distribution shifts.

  • Experimentation and Comparison: While comprehensive, the experiments are deemed insufficient in some respects, lacking comparisons with a broader range of methods and datasets.

The author responses addressed these issues somewhat, but ultimately the paper was borderline, and the decision is to reject at this time.

Why Not a Higher Score

See meta-review

Why Not a Lower Score

N/A

Final Decision

Reject