PaperHub

NeurIPS 2024 · Spotlight · 4 reviewers
Overall rating: 7.3/10 (individual ratings: 8, 6, 6, 9; min 6, max 9, std 1.3)
Confidence: 4.3 · Soundness: 3.3 · Contribution: 3.5 · Presentation: 3.3

PhoCoLens: Photorealistic and Consistent Reconstruction in Lensless Imaging

Links: OpenReview · PDF
Submitted: 2024-05-03 · Updated: 2024-11-06

Abstract

Keywords
Lensless Imaging; Computational Imaging; Generative Prior

Reviews and Discussion

Review
Rating: 8

This paper addresses the challenge of spatially varying Point Spread Function (PSF) in lensless imaging and introduces a two-stage approach for reconstructing lensless images.

Strengths

This paper is well-written, presenting a detailed statement of the problem, the physical imaging model, an innovative reconstruction method, and thorough experimental results. I believe this paper will significantly impact the fields of both lensless imaging and the broader area of spatially-varying computational imaging, given its contributions to both the imaging model and reconstruction methods. The demonstrated reconstruction results are impressive, showing significant improvements over existing methods.

I believe this paper will make a significant contribution to both lensless imaging and the broader field of spatially varying imaging reconstruction. The proposed learnable spatially varying PSF will provide valuable insights for many other applications.

Weaknesses

The only two weaknesses of this paper are: (1) the lack of real-world experiments, but this is mitigated by the use of a standard lensless imaging dataset for evaluation, and (2) the paper would benefit from incorporating more knowledge and discussion from the optics field to strengthen its soundness.

There are two suggestions to make this paper more robust: (1) Expand Fig. 4(b) and Fig. A1 to illustrate how PSFs change with different fields of view. (2) Expand Fig. 2 and Fig. 4(d) to better demonstrate the significance of considering spatially varying PSFs, possibly including pixel-wise error maps of the reconstruction results.

Questions

  1. Fig. 4(a) should be revised. Light coming from an off-axis angle should not be shown as being ‘refracted’ to produce parallel output light, as this is inconsistent with the proposed spatially-varying PSF model. Instead, I suggest that the author illustrate two point sources: one originating from the axis with output also parallel to the axis (in cyan), and another from an off-axis point with output also at an off-axis angle (in blue).

  2. In Fig. 1(1), vignetting is observed, while the reconstructed images show no signs of vignetting. It would be better to explain in the paper how the vignetting effect was removed.

  3. It would be beneficial to provide more explanations from the optical perspective in Section 3.2 to strengthen the paper's soundness in optics. The paraxial imaging model (spatially invariant convolution) is commonly used to simplify the forward imaging model, but it is inherently inaccurate, particularly for large field-of-view imaging. It would be good to add references to corresponding off-axis wave propagation models, such as “Shifted angular spectrum method for off-axis numerical propagation” by Matsushima, “Modeling Off-Axis Diffraction with the Least-Sampling Angular Spectrum Method” by Wei et al., and “Shifted band-extended angular spectrum method for off-axis diffraction calculation” by Zhang et al.

  4. Spatially varying PSFs have also been used in other applications with improved results. Therefore, I think it is important to add references to these works. For example, “Aberration-aware depth-from-focus” by Yang et al. demonstrated improvement in depth estimation when considering spatially varying PSFs. “Correcting Optical Aberration via Depth-Aware Point Spread Functions” by Luo et al. proposed spatially varying PSFs for optical aberration correction and depth estimation. Additionally, “High-Quality Computational Imaging through Simple Lenses” by Heide et al. proposed spatially varying PSF imaging for simple lenses.

Limitations

The limitations of this paper are well discussed.

Author Response

1. Real-world experiments

The DiffuserCam and PhlatCam datasets utilized in our study are collected from real-world lensless camera prototypes. These datasets are widely recognized and extensively used in lensless imaging literature, providing robust and reliable benchmarks for evaluation. Thus, while our study focuses on standard datasets, they effectively represent real-world scenarios and ensure a comprehensive evaluation of our proposed methods.

2. Discussion from the optics field

We appreciate the reviewer's suggestion to incorporate deeper insights from optics. We agree that a more thorough optical analysis can strengthen our model's foundation. In the revised manuscript, we will provide a detailed derivation of the imaging model mismatch from an optical perspective to enhance the theoretical basis of our approach.

3. Mitigating vignetting in WinnerDeconv outputs

The appearance of vignetting is primarily due to the limitations of the shift-invariant convolution model, which motivates us to propose a spatially-varying formulation. Our method effectively addresses vignetting through two stages. First, the proposed SVDeconv component in the first stage corrects for the inaccuracies of the shift-invariant model, mitigating vignetting artifacts. Second, the subsequent neural network stage learns a natural image distribution that is typically devoid of vignetting. This observation is supported by the fact that even a trained U-Net architecture, as employed in FlatNet, exhibits the ability to reduce vignetting.
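For context, "WinnerDeconv" refers to Wiener deconvolution under the shift-invariant model (the naming is discussed in a later review). A minimal FFT-based sketch is given below; the function name and the noise-to-signal ratio are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def wiener_deconv(y, psf, nsr=1e-2):
    """Shift-invariant Wiener deconvolution in the Fourier domain.

    y:   2-D lensless measurement; psf: 2-D PSF of the same shape (assumed centered).
    nsr: assumed noise-to-signal ratio (illustrative regularizer value).
    """
    H = np.fft.fft2(np.fft.ifftshift(psf))
    Y = np.fft.fft2(y)
    # Wiener filter: conj(H) / (|H|^2 + nsr)
    X = np.conj(H) * Y / (np.abs(H) ** 2 + nsr)
    return np.real(np.fft.ifft2(X))
```

Because the same filter is applied everywhere, any spatial variation of the true PSF (including the fall-off that produces vignetting) is not modeled, which is the mismatch the response above describes.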

4. Revision on Fig. 4

We appreciate the reviewer's suggestion. Figure 4(a) has been revised to accurately depict the light propagation as suggested. The updated figure is presented in Fig. a of the rebuttal PDF.

5. Other comments

Thanks for the suggestions on adding references. We will incorporate the recommended references to enhance the discussion on the theoretical foundations and applications of spatially varying PSFs in the revised manuscript.

Comment

Thanks a lot for your rebuttal.

Review
Rating: 6

This paper proposes a method for reconstructing photorealistic images in lensless imaging. The reconstruction process, which aims to be consistent with observations while achieving photorealism, is based on range-null space decomposition. To accommodate realistic cameras, the method introduces SVDeconv, which learns the deconvolution process with a spatially-varying PSF simultaneously. Additionally, the reconstruction in the null space is performed using a pretrained diffusion model. Quantitative and qualitative evaluations are conducted using two datasets, PhlatCam and DiffuserCam.
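For readers new to the concept, the range-null space decomposition splits a scene x into a component observable through the forward operator A and a component invisible to it. A toy numerical check, with a random matrix standing in for the lensless forward operator (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 256))   # toy forward operator with a nontrivial null space
x = rng.standard_normal(256)         # toy scene

A_pinv = np.linalg.pinv(A)
x_range = A_pinv @ (A @ x)           # range-space content: what the measurement determines
x_null = x - x_range                 # null-space content: invisible to A

assert np.allclose(x, x_range + x_null)
assert np.allclose(A @ x_null, 0, atol=1e-6)  # null-space content leaves the measurement unchanged
```

The method's two stages map onto this split: SVDeconv estimates the range-space part from the measurement, and the diffusion prior fills in the null-space part.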

Strengths

  • For the generally challenging task of reconstructing high-frequency details in lensless imaging, introducing the concept of range-null space decomposition to achieve photorealistic and measurement-consistent image reconstruction is a very rational and technically sound approach.
  • The idea of using a generative approach solely for reconstruction in the null space, rather than relying entirely on generative priors for restoration, addresses the issue of hallucination. This approach showcases originality.
  • The effectiveness is also commendable as it achieves generally good results both quantitatively and qualitatively when compared with various other methods.

Weaknesses

  1. There is a lack of detail regarding the training process, making it difficult to understand correctly. It is unclear whether fine-tuning is performed using the input images at test time, or whether the PSF and parameters are frozen in a pretrained network.
  2. The method of dividing the spatially-varying PSF (SV-PSF) into a 3×3 grid is somewhat naive. In particular, I suspect that the spatial dependency of the PSF is also influenced by the target scene depth, which does not appear to be considered.
  3. The analysis of the results is also insufficient. It is unclear how close the estimated PSF is to the accurately calibrated SV-PSF. The comparison between the reconstructed images using the accurate SV-PSF and the proposed method is not discussed. Additionally, for the dataset used in the evaluation, the inference results of the range-null content are not provided (as shown in Figure A2). The data fidelity in the reconstructed images is also not confirmed.
  • Comment: In Eq. (4), N×N is used, while previously K×K was used.

Questions

If there are any misunderstandings in the weaknesses pointed out, please clarify them.

Limitations

Limitations have been addressed well.

Author Response

1. Details of the training process

In the first stage, SVDeconv is trained using range-space content derived from ground truth images. SVDeconv consists of two main parameter components: a learnable deconvolution kernel initialized with known PSFs, and a U-Net initialized with standard weights without pretraining. Once trained, SVDeconv processes input lensless measurements to estimate the range-space content of training samples. Subsequently, we use this estimated range-space content as the input condition for fine-tuning via null-space diffusion. During diffusion fine-tuning, we utilize a pre-trained diffusion model with frozen weights and only train supplementary conditioning modules, as in StableSR [35], to guide the reconstruction process effectively.
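To make the first-stage component concrete, below is a minimal PyTorch sketch of a spatially-varying deconvolution layer: a grid of learnable Fourier-domain kernels, all initialized from the single calibrated PSF, whose per-kernel outputs are blended per pixel by distance-based weights. Only the 3×3 grid and the PSF initialization come from the authors' description; the class name, the softmax weighting, and the noise term are illustrative assumptions, and the U-Net refinement is omitted.

```python
import torch
import torch.nn as nn

class SVDeconvSketch(nn.Module):
    """Sketch: learnable multi-kernel Wiener deconvolution with spatial blending."""

    def __init__(self, psf, grid=3, nsr=1e-2):
        super().__init__()
        # Every kernel starts from the single calibrated (center) PSF.
        self.kernels = nn.Parameter(psf.repeat(grid * grid, 1, 1))
        self.nsr = nsr
        H, W = psf.shape[-2:]
        # Focus centers of the grid x grid kernels, spread over the image plane.
        cy, cx = torch.meshgrid(
            (torch.arange(grid) + 0.5) * (H / grid),
            (torch.arange(grid) + 0.5) * (W / grid),
            indexing="ij",
        )
        self.register_buffer("centers", torch.stack([cy.ravel(), cx.ravel()], -1))
        yy, xx = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
        self.register_buffer("coords", torch.stack([yy, xx], -1).float())

    def forward(self, y):
        # Wiener-style deconvolution of the measurement with every kernel.
        Hf = torch.fft.fft2(torch.fft.ifftshift(self.kernels, dim=(-2, -1)))
        Xf = torch.conj(Hf) * torch.fft.fft2(y) / (Hf.abs() ** 2 + self.nsr)
        xs = torch.fft.ifft2(Xf).real                                 # (K, H, W)
        # Per-pixel weights fall off with distance to each kernel's focus center.
        d = torch.cdist(self.coords.reshape(-1, 2), self.centers)     # (H*W, K)
        w = torch.softmax(-d / d.mean(), dim=-1).T.reshape(xs.shape)  # (K, H, W)
        return (w * xs).sum(dim=0)

# Usage: SVDeconvSketch(torch.rand(64, 64))(torch.rand(64, 64)) -> (64, 64) tensor
```

A 3×3 grid thus costs nine FFT deconvolutions plus a per-pixel weighted sum, consistent with the accuracy-versus-compute trade-off the authors describe later in the discussion.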

2. Variation of spatially-varying PSF related to the depth

In our lensless camera setup, the scene-to-camera distance (typically around 40cm, practical for everyday capture scenarios) significantly exceeds the sensor size (less than 1cm). At such distances, the spatial variance of the PSF along the depth axis (from 30cm to infinity) is negligible, so the PSF can be treated as equivalent to that of an infinitely distant point source. Our simulations validate this, showing a 0.995 similarity score between the PSFs of point sources at 30cm and 100cm. Such scenarios are widely accepted in the relevant literature [15,17,23,43] and align with the lensless imaging datasets used in our study.

Consideration of depth-dependent spatial variance in the PSF becomes critical only when the scene-to-camera distance is comparable to the camera size, typically between 1cm and 5cm. However, these scenarios require capturing scenes very close to our camera, which is beyond the scope of this paper. Future work will explore these 3D spatially varying PSF effects.

3. Comparison of reconstruction using accurate SV-PSF and our method

We conducted an additional experiment below, which shows that our method achieves comparable performance to SV-deconvolution methods when accurate PSFs are provided.

In real-world lensless camera datasets like PhlatCam and DiffuserCam, accurately calibrated Spatially-Varying Point Spread Functions (SV-PSFs) for different incident angles are typically unavailable. To validate the effectiveness of our proposed method, we simulated a dataset using the simulated lensless camera discussed in our paper, incorporating known SV-PSFs. This dataset comprises 2000 images with 20dB noise. We evaluate five methods for SV-deconvolution: 1) spatially-varying FISTA [a], 2) MultiWinnerNet [b] using a single known PSF, 3) MultiWinnerNet [b] using a 3x3 grid of known PSFs, 4) our method using a single known PSF, and 5) our method using a 3x3 grid of known PSFs. Results in the following table demonstrate that our method achieves comparable performance to SV-deconv methods utilizing accurately calibrated SV-PSFs.

Methods | PSNR | SSIM | LPIPS
Spatially-varying FISTA [a] | 24.19 | 0.787 | 0.288
MultiWinnerNet [b] (1 known PSF) | 24.72 | 0.796 | 0.273
MultiWinnerNet [b] (3x3 known PSFs) | 25.88 | 0.832 | 0.261
Ours (1 known PSF) | 25.47 | 0.811 | 0.265
Ours (3x3 known PSFs) | 26.02 | 0.837 | 0.258

4. Data fidelity in the reconstructed range-space contents

For data fidelity in the reconstructed range-space contents, the results in Tab.2 of the original paper compare various deconvolution methods, showing that our approach achieves superior fidelity quantitatively. Reviewers can also refer to the qualitative examples in Fig. b in the rebuttal PDF for further confirmation.

5. Other comments

Thank you for pointing out the N×N typo; we will revise it.

[a]. Yanny, Kyrollos, et al. "Miniscope3D: optimized single-shot miniature 3D fluorescence microscopy." Light: Science & Applications 9.1 (2020): 171.

[b]. Yanny, Kyrollos, et al. "Deep learning for fast spatially varying deconvolution." Optica 9.1 (2022): 96-99.

Comment

1. Details of the training process

Thanks to the authors for the clarification.

2. Variation of spatially-varying PSF related to the depth

I expect the authors to address in the final version that the spatial dependency of the PSF is negligible in the target application and is out of scope for the paper.

3. Comparison of reconstruction using accurate SV-PSF and our method

The additional results are not convincing for me. Can the authors clarify why the proposed method outperforms the SV-deconv methods utilizing accurately calibrated SV-PSFs? It might be inadequate to simply compare the PSNR of the final results. I believe that additional analyses, such as checking and comparing the range- and null-space contents, would be more informative.

4. Data fidelity in the reconstructed range-space contents

The description for Tab. 2 in the original paper is insufficient. I appreciate the authors providing the range-space content in the rebuttal; it helps to clearly see the improvement achieved by the method.

Comment

Thank you for your response, and we greatly appreciate your insightful comments and advice regarding our work.

1. Comparison of reconstruction using accurate SV-PSF and our method

Our understanding is that the proposed deep-learning-based approach outperforms traditional SV-deconvolution methods like spatially-varying FISTA on lensless imaging because of key differences in the image priors these methods incorporate. It is known that some high-frequency information (null-space content in our paper) is lost in the lensless imaging process. Therefore, even with accurate SV-PSFs, a traditional iterative optimization-based method for inverse imaging can only recover the range space of the original capture, often resulting in over-smoothed outputs. In contrast, a well-trained neural network can learn image priors to recover the original scene with both range-space and null-space content, therefore achieving better results. Similar observations have also been reported in the MultiWinnerNet [b] paper.

To further show the effectiveness of the proposed method in recovering the range-space content, we compare the range-space content of the following methods: 1) spatially-varying FISTA [a], 2) MultiWinnerNet [b] using a single known PSF, 3) MultiWinnerNet [b] using a 3x3 grid of known PSFs, 4) our method using a single known PSF, and 5) our method using a 3x3 grid of known PSFs. Tab. A below shows that our method achieves comparable performance to iterative optimization-based SV-deconv methods utilizing accurately calibrated SV-PSFs.

Table A: Comparison of different methods on range space content reconstruction

Methods | PSNR | SSIM | LPIPS
Spatially-varying FISTA [a] | 30.60 | 0.958 | 0.069
MultiWinnerNet [b] (1 known PSF) | 28.72 | 0.931 | 0.074
MultiWinnerNet [b] (3x3 known PSFs) | 29.84 | 0.965 | 0.052
Ours (1 known PSF) | 29.47 | 0.952 | 0.061
Ours (3x3 known PSFs) | 29.98 | 0.974 | 0.048

We further substantiate our observation by comparing the null-space content recovery capabilities of the above methods, highlighting the contrast between iterative optimization-based approaches and deep-learning-based methods. As illustrated in the table below, optimization-based methods struggle to recover null-space content compared to deep-learning-based methods, even when accurate SV-PSFs are used. Furthermore, the optimization-based method (15 seconds per image) is much slower than the deep learning approach (0.025 seconds per image), and obtaining precise calibration of SV-PSFs for lensless cameras in real-world conditions is very challenging due to uncontrolled environmental light noise.

Table B: Comparison of different methods on null space content recovery

Methods | PSNR | SSIM | LPIPS
Spatially-varying FISTA [a] | 16.92 | 0.392 | 0.553
MultiWinnerNet [b] (1 known PSF) | 22.48 | 0.579 | 0.270
MultiWinnerNet [b] (3x3 known PSFs) | 23.39 | 0.608 | 0.249
Ours (1 known PSF) | 22.94 | 0.594 | 0.265
Ours (3x3 known PSFs) | 23.68 | 0.611 | 0.243

Meanwhile, we greatly appreciate the reviewers for reminding us to compare the range space and null space content for additional insights. This comparison enhances our understanding of the differences between iterative optimization-based methods and deep learning-based methods, while also highlighting our contribution of introducing analysis of lensless imaging through range-null space decomposition.

2. Variation of spatially-varying PSF related to the depth

Thanks for your advice; we will try to extend our work to 3D-SVDeconv scenarios by introducing the 3D coordinates of the focus centers.

3. Data fidelity in the reconstructed range-space contents

We will improve the clarity of the description for Tab. 2 in the original paper.

Review
Rating: 6

This paper proposes a deep learning-based approach for lensless imaging. To address the model mismatch problem, i.e., that simple convolutional models cannot accurately describe the lensless imaging process, the paper introduces a spatially-varying deconvolution module that reweights the deconvolution results from multiple kernels using spatially-varying weights. In addition, a two-stage model based on range-null space decomposition is proposed. In the first stage, a spatially-varying deconvolution network reconstructs the low-frequency content; the second stage uses the output of the previous stage as a condition to guide a pre-trained diffusion model in reconstructing the fine high-frequency image details associated with the null space of the measurement operator. Experiments were conducted on two datasets: PhlatCam and DiffuserCam.

Strengths

  1. Introducing a diffusion model to reconstruct fine details of the high-frequency content associated with the null space of the measurement operator.
  2. To some extent, the spatially-varying deconvolution solves the problem of model mismatch.

Weaknesses

  1. Using a diffusion model to recover null-space image information is not new, e.g., [a] Zero-Shot Image Restoration Using Denoising Diffusion Null-Space Model, ICLR 2023.
  2. There is a lack of experimental or deductive analysis demonstrating the effectiveness or accuracy of using range-null space decomposition to describe the process of lensless image restoration.
  3. The PSFs in lensless imaging usually have a very large size (even larger than the image). Accordingly, the deconvolution kernels should be large. However, the deconvolution kernel size seems small in the first stage.

Questions

  1. Why simulate the PSF that has already been calibrated in the dataset, and what role does the simulated PSF play in the algorithm?
  2. Section 3 mentions that model mismatch is mainly caused by the presence of the incident angle θ. However, in the first stage, the weight calculation only considers distance. Why?
  3. What exactly does "spatially-varying" represent? How is the FoV center of the learnable kernel determined in the weight calculation? What does the distance from point (u, v) to the FoV center exactly mean? An illustration with one figure could facilitate understanding.
  4. How are the size and quantity of the learnable kernels set?

Limitations

The paper mentioned several limitations, such as computational cost due to introducing diffusion-based sampling and two-stage processing, as well as false details introduced by the diffusion model.

One possible limitation not mentioned in the paper is that the two-stage framework may have a robustness issue: compared to an iterative framework, errors from the first stage may affect the accuracy of the second stage. I suggest the authors discuss this issue.

Author Response

1. Comparison with DDNM

Our paper presents notable advancements compared to DDNM, in both reconstruction quality and inference speed. Training-free methods like DDNM depend on an accurate imaging model to recover the null space, but acquiring such a model for lensless imaging is difficult. In contrast, our null-space diffusion also utilizes real-world training data pairs, improving its performance on real captures where the image formation model is imperfect. As demonstrated in Tab. 3 and Fig. 9 of the original paper, our approach consistently outperforms DDNM on real captures. Additionally, DDNM requires 3x more time than ours for reconstruction under the lensless imaging setting. Unlike DDNM, which requires complex calculations at each sampling step to maintain consistency with the original measurement, our feed-forward model minimizes additional computational overhead while delivering superior results. Moreover, we introduce a novel imaging model that enhances the accuracy of estimating range-space content, crucially improving the fidelity of the final reconstruction.
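For reference, the per-step consistency operation that makes DDNM costly is its null-space refinement of the denoised estimate; a toy version, with a generic matrix operator standing in for the lensless forward model (illustrative, not DDNM's actual code):

```python
import numpy as np

def ddnm_refine(x0_pred, y, A, A_pinv):
    """DDNM-style step: keep the model's null-space component of the predicted
    clean image, but enforce the measurement y in range space."""
    return A_pinv @ y + x0_pred - A_pinv @ (A @ x0_pred)
```

Running this (with expensive operator applications) at every diffusion sampling step is what the feed-forward design above avoids.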

2. Effectiveness of range-null space decomposition

In Tab. 4 and Fig. 10 of the original paper, we conduct a comparative analysis of our diffusion model under various conditions. Specifically, for the SVD-OC method, we utilize a similar deconvolution approach to recover the original content, contrasting with our method that incorporates range-null space decomposition. The results consistently demonstrate our method's superiority over SVD-OC, highlighting the clear advantage of integrating range-null space decomposition in enhancing lensless image restoration.

3. Clarification about deconvolution kernel size

The deconvolution kernel size is as large as the PSF of the lensless camera: 1280 × 1480 for the PhlatCam dataset (the same size as the lensless measurement) and 540 × 960 for the DiffuserCam dataset (larger than the original lensless measurement). For DiffuserCam, we employ replicate padding to align the PSF size with the padded measurements, as detailed in our paper. This ensures our method effectively handles the large PSF sizes typical in lensless imaging, maintaining accuracy in image restoration.
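As an illustration of the padding step (the 270 × 480 raw measurement size is our assumption; only the 540 × 960 kernel size is stated above):

```python
import numpy as np

meas = np.zeros((270, 480))  # stand-in for a raw DiffuserCam measurement (assumed size)
pad_h, pad_w = (540 - 270) // 2, (960 - 480) // 2
meas_padded = np.pad(meas, ((pad_h, pad_h), (pad_w, pad_w)), mode="edge")  # replicate padding
assert meas_padded.shape == (540, 960)  # now matches the 540 x 960 deconvolution kernel
```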

4. The function of the simulated lensless camera and PSF

The simulation of the lensless PSF is solely used to demonstrate the mismatch in widely used spatially invariant convolution models. We do not use the simulation in our lensless imaging experiments. For the real-world lensless datasets we used, such as DiffuserCam and PhlatCam, only a calibrated PSF at the center (zero-angle incident light) is available. Therefore, we use this calibrated PSF to initialize all the learnable deconvolution kernels.

5. Relation between the incident angle and the FoV center

In a lensless setup with a 2-dimensional imaging plane, the Huygens-Fresnel principle [9] establishes a direct relationship between the incident angles (θ, ϕ) and the center shift of the PSF, known as the focus center. Specifically, if the PSF center for zero-angle incident light is located at (c_x, c_y), then for incident angles (θ, ϕ) the PSF center approximately shifts to (c_x − d·sin θ, c_y − d·sin ϕ) according to the Fresnel propagation approximation [9], where d denotes the mask-sensor distance. For a detailed derivation, please refer to the PhlatCam paper [5], which we will further elaborate upon in the revised version. Reviewers can also refer to Fig. a in the rebuttal PDF for an illustration. This approach allows us to model the PSF variation across different focus centers on the imaging plane, reflecting the variation due to incident angles.

In our formulation, the center of the learnable kernel corresponds to a specific incident angle as well as coordinates on the imaging plane. We define this specific incident angle as the FoV center of the learnable kernel and the coordinates as the focus center of the kernel. Therefore, the weights can be determined by the distance between the pixel coordinates (u, v) and the focus center.
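In code form, the shift described above is straightforward; the numeric values below are illustrative, not calibration data:

```python
import numpy as np

def focus_center(cx, cy, theta, phi, d):
    """Approximate PSF focus center for incident angles (theta, phi),
    where d is the mask-sensor distance (Fresnel approximation above)."""
    return cx - d * np.sin(theta), cy - d * np.sin(phi)

# Example: 2 mm mask-sensor distance, 10-degree incidence along one axis.
print(focus_center(cx=0.0, cy=0.0, theta=np.deg2rad(10), phi=0.0, d=2.0))
```

The per-pixel weights then depend only on the distance from (u, v) to these focus centers, which is how the incident angle enters the first-stage weight calculation.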

6. Robustness of the two-stage Framework

We appreciate the concern about the potential robustness issues of a two-stage framework. In fact, our training scheme is designed to mitigate errors introduced by inaccurate image formation in the first stage. The range-space reconstruction in the first stage is a deterministic process focused on recovering information directly observable from the lensless measurement, which reduces the risk of introducing artifacts or errors that would hinder the subsequent diffusion process. Additionally, our SVDeconv component is specifically designed to handle the challenges of lensless image reconstruction, improving the accuracy of the first stage. We conducted ablation studies (Tab. 4 and Fig. 10) comparing different intermediate outputs, demonstrating that using the full estimated image reconstruction as input to the diffusion model can indeed amplify first-stage errors, whereas our range-space-based approach alleviates this issue. This is because recovering range-space content is easier than recovering the original content in the first stage, so we achieve better data fidelity and fewer artifacts. Fig. b in the rebuttal PDF also shows the data fidelity of our first stage.

Moreover, iterative frameworks that project the output onto the lensless measurement space generally rely on precise imaging models, which are difficult to establish accurately in the context of lensless imaging; inaccurate modeling may introduce additional errors into the reconstruction process. In addition, iterative methods such as DDNM often require more computational resources.

Given these considerations, our method achieves a balanced trade-off between performance and efficiency.

7. Choice of the number of deconvolution kernels

Please refer to the reply to Reviewer FAra (R1).

Comment

Thanks for the rebuttal, which has addressed most of my comments. I would like to raise the score to Weak Accept. One limitation is the time complexity shown in the response to Reviewer FAra, mainly caused by the use of diffusion models and much higher than that of CNN-based methods. I understand this is a common limitation of existing diffusion-based methods, and I hope that it can be addressed in future work.

Comment

Dear Reviewer Ernw:

Thank you for acknowledging our explanation and raising the initial rating. We would like to respond if you have any further concerns.

Review
Rating: 9

This paper proposes a novel two-stage approach for lensless image reconstruction. The approach ensures data consistency with a spatially varying deconvolution method and enhances photorealism using pre-trained diffusion models. This method outperforms existing techniques in data fidelity and visual quality on PhlatCam and DiffuserCam systems.

Strengths

  1. The biggest strength of this paper lies in the quality of results displayed. The reconstructed images closely resemble the ground truth and are structurally very similar, unlike some of the other comparable methods.
  2. Extensive evaluation has been performed. The method has been tested on two different lensless imaging datasets, which helps build confidence. The method outperforms peers on 6 different metrics: 3 for quality and 3 for photorealism. The evaluation has been conducted on both the range-space recovery and the null-space recovery, demonstrating the merit of this approach.
  3. Great clarity has been provided around the range space - null space decomposition, including mathematical derivations.
  4. The method seems reproducible because of the great details provided in the paper, for both range space and null space reconstruction.

Weaknesses

  1. The paper mentions the usage of a 3×3 grid of PSF kernels for deconvolution. The authors arrive at this choice via experiments with different sizes. However, no mathematical reasoning or intuition has been provided as to why this is a good choice and under what circumstances it will break down. Without this explanation, it is very difficult to reuse the same model for a dataset captured with a different lensless system. More work/reasoning is needed to support this choice of sampling.

Questions

  1. This is minor, but why is Wiener Deconvolution referred to as WinnerDeconv everywhere in the paper?
  2. Although the paper compares the proposed approach with several other methods for qualitative evaluation, it is unclear how these approaches compare when it comes to computational complexity. Typically, there is a trade-off between quality and compute. Therefore I would like to see a comparison of the computational complexity involved in reconstruction.
  3. I would recommend citing the following paper: V. Boominathan, J. T. Robinson, L. Waller, and A. Veeraraghavan, "Recent advances in lensless imaging," Optica 9(1), 1 (2022). This paper gives an overview of the common modulation schemes used for lensless imaging, and I believe your approach is only applicable to phase modulation masks and not to amplitude modulation masks?

Limitations

  1. Although the authors mention that real-time reconstruction is not possible, it is unclear how much time reconstruction actually takes. It would be great if this is explained as well.
Author Response

1. Choice of the number of deconvolution kernels

The rationale for using a 3x3 grid of PSF kernels is based on the assumption that each central PSF effectively represents a region of the field of view (FoV) where PSFs change smoothly, which is typical in lensless settings. The 3x3 choice balances performance and cost: by discretely sampling a small number of FoV centers and their corresponding PSFs, we approximate the continuous PSF across the entire FoV while minimizing computational and memory costs.

Furthermore, opting for an odd number of PSFs (in both width and height dimensions) is recommended when employing a single calibrated PSF at the center of the FoV (θ\theta = 0). This choice ensures that the initialized deconvolution kernel at the FoV center aligns accurately with the calibrated PSF, establishing a reliable starting point for the deconvolution process. In contrast, using an even number of kernels may result in inaccurate initializations across all deconvolution kernels, potentially compromising the effectiveness of the deconvolution. With an odd number, at least one correct initialization can be assured.
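A quick numerical illustration of the odd-versus-even point (grid geometry only; the image size is arbitrary):

```python
import numpy as np

H = 256  # arbitrary image height
for grid in (2, 3):
    centers = (np.arange(grid) + 0.5) * (H / grid)  # evenly spaced focus centers
    aligned = np.any(np.isclose(centers, H / 2))    # does any kernel sit at the FoV center?
    print(f"{grid}x{grid} grid: centers={centers}, center-aligned kernel: {aligned}")
```

With a 3x3 grid, one focus center lands exactly at the FoV center, so that kernel's initialization matches the calibrated PSF; with a 2x2 grid, no kernel does.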

2. Comparison of the computational complexity

Method | Inference time (s)
FlatNet [15] | 0.013
Le-ADMM-U [23] | 0.047
DDNM [37] | 3.447
SVDeconv (first stage) | 0.025
Null-space diffusion (second stage) | 0.781
PhoCoLens (two stages total) | 0.806

We evaluated the computational efficiency of the proposed method on a machine equipped with an Intel(R) Xeon(R) Platinum 8352V CPU @ 2.10GHz and an RTX 4090 GPU. The table above presents the inference time results. Future work will explore optimal trade-offs between computational speed and performance for various application scenarios.

3. Usage of WinnerDeconv

Thanks for the reminder! We agree that we should use the correct term 'Wiener Deconvolution' in the main text to ensure clarity and accuracy, and we apologize for any confusion caused.

4. Phase modulation masks vs. amplitude modulation masks

Our model is applicable to all lensless cameras using a convolutional imaging model, whether they employ phase modulation masks or amplitude modulation masks. However, for amplitude modulation lensless cameras like FlatCam [3], which use a separable imaging model, further exploration is needed to assess the effectiveness of the proposed method.
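For clarity on the distinction, a toy contrast between the two forward models (shapes and matrices are illustrative; the separable form follows the FlatCam formulation):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 64))    # toy scene
psf = rng.standard_normal((64, 64))  # toy PSF

# Convolutional model assumed by our method: y = psf * x (circular conv via FFT).
y_conv = np.real(np.fft.ifft2(np.fft.fft2(psf) * np.fft.fft2(x)))

# Separable model used by amplitude-mask FlatCam [3]: Y = Phi_L @ X @ Phi_R^T.
Phi_L, Phi_R = rng.standard_normal((128, 64)), rng.standard_normal((128, 64))
y_sep = Phi_L @ x @ Phi_R.T
```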

5. Other comments

In the revised version, we will include a citation to the paper 'Recent Advances in Lensless Imaging,' which contributes to our research by offering a comprehensive review of the definition and evolution of lensless imaging.

Author Response

We sincerely appreciate the time and effort dedicated by the ACs and reviewers in evaluating our work. We have carefully considered all comments and suggestions, and our detailed responses can be found in the rebuttal box below.

Final Decision

The paper received highly positive reviews, which became even stronger after the authors' rebuttal. Reviewers appreciated the quality of results (4jsT, pMG3, FAra), the novelty (4jsT, pMG3), the thorough evaluation (4jsT, FAra), and the clear presentation (4jsT, FAra). Although there were initial concerns about missing important details and a lack of analysis, these issues were effectively addressed in the authors' rebuttal.

After carefully reviewing the paper, the reviews, and the authors' rebuttal, the AC agrees with the reviewers' highly positive consensus and therefore recommends the paper for acceptance (spotlight).

For the camera-ready version, the authors should ensure to incorporate all discussions from the rebuttal into the main paper and supplementary materials. The specific changes that need to be implemented are:

1. Improved presentation: Include a discussion from the optics field (4jsT) and a discussion on the variation of the spatially-varying PSF related to depth (pMG3). Revise Figure 4 to accurately illustrate light propagation (4jsT) and improve Table 2 for data fidelity in the reconstructed range-space contents (pMG3).

2. Additional experiment: Add a comparison of reconstruction using the accurate SV-PSF and the proposed method (pMG3).

3. Other corrections: Include additional references (4jsT, FAra) and correct the typo from "Winner" to "Wiener" (FAra).