PaperHub
Status: withdrawn
Rating: 7.5/10 (4 reviewers; min 6, max 8, std 0.9)
Scores: 6, 8, 8, 8
Confidence: 3.8
ICLR 2024

Training-free Linear Image Inversion via Flows

OpenReview | PDF
Submitted: 2023-09-20 | Updated: 2024-03-26
TL;DR

We propose a training-free method for image inversion using pretrained flow models trained via flow matching.

Abstract

Keywords
inverse problems, linear image inversion, continuous normalizing flows, flow matching, diffusion models

Reviews & Discussion

Review (Score: 6)

The paper proposes a flow-matching-based (or, equivalently, diffusion-model-based) inverse problem solver. Unlike previous diffusion-model-based approaches, which rely on stochastic samplers, the proposed method is based on ODE integration. The ΠGDM approximation is used to estimate the time-dependent log-likelihood. The method is shown to outperform prior art (ΠGDM, RED-Diff) on some datasets: ImageNet-64/128 and AFHQ-256.

Strengths

  1. To the best of my knowledge, this work is the first to use flow matching models (although they can be thought of as equivalent to diffusion models) to solve linear inverse problems.

  2. The method is easy to understand, given that it builds on the prior approximation of ΠGDM.

Weaknesses

  1. (Limited contribution) As stated in S1, the proposed method boils down to ΠGDM with an ODE sampler, rather than the stochastic DDIM sampler (η = 1.0) that ΠGDM uses. The step size derived from Lemma 2 is also another way of stating what was already shown in DDRM and ΠGDM.

  2. (Choice of test dataset) Most diffusion-model-based inverse problem solvers are tested on two canonical datasets: FFHQ 256×256 and ImageNet 256×256. These two datasets are typically more challenging than the datasets used in this paper. It is hard to be convinced of the strength of the paper when there is no specific reason to deviate from this standard. The method would be more convincing if the same superiority could be shown on these widely used benchmarks.

  3. (Exposition) In the preliminaries, it is unclear why one has to start from conditional diffusion models and conditional flow models when, at the end of the day, unconditional models will be used to try to sample from the posterior distribution p(x_1 | y).

Questions

  1. The authors repeatedly use the term image inversion throughout the work. Instead, I would advise using the term inverse problems in imaging, which is a standard term that has been used for decades.

  2. Is Algorithm 1 different from ΠGDM other than in the fact that it is an ODE sampler? Specifically, assume η = 0.0 in the ΠGDM DDIM sampling. Are the two equivalent in this case, given that the VP-ODE is used?

  3. (Implementation details) It is unclear why the authors trained a class-conditional model, whereas the standard is to use class-unconditional models. The former is known to perform better than the latter.

  4. Why does VP-ODE perform (marginally) better than OT-ODE? The theory of flow matching models would state otherwise.

  5. (pg 9. last paragraph) Did you mean to cite DPS when you mention non-linear observations? PSLD would not be able to solve non-linear inverse problems due to the gluing term used in the paper.

Comment

Key contributions

We identify that there is a gap in the literature for solving inverse problems via flow models. Flow models trained via flow matching naturally consider a broader range of probability paths than diffusion models, and therefore, leveraging them opens the possibility to use optimal transport paths for solving inverse problems.

  • Our primary methodological contribution is Theorem 1, which provides the mathematical expression to correct the vector field v̂(x) of unconditional flow models with affine Gaussian probability paths. This allows us to retrieve the conditional vector field v̂(x, y) that can sample from the correct conditional posterior distribution q(x_t | y).

  • We utilize Theorem 1 to propose an algorithm (see Algorithm 3) for solving linear inverse problems via flows with arbitrary affine Gaussian probability paths. We utilize the ΠGDM correction to estimate the intractable term ∇_{x_t} log q(y | x_t) that arises in our expression for correcting the unconditional vector field.

  • We empirically demonstrate that conditional OT paths outperform diffusion probability paths in solving linear inverse problems on a broad range of datasets (ImageNet 64/128 and AFHQ 256) and tasks (Gaussian deblurring, super-resolution, denoising and inpainting).

  • As pretrained flow models are not widely available, we offer a way to convert between flow models and diffusion models, including, crucially, between arbitrary Gaussian probability paths (Lemma 2). This allows applying our OT-ODE method to diffusion models trained under any affine Gaussian probability path, as demonstrated by our results using our VP-SDE model checkpoint. Algorithm 1 in the main paper outlines our method for solving linear inverse problems via conditional OT probability paths with a pretrained diffusion model. Additionally, Algorithm 2, found in the supplementary, enables the use of pretrained diffusion models for solving linear inverse problems via flows with any affine Gaussian probability path.

In summary, our work offers a unifying algorithm for solving linear inverse problems, applicable to a broad range of flow models trained via flow matching as well as any existing pre-trained continuous-time diffusion models.
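For concreteness, the sketch below illustrates the general shape of such a guided ODE integration: an Euler loop along a conditional path whose vector field is corrected by a likelihood-gradient term. The function names velocity_model and likelihood_grad are hypothetical placeholders, and the snippet deliberately omits the exact correction of Theorem 1 and the ΠGDM approximation; Algorithms 1-3 in the paper specify the actual method.

```python
import torch

def velocity_model(x, t):
    # Placeholder for a pretrained unconditional flow model v_hat(x, t).
    return torch.zeros_like(x)

def likelihood_grad(x, t, y, A):
    # Placeholder for an approximation of grad_{x_t} log q(y | x_t)
    # (e.g., a PiGDM-style estimate); the exact form depends on the chosen path.
    return torch.zeros_like(x)

def sample_conditional_ode(y, A, shape, n_steps=20, t_start=0.2):
    # Early start: begin integration at t_start > 0 rather than at pure noise.
    # Here x is initialized with Gaussian noise purely for illustration.
    x = torch.randn(shape)
    ts = torch.linspace(t_start, 1.0, n_steps + 1)
    for i in range(n_steps):
        t, dt = ts[i].item(), (ts[i + 1] - ts[i]).item()
        v = velocity_model(x, t) + likelihood_grad(x, t, y, A)  # corrected vector field
        x = x + dt * v  # Euler step from noise (t = 0) toward data (t = 1)
    return x

# Example usage with dummy inputs (the placeholders above ignore y and A).
x_restored = sample_conditional_ode(y=torch.zeros(4), A=None, shape=(1, 3, 8, 8))
print(x_restored.shape)
```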

Distinction of Lemma 2 from prior works: Lemma 2 provides a way to translate between arbitrary affine Gaussian probability paths q and q'. This involves two key steps: 1) evaluating the time at which the Signal-to-Noise Ratio (SNR) of the two probability paths q and q' match; 2) using the pretrained flow model/diffusion model to predict x̂_1, where the model uses the converted time from step (1) and the noisy image is also rescaled (see Eq. 9 for the scaling factor).

This is very different from the limited method followed in ΠGDM and DDRM, which converts the VP-SDE (p_t(x_t | x_0) = N(√α_t x_0, (1 - α_t) I)) to the VE-SDE (p_t(x_t | x_0) = N(x_0, σ_t² I)), where it is required that α_t = 1 / (1 + σ_t²) (see Appendix B of the DDRM paper). In other words, with a given VP-SDE model, one can only utilize the past method to convert to a VE-SDE with this particular noise schedule σ_t. This requirement in fact insists that the SNR is equal at time t under the two paths and hence leads to t = t'.

Our approach, on the other hand, does not require this equal-time limitation and works for any arbitrary pair of affine Gaussian probability paths, including, for example, when σ_t² does not satisfy the requirement used by ΠGDM and DDRM. Depending on α_t and σ_t, our method results in different values of t and t' when converting VP-SDE to VE-SDE, and vice versa. Further, conversion between the probability paths of the OT-ODE and the VP-ODE requires our formulation in Lemma 2, which utilizes different times. Lemma 2 applies to any pair of affine Gaussian probability paths, not a particular pair of paths that satisfies t = t' through an unnecessary requirement that the SNRs match at all times.
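As a concrete illustration of the two steps above, here is a minimal sketch of SNR matching between two affine Gaussian paths and the corresponding rescaling of the noisy sample. The schedules, the bisection solver, and the scaling ratio are illustrative assumptions, not the paper's exact formulas (the paper's scaling factor is given by its Eq. 9).

```python
import math

# Illustrative schedules for two affine Gaussian paths q_t(x_t | x_1) = N(alpha_t x_1, sigma_t^2 I).
# Conditional OT path (flow matching): alpha_t = t, sigma_t = 1 - t (sigma_min dropped for brevity).
ot_alpha, ot_sigma = (lambda t: t), (lambda t: 1.0 - t)
# A cosine-style VP-like path, purely for illustration.
vp_alpha, vp_sigma = (lambda t: math.sin(0.5 * math.pi * t)), (lambda t: math.cos(0.5 * math.pi * t))

def snr(alpha, sigma, t):
    return alpha(t) ** 2 / sigma(t) ** 2

def match_time(t, a_src, s_src, a_dst, s_dst, tol=1e-10):
    """Step 1 (in spirit): find t' with SNR_dst(t') = SNR_src(t), by bisection."""
    target, lo, hi = snr(a_src, s_src, t), 1e-8, 1.0 - 1e-8
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if snr(a_dst, s_dst, mid) < target else (lo, mid)
    return 0.5 * (lo + hi)

def rescale(x_t, t, t_prime, a_src, a_dst):
    # Step 2 (in spirit): rescale the noisy sample so its signal scale matches
    # the destination path at the matched time (see the paper's Eq. 9 for its exact factor).
    return x_t * (a_dst(t_prime) / a_src(t))

t = 0.7
t_prime = match_time(t, ot_alpha, ot_sigma, vp_alpha, vp_sigma)
print(f"OT time {t} has the same SNR as VP-like time {t_prime:.4f}")
print("rescaled sample (scalar stand-in):", rescale(1.0, t, t_prime, ot_alpha, vp_alpha))
```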

Comment
  1. Choice of test datasets: Due to ethical and privacy concerns associated with the use of real people's faces in both FFHQ and ImageNet, we opted to use AFHQ, which contains animal faces, as a close relative of FFHQ, together with face-blurred ImageNet. Because we needed to train models from scratch on these alternative datasets with better privacy, we used 128×128 images to save compute. We also include results on AFHQ 256×256. Overall, we include results at three different image resolutions (64×64, 128×128, and 256×256), and our results are consistent across datasets and tasks.

  2. Exposition: Our theoretical derivations require an understanding of what the optimal solution should be for conditional diffusion and flow models, so that we can understand how to convert unconditional diffusion and flow models into meeting those expectations. Our approximation is about reducing the gap between the unconditional model and the conditional model that we would have liked to train. It would not be possible to write about the optimal conditional vector field without this additional notation and background.

  3. Replace the phrase 'image inversion' with 'inverse problems for imaging' in the paper: Thank you for this suggestion. We have updated our paper to use inverse problems for imaging instead of image inversion. As per ICLR policy, we cannot update the title of the paper during the discussion phase, so we leave the original title unchanged.

  4. Implementation details: We believe that the standard of using unconditional models does not correspond well to practice. Image generative models are often conditioned on text as well as other inputs. Our empirical results, by using class-conditional models, are therefore closer to practice which we believe increases their relevance. We did explore unconditional models at lower resolutions, and found similar trends in the results. Finally, all the methods in our paper used exactly the same model checkpoint, and therefore, there was no unfairness.

  5. Performance of VP-ODE against OT-ODE: We find that our approach for solving linear inverse problems with OT-ODE performs better than VP-ODE on both the model checkpoints. We note that it is not completely fair to compare across our two model checkpoints, as one of the models could have been trained for longer. We believe that for a fair comparison, we should only compare different methods for solving inverse problems on the same model checkpoint.

  6. Clarification about the page 9 last paragraph citation: Here, we meant that linear image restoration problems become non-linear when applied to latent diffusion models. It is indeed true that the observations y themselves are still linear in the pixel space, but the observations y can be considered non-linear observations with respect to the latent space used in the latent diffusion. We have updated the paper to make this clarification.

We are happy to answer any follow up questions.

Comment

Thank you for your thoughtful feedback on our work. We have tried our best to address your questions and concerns below. We are certain that our method is sufficiently distinct, and we hope our response clarifies its novelty and significance sufficiently for acceptance.

  1. We first emphasize that our method for solving linear inverse problems, as presented in Algorithm 1, is not ΠGDM with a VP-ODE (DDIM) sampler. Instead, Algorithm 1 outlines our method for solving inverse problems via conditional Optimal Transport (OT) paths with a pretrained diffusion model; we expand upon and enumerate the differences below. As demonstrated by our experiments, images restored via the conditional OT probability path outperform both VP-ODE paths and ΠGDM. We also emphasize that Lemma 2 is not equivalent to the previous results on scaling intermediate noisy images x_t in DDRM and ΠGDM.

Key differences between our method and ΠGDM

  • Different sampling paths for solving inverse problems: Our algorithm employs conditional OT probability paths, as opposed to the VE-SDE/VP-SDE probability paths employed by ΠGDM for solving linear inverse problems. VE-SDE and VP-SDE models are characterized by specific schedules for α_t and σ_t, and these parameters are coupled through the use of a β_t schedule. In contrast, the OT schedule uses functions for α_t and σ_t that cannot be modeled by either VE-SDE or VP-SDE (see the short numeric check after this list). This key distinction, the use of the conditional OT probability path, enables our method to outperform diffusion paths across a wide range of linear inverse problems and datasets, as demonstrated in our paper.

  • Early starting: Our algorithm starts sampling from a time t > 0, while ΠGDM starts at t = 0 (Gaussian noise). As can be seen in Figures 7 and 8 in the supplementary material, early starting is beneficial to our method, but not beneficial to ΠGDM, as can be seen in Figure 28.

  • Reduced hyperparameter tuning: The use of the conditional OT path significantly reduces the amount of manual hyperparameter tuning. The only hyperparameter in our case is the choice of start time, which we keep constant for all tasks and datasets.

    • Our method exhibits more stable performance across a wide range of numbers of function evaluations (NFEs) compared to both VP-ODE and ΠGDM. Further, our method outperforms both VP-ODE and ΠGDM at a smaller number of NFEs, as shown in Figure 6.

    • In contrast to ΠGDM, our method does not incorporate any guidance weights for solving linear inverse problems via conditional OT paths. We find that the guidance weights used in ΠGDM hurt performance on conditional OT paths.

  • Our algorithm focuses on correcting the vector fields of flow models, as opposed to the correction of intermediate noisy images typically done by diffusion models.

  • Experiments with our VP-ODE method, an alternative to our OT-ODE that is also introduced in our paper, could be considered similar to the ΠGDM algorithm with η = 0.0. We point out, though, that ΠGDM explored different values of η in their appendix (see their Tables 6 and 7) and found that η = 1.0 was superior at 100 NFEs, which is the number of NFEs we use for ΠGDM in our paper. This is completely different from our results, where our VP-ODE method performed better than ΠGDM with the recommended η = 1.0 for solving noisy linear inverse problems. So while there are similarities, our VP-ODE method and ΠGDM at η = 0.0 are not the same, contrary to the reviewer's suggestion. Further, we emphasize that our recommended OT-ODE method is even more conceptually distinct and performs even better than our VP-ODE method.

Overall, our algorithm for solving linear inverse problems works for flow models with arbitrary affine Gaussian probability paths (Algorithms 1, 2, and 3), while ΠGDM is specifically designed for VP-SDE/VE-SDE paths.
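As promised above, a short numeric check of the schedule distinction. The cosine VP-style schedule and the σ_min value below are illustrative assumptions, not the exact schedules from the paper: a VP path keeps α_t² + σ_t² = 1 by construction, whereas the conditional OT schedule (α_t = t, σ_t = 1 - (1 - σ_min) t) does not.

```python
import math

sigma_min = 1e-3  # illustrative value
print(" t     OT: a^2+s^2   VP: a^2+s^2")
for t in (0.25, 0.50, 0.75):
    a_ot, s_ot = t, 1.0 - (1.0 - sigma_min) * t                            # conditional OT schedule
    a_vp, s_vp = math.sin(0.5 * math.pi * t), math.cos(0.5 * math.pi * t)  # cosine VP-style, illustrative
    print(f"{t:.2f}   {a_ot**2 + s_ot**2:10.4f}   {a_vp**2 + s_vp**2:10.4f}")
```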

Comment

Reviewer pako,

Since the authors have submitted a rebuttal with detailed answers to your concerns, and your original score diverged from the others, could you please reply to the rebuttal and share your opinion?

Thanks, AC

Comment

I would like to thank the authors for clarifying most of the concerns. I have updated my score accordingly. Apologies for the late update.

Review (Score: 8)

This paper explores the application of flow models to linear inverse problems. It specifically extends recently introduced techniques from diffusion models to flow models and offers a method for converting diffusion models into flow models. The paper also conducts a comprehensive set of experiments to validate these propositions.

Strengths

While many papers have recently concentrated on approaches that utilize diffusion models for inverse problems, there appears to be a noticeable gap in the literature concerning the application of flow models in this context. This paper aims to bridge that gap. The extensive numerical results appear to suggest that their approach outperforms diffusion models, although it's worth noting that, at times, distinguishing differences or identifying major issues in the produced images can be challenging.

Weaknesses

The paper's main weakness lies in its writing. It appears that Section 2, labeled 'Preliminaries', and the rest of the paper are written with the assumption that readers are already well versed in flow-based models trained with flow matching. This might create challenges for readers who are not familiar with this background. Considering the relative novelty of flow matching, particularly in comparison to diffusion methods, it would be beneficial for the authors to provide some background information before transitioning to the conditional case.

Questions

  • Could the authors offer some insight into why having probability paths that are straighter than diffusion paths might be advantageous for reconstruction in inverse problems?

Comment

Thank you for your encouraging comments on our work. We have tried our best to address your questions and concerns below:

  • Additional background on flow matching: We tried to include as much background information on flow matching as possible in the main paper but due to the constraints of space, we were unable to include extensive details. We will update our paper to include additional background information on flow matching in the Supplementary material.

  • Why might straighter probability paths outperform diffusion paths? If we trained a conditional flow matching model, we anticipate that straighter probability paths would lead to simpler training and better sampling. In the training-free case, we were motivated to consider conditional OT probability paths based on this intuition from the training perspective. Aligned with this, we observe that straighter paths might help reduce the number of model evaluations needed for image restoration. As shown in Figure 6 in the Appendix, conditional OT paths have more stable performance than diffusion paths across a wide range of NFEs. Further, we achieve good performance in as few as 20 NFEs and outperform both ΠGDM and VP-ODE sampling.

We are happy to answer any follow-up questions about our work.

Comment

Dear Reviewer 4dCg,

We have updated our paper to include additional background information on flow matching. These details can be found in the Appendix F.

Please reach out to us if you have any further questions.

Comment

Dear authors,

Thank you for the clarifications and for incorporating Appendix F into the paper. In light of these changes, I will revise my initial score.

However, to facilitate a clearer connection between the main text and the added material, I recommend including a reference to Appendix F at the beginning of Section 2. This would enhance the coherence of your paper and assist readers in navigating the supplementary content.

Comment

Dear Reviewer 4dCg,

Thank you for increasing your score. Your feedback is greatly appreciated, and we will include a reference to Appendix F at the beginning of Section 2.

Review (Score: 8)

They propose a training-free method for image inversion using pretrained flow models. Their approach leverages the simplicity and efficiency of Flow Matching models, significantly reducing the need for manual tuning. They draw inspiration from prior gradient correction methods and conditional Optimal Transport paths. Empirically, their flow-based method improves upon diffusion-based linear inversion methods across various noisy linear image inversion problems on high-dimensional datasets.

Strengths

  • Theoretical analysis is conducted for linear corruptions in the context of conditional flow matching.
  • Extensive experiments are performed to validate the proposed method from various perspectives.

Weaknesses

  • I wonder whether the linear inversion in Sec. 3 can handle the wide range of data corruptions found in the real world.

I believe this paper adequately addresses the problem and conducts various experiments. Overall, I think it is a good paper and I would like to accept it. Honestly speaking, I am starting to use it as my new baseline for another submission.

Questions

as above.

Comment

Thank you for your extremely positive feedback on our work! We are excited to know that you are already using our method as a baseline!

As the reviewer has rightly pointed out, it is indeed true that linear inverse problems cannot capture many data corruptions in the real world, and we leave this extension as future work. We note, however, that our method can handle compositions of linear corruptions, as illustrated below. Further, we list potential directions for extending this work to non-linear observations in the conclusion. Finally, we emphasize that a strength of our method is that it is simple and requires essentially no hyperparameter tuning to get good results.
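A tiny illustration of that point (the toy operators below are hypothetical, chosen only to show that a composition of linear corruptions is again a single linear operator):

```python
import numpy as np

# Two toy linear corruptions: a crude "blur" and a 2x downsampling.
rng = np.random.default_rng(0)
n = 16
A_blur = np.eye(n) + 0.5 * np.eye(n, k=1) + 0.5 * np.eye(n, k=-1)  # toy blur matrix
A_down = np.eye(n)[::2]                                             # toy 2x downsampling
A_comp = A_down @ A_blur                                            # composed operator, still one matrix

x = rng.standard_normal(n)
assert np.allclose(A_comp @ x, A_down @ (A_blur @ x))  # y = A2(A1 x) = (A2 A1) x
print("composition of linear corruptions is linear:", A_comp.shape)
```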

Review (Score: 8)

This paper proposes a training-free method for linear image inversion using pretrained flow models. The proposed method has theoretically justified weighting schemes and thus requires less hyperparameter tuning. The authors show the effectiveness of the proposed method on common high-dimensional datasets and compare it to prior diffusion-based linear inversion methods.

Strengths

  1. The paper is clear and well organized. The visualized results are impressive.
  2. The proposed method is well motivated and justified theoretically. Requiring less hyperparameter tuning is an important practical advantage for inverse problems.
  3. The experiments are generally thorough and solid. Results reported on ImageNet and AFHQ demonstrate the effectiveness of the proposed method compared to prior works.

Weaknesses

[Minor]

  1. In the paper, it seems DPS is assumed to have worse performance than ΠGDM as "ΠGDM is improved upon DPS". Empirically, is it also observed that DPS has worse performance than ΠGDM and the proposed method?
  2. In Figure 2, the x-axis does not appear to be continuous, yet adjacent points are connected and interpolated with lines, which might not be intuitive.

Questions

In Figure 5, ΠGDM shows green-dot artifacts; is this common for ΠGDM or a rare case?

Comment

Thank you for your encouraging feedback on our work! We are happy to know that you find our paper well-motivated and theoretically justified, and that our experiments are thorough. We have tried our best to answer your questions below:

  1. DPS works well with a large number of NFEs (~1000) but performs worse than ΠGDM with few NFEs (Tables 10, 11, and 12 in the original ΠGDM paper). As we are working in the regime of few NFEs, we use ΠGDM, which has stronger performance. Furthermore, ΠGDM and DPS are closely related, as their approximations differ by a scaling factor.

  2. We agree that the styling choice for plotting the figures might seem counterintuitive, and bar plots might be a more reasonable choice. We decided to plot it this way because we felt the results are easier to interpret. The corresponding bar plot for Figure 2 would have 13 × 4 = 52 vertical bars in each plot, which can be tricky to interpret. In the current plot, on the other hand, it is relatively easy to see that the red line, which corresponds to OT-ODE, outperforms the other methods by always lying above (or below) the competing methods.

  3. We find that the green-dot artifacts mostly occur in a few images for ΠGDM on noisy inverse problems, but they are relatively uncommon in the noiseless case. We presume that these dots occur because the model samples outside the support of images while solving the inverse problem.

Again, thank you for your thoughtful questions and we are happy to answer any further questions on our work.