PaperHub
Score: 5.6/10 · Poster · 5 reviewers (min 5, max 6, std 0.5)
Ratings: 5, 6, 6, 6, 5
Confidence: 3.0 · Correctness: 3.0 · Contribution: 2.6 · Presentation: 3.0
NeurIPS 2024

Light Unbalanced Optimal Transport

OpenReview · PDF
Submitted: 2024-05-14 · Updated: 2025-01-16

Abstract

While the continuous Entropic Optimal Transport (EOT) field has been actively developing in recent years, it became evident that the classic EOT problem is prone to different issues like the sensitivity to outliers and imbalance of classes in the source and target measures. This fact inspired the development of solvers that deal with the *unbalanced* EOT (UEOT) problem, a generalization of EOT that allows mitigating the mentioned issues by relaxing the marginal constraints. Surprisingly, it turns out that the existing solvers are either based on heuristic principles or heavyweight, with complex optimization objectives involving several neural networks. We address this challenge and propose a novel theoretically-justified, lightweight, unbalanced EOT solver. Our advancement consists of developing a novel view on the optimization of the UEOT problem yielding a tractable and non-minimax optimization objective. We show that, combined with a light parametrization recently proposed in the field, our objective leads to a fast, simple, and effective solver which allows solving the continuous UEOT problem in minutes on CPU. We prove that our solver provides a universal approximation of UEOT solutions and obtain its generalization bounds. We give illustrative examples of the solver's performance.
Keywords
unbalanced optimal transport · light solver · entropy regularization · generative modeling

Reviews and Discussion

Review (Rating: 5)

This paper proposes U-LightOT, a lightweight solver for the Unbalanced Entropic Optimal Transport (UEOT) problem. This method uses a Gaussian mixture approximation for the potential $v_{\theta}(y)$ and the measure $u_{w}(x)$. This paper proves that under this approximation, the KL divergence to the ground-truth UEOT plan has a tractable form. U-LightOT is evaluated on Gaussian Mixture and Unpaired Image-to-Image Translation tasks.

Strengths

  1. This paper provides a theoretical analysis of the generalization bounds and the universal approximation property for the Gaussian mixture parametrization.
  2. The proposed method is a lightweight solver for the UEOT problem, which requires several minutes of CPU training for the experiment in Sec 5.
  3. This paper is easy to follow.

Weaknesses

  1. The optimization objective and Gaussian Mixture approximation in Sec 4 are similar to [1].

  2. While this paper provides the universal approximation property for the Gaussian mixture approximation, I have concerns about whether this Gaussian mixture parametrization can achieve decent results for more complex distributions, such as in generative modeling within the data space on CIFAR-10.

[1] Korotin, Alexander, Nikita Gushchin, and Evgeny Burnaev. "Light Schrödinger Bridge." ICLR 2024.

Questions

  1. In the Unpaired Image-to-Image Translation task, Table 2 only presents the accuracy of keeping the attributes of the source images. However, since the goal of this task is semantic translation, the accuracy of the target semantics is also required. For example, in the Young-to-Adult task, the accuracy of whether the generated image is indeed an adult image. Could you provide these target semantic accuracy results?
  2. In the Appendix, Tables 5 and 7 show the Frechet distance (FD) between the learned and target measures. I believe this FD metric evaluates whether the semantic translation is successful, at the marginal level. Generally, increasing $\tau$ decreases (improves) the FD metrics in Tables 5 and 7. Could you clarify how the optimal $\tau$ is selected? I am curious because when $\tau$ is overly large, U-LightOT achieves worse accuracy compared to other models in Table 2.
  3. Could you present the FD results (Tables 5 and 7) for the other models?
  4. For the optimal $\tau=500$ in the Man-to-Woman task, Table 6 shows an accuracy of 83.85 at best, while Table 2 shows an accuracy of 92.85. Could you clarify which result is correct?

Limitations

The authors addressed the limitations and broader impact of their work.

Author Response

Thank you for your thorough feedback. Please find the answers to your questions below.

(1) The optimization objective and Gaussian Mixture approximation in Sec 4 are similar to [1].

Our solver can be considered a generalization of the one from [LightSB, 1] in the sense that it subsumes LightSB for a specific choice of $f$-divergences. However, this generalization is not straightforward or direct, since our objective is built on completely different principles:

  1. Our solver is derived by minimizing the $D_{\text{KL}}$ divergence (defined as a discrepancy between positive measures) between the ground-truth plan $\gamma^*$ and its approximation $\gamma_{\theta,\omega}$. This definition of the divergence notably differs from the ordinary definition of $D_{\text{KL}}$ for probability measures used in [1].

  2. We parametrize the entire plan using Gaussian mixtures, while in [1] it is done only for conditional plans. This is an important difference, since in the unbalanced case the marginals of the optimal plan do not coincide with the source and target measures. Our parametrization allows sampling from the left marginal of the UEOT plan and identifying potential outliers in the source measure.
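
For reference, a common definition of the generalized $D_{\text{KL}}$ divergence between positive (i.e., not necessarily normalized) measures, to which the distinction above alludes; the paper's exact definition may differ in details:

$$D_{\text{KL}}(\mu\,\|\,\nu)=\int \log\frac{d\mu}{d\nu}\,d\mu-\mu(\mathcal{X}\times\mathcal{Y})+\nu(\mathcal{X}\times\mathcal{Y}),$$

which reduces to the ordinary KL divergence when $\mu$ and $\nu$ are probability measures, since the last two terms then cancel.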

(2) While this paper provides the universal approximation property for the Gaussian mixture approximation, I have concerns about whether this Gaussian mixture parametrization can achieve decent results for more complex distributions, such as in generative modeling within the data space on CIFAR-10.

In general, methods based on Gaussian parametrization are usually not appropriate for tasks with complex data, e.g., images. We mention this limitation in our paper (lines 256-257, 702-707). Still, our aim was to develop a lightweight unbalanced solver which can serve as a simple and easy-to-use baseline in moderate-dimensional tasks. As expected, we trade the rich parametrization required for high-dimensional tasks for this ease of use.

(3) In the Unpaired Image-to-Image Translation task, Table 2 only presents the accuracy of keeping the attributes of the source images. However, since the goal of this task is semantic translation, the accuracy of the target semantics is also required. For example, in the Young-to-Adult task, the accuracy of whether the generated image is indeed an adult image. Could you provide these target semantic accuracy results?

We are working on this and aim to add the results soon.

(4) In the Appendix, Tables 5 and 7 show the Frechet distance (FD) between the learned and target measures. I believe this FD metric evaluates whether the semantic translation is successful, at the marginal level. Generally, increasing $\tau$ decreases (improves) the FD metrics in Tables 5 and 7. Could you clarify how the optimal $\tau$ is selected? I am curious because when $\tau$ is overly large, U-LightOT achieves worse accuracy compared to other models in Table 2.

In Appendix C (Tables 5, 7), we perform an ablation study of our method to show that it offers a flexible way to select a domain translation configuration (the unbalancedness parameter $\tau$) that allows either for a very good level of preservation of the input objects' properties or for generation of a distribution which closely approximates the target distribution. In that section, we highlighted the parameter which is optimal in the sense that it provides the best (Pareto-optimal) tradeoff between the closeness of the learned translations to the target ones and the ability of the learned latents to keep the features of the input latents. However, the final selection of the optimal configuration remains at the discretion of the user.
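
For orientation, the role of $\tau$ can be seen from a schematic form of the UEOT objective with soft marginal penalties; this is a generic formulation, and the paper's exact objective (with differential-entropy regularization and general $f$-divergences) may differ in details:

$$\min_{\gamma\ge 0}\ \int c(x,y)\,d\gamma(x,y)-\varepsilon H(\gamma)+\tau\, D_f(\gamma_x\,\|\,p)+\tau\, D_f(\gamma_y\,\|\,q),$$

where $\gamma_x,\gamma_y$ denote the marginals of $\gamma$. A small $\tau$ relaxes the marginal constraints (better preservation of the input objects' properties, looser fit to the target), while $\tau\to\infty$ recovers the balanced EOT problem.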

(5) Could you present the FD results (Tables 5 and 7) for the other models?

We will add the results soon.

(6) For the optimal $\tau=500$ in the Man-to-Woman task, Table 6 shows an accuracy of 83.85 at best, while Table 2 shows an accuracy of 92.85. Could you clarify which result is correct?

The tables provide results for different numbers of training steps and parameters $\tau$. Table 2 shows the accuracy for $\tau=100$ and 5K steps of the algorithm, as specified in Appendix B.3. Table 6 presents the results for only 3K steps and different parameters $\tau$. (Note that the value for $\tau=100$ in Table 6 ($88.59 \pm 0.40$) is close to 92.85, up to the difference in the number of training steps.)

References.

[1] Korotin, Alexander, Nikita Gushchin, and Evgeny Burnaev. "Light Schrödinger Bridge." ICLR 2024.

Comment

In the Unpaired Image-to-Image Translation task, Table 2 only presents the accuracy of keeping the attributes of the source images. However, since the goal of this task is semantic translation, the accuracy of the target semantic is also required. For example, in the Young-to-Adult task, the accuracy of whether the generated image is indeed an adult image. Could you provide this target semantic accuracy results? [...] Could you present the FD results (Tables 5 and 7) for the other models?

As per your request, we provide the accuracy (of mapping to the target) and FD (between learned and target latents) results for our solver and its unbalanced competitors (in Young→Adult translation) in the table below. For completeness, we also include the results for the balanced LightSB [1] solver.

| | Choi et al. [3] | Yang et al. [4] | UOT-FM [2] | LightSB [1] | U-LightOT (ours, $\tau=100$) |
|---|---|---|---|---|---|
| Accuracy (mapping to the target) | 85.36 | 80.32 | 83.27 | 88.14 | 81.14 |
| FD (between generated and target latents) | 13.24 | 11.50 | 10.27 | 24.66 | 27.72 |

The results show that the balanced LightSB solver outperforms the other methods according to the target accuracy. Note that the FD metric is based on the first and second moments of the distributions; therefore, there is a chance that it can provide imprecise results (as possibly happens in the case of LightSB). Our method provides target accuracy results on par with the UOT-FM model (which is the second-best model according to the accuracy of keeping the class, see Table 2). The other unbalanced solvers (Yang et al., Choi et al.) provide better accuracies of mapping to the target and better FD results, but are slightly worse at keeping the attributes (classes) of the source latents. Besides, our solver is simpler and faster than its competitors, especially those from (Yang et al., Choi et al.) which are based on adversarial learning; see the speed-up comparison in our answer to reviewer bWBu (https://openreview.net/forum?id=co8KZws1YK&noteId=akup4rEAWd).

[1] Korotin et al. "Light Schrödinger Bridge", ICLR 2024.

[2] L. Eyring et al. Unbalancedness in neural monge maps improves unpaired domain translation. ICLR, 2024

[3] J. Choi et al. Generative modeling through the semi-dual formulation of unbalanced optimal transport. NeurIPS, 2023.

[4] K. D. Yang et al. Scalable unbalanced optimal transport using generative adversarial networks. ICLR, 2018.

Comment

I appreciate the authors for their clarifications and additional experiments. These have been helpful in addressing my concerns. Hence, I will raise my rating to 5.

Review (Rating: 6)

The proposal focuses on developing a fast solver for unbalanced entropy-regularized optimal transport (EOT) between continuous Radon measures. The authors utilize the dual formulation of unbalanced EOT and use the relationship between the optimal potentials (i.e., the dual variables) and the primal transport plan. They then consider a parameterization of the transport plan and propose to minimize the KL divergence between this parameterized transport plan and the optimal one. Given that the optimal transport plan is unknown, the authors first use the relationship between the primal and dual solutions to reparameterize the dual variables and then derive a tight upper bound for the KL between the optimal plan and the parameterized one, which they propose to minimize. To deal with the normalization terms in their upper bound, the authors use a framework similar to that of Gushchin et al. [29] and assume the reparameterized dual variables are unnormalized Gaussian mixtures; this assumption enables analytic solutions to the otherwise difficult-to-calculate terms in the upper bound. Lastly, the authors provide a generalization error bound for their proposed framework. The paper provides two small-scale numerical examples to demonstrate the solver's efficiency: 1) two-dimensional Gaussian mixtures and 2) unpaired image-to-image translation in the embedding space of an autoencoder, specifically ALAE, on an unbalanced subset of the FFHQ dataset with Adult, Young, Man, and Woman faces.

Strengths

  • The paper is very well written and straightforward to follow.
  • The clever parameterizations used in this paper (while they appear in some prior work) provide a unique approach to solving the UEOT problem between continuous measures.
  • The provided generalization error bounds (while straightforward to derive) are important and certainly add value to the paper.
  • The method is easy to implement and fast to train. Quick convergence on the CPU is a notable achievement unlocked by this work.

Weaknesses

  • One major weakness is that the paper does not discuss how $K$ and $L$, i.e., the numbers of Gaussians in the mixtures, affect the results. The generalization error bound mentions that $K$ and $L$ will appear as constants in the error bound, but the practical implications of the choice of $K$ and $L$ are missing from the paper.

  • Experiments are relatively modest: 2 experiments in low dimensions. It would be beneficial to have an experiment on robustness to outliers, as this is included in the main claim.

  • The paper claims to have a fast solver but lacks a detailed speed comparison for either experiment. It would be great to have a wall-clock comparison of competing methods.

  • The Gaussian mixture assumption limits the method's applicability to only low-dimensional problems, and it is not clear whether this limitation can be overcome.

Questions

  • How does the performance change as a function of $K$ (assuming $L=K$)?

Limitations

Limitations are provided in the appendix.

Author Response

Thank you for your thorough feedback. Please find the answers to your questions below.

(1) The paper does not discuss how K and L, i.e., the number of Gaussians in the mixtures, affect the results. The generalization error bound mentions that K and L will appear as constants in the error bound, but the practical implications of the choice of K and L are missing from the paper. [...] How does the performance change as a function of $K$ (assuming $K=L$)?

To address your question, we perform additional experiments (both on Gaussians and in the latent space of the ALAE autoencoder) with our solver, varying the number of Gaussian modes ($K$, $L$) in the potentials.

The setup of this experiment follows the setup introduced in our Section 5.1. We test our solver with a diverse number of modes in the potentials, $K\in\{1,3,5\}$ and $L\in\{1,2,3,4,5\}$. The results are visualized in Fig. 1 of the attached PDF file. It can be seen that for an insufficient number of modes in the potentials, the solver exhibits issues with convergence and does not correctly solve the task.

(2) It would be beneficial to have an experiment on robustness to outliers, as this is included in the main claim.

Thank you for this valuable suggestion. We conduct the experiment on Gaussian mixtures with added outliers and visualize the results in Fig. 2 of the attached PDF file. The setup of the experiment generally follows the Gaussian mixtures experiment setup described in Section 5.2 of our paper; the difference is that outliers (small Gaussians) are added to the input and output measures. The results show that our U-LightOT solver successfully eliminates the outliers and manages to simultaneously handle the class imbalance issue. At the same time, the balanced LightSB [4] solver fails to deal with either of these problems.

(3) [...] speed comparison for either experiment. It would be great to have a wall-clock comparison of competing methods.

Thank you for the suggestion. We compared the running time of our algorithm and its unbalanced competitors on the image translation task (Adult→Young), whose setup is described in Section 5.2 of our paper. The results for all of the methods (wall-clock times for 10k updates) are presented in the table below. We omit the results for the other translation directions since they are quite similar to the results in the table.

As you can see, our proposed solver outperforms its competitors (unbalanced methods) in terms of convergence time.

| | U-LightOT | UOT-FM [1] | Yang et al. [3] | Choi et al. [2] |
|---|---|---|---|---|
| Time | 02:38 | 03:21 | 16:30 | 18:11 |

(4) The Gaussian mixture assumption limits the method's applicability to only low-dimensional problems, and it is not clear whether this limitation can be overcome.

In general, methods based on Gaussian parametrization are usually not appropriate for tasks with complex data, e.g., images. We mention this limitation in our paper (lines 256-257,702-707). Still, our aim was to develop a lightweight unbalanced solver which can serve as a simple and easy-to-use baseline in moderate-dimensional tasks. As expected, we get this ease in exchange for the rich parametrization required for large-dimensional tasks and vice versa.

Concluding remarks. Please respond to our post to let us know if the clarifications above suitably address your concerns about our work. We are happy to address any remaining points during the discussion phase; if the responses above are sufficient, we kindly ask that you consider raising your score.

References.

[1] L. Eyring et al. Unbalancedness in neural monge maps improves unpaired domain translation. ICLR, 2024

[2] J. Choi et al. Generative modeling through the semi-dual formulation of unbalanced optimal transport. NeurIPS, 2023.

[3] K. D. Yang et al. Scalable unbalanced optimal transport using generative adversarial networks. ICLR, 2018.

[4] Korotin et al. "Light Schrödinger Bridge", ICLR 2024.

Comment

I appreciate the authors' extensive responses and clarifications.

The experiment on varying $K$ and $L$ is particularly insightful, as it demonstrates that even in a simple toy problem, the method's performance is highly sensitive to the appropriate selection of these hyperparameters. I believe the paper would benefit from reporting the method's results on large-scale experiments across a range of $K$ and $L$ values, perhaps in the supplementary material.

I also appreciate the wall-clock performance data provided by the authors. A more rigorous analysis of the wall-clock time, considering different sample sizes and varying $K$ and $L$ values on toy datasets, could further enhance the paper's practical value to the community.

Overall, I find this paper well-written, easy to follow, novel, and of potential interest to the community. The strengths of the paper outweigh the weaknesses, and I am increasing my score to Weak Accept.

Review (Rating: 6)

This work focuses on the largely computationally intractable dual form of unbalanced OT, where neural networks are used as proxies (potentials) to approximate Wasserstein distances. In this work, the authors set out to significantly simplify this optimization procedure by decomposing the joint optimal solution into conditionals, which allows for both easier inference and a reduction in the number of parameters required. Experimental results are then presented to show the success of this method beyond the improved efficiency.

Strengths

[+] The efficiency gains seem quite strong, providing an effective way to reduce the overall number of parameters required to approximate OT distances.
[+] The theoretical results are sound and well motivated.
[+] A generalization bound is also presented, attesting to the soundness one achieves with this light variation.

Weaknesses

[-] Appears to be specific only to the case of having KL divergence penalties for the mass constraints.
[-] The paper can appear a bit difficult and dense to read.

Questions

(1) The reduction you get appears to have (perhaps even superficially) some relationship to the way WAE decomposes the coupling into conditionals. More coincidentally, WAE also uses conditional Gaussians to parametrize the encoder distribution, although for different purposes. Do you have any comments if there is any deeper link here?

(2) Do you have any intuition if one were to use other penalties beyond KL to enforce the mass constraint?

Limitations

Yes

Author Response

Thank you for your thorough feedback. Please find the answers to your questions below.

(1) Appears to be specific only to the case of having $D_{\text{KL}}$ divergence penalties for the mass constraints. [...] Do you have any intuition if one were to use other penalties beyond $D_{\text{KL}}$ to enforce the mass constraint?

Our solver admits divergences other than $D_{\text{KL}}$. From the theoretical point of view, we describe the set of admissible divergences in our Appendix C (lines 549-672). Besides, we provide a numerical example illustrating the performance of our solver with the $D_{\chi^2}$ divergence; see Fig. 3 and the description in lines 559-677.
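
For concreteness, a standard definition of the $\chi^2$ divergence between positive measures (assuming $\mu \ll \nu$); the exact definition used in the paper's Appendix C may differ in normalization:

$$D_{\chi^2}(\mu\,\|\,\nu)=\int\left(\frac{d\mu}{d\nu}-1\right)^{2} d\nu.$$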

(2) Paper can appear a bit difficult and dense to read.

We are sorry that you found our work difficult to read. We will try to improve this aspect if you indicate in more detail which points were difficult to understand.

(3) The reduction you get appears to have (perhaps even superficially) some relationship to the way WAE decomposes the coupling into conditionals. More coincidentally, WAE also uses conditional Gaussians to parametrize the encoder distribution, although for different purposes. Do you have any comments if there is any deeper link here?

Thanks for asking. We think that there is no direct link. Indeed, in WAE, the encoder outputs some Gaussian for each input $x$, while in our case, all the conditional Gaussians (more precisely, Gaussian mixtures) are tied together. This means that given one conditional distribution $\gamma_{\theta}(y|x=x_0)$, one can immediately express all the other $\gamma_{\theta}(y|x=x_{\text{other}})$. In fact, the densities of all these conditional distributions are parameterized by a single scalar-valued function $v$; see eq. (8) in our paper. This is achieved thanks to the properties of entropic optimal transport solutions, which we exploited to construct our algorithm.
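
To make the "tied conditionals" point concrete, below is a minimal NumPy sketch (our illustration, not the paper's exact eq. (8)) of how a single unnormalized Gaussian-mixture potential $v$ induces every conditional distribution at once, assuming a quadratic cost and isotropic mixture components:

```python
import numpy as np

def sample_conditional(x, alphas, mus, sigma2s, eps, rng):
    """Sample y ~ gamma(y|x), where gamma(y|x) is proportional to
    exp(-||x - y||^2 / (2 * eps)) * v(y) and
    v(y) = sum_k alphas[k] * N(y; mus[k], sigma2s[k] * I).
    Each Gaussian-times-Gaussian product is again Gaussian in y, so every
    conditional is a GMM whose weights depend on x -- one scalar-valued
    potential v ties all conditionals together."""
    d = x.shape[0]
    # per-component posterior variances and means given x
    s = 1.0 / (1.0 / eps + 1.0 / sigma2s)                         # (K,)
    m = s[:, None] * (x[None, :] / eps + mus / sigma2s[:, None])  # (K, d)
    # x-dependent log-weights: log alphas[k] + log N(x; mus[k], (eps + sigma2s[k]) I)
    log_w = (np.log(alphas)
             - 0.5 * d * np.log(2.0 * np.pi * (eps + sigma2s))
             - 0.5 * np.sum((x[None, :] - mus) ** 2, axis=1) / (eps + sigma2s))
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    k = rng.choice(len(alphas), p=w)  # pick a mixture component
    return m[k] + np.sqrt(s[k]) * rng.standard_normal(d)

# toy usage: a two-component potential in 2D
rng = np.random.default_rng(0)
y = sample_conditional(np.zeros(2), np.array([0.7, 0.3]),
                       np.array([[1.0, 0.0], [-1.0, 0.0]]),
                       np.array([0.5, 0.5]), eps=0.1, rng=rng)
```

Note how changing `x` only reweights and shifts the same components; no per-input network pass is needed, which is exactly the contrast with WAE drawn above.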

Concluding remarks. Please respond to our post to let us know if the clarifications above suitably address your concerns about our work. We are happy to address any remaining points during the discussion phase; if the responses above are sufficient, we kindly ask that you consider raising your score.

Comment

Thank you for responding to my questions, I don't have any concerns after reading the response.

Review (Rating: 6)

The paper presents a novel approach to solving the continuous Unbalanced Entropic Optimal Transport (UEOT) problem. The authors introduce a lightweight, theoretically-justified solver that addresses the challenges of sensitivity to outliers and class imbalance in traditional Entropic Optimal Transport (EOT). The proposed method features a non-minimax optimization objective and employs Gaussian mixture parametrization for UEOT plans, resulting in a fast, simple, and effective solver. The authors provide theoretical guarantees for their solver's performance and apply it to simulated and image data.

Strengths

  • The paper is well-written and easy to follow
  • The paper describes well related literature and clearly motivates the approach / why there is a need for this solver
  • The paper introduces a novel way to solve UEOT problems using Gaussian mixtures, even if the approach was previously used for balanced EOT problems as mentioned by the authors.
  • The authors thoroughly study generalization bounds.
  • The authors consider a wide range of competing methods.

Weaknesses

  • While the authors provide generalisation bounds, it would be helpful to assess the performance of the method on the UEOT plan between Gaussian distributions; see Janati et al., 2020.
  • As mentioned by the authors, the Gaussian mixture approach is likely to work only in low dimensions. It would be interesting to see when it fails, e.g. using the benchmark above.

Questions

  • The authors state that for OT-FM and UOT-FM in the FFHQ dataset, they use a 2-layer feed-forward network with 512 hidden neurons and ReLU activation. Where does this parameterization come from? It seems to be relatively small for a flow matching architecture on images, and does not seem to be the architecture used in the original papers.
  • Why are FID scores not reported for the image translation tasks?

Limitations

The authors have considered the limitations and potential negative societal impact.

Author Response

Thank you for your thorough feedback. Please find the answers to your questions below.

(1) Performance of the method on the UEOT plan between Gaussian distributions, see Janati et al., 2020 [5]

Thank you for your suggestion. Unfortunately, a comparison of our method's solutions with the analytical solutions proposed in [5] is not relevant, since that paper considers a different setup of the UEOT problem. Namely, it derives solutions for the UEOT problem (between Gaussian measures) with $D_{\text{KL}}$ as the entropy regularization instead of the differential entropy used in our paper. We noted the difference between the problem we are considering and the one considered in [5] in our paper; see lines 91-92 and the corresponding footnote.

(2) The Gaussian mixture approach is likely to work only in low dimensions. It would be interesting to see when it fails, e.g. using the benchmark above.

As we explained in the previous answer, the benchmark provided in [5] is not relevant for us, as it considers another UEOT problem.

(3) The authors state that for OT-FM and UOT-FM in the FFHQ dataset, they use a 2-layer feed-forward network with 512 hidden neurons and ReLU activation. Where does this parameterization come from? It seems to be relatively small for a flow matching architecture on images, and does not seem to be the architecture used in the original papers.

It is important to understand here that we run our experiments in the latent space of the ALAE autoencoder, not on the images directly. Accordingly, we adapted the architectures of the neural networks used in OT-FM and UOT-FM to work with latent codes. In this case, architectures such as fully connected neural networks are appropriate.

(4) Why are FID scores not reported for the image translation tasks?

We conduct the image translation experiment in the latent space of the ALAE autoencoder. For this reason, we did not report FID metrics assessing the quality of the generated images but rather focused on the quality of the generated latents, as measured by the Fréchet distance (FD), defined through the difference in means and covariances of the generated and target latents.
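
For clarity, here is a minimal sketch of computing such an FD between two sets of latents, assuming the standard Gaussian Fréchet distance formula (the same one underlying FID); whether the authors use exactly this estimator is our assumption:

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(x, y):
    """FD between Gaussian fits of two latent sets x, y of shape (n, d):
    ||mu_x - mu_y||^2 + Tr(Sx + Sy - 2 (Sx Sy)^{1/2})."""
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    sx = np.cov(x, rowvar=False)
    sy = np.cov(y, rowvar=False)
    covmean = sqrtm(sx @ sy)
    if np.iscomplexobj(covmean):  # discard tiny imaginary parts from sqrtm
        covmean = covmean.real
    return float(np.sum((mu_x - mu_y) ** 2) + np.trace(sx + sy - 2.0 * covmean))
```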

However, to fully address the raised question, we report the FID scores between the generated images (produced by the ALAE decoder from the generated latent codes) and the target image distribution in the table below. The results show that the FID is nearly the same for all of the models under consideration. This supports our intuition that FID is indeed not a representative metric for assessing the performance of models that translate latent codes.

| 10k updates | U-LightOT (ours) | LightSB [1] | UOT-FM [2] | Yang et al. [4] | Choi et al. [3] |
|---|---|---|---|---|---|
| FID | 0.331 ± 0.03 | 0.331 ± 0.03 | 0.331 ± 0.04 | 0.344 ± 0.04 | 0.339 ± 0.03 |

Concluding remarks. Please respond to our post to let us know if the clarifications above suitably address your concerns about our work. We are happy to address any remaining points during the discussion phase; if the responses above are sufficient, we kindly ask that you consider raising your score.

References

[1] Korotin et al. "Light Schrödinger Bridge", ICLR 2024.

[2] L. Eyring et al. Unbalancedness in neural monge maps improves unpaired domain translation. ICLR, 2024

[3] J. Choi et al. Generative modeling through the semi-dual formulation of unbalanced optimal transport. NeurIPS, 2023.

[4] K. D. Yang et al. Scalable unbalanced optimal transport using generative adversarial networks. ICLR, 2018.

[5] H. Janati et al. Entropic optimal transport between unbalanced gaussian measures has a closed form. NeurIPS, 2020.

Comment

I thank the authors for their clarifications, and apologise for having missed this difference explained in lines 91-92. Thus, I increase my score to 6.

Review (Rating: 5)

This paper presents a lightweight solver for Unbalanced Entropic Optimal Transport (UEOT) that does not rely on neural network parametrization. Instead, the authors parameterize the potential functions of UEOT using Gaussian Mixture Models (GMM). This parametrization enables the derivation of a tractable joint coupling. By incorporating the parameterized potential into the dual objective, the authors achieve a simple and tractable loss function. Additionally, the paper provides a universal approximation result for GMM parametrization. Experiments are conducted on toy data (GMM) and image-to-image (I2I) translation.

Strengths

  • The paper proposes a simple and fast UEOT algorithm.
  • The paper justifies the GMM parametrization by presenting generalization bounds.
  • The paper demonstrates applicability to large-scale tasks such as I2I translation when combined with an autoencoder (AE).
  • The paper is well-written, clear, and easy to follow.

Weaknesses

  • The method of parameterizing the potential function using GMMs was already proposed in LightSB [1]. The only change here is the switch to a UOT objective, making the methodological contribution minimal. Aside from the universal approximation result, the theoretical contributions are also limited.

  • The experiments are not comprehensive. First, the experiments are conducted only on face-related data. More diverse datasets should be included. Second, the fairness of the comparisons is questionable. In the I2I experiments, the authors use the ALAE autoencoder, while some of the comparison methods are implemented directly in the image space. All of the comparisons should be implemented in the latent space for fairness. Third, since U-LightOT is implemented on a latent space that captures attributes well, the attribute accuracy is expected to be high. Other than accuracy, more general metrics such as c-FID or FID should be used for comparison. Fourth, there is a lack of ablation studies on the numbers of Gaussian modes $N$ and $M$. This is a very important and likely sensitive hyperparameter; thus, I believe the authors should provide ablation studies on this parameter. Overall, the practical utility of the approach is questionable.

[1] Light Schrödinger Bridge, ICLR, 2024.

Questions

  • How does the performance change when parameterizing very high-dimensional and multi-modal GMM data with fewer or more $N, M$?
  • In the I2I experiments, how does the number of modes in the GMM affect performance?
  • In toy data experiments, does U-LightOT have lower transport plan costs and smaller Wasserstein distances between the target and generated distributions compared to the other methods?

Limitations

Discussed in the Weaknesses section.

Author Response

Thank you for your thorough feedback. Please find the answers to your questions below.

(1) The method of parameterizing the potential function using GMMs was already proposed in LightSB [1]. The only change here is the switch to a UOT objective, making the methodological contribution minimal.

Our solver can be considered a generalization of the one from [LightSB, 1] in the sense that it subsumes LightSB for a specific choice of $f$-divergences. However, this generalization is not straightforward or direct, since our objective is built on completely different principles:

  1. Our solver is derived by minimizing the $D_{\text{KL}}$ divergence (defined as a discrepancy between positive measures) between the ground-truth plan $\gamma^*$ and its approximation $\gamma_{\theta,\omega}$. This definition of the divergence notably differs from the ordinary definition of $D_{\text{KL}}$ for probability measures used in [1].

  2. We parametrize the entire plan using Gaussian mixtures, while in [1] it is done only for conditional plans. This is an important difference, since in the unbalanced case the marginals of the optimal plan do not coincide with the source and target measures. Our parametrization allows sampling from the left marginal of the UEOT plan and identifying potential outliers in the source measure.

(2) Aside from the universal approximation result, the theoretical contributions are also limited.

We partially agree with the reviewer that the proof of our Universal Approximation Theorem (UAT) is the most difficult and intricate among the results obtained in our paper. However, the theoretical contributions of our paper are not limited to this theorem. Our other results include: (1) Theorem 4.1, the derivation of the tractable optimization objective in terms of the $D_{\text{KL}}$ divergence between positive measures; (2) Proposition 4.2, the derivation of the bound for the estimation error of our solver; (3) Theorem A.4, the derivation of the dual form of the UEOT problem with the potentials belonging to the space $C_{2,b}(x)$ of continuous functions bounded by a quadratic polynomial (from both sides) and additionally bounded from above by a constant. The proof of each of these results is non-trivial and requires specialized knowledge in diverse fields of mathematics and statistics.

(3) In the I2I experiments, the authors use the ALAE autoencoder, while some of comparison methods are implemented directly in the image space. All of the comparisons should be implemented in the latent space for fairness.

All of the methods included in the comparison in the image-to-image translation task were implemented in the latent space of the ALAE autoencoder, which is mentioned in the paper; see Section 5.2, line 276. We agree that it might be written more clearly and will additionally emphasize this aspect in the final version of our paper.

(4) Since U-LightOT is implemented on a latent space that captures attributes well, the attribute accuracy is expected to be high. Other than accuracy, more general metrics such as c-FID or FID should be used for comparison.

We conduct the image translation experiment in the latent space of the ALAE autoencoder. For this reason, we did not report FID metrics assessing the quality of the generated images but rather focused on the quality of the generated latents, as measured by the Fréchet distance (FD), defined through the difference in means and covariances of the generated and target latents.

However, to fully address the raised question, we report the FID scores between the generated images (produced by the ALAE decoder from the generated latent codes) and the target image distribution in the table below (Adult→Young translation). The results show that the FID is nearly the same for all of the models under consideration. This supports our intuition that FID is indeed not a representative metric for assessing the performance of models that translate latent codes.

| 10k updates | U-LightOT (ours) | LightSB [1] | UOT-FM [2] | Yang et al. [3] | Choi et al. [4] |
|---|---|---|---|---|---|
| FID | 0.331 ± 0.03 | 0.331 ± 0.03 | 0.331 ± 0.04 | 0.344 ± 0.04 | 0.339 ± 0.03 |

(5a) Lack of ablation studies on the numbers of Gaussian modes $N$ and $M$.

To address the reviewer's concern, we perform additional experiments with our solver, varying the number of Gaussian modes ($N$, $M$) in the potentials.

Gaussian mixtures. The setup of this experiment follows the setup introduced in our Section 5.1. We test our solver with a diverse number of modes in the potentials, $N\in\{1,3,5\}$ and $M\in\{1,2,3,4,5\}$. The results are visualized in Fig. 1 of the attached PDF file. It can be seen that for an insufficient number of modes in the potentials, the solver exhibits issues with convergence and does not correctly solve the task.

(5b) How does the performance change when parameterizing very high-dimensional and multi-modal GMM data with fewer or more $N$ and $M$?

Unfortunately, to assess the performance of our solver in such an experiment with multi-modal GMM data, we need some kind of ground-truth solutions. However, for multi-modal GMM data the solutions are not available, making it hard to perform such an experiment. Following your comment, we qualitatively demonstrated the performance of our solver with a varying number of potential modes $N, M$ for 2-dimensional Gaussian mixtures and quantitatively assessed its performance in the image-to-image translation task; see the answer above.

Concluding remarks. Please respond to our post to let us know if the clarifications above suitably address your concerns about our work. We are happy to address any remaining points during the discussion phase; if the responses above are sufficient, we kindly ask that you consider raising your score.

Comment

References.

[1] Korotin et al. "Light Schrödinger Bridge", ICLR 2024.

[2] L. Eyring et al. Unbalancedness in neural monge maps improves unpaired domain translation. ICLR, 2024

[3] K. D. Yang et al. Scalable unbalanced optimal transport using generative adversarial networks. ICLR, 2018.

[4] J. Choi et al. Generative modeling through the semi-dual formulation of unbalanced optimal transport. NeurIPS, 2023

Comment

In toy data experiments, does U-LightOT have lower transport plan costs and smaller Wasserstein distances between the target and generated distributions compared to the other methods?

To answer this question, we compared our solver with different unbalancedness parameters $\tau\in\{1,10,50,100\}$ and LightSB in an experiment with a mixture of Gaussians. The results are presented in the table below. Note that our solver is designed to solve an unbalanced EOT problem with relaxed boundary conditions. This entails two properties. First, our solver better preserves the properties of the input objects; indeed, it allows for domain translation which preserves object classes even in the case of class imbalance. Second, due to the relaxed boundary condition for the target distribution, the distribution generated by our solver is naturally less similar to the target distribution than for balanced methods.

The above intuitive reasoning is confirmed by the metrics we obtained. Indeed, as the parameter $\tau$ increases and our method becomes more and more similar to balanced approaches, the normalized OT cost $\mathbb{E}_{x\sim p}\, \mathbb{E}_{y\sim \gamma(y|x)} \frac{(x-y)^2}{2}$ between the source and generated distributions increases, and the Wasserstein distance between the mapped $p$ and the target distribution $q$ decreases. This property of our solver was noted in our paper; see Appendix C. The LightSB [1] baseline, which is a purely balanced approach, shows the best quality in terms of Wasserstein distance and the worst in terms of OT cost.

| | LightSB | U-LightOT (ours, $\tau=100$) | U-LightOT (ours, $\tau=50$) | U-LightOT (ours, $\tau=10$) | U-LightOT (ours, $\tau=1$) |
|---|---|---|---|---|---|
| OT cost | 3.952 | 3.931 | 3.874 | 2.913 | 2.023 |
| $\mathbb{W}_2$-distance | 0.088 | 0.091 | 0.138 | 1.107 | 2.044 |

It is important to note that our method offers a flexible way to select a domain translation configuration that allows either for better preserving the properties of the original objects or for generating a distribution closer to the target one. The final selection of the optimal configuration remains at the discretion of the user. At the same time, balanced approaches do not allow making a choice in favor of preserving the properties of the original objects.
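
For reproducibility, a small sketch of how the two metrics above could be estimated from samples; using the POT library (`ot`) for the Wasserstein-2 distance is our assumption, not necessarily the authors' implementation:

```python
import numpy as np
import ot  # POT: Python Optimal Transport

def normalized_ot_cost(x, y):
    """Monte Carlo estimate of E_{x~p} E_{y~gamma(y|x)} ||x - y||^2 / 2,
    given paired samples x[i] and y[i] ~ gamma(y | x[i]) of shape (n, d)."""
    return 0.5 * float(np.mean(np.sum((x - y) ** 2, axis=1)))

def w2_distance(y_gen, y_target):
    """Empirical Wasserstein-2 distance between generated and target samples."""
    cost = ot.dist(y_gen, y_target)  # squared Euclidean cost matrix by default
    a = np.full(len(y_gen), 1.0 / len(y_gen))        # uniform weights
    b = np.full(len(y_target), 1.0 / len(y_target))
    return float(np.sqrt(ot.emd2(a, b, cost)))
```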

Comment

(a) In the I2I experiments, how does the number of modes in the GMM affect performance?

(b) ...since U-LightOT is implemented on a latent space that captures attributes well, the attribute accuracy is expected to be high

As per your request, we perform an ablation study of our U-LightOT solver with different numbers of Gaussian components $N, M$ in the potentials. Similarly to Appendix C of our paper, we run the solver on the Young→Adult translation with 3K steps, $\varepsilon=0.1$, and $\tau=100$. The quantitative results (accuracy of keeping the class, accuracy of mapping to the target, FD of generated vs. target latents) are presented in the tables below.

FD of generated latents (lower is better)

| N \ M | 1 | 2 | 4 | 8 | 16 | 32 | 64 |
|---|---|---|---|---|---|---|---|
| 1 | 29.18 ± 0.05 | 31.43 ± 2.83 | 31.49 ± 1.65 | 31.66 ± 2.36 | 32.59 ± 2.12 | 31.11 ± 2.65 | 33.60 ± 3.40 |
| 2 | 29.38 ± 0.70 | 27.75 ± 0.12 | 27.98 ± 0.30 | 30.15 ± 2.33 | 31.61 ± 2.65 | 30.46 ± 2.71 | 29.60 ± 1.29 |
| 4 | 29.44 ± 1.05 | 28.63 ± 2.18 | 28.78 ± 1.06 | 28.17 ± 1.35 | 28.95 ± 2.21 | 27.94 ± 1.18 | 30.29 ± 2.72 |
| 8 | 30.79 ± 2.67 | 28.55 ± 0.75 | 29.54 ± 3.23 | 27.82 ± 0.89 | 27.10 ± 0.23 | 29.98 ± 0.50 | 27.57 ± 1.03 |
| 16 | 27.15 ± 0.34 | 30.95 ± 1.20 | 27.69 ± 0.66 | 28.71 ± 2.17 | 28.19 ± 0.50 | 27.58 ± 0.78 | 28.31 ± 0.92 |
| 32 | 28.86 ± 1.35 | 30.64 ± 4.05 | 27.36 ± 0.85 | 28.01 ± 0.87 | 27.99 ± 1.59 | 29.52 ± 1.65 | 29.14 ± 1.91 |
| 64 | 30.08 ± 3.63 | 29.12 ± 1.58 | 30.07 ± 2.67 | 30.04 ± 1.05 | 29.15 ± 2.00 | 29.48 ± 2.43 | 29.08 ± 1.53 |

Accuracy of keeping the class (higher is better)

| N \ M | 1 | 2 | 4 | 8 | 16 | 32 | 64 |
|---|---|---|---|---|---|---|---|
| 1 | 87.78 ± 0.62 | 88.11 ± 0.69 | 88.50 ± 0.27 | 88.38 ± 0.30 | 87.97 ± 0.38 | 88.19 ± 0.49 | 88.45 ± 0.37 |
| 2 | 88.65 ± 0.25 | 88.57 ± 0.64 | 87.54 ± 1.08 | 88.62 ± 0.75 | 88.09 ± 0.44 | 88.06 ± 0.39 | 88.75 ± 0.51 |
| 4 | 87.89 ± 0.44 | 87.84 ± 0.39 | 88.22 ± 0.87 | 87.82 ± 0.64 | 88.85 ± 0.70 | 87.25 ± 0.43 | 88.09 ± 0.66 |
| 8 | 88.54 ± 0.95 | 88.52 ± 0.48 | 88.29 ± 0.40 | 88.27 ± 0.37 | 87.93 ± 0.73 | 88.88 ± 0.78 | 87.56 ± 0.75 |
| 16 | 88.38 ± 0.73 | 88.89 ± 0.50 | 88.02 ± 0.45 | 88.19 ± 0.32 | 87.80 ± 0.70 | 87.84 ± 0.62 | 87.94 ± 0.57 |
| 32 | 87.97 ± 0.75 | 88.87 ± 0.59 | 86.99 ± 0.16 | 87.71 ± 0.50 | 87.50 ± 0.44 | 87.71 ± 0.76 | 88.08 ± 0.37 |
| 64 | 87.13 ± 0.73 | 88.23 ± 0.58 | 87.70 ± 0.91 | 87.56 ± 0.52 | 87.99 ± 0.48 | 88.83 ± 0.43 | 88.30 ± 0.59 |

Accuracy of mapping to the target (higher is better)

| N \ M | 1 | 2 | 4 | 8 | 16 | 32 | 64 |
|---|---|---|---|---|---|---|---|
| 1 | 79.09 ± 0.02 | 79.48 ± 0.17 | 79.09 ± 0.72 | 78.85 ± 0.41 | 78.86 ± 0.04 | 79.02 ± 0.31 | 78.00 ± 0.34 |
| 2 | 79.26 ± 0.52 | 79.10 ± 0.66 | 78.95 ± 0.49 | 79.68 ± 0.46 | 79.65 ± 0.29 | 79.44 ± 0.74 | 79.49 ± 0.42 |
| 4 | 78.80 ± 0.71 | 79.27 ± 0.64 | 79.68 ± 0.31 | 79.84 ± 0.50 | 79.16 ± 0.47 | 79.57 ± 0.92 | 78.88 ± 1.34 |
| 8 | 78.41 ± 0.16 | 79.36 ± 0.54 | 78.74 ± 0.20 | 78.80 ± 0.69 | 79.11 ± 0.89 | 78.43 ± 1.01 | 79.06 ± 0.71 |
| 16 | 79.44 ± 0.82 | 79.27 ± 0.40 | 79.21 ± 1.15 | 79.47 ± 0.57 | 79.35 ± 0.63 | 79.70 ± 0.67 | 78.34 ± 1.15 |
| 32 | 78.59 ± 0.42 | 79.55 ± 0.59 | 79.27 ± 0.79 | 78.70 ± 1.09 | 79.37 ± 0.63 | 78.68 ± 0.77 | 79.42 ± 0.77 |
| 64 | 79.11 ± 1.02 | 78.41 ± 0.66 | 79.29 ± 0.36 | 77.91 ± 0.45 | 78.19 ± 1.18 | 80.27 ± 0.96 | 79.60 ± 0.47 |

The results show that in the considered task, our solver provides good performance even for a small number of Gaussian components. This can be explained by the smoothness of the latent representations of data in the ALAE autoencoder.

Comment

I appreciate the authors for their clarifications, particularly regarding the comparison between this work and [1]. Moreover, I thank the authors for the extensive experiments conducted. Based on this, I would like to raise my score to 5.

Author Response

Dear reviewers,

Thank you for your thorough and detailed reviews! We are highly inspired by the fact that you agree on the importance of our theoretical results (Reviewers bWBu, vYvs) and the clarity of our paper (Reviewers WAcu, bWBu, t9a3, nig3), and that you note the efficiency of our solver (Reviewers bWBu, vYvs). We hope that our U-LightOT algorithm will be easy to use in practical applications.

We will incorporate the changes suggested by the reviewers in the final version of our paper. We list the changes below:

(a) Main text: addition of the table with the wall-clock comparison of our U-LightOT solver and its competitors (Reviewer bWBu), plus minor requested clarifications,

(b) Additional experiment in Appendix C: an ablation study of our solver with different numbers of Gaussian modes in the potentials (Reviewers nig3, bWBu),

(c) Additional experiment in Appendix E: a Gaussian mixtures with outliers experiment showing the robustness of our solver to potential outliers (Reviewer bWBu).

Please find the figures for the experiments requested by Reviewers nig3 and bWBu in the attached PDF file.

Please find the answers to your questions below.

Comment

Since the end of the rebuttal period is approaching, we want to thank the reviewers for their time spent on their reviews and subsequent discussion. We are grateful for your interesting and valuable suggestions and will add the changes to the final version of our paper. In addition to the changes listed in the general comment, we will include

  1. (Addition to Appendix C) A table with the OT cost between the source and generated distributions and the Wasserstein-2 distance between the generated and target distributions in the Gaussian mixture experiment (with varying parameter $\tau$);
  2. (Addition to Appendix E) Tables with additional metrics for comparing our method and its competitors (accuracy, FD, FID between the generated latents and the target ones).
Final Decision

The paper introduces a new solver for Unbalanced Entropic Optimal Transport, using a novel parameterization of the dual potential. The reviewers acknowledged that the paper offers theoretical insights into the approximation capabilities of the proposed method and that the authors made a significant effort in the rebuttal to improve some of the numerical experiments. However, all reviewers expressed concerns that the contributions are too incremental, primarily combining existing methodological and theoretical techniques from [39] (and [30], which is the same paper) and relying on the same set of experiments.

The introduction of the paper does not clearly outline the connection to [39] or explain the differences, failing to adequately motivate the relevance of the contribution. As a result, the paper does not clearly convey what makes this generalization to an unbalanced solver both challenging and important. Furthermore, the numerical evaluation of the proposed method’s improvements is mostly qualitative, without new compelling experiments to demonstrate the practical benefits from an applied perspective. Instead of relying on the same set of experiments as in [39], there should have been more innovation in this area, with stronger benchmarking to better highlight the advantages of the method.

For these reasons, I recommend rejecting this paper.