PaperHub
5.5
/10
Rejected3 位审稿人
最低3最高3标准差0.0
3
3
3
ICML 2025

Unified (Semi) Unbalanced and Classic Optimal Transport with Equivalent Transformation Mechanism and KKT-Multiplier Regularization

OpenReviewPDF
提交: 2025-01-21更新: 2025-06-18
TL;DR

We propose Equivalent Transformation Mechanism with KKT-Multiplier Regularization for solving SemiUOT and UOT

摘要

关键词
Semi-Unbalanced Optimal Transport; Domain Adaptation

评审与讨论

审稿意见
3

The paper proposes an approach of transforming the Unbalanced and Semi-unbalanced Optimal Transport (UOT/SUOT) problem into the classical OT problem. It is done by finding a scheme for proper reweighing of the marginal distributions. After this, the authors propose an approach for solving the discrete UOT/SUOT problems and test in a variety of experiments.

After rebuttal.

The authors have answered my questions. Thus, I update the score.

给作者的问题

  • As far as I understand, you propose a new approach (MROT) for finding the solutions of classic OT problem which, however, uses multipliers ss obtained from the reweighing step. I am wondering, is it possible to MROT for any classic OT problem when reweighing is not needed? If yes, then why you did not perform comparison with other approaches for classic OT?
  • Why you did not perform quantitative comparison of your approach Ent-UOT (Pham et al., 2020) in section 5.3? It seems to be important to understand the performance of your approach.
  • What is the computational complexity of your algorithms? It seems to be bigger than that of the Sinkhorn one.

论据与证据

  • The main claims made in the submission are supported by proofs and experiments. However, in lines 63-67, the authors write that the idea that "we can transform SUOT, UOT problems into classic OT problems by adjusting the weights" gives new insights on understanding of the UOT and SUOT problems. However, I can not agree with this point since the connection between these types of problems was already established in previous works, see (Choi et al., 2023, Theorem 3.3).

  • In section 5.3 it is not clear in which experiment the authors conduct solvers comparison. It seems to be a synthetic experiment but it should be clearly written.

I also have some concerns regarding the experimental evaluation which I give below.

References.

Choi, J., Choi, J., & Kang, M. (2023). Generative modeling through the semi-dual formulation of unbalanced optimal transport. Advances in Neural Information Processing Systems, 36, 42433-42455.

方法与评估标准

The method and evaluation criteria make sense.

理论论述

I have skimmed through the theoretical claims.

实验设计与分析

I have concerns regarding the experimental designs:

  • My major concern is devoted to the limited number of experimental setups considered in the paper. It seems hard to evaluate the quality of the proposed scheme for UOT/SUOT solution in the domain adaptation problem, since here the performance largely relies on many additional factors, e.g., the underlying approach for domain adaptation. I kindly suggest the authors to consider more synthetic examples. For example, synthetic experiments do not cover cases of datasets with outliers where UOT/SUOT-based approaches are usually used. It would be interesting to see how the approach performs in this experiment w.r.t. other approaches.
  • I am also concerned by the overall performance of the proposed approach for the classic OT solution (MROT). The experiments which compare: 1) this MROT approach with other approaches for classic OT solutions, and 2) ETM approach for marginals reweighing plus MROT vs. ETM + other classic OT methods are missing.

补充材料

I have reviewed the appendix of the paper.

与现有文献的关系

The papers proposes a scheme of transforming the SUOT/UOT problem into the classic OT one by adjusting the weights of the marginals' points. It also proposes a new algorithm for classic OT problem solution.

遗漏的重要参考文献

The understanding of connection between the UOT/SUOT and classic OT problem was previously established in (Choi et al., 2023, Theorem 3.3). While this paper do not propose an algorithm to directly estimate the reweighted marginals for discrete measures, I think it is necessary to refer to this theoretical result.

The paper contributes to the field of discrete OT solvers. However, I think that the paper will benefit from stating the difference between discrete and continuous OT/UOT/SUOT solvers and referencing existing works in the field of continuous UOT. For example, the papers listed below are not included in the paper.

K. D. Yang and C. Uhler. Scalable unbalanced optimal transport using generative adversarial networks. In International Conference on Learning Representations, 2018.

F. Lübeck, C. Bunne, G. Gut, J.S. del Castillo, L.Pelkmans, and D.Alvarez-Melis.Neural unbalanced optimal transport via cycle-consistent semi-couplings. arXiv preprint arXiv:2209.15621, 2022.

M. Gazdieva, A. Asadulaev, E. Burnaev, and A. Korotin. Light Unbalanced Optimal Transport. Advances in Neural Information Processing Systems, 37, 2024.

L. Eyring, D. Klein, T. Uscidda, G. Palla, N. Kilbertus, Z. Akata, and F. Theis. Unbalancedness in neural monge maps improves unpaired domain translation. In The Twelfth International Conference on Learning Representations, 2024.

其他优缺点

Strengths. The paper propose a direct algorithm for adjusting the weights in UOT/SUOT problem and, thus, converting it to the classic OT problem.

Weaknesses. The benefits of the proposed approach for classic OT estimation are not clear. The approach should be directly compared with other classic OT ones. See pervious sections for my other concerns.

其他意见或建议

N/A

作者回复
  • Comment 1: The differences/novelty between this paper and Theorem 3.3 in [Choi] should be highlighted.

Response 1: Theorem 3.3 in [Choi] and our proposed ETM differ significantly in several aspects:

(1) Theorem 3.3 mainly considers the continuous case and does not involve the translation invariant term ζζ. As we shown in Appendix M, without ζζ, the transformed marginal probability will not be equal in the practice and therefore [Choi] cannot guarantee SemiUOT/UOT can be transformed into classic OT, making it impractical in the discrete scenario.

(2) [Choi] primarily discusses UOT and does not explore SemiUOT. Our proposed ETM method specifically addresses the discrete case by directly calculating the exact value on dual Lagrange multipliers (e.g., ff and uu in SemiUOT/UOT) for data reweighting and finding ππ, a topic not covered by [Choi].

In summary, our proposed ETM method ensures that the transformed marginal probability is equal in the discrete scenario, making the equivalent transformation practical for further obtaining ππ with KKT regularization.

  • Comment 2: Descriptions in section 5.3 are not clear.

Response 2: We conduct solver comparison on synthetic datasets in Fig.3. That is, we sample 90% data from PXP_X and PZP_Z accordingly while also randomly sampling 10% outlier data for PXP_{X} and PZP_{Z} to create synthetic dataset and conduct the experiments w.r.t. other methods.

  • Comment 3: The domain adaptation results may rely on additional factors.

Response 3: We follow the same framework, loss functions and experimental settings as the famous models JUMBOT/MOT and only replace UOT/SemiUOT solver with ETM as detailed in line 372-379. That is, we control the variants of these additional factors in the experiments.

  • Comment 4: ETM+MROT vs. ETM+other classic OT methods are missing.

Response 4: The results for ETM combined with Entropy and Norm under SemiUOT/UOT scenarios, using 500 synthetic data, are presented below:

Methodττ=0.01ττ=0.1ττ=1ττ=10ττ=100
(SemiUOT) ETM+Entropy0.150.731.311.741.89
(SemiUOT) ETM+Norm0.110.480.790.961.24
(UOT) ETM+Entropy0.100.611.231.571.78
(UOT) ETM+Norm0.080.350.540.710.96

Both ETM+Entropy and ETM+Norm performs poorly when ηG=0η_{G}=0 comparing with the results in Fig.3(c)-(d) due to insufficient guidance from the KKT conditions. It also reflects the visualization results in Fig.6 in our paper.

  • Comment 5: Some similar related work should be added.

Response 5: We will add these references into our main paper with discussions.

  • Comment 6: Is it possible to MROT for any classic OT problem when reweighing is not needed?

Response 6: No, MROT requires multipliers ss via the ETM method for data sample reweighting, tailored for SemiUOT/UOT. The multipliers ss cannot be derived without the reweighing step, rendering MROT incapable of addressing classic OT problems. Determining ss for classic OT problems presents its own challenges and falls outside the scope of this paper. Our paper primarily focuses on solving π\pi for SemiUOT/UOT, and we have provided ample experiments to validate our proposed KKT-multiplier regularization term, which yields more precise matching results.

  • Comment 7: The performance of Ent-UOT should be provided.

Response 7: We provide the experimental results (i.e., time consumption and absolute error) of Ent-UOT in Table A and Table B respectively.

Table A: The time consumption of Ent-UOT where τ=1\tau = 1 with synthetic data.

MethodNN = 100NN = 200NN = 300NN = 400NN = 500NN = 600NN = 700NN = 800NN = 900NN = 1000
ETM-Refine + MROT-Ent0.130.380.791.482.985.036.609.7511.9319.98
ETM-Approx + MROT-Ent0.120.320.581.132.033.014.947.1210.8917.45
Ent-UOT0.110.280.551.082.153.025.847.6810.4517.62

Table B: The absolute error of Ent-UOT with synthetic data.

Methodττ=0.01ττ=0.1ττ=1ττ=10ττ=100
ETM-Approx + MROT-Ent0.050.190.310.500.54
Ent-UOT0.120.641.251.611.83

We can observe that Ent-UOT could lead to coarse output matching results and it further illustrates the efficacy of our proposed ETM method.

  • Comment 8: What is the computational complexity of your algorithms?

Response 8: The computation complexity of ETM-Approx/ETM-Refine with MROT-Ent is O(NM)O(NM) which has the same BigO magnitude as the Sinkhorn algorithm. Empirically, ETM-Approx with MROT-Ent and Ent-UOT (Sinkhorn in UOT) have roughly comparable running times while less absolute error as reported in Response 7 Table A-B.

审稿人评论

Thank you for your answers, you have mitigated most of my concerns. I incorrectly pointed out to the experiment with outliers which is indeed included in the paper. But did you consider synthetic experiment with imbalance of classes in the source and target measures (e.g., in the context of Gaussian mixtures)? The property of dealing with class imbalance is an another nice feature of UOT-based approaches. It seems to be a valid additional experiment justifying the properties of your method.

作者评论

Esteemed Reviewer,

Thank you for your kind message, and valuable comments helping us improve and refine our manuscript.

  • Comment A: Did you consider some synthetic experiments with imbalance of classes in the source and target measures (e.g., in the context of Gaussian mixtures)?

Response A: Sure, our proposed ETM-based method can tackle the class imbalance scenario. Specifically, we conduct the experiments following the settings in [1] (shown in Fig.2 in [1]) where the source and target data distributions (Gaussian mixtures of two uniform distributions with different weights) are defined as PX=2/5U([1,1]×[0.5,1.5])+3/5U([5,6]×[0.5,1.5])P_{{X}} = 2/5 U([-1,1] \times [0.5,1.5]) + 3/5U([5,6] \times [0.5,1.5]) and PZ=3/5U([1,1]×[0.5,1.5])+2/5U([5,6]×[0.5,1.5])P_{{Z}} = 3/5 U([-1,1] \times [-0.5,-1.5])+ 2/5 U([5,6] \times [-0.5,-1.5]), respectively. We first sample N=50N = 50 data for both PXP_{{X}} and PZP_{{Z}} with τ=0.1\tau = 0.1 or τ=0.9\tau = 0.9 and conduct the UOT matching experiments accordingly. (Note that No.1-No.20 data in PXP_X are sampled from U([1,1]×[0.5,1.5])U([-1,1] \times [0.5,1.5]) and No.21-No.50 data in PXP_X are sampled from U([5,6]×[0.5,1.5])U([5,6] \times [0.5,1.5]). Meanwhile No.1-No.30 data in PZP_Z are sampled from U([1,1]×[0.5,1.5])U([-1,1] \times [-0.5,-1.5]) and No.31-No.50 data in PZP_Z are sampled from U([5,6]×[0.5,1.5])U([5,6] \times [-0.5,-1.5]) to setup the synthetic data experiment). The results can be found in the following anonymous link: https://anonymous.4open.science/r/ETM_matching/icml4808_tau_01_match.pdf and https://anonymous.4open.science/r/ETM_matching/icml4808_tau_09_match.pdf.

Here the blue ‘+’ and red ‘x’ denote the source and target samples, respectively. The data distributions are set to be class-imbalanced. From that we can observe: (1) Ent-UOT could only provide coarse and inaccurate matching results. Moreover, Ent-UOT may lead to mismatch when τ=0.9\tau = 0.9. (2) GEMUOT with square norm regularization term can obtain more sparse matching results. However, the output results of GEMUOT are still far away from the ground truth. (3) Our proposed ETM+MROT-Norm can achieve more accurate results meanwhile avoiding mismatch compared with Ent-UOT and GEMUOT, which indicates the efficacy of the proposed method.

Moreover, we collect the absolute error e=i,jπijπij1e = \sum_{i,j} ||\pi_{ij} - \pi^*_{ij}||_1 with 500 data samples on UOT as shown in the following table.

Methodτ=0.1\tau=0.1, N=500N = 500τ=0.9\tau=0.9, N=500N = 500
Ent-UOT0.591.12
GEMUOT0.380.60
ETM-Approx + MROT-Ent0.210.33
ETM-Refine + MROT-Ent0.190.32
ETM-Refine + MROT-Norm0.150.27

We also observe that our proposed ETM-based method can obtain more accurate output results for tackling the class imbalance scenario. We will add these contents into the final version of our paper.

We hope that we addressed mainly of your concerns sufficiently, and if you agree, we would kindly request for updating the review in light of this response. If there is anything else we can answer or explain or discuss further, kindly do let us know.

Kind regards,

Authors

Reference:

[1] Eyring L, Klein D, Uscidda T, et al. Unbalancedness in Neural Monge Maps Improves Unpaired Domain Translation. The Twelfth International Conference on Learning Representations.

审稿意见
3

The paper introduces a new method called the Equivalent Transformation Mechanism (ETM) that computes Unbalanced Optimal Transport (UOT) and Semi-Unbalanced Optimal Transport (SemiUOT) problems without relying on entropy regularization. The key idea is to compute the final marginal distributions explicitly through a dual optimization method then transform the UOT/SemiUOT problem into a classical OT problem without the relax on marginal constrain. This paper further proposed three variants of ETM for more efficient computation. Experiments were conducted on both synthetic data and domain adaptation tasks using proposed ETM variants thar are ETM-Exact, ETM-Approx, and ETM-Refine. The experiment results shows better performance compared to existing methods, particularly in terms of accuracy.

给作者的问题

Could you provide a formal Big-O complexity analysis for your proposed ETM algorithm variants, ETM-Approx and ETM-Refine?

Given the sequential nature of L-BFGS, do you see viable approaches for parallelizing or adapting your algorithm for GPU implementations?

Could you elaborate more on how sensitive your approach is to hyperparameters, such as τ\tau, ϵ\epsilon?

论据与证据

This paper majorly claims that ETM obtains exact marginal distributions for UOT and SemiUOT without the need for entropy regularization, leading to more accurate and sparse matching solutions. This claim is supported by the derivations showing how dual formulations and KKT conditions can yield the necessary marginal reweighting.

The other claim is about the efficiency of proposed ETM variants. While the empirical results supports this claim, the paper lacks an explicit theoretical analysis (e.g., Big-O complexity) of the proposed methods, leaving the claims about efficiency less supported.

方法与评估标准

The evaluation on both synthetic datasets and domain adaptation tasks looks good to me for showing the usage of the method. But I think the evaluation on efficiency could be strengthened with more discussion on scalability, like how the method performs with larger datasets and a comparison regarding GPU acceleration.

理论论述

I reviewed the main theoretical derivations, and the proofs look valid overall.

实验设计与分析

The paper provides comparisons between the proposed ETM variants and several state-of-the-art methods on both synthetic and domain adoptation task. The runtime and computation error analyses illustrate the trade-offs between different ETM variants. But, the experiment does not include the sensitive analysis on the hyperparameters.

补充材料

I reviewed the supplementary material includes the theoretical proofs.

与现有文献的关系

As they claimed, I think the key contributions of this paper is that it builds on recent work in UOT and SemiUOT, and proposed a method that avoids the drawbacks brought by entropy regularization.

遗漏的重要参考文献

I think the references are good.

其他优缺点

Strengths:

This paper introduces a new method that transforms UOT/SemiUOT problems into classical OT via explicit marginal reweighting. And the realated theoretical derivations are robust.

Weaknesses:

The implementation of ETM is not trivial due to its reliance on dual optimization method, L-BFGS. This framework is sequential iterative methods that limits the parallelizability and GPU acceleration.

Readibility, some concepts and notations such as SemiUOT and the variable τ\tau are introduced late in the paper, making it difficult for readers unfamiliar with these terms to follow.

The paper lacks an theoretical time complexity analysis, mainly relying on empirical comparisons.

其他意见或建议

na

作者回复
  • Comment 1: The time complexity is not provided.

Response 1: The computation complexity of ETM-Approx is O(NMlog(1/εa))O(NM\log (1/\varepsilon_a)) where εa\varepsilon_a denotes the error tolerance (e.g., εa=f^f^oε_a = || \hat{f} - \hat{f}_o||_∞ in SemiUOT and εa=u^u^oε_a = || \hat{u} - \hat{u}_o ||_∞ in UOT, f^o\hat{f}_o and u^o\hat{u}_o denote the optimal solution on SemiUOT and UOT respectively). When the initial solution is sufficiently close to the optimal solution, quasi-Newton optimization procedure [1] can achieve super-linear convergence rate [1]. Thus, the time complexity for ETM-Refine is given as O(NMlog(1/εa)+MNDT)O(NM\log (1/ε_a) + MN D_T) where DTD_T denotes the number of iterations.

  • Comment 2: How the method performs with larger datasets.

Response 2: We solve the optimization problem on the GPU. We have conducted the experiments on the large transfer learning datasets, such as Office-Home. Specifically, Office-Home contains approximately 15,500 images, covering 65 categories, with images collected from office and home scenes. In our experiments, we set the batch size to 512. It takes approximately 3.7 seconds to perform one UOT computation, while one execution of ETM-Refine + MROT-Ent takes about 4.1 seconds.

  • Comment 3: Could you elaborate more on how sensitive your approach is to hyperparameters?

Response 3: We have conducted the parameter sensitivity in Section 5.3 by varying ϵ\epsilon and reported the results in Fig.4. Moreover, we also vary the hyper parameter of ηG\eta_G in Appendix L and report the results in Fig.6. Based on your valuable comment, we collect ϵ=(0.01,0.1,1)\epsilon = (0.01, 0.1, 1) and report the number of iterations with τ=(0.1,1)τ = (0.1,1) shown in the following Table A.

Table A. Number of iterations to convergence on SemiUOT/UOT

MethodETM-Refine (ϵ=0.01\epsilon=0.01)ETM-Refine (ϵ=0.1\epsilon=0.1)ETM-Refine (ϵ=1\epsilon=1)ETM-Exact
Number of iterations to convergence (SemiUOT LPL_P, τ=0.1τ = 0.1)135189215243
Number of iterations to convergence (SemiUOT LPL_P, τ=1τ = 1)97153176219
Number of iterations to convergence (UOT LUL_U, τ=0.1τ = 0.1)114146157176
Number of iterations to convergence (UOT LUL_U, τ=1τ = 1)83129140168

Furthermore, we vary ηG=(0,1,100)η_G = (0, 1, 100) for SemiUOT/UOT with ETM + MROT-Entropy and ETM + MROT-Norm, where τ=1τ = 1, ηReg=0.1\eta_{\rm Reg} = 0.1 and N=500N = 500, and reported the absolute error e=i,jπijπij1e = \sum_{i,j} ||\pi_{ij} - \pi^*_{ij}||_1 in the Table B:

Table B. The absolute error on SemiUOT/UOT

MethodηG=0η_G = 0ηG=1η_G = 1ηG=100η_G = 100
ETM + MROT-Entropy (SemiUOT)1.310.970.42
ETM + MROT-Norm (SemiUOT)0.790.640.28
ETM + MROT-Entropy (UOT)1.230.850.39
ETM + MROT-Norm (UOT)0.540.460.31

Moreover we conduct the experiments with ϵ=(0.01,0.1,1)\epsilon = (0.01, 0.1, 1) and ηG=(0,1,100)\eta_G = (0, 1, 100) on UDA in Office-Home with τ=1τ = 1 following JUMBOT (i.e., sensitivity analysis on ττ has been investigated in JUMBOT) and report the average classification below:

JUMBOT + UOT(ETM + MROT-Ent)JUMBOT + UOT(ETM + MROT-Norm)
ϵ=0.01,ηG=100\epsilon = 0.01, η_G = 10073.173.4
ϵ=0.1,ηG=100\epsilon = 0.1, η_G = 10072.673.0
ϵ=1,ηG=100\epsilon = 1, η_G = 10072.272.5
ϵ=0.01,ηG=1\epsilon = 0.01, η_G = 172.472.8
ϵ=0.01,ηG=0\epsilon = 0.01, η_G = 071.771.9

From this, we observe that a smaller value of ϵ=0.01\epsilon=0.01 provides a more accurate smoothness approximation [4], resulting in fewer iterations needed for refinement. Meanwhile we can observe that a larger value on ηGη_G (e.g., ηG=100η_G = 100) can provide more useful KKT-multiplier regularization and boost the model performance as also reflected in Fig. 6.

  • Comment 4: Given the sequential nature of L-BFGS, do you see viable approaches for parallelizing or adapting your algorithm for GPU implementations?

Response 4: We leverage code from [1, 3] to optimize L-BFGS, though directly applying it to ETM-Exact may lack efficiency. Our key contribution in this paper is using fixed-point iteration in ETM-Approx to efficiently generate reliable initial solutions for ETM-Refine, achieving superlinear convergence as noted in [1, 4]. Parallelizing L-BFGS exceeds our current scope but is planned for future research.

  • Comment 5: Readability should be improved.

Response 5: Thanks for your advice. We will first introduce the important notations at the beginning of our method in the final version to make the paper more readable.

Reference:

[1] A quasi-Newton approach to non-smooth convex optimization

[2] Smooth minimization of non-smooth functions.

[3] PyTorch: An Imperative Style, High-Performance Deep Learning Library

[4] Optimization: Modeling, Algorithm and Theory

审稿意见
3

This paper presents a new approach to the Semi-Unbalanced Optimal Transport (SemiUOT) problem by determining the marginal probability distribution using the Equivalent Transformation Mechanism and extends it to the Unbalanced Optimal Transport (UOT) problem. To improve matching accuracy, the authors introduce a KKT-Multiplier regularization term combined with the Multiplier Regularized Optimal Transport method. Experiments demonstrate the effectiveness of the proposed methods for UOT/SemiUOT problems.

update after rebuttal

I appreciate the author's response and the additional experiments provided. Accurate computation should be the core aspect of the algorithm. While linear programming does involve a relatively high computational cost, this can be mitigated under specific conditions through hardware enhancements such as GPUs and multi-threading. Consequently, I will maintain the score I have assigned for now.

给作者的问题

It is recommended that the authors include other calculations of optimal transport in their comparison experiments, such as linear programming, approximation algorithms.

论据与证据

Yes, the claims made in the submission are supported by clear and convincing evidence.

方法与评估标准

Yes, but the time complexity is not given in the convergence rate.

理论论述

Yes, I do. All basic theoretical derivations are correct.

实验设计与分析

Yes, I checked. The experimental design was comprehensive, covering both synthetic and real datasets, comparing multiple baseline methods, using reasonable assessment metrics, and validating the robustness of the method through parametric analysis and ablation experiments. However, a detailed description of parameter selection could further strengthen the credibility of the experiment.

补充材料

No supple material.

与现有文献的关系

There is a correlation, but the authors have mainly improved the calculation of UOT, but not the algorithm.

遗漏的重要参考文献

The key contributing author is to improve the sinkhorn algorithm, but in the comparison experiments, it's all about using sinkhorn's baseline model for the comparison.

其他优缺点

The Equivalent Transformation Mechanism proposed in this paper surpasses existing optimal transport - based domain adaptive algorithms to some extent. However, it has the following limitations: (1) It fails to provide the convergence speed of the overall method. (2) Most compared models are based on Sinkhorn experiments. The authors don't compare them with other computational methods like linear programming - based experiments. (3) Since KL divergence can't handle non - overlapping distributions, it's questionable how the authors plan to use it in semiUOT for partial domain adaptation.

其他意见或建议

It is recommended that the authors compare their method with a variety of other computational OT methods beyond Sinkhorn. Additionally, the authors should consider using Total Variation as an alternative to KL divergence.

作者回复
  • Comment 1: The time complexity is not given in the convergence rate.

Response 1: The computation complexity of ETM-Approx is O(NMlog(1/εa))O(NM\log (1/\varepsilon_a)) where εa\varepsilon_a denotes the error tolerance (e.g., εa=f^f^oε_a = || \hat{f} - \hat{f}_o||_∞ in SemiUOT and εa=u^u^oε_a = || \hat{u} - \hat{u}_o ||_∞ in UOT, f^o\hat{f}_o and u^o\hat{u}_o denote the optimal solution on SemiUOT and UOT respectively). When the initial solution is sufficiently close to the optimal solution, quasi-Newton optimization procedure [1] can achieve super-linear convergence rate [1]. Thus, the time complexity for ETM-Refine is given as O(NMlog(1/εa)+MNDT)O(NM\log (1/\varepsilon_a) + MN D_T) where DTD_T denotes the number of iterations.

Moreover, the convergence speed on MROT is determined by different kinds of regularization terms (e.g., entropy or norm regularization). For instance, ETM-Approx + MROT-Ent or ETM-Approx + MROT-Norm has the computation complexity as O(NMlog(1/εa)+NMDπ)O(NM\log (1/\varepsilon_a) + NMD_{\pi}) where DπD_{\pi} denotes the number of iteration. Likewise, ETM-Refine + MROT-Ent or ETM-Refine + MROT-Norm has the computation complexity as O(NMlog(1/εa)+MNDT+NMDπ)O(NM\log (1/\varepsilon_a) + MN D_T + NMD_{\pi}).

  • Comment 2: A detailed description of parameter selection should be provided.

Response 2: We adopt the parameter sensitivity analysis on ϵ\epsilon and report the results in Fig.4. Moreover, we provide more detailed results by varying ϵ=(0.01,0.1,1)\epsilon = (0.01, 0.1, 1) and report the number of iterations with τ=(0.1,1)\tau = (0.1,1) and report in Response 3 for Reviewer JNNS (Table A) due to the space limit.

From this, we observe that a smaller value of ϵ=0.01\epsilon=0.01 provides a more accurate smoothness approximation [2], resulting in fewer iterations needed for refinement. Moreover, we also vary the hyper parameter of ηG\eta_G in Appendix L and report the results in Fig.6. We provide more detailed result by varying ηG=(0,1,100)\eta_G = (0, 1, 100) for SemiUOT/UOT with ETM + MROT-Entropy and ETM + MROT-Norm, where τ=1\tau = 1, ηReg=0.1\eta_{\rm Reg} = 0.1 and N=500N = 500, and reported the absolute error e=i,jπijπij1e = \sum_{i,j} || \pi_{ij} - \pi^*_{ij} ||_1 in Response 3 for Reviewer JNNS (Table B) due to the space limit.

From that we can observe that a larger value on ηG\eta_G (e.g., ηG=100\eta_G = 100) can provide more useful KKT-multiplier regularization and boost the model performance as also reflected in Fig. 6. Therefore we set ϵ=0.01\epsilon=0.01 to provide a more accurate smoothness function approximation and ηG=100\eta_G = 100 to provide better KKT regularization in our experiments.

  • Comment 3: Other computational methods like linear programming should be reported.

Response 3: Thank you for highlighting this important point. Following your advice, we present the detailed experimental results of the linear programming (LP) method combined with our proposed approach, as shown in the following table A.

Table A. The time consumption (s) on synthetic data with τ=1\tau = 1

MethodNN = 500NN = 600NN = 700NN = 800NN = 900NN = 1000
UOT (ETM-Exact + MROT-Norm)5.438.3311.0715.7621.9633.99
UOT (ETM-Exact + LP)10.7915.6420.1528.5837.3658.27

We also conduct ETM-Exact + LP for domain adaptation tasks as shown in Table B.

Table B. Classification accuracy on Office-Home for UDA

MethodAr->ClAr->PrAr->RwCl->ArCl->PrCl->RwPr->ArPr->ClPr->RwRw->ArRw->ClRw->PrAvg
JUMBOT + UOT(ETM + MROT-Ent)59.078.583.468.777.177.668.357.282.476.262.586.473.1
JUMBOT + UOT(ETM + MROT-Norm)59.478.784.168.577.378.568.657.982.876.362.586.573.4
JUMBOT + UOT(ETM + LP)60.779.284.868.777.979.069.158.383.176.662.787.273.9

From that we can observe ETM + LP can provide more accurate results and boost the model performance. However, ETM-Exact + LP could be severely time-consuming compared with other methods (e.g., ETM-Exact + MROT-Norm).

  • Comment 4: It's questionable how the authors plan to use it in semiUOT for partial domain adaptation.

Response 4: We adopt KL-divergence between the weights on data samples and the uniform distributions. SemiUOT can select and reweight the most similar data samples with higher values. We also conduct a toy example shown in Fig.1 where source and target domains are definitely non-overlapped. However, SemiUOT can still figure out outliers (the irrelevant data) and therefore it is reasonable to adopt SemiUOT to solve partial domain adaptation problems.

Reference:

[1] A quasi-Newton approach to non-smooth convex optimization

[2] Smooth minimization of non-smooth functions.

最终决定

The paper proposes casting the semi-UOT problem as classical OT. This is done in multiple ways, exact, approximate etc., leading to different results. Optimization algorithms for solving the variants are discussed. Simulations show the methodology's efficacy.

The effect of the lse based approximation is not analysed and left as a hack. Analysis of the quality of this approximation will help the presentation. Some issues with simulations, presentation are pointed out by the reviewers. A computational analysis will help the presentation. All the reviews are borderline with no strong support for accepting the paper.