PaperHub
6.8
/10
Poster4 位审稿人
最低4最高5标准差0.4
5
4
4
4
3.3
置信度
创新性2.3
质量2.8
清晰度3.0
重要性2.3
NeurIPS 2025

Efficient Algorithms for Robust and Partial Semi-Discrete Optimal Transport

OpenReviewPDF
提交: 2025-05-12更新: 2025-10-29
TL;DR

We relate robust and partial variants of optimal transport (OT), introduce novel characterizations of their solutions in the semi-discrete setting, and develop two algorithms for these variants of semi-discrete OT.

摘要

The sensitivity of optimal transport (OT) to noise has motivated the study of robust variants. In this paper, we study two such formulations of semi-discrete OT in $\mathbb{R}^d$: (i) the $\alpha$-optimal partial transport, which minimizes the cost of transporting a mass of $\alpha$; and (ii) the $\lambda$-robust optimal transport, which regularizes the OT problem using the total variation (TV) distance. First, we provide a novel characterization of the optimal solutions in these settings, showing they can be represented as a restricted Laguerre diagram. Second, we exploit this characterization to establish a strong algorithmic connection between the two problems, showing that any solver for one can be adapted to solve the other with comparable precision. Third, we overcome key challenges posed in extending the cost-scaling paradigm to compute these variants of OT and present an algorithm that computes the exact solution up to $\log (1/\varepsilon)$ bits of precision in $n^{O(d)}\log (1/\varepsilon)$ time, where $n$ is the support size of the discrete distribution. Finally, we present an $n^{1+o(1)}\varepsilon^{-O(d)}$ time approximation algorithm for the above variants of OT.
关键词
Semi-Discrete Optimal TransportApproximation AlgorithmsRobust Optimal Transport

评审与讨论

审稿意见
5

In this paper, the authors extend the deterministic semi-discrete optimal transport problem, which involves transporting a continuous distribution to a discrete one, to a stochastic setting. Noise and outliers are considered via two variants: α\alpha-partial optimal transport, where only an α\alpha-fraction of the probability mass is transported, and λ\lambda-robust optimal transport, where mass can be discarded at a cost of λ\lambda per unit.

The authors propose a novel theoretical characterization of exact solutions using restricted Voronoi cells and weight functions on the discrete distribution domain, under specific conditions. For inexact solutions, they rigorously derive a connection between the two problem variants and introduce a reduction scheme along with corresponding error bounds. To solve the λ\lambda-robust OT problem with high accuracy, the authors generalize the combinatorial cost-scaling paradigm, incorporating all necessary modifications. The resulting algorithm achieves a complexity bound of nO(d)log(Δ/ε)n^{O(d)} \log(\Delta/\varepsilon). All results hold for arbitrary cost functions, with additional acceleration techniques developed specifically for Wasserstein distances.

优缺点分析

Optimal Transport is an important topic, and new robust algorithms represent valuable contributions to both theory and practice.

The strongest side of the paper is its complex rigorous theoretical analysis of the proposed methods and properties of the solutions. These findings are interesting and cover all aspects of the considered setups and connections between them.

Despite the amount of information, the paper’s writing is clear and well-structured making it easy to follow.

For weakness, see the Limitations section.

问题

How long does it take to compute the Φ\Phi factor, both for known distributions (e.g., normal, exponential) and unknown distributions? What methods are available for this?

Can your methods handle distributions for which only sample data is available?

Are the obtained convergence rates optimal with respect to their dependencies on parameters such as n (sample size), d (dimensionality) and others?

局限性

The paper lacks practical validation of the proposed methods, particularly comparisons with deterministic counterparts in scenarios where robustness is essential. Even basic 2D experiments on synthetically corrupted data could have provided a proof of concept while demonstrating the methods' robustness and quality.

While the theoretical contributions are significant, their novelty appears somewhat incremental, as they primarily combine (in a non-trivial way) existing ideas, problem formulations, and methodologies.

Minor weakness: the notation δ\delta is ambiguous, it is used as a parameter in Section 3, and as units in Introduction.

最终评判理由

I'm satisfied authors' answers.

格式问题

No problem

作者回复

We appreciate your thoughtful review and comments. We answer your main questions and concerns below.

How long does it take to compute the Φ\Phi factor, both for known distributions (e.g. normal, exponential) and unknown distributions?

The factor of Φ\Phi depends on how the distribution is represented. If the distribution is a parametric distribution with a constant number of parameters, e.g. normal or exponential (mixture), then a constant degree polygonal query region can be integrated over in constant time using numerical methods. On the other hand if the continuous distribution is represented as a histogram or piecewise-linear function, then the time depends on the complexity of the distribution inside the query region. In low dimensions, one could use some geometric data structures to expedite this computation, but in high dimensions, a simple method would be more practical. Finally, if the distribution is represented as a large representative sample, then the Φ\Phi factor simply corresponds to assigning each sample point to the region of the partition it is contained in. In turn, our algorithm successfully computes a transport plan between a discrete distribution and a large sample of an unknown continuous distribution.



Can your methods handle distributions for which only sample data is available?

Our algorithm can handle sample data from a distribution, as long as one global sample is used to approximate the continuous mass inside each region of the arrangement (rather than sampling from the distribution for each mass query).



Are the obtained convergence rates optimal with respect to their dependencies on parameters such as n,dn, d?

A: As noted in Section 2, as well as references [9] and Bourne et al., the optimal semi-discrete transport plan requires computing a (restricted) weighted Voronoi diagram. There are known lower bounds of nΩ(d)n^{\Omega(d)} for the complexity of Voronoi diagrams in the worst case in dd dimensions (see e.g. Har-Peled "Geometric Approximation Algorithms"), and therefore our high-precision algorithm is near-optimal. While we do not have a rigorous lower bound for the optimal runtime of a low-precision algorithm in constant dimension, we suspect that optimal transport is at least as hard as nearest neighbor search (a naive plan would assign each point to its nearest neighbor). The runtime of our low-precision algorithm is near linear in nn and incurs comparable dependence in ε\varepsilon and dd to the best known algorithms for approximate nearest neighbor search.



The paper lacks practical validation of the proposed methods, particularly comparisons with deterministic counterparts in scenarios where robustness is essential. Even basic 2D experiments on synthetically corrupted data could have provided a proof of concept while demonstrating the methods' robustness and quality.

While the original submission emphasized our theoretical contributions, we appreciate your interest in practical implementation and experimental evaluation. In response, we include below preliminary experimental results. A more extensive evaluation will be added in the next version.

Implementation Details. We provide a Python implementation of our algorithm. However, it relies on two core procedures that lack efficient implementations in Python:

  1. Estimating continuous mass within regions defined by the arrangement of the restricted Voronoi cells and their expansions.
  2. Maintaining a dynamic tree data structure as per Sleator and Tarjan (reference [44] in the appendix).

To address these challenges in our prototype:

  • For (1), we approximate the mass within each region using a fine grid, summing the masses of grid points lying inside the region.
  • For (2), we simulate the dynamic tree by naïvely iterating over all edges.

An efficient implementation will require optimized versions of these components, which remains a technical challenge in Python.


Preliminary Results on Algorithm Performance. We evaluated our implementation using a continuous distribution sampled from a Gaussian distribution inside the unit square. The discrete support consists of nn samples drawn from the same distribution, each assigned a mass of 1/n1/n. We set λ=0.2\lambda = 0.2 and ε=0.02\varepsilon = 0.02. Results are averaged over 10 runs for each nn.

Key Findings:

  • The additive error stays within the target ε=0.02\varepsilon = 0.02.
  • The number of iterations in the two steps of our algorithm (Step 1 and 2) remains well below our upper bounds proved in the paper (6n6n and 12n12n as described in Lemma 4.3 and D.9).
  • The total path and cycle lengths computed in the two steps grow subquadratically in nn, aligning with our O(n2)O(n^2) complexity analysis (Section 4.3).

The details of our experiments are presented in the following table.

nErrorStep 1 iterationsStep 2 iterationsStep 1 paths and cycles lengthStep 2 paths and cycles lengthRegions
100.0091441361198
200.01824320991529
300.021356881719862
400.0153597923851212
500.01845160330611544
600.01556197336731855
700.01756251241292130
800.01667284547852421

Robustness of α\alpha-OPT.

We examined the effect of 10% noise in the continuous distribution (added in the top-right corner) and measured how much noise was transported under various α\alpha values. Results confirm the robustness of α\alpha-OPT: for α\alpha < 0.9, almost no noise mass is transported.

AlphaCost% inliers transported% noise transported
0.7000.0100.7770.000
0.7250.0110.8040.000
0.7500.0120.8300.015
0.7750.0130.8590.005
0.8000.0140.8870.005
0.8250.0160.9140.009
0.8500.0170.9420.014
0.8750.0190.9700.005
0.9000.0220.9960.020
0.9250.0310.9990.244
0.9500.0450.9990.496
0.9750.0600.9990.748

Robustness of λ\lambda-ROT. We conducted similar experiments for λ\lambda-ROT. As λ\lambda increases, total mass transported grows, but noise transport remains limited when the transported mass is under 0.9.

LambdaCostCost ROT% inliers transported% noise transportedMass transported
0.050.0140.0240.8960.0040.807
0.100.0200.0310.9800.0090.884
0.150.0210.0370.9910.0120.894
0.200.0230.0420.9960.0460.902
0.250.0240.0470.9990.0730.907
0.300.0230.0510.9990.0640.907
0.350.0250.0550.9990.1410.914
0.400.0270.0611.00.1470.915
0.450.0280.0631.00.2120.921
0.500.0340.0661.00.3490.935
0.550.0410.0681.00.5060.951

Please let us know if any further clarification or experiments would strengthen the paper. We are happy to incorporate additional feedback into the next version of our paper.

评论

I would like to thank the authors for their detailed responses which have fully addressed my questions. If time allows, it would be valuable to examine a broader variety of more complex distributions and noise outliers within the same experimental framework (e.g., different continuous and discrete distributions, including heavier-tailed ones).

评论

Thank you for the helpful suggestion. In the next version of our paper, we will expand our experiments to include more complex distributions. Below, we present one such additional experiment.

In this experiment, we approximate heavy-tailed behavior within a bounded region (the unit square) using mixtures of Beta distributions that induce skewed and dispersed mass. Specifically, the discrete distribution consists of n=20n=20 samples drawn from a Beta(α=2,β\alpha=2,\beta) distribution, with β\beta varying in the range [3,10]. As β\beta increases, the distribution becomes increasingly concentrated near [0,0]. The continuous distribution is constructed by contaminating the same base distribution with 15% noise drawn from a Beta(α\alpha=5,β\beta=2) distribution (a distribution concentrated close to [0.9,0.9]). For smaller values of β\beta, the two components overlap significantly; for larger values, they become well-separated. Each experiment was repeated 10 times, and we report the averaged results.

These experiments confirm that our algorithm remains reliable even when inliers and outliers overlap. In particular, the computed partial and robust transport plans consistently focus on the inlier mass while effectively disregarding much of the outlier mass—especially as the main and noise distributions become more dissimilar. Performance metrics (e.g., computational complexity, convergence rate, and error) remain consistent with those observed in the Gaussian case.



Robustness of 0.850.85-OPT:

β\betaCost% Inliers transported% Outliers transported
30.0160.9260.419
40.0160.9570.245
50.0140.9770.132
60.0130.9780.125
70.0130.9900.056
80.0120.9910.053
90.0110.9940.035
100.0110.9960.021


Robustness of 0.150.15-ROT:

β\betaCostRobust cost% Inliers transported% Outliers transportedTotal mass transported
30.0220.0410.9800.4930.907
40.0200.0410.9920.3500.896
50.0190.0420.9950.2450.883
60.0170.0410.9980.1910.877
70.0150.0420.9980.0910.862
80.0130.0410.9990.0850.862
90.0120.0400.9990.0530.857
100.0110.0401.0000.0480.857


Performance of our Algorithm:

β\betaErrorStep 1 iterationsStep 2 iterationsStep 1 paths and cycles lengthStep 2 paths and cycles lengthRegions
30.01246420993495
40.016783321026495
50.00567190986462
60.00768131971435
70.01098229886421
80.00836161810382
90.0076889849388
100.0073555766351
审稿意见
4

This paper studies two formulations of semi-discrete optimal transport (OT) problem: α\alpha-optimal partial transport and λ\lambda-robust OT. The authors provide theoretical investigations on the optimal solutions in these formulations, connection between the algorithms for each of the settings and present their own algorithm for each of the formulations.

优缺点分析

Strengths. The paper provides a lot of theoretical results on the connection and properties of partial and robust OT formulations, their solutions and corresponding algorithms.

Weaknesses. My main concern corresponds to the lack of experiments in the paper. The authors present a lot of theoretical results related to partial/robust OT formulations, propose new algorithms to solve these problems, establish their convergence properties, but do not test them in any experiments. I am confused by this fact, since while the paper proposes the algorithms, it should provide at least some experimental validation of their performance. I am also interested in real-world applications of the proposed algorithms, i.e., experiments showing that semi-discrete partial/robust OT formulations and corresponding algorithms are useful in practice. However, at this step, even toy experiments are missing.

The related concern corresponds to the paper clarity. The paper is really hard to follow, most of the text is written in a technical, mathematical style starting from the ‘Introduction’ section. While I understand that it mainly focuses on theoretical investigations, the text should be written in an accessible and understandable form:

  • I kindly suggest the authors exclude complex explanations of their results from the ‘Introduction’ section (lines 60-94) and rather include the paper contributions in a general form trying to avoid specific mathematical notations and definitions which are then introduced one more time in main sections.
  • I also have concerns regarding the clarity of OT formulations. In the classic OT problem, the marginals of the plan should coincide with input and target measures. However, in lines 98-104, you give the definition of the OT plan replacing the equality constraint on the marginals with the inequalities. While I understand that the purpose was to give a general definition applicable for the partial/robust OT cases, the explanation might be ambiguous for the reader and should be clarified.
  • I am also wondering why the paper lacks a ‘Conclusion’/’Discussion’ section with summarization of the achieved results in a general form.
  • Another minor concern corresponds to the lines 117-118 which operate with the notions of LP formulation/duality which were not given in the text before that.

Another restriction of the established theory corresponds to the usage of the pp-Wasserstein distance in the partial/robust OT formulations. This point should be stated as a limitation of this work since general OT formulations might cover many other costs.

In summary, I have doubts about the relevance of this paper for the current conference. The paper focuses on the theoretical results related to the partial/robust OT formulations and does not include proper experimental validation of the suggested algorithms (no experiments at all). Theoretical results might represent an interest for the ML community, but their motivation, ideas and originality should be clearly explained in the paper’s main text while now the exposition is very technical and hard to follow. However, currently the exposition of the paper seems to be more suitable for a mathematical venue where the significance of the established theory (nearly 30 pages in Appendix) could be carefully checked.

问题

  • Could you conduct some experiments justifying your theoretical conclusions?
  • Could you please provide a comment regarding the originality of your approach which follows the ideas of the one suggested in (Agarwal et al., 2024)?

局限性

The limitations of the approach are stated only in the NeurIPS paper checklist. I kindly suggest the authors create a separate ‘Limitations’ section in the main text or Appendix of their paper and list the limitations there. One limitation is missing here - the paper establishes results only for the pp-Wasserstein distance as the OT cost.

最终评判理由

During the rebuttal, the authors have addressed my main concerns. That is, they conducted the experiments which I have suggested, and promised to improve the clarity of the paper, e.g., improve the paper text, add missing sections, etc. Thus, I increase my score to slightly positive (4). I do not assign higher score since all of the experiments were conducted during the rebuttal. I think that the question if this represent a major revision or not could be discussed among the Reviewers and Area Chairs.

格式问题

Paper misses the Conclusion/Discussion section with summarization of the achieved results in a general form. There is no separate 'Limitations' section also, the limitations are included only in the NeurIPS checklist.

作者回复

We appreciate the reviewer’s comment regarding the density of technical details in the introduction. In view of this comment, we will fix the introduction to present the motivation, high-level ideas, and contributions in a clearer and more accessible manner, deferring technical details to later sections. We will also do our best to make the later sections more accessible.



Could you conduct some experiments justifying your theoretical conclusions?

Summary of Contributions. Our paper presents the following key contributions:

  • A characterization of partial and robust variants of the optimal transport (OT) problem in the semi-discrete setting,
  • Theoretical guarantees on the approximation error when using an approximate λ\lambda-ROT solver to compute an approximate α\alpha-OPT plan, and,
  • Two algorithmic solutions for the λ\lambda-ROT problem.

While the original submission emphasized our theoretical contributions, we appreciate your interest in practical implementation and experimental evaluation. In response, we include below preliminary experimental results. A more extensive evaluation will be added in the next version.

Implementation Details. We provide a Python implementation of our algorithm. However, it relies on two core procedures that lack efficient implementations in Python:

  1. Estimating continuous mass within regions defined by the arrangement of the restricted Voronoi cells and their expansions.
  2. Maintaining a dynamic tree data structure as per Sleator and Tarjan (reference [44] in the appendix).

To address these challenges in our prototype:

  • For (1), we approximate the mass within each region using a fine grid, summing the masses of grid points lying inside the region.
  • For (2), we simulate the dynamic tree by naïvely iterating over all edges.

An efficient implementation will require optimized versions of these components, which remains a technical challenge in Python.


Preliminary Results on Algorithm Performance. We evaluated our implementation using a continuous distribution sampled from a Gaussian distribution inside the unit square. The discrete support consists of nn samples drawn from the same distribution, each assigned a mass of 1/n1/n. We set λ=0.2\lambda = 0.2 and ε=0.02\varepsilon = 0.02. Results are averaged over 10 runs for each nn.

Key Findings:

  • The additive error stays within the target ε=0.02\varepsilon = 0.02.
  • The number of iterations in the two steps of our algorithm (Step 1 and 2) remains well below our upper bounds proved in the paper (6n6n and 12n12n as described in Lemma 4.3 and D.9).
  • The total path and cycle lengths computed in the two steps grow subquadratically in nn, aligning with our O(n2)O(n^2) complexity analysis (Section 4.3).

The details of our experiments are presented in the following table.

nErrorStep 1 iterationsStep 2 iterationsStep 1 paths and cycles lengthStep 2 paths and cycles lengthRegions
100.0091441361198
200.01824320991529
300.021356881719862
400.0153597923851212
500.01845160330611544
600.01556197336731855
700.01756251241292130
800.01667284547852421

Reduction from α\alpha-OPT to λ\lambda-ROT. We evaluated our reduction from α\alpha-OPT to λ\lambda-ROT by using the Sinkhorn algorithm to approximate λ\lambda-ROT plans, followed by binary search over λ\lambda to compute an α\alpha-OPT plan. The continuous distribution in our experiments is an exponential distribution over the unit square, centered at the bottom-left corner, with 10% noise added to the top-right corner. The discrete set BB consists of n=400n=400 samples from the same exponential distribution, each assigned mass 1/n1/n. We target a transported mass of α\alpha = 0.9.

Additive Error: For any fixed regularization parameter, we evaluate the additive error (with respect to the optimal λ\lambda-ROT solution) incurred by Sinkhorn, and compare it to the additive error of our computed α\alpha-OPT solution (with respect to the optimal α\alpha-OPT). We observe that the error in α\alpha-OPT closely mirrors that of λ\lambda-ROT across different choices of regularization parameter. Moreover, as the regularization parameter decreases, the additive errors in α\alpha-OPT also diminish. This behavior is consistent with the theoretical guarantee established in Theorem 3.5.

RegularizationRobust CostPartial CostAdditive error of ROTAdditive error of OPT
0.010.1760.0840.0000.000
0.020.1860.0950.0090.012
0.030.1820.1000.0120.017
0.040.1930.1120.0230.029
0.050.2050.1250.0340.041
0.060.2160.1370.0460.053
0.070.2270.1480.0570.065
0.080.2380.1590.0680.075

Approximation factor: Similarly, we evaluate the approximation factor for each method using the same setup. Once again, we find that the approximation factor in α\alpha-OPT closely mirrors that of λ\lambda-ROT over a range of regularization parameters. As the regularization parameter decreases, the approximation factor in α\alpha-OPT also approaches one, further corroborating the theoretical prediction in Theorem 3.5.

RegularizationRobust CostPartial CostRelative error of ROTRelative error of OPT
0.010.1760.0840.9981.003
0.020.1860.0951.0501.142
0.030.1820.1001.0711.200
0.040.1930.1121.1361.346
0.050.2050.1251.2031.493
0.060.2160.1371.2691.638
0.070.2270.1481.3341.776
0.080.2380.1591.3971.904

Please let us know if any further clarification or experiments would strengthen the paper. We are happy to incorporate additional feedback into the next version of our paper.



Could you please provide a comment regarding the originality of your approach which follows the ideas of the one suggested in Agarwal et al., 2024?

Challenges in Extending Cost-Scaling to Partial Transport: Cost-scaling algorithms have proven effective for solving the complete optimal transport (OT) problem. However, extending these techniques to partial optimal transport introduces substantial challenges—even in fully discrete settings (see, e.g., Ramshaw and Tarjan, FOCS 2012 [37])—and these difficulties are further amplified in the semi-discrete setting. A central obstacle lies in the behavior of the dual weights: in complete OT, the optimal solution is invariant under translations of the dual weight vector for the discrete set BB. In contrast, in partial OT, the absolute magnitudes of the dual weights are crucial, as they control the truncation of Voronoi cells and thus directly determine which regions of the continuous distribution are assigned to each discrete point.

Limitations of Existing Scaling Algorithms. Recent cost-scaling algorithms (e.g., Agarwal et al., SODA 2024; NeurIPS 2024) increase the magnitudes of the dual weights for the discrete set BB, and implicitly for the continuous domain AA, over successive scales. However, these methods lack a mechanism to reduce or correct the dual weights once they are overestimated. This renders existing scaling algorithms unsuitable for partial OT, where accurate dual weight magnitudes are critical.

Our Solution. To overcome this limitation, we introduce a novel step that uses consolidating paths and augmenting paths to reroute mass in the residual graph and correct overestimated dual weight magnitudes. The addition of this step makes it non-trivial to establish both the correctness of the modified algorithm and its efficiency matching that of Agarwal et al., NeurIPS 2024. These challenges necessitate several new ideas, as detailed in Lemmas 4.2 and 4.3 of the main text and Lemma D.9 in the Appendix.

Geometric Interpretation of this limitation and consolidating paths. At the start of a scale δ\delta, some mass from the continuous distribution may lie well inside the restricted Voronoi cells of the discrete points BB, yet remain untransported—these are what we call violating deficit regions (see Figure 3). In the complete OT setting, such violations are naturally resolved as the plan evolves to transport all mass. In the partial OT setting, however, this issue is non-trivial and must be explicitly resolved to ensure correctness. Our consolidating paths are specifically designed to eliminate all violating deficit regions by restructuring the transport plan to ensure that the total transported mass lies strictly within the restricted Voronoi regions, thereby guaranteeing correctness of the partial OT plan.

评论

I thank the authors for the provided answers and conducted experiments. However, there are several aspects which still raise my concerns. First, some of the weaknesses which I mentioned in my initial review left undiscussed, e.g., lack of the discussion section, absence of limitation stating that the algorithm deals only with a quadratic cost. I kindly ask the authors to take these points into consideration during the preparation of the revised version of the paper. Second, I have several questions about preliminary experiments conducted during the rebuttal:

  • As far as I understand, you work with 2-dimensional data in these experiments? I am wondering whether the proposed algorithm is scalable to higher dimensions.
  • How exactly do you calculate the additive and relative errors?
  • Is it possible to compare your approach with alternatives in the described experimental setup? For example, comparison with (Agarwal et al., 2024) seems to be valuable. It is interesting to see how this approach behaves in case of the outliers in the data.

I will appreciate if the authors could provide clarifications on these points during the remaining days of rebuttal.

评论

We sincerely thank the reviewer for their follow-up questions.



Applicability to Other Distance Functions: Our algorithm and its analysis extend to distance functions where both the weighted bisectors and the balls are semi-algebraic sets of constant degree. This includes all pq\ell_p^q norms for any constant p,q1p, q\ge 1. We would like to note that for arbitrary distance functions, the weighted Voronoi diagram might have an unbounded complexity. In such cases, as opposed to a discrete transport plan, a semi-discrete transport plan may not have a finite representation. We will clarify this in the next version of the paper and also add a Conclusion section summarizing our contributions and discussing the range of admissible cost functions.



Scalability to Higher Dimensions: As noted in Section 2 and discussed in reference [9], computing the optimal semi-discrete transport plan requires constructing a (restricted) weighted Voronoi diagram. It is known that the combinatorial complexity of such diagrams in dd dimensions can be as high as nΩ(d)n^{\Omega(d)} in the worst case. Therefore, the execution time of our exact algorithm is nO(d)n^{O(d)}, which is near-optimal given these lower bounds. However, the complexity of the (restricted) Voronoi diagram is small in many instances, e.g., when points are chosen randomly from a distribution. Assuming there is an efficient algorithm for computing the Voronoi diagram in such cases, our algorithm will also be efficient and not incur the worst-case cost.

We also evaluated the implementation of our algorithm on a 3D Gaussian distribution with 10% additive noise. The results below demonstrate that our algorithm continues to behave predictably with respect to error and iteration counts, even as the dimension increases:

nErrorStep 1 iterationsStep 2 iterationsStep 1 paths and cycles lengthStep 2 paths and cycles lengthRegions
100.0022525454389
200.0165630325141776
300.0135650046643292
400.016118158770574807
500.02812112820109546681
600.02416123644131418072
700.02015104735153109008


Computation of Additive and Relative Errors: To compute the additive error for Sinkhorn, we take the absolute difference in cost of the plan produced by Sinkhorn and the optimal transport plan. The approximation factor for the ROT computed by the Sinkhorn algorithm is the ratio of the cost of Sinkhorn’s plan to the optimal transport plan.



Comparison with Agarwal et al. (2024): The algorithm of Agarwal et al. (2024) addresses the complete semi-discrete OT problem, and ours generalizes it to compute partial transport plans too. Notably, while their paper does not provide an implementation, by setting appropriate parameters, our implementation serves as an implementation of their algorithm.

In the revised version of our paper, we will include images illustrating how the partial semi-discrete transport plan selectively transports inlier mass in contrast to complete transport plans (as in Agarwal et al. (2024)), which are distorted by the presence of outliers.



Summary: Finally, we would like to reiterate that the primary contributions of our work are new characterizations of partial semi-discrete OT, duality-based reductions between partial and robust semi-discrete OT, and two efficient algorithms for partial semi-discrete OT with provable guarantees on their performance. To complement this, we have implemented a prototype of our algorithm as a proof of concept. An optimized implementation that addresses all algorithmic engineering issues is beyond the scope of this paper. If the paper is accepted, we are committed to releasing the code publicly for the benefit of ML community and also to facilitate future research.

评论

The authors have addressed my major concerns, thus, I will increase my score.

审稿意见
4

Paper studies two robust formulations of optimal transport (OT) : α\alpha-optimal partial transport (α\alpha-OPT), which transports only a fraction of the mass of the two input distributions, and λ\lambda-robust optimal transport (λ\lambda-ROT), which penalizes untransported mass via a total variation regularization. While both variants have been well-studied in fully discrete settings, e.g. by Caffarelli and MacCann (ref [9] of the paper), this work proposes to extend them to semi-discrete OT. The authors introduce novel characterizations of optimal solutions using restricted weighted Voronoi diagrams, which captures the points of the Voronoi cells that are within a given distance.

Authors show that any solver for λ\lambda-ROT can be leveraged to solve α\alpha-OPT, thus unifying the algorithmic treatment of these problems. They propose two algorithmic contributions: (1) a high-precision combinatorial algorithm for semi-discrete λ\lambda-ROT using a refined cost-scaling approach (2) a near-linear-time approximation algorithm. Theoretical analysis includes correctness proofs, runtime bounds, and approximation guarantees even when only a fraction of the mass is transported.

The paper does not include empirical experiments as it is primarily theoretical in nature. Authors clearly articulate limitations, such as the exponential dependence on dimension.

优缺点分析

Strengths:

  • The paper addresses an important problem: robust and partial OT in semi-discrete settings, for which no/few prior algorithms exists.
  • The characterizations via restricted Voronoi diagrams are very elegant and provide useful geometric insight.
  • The paper is exceptionally well-crafted, presenting the ideas clearly and accessibly despite the complexity of the subject matter.
  • Although I didn't check all the proofs, I think the algorithm and results are reasonable.

Weaknesses:

  • While the contribution is theoretical in nature, numerical evaluations are not provided.
  • the connection between this work and Bourne et al., 2018, are not discussed

Bourne, D. P., Schmitzer, B., & Wirth, B. (2018). Semi-discrete unbalanced optimal transport and quantization. arXiv preprint arXiv:1808.01962.

问题

  • Could you provide numerical experiments (even in simple cases) that could validate experimentally your algorithm and show its practical utility?
  • how does this work relates to Bourne et al., 2018?
  • the connection between α\alpha-OPT and λ\lambda-ROT has been thoroughly examined in the work of Caffarelli and MacCann (ref [9] of the paper). Could you clarify how your results relate to their duality findings?

局限性

Yes

最终评判理由

I keep a positive evaluation of the paper, leaning toward acceptance. I found the paper really clear and it provides new results about robust semi-discrete OT problems. My main concern is about the lack of experimental evaluation, beyond toy experimentation.

格式问题

yes

作者回复

We appreciate your thoughtful review and comments. Below, we answer your main questions and concerns.



Could you provide numerical experiments (even in simple cases) that could validate experimentally your algorithm and show its practical utility?

Summary of Contributions. Our paper presents the following key contributions:

  • A characterization of partial and robust variants of the optimal transport (OT) problem in the semi-discrete setting,
  • Theoretical guarantees on the approximation error when using an approximate λ\lambda-ROT solver to compute an approximate α\alpha-OPT plan, and,
  • Two algorithmic solutions for the λ\lambda-ROT problem.

While the original submission emphasized our theoretical contributions, we appreciate your interest in practical implementation and experimental evaluation. In response, we include below preliminary experimental results. A more extensive evaluation will be added in the next version.

Implementation Details. We provide a Python implementation of our algorithm. However, it relies on two core procedures that lack efficient implementations in Python:

  1. Estimating continuous mass within regions defined by the arrangement of the restricted Voronoi cells and their expansions.
  2. Maintaining a dynamic tree data structure as per Sleator and Tarjan (reference [44] in the appendix).

To address these challenges in our prototype:

  • For (1), we approximate the mass within each region using a fine grid, summing the masses of grid points lying inside the region.
  • For (2), we simulate the dynamic tree by naïvely iterating over all edges.

An efficient implementation will require optimized versions of these components, which remains a technical challenge in Python.


Preliminary Results on Algorithm Performance. We evaluated our implementation using a continuous distribution sampled from a Gaussian distribution inside the unit square. The discrete support consists of nn samples drawn from the same distribution, each assigned a mass of 1/n1/n. We set λ=0.2\lambda = 0.2 and ε=0.02\varepsilon = 0.02. Results are averaged over 10 runs for each nn.

Key Findings:

  • The additive error stays within the target ε=0.02\varepsilon = 0.02.
  • The number of iterations in the two steps of our algorithm (Step 1 and 2) remains well below our upper bounds proved in the paper (6n6n and 12n12n as described in Lemma 4.3 and D.9).
  • The total path and cycle lengths computed in the two steps grow subquadratically in nn, aligning with our O(n2)O(n^2) complexity analysis (Section 4.3).

The details of our experiments are presented in the following table.

nErrorStep 1 iterationsStep 2 iterationsStep 1 paths and cycles lengthStep 2 paths and cycles lengthRegions
100.0091441361198
200.01824320991529
300.021356881719862
400.0153597923851212
500.01845160330611544
600.01556197336731855
700.01756251241292130
800.01667284547852421

Please let us know if any further clarification or experiments would strengthen the paper. We are happy to incorporate additional feedback into the next version of our paper.



How does this work relate to Bourne et al., 2018? Could you clarify how your results relate to the duality findings of Caffarelli and McCann [9]?

A: First, we appreciate the reviewer for bringing the work of Bourne et al. to our attention. We will cite their work in the final version. These papers also form connections among α\alpha-OPT, λ\lambda-ROT, and weighted Voronoi diagrams. But we respectfully submit that we give a richer characterization that provides deeper insights into these problems. For example, [9] works with the weighted Voronoi diagram under the capped distance function and argues that the mass transported to bb in a λ\lambda-ROT plan is contained within the weighted Voronoi cell centered at bb. We note that the Voronoi diagram under the capped distance is counter-intuitive, e.g., the boundary of a Voronoi cell may not be connected, and thus hard to interpret. Similarly, Bourne et al. prove an optimal α\alpha-OPT plan routes the mass of bb to a subset of the mass within its weighted Voronoi cell, and only find experimentally that the mass transported to each discrete point is contained within a ball (they do not prove the second claim). In contrast, we work with the weighted Voronoi diagram under the Euclidean (or squared Euclidean) distance, where the Voronoi diagram is more standard, and prove that the α\alpha-OPT plan routes bb to all mass of its restricted Voronoi cell, i.e. its weighted Voronoi cell intersected with the ball of radius equal to the dual weight. With our method, we additionally prove that every point whose mass is not fully routed in the optimal α\alpha-partial transport plan has a maximal dual weight, which is an even stronger guarantee. Thus we provide a more detailed characterization. For the sake of space, we did not go into as much detail on the differences between Section 2 and prior work. We appreciate the question and will make this difference more clear in a final version.

Furthermore, [9] only studies the duality between exact solutions of λ\lambda-ROT and α\alpha-OPT. In contrast, our main contribution (see Section 3) is to show that an approximate solver for one could be used to compute an approximate solution for the other. This result is nontrivial, as shown by the pathological example in Section 3.

We finally note that our work provides two new combinatorial algorithms to approximate the λ\lambda-ROT problem to varying degrees of precision.

评论

We greatly appreciate your time and feedback, and we would be happy to clarify any remaining questions or concerns you might have.

As the discussion deadline approaches, we’d be grateful if you have a chance to take a final look and let us know if there’s anything else we can address.

评论

Thank you very much for your detailed answer. I keep my positive evaluation of the paper.

审稿意见
4

This paper presents two categories of semi-discrete OT which are the α-partial transport problem and TV-regularized optimal transport formulation with regularization parameter λ. The two categories can be demonstrated in one common formulation. The theoretical analysis reveals a fundamental connection between these problems, demonstrating that their optimal solutions can be characterized as restricted Laguerre diagrams - a geometric insight that enables us to establish their algorithmic equivalence. Building on this theoretical foundation, novel computational algorithm are developed. A rigorous analysis of the algorithm's precision and computational complexity is presented.

优缺点分析

Strengths:

  1. Provides a novel characterization of optimal solutions as restricted Laguerre diagrams, deepening the geometric understanding of partial semi optimal transportation with TV regularization. The theory is mathematically rigorous.
  2. Establishes a strong equivalence between α-partial and λ-robust OT, enabling solvers for one problem to be adapted to the other.
  3. The proposed method achieves exact solutions with guaranteed precision, while also providing an efficient approximation scheme, with their respective computational complexities rigorously analyzed. Weaknesses: The theoretical findings lack empirical validation and practical substantiation.

问题

  1. Could numerical experiments be incorporated to validate the algorithm's efficacy?
  2. These two robust formulations of semi-discrete optimal transport provide mathematically grounded mechanisms. It will enhance noise resistance through distinct regularization paradigms. Could this theoretical claim be empirically validated through some experiments?

局限性

yes

格式问题

The format is good enough.

作者回复

We appreciate your thoughtful review and comments. We address your concerns below.

Summary of Contributions. Our paper presents the following key contributions:

  • A characterization of partial and robust variants of the optimal transport (OT) problem in the semi-discrete setting,
  • Theoretical guarantees on the approximation error when using an approximate λ\lambda-ROT solver to compute an approximate α\alpha-OPT plan, and,
  • Two algorithmic solutions for the λ\lambda-ROT problem.

While the original submission emphasized our theoretical contributions, we appreciate your interest in practical implementation and experimental evaluation. In response, we include below preliminary experimental results. A more extensive evaluation will be added in the next version.

Implementation Details. We provide a Python implementation of our algorithm. However, it relies on two core procedures that lack efficient implementations in Python:

  1. Estimating continuous mass within regions defined by the arrangement of the restricted Voronoi cells and their expansions.
  2. Maintaining a dynamic tree data structure as per Sleator and Tarjan (reference [44] in the appendix).

To address these challenges in our prototype:

  • For (1), we approximate the mass within each region using a fine grid, summing the masses of grid points lying inside the region.
  • For (2), we simulate the dynamic tree by naïvely iterating over all edges.

An efficient implementation will require optimized versions of these components, which remains a technical challenge in Python.



Preliminary Results on Algorithm Performance. We evaluated our implementation using a continuous distribution sampled from a Gaussian distribution inside the unit square. The discrete support consists of nn samples drawn from the same distribution, each assigned a mass of 1/n1/n. We set λ=0.2\lambda = 0.2 and ε=0.02\varepsilon = 0.02. Results are averaged over 10 runs for each nn.

Key Findings:

  • The additive error stays within the target ε=0.02\varepsilon = 0.02.
  • The number of iterations in the two steps of our algorithm (Step 1 and 2) remains well below our upper bounds proved in the paper (6n6n and 12n12n as described in Lemma 4.3 and D.9).
  • The total path and cycle lengths computed in the two steps grow subquadratically in nn, aligning with our O(n2)O(n^2) complexity analysis (Section 4.3).

The details of our experiments are presented in the following table.

nErrorStep 1 iterationsStep 2 iterationsStep 1 paths and cycles lengthStep 2 paths and cycles lengthRegions
100.0091441361198
200.01824320991529
300.021356881719862
400.0153597923851212
500.01845160330611544
600.01556197336731855
700.01756251241292130
800.01667284547852421


Robustness of α\alpha-OPT.

We examined the effect of 10% noise in the continuous distribution (added in the top-right corner) and measured how much noise was transported under various α\alpha values. Results confirm the robustness of α\alpha-OPT: for α\alpha < 0.9, almost no noise mass is transported.

AlphaCost% inliers transported% noise transported
0.7000.0100.7770.000
0.7250.0110.8040.000
0.7500.0120.8300.015
0.7750.0130.8590.005
0.8000.0140.8870.005
0.8250.0160.9140.009
0.8500.0170.9420.014
0.8750.0190.9700.005
0.9000.0220.9960.020
0.9250.0310.9990.244
0.9500.0450.9990.496
0.9750.0600.9990.748

Robustness of λ\lambda-ROT. We conducted similar experiments for λ\lambda-ROT. As λ\lambda increases, total mass transported grows, but noise transport remains limited when the transported mass is under 0.9.

LambdaCostCost ROT% inliers transported% noise transportedMass transported
0.050.0140.0240.8960.0040.807
0.100.0200.0310.9800.0090.884
0.150.0210.0370.9910.0120.894
0.200.0230.0420.9960.0460.902
0.250.0240.0470.9990.0730.907
0.300.0230.0510.9990.0640.907
0.350.0250.0550.9990.1410.914
0.400.0270.0611.00.1470.915
0.450.0280.0631.00.2120.921
0.500.0340.0661.00.3490.935
0.550.0410.0681.00.5060.951

Please let us know if any further clarification or experiments would strengthen the paper. We are happy to incorporate additional feedback into the next version of our paper.

评论

We greatly appreciate your time and feedback, and we would be happy to clarify any remaining questions or concerns you might have.

As the discussion deadline approaches, we’d be grateful if you have a chance to take a final look and let us know if there’s anything else we can address.

最终决定

The paper investigates the partial and robust variants of semi-discrete OT. All the reviewers agree that this is an important problem and the paper offers novel and interesting insights to this problem via the restricted Laguerre diagrams. The original submission did not contain any numerical verification results of the algorithm, however during the discussion period, such numerical results were provided to the satisfactory of the reviewers. For this reason, I think the paper has been improved and does meet the bar of acceptance. I strongly recommend that the authors incorporate these important information (experimental result, conclusion section) in the final paper if it is accepted.