/10

Oral4 位审稿人

最低3最高4标准差0.4

ICML 2025

Fully Dynamic Euclidean Bi-Chromatic Matching in Sublinear Update Time

Gramoz Goranci,Peter Kiss,Neel Patel,Martin P. Seybold,Eva Szilagyi,Da Wei Zheng

提交: 2025-01-23更新: 2025-08-14

摘要

We consider the Euclidean bi-chromatic matching problem in the dynamic setting, where the goal is to efficiently process point insertions and deletions while maintaining a high-quality solution. Computing the minimum cost bi-chromatic matching is one of the core problems in geometric optimization that has found many applications, most notably in estimating Wasserstein distance between two distributions. In this work, we present the first fully dynamic algorithm for Euclidean bi-chromatic matching with sublinear update time. For any fixed $\varepsilon > 0$, our algorithm achieves $O(1/\varepsilon)$-approximation and handles updates in $O(n^{\varepsilon})$ time. Our experiments show that our algorithm enables effective monitoring of the distributional drift in the Wasserstein distance on real and synthetic data sets, while outperforming the runtime of baseline approximations by orders of magnitudes.

关键词

Euclidean bi-chromatic matchingdynamic algorithm1-Wasserstein distance

评审与讨论

审稿意见

评分: 42025-03-08

This paper studied geometric bi-chromatic matching (aka the optimal transportation problem) for discrete distributions in the dynamic setting, where points could undergo insertions and deletions. Here, we are given $n$ points inserted and deleted dynamically, and we want to always maintain an approximation to the optimal matching between the red and blue points.

The main result of the paper is a dynamic algorithm that maintains an implicit representation of a $(1/\varepsilon)$ -approximation to the optimal bi-chromatic matching in $O(n^{1+\varepsilon})$ pre-processing time and $O(n^{\varepsilon}/\varepsilon)$ update time. The paper further complements the algorithm with a lower bound of $\Omega(n)$ update time for $<2$ approximation, indicating that we could not expect $1+\varepsilon$ -approximation as in the offline setting.

The paper further conducted experiments, showing that the proposed algorithm has significant advantage for the efficiency over the static algorithm.

给作者的问题

Do you have any opinion on how tight your result is? Is it possible to, say, get any approximation in $\text{polylog}(n)$ update time?

论据与证据

Yes. The algorithm is quite natural, and the detailed descriptions and the proofs are included in the appendix.

方法与评估标准

Yes. Since this paper is the first to consider dynamic bi-chromatic matching, comparing it with the offline algorithm seems reasonable.

理论论述

The main technical idea of the paper is to take advantage of the $p$ -tree structure to maintain an implicit representation of the matching. Here, if we pick $p=n^{O(\varepsilon)}$ , the $p$ -tree will of of depth only $O(1/\varepsilon)$ . The algorithm first tries to resolve matching inside each leaf node; if there is any surplus of a color, the algorithm aggregates the ‘surplus’ and matches the mass to sibling nodes in the $p$ -tree. The process could be done recursively only on $O(1/\varepsilon)$ nodes per point update (since we only need to follow a path from leaf to root), which results in a sublinear update time algorithm.

The design of the algorithm is quite natural and straightforward. Due to the simplicity for understanding, I did not check the details in the proof. I’m confident the algorithm is correct.

实验设计与分析

Yes, nothing looks bad to me.

补充材料

The appendix contains the formal pseudocodes of the algorithm, the omitted proofs, a lower bound, and additional experiments. I briefly checked the lower bound and the additional experiments, and they look good.

与现有文献的关系

Bi-chromatic matching (also known as optimal transportation) has a wide range of applications in machine learning. Solving the problem in the dynamic setting is an important problem in my opinion.

遗漏的重要参考文献

No essential reference is missing, as far as I know.

其他优缺点

My overall opinion of this paper is positive. Optimal transportation is an important problem in machine learning, and the dynamic setting is an important scenario increasingly common in modern applications. I’m surprised the problem was not studied before. From a technical perspective, the problem also correctly identified the gap between geometric matching vs. graph matching and obtained sublinear update time dynamic algorithm for the former. The paper is also well-written, and I could follow most of the arguments without checking the details.

The techniques are relatively straightforward, and this might become an issue for technical contributions. However, I personally like simple and cute algorithms and do not have a problem with them.

其他意见或建议

The texts in Figures 2 and 3 are generally very small. I had to zoom in 400% to read. I understand this might be a problem with the template. My suggestion would be to have the same figures with better resolutions in the appendix and add a pointer.

作者回复

2025-03-31

We would like to thank the reviewer for their time and for providing valuable comments.

“The texts in Figures 2 and 3 are generally very small. My suggestion would be to have the same figures with better resolutions in the appendix and add a pointer.”

All figures will be replicated in the appendix, allowing for a more readable presentation.

“Do you have any opinion on how tight your result is? Is it possible to, say, get any approximation in $\text{polylog } n$ update time?”

For a choice $\varepsilon = 1/ \log n$ , our algorithm reports an $O(\log n)$ -approximation solution in $O(\text{polylog } n)$ update time. We believe that there are instances where our current algorithms attain the claimed trade-off between approximation ratio and update time. It’s an interesting open question whether such a trade-off can be improved, and will likely require new ideas.

审稿人评论

2025-04-02

Thanks for the clarification. I think it would be helpful if you could add more discussion about the tightness of your algorithm in your final version.

In light of the discussion, I'll keep my score as it is.

作者评论

2025-04-02

Thank you for the response. We'll add an example showing the tightness of our algorithm in the final version.

审稿意见

评分: 42025-03-12

This papers presents a dynamic data structure that maintains a bipartite matching and supports insertions ans deletions of points. Given two point sets in $2$ -dimensional space and a parameter $\varepsilon>0$ , the data structure computes an $O(1/\varepsilon)$ -approximate matching and handles updates in $O(n^{-\varepsilon})$ time.

The data structure stores a $p$ -tree, which is a hierarchical partitioning similar to quadtrees but with a higher fan-out of $p^2$ . For each cell of the hierarchical partitioning, the data structure maintains an "implicit matching", which can be then converted to an explicit matching in linear time. The data structure gathers the excess demand/supply of the points in the sub-tree of that node at the center of cell, which are then matched to the excess masses at the centers of the sibling cells. Inserting/deleting a point into/from each set requires updating the matching computes for the cells along the path from the root to the leaf node containing the inserted point. By picking $p=n^{-O(\varepsilon)}$ , the height of the tree will be $O(1/\varepsilon)$ and updating the matching of each cell would take $O(n^{-\varepsilon})$ time.

update after rebuttal

I keep my positive score.

给作者的问题

In your experimental results, you used $p=2, 8, 32$ , but your paper mentions the value $p$ as $n^{-\varepsilon}$ , which is less than $1$ . I assume that the diameter of the space plays a role in the choice of $p$ in your experiments, but I could not get a sense of what is in fact the value of $\varepsilon$ .

论据与证据

The paper introduces a fully dynamic algorithm for Euclidean bi-chromatic matching that achieves an $O(1/\varepsilon)$ approximation with sublinear update time. The authors prove that no algorithm can achieve a $(2-\delta)$ -approximation while maintaining sublinear updates, establishing a fundamental trade-off. Their method leverages a hierarchical grid-based partitioning scheme to efficiently handle insertions and deletions in $O(n^{-\varepsilon})$ time. Through experiments, they show that their algorithm significantly outperforms static recomputation methods while maintaining high accuracy. Real-world applications, such as tracking spatial distribution changes in taxi pickup/dropoff data, demonstrate its effectiveness in monitoring Wasserstein distance and evolving datasets.

方法与评估标准

The authors propose a hierarchical grid-based partitioning approach that processes updates in a bottom-up manner, ensuring efficient maintenance of an approximate bi-chromatic matching with provable guarantees. Their evaluation, which includes theoretical analysis, synthetic benchmarks, and real-world datasets (such as taxi pickup/dropoff locations), is well-designed to demonstrate both the algorithm’s efficiency and its practical applicability in tracking dynamic spatial distributions.

理论论述

I checked the correctness of the algorithms and lemmas.

实验设计与分析

The experimental results makes sense.

补充材料

I checked almost all parts of the appendix.

与现有文献的关系

The problem of computing bipartite matchings in a dynamic setting has not been extensively explored in the literature, as there are few known algorithms for it. However, I believe this problem could have potential applications in machine learning methods.

遗漏的重要参考文献

All relevant references that I am aware of are included.

其他优缺点

The paper is well-written and easy to follow.

其他意见或建议

Line 95 LC: The sentence needs reformatting
Line 114 LC: There are two O() notations
Line 119 LC: input puts points -> input points
Line 168 RC: spread bounded $U$ -> bounded spread $U$
Line 411 RC: use a use a -> use a
Missing the impact statement

作者回复

2025-03-31

We would like to thank the reviewer for their time and for providing valuable comments. For the minor comments and typos not addressed below, we will incorporate them in the next version of the paper.

“Missing the impact statement.”

We will include the following impact statement, and we apologize for our oversight:

This paper presents work whose goal is to advance the field of Machine Learning. There are many potential societal consequences of our work, none which we feel must be specifically highlighted here.

“In your experimental results, you used $p=2,8,32$ , but your paper mentions the value $p$ as $n^{−\varepsilon}$ , which is less than $1$ . I assume that the diameter of the space plays a role in the choice of $p$ in your experiments, but I could not get a sense of what is in fact the value of $\varepsilon$ .”

To clarify, we choose the value of $p$ as $n^\varepsilon$ in the paper and obtain an approximation ratio of $O(1/\varepsilon)$ . Here are some examples to give a sense of the scale of $\varepsilon$ :

our largest experiment had $n=1 000 000$ and $p=2$ corresponding to $\varepsilon \approx 1/20$ ,
our uniform synthetic data set where points were integers between $0$ and $500$ , the choice of $p=32$ corresponds to $\varepsilon \approx 0.55$ .

Empirically, all choices of $p$ achieve a significantly better approximation ratio in practice than our worst-case theoretical guarantee (see Figure 5 in Appendix G).

审稿人评论

2025-04-02

Thanks for your clarification. I keep my score.

审稿意见

评分: 32025-03-12

This paper gives an algorithm for dynamic bi-chromatic matching in Euclidean space with $O(\frac{1}{\varepsilon})$ -approximation ratio and sublinear update time $O(\frac{n^\varepsilon}{\varepsilon})$ with theoretical guarantee and this algorithm is the frist sublinear update time algorithm for geometric dynamic bi-chromatic matching. In addition, this paper gives a proof of approximation lower bound for dynamic bi-chromatic matching in Euclidean, which shows that even if only insertion and deletion operations are considered, there does not exist a dynamic algorithm that implements both the $2-\delta$ approximation and the sublinear update time.

The main technique is based on the static algorithm by [Agarwal-Varadarajan 2004], the idea is to construct a nested grid cells structure(p-tree) to get an $O(\frac{1}{\varepsilon})$ -approximation matching and maintain the implicit matching on the tree with bottom-up manner. The theoretical results are validated with experiments on real-world datasets.

给作者的问题

论据与证据

Yes

方法与评估标准

Yes

理论论述

Yes

实验设计与分析

Yes.

It is mentioned that [Xu-Ding 2024] studied the same problem. Thus, it would be desirable for this paper to have an experimental comparison against Xu-Ding 2004), specifically, the update time comparison.

补充材料

与现有文献的关系

This paper states some connection between this question and the 1-Wassertein distance estimation (e.g. we can apply this matching to estimate 1-Wassertein distance measure). Besides, the paper presents experimental results on real-world datasets and shows how the algorithm performs when we use it to estimate the 1-Wassertein distance.

遗漏的重要参考文献

其他优缺点

Strengths:

The algorithm is simple and intuitive, with a solid theoretical guarantee.
The problem of bi-chromatic matching is well-studied and practical, and this paper gives some nice approaches to analysis that may be extended to some related work.

Weaknesses:

The contribution is not clear enough; e.g., the approximation ratio of [Xu-Ding 2024] is not mentioned and used to compare with the theoretical results in this paper.

其他意见或建议

作者回复

2025-03-31

We would like to thank the reviewer for their time and for providing valuable comments.

"The contribution is not clear enough in relation to [Xu-Ding 2024]."

The Xu-Ding paper studies a slightly different problem: dynamic maintenance of optimal transport. Specifically, the goal is to design a data structure that accelerates executing 'one iteration' of the Network Simplex Algorithm on an explicit complete bi-partite graph in $O(n)$ time, where $n$ is the current number of points. Their work is concerned with maintaining an exact solution, within numeric computing accuracy, after the insertion/deletion of a vertex to the min-cost flow problem. Since their algorithm operates on a complete, bipartite graph, it requires $\Omega(n^2)$ space to store the edge weights and $\Omega(n)$ update time to insert a new vertex. Moreover, in the worst case, the update time can take up to $O(n^2)$ time. As such, the algorithm and the implementation cannot handle very large instances in practice.

In contrast, our paper studies the problem of quickly maintaining an approximate solution to the Euclidean bi-chromatic bipartite matching (i.e., optimal transport with uniform demands), in time that is sub-linear in $n$ . For example, taking $\varepsilon = 1/2$ gives an update time of $O(\sqrt{n})$ for updating an $O(1)$ -approximate matching. Our proposed tree structure uses $O(n)$ space, regardless of $\varepsilon$ , which makes the approach practical for very large inputs. Moreover, our lower-bound in Thm. F.1 shows that the approximation ratio must be at least $2$ for any dynamic algorithm to achieve a sublinear update time.

In the related works section, our focus has been to compare our contributions with the known approximation algorithms that run in near-linear time and space, as those are also applicable to larger instances. That being said, as per the reviewer’s suggestion, we will include the above comparison to the Xu-Ding work in the next version of the paper.

Experimental comparison with [Xu-Ding 2024]

We ran the code of [Xu-Ding 2024] on our benchmark dataset (Unif-Vs-Gaussian). We have observed that the insertion time slowed down substantially after inserting $1$ k point-pairs (source and sink). In fact, already inserting $5$ k point-pairs took more than $12$ GB RAM and longer than $3$ hours in total, which is when we had to terminate the process as it was running out of memory. Note that this is already $>1$ s per update on a small instance.

In contrast, our experimental results show that the update time of our proposed algorithm remains $<1$ ms for very large instances (up to $n=10^6$ in Figure $2$ ; right), with the quality of the solution being within a factor of two from the optimum (see Figure $5$ ; right).

审稿意见

评分: 42025-03-15

The paper studies the euclidean minimum cost bipartite matching problem: given n blue points and n red points in the 2d euclidean plane, we wish to compute a minimum cost bipartite matching between them, where the cost is measured in terms of euclidean distance. The novel component of the paper is to introduce dynamic updates to the point sets while requiring that an approximately optimal solution is always maintained. The trivial baseline is to recompute the matching after every update which can require linear time every update for a $1+\epsilon$ approximation. The paper presents a tradeoff which leads to much faster update times. The paper obtains an update time of $n^{\epsilon}$ but maintains a worse $O(1/\epsilon)$ approximation.

The main idea seems to be inspired by the $1+\epsilon$ approximation algorithm in the static case. We first divide the input space into many nested regions using a quad-tree like data structure, but the fan out of the tree must be carefully controlled. When points get updated, we try to match from bottom up, but it is not clear if at some point $\Omega(n)$ updates need to be made. However, a careful update step introduced by the authors only needs to do work proportional to the fan out of the quad-tree like data structure.

Empirical evidence is given showing that their algorithm can obtain multiple orders of speed improvement over the naive recalculating baseline while maintaining an accurate solution, in both synthetic and real world data.

Overall, while the setting is a bit limited, I think it is a very solid contribution to an important problem.

给作者的问题

Do the authors have an estimate for the constant in the $O(1/\epsilon)$ approximation. I understand that asymptotically it doesnt matter but it would be nice to understand what the factor is for say $\sqrt{n}$ update time.
Does the analysis work for weighted matchings e.g. something like optimal transport? Of course one can replicate the points by their weight but it leads to some blowup in the parameters.
The case of $\epsilon \approx 1/\log(n)$ seems especially interesting since one gets $O(\log n)$ update time. Is there any hopes of improving the running time in this regime?
Can the authors quickly survey what is known about the dynamic problem in high dimensional settings? i.e. when d is not constant? What is the dependency on $d$ of the current algorithms? Does an exponential factor in $d$ show up in the approximation or in the update time?

论据与证据

Yes the proofs seem convincing and correct.

方法与评估标准

I am not an expert but the experiments seem sound and show many magnitude speed gains over the naive solution which recomputes the matching every time for both real world and synthetic datasets.

理论论述

I checked to the best of my ability.

实验设计与分析

I did not check very carefully.

补充材料

与现有文献的关系

遗漏的重要参考文献

None that I know of.

其他优缺点

I appreciate the authors attempt to make cler a complicated algoirthm in figure 1

其他意见或建议

None.

作者回复

2025-03-31

We would like to thank the reviewer for their time and for providing valuable comments.

“Do the authors have an estimate for the constant in the $O(1/ \varepsilon)$ approximation. I understand that asymptotically it doesn’t matter but it would be nice to understand what the factor is for say $\sqrt{n}$ update time.”

We haven’t done an exact analysis of the constant in the $O(1/ \varepsilon)$ approximation guarantee. However, if one would want to achieve $O(\sqrt{n})$ update time, we expect the approximation ratio to be $< 20$ .

“Does the analysis work for weighted matchings e.g. something like optimal transport? Of course one can replicate the points by their weight but it leads to some blowup in the parameters.”

This is a great question. We believe it works and our algorithm should extend to this setting, but details need to be worked out.

“The case of $\varepsilon \approx 1/ \log⁡(n)$ seems especially interesting since one gets $O(\log ⁡n)$ update time. Is there any hopes of improving the running time in this regime?”

This is a natural question to consider, but we’re not aware of any techniques of pushing down the runtime to something sub-logarithmic.

“Can the authors quickly survey what is known about the dynamic problem in high dimensional settings? i.e. when $d$ is not constant? What is the dependency on $d$ of the current algorithms? Does an exponential factor in $d$ show up in the approximation or in the update time?”

To the best of our knowledge, the dynamic version of our problem in higher dimensions hasn’t been studied yet. However, we can easily extend our algorithm to work in any dimension $d$ . The best possible trade-offs need to be worked out, but we believe that our approximation ratio has a multiplicative factor of $O(\sqrt{d})$ and the update time should be of the form $O(n^{\varepsilon d})$ .

最终决定Accept (oral)

2025-05-01

This paper presents the first dynamic algorithm for Euclidean bipartite (bi-chromatic) matching with provable approximation guarantees and sublinear update time. The algorithm maintains an approximate minimum-cost matching between two dynamic point sets in low-dimensional Euclidean space (e.g., ℝ²), under insertions and deletions. The core contribution is a hierarchical space partitioning approach that enables matching to be maintained efficiently via a bottom-up process.

All reviewers are positive about this paper. They appreciated the result, empirical validation and many applications of the problem in machine learning and beyond.