PaperHub
7.3
/10
Spotlight4 位审稿人
最低4最高5标准差0.5
5
4
5
4
2.5
置信度
创新性2.3
质量2.5
清晰度2.5
重要性2.5
NeurIPS 2025

An Efficient Orlicz-Sobolev Approach for Transporting Unbalanced Measures on a Graph

OpenReviewPDF
提交: 2025-05-10更新: 2025-10-29
TL;DR

Moving beyond $L_p$ geometric structure, we propose novel efficient Orlicz-Sobolev approaches (i.e., Orlicz-EPT, and Orlicz-Sobolev transport) for measures on a graph, possibly having different total mass.

摘要

We investigate optimal transport (OT) for measures on graph metric spaces with different total masses. To mitigate the limitations of traditional $L^p$ geometry, Orlicz-Wasserstein (OW) and generalized Sobolev transport (GST) employ Orlicz geometric structure, leveraging convex functions to capture nuanced geometric relationships and remarkably contribute to advance certain machine learning approaches. However, both OW and GST are restricted to measures with equal total mass, limiting their applicability to real-world scenarios where mass variation is common, and input measures may have noisy supports, or outliers. To address unbalanced measures, OW can either incorporate mass constraints or marginal discrepancy penalization, but this leads to a more complex two-level optimization problem. Additionally, GST provides a scalable yet rigid framework, which poses significant challenges to extend GST to accommodate nonnegative measures. To tackle these challenges, in this work we revisit the entropy partial transport (EPT) problem. By exploiting Caffarelli & McCann's insights, we develop a novel variant of EPT endowed with Orlicz geometric structure, called Orlicz-EPT. We establish theoretical background to solve Orlicz-EPT using a binary search algorithmic approach. Especially, by leveraging the dual EPT and the underlying graph structure, we formulate a novel regularization approach that leads to the proposed Orlicz-Sobolev transport (OST). Notably, we demonstrate that OST can be efficiently computed by simply solving a univariate optimization problem, in stark contrast to the intensive computation needed for Orlicz-EPT. Building on this, we derive geometric structures for OST and draw its connections to other transport distances. We empirically illustrate that OST is several-order faster than Orlicz-EPT. Furthermore, we show preliminary evidence on the advantages of OST for measures on a graph in document classification and topological data analysis.
关键词
Orlicz-Wassersteingeneralized Sobolev transportOrlicz geometric structureLp geometric structuremeasures on a graphmeasures with different total mass

评审与讨论

审稿意见
5

This paper addresses the challenge of computing optimal transport (OT) between unbalanced measures on a graph, which is common in tasks like comparing documents with different word counts. Traditional OT methods assume equal mass and often rely on p\ell_p-based cost functions, which limits their flexibility and scalability. The standard approaches that tackle unbalanced cases also have a higher complexity.

To overcome this, the authors introduce two contributions:

  • Orlicz-EPT: A new variant of entropy partial transport (EPT) that incorporates Orlicz geometry, which is a generalization of OT that uses convex functions instead of fixed powers like in p\ell_p. By reformulating EPT as a standard OT problem with a carefully calibrated, nonnegative ground cost, they enable the use of Orlicz-Wasserstein-like metrics even in unbalanced settings. This adds modeling flexibility, especially for data with noise or outliers, but remains computationally expensive due to its two-level optimization structure.

  • Orlicz-Sobolev Transport (OST): To overcome the higher complexity of Orlicz-EPT, the authors explore a dual formulation that is significantly more efficient. It defines a new transport distance using Orlicz-Sobolev spaces on graphs and simplifies the computation to a univariate optimization problem. The resulting distance has several desirable properties, such as divergence (nonnegative and zero iff measures match), being a metric under mild conditions, and generalizing several known distances like GST (Generalized Sobolev Transport), ST (Sobolev Transport), and UST (Unbalanced Sobolev Transport).

The authors demonstrate the effectiveness of OST on large-scale experiments in document classification and topological data analysis (TDA).

优缺点分析

Strengths

  • The paper introduces a novel variant of optimal transport (OT) that extends existing formulations to the Orlicz geometry, providing a more flexible alternative to traditional p\ell_p-based distances for unbalanced measures.
  • It establishes strong theoretical foundations, including connections to several known OT variants (e.g., GST, ST, UST), demonstrating that the proposed distance is a meaningful generalization.
  • The effectiveness of the proposed distance is supported by empirical results on document classification and topological data analysis tasks.

Weaknesses

  • Although the paper is mathematically rigorous, it is quite dense in places. Some sections (e.g., Section 3) would benefit from additional intuition or illustrative examples.

问题

  • Can the authors explain the reason behind the choices in your experiments? While some of the settings are said to follow prior work, it’s not entirely clear that they are optimal for the current setup. In particular, could the authors provide more intuition behind the selection of the Orlicz functions ϕ0,ϕ1\phi_0, \phi_1 , and ϕ2\phi_2 ? Additionally, I would appreciate an explanation for the specific form of the weight functions w1w_1 and w2w_2. Have the authors performed any ablation or sensitivity analysis to study the impact of these parameter choices on performance or efficiency?

局限性

Yes.

最终评判理由

The paper provides solid contributions to optimal transport theory, with strong theoretical results and valuable experimental evaluations. I maintain my positive score and vote for the acceptance of the paper.

格式问题

None

作者回复

Dear Reviewer HNe3,

Thank you for your valuable feedback. Below are the answers for your questions and comments.

(1) [...] (some of the settings follow prior work): (not entirely clear) optimal for the current setup [...] more intuition behind the selection of the Orlicz functions Φ0,Φ1,Φ2\Phi_0, \Phi_1, \Phi_2? [...] (explanation) specific form of the weight functions w1w_1 and w2w_2 [...] (ablation or sensitivity analysis) parameter choices on performance or efficiency? [...]

NN-functions Φ1,Φ2\Phi_1, \Phi_2 and the limit case Φ0\Phi_0 have been popularly utilized for Orlicz geometric structure in the literature [36, 17, 3, 49, 26, 4, 54, 13, 2, R8].

The performance of Orlicz-EPT/OST typically depends on the choice of the NN-function Φ\Phi (Equation (7) and Definition 4.1 respectively), similar to how kernel functions impact performance in kernel-dependent machine learning frameworks. In our experiments with NN-functions Φ1\Phi_1 and Φ2\Phi_2 (Section 7), we observed slightly different performances. Similar findings have been reported for GST [36].

We agree that determining the optimal NN-function Φ\Phi for Orlicz-EPT/OST in a given task is an open problem that warrants further investigation. Given that the primary focus of our paper is on Orlicz-EPT/OST, we leave this topic for future work for a careful study. As an interim solution, cross-validation can be used to select Φ\Phi from a set of candidate functions.

Following the theoretical result in Proposition 5.7, and theoretical properties on the EPT on a graph [35] (Corollary 3.2, Corollary 4.6, Proposition 5.4), weight functions w1,w2w_1, w_2 are bb-Lipschitz, e.g., we simply choose w1(x)=w2(x)=a1dG(x,z0)+a0w_1(x) = w_2(x) = a_1 d_{\mathbb{G}}(x, z_0) + a_0 where a1=b,a0=1a_1 = b, a_0 =1, see line 288-289.

We consider different graphs G\mathbb{G}_Sqrt, and G\mathbb{G}_Log. For all datasets, except MPEG7 dataset, graph G\mathbb{G}_Sqrt consists of 10K nodes and 1 million edges, while graph G\mathbb{G}_Log comprises 10K nodes and 100K edges. Due to the smaller size of the MPEG7 dataset, we constructed graph G\mathbb{G}_Sqrt with 1K nodes and 32K edges, and graph G\mathbb{G}_Log with 1K nodes and 7K edges. We also consider different NN-functions (Φ1,Φ2,Φ0\Phi_1, \Phi_2, \Phi_0). Some other parameters, e.g., b,α,w1,w2b, \alpha, w_1, w_2, are set by following our derived theoretical results (Section 5) and theoretical properties of EPT [33, 35] used in their experiments.

Furthermore, we carry out more additional experiments for different b,αb, \alpha. The average accuracy results are as follows:

\star For document classification on TWITTER

  • For graph G\mathbb{G}_Sqrt:
Φ0\Phi_0Φ1\Phi_1Φ2\Phi_2
b=0.5b=0.571.66 ±\pm 0.6372.03 ±\pm 0.5672.12 ±\pm 0.57
b=2b=271.54 ±\pm 0.4972.15 ±\pm 0.4272.28 ±\pm 0.34
Φ0\Phi_0Φ1\Phi_1Φ2\Phi_2
α=0.1\alpha=0.171.48 ±\pm 0.4372.34 ±\pm 0.5572.64 ±\pm 0.50
α=0.2\alpha=0.271.64 ±\pm 0.5072.12 ±\pm 0.3672.27 ±\pm 0.30
  • For graph G\mathbb{G}_Log
Φ0\Phi_0Φ1\Phi_1Φ2\Phi_2
b=0.5b=0.570.97 ±\pm 0.2971.74 ±\pm 0.3371.78 ±\pm 0.41
b=2b=271.58 ±\pm 0.4171.92 ±\pm 0.1671.96 ±\pm 0.22
Φ0\Phi_0Φ1\Phi_1Φ2\Phi_2
α=0.1\alpha=0.171.57 ±\pm 0.2371.87 ±\pm 0.3672.04 ±\pm 0.28
α=0.2\alpha=0.271.59 ±\pm 0.3571.74 ±\pm 0.2271.95 ±\pm 0.19

\star For TDA on MPEG7

  • For graph G\mathbb{G}_Sqrt
Φ0\Phi_0Φ1\Phi_1Φ2\Phi_2
b=0.5b=0.566.67 ±\pm 3.4571.67 ±\pm 5.4068.83 ±\pm 3.09
b=2b=265.00 ±\pm 3.5869.13 ±\pm 3.8767.33 ±\pm 3.55
Φ0\Phi_0Φ1\Phi_1Φ2\Phi_2
α=0.1\alpha=0.166.43 ±\pm 4.8170.47 ±\pm 4.5168.50 ±\pm 4.54
α=0.2\alpha=0.266.82 ±\pm 4.0168.43 ±\pm 4.6369.17 ±\pm 3.60
  • For graph G\mathbb{G}_Log
Φ0\Phi_0Φ1\Phi_1Φ2\Phi_2
b=0.5b=0.565.87 ±\pm 5.6867.00 ±\pm 5.6068.67 ±\pm 5.61
b=2b=265.93 ±\pm 5.1168.91 ±\pm 6.0066.17 ±\pm 4.60
Φ0\Phi_0Φ1\Phi_1Φ2\Phi_2
α=0.1\alpha=0.165.94 ±\pm 4.1967.83 ±\pm 5.8366.46 ±\pm 4.58
α=0.2\alpha=0.265.84 ±\pm 5.4569.61 ±\pm 5.5669.86 ±\pm 5.22

These empirical results suggest that turning these hyperparameters (e.g., by cross-validation) may help to improve the performances further. We will clarify it.

(2) [...] (Section 3) would benefit from additional intuition or illustrative examples. [...]

→ Thank you for your suggestion. We will add more text and examples to further clarify technical concepts (Section 3) within the constraints of the main manuscript or supplement.


Concluding remarks. We would be grateful if you could let us know whether our clarifications and answers are satisfied to address the raised concerns. If so, we kindly ask that you consider increasing your rating. We are also pleased to discuss any other questions you may have.


Reference

[R8] Vershynin, High-dimensional probability: An introduction with applications in data science. Cambridge university press, 2018

评论

I thank the authors for their comprehensive response. I maintain my positive score, voting for the acceptance of the paper.

评论

Dear Reviewer HNe3,

Thank you very much for your positive rating, and we appreciate your thoughtful endorsement.

We are glad that our clarifications address your concerns on our work. We will revise the paper following the feedback and suggestion.

With best regards,

审稿意见
4

This paper addresses the problem of optimal transport for unbalanced measures on graph metric spaces. The key idea is to leverage insights from Caffarelli and McCann to reformulate the unbalanced optimal transport problem as a corresponding complete transport problem with a nonnegative ground cost. To further enhance this formulation, the Orlicz geometric structure is introduced. By defining measures on a graph, the solution to the reformulated transport problem can be reduced to solving a univariate optimization problem. Experimental results demonstrate that the proposed method outperforms existing approaches in both accuracy and computational efficiency.

优缺点分析

Strengths:

  • The paper is well written and clearly structured.
  • The proposed method builds upon and integrates several well-established techniques. In particular:
  1. It reformulates problem (4), which incorporates Lagrangian multipliers for unbalanced measures, into a corresponding complete optimal transport (OT) problem (problem 6) with a nonnegative ground cost—an approach inspired by the insights of Caffarelli and McCann.
  2. Problem (6) is further enhanced using the Orlicz geometric structure, following the concept of Orlicz-Wasserstein distance.
  3. While the proposed Orlicz-Wasserstein formulation involves a two-level optimization—typically computationally expensive—this is efficiently addressed by defining measures on a graph (as in [35]), allowing the solution to be reduced to a univariate optimization problem (involving the variable t of the N-function).

Weaknesses:

  • The method presents an incremental development of several well-established ideas. As such, the novelty of the approach may appear limited. However, I think that, this is the first work to explore Orlicz-Wasserstein distance for unbalanced measures specifically on graph structures, which is a meaningful contribution.
  • it remains unclear how to recover the solution to problem (3) (for a fixed m) from the solution to problem (4) (for a fixed \lambda).

问题

See the weakness part.

局限性

Yes.

格式问题

I do not notice any major formatting issues in this paper

作者回复

Dear Reviewer uDto,

Thank you for your valuable feedback. Below are the answers for your questions and comments.

(1) [...] The method presents an incremental development of several well-established ideas [...] the first work to explore Orlicz-Wasserstein distance for unbalanced measures specifically on graph structures, which is a meaningful contribution. [...]

→ We respectfully disagree that our proposed methods are incremental. Below are the reasons:

  • GST is recognized as a scalable variant of OW (see a review in Appendix B.1.5) [36]. However, unlike OT, extending GST to accommodate unbalanced measures on a graph with a mass constraint is nontrivial, as discussed in lines 57-60.

  • We believe that it is non-trivial to establish a connection between EPT on a graph [35] and a potential extension of GST [36] for unbalanced measures.

  • As noted in Remark 3.1 (lines 138-140), simply applying Caffarelli and McCann's (2010) observations without calibration would result in a new ground cost for standard OT that is not guaranteed to be nonnegative, similar to [12, 33, 35]. We stress that our calibration is crucial for the development involving Orlicz geometric structure, given that the NN-function is defined on R+\mathbb{R}_{+} (see lines 98-100).

  • We leverage insights from [35, 36] to develop a novel regularization for OST (Definition 4.1), which involves new tools to control integral of gradient on shortest paths and its duality with Orlicz geometry, as shown in Equation (12) and Appendix A.2.4.

We emphasize that we do not claim any new techniques for the theoretical proofs. We appreciate your evaluation that the proposed methods are meaningful contributions: our work is the first approach to efficiently extend OW/GST for unbalanced measures on a graph.

(2) [...] (unclear) recover the solution to Problem (3) (for a fixed m) from the solution to Problem (4) (for a fixed λ\lambda). [...]

→ The relationship between Problem (3) and Problem (4) is established in [35], specifically through Theorem A.1 in Appendix A.1. To elaborate, Theorem A.1 in [35] demonstrates that a solution to Problem (4) directly yields a solution to Problem (3). We will provide further clarification on this connection.


Concluding remarks. We would be grateful if you could confirm whether our clarifications and responses adequately address your concerns. If they do, we kindly ask that you consider increasing your rating. We are also happy to address any additional questions or concerns you might have.

评论

I thank the authors for their response. I would like to maintain my score.

评论

Dear Reviewer uDto,

Thank you very much for your positive rating, and we appreciate your thoughtful endorsement.

We are glad that our clarifications address your concerns on our work. We will revise the paper following the feedback and suggestion.

With best regards,

审稿意见
5

The paper presents a new framework for optimal transport (OT) between unbalanced measures on a graph based on two new distances: Orlicz-EPT and Orlicz-Sobolev Transport (OST). Orlicz-EPT extends entropy-regularized partial OT by adding Orlicz geometric structure and recasting it as a standard OT problem with a calibrated nonnegative cost, enabling computation via binary search. Then, OST leverages the dual of EPT and graph structure to present a regularized OT distance that can be computed efficiently using a univariate optimization problem. The paper shows that OST generalizes several known distances, including generalized Sobolev transport, standard Sobolev transport, and unbalanced Sobolev transport, and maintains desirable geometric properties such as being a divergence or a metric. Empirical results demonstrate that OST is significantly faster than Orlicz-EPT and is competitively running in document classification and topological data analysis, suggesting its scalability and practicality.

优缺点分析

Strengths: The paper is a technically sound and well-presented contribution to the theory of optimal transport (OT) with a new pair of distances—Orlicz-EPT and Orlicz-Sobolev Transport (OST). The theoretical method is sound, basing itself on Caffarelli & McCann's result to redefine entropy partial transport as a regular OT with calibrated nonnegative cost, making it possible to construct Orlicz-EPT. Subsequent to this, OST formulation via duality and a new regularization leads to an efficient computationally distance that can be calculated by way of a simple univariate optimization, much quicker than existing methods like Orlicz-EPT and UOT. The paper is nicely organized and clear in its exposition, with full definitions, proofs, and relation to existing work. OST generalizes various earlier transport distances (e.g., GST, ST, UST), and its strong theoretical guarantees are complemented with empirical behavior on document classification and topological data analysis, hinting at its viability for large-scale structured data.

Weaknesses: Despite the theory being strong, empirical consideration is relatively narrow in scope and supported by document and topological data instead of more diverse or realistic contexts (e.g., biological or dynamic networks). Orlicz-EPT, while new, remains computationally expensive and less convenient despite theoretical innovation, and the paper fails to convincingly describe when it must be used instead of OST. Also, OST's behavior is still hyperparameter-sensitive (e.g., with respect to α\alpha, the choice of Orlicz function), though the paper has minimal suggestion or sensitivity analysis for making these selections. There are some technical subsections, notably the dual forms and cost calibration, that may benefit from a little more intuition or figures. Finally, certain parts of the work, particularly Orlicz-EPT, may arguably be seen as a modest extension of available forms like EPT, and experimental comparisons pretty much emphasize improvements over closely related previous work that may be extended.

问题

  1. While Orlicz-EPT is presented as a theoretically rich generalization, its application in practice is circumscribed by prohibitive computational cost. Can the authors say anything about operations or instances for which Orlicz-EPT would be preferable to use than OST, even with this penalty in terms of computation? Adding either empirical examples or a theoretical account with an explicit statement of under what conditions the additional expressiveness of Orlicz-EPT is valuable and worthwhile and justifies its use over OST would be advantageous. If authors are able to detail compelling practical applications in which Orlicz-EPT is superior to OST (in quality or quantity), this would further strengthen the importance and applicability of the contribution.

  2. The experiments are largely limited to text files and persistence diagrams. Can the authors supplement, or at least mention, performance in other domains like biological graphs, social networks, or dynamic/temporal graphs? Include a single additional domain or application (even in a limited form) that tests OST in a very different setting in order to more evidently demonstrate its generality. A broader empirical range would offer evidence towards arguments of practical use and increase confidence that OST is a generally applicable tool for structured data.

  3. How sensitive is OST to the parameters such as the Orlicz function ϕ\phi, the regularization weight bb, and the α\alpha range for critic functions? Did the author test this? Add a little sensitivity analysis or some pragmatic guidance on how to select such parameters in different settings. Even a graph of accuracy against α\alpha or different ϕ\phi functions would be helpful. More interpretability and usability of OST would help in ease of use by practitioners and allow practitioners to adopt and experiment with the approach in their setting.

  4. The paper discusses scalability based on the number of transport calculations but would be useful to also describe how the performance grows if graph size and complexity are increased. The authors can provide a brief runtime or memory analysis that alters the size of the graph (nodes/edges) or density, especially since there are graphs of thousands of nodes in practice.

局限性

Yes

最终评判理由

I greatly appreciate the efforts the authors have put in to address multiple (possibly difficult) technical questions. They do improve the quality of the paper if revised and I'm happy to raise my score. Regarding the author's question on suggesting datasets for biological, social network datasets, etc. there are plenty and it should be the responsibility of the author to justify their work - perhaps in a longer/extended format of this work.

格式问题

No

作者回复

Dear Reviewer pFMB,

Thank you for your valuable feedback. Below are the answers for your questions and comments.

(1) [...] under what conditions the additional expressiveness of Orlicz-EPT is valuable and worthwhile and justifies its use over OST would be advantageous (further strengthen the importance and applicability of the contribution) [...]

→ Following Proposition 5.2 (line 232), we can view OST as an extension of GST [36] for handling unbalanced measures, and Orlicz-EPT as an extension of OW [55] for unbalanced measures. Notably, when the input measures have equal mass (i.e., μ(G)=ν(G)\mu(\mathbb{G}) = \nu(\mathbb{G})), and b=1b = 1, Orlicz-EPT reduces to OW. Furthermore, Orlicz-EPT shares the same computational complexity as OW, as shown in Equation (7). It's worth noting that GST is a scalable variant of OW for balanced measures, in the same sense, OST can be seen as a scalable variant of Orlicz-EPT.

As a result, Orlicz-EPT is applicable for all applications of OW and extends OW to handle unbalanced input measures, whereas OST may not reserve all the properties of OW.

Like OW, Orlicz-EPT involves a two-level optimization problem, limiting its applicability to small-scale domains. In contrast, OST, similar to GST, is scalable and can be applied to large-scale domains.

We appreciate the suggestion and agree that it is an interesting research direction for further exploration.

(2) [...] (experiments) limited to text files and persistence diagrams [...] (broader empirical range) biological graphs, social networks, or dynamic/temporal graphs? [...]

→ We would like to clarify that our work focuses on the OT problem with an Orlicz geometric structure for unbalanced measures supported on a graph metric space (lines 52-53). Specifically, our proposed approaches (Orlicz-EPT/OST) compute the distance between two input measures with potentially different total masses, where the measures are supported on the nodes of a graph with a metric based on shortest path lengths. Consequently, our approaches are suitable for applications that require measuring distances between measures on graph structures endowed with a graph metric. In our experiments, we demonstrate the effectiveness of our approach in document classification on four real-world datasets and topological data analysis (TDA), including orbit recognition for linked twist maps (a discrete dynamical system modeling flows in DNA microarrays [R7]) and object shape recognition in MPEG7 dataset. These evaluations on document classification and TDA are often used for tasks involving comparing measures on a graph, see [35, 36]. We believe that such experimental coverage is rich and diverse enough. We will clarify it.

Regarding the suggestion to explore broader empirical range, e.g., biological graphs, social networks, or dynamic/temporal graphs, could you please specify public datasets for the suggestion and confirm whether they involve comparing measures with potentially different total masses on a given graph? We appreciate your suggestion and would be happy to conduct additional experiments within the rebuttal period if feasible. Thank you very much.

(3) [...] (sensitivity analysis/pragmatic guidance) the Orlicz function Φ\Phi, the regularization weight bb, and the α\alpha range (ease of use by practitioners) [...]

→ We clarify that different Orlicz functions, such as the popular NN-functions Φ1\Phi_1, Φ2\Phi_2, and the limit case Φ0\Phi_0, are illustrated in Figures 2 and 3 of the main manuscript, as well as Figures A.3 and A.4 in the supplementary materials.

The performance of Orlicz-EPT/OST typically depends on the choice of the NN-function Φ\Phi (Equation (7) and Definition 4.1 respectively), similar to how kernel functions impact performance in kernel-dependent machine learning frameworks. In our experiments with NN-functions Φ1\Phi_1 and Φ2\Phi_2 (Section 7), we observed slightly different performances. Similar findings have been reported for GST [36].

We agree that determining the optimal NN-function Φ\Phi for Orlicz-EPT/OST in a given task is an open problem that warrants further investigation. Given that the primary focus of our paper is on Orlicz-EPT/OST, we leave this topic for future work for a careful study. As an interim solution, cross-validation can be used to select Φ\Phi from a set of candidate functions.

Regarding the regularization weight bb, we used b=1b=1 in the experiments based on the results in Propositions 5.2 and 5.3, as well as for simplicity. This choice of bb is also supported by theoretical results on EPT on a graph [35] (Lemma A.6, Remark 4.8) and has been used in previous experiments for EPT [33, 35].

For α\alpha, from the result in Proposition 5.7 and for simplicity, we use α=0\alpha=0 for the experiments. Such value for α\alpha is also supported by theoretical results of EPT on a graph [35] (Lemma 4.4, Proposition 5.2) and used in experiments for EPT [33, 35]. Additionally, when α=0\alpha=0, Iα\mathcal{I}_{\alpha} is the largest interval (line 197).

Following your suggestion, we evaluate more additional experiments with different b,αb, \alpha. The average accuracy results are as follows:

\star For document classification on TWITTER

  • For graph G\mathbb{G}_Sqrt:
Φ0\Phi_0Φ1\Phi_1Φ2\Phi_2
b=0.5b=0.571.66 ±\pm 0.6372.03 ±\pm 0.5672.12 ±\pm 0.57
b=2b=271.54 ±\pm 0.4972.15 ±\pm 0.4272.28 ±\pm 0.34
Φ0\Phi_0Φ1\Phi_1Φ2\Phi_2
α=0.1\alpha=0.171.48 ±\pm 0.4372.34 ±\pm 0.5572.64 ±\pm 0.50
α=0.2\alpha=0.271.64 ±\pm 0.5072.12 ±\pm 0.3672.27 ±\pm 0.30
  • For graph G\mathbb{G}_Log
Φ0\Phi_0Φ1\Phi_1Φ2\Phi_2
b=0.5b=0.570.97 ±\pm 0.2971.74 ±\pm 0.3371.78 ±\pm 0.41
b=2b=271.58 ±\pm 0.4171.92 ±\pm 0.1671.96 ±\pm 0.22
Φ0\Phi_0Φ1\Phi_1Φ2\Phi_2
α=0.1\alpha=0.171.57 ±\pm 0.2371.87 ±\pm 0.3672.04 ±\pm 0.28
α=0.2\alpha=0.271.59 ±\pm 0.3571.74 ±\pm 0.2271.95 ±\pm 0.19

\star For TDA on MPEG7

  • For graph G\mathbb{G}_Sqrt
Φ0\Phi_0Φ1\Phi_1Φ2\Phi_2
b=0.5b=0.566.67 ±\pm 3.4571.67 ±\pm 5.4068.83 ±\pm 3.09
b=2b=265.00 ±\pm 3.5869.13 ±\pm 3.8767.33 ±\pm 3.55
Φ0\Phi_0Φ1\Phi_1Φ2\Phi_2
α=0.1\alpha=0.166.43 ±\pm 4.8170.47 ±\pm 4.5168.50 ±\pm 4.54
α=0.2\alpha=0.266.82 ±\pm 4.0168.43 ±\pm 4.6369.17 ±\pm 3.60
  • For graph G\mathbb{G}_Log
Φ0\Phi_0Φ1\Phi_1Φ2\Phi_2
b=0.5b=0.565.87 ±\pm 5.6867.00 ±\pm 5.6068.67 ±\pm 5.61
b=2b=265.93 ±\pm 5.1168.91 ±\pm 6.0066.17 ±\pm 4.60
Φ0\Phi_0Φ1\Phi_1Φ2\Phi_2
α=0.1\alpha=0.165.94 ±\pm 4.1967.83 ±\pm 5.8366.46 ±\pm 4.58
α=0.2\alpha=0.265.84 ±\pm 5.4569.61 ±\pm 5.5669.86 ±\pm 5.22

These empirical results suggest that turning these hyperparameters (e.g., by cross-validation) may help to improve the performances further. We will clarify it.

(4) [...] (performance) alters the size of the graph (nodes/edges) or density (graphs of thousands of nodes) [...]

→ We clarify that our experiments (Section 7) utilize two graphs: G\mathbb{G}_Sqrt and G\mathbb{G}_Log. For all datasets, except MPEG7 dataset, graph G\mathbb{G}_Sqrt consists of 10K nodes and 1 million edges, while graph G\mathbb{G}_Log comprises 10K nodes and 100K edges. Due to the smaller size of the MPEG7 dataset, we constructed graph G\mathbb{G}_Sqrt with 1K nodes and 32K edges, and graph G\mathbb{G}_Log with 1K nodes and 7K edges. Additionally, the graph is assumed to be given in our experiments, see line 267-268. We will clarify it.

(5) [...] (dual forms and cost calibration) may benefit from a little more intuition or figures [...] (Orlicz-EPT) may arguably be seen as a modest extension of available forms like EPT [...]

→ We will follow your suggestions to add more text to further clarify dual forms and cost calibration within the constraints of the main manuscript or supplement.

We respectfully disagree that Orlicz-EPT is a modest extension of EPT. Below are the reasons:

  • GST is recognized as a scalable variant of OW (see a review in Appendix B.1.5) [36]. However, unlike OT, extending GST to accommodate unbalanced measures on a graph with a mass constraint is nontrivial, as discussed in lines 57-60.

  • We believe that it is non-trivial to establish a connection between EPT on a graph [35] and a potential extension of GST [36] for unbalanced measures.

  • As noted in Remark 3.1 (lines 138-140), simply applying Caffarelli and McCann's (2010) observations without calibration would result in a new ground cost for standard OT that is not guaranteed to be nonnegative, similar to [12, 33, 35] (including EPT). We stress that our calibration is crucial for the development involving Orlicz geometric structure, given that the NN-function is defined on R+\mathbb{R}_{+} (see lines 98-100).


Concluding remarks. We would be grateful if you could confirm whether our clarifications and responses adequately address your concerns. If they do, we kindly ask that you consider increasing your rating. We are also happy to address any additional questions or concerns you might have.


Reference

[R7] Hertzsch et al., DNA microarrays: design principles for maximizing ergodic, chaotic mixing. Small, 2007.

审稿意见
4

The authors investigate numerical methods to perform unbalanced optimal transport where the support of the measures is a graph. This paper proposes Orlicz-EPT and Orlicz-Sobolev Transport. Orlicz-EPT builds on the entropy partial transport (EPT) framework, extending it with Orlicz geometry by calibrating the ground cost to ensure nonnegativity—allowing standard optimal transport techniques to be applied. To accomplish this, the authors use insights once used by Caffarelli and McCann (2010) based on the obstacle problem. However, due to its two-level optimization structure, Orlicz-EPT remains computationally intensive. The authors then introduce Orlicz-Sobolev Transport (OST) as a means to address these computational hurdles, which is a dual formulation that regularizes the critic function using Orlicz-Sobolev spaces and reduces the problem to a single-variable optimization. The authors prove several theoretical properties about OST. Empirical results show that OST achieves competitive performance in document classification and topological data analysis while being several orders of magnitude faster than Orlicz-EPT.

优缺点分析

A main strength of the paper is the development of a novel optimal transport algorithm on graphs for unbalanced measures, which is a common situation that does not appear to lie in the scope of many existing approaches. The paper contains many theoretical results and is the first instance I have seen Orlicz geometries used in this context, which is quite nice (though, there do appear related works cited by the authors here). The Orlicz structure leads to an expensive two-layer optimization problem, which the authors find a work-around for. Their numerical speedups are impressive as well as the newfound accuracy on several tasks.

A main weakness of the paper is the writing and lack of clarity. From even the abstract, there are many repetitive concepts and sentence structures that make reading this paper incredibly difficult. There are simply too many equations (in-display equations) crammed into the text. The same is true for the tables and figures---everything is so small! I think the authors are trying to write too much instead of giving a more high-level insights into the problem and their proposed solution.

问题

  • The topic of OT on a graph (why this would happen, for example) is not appropriately motivated, and some more context would be appreciated.

  • It seems like the computations leading up to equation (6) is relevant enough to be a Theorem or proposition --- why is this not the case?

局限性

N/A

最终评判理由

I believe my existing score is what the paper deserves.

格式问题

N/A

作者回复

Dear Reviewer VFH4,

Thank you for your valuable feedback. Below are the answers for your questions and comments.

(1) [...] (OT on a graph) is not appropriately motivated [...] some more context would be appreciated [...]

→ OT problems for measures supported on a graph metric space have been explored in previous studies [34, 35, 36]. Notably, graph metrics extend tree metrics, which are utilized in scalable optimal transport methods like tree-sliced Wasserstein [R1, R2, R3, R4]. Tree-sliced Wasserstein, in turn, generalizes the popular sliced-Wasserstein approach [R5, R6]. The graph structure offers greater flexibility and degrees of freedom compared to the tree structure in tree-sliced Wasserstein and the line structure in sliced-Wasserstein. For a more in-depth discussion on the motivation behind optimal transport on graphs, please refer to [34]. We will elaborate on this further.

(2) [...] (computations) leading up to Equation (6) is relevant enough to be a Theorem or Proposition [...]

→ Thank you for your suggestion. We will summarize Equation (6) as a Proposition.

(3) [...] (weakness) writing and lack of clarity [...] (abstract) repetitive concepts and sentence structures that make reading this paper incredibly difficult [...] (in-display equations) crammed into the text [...] tables and figures (small) [...] giving more high-level insights [...]

→ We outline the key ideas of our proposed approaches in lines 52-81 and in the abstract (lines 1-25). We try to keep all important equations separated with explanatory text and define technical notions within both the equations and surrounding text.

We will follow your suggestions to incorporate additional text to clarify technical concepts, as well as expand figures and tables within the constraints of the main manuscript or supplement.


Concluding remarks. We would be grateful if you could confirm whether our clarifications and responses adequately address your concerns. If they do, we kindly ask that you consider increasing your rating. We are also happy to address any additional questions or concerns you might have.


References

[R1] Indyk & Thaper, Fast Image Retrieval via Embeddings. IWSCTV, 2003

[R2] Le et al., Tree-Sliced Variants of Wasserstein Distances. NeurIPS, 2019

[R3] Tran et al., Distance-based Tree-Sliced Wasserstein Distance. ICLR, 2025

[R4] Tran et al., Tree-Sliced Wasserstein Distance: A Geometric Perspective. ICML, 2025

[R5] Rabin et al., Wasserstein Barycenter and its Application to Texture Mixing. SSVMCV, 2011

[R6] Bonneel et al., Sliced and Radon Wasserstein Barycenters of Measures. JMIV, 2015

评论

Thank you for the clarifying comments. I believe that my score remains an accurate assessment of this work.

评论

Dear Reviewer VFH4,

Thank you very much for your positive rating, and we appreciate your thoughtful endorsement.

We are glad that our clarifications address your concerns on our work. We will revise the paper following the feedback and suggestion.

With best regards,

最终决定

(a) Scientific claims and findings: The paper introduces two new formulations of optimal transport (OT) between unbalanced measures supported on graph metric spaces:

Orlicz-EPT – an extension of entropy-regularized partial transport (EPT) that incorporates Orlicz geometry. By calibrating the ground cost to ensure nonnegativity, the authors reformulate unbalanced OT into a standard OT problem that admits classical techniques. This provides greater modeling flexibility, especially when data exhibit heterogeneity, noise, or imbalanced supports.

Orlicz-Sobolev Transport (OST) – a dual formulation that regularizes the critic function in Orlicz-Sobolev spaces. Importantly, this reduces the computational burden to a univariate optimization problem, making OST highly efficient. The authors prove several properties of OST, including divergence and, under mild conditions, metric structure, and show that it generalizes known transport distances such as generalized Sobolev transport (GST), Sobolev transport (ST), and unbalanced Sobolev transport (UST).

Empirical evaluations on document classification and topological data analysis (TDA) show that OST is orders of magnitude faster than Orlicz-EPT, while maintaining competitive or superior accuracy.

(b) Strengths:

  1. Conceptual novelty: First exploration of Orlicz-Wasserstein distances for unbalanced measures on graphs, extending classical OT theory to a flexible and practically important new regime.
  2. Solid theoretical development: Builds on insights from Caffarelli & McCann (2010), rigorously establishes properties of OST, and carefully situates the work within the OT literature.
  3. Computational efficiency: OST reduces a two-level optimization problem to a univariate optimization, yielding dramatic runtime gains without sacrificing theoretical guarantees.
  4. Generality: OST unifies and generalizes multiple existing OT variants (GST, ST, UST), demonstrating both theoretical depth and practical breadth.
  5. Empirical validation: Demonstrates strong results on real-world tasks (document classification, TDA) with scalability to large graphs (up to 10K nodes, 1M edges).
  6. Clarity and completeness: Despite some density in presentation, the paper provides full definitions, proofs, and comparisons, making it a comprehensive and self-contained contribution.

(c) Weaknesses / Missing elements

  1. Density of exposition: At times the paper is mathematically heavy and could benefit from additional intuition, illustrations, or higher-level summaries.
  2. Experimental diversity: Evaluation is primarily on text-based and TDA benchmarks; extensions to other domains (e.g., biological or dynamic networks) would further strengthen claims of generality.
  3. Parameter guidance: Sensitivity to the choice of Orlicz function and regularization weights is acknowledged, but more systematic guidance for practitioners remains open.

These weaknesses are relatively minor compared to the overall strength and novelty of the contribution.

(d) Reasons for acceptance and Spotlight recommendation I recommend acceptance as a Spotlight. The paper stands out for several reasons:

  1. Theoretical innovation – The introduction of Orlicz geometry into unbalanced OT on graphs is both novel and significant, opening new directions for research in transport distances.
  2. Practical impact – OST provides a dramatic improvement in computational scalability while retaining strong theoretical guarantees. This makes it directly useful for real-world structured data tasks.
  3. Breadth of contribution – The framework unifies and extends multiple existing OT distances, which is of broad interest across machine learning, optimization, and applied mathematics.
  4. Balance of theory and practice – Few works succeed in combining nontrivial mathematical development with concrete, scalable algorithms and empirical validation. This paper achieves that balance convincingly. Given these points, the paper represents a high-impact and well-rounded contribution that goes beyond incremental improvements, justifying its selection as a Spotlight.

(e) Rebuttal period discussion and resolution

  1. Motivation and context: One reviewer asked for stronger motivation for OT on graphs. The authors clarified its connection to tree-sliced and sliced-Wasserstein methods, with references to prior work, and promised to elaborate further in the revision.
  2. Theoretical formalization: Suggestions to state key derivations (e.g., Eq. 6) as a proposition were accepted; authors will adjust accordingly.
  3. Use of Orlicz-EPT vs. OST: Reviewers asked under what conditions the more expensive Orlicz-EPT is useful. The authors clarified that Orlicz-EPT preserves full Orlicz-Wasserstein properties and is suitable for small-scale domains, while OST serves as its scalable counterpart.
  4. Experimental scope: Concerns about limited domains were addressed by highlighting TDA applications (including DNA microarrays and shape recognition) and inviting reviewers to suggest additional datasets. Authors argued convincingly that their current experiments cover standard OT-on-graphs applications.
  5. Parameter sensitivity: Authors provided new empirical results demonstrating modest performance variation across Orlicz functions and parameters, and suggested cross-validation as a practical selection method.
  6. Scalability: Clarified that their experiments already include graphs with up to 10K nodes and 1M edges, demonstrating practical scalability.
  7. Incrementality concerns: The authors strongly defended the nontriviality of their contributions, emphasizing the difficulty of extending OW/GST to unbalanced measures with nonnegative calibration.
  8. Clarity issues: Authors committed to adding intuition, examples, and clearer figures/tables in the revision.

Overall, the rebuttal was thorough, addressed all reviewer concerns convincingly, and even included additional empirical results to back claims. Reviewers were satisfied, and consensus shifted to acceptance.

This paper combines novel theoretical insights, computational breakthroughs, and practical validation, representing a substantial and broadly relevant contribution to the optimal transport and machine learning communities.