Scalable Sobolev IPM for Probability Measures on a Graph
We propose a novel regularization for Sobolev IPM for measures on a graph, which yields a closed-form expression for fast computation. It addresses the long-standing computational challenge for Sobolev IPM, especially in large-scale settings.
摘要
评审与讨论
This paper introduces a Scalable Sobolev Integral Probability Metric (IPM) for probability measures defined on graph metric spaces. The key focus is on improving computational efficiency while maintaining mathematical rigor for applications in machine learning, topological data analysis (TDA), and document classification. The contributions include 1.Introducing a weighted L_p-norm formulation that enables fast closed-form computation, overcomes long-standing computational barriers in Sobolev IPM and extending Sobolev IPM to graph-based metric spaces. 2.Proving that the regularized Sobolev IPM is a proper metric and showing equivalence to the original Sobolev IPM, ensuring the regularized form preserves theoretical properties. 3.Moreover, the paper makes some connections Sobolev transport, classical optimal transport (OT) on graphs and Wasserstein distances, particularly for tree-structured graphs. This provides more potential applications of this paper.
给作者的问题
Please see the four questions in "Other Strengths And Weaknesses".
论据与证据
Yes
方法与评估标准
The paper proposes a Scalable Sobolev Integral Probability Metric (IPM) for probability measures defined on graph metric spaces. This makes sense.
理论论述
Yes. It looks correct to me overall.
实验设计与分析
Yes, all experiments look valid to me while I wonder one issue about them: the choice of graph structure seems not justified. The paper does not explain why these specific graph structures were chosen. Different graph structures (e.g., random geometric graphs, small-world graphs) may impact computational performance.
补充材料
Yes, the proofs of results in Section 3 and 4.
与现有文献的关系
The paper contributes to the field of probability metrics, optimal transport, and machine learning on graphs by improving the computational efficiency and applicability of Sobolev Integral Probability Metrics (IPM). These are commonly studied in machine learning community.
遗漏的重要参考文献
The paper has cited a sufficient amount of related papers to essentially understand their contributions.
其他优缺点
The strengths are listed in "Summary".
As for their weakness,
- The paper has over-reliance on fixed graph structures. As mentioned in "Experimental Designs Or Analyses": the choice of graph structure seems not justified. The paper does not explain why these specific graph structures were chosen and has no discussion on how graph structure impacts Sobolev IPM performance. I think this question is very crucial to the Sobolev Integral Probability Metric (IPM) proposed in this paper.
- The equivalence result (Theorem 4.2) only provides upper and lower bounds, but does not analyze the gap between standard Sobolev IPM and the regularized version.
- Moreover, the paper proves equivalence with the 1-Wasserstein distance when the graph is a tree but does not fully explore relations for p>1. No theoretical guarantees on whether the regularized Sobolev IPM behaves similarly to higher-order Wasserstein distances.
- As one of the advantages of this paper focusing on fast closed-form computation, the paper presents empirical runtime comparisons, but does not provide a theoretical complexity analysis of its method, for example, on graph size |V| and |E|, etc.
其他意见或建议
Please see the four questions in "Other Strengths And Weaknesses".
伦理审查问题
NA
Dear Reviewer Qz8Z,
Thank you for your valuable feedback. Below are the answers for your questions and comments.
(1) [...]choice of graph structure (not justified)[...]not explain why these specific graph structures were chosen. Different graph structures (e.g., random geometric graphs, small-world graphs) may impact computational performance[...]
[...]over-reliance on fixed graph structures [...]choice of graph structure seems not justified[...]not explain why these specific graph structures were chosen and has no discussion on how graph structure impacts Sobolev IPM performance[...]crucial to (Sobolev IPM)[...]
→ In this work, we study Sobolev IPM for probability measures supported on a given graph (line 11–13). As in line 1481–1284 (Appendix C.3), much as Sobolev transport (Le et al., 2022), we assume that the graph metric space (i.e., the graph structure) is given. We will clarify it.
Our proposed regularized Sobolev IPM can be applied for probability measures supported on any given graph such that is connected, undirected, physical graph with positive edge length, and satisfies the uniqueness property of the shortest paths (as in line 90-103). Furthermore, as in Remark 3.6 (181-189), the regularized Sobolev IPM can be also applied for non-physical graph for the discrete case as in Theorem 3.5 (line 165-179).
In our experiments, we consider graphs _Log, _Sqrt as in Le et al. (2022) (line 326–327). We agree that there are many different approaches for document classification and TDA tasks. However, it is not the goal of our empirical studies. As in line 299-306, our experiments aim to illustrate the fast computation of the regularized Sobolev IPM, and show preliminary evidences on its advantages compared to other popular transport approaches (e.g., optimal transport and Sobolev transport) for probability measures on a given graph under the same settings.
(2) [...] (advantages) on fast closed-form computation, the paper presents empirical runtime comparisons, but does not provide a theoretical complexity analysis of its method, for example, on graph size and [...]
→ Besides the preprocessing with computational complexity (line 262-267), from Equ. (10) in Theorem 3.5, the computational complexity of the regularized Sobolev IPM is trivially linear to the number of edges in of graph , i.e., .
Moreover, by exploiting the sparsity property, we further reduce its computational complexity into (line 205-216). We will clarify it.
(3) [...]proves equivalence with the -Wasserstein distance when the graph is a tree[...]not fully explore relations for . No theoretical guarantees on whether the regularized Sobolev IPM behaves similarly to higher-order Wasserstein distances.[...]
→ We do acknowledge it (line 233-243). For , regularized Sobolev IPM is equal to -Wasserstein (Prop. 4.5). For , to our knowledge, their relation is an open problem (line 233-235), but -order regularized Sobolev IPM is lower bounded by -Wasserstein (Prop. 4.6).
We respectfully clarify that our goal for the behavior of the regularized Sobolev IPM is similar to the original Sobolev IPM (Theorem 2), but NOT the -Wasserstein. We agree that for , the regularized Sobolev IPM and -Wasserstein can behave differently. We believe that it should not be considered as the weakness for the regularized Sobolev IPM.
(4) [...]equivalence result (Theorem 4.2) only provides upper and lower bounds[...]not analyze the gap between standard Sobolev IPM and the regularized version[...]
→ We agree that in Theorem 4.2, we show that the regularized Sobolev IPM is equivalent to the original Sobolev IPM, but not their gap. However, from Theorem 4.2, one can obtain the upper/lower bounds on the ratio between regularized Sobolev IPM and original Sobolev IPM.
We further emphasize that our approach aims neither to derive a tight bound for the gap nor to provide a sharp approximation for the challenging optimization problem (Equ. (4)) of the original Sobolev IPM. Our purpose is in a different direction. More precisely, we instead propose a novel regularization for Sobolev IPM which yields an equivalent metric to the original Sobolev IPM, and a closed-form expression for a fast computation. Note that, to our knowledge, there are no efficient algorithms to compute Sobolev IPM effectively yet. We believe that each approach has its own merits.
Concluding remarks. We would be grateful if you could let us know whether our clarifications and answers address the raised concerns. If so, we kindly ask that you consider increasing your rating. We are also pleased to discuss any other questions you may have.
Thank the authors for this detailed response. Now I understand this work better and raised my point a bit.
Dear Reviewer Qz8Z,
Thank you very much for raising your rating into positive, and we deeply appreciate your thoughtful endorsement.
We are glad that our clarifications address your raised concerns. We will revise the paper following the feedback and suggestion.
With best regards,
This paper studies the Sobolev Integral Probability Metric (IPM) for probability measures supported on graph-structured spaces. The authors introduce a regularized version of Sobolev IPM, which allows for a closed-form solution and efficient computation, making it more scalable for large-scale applications. The key contributions include:
- A proof that the regularized Sobolev IPM is equivalent to the original Sobolev IPM and related to Sobolev transport (ST) and optimal transport (OT).
- Demonstration that the regularized Sobolev IPM is negative definite, allowing for the construction of positive definite kernels.
- Experimental validation showing that the proposed method is computationally much faster than traditional OT and comparable to ST, with promising performance in document classification and topological data analysis (TDA).
给作者的问题
Could the authors discuss the potential applications of TW/ST or regularized ST to general graphs?
论据与证据
Yes.
方法与评估标准
Yes.
理论论述
No. I didn't check the proof line by line, unfortunately.
实验设计与分析
Yes.
补充材料
No.
与现有文献的关系
No.
遗漏的重要参考文献
No.
其他优缺点
I am confused with the following potential weakness, besides the contribution I mentioned in the summary part.
- Limited Justification for General Graphs
While the proposed metric is claimed to work on graphs, its theoretical analysis and experiments are only carried out on trees (special cases of graphs). The paper does not provide strong theoretical or empirical justifications for general graphs, where the shortest-path structure may not be unique.
- Evaluation on Embeddings Instead of Direct Graph Metrics
The method is designed for probability measures on graphs, but its empirical evaluation relies on embedding-based representations (e.g., word embeddings for documents, persistence diagrams for TDA). This raises questions about whether the metric truly captures graph-based probability measure discrepancies or if the improvement is due to embeddings.
- Scalability Claim Relative to Existing Closed-Form Solutions
The paper highlights scalability as a key advantage, but tree-Wasserstein (TW) and Sobolev transport (ST) already have closed-form solutions. It is unclear whether the new method provides significant computational benefits beyond these existing approaches.
其他意见或建议
NA
Dear Reviewer FFqa,
Thank you for your valuable feedback. Below are the answers for your questions and comments.
(1) [...]Limited Justification for General Graphs[...]claimed to work on graphs, its theoretical analysis and experiments are only carried out on trees[...] (not for) general graphs, where the shortest-path structure may not be unique[...]
[...]discuss the potential applications of TW/ST or regularized ST to general graphs?[...]
→ We respectfully disagree. We clarify that the regularized Sobolev transport can be applied for probability measures on any given graph such that is connected, undirected, physical graph with positive edge length, and graph satisfies the uniqueness property of the shortest paths (as in line 90-103). We emphasize that there is no requirement that the given graph is a tree. Furthermore, as in Remark 3.6 (181-189), the regularized Sobolev IPM can be also applied for non-physical graph for the discrete case as in Theorem 3.5 (line 165-179).
For the uniqueness property of the shortest paths, we clarify that it does NOT require that there is only one path connecting between two nodes (or the given graph should be a tree). In fact, there may exist multiple different paths with various path lengths, connecting these two nodes. The uniqueness property of the shortest paths only requires that the shortest path between root node and arbitrary node to be unique (i.e., no multiple shortest paths between them; and there is no problem for existing multiple different paths connecting them). Please see line 1467-1480 (Appendix C.3) for the further discussion.
Note that a graph is called geodetic if for every pair of nodes the shortest path between them is unique. Thus, geodetic graphs are special examples satisfying the uniqueness property of the shortest paths (i.e., geodetic graphs are not necessarily trees). Please see Fig. 1 in Le et al. (2022) for an example of such geodetic graphs. Moreover, in our experiments, graph _Log has nodes and at least edges; and graph _Sqrt has nodes and at least edges. Thus, these graphs are obviously not trees. We will clarify it.
(2) [...]Scalability Claim Relative to Existing Closed-Form Solutions[...]highlights scalability (key advantage), but (TW/ST) already have closed-form solutions[...] (unclear) significant computational benefits beyond[...]
→ We clarify that our goal is NOT to scale up the computation of either TW or ST, but the Sobolev IPM for probability measures on a given graph.
In this work, we study Sobolev IPM for probability measures supported in a given graph (line 10-13). To our knowledge, there are no efficient algorithmic approaches to compute Sobolev IPM effectively, which hinders its practical applications (line 19-22). Please see Equ. (4) for the challenging optimization problem of the Sobolev IPM.
We propose a novel regularization for Sobolev IPM in Equ. (8) in Def. 3.3 (line 131-139), which yields a closed-form for a fast computation (Theorem 3.5, line 165-179). This paves a way for applying Sobolev IPM in applications, especially large-scale settings.
In our experiments, the performances of regularized Sobolev IPM kernels compare favorably with those of ST, OT, TW kernels (line 371-375). Furthermore, note that TW can only use a partial graph information (line 292-294), i.e., tree information sampled from the given graph.
(3) [...]Evaluation on Embeddings Instead of Direct Graph Metrics[...]designed for probability measures on graphs[...] (experiments) relies on embedding-based representations (e.g., word embeddings for documents, persistence diagrams for TDA)[...]whether the metric truly captures graph-based probability measure discrepancies or if the improvement is due to embeddings[...]
→ We agree that there are many different approaches for document classification and TDA tasks. However, it is not the goal of our empirical studies.
We clarify that as in line 299-306, our experiments aim to illustrate the fast computation of the regularized Sobolev IPM, and show preliminary evidences on its advantages compared to other popular transport approaches (e.g., OT, ST) for probability measures on a given graph under the same settings.
More concretely, in our experiments, we consider graphs _Log, _Sqrt as in Le et al. (2022) (line 326–327), and evaluate for regularized Sobolev IPM, OT, ST to compare probability measures supported on these given graphs under the same settings.
Concluding remarks. We would be grateful if you could let us know whether our clarifications and answers address the raised concerns. If so, we kindly ask that you consider increasing your rating. We are also pleased to discuss any other questions you may have.
Dear authors,
Many thanks for the clarification. I adjusted my scores accordingly.
Best regards
Dear Reviewer FFqa,
Thank you very much for increasing your rating into positive, and we deeply appreciate your thoughtful endorsement.
We are glad that our clarifications address your concerns on our work. We will revise the paper following the feedback and suggestion.
With best regards,
The paper proposes an equivalent form for the Sobolev IPM on graphs which they call "Regularized Sobolev IPM", this form has the advantage to be computable in closed form.
Authors show that this proposed IPM is equivalent to the original Sobolev IPM and they provide bounds relating it to previous work on Sobolev Transport , and optimal transport on Graphs.
Authors show that the proposed IPM is negative definite and hence can be used in defining kernels, that are later used in experiments in document classification and TDA.
给作者的问题
Please address the clarity issues in your revision to help the reader follow and understand.
论据与证据
The claims in the paper are sensible and all proofs are provided in the appendix, and the proofs seem correct.
方法与评估标准
The evaluation is reasonable and compares accuracy to compute time.
理论论述
I checked overall the proofs but did not read the details, the method is sensible.
实验设计与分析
experiments are on basic classification tasks and show a good accuracy/ compute time tradeoffs.
补充材料
I went through the proofs to check their sensibility and I did not verify everything.
与现有文献的关系
The paper provides a nice contribution in terms of an IPM that is computable in closed form on graphs.
遗漏的重要参考文献
Authors cite appropriate relevant literature.
其他优缺点
The works is solid and sound theoretically nevertheless is not written in a way that is accessible to other machine learning researchers that have no familiarity with this line of work.
This a big weakness of the work as it is not didactic in terms of introducing these concepts in a clear way that is easy to grasp. and after the theorems little discussion is made to explain the results. The notations are heavy and are not conveyed in an easy way to grasp.
some suggestions to improve the presentation and clarity:
- please introduce a simple graph example and two distributions mu and nu on it in Section 2 as a running example through out the paper and give for the quantity defined in equation 1
Please define well quantities such as how they would be computed and also etc... for a unfamiliar reader that is a lot to unpack.
-
after Theorem 3.5 give a pseudo algorithm that helps understand how this is computed in practice, the preprocessing section you have in section 4 is dense and not too informative . I checked the code provided to follow what you do in practice the code is in matlab and it will be nice to provide a python code or a pseudo code of how computations are done. Please provide a pseudo code after Theorem 3.5 and show how equation 10 is computed on the running example above.
-
After Theorem 4.2 maybe interesting if you can find an example where you can validate this on a small synthetic example.
其他意见或建议
I think calling the proposed method as regularized IPM does not make sense , I would call this "weighted sobolev IPM", since it is not just a regularization to the original Sobolev IPM. I would encourage the authors to call it Weighted Sobolev IPM on Graphs.
Theorem 4.2 and other derivations seem related to the weighted sobolev norm and the analysis in Peyre https://arxiv.org/pdf/1104.4631
伦理审查问题
NA
Dear Reviewer XuWp,
Thank you for your valuable feedback. Below are the answers for your questions and comments.
(1) [...]pseudo algorithm (Theorem 3.5)[...] (compute in practice)[...] (preprocessing) is dense and not too informative[...]
→ We respectfully clarify that the preprocessing is necessary to compute the regularized Sobolev IPM efficiently (i.e., precompute and for each edge in graph ).
As in line 206-208, following Le et al., 2022, each support in measure contributes its mass to if and only if edge is in the shortest path between root and . Therefore, for each edge , it is simple to obtain , by checking the precomputed shortest paths between root and each support in , respectively. Additionally, is precomputed. Hence, it is straightforward to implement Equ. (10) in Theorem 3.5. We will clarify it.
We will follow your suggestion to add the pseudo-code for the computation in Equ. (10).
(2) [...]example where you can validate (Theorem 4.2)[...]
→ We respectfully clarify that the findings in Theorem 4.2 are theoretically and rigorously proved in Appendix A.5. As in Theorem 4.2, for any nonnegative Borel measure on graph and for , the -order regularized Sobolev IPM is equivalent to the original -order Sobolev IPM.
To our knowledge, there are no efficient algorithmic approaches to compute Sobolev IPM (Equ. (4), line 140-142) effectively yet.
(3) [...]Theorem 4.2 and other derivations seem related to the weighted sobolev norm and the analysis (Peyre, 2018)[...]
→ To our knowledge, our theoretical findings are essentially different from the results in Peyre (2018).
We study the Sobolev IPM for probability measures supported on a given graph. The Sobolev IPM (Equ. (4)) constrains a critic function within the unit ball defined by the Sobolev norm (Equ. (3), line 124) involving both the critic function and its gradient, while Peyre (2018) considers the weighted homogeneous Sobolev norm which constraints a critic function in a semi-norm involving only gradient of the critic function (see Equ. (3) and (4) in Peyre (2018)). Peyre's approach shares the same spirit as the -order Sobolev transport (Le et al., 2022). We will clarify it.
The proposed regularized Sobolev IPM may share some spirit with the weighted homogeneous Sobolev norm and the Sobolev transport (i.e., constraint on gradient of a critic function). However, we clarify that the weight function (Equ. (5)) plays the key role to establish the equivalence between Sobolev norm and weighted norm (Theorem 3.2) for our novel regularization for Sobolev IPM.
(4) [...] (example) quantity (Equ. 1)[...]
→ We clarify that Equ. (1) is reviewed from Le et al., 2022. Please see Fig. 1 in Le et al. (2022) for such illustration.
We will follow your suggestion to quote more texts and figures from Le et al. (2022) for the review within the limited space in the main manuscript (or placed in the supplementary).
(5) [...] (define) quantities such as (how to compute them)[...]
→ is a nonnegative Borel measure on graph (line 70); , are subgraphs defined in Equ. (1) (line 57-59); , are standard notions of measures on sets.
For a specific instance, as in Theorem 3.5, we consider to be the length measure on graph (see a review in Def. B.2 and Lemma B.3 in line 1117-1133 (Appendix B.2)), and hence is the total length of .
(6) [...]regularized IPM (not make sense)[...]Weighted Sobolev IPM[...]
→ Following Theorem 3.2, we propose to relax the set of feasible critic functions (line 128) by (Equ. (7), line 128). Therefore, we call it regularized Sobolev IPM.
For “weighted Sobolev IPM”, it may confuse with IPM where a critic function is constrained within the unit ball w.r.t. weighted Sobolev norm (e.g., some weighted versions of the Sobolev norm in Equ. (3)), which are essentially different to our proposed regularized Sobolev IPM.
We will consider your suggestion. Thank you.
(7) [...] (not) introducing these concepts in a clear way[... ]address the clarity issues[...]
→ We respectfully clarify that all definitions, theoretical findings, and corresponding proofs are elaborated rigorously in mathematics. We will follow your suggestions to add more explanation within the limited space.
Concluding remarks. We would be grateful if you could let us know whether our clarifications and answers address the raised concerns. If so, we kindly ask that you consider increasing your rating. We are also pleased to discuss any other questions you may have.
The paper studies the Sobolev Integral Probability Metric (IPM) for probability measures supported on graph-structured spaces. The authors introduce a regularized version of Sobolev IPM, which allows for a closed-form solution and efficient computation. It includes both theoretical (equivalence between regularized Sobolev IPM is equivalent to the original Sobolev IPM, negative definitess of the regularized SIPM) and experimental contributions.
This paper was reviewed by three reviewers with the following scores: 3/5,3/5,3/5. Two of the reviewers increased their ratings by 1 point after the rebutal. I think the paper is studying an interesting topic and the results are relevant to NeurIPS community. Yet, the following concerns were brought up by the reviewers:
- lack of clarity/ presentation that is not accessible to a broad audience
- role of the graph structure
In its current form the paper may reach only to a niche of readers and it can be further improved in terms of the writing and added explanations and clarifications. It is borderline. Authors should carefully go over reviewers' suggestions and address any remaining concerns in their final revision. Based on the reviewers' comments, as well as my own assessment of the paper, I recommend acceptance.