Theoretical and Practical Analysis of Fréchet Regression via Comparison Geometry
Summary
Reviews and Discussion
The paper explores the statistical properties of Fréchet regression in CAT(K) space, a metric space with an upper bound K on its curvature. The main results of the paper are non-asymptotic convergence results (Section 3.2) and angle stability (Section 3.3) of the nonparametric (kernel-based) conditional Fréchet mean estimator.
Strengths and Weaknesses
Strengths:
- The paper analyzes the convergence of the kernel Fréchet regression estimator. The analysis of Fréchet regression has mainly been conducted when the covariates are univariate rather than multivariate [1, 2]. On the other hand, this work presents the results in the multivariate case.
- To the best of my knowledge, Section 3.3 is new, as I have not seen any literature analyzing the angle property of Fréchet regression.
Weaknesses
-
Several statements are not clearly written or are misleading. To list them:
- Definition 3 is not the definition of strong convexity, but rather the definition of the so-called Variance Inequality (see, for example, Definitions 2.3.3 and 2.3.5 of [3]), which is weaker than strong convexity. The definition in the paper coincides with the standard strong convexity in Riemannian manifolds only if the base point is the minimizer of the functional.
- In Theorem 1, Line 144: what is ? Does it mean instead of ?
- In Theorem 1, is the strong convexity constant of what? It seems like the authors meant the distance function, but this is not clearly mentioned.
- In Theorem 1, the statement is written for general CAT(K) spaces, but it later says denotes the dimension of the manifold. Does the result only hold for CAT(K) manifolds instead of general CAT(K) spaces?
- What is “mild regularity condition” specifically in Theorem 2, Line 165?
- In Theorem 3, Line 178: the statement “does not differ too much” is vague. Under which topology?
- What is “small” mathematically in Lemma 7, Line 217?
- For Proposition 3 and Theorem 4, it does not say which Wasserstein metric is used (e.g., 2-Wasserstein? Or any p-Wasserstein?). It is not written in the proof either.
- In Section 3.4, the notion of tangent space on a general CAT(K) space is not clearly stated. One should introduce a concept like the tangent cone as in [4]. Without introducing the notion of the tangent cone, it is not clear what is meant by the gradient and Hessian in Proposition 4 (Section 3.4).
-
Proofs are not written in full detail and omit important theoretical steps. For example:
- Line 596 should be . In addition, while I think Line 596 is true (after fixing the typo), I don’t think this result is trivial enough to omit justification. The authors should either provide the proof or give a reference.
- I think references for Line 612 are needed as well. For example, as mentioned above, what the dimension means is not clear when the underlying space is a general CAT(K) space instead of a manifold. I suspect this quantity can be replaced by a metric entropy-related quantity for general metric spaces, but to assure this, the details of this step should be stated precisely.
- In Line 631, why is the integral taken over the real line instead of ?
- As mentioned above, which Wasserstein metric is used in Line 773?
- Proposition 4 is written with respect to a general CAT(K) space, but the proof uses Riemannian geometry formulas. Whether such formulas can be extended to CAT(K) spaces seems nontrivial. I believe one can make a similar statement in general CAT(K) spaces, but to that end one would need to define quantities like the tangent cone, gradient, and Hessian for general CAT(K) spaces.
-
I find the results of Section 3.2 duplicative of previous works, in particular [2]. While [2] analyzes the problem under 1-dimensional covariates, the extension to multivariate regression seems the same as in the canonical kernel regression problem (if I am wrong here, can you point out the specific technical difficulties?). For example, Theorem 3 of this paper seems to duplicate Theorems 1 and 2 of [2] in the 1-dimensional case.
-
I find Section 3.3 theoretically interesting, but I am not sure how one should interpret the result in practice. I have not seen an analysis where such an angle is the quantity of interest. Is there any practical example where such quantities should be taken into account?
-
Some references are missing. For example, [2, 4, 5] seem relevant and should be included.
[1] Zhenhua Lin and Hans-Georg Müller. Total Variation Regularized Fréchet Regression for Metric-Space Valued Data. Annals of Statistics, 2021.
[2] Christof Schötz. Nonparametric Regression in Nonstandard Spaces. ArXiv preprint https://arxiv.org/abs/2012.13332, 2020.
[3] Austin J. Stromme. Wasserstein barycenters : Statistics and Optimization. Graduate Thesis, 2020.
[4] Thibaut Le Gouic, Quentin Paris, Philippe Rigollet, and Austin J. Stromme. Fast convergence of empirical barycenters in Alexandrov spaces and the Wasserstein space. Journal of the European Mathematical Society, 2019.
[5] Victor-Emmanuel Brunel and Jordan Serres. Concentration of empirical barycenters in metric spaces. 35th International Conference on Algorithmic Learning Theory. 2024.
Questions
-
Can authors answer questions I have written in the weaknesses section?
-
Can authors characterize the difference between Theorem 1 of this paper and Corollary 11, Theorem 18 of [5]?
-
Can authors illustrate the novelty of the work compared to [2]?
Limitations
The authors expressed limitations of the work. I have stated additional limitations that I found in the weaknesses section.
Final Justification
I would like to summarize my overview of the work as follows:
-
Despite the authors' detailed feedback, I am still not fully convinced to recommend acceptance of the paper. First, the contribution seems minor, given that the main part of the paper largely reproduces [2]. Second, while the authors gave a very detailed answer, I think the paper requires substantial rewriting, as many results are displayed without rigorous derivation or references.
-
That said, the paper has some improvements over [2], some of which I missed during my initial review. First, the angle stability result (Section 3.3) is new to the best of my knowledge. While this result may not have direct usage, there are some potential applications. Second, the paper provides a tighter variance inequality constant than [2]. In my personal opinion, this improvement is marginal, as it follows directly from the well-known Riemannian comparison theorem [7]. Still, I can say that getting the tighter constant is important for theoretical results.
In conclusion, to be honest I am still not supportive of accepting this paper. However, given that there is more improvement than I initially assessed, I decided to increase the score to reflect the change in my view.
Formatting Concerns
I did not notice any major formatting issue.
Thank you for your detailed comments. I believe all of your comments help to improve our manuscript.
Def. 3
You’re right that Def.3 is really the variance inequality, rather than strong convexity in our context. In the revised version, we will rename Def.3 to variance inequality, and add a new definition of strong geodesic convexity: . Note that strong geodesic convexity ⇒ variance inequality with the same modulus , but the converse fails in positively curved spaces. We will insert this remark so the reader sees the logical hierarchy at a glance.
instead of
There was just a stray placeholder, which should have been the population Fréchet mean . We'll fix it in the revision.
Meaning of
In our proofs, is not an arbitrary constant but the strong convexity modulus of the Fréchet functional , which in turn comes from the fact that in any CAT(K) space of diameter each squared distance map is -strongly geodesically convex. In the revised manuscript, we will define it immediately where it is first required.
General CAT(K) space and
As you pointed out, mixing “general CAT(K) space” with a manifold dimension was misleading. What really drives the exponent in our tail bound is a covering-number assumption, which for a smooth -dimensional manifold reduces to its usual manifold dimension, but which in principle can be replaced by any exponent $d$ such that $N(\epsilon) \le C\,\epsilon^{-d}$ for all small $\epsilon$, where $N(\epsilon)$ is the smallest number of balls of radius $\epsilon$ needed to cover the space. We'll clarify the hypothesis in Thm 1 and add the above explanation.
Mild regularity
In the revision, we’ll replace that phrase by the exact kernel regression assumptions needed to invoke the uniform LLNs:
- The marginal density of is continuous and strictly bounded away from zero and infinity on the compact region where we do regression.
- The kernel is a bounded, compactly supported probability density and is Lipschitz.
- Bandwidth sequence $h_n \to 0$ with $n h_n^d \to \infty$.
does not differ too much
In the revised version, we'll replace it by an explicit assumption in terms of the $p$-Wasserstein distance
\[
W_p(\mu,\nu) \;=\; \Bigl(\inf_{\gamma\in\Gamma(\mu,\nu)} \int d(x,y)^p \,\mathrm{d}\gamma(x,y)\Bigr)^{1/p},
\]
where $\Gamma(\mu,\nu)$ is the set of all couplings of $\mu$ and $\nu$. Equivalently, this is the standard Wasserstein-$p$ topology.
“small” in Lemma 7
We’ll replace the phrase “small” by an explicit requirement that the total perturbation satisfy where , where must be , and is injectivity radius around those vertices.
Order of W
In both Proposition 3 and Theorem 4, we in fact use the 2-Wasserstein distance induced by the ground metric on . We choose it because i) our stability and angle-perturbation bounds all depend on second-moment control, and ii) the 2-Wasserstein distance metrizes weak convergence plus convergence of second moments.
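For concreteness, a minimal numerical sketch (ours, not from the paper) of the 2-Wasserstein distance between two empirical measures. It assumes the POT (Python Optimal Transport) package and uniform weights, and uses a Euclidean ground cost purely for illustration; in the paper the ground metric is the geodesic distance on the response space.

```python
import numpy as np
import ot  # POT: Python Optimal Transport

def w2_empirical(X, Y):
    """2-Wasserstein distance between two empirical measures,
    given as (n, d) and (m, d) arrays of support points."""
    n, m = len(X), len(Y)
    a = np.full(n, 1.0 / n)           # uniform weights on X
    b = np.full(m, 1.0 / m)           # uniform weights on Y
    M = ot.dist(X, Y)                 # pairwise squared Euclidean costs
    return np.sqrt(ot.emd2(a, b, M))  # exact OT cost, then square root

# toy usage
rng = np.random.default_rng(0)
print(w2_empirical(rng.normal(size=(50, 2)), rng.normal(loc=1.0, size=(60, 2))))
```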
Notion in Section 3.4
In the revision, we’ll spell out as follows.
- We’ll insert a new definition along the line of [4].
- We then explain that the directional derivative of a functional at in the direction of a cone-vector is ; if admits a unique cone-element with for all , we call it the gradient; and under additional second-variation regularity, we can likewise define a Hessian operator as .
L596
Thank you for pointing it out. As you suggested, we fix it accordingly and provide more details as follows.
For , we have for any fixed , , where . Because the map is strictly decreasing on , the smallest Hessian eigenvalue over the whole ball of radius is attained at the boundary point . Consequently, , .
Define , and since for all , for every . Plugging into the above yields the correct bound
L612
For any , choose so that . For each fixed , , so Hoeffding inequality gives for any , . Setting and writing yields the exponent with .
Taking a union over all points, . Hence the prefactor . In our notation we absorb into the constant giving .
We still need to control . But since is -strongly convex along geodesics, one shows it is in particular Lipschitz, indeed, , , uniformly in . Therefore, . To make this , pick .
integral taken over the real line
Writing was a typo inherited from our use of the real-line notation in the earlier 1-D proof. We’ll fix it.
Tangent cones
As you suggested, we'll fix this as follows: we i) move the smooth jet-expansion argument into a separate subsection explicitly labeled “when is a manifold of sectional curvature ,…”, and ii) recall that at any point one can form the tangent cone , and define the one-sided directional derivative and the second variation.
Novelty compared to [2], [5]
The main novelty compared with [2] is the ambient geometry. That is, [2] assumes only a Hadamard space or a bounded non-positive-curvature (NPC) space, whereas we assume a general CAT(K) space with K of arbitrary sign. Positive curvature breaks the convexity of the squared distance; we overcome this by providing a strong geodesic convexity constant and carrying it through the bias/variance decomposition. [5] treats unweighted i.i.d. barycenters, whereas we handle self-normalized kernel weights depending on all predictors; this requires a new empirical-process plus strong-convexity argument.
Practical role of the angle
We agree that angles at an interior point of a geodesic triangle are rarely given an explicit statistical treatment. Section 3.3 was written to fill exactly this gap. We give several concrete settings illustrating its usefulness.
| Domain | What the angle encodes | Why continuity of the angle is useful |
|---|---|---|
| Wind/ocean-current forecasting on the sphere | Change in large-scale flow direction between two forecast lead-times , around today’s analysis . | Operators care about veer vs. back decisions (clockwise vs. counter-clockwise change). The result guarantees that small perturbations in the training set cannot flip a decision. |
| Robot-arm orientation in | The dihedral angle between two predicted end-effector orientations transported to the mean orientation. | When blending motions, the angle controls the shortest interpolation path. Our bound ensures the kernel regression output never makes the arm “swing the long way round” after a small sensor perturbation. |
| Diffusion-tensor imaging (each voxel’s local fibre direction is a point on ) | Crossing angle between fibres , at the local Fréchet mean . | Crossing/branching detection relies on thresholding this angle (e.g., declaring a crossing if ). Our angle-stability result provides a finite-sample guarantee that the detection rule is robust to subsampling of gradient directions. |
| Kendall shape space (positive-curvature complex projective space) | Principal geodesic directions of shape variability; the angle determines mode coupling. | Shape analysis often interprets “bending vs. stretching” modes by looking at the angle between the first two principal geodesics. Our theorems justify bootstrapping confidence intervals for these angles. |
| Phylogenetic trees | Although globally non-positively curved, each orthant boundary possesses spherical links; angles there quantify speciation vs. horizontal transfer events. | Stability of such angles supports hypothesis tests on reticulation vs. branching. |
I truly appreciate the authors' detailed answers.
For the comparison with [2], in contrast to what the authors claimed, I think positive curvature cases are in fact included in [2], as the bounded-space results. What the authors claimed to be the differences, e.g., the strong convexity parameter and replacing the manifold dimension by the covering-number growth rate, seem to already be included in [2] as Assumption 1. In my opinion (correct me if I am thinking wrong), the only difference is that this paper works in the multivariate covariate case, but I think this extension is straightforward.
To be more specific, the bounded-space analyses of [2] do not assume the curvature upper bound K, so they take the variance inequality as an assumption. On the other hand, as the authors pointed out, the variance inequality holds at least locally if there is a curvature upper bound (Line 589. Comment: again, I expect this result to be true, but the justification is missing; I do not view this result as trivial enough to omit the justification), and the authors then proceed with the analysis using the variance inequality. However, in that case I think Line 590 also requires a precise statement. What is ‘small’ in this line, and how large should the local neighborhood of the population mean be? The exact forms of these quantities are important: to proceed as in Line 600 you need a priori knowledge that your estimator is inside that small neighborhood of the population mean. In particular, given that the estimator is a random quantity, it is unclear whether it would lie in such a local neighborhood. To the best of my knowledge, there are three ways to get over this obstacle: 1. Restrict the entire space to a small neighborhood of the population mean, so that despite the randomness the estimator is always guaranteed to be inside the neighborhood. 2. Use the condition in Theorem 3.3 of [6]. 3. Restrict the family of distributions to those whose population mean satisfies the variance inequality. The second and third approaches are already proposed in [2], and the first would anyway reduce to the analysis of bounded metric spaces in [2] again. In this regard, to claim novelty, I believe the authors should be able to make Line 590 hold without relying on these existing strategies. Otherwise, as noted, the results largely reproduce those of [2].
For the local jet expansion, if I understand the authors' response correctly, the authors seem to restrict M to be just a manifold for Proposition 4. Then, I do not see the implication of this proposition. Isn't it just a direct Taylor expansion argument on Riemannian manifolds? I think that if the authors want to claim Proposition 4 as one of the main results, they should provide a statement beyond manifolds.
Overall, although I truly appreciate the authors' detailed responses, I still find that many of the paper's main results overlap with existing work in the Fréchet regression literature, in particular [2]. Based on my understanding, the only genuinely new contribution is the angle property discussed in Section 3.3. I find this result truly interesting, and the examples the authors provided in the response seem to verify its usefulness in practical problems as well. However, I am not convinced that it alone provides sufficient novelty to warrant acceptance.
[6] Adil Ahidar-Coutrix, Thibaut Le Gouic, Quentin Paris, Convergence rates for empirical barycenters in metric spaces: curvature, convexity and extendible geodesics, Probab. Theory Related Fields, 2020.
Thank you for the careful reading, especially for pointing out where our comparison with [2] could have been clearer. Below we reorganize the argument so the true differences are transparent, then address the neighborhood / smallness question and the scope of Proposition 4.
In [2], only boundedness is imposed and curvature never enters any rate. The strong convexity constant in Assumption 1 is left abstract (called ), while we assume a CAT(K) bound with explicit and derive , . Thus the constant driving the variance depends explicitly on and the data diameter . This reveals a geometry-driven phase change: the variance term degrades continuously as . No such curvature sensitivity can be extracted from [2], so their bounds cannot predict the MSE gap we see empirically between and . Indeed, Theorem 3 gives, for any smoothing bandwidth ,
where and depend on the kernel and the data distribution (see L687-L700). In the experiment, these two constants are identical for the two runs (positive / negative curvatures) because we feed the same cloud of Euclidean predictors and keep the same kernel, bandwidth schedule and sample size . Hence all the curvature enters through only, which can be written as
Thus, for every negative curvature space, we have . For positive curvature spaces, consider the unit sphere with constant curvature . Points are drawn only from the spherical cap , where is the geodesic (polar) distance from the north pole. The farthest a sample point can be from the north pole is exactly that polar angle, hence , . Using the explicit formula for the convexity constant for , . Then, the ratio of the two moduli is . Every risk bound in Sec. 3 carries a factor and therefore the expected MSE on the sphere must exceed that on the hyperbolic disk by ≈ 21% whenever the variance dominates the bias. The empirical gap observed in Table 1 is such that, numerically, the MSE on the positive curvature space exceeds that on the negative curvature space by ≈ 16%, well within sampling noise of the theoretical forecast.
Next, L590 in the old draft indeed lacked a quantitative statement. We now insert
Because for any bandwidth sequence with , the estimator lies in the required ball with probability . This avoids the three work-arounds listed by the reviewer: we neither shrink the whole space nor assume the variance inequality a priori. Instead, it is a self-bounding argument: once the radius is explicit, the quadratic growth of the risk pulls the estimator back inside automatically.
We now give the missing justification: if is CAT() or CAT() with diameter , then for any measure supported in that ball, . The proof combines the hinge-convexity of squared distance in Hadamard spaces with Toponogov comparison when . Thus the inequality holds globally in the settings we analyze.
We agree that the basic Taylor formula on a Riemannian manifold is classical. What is new is that in a CAT(K) space we can still write with , where is expressed via the Alexandrov angle plus an -curvature correction. The lemma bridges non-smooth comparison geometry with smooth manifold calculus. This is precisely why our rates continue to hold on quotients and stratified shape spaces where only a curvature bound (not charts) is available. We will reposition Prop. 4 as a technical lemma to avoid overstating its novelty.
I truly appreciate the authors' additional detailed justification, which I missed when writing my response.
I think the first part is something new which the authors can emphasize; it seems the authors are claiming that the curvature upper bound gives a tighter variance inequality constant than the plain boundedness condition. I view this as an improvement over [2]. Then, please ignore bullet point 1 of my previous response (bullet point 2 is still a valid question).
For the second part, unfortunately the authors' justification does not seem to fully address the concern I raised. What I was concerned about was the first part: "Strong convexity with modulus gives for every ". This only holds when we have an a priori guarantee that the plugged-in point is in the local neighborhood (of strong convexity / variance inequality) of the population mean. The authors are in particular plugging in the estimator, a random quantity, so one needs to ensure it is in that local neighborhood despite the randomness. How can one guarantee that?
I truly appreciate the authors' detailed answer.
I still have a few more concerns on the authors' claim.
-
The authors claim that the difference between [2] and this work lies in the use of curvature information in the variance inequality. I agree this is a new component compared to [2]. However, the local variance inequality of the squared distance functional is well known in Riemannian manifolds, and the proof technique is basically the same as in CAT(K) spaces, since both use the comparison theorem; see, e.g., the discussion after Corollary 2.1 of [7]. Precisely, all these comparison results (both Riemannian and CAT(K)) come from the comparison with the sphere . Thus, the authors are correct that there is an improvement over [2] in that the variance inequality constant depends on the curvature upper bound, but I personally think this additional component, plugging in such a variance inequality constant for CAT(K), is a marginal result, as it is (though this might be controversial) a direct extension of the Riemannian result. And it seems that, except for this variance inequality constant, the rest of the argument largely reproduces [2], as I mentioned in the previous response.
-
One more crucial concern I have is that I am uncertain whether the authors are actually using the correct result. To be honest, it is very hard for me to check whether the authors' arguments are correct, as the authors defer many of the results to "well-known comparison geometry results" without providing formal derivations or references. To be specific, the authors claim that the variance inequality constant for the squared distance function is of the form of . However, the authors did not provide any precise derivation or reference for this result. On the other hand, while the result is stated on Riemannian manifolds, [7] gets the constant in the form of instead of , and I believe that should be the right one not only for Riemannian manifolds but also for general CAT(K) spaces; as mentioned, both the Riemannian and CAT(K) results on this strong convexity are rooted in the same comparison with the sphere , and for the squared distance on the sphere that is the right quantity to appear. In fact, this strong convexity is tight on the sphere, which is also CAT(K), so the result should also be tight in CAT(K). In this regard, I doubt the constant is of the form the authors claim, and believe the quantity in [7] is the right one.
Overall, I am truly grateful to the authors for pointing out their unique component compared to [2], the constant for the variance inequality on CAT(K) spaces. However, such a contribution is (in my personal opinion) a direct extension of the Riemannian result, for the reason I mentioned above. Hence, even with this extra component, I am still not convinced that the paper provides sufficient novelty to warrant acceptance. Furthermore, I find many arguments in the paper still ambiguous and unclear, hard to verify, and some of them possibly incorrect.
[7] Foivos Alimisis et al. A Continuous-time Perspective for Modeling Acceleration in Riemannian Optimization. AISTATS 2020.
Why the bounded-space result of [2] cannot recover the curvature-dependent rate
Let us denote by the strong-convexity parameter that appears in the Assumption of [2]. Because no curvature is assumed there, the smallest value it can ever take on a ball of radius in a positively curved manifold is . Hence for every , . So the bound of [2] is strictly optimistic compared with what a CAT(K) geometry allows; it cannot predict the excess MSE we observe on the sphere cap. This constant gap is exactly what our Theorem 3 and the experiment quantify.
High-probability radius
Let . Strong convexity with modulus gives for every ,
Now plug and rearrange
The bracket in the above is an empirical process centered at its expectation . Applying Bennett's inequality for the empirical process yields for some universal . Because whenever , the estimator is automatically in the required neighborhood with overwhelming probability: no restriction, no extendible-geodesics assumption, and no pre-shrinking of the parameter space.
Thank you for raising additional questions.
vs.
While we already gave the derivation of the explicit form of in the previous rebuttal, we show again that this point is not a significant problem. For any , set and choose a unit-speed geodesic from to with initial velocity . The squared distance is on the open ball with . Its Riemannian Hessian in the tangent space is for all . Because decreases on , the minimal eigenvalue over the whole ball of radius is attained at the boundary . Denoting the diameter , the strong convexity modulus that enters is therefore . This coincides with the formula in [7]. For define . Because and on , . Let , and then, with . Here,
with . Differentiating gives . Because on , we have , and hence . Since and is strictly increasing, for every . Therefore on . We have and is strictly increasing, so for all . Multiplying back by the positive factor gives , and therefore, for . Hence, . By replacing the constant with the larger quantity, every rate is unchanged and no result becomes invalid.
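In compact form (our notation; $g_p$ denotes the Riemannian metric at $p$, $D$ the diameter bound, and we take $D<\pi/(2\sqrt{K})$ within the injectivity radius of $q$), the comparison step above reads
\[
\operatorname{Hess}_p\Bigl(\tfrac12 d(\cdot,q)^2\Bigr)\;\succeq\;\sqrt{K}\,r\cot\!\bigl(\sqrt{K}\,r\bigr)\,g_p,\qquad r=d(p,q)\le D,
\]
and, since $t\mapsto\sqrt{K}\,t\cot(\sqrt{K}\,t)$ is strictly decreasing on $(0,\pi/\sqrt{K})$, the modulus over the whole ball of radius $D$ is $\sqrt{K}\,D\cot(\sqrt{K}\,D)$, which is the quantity the discussion above identifies with the formula in [7].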
guarantee
Let , and . On the fixed ball of radius , one has for all , where . Here, there is a sequence such that, with probability at least , . On the same high-probability event, satisfies , and for any in , , . Hence, . Since , we get , i.e., where . Because , eventually . Thus on the same event we conclude that
In particular, plugging into the local convexity / Taylor expansions is now justified with probability .
Thank you very much for the time and care you devoted to our paper. We deeply appreciate every point you raised in your initial review as well as during the rebuttal discussion, and we will incorporate all of your suggestions and corrections in our revised version.
It is rare to encounter such a constructive and detailed discussion during the review phase of an international conference. Your thoughtful engagement has greatly strengthened our work, and we are sincerely grateful for your invaluable contributions.
I truly appreciate the authors for elaborating the detailed derivations.
Unfortunately, I guess I will have to conclude my thoughts on this paper, as the discussion period is about to end. I would like to summarize my overview of the work as follows:
-
Despite the authors' detailed feedback, I am still not fully convinced to recommend acceptance of the paper. First, the contribution seems minor, given that the main part of the paper largely reproduces [2]. Second, while the authors gave a very detailed answer, I think the paper requires substantial rewriting, as many results are displayed without rigorous derivation or references.
-
That said, the paper has some improvements over [2], some of which I missed during my initial review. First, the angle stability result (Section 3.3) is new to the best of my knowledge. While this result may not have direct usage, there are some potential applications. Second, the paper provides a tighter variance inequality constant than [2]. In my personal opinion, this improvement is marginal, as it follows directly from the well-known Riemannian comparison theorem [7]. Still, I can say that getting the tighter constant is important for theoretical results.
In conclusion, to be honest I am still not supportive of accepting this paper. However, given that there is more improvement than I initially assessed, I will increase the score to reflect the change in my view.
This paper presents a rigorous theoretical foundation for Fréchet regression, which generalizes regression to non-Euclidean metric spaces (such as manifolds). The key innovation is the use of comparison geometry, particularly CAT(K) spaces (geodesic metric spaces with curvature bounded above by K), to study or prove:
-
the existence, uniqueness, and stability of the Fréchet mean and associated regression estimators according to the sign of K.
-
that the sample Fréchet mean converges exponentially fast to the population mean; pointwise consistency of nonparametric Fréchet regression; and the convergence rate $O(h_n^{2\beta}+(nh_n^d)^{-1})$, the usual trade-off from Euclidean nonparametric statistics carried over to the CAT(K) setting.
-
angle stability and local geometric effects.
The paper also provides some real world data experiments.
Strengths and Weaknesses
The strength of this paper is its use of comparison geometry, particularly CAT(K) spaces, to study Fréchet regression and establish statistical guarantees (see summary). In particular, they provide rates of convergence compared to the existing work of Chen & Müller (2022).
The weakness of this paper is as follows.
-
As the paper appears to be a mainly theoretically focused paper, the use of CAT(K) spaces is tailored to the needs of the proofs, while the assumption that the data are subject to this fixed-curvature-K geometry is restrictive and not data adaptive.
-
Moreover, although there are real world data experiments, at least the paper could discuss whether the data could be used to directly learn the curvature or to determine if the CAT(K) condition is satisfied, or roughly satisfied. To this end, is there any oracle approach to determine if the data fits in this CAT(K) model, such that even if the data may not satisfy this CAT(K) model, any data preprocessing or transformation could be applied to remedy this?
Questions
Please answer the questions in the Weaknesses section. Moreover, what does the approximate equality sign mean, say, in equation 9? It seems there is no rigorous explanation for it throughout the paper.
Limitations
yes
Final Justification
My main concern is with the CAT(K) condition, regarding its empirical validity and feasibility. This is to me a really important part, and the authors did not include enough discussion of it in the original manuscript. I am, however, satisfied with the response the authors provided in the rebuttal. Thus I maintain a positive score.
Formatting Concerns
No
We would like to thank you for your very constructive comments.
Making the curvature assumption more data‑adaptive
As you suggested, a more data-adaptive way to tune the curvature from the data, rather than assuming it fixed in advance, is very useful in practice. To address this point, we might use the following three ideas.
Empirical triangle-comparison estimator
Randomly pick triplets of points from the dataset. For each triplet, approximate the geodesics (e.g., via pairwise shortest-path or local linear interpolation).
In the simply connected 2-D model space of constant curvature $K$, the law of cosines relates the three side-lengths $(a,b,c)$ to the angle $\gamma$ at the vertex opposite side $a$:
\[
\cos(\sqrt{K}\,a)=\cos(\sqrt{K}\,b)\cos(\sqrt{K}\,c)+\sin(\sqrt{K}\,b)\sin(\sqrt{K}\,c)\cos\gamma \qquad (K>0),
\]
and
\[
\cosh(\sqrt{-K}\,a)=\cosh(\sqrt{-K}\,b)\cosh(\sqrt{-K}\,c)-\sinh(\sqrt{-K}\,b)\sinh(\sqrt{-K}\,c)\cos\gamma \qquad (K<0).
\]
For each triangle, we observe its three side-lengths and its measured opposite angle $\gamma$, and solve numerically for the $K$ that best satisfies the spherical / hyperbolic law of cosines.
Then, take the aggregated estimate $\hat K$ (e.g., the median of the per-triangle solutions), check how many triangles violate the CAT($\hat K$) inequality by more than a tolerance, and adjust $\hat K$ up or down until, e.g., 95% of our triangles satisfy it.
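A minimal numerical sketch of this estimator (our own illustration; all function and variable names are ours, not the paper's): it inverts the constant-curvature law of cosines over a grid of candidate curvatures and aggregates the per-triangle fits by the median.

```python
import numpy as np

def model_angle(K, a, b, c):
    """Angle at the vertex opposite side a in the constant-curvature model
    space M_K, given side lengths (a, b, c); np.nan if not realizable."""
    if K > 0:
        s = np.sqrt(K)
        if s * max(a, b, c) >= np.pi:          # side exceeds the model diameter
            return np.nan
        num = np.cos(s * a) - np.cos(s * b) * np.cos(s * c)
        den = np.sin(s * b) * np.sin(s * c)
    elif K < 0:
        s = np.sqrt(-K)
        num = np.cosh(s * b) * np.cosh(s * c) - np.cosh(s * a)
        den = np.sinh(s * b) * np.sinh(s * c)
    else:                                      # Euclidean law of cosines
        num = b**2 + c**2 - a**2
        den = 2.0 * b * c
    if den <= 0:
        return np.nan
    return float(np.arccos(np.clip(num / den, -1.0, 1.0)))

def estimate_curvature(triangles, K_grid):
    """triangles: iterable of (a, b, c, gamma) with gamma the measured angle
    opposite side a. Returns the median of per-triangle best-fit curvatures."""
    fits = []
    for a, b, c, gamma in triangles:
        residuals = [abs(gamma - model_angle(K, a, b, c)) for K in K_grid]
        residuals = [np.inf if np.isnan(r) else r for r in residuals]
        fits.append(K_grid[int(np.argmin(residuals))])
    return float(np.median(fits))

# toy usage: planar triangles should give an estimate near K = 0
tri = [(1.0, 1.0, 1.0, np.pi / 3), (3.0, 4.0, 5.0, np.arccos(4 / 5))]
print(estimate_curvature(tri, np.linspace(-2.0, 2.0, 81)))
```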
Cross-validation over a curvature grid
Consider a grid of candidate curvatures spanning the range one believes plausible. Then, for each data point, compute its model-space coordinates relative to some reference via a stereographic or exponential-map embedding of curvature $K$. In practice, one only needs pairwise distances in the model space. Split into folds: for each fold, fit the nonparametric Fréchet regressor assuming the space is CAT($K$), compute the held-out squared geodesic-distance error, and average over folds to get a CV score for each candidate. Finally, select the candidate with the smallest score.
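A sketch of this cross-validation loop (ours): `fit_frechet` and `geodesic_dist` are placeholders for the user's Fréchet regression routine and the geodesic distance of the assumed CAT(K) geometry, so this is a template under those assumptions rather than the paper's implementation.

```python
import numpy as np

def cv_select_curvature(X, Y, K_grid, fit_frechet, geodesic_dist, n_folds=5, seed=0):
    """Select the curvature K by K-fold cross-validation.
    fit_frechet(K, X_train, Y_train, X_test) -> predicted responses
    geodesic_dist(K, Y_pred, Y_true)         -> array of geodesic distances
    Both callables are user-supplied placeholders."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    scores = []
    for K in K_grid:
        errs = []
        for f in range(n_folds):
            test = folds[f]
            train = np.concatenate([folds[g] for g in range(n_folds) if g != f])
            pred = fit_frechet(K, X[train], Y[train], X[test])
            errs.append(np.mean(geodesic_dist(K, pred, Y[test]) ** 2))
        scores.append(np.mean(errs))
    return K_grid[int(np.argmin(scores))]

# toy usage with Euclidean placeholders (K is ignored here, for illustration only)
rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 1)); Y = np.hstack([np.sin(3 * X), np.cos(3 * X)])
fit = lambda K, Xtr, Ytr, Xte: np.array([Ytr[np.abs(Xtr[:, 0] - x0).argmin()] for x0 in Xte[:, 0]])
dist = lambda K, P, Q: np.linalg.norm(P - Q, axis=1)
print(cv_select_curvature(X, Y, np.linspace(-1, 1, 5), fit, dist))
```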
Slack tuning for approximate CAT(K)
Even once $\hat K$ is chosen, real data rarely satisfy the CAT($\hat K$) axioms exactly. Instead, we can consider tuning a “slack” parameter.
For each test triangle , compute . Let be the -th percentile of the . Then declare the space -approximate CAT($\hat K$). All the convexity constants and stability bounds (e.g., the strong convexity constant ) then only pick up a small correction one can bound analytically.
We will add the above discussion in the revised manuscript.
Empirical Oracle for “Is it CAT(K)?”
This point is very interesting, and we propose the following discussion. We believe that including it in the revised manuscript will be very helpful for a better understanding of the practical usefulness of the study.
Sample triplets uniformly at random from the dataset. Then, for each triangle, compute the three side-lengths, pick points at some fractions along each geodesic, and measure the violation
where are the comparison points in the model space of curvature .
If , one has an -approximate certificate. More robustly, report the -percentile of the violations as . If it is “small” relative to the data’s diameter, the space is approximately CAT(K).
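A small sketch of this percentile certificate (names are ours): the per-triangle gaps $d(x,y)-\bar d(\bar x,\bar y)$ are assumed to have been computed already, e.g. via a comparison-triangle construction.

```python
import numpy as np

def approximate_cat_certificate(gaps, q=95):
    """gaps: per-triangle violations d(x, y) - d_bar(x_bar, y_bar); positive
    values violate the CAT(K) inequality. Returns (epsilon, fraction_violating):
    the q-th percentile of the positive part and the share of violating triangles."""
    gaps = np.asarray(gaps, dtype=float)
    eps = float(np.percentile(np.maximum(gaps, 0.0), q))
    return eps, float(np.mean(gaps > 0))

# toy usage with synthetic gaps
print(approximate_cat_certificate(np.random.default_rng(1).normal(-0.05, 0.02, size=1000)))
```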
Learning K directly from data
Each triangle plus its measured angle yields a local curvature equation. Then, one idea is to numerically invert it for K and aggregate (median or M-estimate) to get $\hat K$.
For each triangle , with true side-lengths and measured opposite angle , define the residual
where by definition the predicted angle is the one given by the constant-curvature law of cosines above: the spherical form for $K>0$, the hyperbolic form for $K<0$, and the Euclidean form at $K=0$.
By construction, the true curvature makes the residual vanish whenever the data are exactly CAT(K). In reality each residual will be nonzero because of measurement noise in the measured angles.
Fix one triangle , and write the measured data as
where is the estimation error (we assume ). Define
and let . By assumption, . A first-order Taylor in both the parameter and the data gives
Set , and rearrange to isolate gives
since is a nonzero constant under non-degeneracy and .
One does this for each sampled triangle, and each estimate satisfies the expansion above. Under mild symmetry / moment conditions, the median (or mean) of these i.i.d. estimates then concentrates at
by the usual order-statistics CLT (or Delta-method on the sample median).
Moreover, denote by our Fréchet estimate using assumed curvature . A sensitivity analysis via the implicit function theorem on the first-order condition shows
since the empirical Hessian is bounded away from zero by our strong-convexity constant . Hence the additional error from plugging-in is asymptotically negligible compared to the usual rates.
Data-driven “remedies” when CAT(K) fails
Even if the raw distances violate CAT(K), one can re-embed or learn a nearby metric that is CAT(K): introduce slack variables for each triangle and solve
plus a penalty . This yields a closest CAT(K) metric.
Clarification on the “≈” in Eq. (9)
In fact, our nonparametric Fréchet regression weights are the usual normalized kernel weights,
When we wrote , we meant that, for large ,
i.e., the random normalization factor is asymptotically deterministic (by LLNs), and hence each is proportional to .
Equivalently, one can show under standard density-and-bandwidth assumptions that there exist constants such that, with high probability for large ,
In our proofs, we only need this proportionality and the fact that . We will replace the “≈” by the exact definition above and include the statement in the revised version.
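For concreteness, a minimal sketch (ours) of such normalized kernel weights; the Epanechnikov kernel is our choice for illustration only, since the assumptions above merely require a bounded, compactly supported, Lipschitz kernel.

```python
import numpy as np

def epanechnikov(u):
    """Bounded, compactly supported, Lipschitz kernel (radial form)."""
    return np.maximum(1.0 - u**2, 0.0)

def frechet_weights(x, X, h):
    """Normalized kernel weights w_i(x) = K(||x - X_i||/h) / sum_j K(||x - X_j||/h).
    x: (d,) query point; X: (n, d) predictors; h: bandwidth."""
    k = epanechnikov(np.linalg.norm(X - x, axis=1) / h)
    total = k.sum()
    if total == 0.0:                 # no design points within bandwidth h of x
        raise ValueError("empty neighborhood: increase the bandwidth h")
    return k / total                 # nonnegative weights summing to one

# toy usage
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))
w = frechet_weights(np.array([0.5, 0.5]), X, h=0.2)
print(w.sum(), (w > 0).sum())
```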
Thanks for the thoughtful follow-up. We agree that the data-adaptive aspects should be visible in the paper itself (not just in the rebuttal). Below is what we propose to add—short, self-contained, and ready to paste—so readers can apply the ideas immediately.
- New Remark after Theorem 3 (main text): “Data-adaptive curvature and approximate CAT(K)": Defines an empirical CAT(K) score, two practical selectors for K (triangle-based and CV-based), a slack parameter for approximate CAT(K), and the self-localization radius used in our proofs. States the plug-in stability bound: using instead of does not change the rate.
- Short lemma (main text, same spot): Self-localization under strong convexity. Gives the precise radius ensuring the empirical minimizer lies inside the local neighborhood, with constants spelled out.
- Appendix (very short, 1–2 pages): Proof sketches for the plug-in stability bound and the self-localization lemma. A one-paragraph note on how to compute the triangle-based score in practice.
- Notation fix (Section with Eq. (9)).
Again, we appreciate the guidance. We will integrate the above remark + lemma + small appendix so the data-adaptive perspective is visible at a glance.
I thank the authors for the detailed answers to my questions in the review. I think this data-adaptive part is a very important issue, and the clarification you provided here is very helpful for real applications. However, I believe this would have been better if it had been incorporated into the paper in some way beforehand. Thus, I would appreciate it if the authors could summarize these points into several key ideas and present them as a remark/discussion in the final manuscript. I will maintain my score here.
This paper is concerned with Fréchet regression on CAT(K) spaces. It is divided into two parts: a theoretical part and an experimental part. The theoretical part starts with some classical results on the Fréchet mean in CAT(K) spaces, then establishes several other results in the context of CAT(K) spaces, in particular (i) a concentration result for the sample Fréchet mean, (ii) pointwise consistency and (iii) convergence rate of the nonparametric Fréchet regression estimator, and (iv) angle stability for the conditional Fréchet mean. The experiments section compares, in terms of mean squared error, the estimation performance of Fréchet regression on positively curved and negatively curved spaces, showing the superiority of hyperbolic coordinates. This is shown on synthetic data as well as on real-world datasets.
Strengths and Weaknesses
The paper is overall well written and the exposition is clear. The results presented in sections 3.2. and 3.3. of the theoretical part are interesting and implications of these results for practical purposes are included at the end of each subsection.
However, the importance of the results of Sections 3.4 and 3.5 is less clear to me, as detailed in the questions below. Moreover, it is not always clear in my opinion which results are already known and which are new, and how the new results relate to the existing literature. In particular, it seems to me that Theorem 1 is closely related to Theorem 5.8 of [a], which gives a concentration inequality for the sample Fréchet mean in CAT(K) spaces for ; however, [a] is not cited in the paper. Finally, the experiments section is very short, and a bit disappointing in my opinion: the experiments concern the estimation precision of the Fréchet regression estimator in positive and negative curvature, but nothing is shown to illustrate (at least some of) the precise results given in the theoretical section. As it is, the theoretical part and the experiments part seem somewhat unrelated.
[a] V. E. Brunel and J. Serres, Concentration of empirical barycenters in metric spaces. International Conference on Algorithmic Learning Theory. PMLR, 2024.
EDIT: I have increased my rating according to the authors' clarifications regarding their contributions. I believe that providing information on results already known in the Fréchet regression literature will add significant value to the paper. I would also like to thank the authors for their careful response to the reviewers' comments, which has helped to clarify the impact of their results.
Questions
- It is not clear in my opinion which results were known and which are new, and how the new results relate to the existing literature. It is stated that the lemmas in Section 3.1 follow from previous work. What about Lemmas 6 and 7 on angle comparison and continuity in CAT(K) spaces, and Lemma 8 on angle comparison in a Riemannian manifold? Concerning the main results on concentration, consistency and rates of convergence of Fréchet regression in CAT(K) spaces, how do they relate to the literature? Were similar results known in other more restrictive contexts, or weaker results in the same context? In particular, how does Theorem 1 relate to Theorem 5.8 of [a], which gives a concentration inequality for the sample Fréchet mean in CAT(K) spaces for ?
- Section 3.4: I am not sure that I understand the point of the results in this section. It seems to me that Lemma 8 is a general Riemannian geometry result, not particularly linked to Fréchet regression, and not used in any other result of the paper. As for Proposition 4, it is written “By expanding the Fréchet functional in the tangent space via the exponential map, one can gain insights into the functional’s curvature and higher-order properties.” However, here the curvature term is not explicit, nor is it explicitly computed in the proof, if I am not mistaken. Thus, it is not clear to me what the reader could use this formula for.
- Experiments do not really illustrate the theoretical results: they only show better estimation in negative curvature than in positive curvature for the MSE criterion. Would it be possible to illustrate some of the formulas given in the theorems ?
- How restrictive is the assumption of Hölder continuity for in Theorem 3? More precisely, under what conditions on the distributions of and can this regularity be derived?
Minor comments:
- I don’t understand the inner product in last formula of Proposition 5
- Section 2: Definition 7 requires M to be a Riemannian manifold and not just a geodesic metric space, as stated at the beginning of the section.
- In Lemma 8, why not use and in all the formula, including in the term ?
- It could be more clearly stated that the predictors live in Euclidean space.
- Why not use the same notations in Proposition 5 and its proof ? ( and )
- Typos. In the inequalities line 514, a term is missing on the third line. Typo in the equation of line 536. "Proof of Proposition 4" in line 563 should refer to Lemma 4 and 5? Theorem 5 line 160 should be "Lemma". In eq.(6), should be . In the second equality of equation line 631, the integral should be taken over ; and again line 633. Typo in eq. line 450. Appendix D is not referenced in the text.
Limitations
Yes.
Final Justification
I have increased my score to 4 because I find the proposed generalization of several existing results on the Fréchet regression problem to manifolds with possibly positive curvature to be an interesting and valuable contribution. The authors have committed to adding clear statements and a table to clarify which results are new, as well as to better highlight the relevance of their results on angle stability and the local jet expansion of the functional. Additionally, the explanations about the context of the experiments also helped to clarify their relevance. Overall, the numerous clarifications made during the revision phase must appear in the paper for it to be accepted.
Formatting Concerns
No formatting concerns.
Thank you for your very constructive comments.
In earlier literature?
To address this concern, we will insert the following table in the revised manuscript, and add corresponding citations.
| Tag in paper | Statement | In earlier literature? | What is new here |
|---|---|---|---|
| Lem 1-5 (Sec. 3.1) | Existence / uniqueness of Fréchet mean under basic CAT(K) hypothesis | Classical – e.g. Yokota ’16, Karcher ’77 | We only restate (no claim of novelty) |
| Lem 6 | Angle comparison in general CAT(K) (both K < 0 and K > 0) with explicit perimeter bound | Reshetnyak (1968) gave , positive K version with a sharp bound appears nowhere in print. | New extension + explicit bound |
| Lem 7 | Quantitative Hölder‑type continuity of Alexandrov angles when triangle vertices move an amount | No earlier quantitative modulus; only qualitative continuity. | First explicit bound, needed for our statistical stability proofs |
| Lem 8 | Second-order angle expansion in a Riemannian manifold with remainder | Classical second variation exists but without a tidy remainder usable in statistics | We package the formula with an explicit error term; not fundamentally new but not written out elsewhere |
| Thm 1 | Exponential concentration of kernel-weighted local Fréchet means in CAT(K) | Thm 5.8 of Brunel–Serres (2024) is for unweighted i.i.d. barycenters; no predictors | Handles self-normalized kernel weights depending on all ; requires new empirical process + strong-convexity argument. |
| Thm 2 | Pointwise consistency | Immediate once Thm 1 holds | - |
| Thm 3 | Minimax non-parametric rate under a curvature bound | Schötz (2020) gives the same rate only in Hadamard (K ≤ 0) spaces; no positive curvature, no curvature-dependent constant. | Rate holds for any sign of K via our modulus that repairs the loss of convexity when K > 0. |
| Prop 4 | Second-order jet of Fréchet functional in metric setting | Only for smooth manifolds in classical Riemannian textbooks | We extend to Alexandrov tangent cone & supply error bound |
Clarification of Section 3.4
We appreciate the reviewer’s concern and agree that the motivation of Section 3.4 was not made sufficiently explicit in the current manuscript. Below we (i) clarify why the local-expansion tools are needed for later statistical tasks, (ii) show that Lemma 8 is the missing geometric ingredient for those tools, and (iii) give the concrete curvature term that was implicit in Proposition 4 together with two concrete ways it can be used. After Thm 3, we ultimately want a second-order stochastic expansion whose variance involves the operator (the Hessian of the Fréchet functional at the target). Without this part, we only have first-order consistency, and the expansion supplies exactly this operator. Empirically, the sample Fréchet mean or regression estimator is obtained by gradient descent or Newton steps on the manifold. The explicit Hessian term allows us to give local quadratic convergence guarantees and curvature-corrected step sizes. Specifically, let be the Newton residual after iterations. If (with the Lipschitz constant of ), then , i.e., quadratic convergence with a curvature-corrected step factor .
Lemma 8 states
where and . It provides the remainder bound in Proposition 4, ensuring the Taylor remainder is uniformly controlled on compact sets, and it supplies a curvature-dependent Lipschitz constant for the Alexandrov-angle map, which we later feed into the angular-perturbation and bias parts. We will add a short pointer to those uses at the start of Sec. 3.4 so that the logical chain is visible. In the revision, Proposition 4 will read , , with , where is the Riemannian curvature tensor at and we identify with in normal coordinates.
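Schematically, and suppressing the curvature-correction term whose exact form the revision will state, the expansion in Proposition 4 has the familiar second-order shape
\[
F\bigl(\exp_m(v)\bigr)\;=\;F(m)\;+\;\bigl\langle \operatorname{grad}F(m),\,v\bigr\rangle_{g_m}\;+\;\tfrac12\,\bigl\langle \operatorname{Hess}F(m)[v],\,v\bigr\rangle_{g_m}\;+\;O\!\bigl(\|v\|^3\bigr),
\]
with the remainder controlled uniformly on compact sets.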
Experiments and Theory
Theorem 3 gives that, for any smoothing bandwidth , the estimator satisfies
where and depend on the kernel and the data distribution (see L687-L700). In the experiment, these two constants are identical for the two runs (positive / negative curvature) because we feed in the same cloud of Euclidean predictors and keep the same kernel, bandwidth schedule and sample size . Hence all the curvature dependence enters through only, which can be written as
Thus, for every negative curvature space, we have . For positive curvature spaces, consider the unit sphere with constant curvature . Points are drawn only from the spherical cap , where is the geodesic (polar) distance from the north pole. The farthest a sample point can be from the north pole is exactly that polar angle, hence , . Using the explicit formula for the convexity constant for , . Then, the ratio of the two moduli is . Every risk bound in Sec. 3 carries a factor and therefore the expected MSE on the sphere must exceed that on the hyperbolic disk by ≈ 21% whenever the variance dominates the bias. The empirical gap observed in Table 1 is such that, numerically, the MSE on the positive curvature space exceeds that on the negative curvature space by ≈ 16%, well within sampling noise of the theoretical forecast.
Hölder continuity
In Thm 3, the only place where Hölder modulus is used is in the bias term of the oracle bound. This condition itself follows from two ingredients that are completely separate from CAT(K) geometry: i) strong geodesic convexity of the Fréchet functional, and ii) a quantitative continuity assumption on the family of conditional laws .
For each predictor value , let , . Because is -strongly geodesically convex, the (sub-) gradient equation characterizes the minimizer and provides the ellipticity needed for an implicit-function argument. Assume the conditional laws move Hölder‑continuously in Wasserstein-2: , for and . Note that the total-variation continuity or any transport cost dominating also works, but is the weakest metric that still controls first and second moment and therefore is the least restrictive. Write , . Because minimizes while is -convex, . Add and subtract :
Hence, , and we have
So, no extra regularity of is assumed.
When does it hold?
- Smooth conditional density. If possesses a density that is in uniformly in and has bounded second moments, then the map is in .
- Location-scale families. If with having identity covariance, Lipschitz or Hölder continuity of and immediately gives the assumption with the same exponent.
- Kernel design-based models. For the synthetic experiments in Sec. 4, the response is generated by rotating a fixed base point along a geodesic by an amount that depends smoothly on the Euclidean predictor: the rotation angle is smooth in the predictor, and the assumption therefore holds with .
- Discrete or tree-valued . Even in non-smooth cases, one often has an -transport bound . Because on bounded spaces, the assumption still follows.
Minor comments
- Clarification of the inner product. In that formula we intended the usual Riemannian metric at the base point , applied to two tangent-vectors. Concretely, , where is the inner product furnished by the manifold’s Riemannian metric and is the (2,0) Fréchet-Hessian interpreted as a self-adjoint linear operator via that same metric.
- Notation errors and typos. Thank you for pointing out, we will fix them all accordingly.
Thank you for your detailed response; I greatly appreciated the thorough answers to my concerns. The table you provided is particularly relevant and emphasizes that your paper generalizes several results from the literature. Nevertheless, I feel that the experimental section could have better reflected the results presented in this work.
That said, I am happy to raise my score thanks to the clarifications you provided in your response regarding the related literature and Section 3.4.
In this paper, the authors study Fréchet regression (a setting where data labels lie in a non-Euclidean space) and provide a theoretical analysis of it via comparison geometry (i.e., via comparison with known spaces of constant curvature K). The main focus is on CAT(K) spaces, whose triangles are thinner than those of the model space of constant curvature K.
In particular, they prove statistical convergence results for this problem. In Theorem 1, they prove that the distance between the population mean and the empirical mean is small with high probability for large sample sizes. Next, in Proposition 2, they prove the same results for L_p norms. Later, in Theorem 2, they prove that the empirical regressor converges almost surely to the population regressor, and finally in Theorem 3 they prove a non-asymptotic convergence rate for the population risk in Fréchet regression. They also provide convergence results for angles in Theorem 4.
They conclude the paper with experiments.
Strengths and Weaknesses
Pros:
- very well written paper
- nice theoretical results
- relevant to the conference (specifically geometry/ML intersection)
Cons:
- barely motivated
- some assumptions are not well explained (why do they make sense?)
- only focus on CAT(K)
Questions
This is an interesting paper about regression when labels are non-Euclidean. The authors derive statistical convergence (high probability, L_p, non-asymptotic, etc.) bounds on this problem. They all match our previous understanding from Euclidean spaces, thus extending them to beyond such settings.
Major Comments/Question:
-
Missing concrete applications: the paper is not well motivated as to why we need to go beyond Euclidean labels. Please provide more evidence for that.
-
Given a space, how can we ever verify that it is CAT(K)? It seems this assumption is theoretically plausible but hard to check or verify. Please provide evidence for why assuming CAT(K) is reasonable.
-
Section 3.4 is a bit awkward. I can barely understand what the purpose of it is. It seems the authors are using Taylor series but I don't get why. Please explain it.
Other Comments:
-
In Definition 1, how do you make sure the comparison triangle exists? How do you find the so-called "corresponding" points for x and y in the comparison triangle?
-
In Definition 3, shouldn't the parameter depend also on ? The way the definition is phrased, it looks like it doesn't.
-
In Line 77, are you implicitly assuming that the number K is fixed? Otherwise, what do you mean by comparison triangle there?
-
Section 3 provides an interesting series of lemmas as background. This is interesting but in my opinion it would be great if you can include exact references for them (where exactly they are appeared). This helps the reader if they want to trace back the previous results.
-
When does the kernel in Line 162 satisfy Assumption 1? Does it always? Can you provide a proof? This is essential because otherwise it is difficult to find cases satisfying Assumption 1.
-
In Theorem 4, isn't it the case that achieving a small epsilon needs exponentially many samples in dimension since the optimal transport measure estimation suffers from the curse of dimensionality? In that case the results of Theorem 4 are barely applicable in practice in high dimensions. Can we somehow extend them to a better distance such as sliced Wasserstein?
-
In Equation 10, one usually continues the analysis by optimizing the bandwidth and obtaining the final rate as a function of the sample size. I suggest the authors do this to make their result more readable.
Limitations
yes
Final Justification
The authors provided a comprehensive response to my comments, thus I increased my score from 2 to 4. However, Reviewer BzDS has some criticisms about the novelty of this work, so my score update is only conditional on resolving Reviewer BzDS concerns. Thanks!
Formatting Concerns
N/A
First of all, we would like to thank you for your constructive comments.
Real‑World Domains with Intrinsically Non‑Euclidean Labels
Diffusion-Tensor Imaging
At each voxel we observe an SPD matrix. The space of SPD matrices carries the affine-invariant Riemannian metric
$g_\Sigma(A,B)=\operatorname{tr}\!\bigl(\Sigma^{-1}A\,\Sigma^{-1}B\bigr)$,
under which it is a non-positively curved manifold.
A naive Euclidean mean of SPD tensors can leave the SPD cone (produce negative eigenvalues) and blurs crossing fibers. Therefore, considering Fréchet regression on manifolds might be helpful.
Shape Analysis
Landmark-based shapes (e.g., outlines of organs, 2D / 3D facial scans) are equivalence classes under rotation, translation and scaling. Kendall’s shape space is a quotient of the sphere, with positive curvature but well-studied CAT(1) properties when restricted to diameter .
Here, treating raw point coordinates ignores shape invariance and leads to spurious modes in the mean shape. Thus, modeling how anatomical shape depends on covariates (age, disease) directly on the manifold is reasonable.
Phylogenetic Trees & Evolutionary Distances
Each response is an unrooted tree with edge-lengths (e.g., evolutionary distances between species). Here, averaging tree-distance vectors in a Euclidean embedding ignores topology changes, yielding fractured consensus trees. Linking environmental predictors (e.g., temperature gradients) to shifts in lineage-tree structures via manifold-valued regression is one possible application.
Verifying the CAT(K) property in practice
Many of the spaces we care about in applications are already rigorously known to be CAT(K) in the geometry literature (please see the above examples). If our data domain is one of the above (or a known quotient of them), we can simply see the standard references.
On a smooth manifold $M$, if we can show that all sectional curvatures satisfy $\mathrm{sec}_M \le K$ (together with the usual global conditions, e.g. completeness and simple connectedness), then by the classical Alexandrov comparison theorem it is CAT(K).
Here, if we don't know our space analytically, we can always test the CAT(K) condition empirically on sampled triples $(a, b, c)$ (a minimal numerical sketch is given after this list):
- Compute the three pairwise distances $d(a,b)$, $d(b,c)$, $d(c,a)$.
- Build the unique comparison triangle $\bar a\,\bar b\,\bar c$ in the model space $M_K^2$ with the same side lengths.
- Sample points $p \in [a,b]$ and $q \in [a,c]$ at a few fractional positions, and check $d(p,q) \le d_{M_K^2}(\bar p, \bar q) + \tau$ for a small numerical tolerance $\tau$.
- Repeat over many random triangles; if the vast majority satisfy the inequality, the space is empirically (close to) CAT(K).
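For concreteness, here is a minimal sketch of such a test for the flat case $K = 0$, where the comparison triangle lives in the Euclidean plane and is built via the law of cosines. The callables `dist` and `geodesic_point` are hypothetical user-supplied routines for the space being tested:

```python
import numpy as np

def comparison_triangle(d_ab, d_bc, d_ca):
    """Place a Euclidean comparison triangle with the given side lengths
    (law of cosines); assumes a non-degenerate triangle."""
    a_bar = np.array([0.0, 0.0])
    b_bar = np.array([d_ab, 0.0])
    cos_alpha = (d_ab**2 + d_ca**2 - d_bc**2) / (2.0 * d_ab * d_ca)
    alpha = np.arccos(np.clip(cos_alpha, -1.0, 1.0))   # angle at a_bar
    c_bar = d_ca * np.array([np.cos(alpha), np.sin(alpha)])
    return a_bar, b_bar, c_bar

def cat0_triangle_check(a, b, c, dist, geodesic_point,
                        fractions=(0.25, 0.5, 0.75), tol=1e-6):
    """Check the CAT(0) comparison inequality on one sampled triangle.

    `dist(x, y)` returns the distance in the space; `geodesic_point(x, y, t)`
    returns the point at fraction t along the geodesic from x to y.
    Both are user-supplied for the space under study.
    """
    a_bar, b_bar, c_bar = comparison_triangle(dist(a, b), dist(b, c), dist(c, a))
    for s in fractions:
        for t in fractions:
            p = geodesic_point(a, b, s)             # point on side [a, b]
            q = geodesic_point(a, c, t)             # point on side [a, c]
            p_bar = (1 - s) * a_bar + s * b_bar     # corresponding comparison points
            q_bar = (1 - t) * a_bar + t * c_bar
            if dist(p, q) > np.linalg.norm(p_bar - q_bar) + tol:
                return False
    return True
```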
Motivation of Section 3.4
Recall that Section 3.4 develops a second-order (local jet) expansion of the conditional Fréchet functional around its minimizer in the tangent cone; a schematic version is sketched right after the list below.
Why it is useful
- Second-order geometry & rates. In Euclidean regression one often invokes the second-order local curvature (Hessian) of the risk to prove asymptotic normality or to get refined bias/variance expansions. Here, the same idea shows that the strong-convexity constant is really the smallest eigenvalue of the Hessian $H(x)$ of the conditional Fréchet functional. That is how one derives the concentration bounds, and, in future work, a parametric CLT.
- Algorithmic design. Having a local quadratic model tells us how to take Newton-type steps (preconditioning the gradient by $H(x)^{-1}$), which might accelerate convergence compared to gradient descent.
- Uncertainty quantification. In statistical inference on manifolds one often needs a covariance in the tangent space. The inverse Hessian $H(x)^{-1}$ directly gives the asymptotic covariance of the estimator $\widehat m(x)$.
- Bias corrections and model comparison. When we compare two candidate regression models, the local jet expansion gives a quadratic approximation to the excess risk of each model. We can then perform a likelihood-ratio or Wald-type test.
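For orientation, here is a schematic version of such a second-order model and the resulting Newton-type update, in our own notation and written for the smooth (manifold) case; the paper's exact display may differ. Here $v$ ranges over the tangent space at the conditional Fréchet mean $m(x)$, $F$ is the conditional Fréchet functional, and $H$ denotes its Hessian.

```latex
% Schematic second-order (local jet) expansion at the minimizer m(x):
\[
  F\bigl(\exp_{m(x)} v\bigr)
  \;=\; F\bigl(m(x)\bigr) \;+\; \tfrac{1}{2}\,\bigl\langle H(x)\, v,\; v \bigr\rangle \;+\; o\!\bigl(\|v\|^{2}\bigr),
  \qquad
  \lambda \;=\; \lambda_{\min}\bigl(H(x)\bigr) \;>\; 0.
\]
% Corresponding Newton-type update for minimizing F:
\[
  m_{k+1} \;=\; \exp_{m_k}\!\Bigl( -\, H(m_k)^{-1}\, \operatorname{grad} F(m_k) \Bigr).
\]
```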
Why does the comparison triangle exist, and how do we pick the corresponding points?
Why does the comparison triangle exist?
We start with a geodesic triangle $\triangle(p, q, r)$ in the CAT(K) space $(M, d)$. By geodesic we mean there are length-minimizing segments $[p,q]$, $[q,r]$ and $[r,p]$ whose lengths satisfy the usual triangle inequalities.
In the model space $M_K^2$ (the simply connected surface of constant curvature $K$), any three positive numbers satisfying the triangle inequalities (and, for $K > 0$, summing to less than $2\pi/\sqrt{K}$) can be realized as the side lengths of a geodesic triangle, unique up to isometry.
- Length match: we set $\ell_1 = d(p,q)$, $\ell_2 = d(q,r)$, $\ell_3 = d(r,p)$, and verify that these satisfy the usual triangle inequalities and, for $K > 0$, the perimeter bound $\ell_1 + \ell_2 + \ell_3 < 2\pi/\sqrt{K}$.
- Construct $\bar\triangle(\bar p, \bar q, \bar r)$ in $M_K^2$: by the existence theorem for geodesic triangles in constant-curvature surfaces, there is a triangle with side lengths $\ell_1, \ell_2, \ell_3$. Uniqueness up to global isometry means we can fix one choice of that triangle.
How do we pick the corresponding points?
Once the three model points $\bar p, \bar q, \bar r$ are pinned down, they inherit the same labeling as $p$, $q$, $r$. Now,
- Locate $x$ on its side. By hypothesis $x$ lies on the geodesic segment from $p$ to $q$; let $t = d(p, x)$, so $0 \le t \le d(p,q)$.
- Transfer to the model segment. In $M_K^2$, $[\bar p, \bar q]$ is a geodesic of length $d(p,q)$. We define $\bar x$ as the point on $[\bar p, \bar q]$ at distance $t$ from $\bar p$. By construction, $d(\bar p, \bar x) = d(p, x)$ and $d(\bar x, \bar q) = d(x, q)$.
- Same for $y$. If $y$ lies on $[p, r]$ (say), let $s = d(p, y)$, and set $\bar y$ to be the point on the model segment $[\bar p, \bar r]$ at distance $s$ from $\bar p$.
Dependence of the strong-convexity parameter on the function
Yes, the parameter in the definition must depend on the function whose convexity we are measuring. We will update the text to make this explicit.
Line 77: question about the role of $K$
In fact, there is no hidden $K$ in Definition 6. When one defines the Alexandrov angle, one always compares the small geodesic triangle to a Euclidean comparison triangle, not to the constant-curvature model used elsewhere for CAT(K). That is, the model-space curvature needed in the CAT(K) definition is about controlling distances in large triangles; when we zoom in to an infinitesimal angle at a point, the Euclidean comparison is the canonical one, since angles live in the tangent cone, which is always flat.
Exact reference for background lemmas
Thank you for pointing this out. In the revised manuscript we will tag each background lemma with an explicit citation.
When does the kernel satisfy Assumption 1?
One sets, for each fixed $x$,
$$ w_{i,n}(x) \;=\; \frac{K_h(X_i - x)}{\sum_{j=1}^{n} K_h(X_j - x)}, \qquad K_h(u) = h^{-d} K(u/h), $$
where $K$ is a bounded, integrable kernel with $\int K(u)\,du = 1$, often taken compactly supported or with exponential tails, and $h = h_n$ is a bandwidth satisfying $h_n \to 0$ and $n h_n^{d} \to \infty$.
We claim that under these conditions, and provided the marginal density $f_X$ of $X$ is continuous and strictly positive at the evaluation point $x$, the weights indeed satisfy
$$ \sum_{i=1}^{n} w_{i,n}(x)\, g(Y_i) \;\longrightarrow\; \mathbb{E}\bigl[g(Y) \mid X = x\bigr] \quad \text{almost surely} $$
for every bounded continuous $g$. Here is a sketch of the proof:
Define
$$ A_n(x) = \frac{1}{n}\sum_{i=1}^{n} K_h(X_i - x)\, g(Y_i), \qquad B_n(x) = \frac{1}{n}\sum_{i=1}^{n} K_h(X_i - x). $$
By the classical strong LLN (in its triangular-array form, since $h = h_n$ varies with $n$), $A_n(x)$ and $B_n(x)$ track their expectations $\mathbb{E}[K_h(X - x)\, g(Y)]$ and $\mathbb{E}[K_h(X - x)]$.
A change of variables $u = (X - x)/h$ yields
$$ \mathbb{E}\bigl[K_h(X - x)\, g(Y)\bigr] = \int K(u)\, \mathbb{E}\bigl[g(Y) \mid X = x + hu\bigr]\, f_X(x + hu)\, du, $$
and similarly for $\mathbb{E}[K_h(X - x)]$. As $h \to 0$, continuity of the relevant densities and conditional expectations at $x$ gives
$$ \mathbb{E}\bigl[K_h(X - x)\, g(Y)\bigr] \to f_X(x)\, \mathbb{E}\bigl[g(Y) \mid X = x\bigr], \qquad \mathbb{E}\bigl[K_h(X - x)\bigr] \to f_X(x). $$
Since $f_X(x) > 0$, for large $n$ the denominator stays bounded away from zero and we can write
$$ \sum_{i=1}^{n} w_{i,n}(x)\, g(Y_i) \;=\; \frac{A_n(x)}{B_n(x)} \;\longrightarrow\; \frac{f_X(x)\, \mathbb{E}[g(Y)\mid X = x]}{f_X(x)} \;=\; \mathbb{E}\bigl[g(Y)\mid X = x\bigr]. $$
We will add the above discussion in the revised version.
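As a quick sanity check (our own illustration, not from the paper), one can compare the weighted average against the true conditional mean on synthetic data; the Gaussian kernel and the regression function $\sin(2\pi x)$ below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, h, x0 = 20_000, 0.15, 0.5

# Synthetic data: X uniform on [0, 1], Y = sin(2*pi*X) + noise.
X = rng.uniform(0.0, 1.0, size=n)
Y = np.sin(2 * np.pi * X) + 0.1 * rng.normal(size=n)

# Nadaraya-Watson weights with a Gaussian kernel at the evaluation point x0.
K = np.exp(-0.5 * ((X - x0) / h) ** 2)
w = K / K.sum()

print("weighted average :", np.sum(w * Y))          # approx E[Y | X = x0]
print("true cond. mean  :", np.sin(2 * np.pi * x0))
```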
Plugging in the sliced Wasserstein distance
You're right that if we try to estimate a full Wasserstein-2 distance between two $d$-dimensional conditional laws from $n$ samples, the best known rates are roughly of order $n^{-1/2}$ (up to logarithmic factors) in low dimension and $n^{-1/d}$ in higher dimension, so in moderate or high dimension we need on the order of $\epsilon^{-d}$ samples just to drive that term below $\epsilon$. That exponential dependence on the ambient dimension is exactly the "curse of dimensionality".
The sliced-Wasserstein distance replaces the full $d$-dimensional transport by an average of one-dimensional projections. Because each one-dimensional Wasserstein distance converges at the parametric rate $n^{-1/2}$, one can show under mild regularity that sliced-Wasserstein estimation achieves a rate of order $n^{-1/2}$ for all $d$. Thus it allows us to avoid the dimensional blow-up.
Everything in the proof of Theorem 4 is entirely first-order in the chosen metric on measures and never exploits any special geometry of the Wasserstein space, so exactly the same arguments go through verbatim if one replaces the full Wasserstein distance by a sliced Wasserstein distance (of any order). Thus, as you suggested, that modification cures the curse of dimensionality.
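For reference, here is a minimal Monte Carlo sketch (ours, not the paper's) of the sliced $W_2$ distance between two equal-size samples, using random projections and the closed-form one-dimensional coupling via sorted values:

```python
import numpy as np

def sliced_w2(X, Y, n_projections=200, seed=None):
    """Monte Carlo estimate of the sliced 2-Wasserstein distance between
    two equal-size samples X, Y of shape (n, d)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    total = 0.0
    for _ in range(n_projections):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)               # random unit direction
        x_proj = np.sort(X @ theta)                  # 1-D projections, sorted
        y_proj = np.sort(Y @ theta)
        total += np.mean((x_proj - y_proj) ** 2)     # 1-D W2^2 via order statistics
    return np.sqrt(total / n_projections)

# Usage: two Gaussian samples with shifted means.
rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=(1000, 5))
Y = rng.normal(0.5, 1.0, size=(1000, 5))
print(sliced_w2(X, Y))
```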
Bandwidth optimization
As you suggested, optimizing the bandwidth $h$ to balance the bias and variance terms in Equation 10 gives the familiar bias-variance trade-off choice and hence a final rate expressed purely in terms of the sample size. That makes the result immediately transparent.
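The exact exponents depend on the form of Equation 10, which we do not reproduce here; purely as an illustration of the calculation, assuming a bound of the generic kernel-regression form $C_1 h^{\alpha} + C_2 (n h^{d})^{-1/2}$, balancing the two terms gives:

```latex
\[
  C_1 h^{\alpha} \;\asymp\; C_2 \,(n h^{d})^{-1/2}
  \quad\Longrightarrow\quad
  h^{*} \;\asymp\; n^{-\frac{1}{2\alpha + d}},
  \qquad
  C_1 (h^{*})^{\alpha} + C_2 \,\bigl(n (h^{*})^{d}\bigr)^{-1/2} \;\asymp\; n^{-\frac{\alpha}{2\alpha + d}} .
\]
```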
Thank you so much for your detailed response to my comments/questions. I liked the discussion on verifying the CAT(K) property (I understand it is a separate, interesting research question, possibly for future work). My concerns have been mostly addressed.
I'm happy to increase my score, but before that, could you summarize here exactly what changes you are going to make in the next version? No need to answer my questions again; just a short list of the promised changes. I will update my score after that. Thank you!
Thank you for your response. Here’s a concise checklist of manuscript edits we will make based on the rebuttal:
- Add an applications paragraph (Intro): Insert three concrete, non-Euclidean response domains (SPD/DTI, Kendall shape space, phylogenetic trees) and why Euclidean averaging fails there.
- New discussion on practical guidance for verifying CAT(K) (§2 or Appendix): the analytic route via sectional-curvature bounds and the empirical triangle-comparison test with a numerical tolerance.
- Revise §3.4 (motivation + explicit formula): Open with why the local jet expansion is needed (rates/CLT, algorithms, uncertainty quantification), and make the role of curvature explicit in the expansion.
- Clarify comparison triangles (§2, angle section): Short proof note on existence/uniqueness under triangle inequalities and perimeter bound, precise rule for choosing corresponding points.
- Notation fixes: State that the Alexandrov angle uses the Euclidean comparison (no hidden $K$); make the strong-convexity parameter explicitly function-dependent.
- Tagging/background (§3.1): Attach explicit citations to each background lemma referenced.
- Assumption 1 (Kernel LLN) & proof sketch (Appendix/§3.2): Add the normalized kernel-ratio SLLN argument with conditions.
- Metrics on measures (discussion after Thm 4): Note that the results extend verbatim to sliced Wasserstein, with rate advantages in high dimension.
- Bandwidth optimization (end of §3.3): Add calculation yielding the resulting rate.
Thank you again for your detailed and constructive comments.
Thank you so much for providing the list of updates! I'm happy to increase my score (+2), conditional on the promised changes.
Also, Reviewer BzDS mentions some criticisms about the novelty of the paper. I suggest that the authors provide a comprehensive response to that. Thanks!
In this submission, the author(s) leverage comparison geometry to study the important Fréchet regression problem, providing a rigorous theoretical analysis and statistical guarantees for nonparametric Fréchet regression, including exponential concentration bounds, convergence rates, and insights into Alexandrov angle stability. Some numerical validations are also provided to support the theoretical findings.
Strengths and Weaknesses
Strengths: The manuscript is well written and provides a rigorous theoretical analysis and statistical guarantees for nonparametric Fréchet regression based on comparison geometry.
Weaknesses: I don't see any obvious weakness in the manuscript.
Questions
Good work, I don't have other questions.
Limitations
The manuscript considers the exact manifold, which may be impractical. A noisy manifold may be a more practical setup and deserves further investigation in the future. But this is beyond the scope of the current manuscript, so I bring it up here only as a potential consideration for the author(s) in future work.
Final Justification
I keep my positive assessment of the manuscript and hence maintain my score.
Formatting Issues
I do not think this submission has any formatting problems.
We are very happy to receive your positive comments.
Furthermore, we found the additional analysis you mentioned on noisy manifolds to be a very interesting direction. Therefore, we would like to add the following discussion to the Appendix (a related discussion is included in Appendix C). We believe that such a discussion of noisy manifolds is important from a practical point of view and makes this study more solid.
Idea on noisy manifolds: $\epsilon$-approximate CAT(K)
We might define the following $\epsilon$-approximate CAT(K) space: a geodesic metric space $(M, d)$ is $\epsilon$-approximate CAT(K) if for every geodesic triangle (of perimeter less than $2\pi/\sqrt{K}$ when $K > 0$) and every pair of points $x$, $y$ on its sides, their distance satisfies
$$ d(x, y) \;\le\; d_{M_K^2}(\bar x, \bar y) + \epsilon, $$
where $\bar x, \bar y$ are the corresponding points in the exact comparison triangle in the constant-curvature model.
When $\epsilon = 0$, this recovers the usual CAT(K) condition.
Approximate convexity
Let $(M, d)$ be $\epsilon$-approximate CAT(K) with $K \le 0$, say. Fix $\mu \in M$ and $t \in [0, 1]$. Then, for every geodesic $\gamma : [0,1] \to M$,
$$ d^2\bigl(\gamma(t), \mu\bigr) \;\le\; (1-t)\, d^2\bigl(\gamma(0), \mu\bigr) + t\, d^2\bigl(\gamma(1), \mu\bigr) - t(1-t)\, d^2\bigl(\gamma(0), \gamma(1)\bigr) + C\epsilon, $$
where $C$ is a small constant depending only on the diameter of the triangle involved.
(Proof sketch.) In the exact model one has $d(\gamma(t), \mu) \le d_{M_K^2}(\bar\gamma(t), \bar\mu) + \epsilon$; square this and invoke the usual CAT(K) convexity of the squared distance in the model space.
The extra $\epsilon$ in every comparison step contributes at most $C\epsilon$ once we expand the square and bound the cross term.
Here, in exact CAT(K), we know the variance inequality
$$ F(y) \;\ge\; F(m) + \frac{\lambda}{2}\, d^2(y, m) $$
for some $\lambda > 0$, where $m$ is the minimizer of the Fréchet functional $F$. Under the $\epsilon$-slack this becomes
$$ F(y) \;\ge\; F(m) + \frac{\lambda}{2}\, d^2(y, m) - C\epsilon, $$
so any two minimizers $m_1$, $m_2$ must satisfy
$$ \frac{\lambda}{2}\, d^2(m_1, m_2) \;\le\; C\epsilon. $$
Hence $d(m_1, m_2) \le \sqrt{2C\epsilon/\lambda}$.
In particular, strict uniqueness fails only up to an $O(\sqrt{\epsilon})$ window.
Moreover, our concentration proof uses this variance inequality; we now also pick up an extra $C\epsilon$ term from the approximate convexity, so the deterministic part of the bound acquires an additive $O(\sqrt{\epsilon/\lambda})$ term.
All our sub-Gaussian (or sub-exponential) tail arguments stay the same, but in the end we expect a high-probability bound in which the original rate is inflated by an additive term of order $\sqrt{\epsilon/\lambda}$, holding with probability at least $1 - \delta$. So we see explicitly how the manifold "noise" degrades the resulting estimator.
How small must $\epsilon$ be?
Once we have a bound like the above, to preserve our original rates up to constants we need the extra term $\sqrt{\epsilon/\lambda}$ to be no larger than the original estimation rate, i.e. $\epsilon \lesssim \lambda \cdot (\text{original rate})^2$.
If $\lambda$ is fixed or grows slowly, this requires $\epsilon$ to decay roughly like the square of the estimation rate.
In practice, this tells us how accurately we need to reconstruct geodesic distances to ensure that our Fréchet regression estimates remain effectively in the CAT(K) regime.
My sincere thanks to the author(s) for their detailed discussion of potential noisy manifolds; I will maintain my score and recommend the paper for publication.
This paper investigates the statistical properties of Fréchet regression in CAT(K) spaces, i.e., metric spaces with an upper curvature bound. The main contributions are non-asymptotic convergence results (Section 3.2) for the kernel Fréchet regression estimator and angle stability results (Section 3.3) for the nonparametric (kernel-based) conditional Fréchet mean estimator.
The reviewers raised several concerns. In particular, Reviewer BzDS pointed out that parts of the paper are unclearly written or potentially misleading. More generally, the scope and motivation of the paper remain limited, and the presentation of the results lacks clarity. While the authors provide some discussion of applications and position their work relative to the literature, the significance and novelty are not sufficiently established.
The AC also found several unclear points in the main text and is not convinced by the current presentation. The experiments are also quite toy-level. Due to the high competition this year, the AC recommends rejection but suggests that the authors improve the motivation, clarify the scope and contributions, strengthen the related-work discussion, and refine the theorem statements for a future conference or journal submission.