Riemannian Proximal Sampler for High-accuracy Sampling on Manifolds
Abstract
Reviews and Discussion
This paper considers sampling from distributions of the form $\pi \propto e^{-f}$ over spherical manifolds. In the high-accuracy regime, the authors provide a Riemannian proximal sampler that produces a sample whose law is $\varepsilon$-close to the target distribution in TV distance within an iteration complexity polylogarithmic in $1/\varepsilon$. It is well known that the major difficulty in manifold Langevin-based sampling is the intractability of simulating manifold Brownian motion, as the kernel generally lacks a closed-form expression. The authors overcome the issue by truncating the heat kernel and show that an ideal proximal sampler achieves polylog complexity with an inexact proximal operator and the truncated manifold Brownian motion. They further instantiate the algorithm using rejection sampling and show that the implementation preserves the same iteration complexity.
Strengths and Weaknesses
Strengths
Overall, this paper is well-written and easy to follow. It should be the first to show that an ideal algorithm can achieve linear convergence by incorporating the inexact proximal operator on a Riemannian manifold. The authors demonstrate strong mathematical maturity and solid knowledge of both differential geometry and numerical methods. The theoretical analysis appears to be rigorous, and the proofs are generally convincing.
Weaknesses
However, I remain dubious about the novelty and the broader significance of this work. First, the results seem to be a relatively easy and straightforward extension of proximal sampling techniques from the Euclidean setting. The adaptation is perhaps expected, and might limit the originality of the results.
A more serious concern arises from a practical perspective. The implementation of the Riemannian proximal sampler relies crucially on rejection sampling to correct the additional bias introduced by the inexact oracles. But it is folklore that rejection sampling can be computationally intractable in real-world, high-dimensional applications, as it might require a number of proposals exponential in the dimension. This raises serious issues and may render the "practical implementations" totally infeasible in practice. It might be possible to make the inexact method more practical by considering other common techniques used in high-accuracy sampling, such as Metropolis-Hastings (MH), but the authors do not consider this or offer an explanation. Moreover, it is acceptable for a purely theoretical paper to lack an empirical section. However, the total lack of experiments is problematic when the practicality of the algorithm is not fully justified by theoretical results alone. Therefore, I would recommend rejection unless these concerns are addressed.
Questions
Further questions are already mentioned in the weaknesses part.
Limitations
The paper is primarily theoretical and does not have negative societal impact.
Final Justification
I changed my score to positive after the discussion with the authors, as I acknowledge that they have addressed the issue of impracticality to an acceptable extent.
Formatting Issues
No formatting concerns.
We thank the reviewer for taking the time to engage with our work and for their comments. While we respectfully disagree with some aspects of the assessment (the main one being the "total lack of experiments"), we appreciate the opportunity to clarify the points of misunderstanding.
Response to weaknesses:
"However, I remain dubious about the novelty and the broader significance of this work. First, the results seem to be a relatively easy and straightforward extension of proximal sampling techniques from the Euclidean setting. The adaption is perhaps expected, and might limit the originality of the results."
- Actually, the extension is not as easy as it looks. The proximal sampler can be interpreted as alternately running the heat flow and its time reversal [1], where a key property is that a certain "convexity" (e.g., LSI) is preserved along the heat flow. This is easy to prove in the Euclidean setting, as one can interpret the heat flow as convolution with a Gaussian, but on a Riemannian manifold such a strategy does not work. See our Proposition 31 for a theoretical justification, where the heat flow is treated as a Markov operator, which is quite different from the Gaussian convolution treatment.
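To make the contrast concrete, the Euclidean identity being appealed to is the following standard fact (our notation; $\mu_0$ is the initial law):

```latex
% In Euclidean space, the heat flow started at \mu_0 is Gaussian convolution,
% which is why preservation of LSI along the flow is straightforward there:
\[
  \mu_t \;=\; \mu_0 * \mathcal{N}(0,\, t I_d), \qquad \partial_t \mu_t = \tfrac{1}{2}\,\Delta \mu_t .
\]
% On a general Riemannian manifold there is no such convolution structure, and
% the flow has to be analyzed through the heat (Markov) semigroup instead.
```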
"A more serious concern arises from a practical perspective. The implementation of the Riemannian proximal sampler relies crucially on rejection sampling to correct the additional bias introduced by the inexact oracles. But, it is folklore that rejection sampling can be computationally intractable in real-world and high-dimensional applications as it might require an exponential number of proposals in dimension. "
- In general, rejection sampling can indeed require a number of proposals that grows exponentially with dimension. However, this can be avoided. In Euclidean space, the proximal sampler also requires rejection sampling to implement the RGO step; see [1]. But by carefully choosing the step size and constructing a suitable proposal, the expected number of proposals is dimension-independent. We theoretically justify the analogous claim in our setting; see Proposition 49 and Proposition 44. Roughly speaking, we showed that for a suitably small step size $\eta$, the cost of rejection sampling is of constant order in the dimension.
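To illustrate the dimension-independence claim, here is a minimal Euclidean sketch of an RGO-style rejection sampler (our own toy example, not the paper's manifold implementation; the potential f, the step-size choice, and all names are illustrative):

```python
import numpy as np

def rgo_rejection(y, f, f_min, eta, rng, max_iter=10_000):
    """Sample from pi(x) ∝ exp(-f(x)) * N(x; y, eta*I) by rejection.

    Proposal: x ~ N(y, eta*I); accept with prob exp(-(f(x) - f_min)),
    a valid envelope since f(x) >= f_min everywhere.
    Returns the accepted sample and the number of proposals used.
    """
    d = len(y)
    for k in range(1, max_iter + 1):
        x = y + np.sqrt(eta) * rng.standard_normal(d)
        if rng.random() < np.exp(-(f(x) - f_min)):
            return x, k
    raise RuntimeError("rejection sampling did not accept")

# Toy check with f(x) = 0.5 * L * ||x||^2 (L-smooth, minimum value 0):
# with eta ~ 1/(L*d), the average proposal count stays O(1) as d grows.
rng = np.random.default_rng(0)
L, d = 1.0, 100
eta = 1.0 / (L * d)
f = lambda x: 0.5 * L * np.sum(x**2)
counts = [rgo_rejection(np.sqrt(eta) * rng.standard_normal(d), f, 0.0, eta, rng)[1]
          for _ in range(200)]
print("average proposals per accepted sample:", np.mean(counts))
```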
"This brings serious issues and may cause the "practical implementations" to be totally infeasible in practice. It might be possible that the inexact method can be made more practical by considering other common techniques like MH used in high-accucary sampling, but the authors do not consider this or offer an explanation. Moreover, it is okay to be without an empirical section for a purely theoretical paper. However, the total lack of experiments is problematic when the practicality of the algorithm is not fully justified by theoretical results alone. Therefore, I would recommend rejection unless these concerns are addressed."
- Please note that we provided simulation experiments (for both implementations) in Appendix B; see pages 15 and 16.
- Furthermore, based on your questions, we tested the practical applicability by sampling in increasing dimensions, and the experimental results show that the (average) number of proposals does not blow up with dimension. We will add this new experimental result to the revised draft as well.
Thank you again for your feedback and questions. We sincerely hope that our response has addressed your concerns and may lead you to reconsider your evaluation. Should our clarifications increase your confidence in the work, we would be truly grateful if you felt it appropriate to adjust the score accordingly.
[1] Yongxin Chen, Sinho Chewi, Adil Salim, and Andre Wibisono. Improved analysis for a proximal algorithm for sampling. In Conference on Learning Theory, pages 2984–3014. PMLR, 2022.
I thank the authors for the detailed response. Although I remain dubious about the practicality of rejection sampling, I acknowledge that the authors have addressed this issue in a way that is worthy of publication. As a result, I change my score to a positive one.
Dear Reviewer m8ed,
Thank you for reconsidering, for recommending the paper for publication, and for the positive score.
Best,
Authors
This article generalizes an existing Euclidean proximal sampler to Riemannian manifolds, where the performance of the sampler, measured by its convergence rate, is established. The proposed sampler is based on two key oracles, MBI and RHK, which aim to sample exact Brownian motion on manifolds. Practical implementations of the oracles are discussed and analyzed. Preliminary numerical results show the effectiveness of the proposed method.
Strengths and Weaknesses
Strength: The article is mostly well written and easy to follow. The main results in Theorems 6 and 8 are easy to understand, though they seem to be highly technical (and novel) to prove. The nature of the results is of theoretical interest; notably, Corollary 10 gives a concrete case to show the implications of the main results.
Weakness: The idea of Varadhan's asymptotics mentioned in Section 6 is not very intuitive; a better explanation is needed (see questions below). Some notations should be unified to better understand the results. Due to the lack of space, it is a pity that all the numerical results are given in the Appendix.
Questions
- Why, in Theorem 8, is there no distinction between the positive and negative curvature situations, as in Theorem 6?
- Algorithms 2 and 3: when is it possible to find a suitable t, i.e., to ensure the existence of t and $C_{\mathrm{RHK}}$ or $C_{\mathrm{MBI}}$ such that …? It is unclear whether $C_{\mathrm{RHK}}$ and $C_{\mathrm{MBI}}$ are negative.
- The idea of Varadhan's asymptotics makes sense when $\eta \to 0$, according to Fact 5. However, the rates of the inexact oracles rely on large $\eta$ in Assumption 1, so it is conceptually contradictory to use Varadhan's asymptotics in a large-$\eta$ regime. A clearer explanation of this is needed to ensure that these results are correct.
- Clarify the space of $g$ in Definition 4. For all $g$ in which space?
- The notations $\pi^{Y|X}$ and $\pi^{X|Y}$ in Algorithm 1 are not consistent with Section 4. Are they the same as $\pi_\eta^{Y|X}$ and $\pi_\eta^{X|Y}$ in Assumption 1?
- Clarify the notation $\tilde{O}$ (and its difference from $O$) used in several places (e.g., Theorem 8).
Limitations
yes
Final Justification
The authors' rebuttal has clarified all my questions. I maintain my score.
Formatting Issues
no
We thank the reviewer for their thoughtful questions and positive evaluation.
Response to weaknesses:
"Some notations should be unified to better understand the results. Due to the lack of space, it is a pity that all the numerical results are given in Appendix."
- Thanks for the suggestion. We will reorganize and try to move some experimental results to the main text, as much as space allows.
Response to questions:
"Why in Theorem 8, there is no distinction between positive and negative curvature situations, as in Theorem 6? "
- Thanks for pointing this out. Yes, there should be a slight distinction: in the case of negative curvature we additionally need a curvature-dependent upper bound on the step size $\eta$. But the iteration complexity remains the same. The revised theorem looks like this:
- [Theorem 8] Similar to Theorem 6, let $M$ be a Riemannian manifold without boundary, and suppose Assumption 1 holds. For any initial distribution, to reach total variation distance $\varepsilon$ with the stated oracle accuracy and step size $\eta$ (for negative curvature, we additionally require a curvature-dependent upper bound on $\eta$):
- if the target satisfies LSI, we need the same number of iterations as before;
- if the target satisfies PI, we need the same number of iterations as before.
"Algorithm 2 and 3, when it is possible to find a suitable t, i.e. to ensure the existence of t and or such that … it is unclear whether and are negative. "
- We analyzed this theoretically in the appendix; please see, for example, Proposition 39.
"The idea of Varadnan asymptotics makes sense when , according to Fact 5. However, the rates of inexact oracles rely on large eta in Assumption 1, therefore it is conceptually contradictory to use Varadnan asymptotics in a large eta regime. A clearer explanation about this is needed to ensure that these results are correct."
- Yes, intuitively, when $\eta$ is large, Varadhan's approach could lead to a larger error. An error analysis for the Varadhan approach is still open. Our theoretical high-accuracy results are for the truncation approach, but we are able to experimentally verify the effectiveness of the Varadhan approach.
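For reference, Varadhan's small-time asymptotics (which we take to be the content of Fact 5) reads:

```latex
% Varadhan's asymptotics for the heat kernel p_t(x, y) on a manifold (M, g):
\[
  \lim_{t \to 0^{+}} \; -2t \log p_t(x, y) \;=\; d(x, y)^{2},
\]
% motivating the Gaussian-type surrogate
%   p_\eta(x, y) \approx (2\pi\eta)^{-d/2} \exp\!\big(-d(x, y)^2 / (2\eta)\big)
% for small \eta; quantifying the error for larger \eta is exactly what is open.
```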
"Clarify the space of g in Definition 4. For all g in which space?"
- $g$ can be any smooth function on the manifold $M$.
"The notations of and in Algorithm 1 are not consistent with Section 4. Are they the same as and in Assumption 1?"
- Yes. See line 170: "When there is no ambiguity, we omit the step size $\eta$ ..."
"Clarify the notation tilde O (difference to O) used in several places (e.g. thereom 8)."
- Yes, we will clarify the $\tilde{O}$ notation in our paper. In Theorem 8, we used the $\tilde{O}$ notation to emphasize that we are tracking only the leading-order dependency, suppressing logarithmic factors.
We thank you again for your insightful feedback and questions. If our response satisfies your concerns, we would greatly appreciate an increase in score to reflect your increased confidence in our work.
Dear Reviewer dci1,
We wanted to check whether our response answers your questions. If so, we would greatly appreciate an increase in score to reflect your increased confidence in our work. Please let us know if you have any other questions.
Thanks, Authors
The paper proposes a Riemannian extension of the proximal sampler [1, 2]. Instead of the Gaussian Oracle and Restricted Gaussian Oracle used in the standard proximal sampler, the paper introduces the Manifold Brownian Increment (MBI) oracle and the Riemannian Heat Kernel (RHK) oracle (Section 3), and derives convergence guarantees (Section 4). Unlike in the Euclidean case, these oracles may not be exact. Therefore, the paper proposes approximate implementations of these oracles (Section 5), which still guarantee convergence (Section 4).
[1] Yin Tat Lee, Ruoqi Shen, Kevin Tian. Structured Logconcave Sampling with a Restricted Gaussian Oracle. Proceedings of Thirty Fourth Conference on Learning Theory, 2021.
[2] Yongxin Chen, Sinho Chewi, Adil Salim, Andre Wibisono, Improved analysis for a proximal algorithm for sampling. Proceedings of Thirty Fifth Conference on Learning Theory, 2022.
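Schematically, the iteration summarized above alternates the two oracles; the following sketch is our paraphrase of the structure of Algorithm 1 (oracle internals and all names are placeholders, not the paper's code):

```python
# Schematic proximal-sampler loop in the style of the paper's Algorithm 1.
# `mbi_oracle` and `rhk_oracle` stand in for the MBI and RHK oracles.
def riemannian_proximal_sampler(x0, eta, n_iters, mbi_oracle, rhk_oracle):
    x = x0
    for _ in range(n_iters):
        y = mbi_oracle(x, eta)   # forward step: Brownian increment on M for time eta
        x = rhk_oracle(y, eta)   # backward step: sample x ∝ exp(-f(x)) p_eta(x, y)
    return x
```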
Strengths and Weaknesses
Strengths
- To the best of my knowledge, this is the first paper to introduce a Riemannian version of the proximal sampler. In particular, the method works under a more general setting (e.g., less smoothness, LSI) than previous Riemannian samplers [3]. The fact that the proposed method does not require $L$-smoothness of the potential is favorable, especially since verifying such a condition on Riemannian manifolds is less straightforward (and is, in fact, conjecturally related to the curvature of the manifold [4]).
- While the main proof follows a similar structure to the Euclidean case (Theorem 6), extending the canonical assumptions in the Euclidean space to the Riemannian setting appears to be nontrivial and to involve additional technical challenges (e.g., Proposition 31).
- The fact that the proposed approximate algorithm satisfies the required assumptions (Section 5.2) and still guarantees convergence is a favorable result.
Weaknesses
- Minor comment: In Section 2.1, Line 133, it would be helpful to specify the metric being used. Assigning a different metric (e.g., the Bures–Wasserstein metric) on the same space can result in a positively curved geometry.
- The analysis relies on the LSI condition of the distribution on the manifold. However, it is unclear how strict or mild this condition is in the context of Riemannian manifolds. It would have been helpful to illustrate how strict or mild this assumption is by providing examples or criteria that build intuition (see also the question section below).
[3] Xiang Cheng, Jingzhao Zhang, Suvrit Sra. Efficient Sampling on Riemannian Manifolds via Langevin MCMC. Proceedings of the 36th International Conference on Neural Information Processing Systems, 2022.
[4] Foivos Alimisis, Antonio Orvieto, Gary Becigneul, Aurelien Lucchi. A Continuous-time Perspective for Modeling Acceleration in Riemannian Optimization. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics. 2020.
Questions
- I am not sure whether Lines 601–602 are correct. Does the function satisfy Assumption 3.3 of [5]? At a glance, the relevant quantity appears to be possibly negative. If so, Theorem 3.4 of [5] would not apply. It is unclear to me whether the density actually satisfies the LSI.
- Related to the previous question: as mentioned in the paper, there exists no globally supported non-constant strongly log-concave density on compact manifolds. More generally, it is conjectured that the convexity and smoothness of the function are closely related to the curvature of the manifold (see the paragraph before Section 5 in [4]). In this regard, I suspect that the existence of densities satisfying the LSI is not always guaranteed and may require some curvature condition on the manifold. Do the authors have any thoughts on this? Theorems 28 and 31 provide some partial answers, but I wonder whether there are any other known examples.
- In Theorems 28 or 31, if , then . In this case, Theorem 6 appears to diverge, even if the step size is controlled, since the denominator becomes less than 1. Is this divergence inevitable?
[5] Mufan Bill Li, Murat A. Erdogdu. Riemannian Langevin Algorithm for Solving Semidefinite Programs. Bernoulli 29(4), 2023.
Limitations
The authors expressed limitations of the work. I have stated additional limitations that I found in the weaknesses section.
Final Justification
As discussed in the initial review, this paper has some interesting results.
- To the best of my knowledge, this is the first paper to introduce a Riemannian version of the proximal sampler. In particular, the method works under a more general setting (e.g., less smoothness, LSI) than previous Riemannian samplers [3]. The fact that the proposed method does not require $L$-smoothness of the potential is favorable, especially since verifying such a condition on Riemannian manifolds is less straightforward (and is, in fact, conjecturally related to the curvature of the manifold [4]).
- While the main proof follows a similar structure to the Euclidean case (Theorem 6), extending the canonical assumptions in the Euclidean space to the Riemannian setting appears to be nontrivial and to involve additional technical challenges (e.g., Proposition 31).
- The fact that the proposed approximate algorithm satisfies the required assumptions (Section 5.2) and still guarantees convergence is a favorable result.
The main weakness of the paper is the LSI condition on Riemannian manifolds, which is not immediate and has to be analyzed case by case. While there is a criterion, it has limited applicability. I believe this is an open problem that need not be addressed directly in this work and requires further study.
In conclusion, I will keep my initial rating (accept).
Formatting Issues
I did not find any issues with the paper formatting.
We thank the reviewer for their thoughtful questions and positive evaluation.
Response to weaknesses:
"Minor comment: In Section 2.1, Line 133, it would be helpful to specify the metric being used. Assigning a different metric (e.g., the Bures–Wasserstein metric) on the same space can result in a positively curved geometry."
- Thanks for the comment; we will specify the metric. We are using the affine-invariant metric.
"The analysis relies on the LSI condition of the distribution on the manifold. However, it is unclear how strict or mild this condition is in the context of Riemannian manifolds. It would have been helpful to illustrate how strict or mild this assumption is by providing examples or criteria that can build the intuition (see also the below question section)."
- Thanks for the suggestion; we will add more examples to clarify. In general, for convex potentials, the celebrated Bakry–Émery condition provides general results for a density to satisfy LSI, based on the spectrum of the Hessian of the potential and the Ricci curvature of the manifold. The question is wide open when the potential is not convex.
- For the Euclidean setting, see [2, 3, 4] for some recent works translating LSI/PI conditions on $\pi \propto e^{-f}$ into conditions on $f$, such as the Polyak–Łojasiewicz condition. We believe that similar conditions could be derived in the manifold setting, although there might be more sophisticated technical challenges to handle. We also emphasize that even in the Euclidean setting this question is not fully resolved.
Response to questions:
"I am not sure whether Lines 601–602 are correct..."
- Let $f(x) = -\kappa\, x^{T}\mu$ for $x \in S^{d-1}$. Clearly $\mu$ is the minimizer, and $-\mu$ is the maximizer. Note that they are the only two critical points. To compute the Riemannian Hessian, we construct a smooth extension of the Riemannian gradient. Let $G(x) = \kappa (x^{T}\mu)\,x - \kappa\mu$ be a vector field defined on $\mathbb{R}^{d}$. The Euclidean directional derivative of $G$ is
$$DG(x)[u] = u\, x^{T} \kappa \mu + x\, u^{T} \kappa \mu = \kappa (x^{T} \mu)\, u + \kappa (u^{T} \mu)\, x.$$
Hence when $x = -\mu$ (so that $x^{T}\mu = -1$ and $u^{T}\mu = 0$ for any tangent vector $u$), we have $DG(x)[u] = -\kappa u$. Recall that the only critical point that is not a global minimum is $-\mu$. (Actually, it is the maximizer.) We get that, for all unit tangent vectors $u$, the Riemannian Hessian satisfies $\langle u, \mathrm{Hess} f(-\mu)[u]\rangle = -\kappa < 0$. We see that the Hessian has a negative eigenvalue, and hence the quantity required by Assumption 3.3 of [5] is positive.
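As a sanity check on this computation, the claim that the Hessian at $-\mu$ acts as $-\kappa$ times the identity on the tangent space can be verified numerically (our own finite-difference check; all names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
d, kappa = 5, 2.0
mu = np.zeros(d); mu[0] = 1.0
x = -mu                                             # the maximizer of f(x) = -kappa * x^T mu

G = lambda z: kappa * (z @ mu) * z - kappa * mu     # smooth extension of grad f
u = rng.standard_normal(d)
u -= (u @ x) * x                                    # project onto the tangent space at x
u /= np.linalg.norm(u)                              # unit tangent vector

eps = 1e-6
DG = (G(x + eps * u) - G(x - eps * u)) / (2 * eps)  # Euclidean directional derivative
hess_u = DG - (DG @ x) * x                          # tangential projection = Hess f(x)[u]
print(np.allclose(hess_u, -kappa * u, atol=1e-5))   # expect True: eigenvalue -kappa < 0
```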
"Related to the previous question: as mentioned in the paper, there exists no globally supported non-constant strongly log-concave density on compact manifolds..."
- Indeed, as mentioned before, this is related to curvature-dimension conditions (e.g., Bakry–Émery), which are discussed, for example, in [1, Proposition 5.7.1]. For convex potentials, the LSI is satisfied based on the Ricci curvature of the manifold. More generally, the question is open, with recent explorations in the Euclidean setting, as mentioned previously.
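For the reader's convenience, the criterion in question reads as follows (in one common normalization; conventions for the constant vary across references):

```latex
% Bakry--Emery (curvature-dimension) criterion: if, on (M, g),
\[
  \operatorname{Ric} + \nabla^{2} f \;\succeq\; \lambda\, g \qquad \text{for some } \lambda > 0,
\]
% then \pi \propto e^{-f}\, d\mathrm{vol} satisfies a log-Sobolev inequality
% with constant determined by \lambda; see [1, Proposition 5.7.1].
```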
"In Theorems 28 or 31, if , then ..."
- In our notation, . When , we have , hence . For example, if , we have .
We thank you again for your intriguing questions and careful evaluation.
[1] Dominique Bakry, Ivan Gentil, and Michel Ledoux. Analysis and geometry of Markov diffusion operators, volume 103. Springer, 2014.
[2] Sinho Chewi and Austin J Stromme. The ballistic limit of the log-sobolev constant equals the Polyak–Łojasiewicz constant. arXiv preprint arXiv:2411.11415, 2024.
[3] Yun Gong, Niao He, and Zebang Shen. Poincare inequality for local log-Polyak-Łojasiewicz measures: Non-asymptotic analysis in low-temperature regime. arXiv preprint arXiv:2501.00429, 2024.
[4] August Y Chen and Karthik Sridharan. Optimization, isoperimetric inequalities, and sampling via Lyapunov potentials. arXiv preprint arXiv:2410.02979, 2024.
I truly appreciate the authors' response.
I agree the Bakry-Emery condition provides some examples when the potential is convex. But my question was more that, since there is no globally defined non-trivial continuous convex function on a complete Riemannian manifold with finite volume [1], considering convex potentials on Riemannian manifolds may be quite restrictive. So I was wondering whether one can have an example where the potential is non-convex.
However, I understand that figuring out the LSI condition in the case of a non-convex potential is an open and nontrivial problem. Also, since there are many Riemannian manifolds with infinite volume (in fact, the reference I cited contains the phrase "complete non-compact manifold with non-negative curvature has infinite volume", though I could not find the exact result), I think the Bakry-Emery condition would be a good criterion to clarify the meaning of the condition.
One (optional) suggestion might be to introduce Bakry-Emery, then state the result I cited ("this condition may not serve as a good criterion for certain situations, ..."), and then say "for infinite-volume manifolds Bakry-Emery can serve as a good criterion ...".
Overall, I am happy to keep my positive score.
[1] Shing-Tung Yau, Non-existence of continuous convex functions on certain Riemannian manifolds, Math. Ann. 1974.
Thank you for getting back to us.
We fully agree with you on this point, and thanks for the pointer to [1]. We will incorporate your suggestion in our revision.
A high-accuracy sampler on manifolds is proposed by extending the work of Lee et al. [2021]. The algorithm is first discussed assuming the exact oracles are available, and then under the inexact case.
Strengths and Weaknesses
Strength: The paper provides a high-accuracy sampler on manifolds, which is the first attempt in this direction. The convergence of the sampler is first established under the exact MBI and RHK oracles, and then discussed based on the inexact oracles via heat-kernel truncation.
Weaknesses:
- LSI somehow corresponds to the KL condition in optimization. In optimization, there is a well-established framework for the convergence analysis of algorithms under the KL condition, even for Riemannian optimization. Similarly, after assuming LSI, it seems the analysis follows a well-established route similar to the Euclidean case, even though there is a key property that needs to be carefully checked.
- The paper also considers analysis under the inexact oracle via kernel truncation; it seems there are no experiments to validate the effectiveness of this scheme. Overall, despite the theoretically high-accuracy result, I suspect that sampling efficiently so as to satisfy the two oracles approximately is still very challenging in the manifold case.
Questions
- What exactly does $d(\cdot,\cdot)$ in Line 292 refer to?
- In Proposition 9, the truncation level is given in order for the inexact sample to achieve a certain accuracy. I am wondering whether this should also rely on how fast the rejection sampling converges?
- In Line 144, what does the "minimal solution" exactly mean?
- Overall, what properties should the potential $f$ satisfy in order for the target to satisfy the LSI and PI properties?
- In Line 584, should it be the Riemannian Langevin Monte Carlo Algorithm?
Limitations
yes
Final Justification
The extension from Euclidean space to manifolds is novel, and future work on further improving the practicality of the algorithm would be interesting. I will maintain my score.
Formatting Issues
none
We thank the reviewer for their thoughtful questions and positive evaluation.
Response to weaknesses:
"LSI somehow corresponds the KL condition in optimization..."
- We would like to clarify that the extension is more involved than it might appear on the surface. The proximal sampling algorithm was interpreted in [1] as alternately running the heat flow and its time reversal, where a key property is that a certain "convexity" (e.g., LSI) is preserved along the heat flow. This is easy to prove in the Euclidean setting, as one can interpret the heat flow as convolution with a Gaussian. But on a Riemannian manifold, such a strategy does not work. Please refer to our Proposition 31 for a theoretical justification, where the heat flow is treated as a Markov operator, which is quite different from the Gaussian convolution treatment. It appears that this subtle yet important point may not have been fully considered in your assessment.
- Also, as opposed to the Euclidean setting, due to the lack of a closed-form heat kernel, both oracles must be approximated to make the method practical. In our work, we provide high-accuracy rates under this setup of simultaneously inexact oracles, which is also different from (and relatively non-trivial compared to) the Euclidean setup.
"The paper also considers analysis under the inexact oracle via kernel truncation..."
- Thanks for the question. Actually, we did provide experiments for both practical approaches; the results are presented in Appendix B, page 16. The first two examples are on hyperspheres, where we implemented the truncation approach. The third is on the manifold of positive definite matrices, where we implemented the Varadhan approach. The experimental results suggest that our proximal sampler (using both approaches) achieves higher accuracy than the Riemannian Langevin Monte Carlo method.
Response to questions:
"What exactly in Line 292 refers to? "
- $d(\cdot,\cdot)$ denotes the geodesic distance on the Riemannian manifold; the notation is introduced in lines 81-82. Informally speaking, consider $S^{2}$, like our Earth. The geodesic distance between two cities is the length of the path that people need to travel. Here the path is "curved" (because the Earth itself is curved), not a Euclidean straight line.
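For instance, on the unit sphere the geodesic distance has a simple closed form (a toy helper of our own, just for intuition):

```python
import numpy as np

def sphere_geodesic_distance(x, y):
    """Geodesic (great-circle) distance between unit vectors x, y on S^{d-1}."""
    return np.arccos(np.clip(np.dot(x, y), -1.0, 1.0))

x = np.array([1.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 0.0])
print(sphere_geodesic_distance(x, y))  # pi/2: a quarter of a great circle
```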
"In Proposition 9, the truncation level is given in order for the inexact sample to achieve certain accuracy. I am wondering whether this should also relies on how fast the reject sampling converges?"
- Yes, in Proposition 9 we mainly consider the truncation level needed for a desired accuracy. On the other hand, the cost of rejection sampling (i.e., how fast rejection sampling converges) is an open problem in general. But we are able to give a theoretical analysis of the cost of rejection sampling when the manifold is a hypersphere; see the remarks at line 335 and Corollary 10.
"In Line 144, what does the "minimal solution'' exactly mean? "
- Roughly speaking, if another function also solves the heat equation, then it must be greater than or equal to the heat kernel. See [2, Theorem 4.1.5, Proposition 4.1.6] for more details.
"Overall, what properties should satisfy in order to satisfy the LSI and PI properties? "
- Thanks for the intriguing question. For the Euclidean setting, recent works translated LSI/PI conditions on $\pi_\beta \propto e^{-\beta f}$ into conditions on $f$, such as the Polyak–Łojasiewicz condition, and studied the corresponding LSI/PI constants. For example, in the $\beta \to \infty$ limit, i.e., the "low temperature" setting, [3] considered the LSI constant and [4] considered the PI constant. Another work [5] considered an "optimizability" condition and analyzed the LSI/PI constants for (informally) large enough $\beta$. We believe that similar conditions could be derived in the manifold setting, although there might be more sophisticated technical challenges to handle (and we plan to consider this in future work, thanks to your question). We also emphasize that even in the Euclidean setting this question is not fully resolved.
"In Line 584, should it be the Riemannian Langevin Monte Carlo Algorithm? "
- Yes, sorry for the typo; we will correct it.
We thank you again for your insightful feedback and questions. If our response satisfies your concerns, we would greatly appreciate an increase in score to reflect your increased confidence in our work.
[1] Yongxin Chen, Sinho Chewi, Adil Salim, and Andre Wibisono. Improved analysis for a proximal algorithm for sampling. In Conference on Learning Theory, pages 2984–3014. PMLR, 2022.
[2] Elton P Hsu. Stochastic analysis on manifolds. Number 38 in Graduate Studies in Mathematics. American Mathematical Soc., 2002.
[3] Sinho Chewi and Austin J Stromme. The ballistic limit of the log-sobolev constant equals the Polyak–Łojasiewicz constant. arXiv preprint arXiv:2411.11415, 2024.
[4] Yun Gong, Niao He, and Zebang Shen. Poincare inequality for local log-Polyak-Łojasiewicz measures: Non-asymptotic analysis in low-temperature regime. arXiv preprint arXiv:2501.00429,2024.
[5] August Y Chen and Karthik Sridharan. Optimization, isoperimetric inequalities, and sampling via Lyapunov potentials. arXiv preprint arXiv:2410.02979, 2024.
Thank you for your response. I am still confused about one point. Since "the cost for rejection sampling (i.e., how fast rejection sampling converges) is an open problem in general", what does Proposition 9 mean, given that Algorithm 1 is based on rejection sampling?
Thanks for your clarification question.
Please note that Algorithm 1 is stated assuming exact availability of the RHK and MBI oracles. Theorems 6 and 8 provide high-accuracy guarantees for Algorithm 1 under exact and inexact oracles, respectively. For the inexact case, the oracles are required to satisfy Assumption 1. Proposition 9 concerns Algorithms 2 and 3, which are one way to implement the RHK and MBI oracles. Specifically, we show that this implementation satisfies the condition in Assumption 1; to show this, it is enough to determine the truncation level.
Rejection sampling is part of the implementation in Algorithms 2 and 3, which are both based on truncated heat kernels. The cost of rejection sampling is the number of iterations of the for loop in both algorithms.
If we have the exact heat kernel and we are in Euclidean space (for which the exact heat kernel is known), then the cost of rejection sampling can be proved to be $O(1)$ [1].
Hypothetically, even if we have the exact heat kernel on a Riemannian manifold, the cost for rejection sampling is actually unknown. This is what we mean in our previous response when we say "the cost for rejection sampling (i.e., how fast rejection sampling converges), is an open problem in general."
Intuitively, one can expect that if we have a highly accurate evaluation of the heat kernel, the cost of rejection sampling should be the same as that of rejection sampling with the exact heat kernel.
For the case of the sphere, we provide the end-to-end result (including the cost of rejection sampling) in Corollary 10. In proving this result, we show that the cost of rejection sampling (even with the inexact heat kernel based on the truncation level stated in Proposition 9) is similar to the Euclidean case; see Propositions 39 and 44 for details. Furthermore, we note that empirically the cost of rejection sampling for the positive semi-definite manifold in Section B.3 is also of constant order. Theoretically proving such a result for more general manifolds is an open question.
To summarize: to show that Assumption 1 holds, it is enough to set the truncation level of the heat kernel. The cost of rejection sampling should be independent of the truncation level, as long as the truncation level provides high-accuracy heat-kernel evaluations.
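As a concrete toy version of this pipeline, consider the circle $S^1$, where the heat kernel has an explicit Fourier series that can be truncated. The sketch below (our own construction, not the paper's Algorithms 2 and 3; all names and parameter values are illustrative) performs an RHK-style step with the truncated kernel, and the while loop is exactly where the rejection-sampling cost shows up:

```python
import numpy as np

def p_eta_trunc(delta, eta=0.1, N=50):
    """Heat kernel p_eta on the unit circle (generator Laplacian/2),
    truncated to the first N Fourier modes."""
    k = np.arange(1, N + 1)
    return (1.0 + 2.0 * np.sum(np.exp(-k**2 * eta / 2.0) * np.cos(k * delta))) / (2.0 * np.pi)

def rhk_step_circle(y, rng, kappa=1.0, eta=0.1):
    """Sample theta ∝ exp(kappa*cos(theta)) * p_eta(theta - y) by rejection
    against a uniform proposal; the envelope uses exp(kappa) >= exp(kappa*cos)
    and p_eta(0) >= p_eta(delta) (the truncated kernel peaks at 0)."""
    M = np.exp(kappa) * p_eta_trunc(0.0, eta)
    n_props = 0
    while True:
        n_props += 1
        theta = rng.uniform(0.0, 2.0 * np.pi)
        if rng.random() * M < np.exp(kappa * np.cos(theta)) * p_eta_trunc(theta - y, eta):
            return theta, n_props

rng = np.random.default_rng(0)
theta, n = rhk_step_circle(y=0.5, rng=rng)
print(theta, n)   # accepted angle and the number of proposals used
```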
We will clarify this in detail in our revision.
Please let us know if you have any other questions.
[1] Yongxin Chen, Sinho Chewi, Adil Salim, and Andre Wibisono. Improved analysis for a proximal algorithm for sampling. In Conference on Learning Theory, pages 2984–3014. PMLR, 2022.
Dear Reviewer 2wfT,
We wanted to check whether our response answers your questions. If so, we would greatly appreciate an increase in score to reflect your increased confidence in our work. Please let us know if you have any other questions.
Thanks,
Authors
Thank you for your clarification.
Reviewers unanimously find merit in this work, praising its presentation quality (Reviewers dci1 and m8ed), technical soundness (Reviewers dci1 and m8ed), and novelty in the sense that the setting is minimally explored (Reviewer 2wfT). The main weakness is raised by Reviewer m8ed, who worries that the theory is too similar to known Euclidean proximal sampling theory. Reviewer m8ed, based on their score, does not view this concern as too significant, and neither do I: non-Euclidean theory developed in other domains such as optimization (for instance, geodesic convexity) is interesting, and has resulted in new phenomena not necessarily present in the Euclidean case, which may well be the case here as the issues become better understood.
On basis of reviewer consensus, I recommend this work is accepted.