PaperHub

Rating: 4.9/10 · Rejected · 4 reviewers
Scores: 1, 4, 3, 3 (min 1, max 4, std dev 1.1)
ICML 2025

Bayesian Parameter Shift Rules in Variational Quantum Eigensolvers

OpenReview · PDF
Submitted: 2025-01-23 · Updated: 2025-06-18
TL;DR

A gradient-based optimizer with a Gaussian process derivative estimator achieves state-of-the-art performance in variational quantum eigensolvers.

Abstract

Keywords

parameter shift rule · variational quantum eigensolver · quantum computing · confidence region · Gaussian process

Reviews and Discussion

Review (Rating: 1)

This paper studies variational quantum eigensolvers, a task of minimizing the quantum energy with hybrid computation on a quantum computer and a classical computer. The authors propose a Bayesian parameter shift rule, which estimates the gradient of the quantum energy by Gaussian processes. Concretely, noisy function observations $f^*(\mathbf{x})$ are obtained by executing a quantum computer. The gradient $\nabla f^*$ is then predicted by conditioning a Gaussian process on the noisy observations (the function values $f^*$ and the gradient $\nabla f^*$ jointly follow a Gaussian process).

The quantum energy function is minimized by simulating gradient descent with gradient estimates from the GP. The authors empirically show that the proposed gradient estimation method is superior to parameter shift rules and a few other gradient estimation methods in quantum computing.
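As a concrete illustration of the construction summarized above, here is a minimal NumPy sketch of GP gradient prediction. It uses a standard RBF kernel and synthetic data (the paper itself uses a physics-informed VQE kernel), so this is a conceptual sketch rather than the authors' method; all names are illustrative.

```python
import numpy as np

def rbf(A, B, ell=1.0):
    """Squared-exponential kernel matrix between point sets A (n x d), B (m x d)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

def gp_gradient_mean(X, y, x_star, ell=1.0, noise=0.1):
    """Posterior mean of grad f at x_star (shape (d,)), conditioned on noisy
    observations y (shape (n,)) at X (shape (n, d)). A GP and its derivative
    are jointly Gaussian, so the gradient prediction is obtained by
    differentiating the kernel with respect to x_star."""
    K = rbf(X, X, ell) + noise**2 * np.eye(len(X))
    alpha = np.linalg.solve(K, y)                 # (K + sigma^2 I)^{-1} y
    k_star = rbf(x_star[None, :], X, ell)[0]      # shape (n,)
    # For the RBF kernel: d k(x*, x_i) / d x* = -(x* - x_i) / ell^2 * k(x*, x_i)
    dk = -(x_star[None, :] - X) / ell**2 * k_star[:, None]   # (n, d)
    return dk.T @ alpha                           # d-dimensional gradient

# Tiny usage example: a noisy 1-D sinusoid whose true gradient at 0 is cos(0)=1.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)
print(gp_gradient_mean(X, y, np.array([0.0])))   # approximately 1
```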

Questions for the Authors

None at this moment.

Claims and Evidence

The curves in most figures in this paper are heavily overlapped, and thus it is hard to tell if the proposed method is clearly better. So it would be helpful to increase the number of random seeds/independent runs. On this note, do the shaded regions in the figures represent standard deviations or standard errors?

Methods and Evaluation Criteria

Yes, they make sense. They compare with the conventional parameter shift rule, and a few other baselines based on GPs. The evaluation metrics are the changes in energy and fidelities, which seem to be standard in the literature.

Theoretical Claims

I briefly skimmed through the proofs in the appendix. I believe the proofs are correct, but I did not check them line by line since they are mostly tedious calculations. The proofs essentially calculate the posterior mean and variance of the gradient GP. This is quite straightforward, as all we need to do is (a) calculate the derivatives of the mean function and the kernel function and (b) plug them into the GP posterior prediction rules. So I am not sure the results are significant enough to be termed theorems, though it does appear to take some (somewhat tedious) effort to calculate the derivatives.

Edited: The point of my original comment is that the proofs are basically several pages of simple algebra that calculate the GP posterior mean/variance. Hence, Theorem 3.1 and Theorem 3.2 should be called lemmas or propositions.

Experimental Design and Analysis

Yes, I read through the experiment section. The designs make sense.

Supplementary Material

Yes, I skimmed through the proofs in Appendix C.

Relation to Prior Literature

Unable to comment on this because I am not familiar with quantum computing.

Missing Important References

No.

Other Strengths and Weaknesses

  1. The ideas proposed in this paper are not novel. The idea of simulating gradient descent with gradient estimates from GPs has already been proposed (e.g., Muller et al., 2021).

  2. Also, Muller et al. (2021) proposed an acquisition function that selects queries adaptively for gradient estimation by minimizing posterior uncertainty, which I think should work better than the heuristic method in Section 4.1. Nonetheless, it would be interesting to see a comparison between them.

Muller, S., von Rohr, A., & Trimpe, S. (2021). Local policy search with Bayesian optimization. Advances in Neural Information Processing Systems, 34, 20708-20720.

Other Comments or Suggestions

I tend to reject this paper because the proposed method is not novel and is quite straightforward from the machine learning perspective. This paper might, however, qualify as a novel application of existing machine learning methods to quantum computing. I am not the best person to comment on how interesting this paper is from the quantum computing perspective, which I hope other reviewers can cover. But even if this is an interesting application in quantum computing, it may be better suited to a quantum computing venue.

Author Response

Unfortunately, we found that Reviewer gnv9 overlooked most of our main contributions, which were acknowledged by the other reviewers. Reviewer gnv9 gave the following main criticisms for recommending a clear reject:

  1. The proposed method is not novel and quite straightforward from the machine learning perspective.
  2. The proof of the theorems is straightforward.
  3. The acquisition function proposed by Mueller et al. (2021) [1] should work better than the heuristic method in Section 4.1.

We respectfully but strongly disagree with the reviewer on all those points, as detailed below.

  1. The reviewer seems to believe that our main contribution is to propose SGD with the gradient estimated by a GP. This is incorrect. Derivative prediction with GPs was already discussed in Rasmussen & Williams (2006), and even [1] did not claim SGD with GP as a novelty. What our paper proposes, as novel methods, are Bayesian PSR and GradCoRe, which are GP-based methods incorporating strong prior knowledge based on the physics of the VQE. We argue that this kind of contribution, i.e., incorporating domain knowledge into existing methods, is standard in the ML community. More specifically, our main contribution is to enhance the naive SGD with the GP gradient estimator, used in [1], by using PSR with the help of the VQE kernel [2]. “Using PSR” means that we observe an even number of equidistant points (symmetric with respect to the target location), which was proven to be optimal in our theory. We argue that this point is sufficiently clear to most ML readers, given that all other reviewers acknowledged our main contributions correctly. Summarizing our main contributions from the ML perspective: we improve GP with PSR, which can be justified only because of the specific functional form (14) of the VQE objective. We theoretically show that our Bayesian PSR is a generalization of PSR (Theorem 3.1), as well as the optimality of the equidistant observations (Theorem 3.2). Based on our theory, we propose GradCoRe as a theoretically grounded method, in which the observation points are fixed to the optimal points and the measurement shots are optimized based on the posterior uncertainty.

  2. We find it unfair and inappropriate that the reviewer claims the proof is straightforward without providing any reference where a straightforward proof is given, or giving a straightforward proof themselves. We would be happy to learn about the straightforward alternative proof that the reviewer has in mind, considering that our proof required pages of derivation with techniques from linear algebra and Fourier analysis.

  3. We are sure that a naive application of GIBO [1] will not work for VQEs, for the following reasons. PSR was proposed as a robust alternative to naive finite-difference estimation, which does not work for VQEs with significant noise. PSR leverages the strong prior information that the VQE objective has the trigonometric functional form (14). This allows us to take the maximum span, i.e., the equidistant observations, for gradient estimation. In contrast, GIBO, without incorporating physics knowledge, is expected to choose points similar to finite-difference estimation with a small span (see Fig. 2 in [1]), and it even limits the exploration range to the neighborhood of the current optimal point. Therefore, we argue that GIBO will not give a gradient estimate comparable in accuracy to PSR. One could argue that GIBO with the VQE kernel (and without the exploration restriction) might in principle choose the optimal equidistant points after several steps. However, even in this case, with the acquisition function of GIBO allowing for evaluating only a single new observation, it is very unlikely that GIBO achieves this optimal behavior without a clever physics-informed search algorithm. In contrast, our GradCoRe cleverly chooses the theoretically optimal points based on our Theorem 3.2, and adjusts the number of measurement shots based on the posterior uncertainty prediction. We strongly disagree with the reviewer calling our method “heuristic”, because we theoretically prove the optimality of its choice of observed points.

The reviewer also criticizes our presentation of experimental results.

The curves in most figures in this paper are heavily overlapped, and thus, it is hard to tell if the proposed method is clearly better.

We agree with the reviewer on this point, and refer the reviewer to our rebuttal to Reviewer gVNx for our answer. In short, we increased the number of trials, as suggested, and applied a statistical test, which proves statistically significant improvement by GradCoRe against the baselines. We thank the reviewer for the suggestion.

Despite the several points of disagreement detailed above, we thank the reviewer for investing time in reviewing our paper.

References

  • [1] Mueller et al., NeurIPS (2021)
  • [2] Nicoli et al., NeurIPS (2023)

Reviewer Comment

I thank the authors for the response.

What our paper proposes, as novel methods, are Bayesian PSR and GradCoRe, which are GP-based methods incorporating strong prior knowledge based on the physics of the VQE. We argue that this kind of contribution, i.e., incorporating domain knowledge into existing methods, is standard in the ML community. More specifically, our main contribution is to enhance the naive SGD with the GP gradient estimator, used in [1], by using PSR with the help of the VQE kernel [2].

The prior knowledge is entirely encoded by the VQE kernel. But the VQE kernel itself is not a contribution of this paper; it was proposed by Nicoli et al. (2023). In short, a large fraction of this paper is applying the VQE kernel to the method of Muller et al. (2021), which I still don't think is a significant contribution.

We find it unfair and inappropriate that the reviewer claims the proof is straightforward without providing any reference where a straightforward proof is given, or giving a straightforward proof themselves. We would be happy to learn about the straightforward alternative proof that the reviewer has in mind, considering that our proof required pages of derivation with techniques from linear algebra and Fourier analysis.

I understand that calculating these derivatives indeed requires a lot of effort, and the authors spent pages on them in the appendix. However, the number of pages does not necessarily correlate with technical depth. Theorem 3.1 and Theorem 3.2 basically calculate the derivatives of the GP. In particular, the evaluation location $X$ is given and the VQE kernel Eq. (15) has a closed form, so the only thing left is plugging them into the gradient GP prediction rule. Now, analytically inverting the kernel matrix may require some effort, but I am sure that the particular form of the kernel and the grid evaluation structure can help here. I understand the algebra is messy, but conceptually this is straightforward. I acknowledge that the calculation involves some linear algebra and trigonometric identities. But this is to be expected.

One could argue that GIBO with the VQE kernel (and without the exploration restriction) might in principle choose the optimal equidistant points after several steps. However, even in this case, with the acquisition function of GIBO allowing for evaluating only a single new observation, it is very unlikely that GIBO achieves this optimal behavior without a clever physics-informed search algorithm. In contrast, our GradCoRe cleverly chooses the theoretically optimal points based on our Theorem 3.2, and adjusts the number of measurement shots based on the posterior uncertainty prediction. We strongly disagree with the reviewer calling our method “heuristic”, because we theoretically prove the optimality of its choice of observed points.

  1. I don't think GIBO only allows evaluating a single new observation at a time. It is very straightforward to optimize the acquisition function over a batch of points to minimize the posterior uncertainty.
  2. I agree that GIBO does not choose the number of evaluation points.
  3. I am not fully convinced that the proposed GradCoRe has any principled advantages against GIBO. I am not sure whether equidistant points are inherently better than algorithmically chosen points by optimizing acquisition functions (not necessarily GIBO), i.e., whether the grid restriction is necessary. But this might involve some quantum computing background, which I hope other reviewers with relevant expertise can provide comments on.
Author Comment

1. Contributions

a large fraction of this paper is applying the VQE kernel to the method of Muller et al. (2021)

This is a misunderstanding. Our proposed GradCoRe is not an ML model but an optimization strategy. From the ML perspective, one could set SGD and GIBO (Muller et al., 2021) with the VQE kernel (SGD-VQEK and GIBO-VQEK) as naive baseline methods. Our novelty is to enhance SGD-/GIBO-VQEK with knowledge from the physics of VQE, i.e., PSR with observations at $x \pm \alpha$ for arbitrary $\alpha$. PSR allows us not to rely on finite-difference gradient estimation with a small span $\alpha \ll 1$, which suffers from large noise in VQE. Our contributions are 1) integrating this prior knowledge into SGD-/GIBO-VQEK by choosing the points that PSR observes, 2) theoretically relating PSR and GP prediction, 3) theoretically analyzing the optimality of $\alpha = \pi/2$, and 4) using the GP's uncertainty prediction to minimize quantum computing costs. Combining all four contributions, we propose GradCoRe as a novel physics-informed optimization strategy for VQE. One could see our contributions as similar to those in many Bayesian optimization papers, where the novelty is not in the ML model itself but in the optimization strategy.
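For concreteness, here is a minimal sketch of the classical PSR under the stated assumption that the objective is sinusoidal in each parameter (the functional form (14) of the paper); the $2\sin(\alpha)$ denominator makes explicit why the maximum span $\alpha = \pi/2$ minimizes noise amplification. The function name is illustrative, not from the paper.

```python
import numpy as np

def psr_gradient(f, x, alpha=np.pi / 2):
    """Parameter shift rule: exact gradient for any objective that is
    sinusoidal in each coordinate, f = a + b*cos(x_i) + c*sin(x_i).
    For such f, df/dx_i = (f(x + a*e_i) - f(x - a*e_i)) / (2*sin(a))
    holds exactly for any shift a; a = pi/2 maximizes the denominator
    and therefore minimizes the amplification of measurement noise."""
    grad = np.empty_like(x)
    for i in range(len(x)):
        shift = np.zeros_like(x)
        shift[i] = alpha
        grad[i] = (f(x + shift) - f(x - shift)) / (2.0 * np.sin(alpha))
    return grad
```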

2. Theory

I am sure that the particular form of the kernel and the grid evaluation structure can help here. I understand the algebra is messy, but conceptually this is straightforward.

We see this claim as highly unfair and inappropriate. Our proof indeed relies on "the particular form of the kernel and the grid evaluation structure", but how to leverage them to prove the theorems is far from trivial. Would the reviewer argue that any theorem relying solely on these properties is straightforward to prove? How would Reviewer gnv9 assess the triviality of results such as Theorem 4.1 in Fonseca & Petronilho, "Explicit inverse of a tridiagonal k-Toeplitz matrix," Numer. Math. (2005), and some propositions, e.g., Propositions 3.1 and 3.2, cited therein? These works focus on deriving explicit expressions for the entries of the inverse $J^{-1}$ of a matrix ($J^{-1}$ has an even simpler closed form than the GP solutions we tackled), and the proof relies on "the periodicity and the tridiagonal Toeplitz structure of $J$". Would Reviewer gnv9 claim that mathematical journals such as Numer. Math., Linear Algebra Appl., etc., have inadvertently published trivial proofs?

More generally, if reviewers are allowed to criticize theory in this way, one can criticize numerous generalization-bound theorems established in ML by claiming that

“I am sure that the positive-definiteness of kernels and the convexity of loss functions can help here. I understand the derivation is messy, but conceptually this is straightforward. I acknowledge that the calculation involves some linear algebra and probability bounds.”

Is this subjective (and we think inappropriate) critique, without elaborating how trivial it is to leverage the positive-definiteness and convexity, considered a valid justification for reviewers to reject papers in the ML community? If so, many highly cited theoretical papers would have been rejected. We would respectfully ask Reviewer gnv9 to reconsider the fairness of their claim. We also respectfully request the other reviewers, AC, SAC, and PCs to assess the fairness and validity of Reviewer gnv9’s critique.

3. Advantages of GradCoRe

It is very straightforward to optimize the acquisition function over a batch of points to minimize the posterior uncertainty.

This claim is incorrect. As discussed in Sec. 5 of P. Frazier, "A tutorial on Bayesian optimization" (see the remark "Parallel EI (14) and other parallel acquisition functions are more challenging to optimize than their original sequential versions"), acquisition functions are non-convex with respect to batch points, and one has to rely on greedy search. We cannot expect straightforward implementations of GIBO-VQEK to find the optimal observation points that GradCoRe uses (based on Theorem 3.2), even in our smallest Ising (5,3) setting, where 80 points in a 40-dimensional space need to be optimized.
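To illustrate the greedy fallback being referred to, here is a hypothetical sketch (our own RBF-kernel illustration, not GIBO's actual implementation) of sequential batch selection that adds one point at a time to minimize the GP posterior variance at a target point; jointly optimizing all batch points at once is the non-convex problem discussed above.

```python
import numpy as np

def rbf(A, B, ell=1.0):
    # Squared-exponential kernel matrix between point sets A (n x d), B (m x d).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

def greedy_uncertainty_batch(X_obs, candidates, x_star, noise=0.1, batch_size=2):
    """Greedy stand-in for batch acquisition optimization: repeatedly add the
    candidate that most reduces the GP posterior variance at x_star. The
    posterior variance depends only on input locations, not observed values,
    so no new measurements are needed during selection -- but the greedy path
    need not reach the jointly optimal batch."""
    X, chosen = X_obs.copy(), []
    for _ in range(batch_size):
        best_c, best_var = None, np.inf
        for c in candidates:                          # each c has shape (d,)
            Xc = np.vstack([X, c[None, :]])
            K = rbf(Xc, Xc) + noise**2 * np.eye(len(Xc))
            ks = rbf(x_star[None, :], Xc)             # shape (1, n+1)
            prior = rbf(x_star[None, :], x_star[None, :])[0, 0]
            var = prior - (ks @ np.linalg.solve(K, ks.T))[0, 0]
            if var < best_var:
                best_c, best_var = c, var
        chosen.append(best_c)
        X = np.vstack([X, best_c[None, :]])           # condition on the pick
    return np.stack(chosen)
```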

I am not fully convinced that the proposed GradCoRe has any principled advantages against GIBO. I am not sure whether equidistant points are inherently better than algorithmically chosen points by optimizing acquisition functions (not necessarily GIBO)

We suspect that this comment stems from Reviewer gnv9's incorrect belief in the straightforwardness of batch optimization, which we rebutted above. If it were really straightforward, GIBO-VQEK should indeed find the points with maximum span $\alpha = \pi/2$, as our Theorem 3.2 states. GradCoRe is superior to GIBO-VQEK because it does not require the highly challenging batch minimization of posterior uncertainty, and it achieves the same performance as GIBO-VQEK equipped with a highly elaborate oracle non-convex batch optimizer, which does not exist.

Review (Rating: 4)

The authors introduce a Bayesian-optimization version of the parameter shift rule method used to optimize the hardware parameters of quantum eigensolvers. Specifically, the problem of optimizing the parameters reduces to finding the ground-state energy of a quantum Hamiltonian defined by the gates of the quantum computer. The authors' main theoretical contribution is to show that their algorithm gives the correct mean for the gradient of the objective, and to give a bound on the variance. The authors also provide experiments which show that their Bayesian-optimization parameter shift rule algorithm for quantum eigensolvers has lower variance than previous parameter shift rule algorithms.

Update after rebuttal

Thank you for the helpful clarifications.

The main novelty and contributions of this work seem to be in the area of quantum computing. Admittedly, I am not very familiar with this area, but to the best of my knowledge there seems to be a clear improvement in theoretical guarantees and empirical results. That being said, I will defer to the other reviewers who are more familiar with quantum mechanics applications for the final evaluation.

Questions for the Authors

One weakness of the paper is that they do not consider imperfections in quantum hardware. This would be good to discuss in more detail, perhaps in the conclusion.

Claims and Evidence

Yes: The authors' main theoretical contribution is to show that their algorithm gives the correct mean for the gradient of the objective, and to give a bound on the variance. The authors also provide experiments which show that their Bayesian-optimization parameter shift rule algorithm for quantum eigensolvers has lower variance than previous parameter shift rule algorithms.

Methods and Evaluation Criteria

Yes

Theoretical Claims

I did not check the proofs in the appendix.

Experimental Design and Analysis

The authors also provide experiments which show that their Bayesian optimization parameter shift rule algorithm for quantum eigensolvers has lower variance than previous parameter shift rule algorithms. The experiments seem to be set up well.

Supplementary Material

I did not check the proofs in the appendix.

Relation to Prior Literature

The comparison to previous works is good.

Missing Important References

N/A

Other Strengths and Weaknesses

N/A

Other Comments or Suggestions

N/A

Author Response

We thank the reviewer for their careful and positive evaluation of our manuscript.

One weakness of the paper is that they do not consider imperfections in quantum hardware. This would be good to discuss in more detail, perhaps in the conclusion.

Following the referee's suggestion, we will include a discussion in the conclusion about the imperfections of quantum hardware and the noise they induce. In fact, recent work [1] investigated how ML approaches can mitigate hardware noise. Specifically, Ref. [1] showed that the GP with the VQE kernel is capable of handling hardware noise.

References

Review (Rating: 3)

The paper proposes Bayesian parameter shift rules to estimate the gradients of variational quantum circuits in the presence of measurement shot noise, using a minimal number of observations. The optimality of the proposed method is theoretically proven under mild conditions.

With the Bayesian parameter shift rule, an SGD-based optimization framework making use of adaptive observation is proposed to find optimal parameters for variational quantum circuits.

The new framework is validated on benchmarks consisting of ground-state energy estimation for 5-qubit Heisenberg and Ising models. It outperforms conventional SGD methods when the number of shots is fixed.

Questions for the Authors

See weaknesses.

Claims and Evidence

Yes.

Methods and Evaluation Criteria

Yes.

Theoretical Claims

No.

Experimental Design and Analysis

Yes.

Supplementary Material

No.

Relation to Prior Literature

It provides a new framework to find optimal parameters of variational quantum circuits, which may save costs for performing a wide range of quantum machine learning tasks.

Missing Important References

Not to my knowledge.

Other Strengths and Weaknesses

Pros:

  1. The paper establishes rigorous formulations of the gradient estimator with Gaussian process prediction under measurement noise. This is very helpful for evaluating gradients in applications.

  2. The experiments include many baseline methods and have demonstrated the advantage of the proposed method.

Cons:

  1. Figure 3 and Figure 4 seem messy to read, and it is hard to fully understand the information in the figures.

  2. The experiments lack a few cases with larger sizes (qubit numbers and layers) to demonstrate the generalizability of the proposed framework.

Other Comments or Suggestions

No.

Author Response

We thank the reviewer for their valuable comments.

Figure 3 and Figure 4 seem messy to read, and it is hard to fully understand the information in the figures.

In Figs. 3 and 4 (as well as the new results at this anonymized link), the medians and the 25th-/75th-percentiles are shown as solid curves and shaded regions, respectively. We agree with the reviewer that the shaded regions overlap, and it might not be easy to judge which methods are better in this plot. For this reason, we also show the “Trial Density” plot to the right of each figure, where the differences are easier to identify. More specifically, we observe, in both energy and fidelity plots, that GradCoRe has a high-density area (with a higher peak) in the low energy/fidelity region compared to the baselines. The overlapping distributions are due to the strong dependence of the achieved energy/fidelity on the random initialization: if the initial point is close to the ground state, the problem is easy for all methods. Therefore, good methods should be identified as the ones that achieve low energy/fidelity even for bad initializations, which appears as higher peaks in the lower energy/fidelity regions of the “Trial Density” plot. Additionally, to empirically establish the superiority of our GradCoRe over the previous SOTA methods, we performed the Wilcoxon signed-rank test over 100 trials. In the table below, we summarize the p-values for the statistical significance of GradCoRe outperforming the baselines NFTGP, EMICoRe, and SubsCoRe. We observe that p < 0.05 in all cases with large margins, rejecting the null hypothesis that there is no difference in performance between GradCoRe and the baselines. This shows that GradCoRe outperforms the baselines in a statistically significant way. We will include these statistical test results in the paper.

| ΔEnergy | NFTGP | EMICoRe | SubsCoRe |
|---|---|---|---|
| Ising (5,3) | 9.68e-05 | 4.23e-09 | 3.84e-05 |
| Ising (7,5) | 1.16e-07 | 4.02e-10 | 7.49e-10 |
| Heisenberg (5,3) | 8.57e-11 | 8.19e-11 | 2.29e-12 |
| Heisenberg (7,5) | 2.37e-16 | 5.84e-16 | 2.82e-17 |

| ΔFidelity | NFTGP | EMICoRe | SubsCoRe |
|---|---|---|---|
| Ising (5,3) | 1.80e-02 | 1.42e-07 | 5.19e-04 |
| Ising (7,5) | 1.54e-05 | 4.06e-09 | 2.89e-08 |
| Heisenberg (5,3) | 5.30e-06 | 2.78e-07 | 3.16e-10 |
| Heisenberg (7,5) | 1.31e-02 | 2.69e-02 | 8.20e-03 |
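For reference, a paired one-sided test of this kind can be run with SciPy as sketched below; the arrays are synthetic stand-ins, since the per-trial results are not reproduced in this thread.

```python
import numpy as np
from scipy.stats import wilcoxon

# Synthetic stand-ins for per-trial final energy errors of two methods
# over the same 100 random initializations (paired samples).
rng = np.random.default_rng(42)
delta_e_gradcore = np.abs(rng.normal(0.05, 0.05, size=100))
delta_e_baseline = np.abs(rng.normal(0.10, 0.05, size=100))

# One-sided alternative: the first method's error is smaller than the second's.
stat, p = wilcoxon(delta_e_gradcore, delta_e_baseline, alternative="less")
print(f"p = {p:.2e}")  # p < 0.05 rejects the "no difference" null hypothesis
```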

The experiments lack a few cases with larger sizes (qubit numbers and layers) to demonstrate the generalizability of the proposed framework.

We thank the reviewer for the suggestion. To further validate our method and its generalizability, we ran additional experiments for the Ising and Heisenberg (see [1]) Hamiltonians with different numbers of qubits and layers. The results on Ising (Q=5, L=3), Ising (7, 5), Heisenberg (5, 3), and Heisenberg (7, 5) can be seen at this anonymized link. We include Ising (5, 3), the setting of the original experiment in the submission, because we have now increased the number of trials from 50 to 100 (for all settings). We observe that GradCoRe outperforms the baselines in all cases (see also the statistical test results above), and that the performance gain of GradCoRe is even more substantial in the more complex (Heisenberg) and larger-scale (7, 5) settings. We will include those results in the paper.

References

Reviewer Comment

I thank the authors for the response. My score remains the same.

Review (Rating: 3)

This paper proposes a method, Bayesian PSR, for optimizing variational quantum eigensolvers (VQEs) by introducing a Bayesian variant of the parameter shift rule (PSR). The method integrates Gaussian processes (GPs) to estimate gradients of the VQE objective with uncertainty information. The authors introduce the concept of a Gradient Confidence Region (GradCoRe), which helps minimize the number of measurements needed for gradient estimation during optimization.

Questions for the Authors

NA

Claims and Evidence

The authors present experiments to show that Bayesian PSR and GradCoRe outperform standard optimization methods. However, the manuscript does not provide clear evidence or details regarding how Bayesian PSR performs on larger or more complex quantum systems. A broader range of experiments could strengthen the claim of general applicability.

Methods and Evaluation Criteria

The proposed methods and evaluation criteria seem appropriate for the problem at hand. The evaluation metrics make sense.

Theoretical Claims

The paper provides a theoretical foundation for Bayesian PSR. The theoretical claims appear sound based on the presented proofs.

Experimental Design and Analysis

  1. The experiments use relatively small model sizes, and further validation with larger systems (in terms of qubits and layers) would be helpful to demonstrate the scalability of the approach.

  2. In Figure 4, it is not straightforward to understand how the proposed method outperforms the SOTA model.

Supplementary Material

The manuscript does not provide supplementary material, but there is an appendix that includes additional details about the methods and experiments.

Relation to Prior Literature

The key contributions of this paper relate well to the broader field of quantum machine learning and optimization. The paper builds on previous works in gradient-based optimization methods for VQEs, particularly those involving PSRs.

Missing Important References

NA

Other Strengths and Weaknesses

Strengths:

  • The proposed Bayesian PSR seems interesting.

Weaknesses:

  • The manuscript could benefit from more extensive experimentation on larger quantum systems to validate the scalability of the approach.
  • The writing is somewhat rushed in places, and there is a lack of clarity in the explanation of certain experimental setups and the relationship between different optimization strategies.

Other Comments or Suggestions

NA

Author Response

We thank the reviewer for their thoughtful questions and comments.

The authors present experiments to show that Bayesian PSR and GradCoRe outperform standard optimization methods. However, the manuscript does not provide clear evidence or details regarding how Bayesian PSR performs on larger or more complex quantum systems. A broader range of experiments could strengthen the claim of general applicability.

Thank you for the suggestion. We ran additional experiments with the Ising and Heisenberg (see [1]) Hamiltonians with different numbers of qubits and layers. The results on Ising (Q=5, L=3), Ising (7, 5), Heisenberg (5, 3), and Heisenberg (7, 5) can be seen at this anonymized link. We include Ising (5, 3), the setting of the original experiment in the submission, because we have now increased the number of trials from 50 to 100 (for all settings). We observe that GradCoRe outperforms the baselines in all cases (see also the statistical test results below), and that the performance gain of GradCoRe is even more substantial in the more complex (Heisenberg) and larger-scale (7, 5) settings. We will include those results in the paper.

In Figure 4, it is not straightforward to understand how the proposed method outperforms the SOTA model.

In Fig. 4 (as well as in the new results at the anonymized link), the medians and the 25th-/75th-percentiles are shown as solid curves and shaded regions, respectively. We agree with the reviewer that the shaded regions overlap, and it might not be easy to judge which methods are better in this plot. For this reason, we also show the “Trial Density” plot to the right of each figure, where the differences are easier to identify. More specifically, we observe, in both energy and fidelity plots, that GradCoRe has a high-density area (with a higher peak) in the low energy/fidelity region compared to the baselines. The overlapping distributions are due to the strong dependence of the achieved energy/fidelity on the random initialization: if the initial point is close to the ground state, the problem is easy for all methods. Therefore, good methods should be identified as the ones that achieve low energy/fidelity even for bad initializations, which appears as higher peaks in the lower energy/fidelity regions of the “Trial Density” plot. Additionally, to empirically establish the superiority of our GradCoRe over the previous SOTA methods, we performed the Wilcoxon signed-rank test over 100 trials. In the table below, we summarize the p-values for the statistical significance of GradCoRe outperforming the baselines NFTGP, EMICoRe, and SubsCoRe. We observe that p < 0.05 in all cases with large margins, rejecting the null hypothesis that there is no difference in performance between GradCoRe and the baselines. This shows that GradCoRe outperforms the baselines in a statistically significant way. We will include these statistical test results in the paper.

| ΔEnergy | NFTGP | EMICoRe | SubsCoRe |
|---|---|---|---|
| Ising (5,3) | 9.68e-05 | 4.23e-09 | 3.84e-05 |
| Ising (7,5) | 1.16e-07 | 4.02e-10 | 7.49e-10 |
| Heisenberg (5,3) | 8.57e-11 | 8.19e-11 | 2.29e-12 |
| Heisenberg (7,5) | 2.37e-16 | 5.84e-16 | 2.82e-17 |

| ΔFidelity | NFTGP | EMICoRe | SubsCoRe |
|---|---|---|---|
| Ising (5,3) | 1.80e-02 | 1.42e-07 | 5.19e-04 |
| Ising (7,5) | 1.54e-05 | 4.06e-09 | 2.89e-08 |
| Heisenberg (5,3) | 5.30e-06 | 2.78e-07 | 3.16e-10 |
| Heisenberg (7,5) | 1.31e-02 | 2.69e-02 | 8.20e-03 |

The writing is somewhat rushed in places, and there is a lack of clarity in the explanation of certain experimental setups and the relationship between different optimization strategies.

We have carefully proofread the paper again after the submission. Nevertheless, we would greatly appreciate it if the reviewer could point to the sections where such writing issues occur, so that we can focus on those parts and enhance the overall clarity of the paper.

References

Final Decision

Consider the zeroth-order optimization problem (10) that arises in the variational quantum eigensolver. This paper proposes the Bayesian parameter shift rule (PSR), which uses a Gaussian process approach to estimate the gradient of the objective function and enables solving the optimization problem via stochastic gradient descent (SGD). In addition, leveraging the uncertainty information provided by the Bayesian PSR, the paper introduces the notion of a “gradient confidence region” (GradCoRe), aiming to further reduce the number of measurement shots required for gradient estimation. The experimental results demonstrate the superiority of the proposed method in comparison to existing ones.

The work appears to be solid, and its superiority is clearly demonstrated. Nevertheless, as indicated by Reviewer gnv9, the Bayesian PSR is a conceptually straightforward application of existing Gaussian process theory, though the derivation may be non-trivial, while the notion of GradCoRe is natural and heuristic.

My evaluation is that this paper lies on the borderline of acceptance, which aligns with Reviewer WrpS’s opinion in the Reviewer–AC discussion. Moreover, Reviewer 4QQ8, despite giving the highest score, admitted that the score was intended to acknowledge the contribution to quantum computation, but noted that they are not an expert in this area and expressed concern regarding the paper’s significance to the machine learning community. Given the competitiveness of ICML, I suggest rejecting this paper.

The authors are encouraged to highlight the challenges and novelties in developing the method in their revision, or to consider submitting to a publication venue in quantum information.