Randomized Benchmarking of Local Zeroth-Order Optimizers for Variational Quantum Systems
Abstract
Reviews and Discussion
This paper benchmarks a variety of SPSA (and SPSA-esque) optimization algorithms on a suite of noiseless quantum machine learning tasks. The authors find that vanilla SPSA can work very well, although hyperparameter tuning is important (for SPSA as well as all the other optimizers).
Strengths
- QML papers often go about selecting optimizers blindly, and it is always important to have a more rigorous and empirically informed approach to the problem
- The paper is generally well written and understandable
Weaknesses
- Two sentences start with “beyond” in the second paragraph (a minor point on readability)
- Figure 1 should have the exact solutions as a horizontal line. This makes it easier to understand the scale of the error.
- There are a few citations this paper would benefit from using [1-3], some of which include randomised Hamiltonians as a benchmark.
- The core of my issue with this paper is that it doesn’t do enough to build on previous work. The results are not new: SPSA as a good non-parameter-shift gradient optimizer and the importance of hyperparameter tuning are both mentioned in [2]. Although more verification and more problem evaluation is always good, the results and methods are generally similar to a growing lineage of QML optimization benchmarks.
- SPSA would benefit from more mathematical background and explanation when first presented.
- This paper does not seem geared towards, nor necessarily of great interest to, the ICLR community (which is largely dominated by classical ML researchers): a decent knowledge of QML is necessary to appreciate the paper, and it does not make much of an attempt to familiarize readers with the field. This is not a slight against the paper a priori, just a specific concern about this venue.
- Given the known dependence of the optimization difficulty of QML systems on circuit width and depth (even for non-gradient-based optimizers [4]), this paper would benefit from more consideration and analysis of these factors (the circuit sizes and widths seem to have been picked somewhat arbitrarily).
References:
[1] Bonet-Monroig, X., Wang, H., Vermetten, D., Senjean, B., Moussa, C., Bäck, T., ... & O'Brien, T. E. (2023). Performance comparison of optimization methods on variational quantum algorithms. Physical Review A, 107(3), 032407.
[2] Lockwood, O. (2022). An empirical review of optimization techniques for quantum variational circuits. arXiv preprint arXiv:2202.01389.
[3] Lavrijsen, W., Tudor, A., Müller, J., Iancu, C., & De Jong, W. (2020, October). Classical optimizers for noisy intermediate-scale quantum devices. In 2020 IEEE international conference on quantum computing and engineering (QCE) (pp. 267-277). IEEE.
[4] Arrasmith, A., Cerezo, M., Czarnik, P., Cincio, L., & Coles, P. J. (2021). Effect of barren plateaus on gradient-free optimization. Quantum, 5, 558.
Questions
- Would the code be available if the publication is accepted?
We agree with all of this reviewer’s criticisms. The easily actionable things (readability issues, explaining SPSA in more detail, adding exact solutions to the plots, extra citations, making it more approachable to the ICLR audience) we are happy to add to this paper for future versions.
Although it’s not necessarily possible to make larger fundamental changes for this submission, we would especially appreciate extra feedback from this reviewer about how they would suggest we further distinguish this sort of study from prior work in future submissions. We attempted to do this by shifting the focus to using more randomness in problem selection and studying only a narrow type of optimizer, but we agree with the criticism that this isn’t necessarily enough to make it significantly novel. Other reviewers also mentioned looking at scalability as we vary the number of parameters and number of qubits, so would something along these lines be a novel enough addition? Though, as also briefly touched on in the paper, adding extra dimensions like this would make a fair comparison more difficult, as we’d have to hyperparameter tune significantly more times. So it would probably necessitate some adaptive strategy for choosing hyperparameters, and readers would have to accept that such a strategy is good enough for the purposes of a fair comparison.
And yes, code will be provided if the publication is accepted. We are also happy to provide a ZIP of it for additional review beforehand.
I think scalability could make for a very interesting addition. Many traditional black-box optimizers fail as you increase dimensionality, and this non-trivial scaling is often lost on most QML papers, so providing more empirics for this specific area could definitely be helpful. Although BP is a known problem for both gradient-based and gradient-free optimizers, scaling in non-BP regimes or with BP-avoidance techniques could be an interesting area.
Given the authors’ feedback and the changes that are possible, I have updated my score to reflect this (3 -> 5).
The paper focuses on benchmarking local zeroth-order optimizers for variational quantum algorithms. The motivation behind this paper is to provide benchmarks and insights for understanding the performance of different optimizers in variational quantum learning problems. In addition to random parameter initialization, the authors also randomize the parameterized circuit/ansatz and objective to create a more diverse and realistic benchmark. By conducting experiments based on randomized tasks, the study provides findings and insights into the behavior of various optimizers.
Strengths
The paper investigates an important issue in quantum machine learning, exploring the performance of classical optimizers for variational quantum systems. It presents a number of numerical experiments to support its findings and observations.
Weaknesses
- Lacks theoretical analysis and solid results to support the performance claims made for different optimizers.
- The experiments conducted in this study are limited in scope and scale. The small scale of the experiments (few qubits) limits the generalizability of the results and makes it challenging to draw robust conclusions.
- Lacks clarity in presentation. It is not easy to follow and understand the experimental setups (e.g., algorithm, number of qubits, which kind of simulation).
- The tasks considered in this paper are not general enough.
- Does not provide significant conceptual or technical contributions to quantum machine learning.
Questions
- Considering the barren plateaus issue in many variational quantum algorithms, do the main results in this paper still hold?
- What is the theory to guarantee that the randomized quantum circuit is random?
- What are the parameterized circuits used in this paper?
- What is the scalable analysis for the problem that the paper focuses on?
Details of Ethics Concerns
NA
With respect to barren plateaus, this is a nuanced topic that is a little difficult to connect to the empirical results we are currently able to simulate. But the short answer is that once you reach the regime where barren plateaus become a problem, it’s unlikely that any of these optimizers will work at all. This is because the barren plateau literature essentially implies that, in many VQA situations, the loss landscape’s variance is inversely proportional to the dimensionality of the Hilbert space (which is exponential in the number of qubits). So as you increase the number of qubits, you need to sample exponentially more to overcome the noise in your system. Avoiding this scaling would require either a good initial guess or a fundamentally different optimization procedure that changes the type of loss landscape. So while it’s certainly possible that scaling up would affect these results in other ways, with respect to barren plateaus specifically, I don’t believe it would matter too much, because once we get into the regime where they become a problem for these types of optimizers, all of them are likely to fail dramatically.
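To make the scaling argument concrete, a rough back-of-the-envelope version (constants and ansatz-dependent factors omitted, and assuming the standard barren-plateau results apply to the circuits in question) is:

```latex
% Cost/gradient variance decays with the Hilbert-space dimension, so the number of
% samples needed to resolve a gradient signal grows exponentially with the qubit count n.
\operatorname{Var}\bigl[\partial_{\theta} C(\boldsymbol{\theta})\bigr]
  \in O\!\left(\frac{1}{\dim \mathcal{H}}\right) = O\!\left(2^{-n}\right)
\quad\Longrightarrow\quad
N_{\text{samples}} \in \Omega\!\left(2^{n}\right).
```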
With respect to the randomness in the quantum circuit, we’re not sure what the reviewer means by this question. As explained in the paper (last paragraph of Section 3), we use a Gaussian distribution for the initial parameters, and all random circuits use uniform distributions for placing and choosing gates. (This is explained in the PennyLane documentation for the RandomLayers circuit, which we say we use in Sections 3.1 and 3.2. Though we never explicitly say “uniform” in the paper, so we are happy to make that edit for clarity.)
For the ansatz we use, as noted above, please refer to Sections 3.1 and 3.2. We use the RandomLayers circuit from PennyLane (uniformly randomly placed RX, RY, RZ, and CNOT gates). The only exception is one of the generative modeling experiments, where we use the quantum circuit Born machine (QCBM); the reference we provide in the paper lays out the construction of this circuit.
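For concreteness, here is a minimal sketch of how such a circuit can be set up in PennyLane; the qubit count, layer count, observable, and Gaussian width below are illustrative placeholders, not the exact settings from the paper:

```python
import pennylane as qml
from pennylane import numpy as np

n_qubits, n_layers = 4, 2  # illustrative sizes only
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def cost(weights):
    # RandomLayers places RX/RY/RZ rotations and CNOTs uniformly at random
    # (fixed by the seed), with one trainable angle per rotation.
    qml.RandomLayers(weights, wires=range(n_qubits), seed=42)
    # Illustrative observable; in the paper the loss is the expectation of a problem Hamiltonian.
    return qml.expval(qml.PauliZ(0) @ qml.PauliZ(1))

# Gaussian initial parameters (Section 3); the standard deviation here is illustrative.
weights = np.random.normal(loc=0.0, scale=1.0, size=(n_layers, n_qubits))
print(cost(weights))
```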
We are also not sure what the reviewer is specifically referring to when asking about the “scalable analysis”. We could certainly study how these results change as we vary the number of qubits, the number of circuit parameters, etc., and we’re happy to add more runs that vary these quantities if desired. We simply chose not to because we felt the variation we had between each type of run was enough of a sampling to illustrate some interesting results.
I want to thank the authors for their reply. As BP is the major challenge in VQAs, I feel that improving optimization performance on small-scale (e.g., smaller than 20-qubit) problems is less interesting, as it tells us little about the major problem. Therefore, I will keep my score unchanged.
This paper benchmarks several SPSA-like (i.e., zeroth-order, local) optimizers on a randomized set of parametrized quantum learning problems (including Hamiltonian minimization and generative modeling). In each problem, the loss function is the expected value of a Hamiltonian. The authors focus on the noise-free setting and use the exact expected value in their experiments. Their findings are: (1) these optimizers are sensitive to hyperparameter tuning; (2) more elaborate optimizers are not generally better (vanilla SPSA performs very well in most cases); (3) accelerated methods usually have a faster convergence rate.
Strengths
Compared with prior art, this work focuses on the performance of zeroth-order local minimizers across a series of partially randomized tasks in quantum learning. This methodology provides new insights into translating existing knowledge to new application scenarios. Details of the experimental setting are discussed, and they look reasonable to me. The experimental results are illustrated in two ways: (1) convergence/loss plots and (2) box plots showing the statistics of the end results. Overall, the manuscript is organized and well written. The technical claims and numerical findings are plausible.
Weaknesses
I feel this work is way too empirical, with limited theoretical insight. In Section 5, some numerical findings are interesting (even counter-intuitive). It would be better to motivate and explain them a bit more in terms of optimization theory. For example, in the 1D Ising & 2D Heisenberg experiments (and also the two generative models), adamSPSA converges much more slowly than vanilla SPSA. This is a bit surprising because ADAM often converges faster than vanilla gradient descent. Is this because of the noisy gradient estimation (SPSA uses a zeroth-order oracle to estimate the gradient), or a behavior unique to quantum Hamiltonian minimization problems?
I also find that the 95% confidence intervals in Figures 1 & 3 look very thin. This is also a bit surprising because the optimization landscape for these problems should be highly nonconvex, and SPSA is a stochastic optimization algorithm. One explanation I could imagine is that the initial guesses are chosen in a small neighborhood in the parameter space, so the loss curves have significant overlap. Or is it because the hyperparameter tuning only involves 3 random keys? Can you elaborate a bit more on this? Does this numerical methodology reflect the general landscape of quantum learning?
This work only studies the noiseless setting, while I feel the noisy setting is more relevant. Due to the Heisenberg limit, the expected value of the Hamiltonian cannot be computed very accurately on a real quantum computer unless the number of samples is large. On the other hand, SPSA is a zeroth-order optimization algorithm that is sensitive to random fluctuations in the function value. It would be more relevant to consider a benchmark with a reasonable noise model (or, at least, an imperfect loss evaluation).
Questions
A few questions can be found in the "Weaknesses" section. I suggest the authors add some discussion (or experiments) regarding the noisy setting. Some previous results suggest that SPSA-like optimizers are still robust in the presence of noise. How are those results connected with the current work? Also, it would be appreciated if the authors interpreted their numerical results in the context of the landscape of these quantum learning problems. In practice, knowledge of the target learning problem usually sheds light on which optimizers we want to choose.
Details of Ethics Concerns
No ethics concerns.
We’ll note that when discussing SPSA vs. adamSPSA, talking about convergence speed isn’t always so clear-cut. For all tasks except the 1D Ising and randomized Hamiltonian experiments, adamSPSA seems to converge more quickly early on but then be overtaken by SPSA at some point. But we agree with the criticism that, in general, the observed behavior here should be more closely studied.
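For readers less familiar with SPSA (and to make the comparison above concrete), here is a minimal, illustrative sketch of the vanilla SPSA loop in the form we have in mind; the gain sequences and hyperparameter values are placeholders, not our tuned settings:

```python
import numpy as np

def spsa_minimize(cost, theta0, n_iters=200, a=0.1, c=0.1, alpha=0.602, gamma=0.101, seed=0):
    """Vanilla SPSA: two cost evaluations per iteration, independent of dimension."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    for k in range(n_iters):
        ak = a / (k + 1) ** alpha                          # step-size gain sequence
        ck = c / (k + 1) ** gamma                          # perturbation-size gain sequence
        delta = rng.choice([-1.0, 1.0], size=theta.shape)  # Rademacher perturbation
        # Simultaneous-perturbation gradient estimate from two function values
        # (for +/-1 perturbations, dividing by delta equals multiplying by delta).
        g_hat = (cost(theta + ck * delta) - cost(theta - ck * delta)) / (2.0 * ck) * delta
        theta = theta - ak * g_hat
    return theta
```

Each iteration costs only two loss evaluations regardless of the number of parameters, which is the main appeal over per-parameter finite differences or parameter-shift gradients.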
In regard to the confidence intervals being thin, we believe this is mainly an artifact of doing 100 runs. Even if each run is wildly different (which you can partly see from the box plots of each run’s end result, as they can span a wide range), the mean value at each point is relatively certain. If you reproduce these plots using the standard deviation or min-max values as the error range, they are much wider, and we’re happy to make that change if the reviewers believe those error bars would be more insightful to readers. The reviewer also touches on an excellent point here about comparing runs: maybe we should look at some individual runs, find some interesting examples, and visualize their loss landscapes rather than only presenting high-level statistics.
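As a quick, self-contained illustration of this point (the numbers below are made up, not our actual results): even when individual runs vary widely, the 95% confidence interval of the mean shrinks as 1/sqrt(N) over N runs.

```python
import numpy as np

rng = np.random.default_rng(0)
final_losses = rng.normal(loc=1.0, scale=0.5, size=100)  # hypothetical spread over 100 runs

std = final_losses.std(ddof=1)
ci95_halfwidth = 1.96 * std / np.sqrt(len(final_losses))  # the mean's CI shrinks as 1/sqrt(N)

print(f"std across runs: {std:.3f}, 95% CI half-width of the mean: {ci95_halfwidth:.3f}")
# The std stays near 0.5 while the CI half-width is roughly five times smaller.
```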
With regard to including noise, while we agree that it’s an important consideration, we disagree that it should be included here. This is because there are many different considerations for noise when it comes to the optimization of variational quantum algorithms: noise from the quantum computer itself, noise from estimating the cost function from a finite number of shots, noise from choosing which terms in a non-commuting Hamiltonian to prioritize in estimation, etc. Simply put, there are so many moving pieces that if you don’t simplify the problem and focus on one aspect, it becomes difficult to get any concrete insight. And while we agree it would be reasonable to include some reasonable baseline noise, we’re not sure such a baseline exists given the many considerations that go into the noise you could practically observe in any given situation. So we believe a better way to control this factor is to simply have no noise, and to accept that these results are a best-case upper bound on the optimizers’ performance under any noise model.
That being said, other studies have typically handled noise by making arbitrary choices for the number of shots and re-running their experiments to see what happens. This is definitely something we can add if desired by the reviewers. We personally think the most useful approach would be to add noise as an extra dimension and continuously vary it to see its effect as it grows. But doing so would be too computationally intensive with hyperparameter tuning in our current setting, so we would need to use an adaptive method for tuning hyperparameters instead.
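As a sketch of what that kind of shot-based run could look like in PennyLane (the shot count, circuit, and observable below are arbitrary illustrative choices, not a proposal for a definitive noise model):

```python
import pennylane as qml
import numpy as np

n_qubits = 4
# Finite-shot device: expectation values become sample estimates rather than exact
# values, with statistical error shrinking roughly as 1/sqrt(shots).
dev_noisy = qml.device("default.qubit", wires=n_qubits, shots=1000)

@qml.qnode(dev_noisy)
def noisy_cost(weights):
    qml.RandomLayers(weights, wires=range(n_qubits), seed=42)
    return qml.expval(qml.PauliZ(0) @ qml.PauliZ(1))

weights = np.random.normal(size=(2, n_qubits))
print([noisy_cost(weights) for _ in range(3)])  # repeated evaluations now fluctuate
```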
The paper conducts a benchmark of local zeroth-order optimizers in the realm of quantum information, encompassing a range of partially randomized tasks. Through the comparative analysis of seven optimizers, the authors reveal a noteworthy trend: simpler heuristic methods, with SPSA at the forefront, frequently outperform their more intricate counterparts.
Strengths
The submission conducts a systematic evaluation of local zeroth-order optimizers in quantum information, offering valuable insights that extend beyond specific quantum learning tasks. By conducting a comprehensive benchmark of seven optimizers across partially randomized scenarios, the authors provide practical guidance for using simpler heuristic methods like SPSA to train variational quantum algorithms. In addition, the submission effectively communicates its motivation, ensuring clarity and comprehension for readers.
Weaknesses
The submission appears to align with the benchmark paper category; however, it lacks key components that are typically expected in such a context. Benchmark papers typically aim to promote scalable, robust, and reproducible research. They often involve well-processed datasets, methods, and accessible results. Many benchmark papers also provide a dedicated website, offering documentation, example scripts, and a public leaderboard. Unfortunately, this submission falls short of meeting these essential criteria, as even the source code remains unreleased.
Questions
The authors are encouraged to perform a comprehensive revision of their submission to ensure alignment with the formal benchmark paper format.
We have released code for this paper, but we did not include it in this submission as it includes personally identifying information via the GitHub owner. We also don’t believe there was a submission option for including a code ZIP file, though apologies if we simply missed that. We are happy to provide the code for review if desired, and any published version of this paper will include said code.
With respect to the other criticisms (i.e., including a website, leaderboards, datasets, etc.), while we certainly agree that including such things would always be better than not, given finite resources we elected not to for a few reasons. First, this benchmark doesn’t really have “data”, as it consists mostly of randomly generated optimization problems that can all be easily generated and reproduced from our code. Second, as we mention in the paper, this benchmark (and, for that matter, almost any benchmark in quantum machine learning) has flaws, because we are simply not able to simulate at the scale we’d truly care about. So we don’t think that including a website with leaderboards is particularly useful: this paper mostly serves as a study to get people thinking about different ways to gain insight via randomized benchmarking, rather than as an objective measure of “goodness” on which researchers would care deeply about improving their scores.
This paper studies randomized benchmarking of variational quantum algorithms using zeroth-order optimizers. The main strength is a systematic evaluation of zeroth-order minimizers across a series of tasks in quantum learning. However, the paper has notable disadvantages: the results are purely empirical, without any theoretical explanations or insights, and in terms of randomized benchmarking the paper does not provide enough evidence for scalability, robustness, and reproducibility. There is no supplementary material (including the code), and the appendix is also very brief.
Why Not a Higher Score
The reviewers share the opinion that the paper has notable weaknesses: on the one hand, the results are purely empirical, without any theoretical explanations or insights; on the other hand, for the goal of randomized benchmarking, the paper does not provide enough evidence for scalability, robustness, and reproducibility.
For future versions of this paper, the authors may consider further improvements from these perspectives. The review reports also highlight other minor points for the authors to take into consideration in future revisions.
Why Not a Lower Score
N/A
Reject