PaperHub
Score: 7.0/10 — Spotlight, 4 reviewers
Ratings: 7, 7, 7, 7 (min 7, max 7, std 0.0)
Confidence: 3.8 · Correctness: 3.0 · Contribution: 3.0 · Presentation: 3.0
NeurIPS 2024

Non-asymptotic Approximation Error Bounds of Parameterized Quantum Circuits

OpenReview · PDF
Submitted: 2024-05-15 · Updated: 2024-11-06

Abstract

Keywords
quantum machine learning, parameterized quantum circuits, expressivity, universal approximation, approximation error bound

Reviews and Discussion

Review
Rating: 7

This is a solid paper on approximating Hölder smooth functions using parameterized quantum circuits (PQCs). The results show that PQC-based approximation can achieve better results than those in Lu's paper, especially when K, the length of the local region of the Taylor expansion, is not large and the dimension d is large.

Strengths

Finding the approximation rate for different structures in deep learning is an important task for understanding deep learning. The authors present approximation results for PQCs, and I find this result interesting.

Weaknesses

  1. For continuous and Lipschitz continuous functions, the authors only establish the universal approximation theorem. Can the authors improve the result to find the approximation rate based on [22], cited in the paper?

  2. The author's result is better than Lu's paper since they consider approximation in the L^∞-norm. In [22], it is shown that for the L^p norm, the coefficient in Lu's paper need not depend exponentially on d. In this case, is the result in this paper still better than Lu's paper?

  3. To my knowledge, Lu's paper obtains a better rate in K than this paper due to the bit extraction technique, achieved by a ReLU FNN, but the parameters in this technique need to be very large. Can your paper therefore provide a bound on the parameters? If so, this would be a significant benefit of your paper.

  4. The structure of the neural network is hard to train since it is not wide but deep when K is large. Can you make the neural network shallower and wider?

  5. I think some tables and comments in the appendix, such as Table S1, could be shown in the main text.

Questions

Mentioned in the Weaknesses.

Limitations

All right.

Author Response

We thank the reviewer for their time and valuable comments. We would like to provide a detailed response to the insightful questions raised by the reviewer.

  1. For continuous and Lipschitz continuous functions, the authors only establish the universal approximation theorem. Can the authors improve the result to find the approximation rate based on [22], cited in the paper?

Reply 1: In fact, we do prove the approximation rate for Lipschitz continuous functions, for both the Bernstein polynomial approximation and the local Taylor expansion method. For the PQC approximation error bound using Bernstein polynomials, please refer to Theorem 3; for the bound using local Taylor expansion, please refer to Theorem 4. We would like to note that the class of Hölder smooth functions considered in Theorem 4 includes Hölder continuous and Lipschitz continuous functions; see the definition of Hölder smooth functions in Lines 208–214.

  2. The author's result is better than Lu's paper since they consider approximation in the L^∞-norm. In [22], it is shown that for the L^p-norm, the coefficient in Lu's paper need not depend exponentially on d. In this case, is the result in this paper still better than Lu's paper?

Reply 2: We would like to thank the reviewer for the insightful question. We believe that PQC function approximation under the L^p norm is certainly worth further investigation, which would require careful calculations and proofs. For now, we cannot say whether our PQCs would outperform Lu's paper for approximation under the L^p-norm; however, we believe that the technique in [22] could aid in studying PQC approximation performance under the L^p norm, and we will certainly pursue this at a later stage.

  3. To my knowledge, Lu's paper obtains a better rate in K than this paper due to the bit extraction technique, achieved by a ReLU FNN, but the parameters in this technique need to be very large. Can your paper therefore provide a bound on the parameters? If so, this would be a significant benefit of your paper.

Reply 3: Yes, as the reviewer pointed out, Lu's paper achieves a better dependence on K due to the bit extraction technique, although the parameters can be extremely large. In the quantum strategy presented in our work, however, we do not use bit extraction but instead implement a quantum circuit that realizes the Taylor coefficients with parameters ranging between 0 and 2π, since they represent angles of parameterized rotation gates.

  4. The structure of the neural network is hard to train since it is not wide but deep when K is large. Can you make the neural network shallower and wider?

Reply 4: We thank the reviewer for highlighting this issue. Our PQC structure does not involve a width–depth tradeoff like deep neural networks, so we cannot easily make PQCs shallower and wider. Moreover, there is a fundamental difference between PQCs and classical neural networks: wider PQCs are harder to train from the perspective of gradients because of the barren plateau phenomenon [1]. For classical deep neural networks, the gradient can vanish exponentially with the number of layers (depth), while for quantum neural networks, the gradient decreases exponentially with the number of qubits (width). Therefore, we believe it is unnecessary to make our PQCs shallower and wider from the gradient perspective.

[1] J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Babbush, and H. Neven, Barren Plateaus in Quantum Neural Network Training Landscapes, Nat Commun 9, 4812 (2018).

  5. I think some tables and comments in the appendix, such as Table S1, could be shown in the main text.

Reply 5: Thank you for the suggestion. We will consider moving some of the content from the appendix to the main text in the revised version, as the reviewer suggested.

Comment

Thanks for your reply, I will increase my score to 7.

Review
Rating: 7

The authors explore the power and limitations of parameterized quantum circuits (PQCs). They show that a large class of multivariate polynomials and smooth functions can be efficiently (approximately) represented by PQCs. More importantly, they show that the requirements of such PQCs compare favorably to their classical counterparts (deep ReLU nets, etc.).

Strengths

The main strength of the paper is that it addresses an important problem of practical and theoretical interest regarding the power and limitations of parameterized quantum circuits. The results are novel, and the highlighted open problems are of great interest.

Weaknesses

See questions below.

Questions

Page 1: Maybe you can present a figure early on depicting some PQC (and a deep ReLU net, for comparison)? This could help the reader relate to the subject matter.

Page 2: Can PQCs trivially simulate classical deep learning nets?

Page 3: Should you define mixed states for comparison with the pure states?

Should you emphasize that the chosen basis is complete?

Page 4: How is Equation (1) derived? Equation (2) captures your notion of expressibility, right?

Page 5: Can you please add some more discussion about Theorem 2?

Page 6: Can you please elaborate on Lines 220 -- 221?

Page 7: Please elaborate on the caption of Figure 2.

Line 263: Why is such an f interesting?

Page 8: Figure 3: Please elaborate on the caption.

Figure 4: What is the moral regarding K? The higher the better? :)

Page 9: Discussion is good. Please add some more future work/next steps.

Please do not wait until the end of the paper to state that your work is (???) the first of its kind :)

What is the relationship of your work to (classical or quantum) computational complexity, and Turing machines? Please provide some discussion about how your work connects to the theory of computation.

Limitations

None.

Author Response

We thank the reviewer for recognizing the contributions and novelty of our work! Here we respond to the insightful comments and questions raised by the reviewer.

  1. Page 1: Maybe you can present a figure early on depicting some PQC (and a deep ReLU net, for comparison)? This could help the reader relate to the subject matter.

Reply 1: Thank you for your kind suggestion. We will consider including a figure as suggested in the revised version.

  2. Page 2: Can PQCs trivially simulate classical deep learning nets?

Reply 2: Thanks for this insightful question! We believe that PQCs cannot trivially simulate classical deep neural networks. Using PQCs to simulate a classical neural network requires efficiently encoding classical data and implementing non-linear activation functions. Specifically, costly amplitude encoding techniques, such as QRAM, are necessary to encode classical data. Additionally, while our PQCs can properly approximate the non-linear activation function, applying this PQC element-wise to a pure quantum state is computationally challenging. In this work, we design novel PQCs to approximate functions with non-asymptotic error bounds by implementing various polynomials rather than directly simulating classical deep neural networks.

  3. Page 3: Should you define mixed states for comparison with the pure states? Should you emphasize that the chosen basis is complete?

Reply 3: Since mixed states are not used in our technical discussion, we did not define this concept. We will consider adding the definitions as suggested in the revised version.

  4. Page 4: How is Equation (1) derived? Equation (2) captures your notion of expressibility, right?

Reply 4: Equation (1) defines a PQC with data re-uploading structures as shown in [11], consisting of interleaved data-encoding circuit blocks and trainable circuit blocks. A formal definition and introduction to such data re-uploading PQCs can be found in the Appendix (Sec A.2). Equation (2) encapsulates our notion of expressibility. Formally, expressibility is captured by the class of functions that Equation (2) can approximate by adjusting its trainable parameters.

  5. Page 5: Can you please add some more discussion about Theorem 2?

Reply 5: Theorem 2 essentially characterizes the universal approximation property of PQCs, corresponding to the celebrated universal approximation theorem of classical neural networks. We will add several sentences in line 187: "Theorem 2 serves as the quantum counterpart to the universal approximation theorem of classical neural networks. Moreover, the PQCs in Theorem 2 are explicitly constructed without any impractical assumptions..."

  6. Page 6: Can you please elaborate on Lines 220–221?

Reply 6: To elaborate on Lines 220–221, we will add the following sentence at Line 218: "The motivation of localization is to determine the local point, i.e., x_0, in Equation (8). An intuitive configuration is illustrated in Figure 2, where the stars represent the local points. To implement this localization process, a straightforward method is to realize a step function D'(x) = k/K if x ∈ [k/K, (k+1)/K] for k = 0, 1, …, K−1. To approximate this discontinuous step function with polynomials, we exclude the Δ-neighborhood of the discontinuous points and approximate the remaining parts by polynomials."
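As a rough numerical illustration of this Δ-exclusion idea (our own sketch, not the paper's PQC construction; the values of K, Δ, and the polynomial degree are arbitrary choices), one can least-squares fit a single polynomial to the step function D'(x) only on the region outside the Δ-neighborhoods of the jumps:

```python
import numpy as np

K, Delta, degree = 4, 0.05, 80

def localization(x, K):
    """Step function D'(x) = k/K for x in [k/K, (k+1)/K), capped at (K-1)/K."""
    return np.minimum(np.floor(K * np.asarray(x, dtype=float)) / K, (K - 1) / K)

xs = np.linspace(0.0, 1.0, 4001)
# Keep only points farther than Delta from every interior discontinuity k/K.
keep = np.all(np.abs(xs[:, None] - np.arange(1, K) / K) > Delta, axis=1)
z = 2 * xs[keep] - 1  # map to [-1, 1], where the Chebyshev basis is well conditioned
coeffs = np.polynomial.chebyshev.chebfit(z, localization(xs[keep], K), degree)
err = np.max(np.abs(np.polynomial.chebyshev.chebval(z, coeffs)
                    - localization(xs[keep], K)))
```

Because the fit never sees the discontinuities, a moderate-degree polynomial already tracks the step function closely on the retained region.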

  7. Page 7: Please elaborate on the caption of Figure 2.

Reply 7: After clarifying the motivation for the localization step, we believe that Figure 2 will be more comprehensible.

  8. Line 263: Why is such an f interesting?

Reply 8: We emphasize that this work investigates the approximation capability of PQCs from a theoretical aspect, and the purpose of the numerical results is to support the theoretical findings. We select a bivariate polynomial function as the target function because it belongs to the function space studied in this context and is also easier to visualize, as shown in Figure 4. Also, there are situations where, despite the PQC's powerful expressibility, an imperfect optimization algorithm results in poor learning performance. Therefore, to reduce the impact of optimization imperfections on the final error, we choose a relatively simple polynomial function as our target for approximation.

  9. Page 8: Figure 3: Please elaborate on the caption.

Reply 9: We will add the following sentence to the caption of Figure 3: "The single-qubit PQCs trained to approximate the target localization function D(x) for K = 2 and K = 10".

  10. Figure 4: What is the moral regarding K? The higher the better? :)

Reply 10: Yes, with increasing K, the PQC exhibits a smaller approximation error, as expected, which is consistent with our theoretical findings. We will add relevant discussion in the revised version.

  11. Page 9: Discussion is good. Please add some more future work/next steps. Please do not wait until the end of the paper to state that your work is (???) the first of its kind :)

Reply 11: Thanks for your comments. We will make corresponding changes in the revised version according to your suggestions.

  12. What is the relationship of your work to (classical or quantum) computational complexity, and Turing machines? Please provide some discussion about how your work connects to the theory of computation.

Reply 12: Thanks for this interesting question! Complexity theory primarily addresses computational problems, whereas our work centers on approximation problems. Although our results demonstrate that the size of PQCs is smaller than that required by some neural networks when approximating certain types of functions, we cannot draw definitive conclusions about the complexity of classical or quantum computing from our findings, because both neural networks and PQCs are specific learning models that cannot fully characterize the computational power of classical and quantum computers.

Comment

Thank you! :)

Review
Rating: 7

In this work, the authors aim to build a theoretical understanding of parameterized quantum circuits via non-asymptotic approximation error analysis. In particular, they demonstrate the advantages of PQCs over classical ones when specific smoothness criteria are satisfied. The simulation results further corroborate their theoretical understanding.

Strengths

  1. The paper provides a different theoretical perspective for understanding PQCs, using a non-asymptotic viewpoint in addition to the universal approximation theory.

  2. The established non-asymptotic PQC approximation error bounds are technically correct.

  3. The simulation results corroborate the theoretical results.

Weaknesses

  1. The authors' theoretical analysis of PQCs relies on the assumption of continuous or smooth target functions, which limits the theory's applicability to the broader case of non-smooth and non-continuous target functions.

  2. The numerical simulation is less convincing than practical machine learning datasets would be, so actual machine learning applications are expected to validate the proposed theory.

  3. The experimental validation could be designed to corroborate the theoretical results, e.g., with the target function drawing on the discussion of Hölder smooth and continuous functions.

Questions

  1. How can the authors generalize the proposed theory to cases where the target function is non-smooth and non-continuous?

  2. What is the most significant advantage of the non-asymptotic analysis of PQCs compared to the universal approximation theory?

  3. Is there a different or new discovery in the PQC's setup when using the authors' new theoretical perspective?

Limitations

  1. The proposed theorems rely on continuous or smooth target functions, which hinders their application to broader use cases.

  2. The numerical simulations cannot corroborate the specific smooth and continuous functions.

Author Response

We thank the reviewer for their time and valuable comments. We would like to provide a detailed response to the questions raised by the reviewer.

  1. The authors' theoretical analysis of PQCs relies on the assumption of continuous or smooth target functions, which limits the theory's applicability to the broader case of non-smooth and non-continuous target functions. How can the authors generalize the proposed theory to cases where the target function is non-smooth and non-continuous?

Reply 1: First, we would like to point out that continuous functions are typical classes of target functions in the study of learning theory. For example, the most celebrated universal approximation theorems for neural networks involve approximating continuous functions. This is because machine learning data points in practical scenarios are discrete and can often be fitted by continuous functions. Second, the Hölder smooth functions considered in our work constitute a large class of continuous functions, but they are not necessarily infinitely smooth; by the definition in Eq. (7) and the following explanation, the smoothness index of the Hölder class can vary in the range (0, ∞). From a theoretical perspective, it is certainly of great interest to study the approximation of non-continuous functions, as the reviewer suggested. One possible generalization would be to first use continuous functions to approximate non-continuous ones, followed by the analysis in our work. Additionally, in the appendix, we introduce the construction of PQCs that implement multivariate trigonometric polynomials, which can be used to approximate any square-integrable functions (not necessarily continuous). However, to the best of our knowledge, deriving a quantitative error bound for approximating general non-continuous functions is highly non-trivial and thus requires further investigation.

  2. The numerical simulation is less convincing than practical machine learning datasets would be, so actual machine learning applications are expected to validate the proposed theory.

Reply 2: We would like to emphasize that the main contribution of our work lies in the theoretical analysis of PQC approximation performance. The numerical simulation in our work is used to validate our theoretical results, which concern using PQCs to approximate continuous and smooth functions. For practical machine learning datasets, no target function is well defined, only discrete data points, so it is not intuitive to use such "datasets" to numerically validate our theory of "function" approximation. Additionally, the dimensions of real datasets are typically very large, making it impossible to train such a huge PQC model on our computers.

  3. The experimental validation could be designed to corroborate the theoretical results, e.g., with the target function drawing on the discussion of Hölder smooth and continuous functions.

Reply 3: As we explained above, we designed the numerical experiment to validate our theorem. Thus, we chose a target function from the class of Hölder smooth functions that we study in the theory.

  4. What is the most significant advantage of the non-asymptotic analysis of PQCs compared to the universal approximation theory?

Reply 4: The universal approximation theorem only demonstrates the existence of PQCs that can approximate the target function within an arbitrary error. However, it does not provide the construction of the circuit or its size as a function of the error and, therefore, cannot quantify the approximation performance. The non-asymptotic analysis of PQCs for QML in our work gives the explicit construction of PQCs and quantitative approximation error bounds, which characterize the approximation performance of PQCs in terms of circuit size and the number of parameters. More importantly, such non-asymptotic analysis makes it possible to compare their performance with that of classical neural networks, potentially revealing quantum advantages.

  5. Is there a different or new discovery in the PQC's setup when using the authors' new theoretical perspective?

Reply 5: Yes, one of the novel contributions of our work is the setup of PQCs using quantum signal processing circuits and linear-combination-of-unitaries circuits, which are utilized in PQC constructions for the first time. Also, in Section 3.3, on PQC approximation of Hölder smooth functions, we propose a new construction called "nested PQC" that feeds the results of the first PQC into the second PQC. This new structure provides better approximation performance than traditional constructions and is therefore worth further study.

Overall, we would like to clarify that our main contribution is the theoretical analysis of PQC approximation performance for a wide class of functions that are typically studied in learning theory and approximation theory. We intend to use a simple toy example in the numerical experiment to validate and better illustrate our theoretical results, which is not our main focus in this work. We thank the reviewer again for pointing out the other possible classes of functions, like non-continuous functions, that are worth further investigation.

Comment

Thanks for the authors' responses to my concerns. I raise the score to 7 (accept).

Review
Rating: 7

This paper studies the expressiveness of parameterized circuits to perform multivariate function approximation. This serves as a quantum counterpart to the theoretical results of classical machine learning, namely the universal approximation theorem. Theoretical results provide bounds on the approximation error of parameterized circuits with bounds on the quantum resources required. Finally, numerical experiments are provided, showing that parameterized quantum circuits are able to approximate multivariate functions with improved accuracy as the number of parameters increases.

Strengths

  • This paper uses ideas from quantum signal processing to study the expressivity of parameterized quantum circuits, thereby serving as a first step toward bridging the gap between quantum algorithms for matrix functions and quantum machine learning. This is a very natural idea and is better motivated than many other ansätze used in variational quantum algorithms. Generally speaking, I think it is valuable for works in (quantum) machine learning to be guided by results with theoretical guarantees, as this typically leads to better algorithm design.
  • The overall paper is well-written and clear to understand.
  • The numerical experiments appear promising and supplement the theoretical results well, although the parameter counts are fairly large even for the small test-cases presented.

Weaknesses

  • The use of ideas from quantum signal processing is a bit of a double-edged sword. While Theorem 3 provides a non-asymptotic error bound, the quantum resources (circuit width and depth) are only asymptotic bounds. Presumably, the resources required for performing the LCU circuit are not very practical for NISQ devices, and it is likely that there are significant improvements to be made.
  • It would be nice to include code for the numerical experiments.

Questions

  • In the second paragraph of Sec 3.1, why is the number of parameters s+d, rather than simply s? If d > s, then only s variables are relevant.
  • In Sec 3.2, why were Bernstein polynomials chosen over other polynomial approximations, such as truncated Taylor series expansions? Are there other ways, and if so, what are the drawbacks?
  • In Theorem 2, how does n depend on ϵ? I think it would make more sense to express the resource costs in terms of 1/ϵ.
  • In Theorem 3, why does n no longer depend on ϵ? Does it mean that both ϵ and n could be chosen simultaneously to be arbitrarily small?
  • I am curious about how the PQCs were implemented for the numerical experiments. I assume analytical formulas for the approximating polynomial were used, rather than full circuit implementations with the overhead of LCU, right?

Limitations

There are no major negative societal impacts of this work.

Author Response

We thank the reviewer for the appreciation of the paper and inspiring feedback. Here we respond to the insightful comments and questions.

  1. The use of ideas from quantum signal processing is a bit of a double-edged sword. While Theorem 3 provides a non-asymptotic error bound, the quantum resources (circuit width and depth) are only asymptotic bounds. Presumably, the resources required for performing the LCU circuit are not very practical for NISQ devices, and it is likely that there are significant improvements to be made.

Reply 1: For the construction of our PQCs, the error bound and quantum circuit size are typically non-asymptotic, except in the step of decomposing multi-qubit gates into one- and two-qubit gates. Specifically, in the LCU procedure, constructing a larger-scale unitary block that encodes the summation of multiple unitary matrices requires applying multi-qubit control to each unitary matrix. To achieve this on a quantum computer, we utilize techniques from Ref. [1] to decompose the multi-qubit control gate into a linear-depth quantum circuit composed of CNOT gates and single-qubit rotation gates without using any ancilla qubit. The quantum resources are asymptotic only in this multi-qubit control gate decomposition; these asymptotic bounds merely conceal constant or trivial terms, which do not impact the feasibility of our construction on quantum devices. Furthermore, decomposing multi-qubit gates into one- and two-qubit gates facilitates fair comparisons with classical neural networks in terms of network width and depth.

[1] A. J. da Silva and D. K. Park, Linear-depth quantum circuits for multiqubit controlled gates, Physical Review A 106.4 (2022).

  2. It would be nice to include code for the numerical experiments.

Reply 2: Thanks for asking; in fact, all the code has been uploaded to the Supplementary Material accompanying this work, along with instructions on how to run it.

  3. In the second paragraph of Sec 3.1, why is the number of parameters s+d, rather than simply s? If d > s, then only s variables are relevant.

Reply 3: Thanks for pointing this out. In Sec 3.1, we do not assume d > s, as we aim for a general analysis of PQC construction. Quantum signal processing (QSP) indicates that expressing a degree-s polynomial requires s+1 parameters. Therefore, when s > d, expressing a degree-s, d-variable polynomial in which each variable has non-zero degree necessitates s+d parameters. Conversely, when d > s, at most 2s parameters are needed to express a degree-s, d-variable polynomial. Overall, at most s+d parameters are required.

  4. In Sec 3.2, why were Bernstein polynomials chosen over other polynomial approximations, such as truncated Taylor series expansions? Are there other ways, and if so, what are the drawbacks?

Reply 4: In Sec 3.3, we do utilize truncated Taylor expansions for polynomial approximations, as presented in Theorem 4. We selected the Bernstein polynomial due to its ease of implementation and comprehensive theoretical results. Bernstein polynomials are considered "global" polynomials because they approximate functions over an entire interval without focusing on specific local behavior, which facilitates implementation through QSP. However, a drawback of global polynomials is their relatively poor approximation error, as illustrated in Figure 1. In contrast, Taylor series expansions are "local" polynomials that approximate functions based on local information at specific points. The approximation process involves identifying local points for expansion and using these points to construct polynomials. This locality allows for more precise approximations, as illustrated in Figure 1, making them a central approach in classical machine learning approximation theory. Based on these principles, we designed the PQC with a nested architecture. PQCs implementing local Taylor expansions yield better approximation errors, demonstrating an advantage even over near-optimal classical neural networks, as shown in our discussion part.
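As a side illustration of the "global" character of Bernstein polynomials (our own sketch, not part of the paper; the Lipschitz target |t − 1/2| and the degrees are arbitrary choices), the classical Bernstein construction B_n(f; x) = Σ_k f(k/n) C(n,k) x^k (1−x)^(n−k) can be evaluated directly, and its uniform error decreases only slowly with the degree for a merely Lipschitz target:

```python
import numpy as np
from math import comb

def bernstein(f, n, x):
    """Degree-n Bernstein polynomial of f on [0, 1], evaluated at points x."""
    x = np.asarray(x, dtype=float)
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

f = lambda t: np.abs(t - 0.5)  # Lipschitz continuous, but not smooth at t = 1/2
xs = np.linspace(0.0, 1.0, 201)
# Uniform approximation errors for increasing degrees.
errs = [np.max(np.abs(bernstein(f, n, xs) - f(xs))) for n in (4, 16, 64)]
```

The slow error decay for this non-smooth target is consistent with the relatively weak global approximation rate discussed in the reply above.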

  5. In Theorem 2, how does n depend on ϵ? I think it would make more sense to express the resource costs in terms of 1/ϵ.

Reply 5: In Theorem 2, we present a universal approximation theorem (UAT) for PQCs, highlighting the existence of a PQC capable of approximating any continuous function, but we do not explore the relationship between n and ϵ. Similarly, in classical learning theory (as in [13]), the UAT simply guarantees the existence of a neural network approximating any continuous function within arbitrary error but does not imply a relationship between network size and approximation error. All non-asymptotic results are detailed in Theorems 3 and 4, which go beyond the UAT by providing quantitative error bounds in terms of circuit size.

  6. In Theorem 3, why does n no longer depend on ϵ? Does it mean that both ϵ and n could be chosen simultaneously to be arbitrarily small?

Reply 6: In Theorem 3, ϵ does not represent the approximation error. As shown in Equation (6), the approximation error ϵ + d·2^d·ℓ^2/(nϵ^2) is determined by both ϵ and n. Choosing both ϵ and n to be small may result in a large approximation error.
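To make this tradeoff concrete (a numerical sketch of ours; the values d = 2, ℓ = 1, and n = 10^6 are arbitrary illustrative choices), one can scan the bound over ϵ for a fixed n and observe that shrinking ϵ past the interior optimum inflates the second term:

```python
import numpy as np

def error_bound(eps, n, d=2, ell=1.0):
    """The Equation (6) bound: eps + d * 2**d * ell**2 / (n * eps**2)."""
    return eps + d * 2**d * ell**2 / (n * eps**2)

n = 10**6
eps_grid = np.linspace(1e-3, 0.5, 5000)
eps_star = eps_grid[np.argmin(error_bound(eps_grid, n))]
# Balancing the two terms gives eps_star ~ (2 * d * 2**d * ell**2 / n)**(1/3);
# taking eps much smaller than this makes the total bound worse, not better.
```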

  7. I am curious about how the PQCs were implemented for the numerical experiments. I assume analytical formulas for the approximating polynomial were used, rather than full circuit implementations with the overhead of LCU, right?

Reply 7: Yes, we did not implement the heavy LCU procedure in our experiments. There is no difference between adding monomials together and implementing the linear combination of monomial PQCs in numerical simulation, so we took the easier route.

Comment

I thank the authors for the detailed response.

Final Decision

This submission aims to build a theoretical understanding of parameterized quantum circuits via non-asymptotic approximation error analysis. It conducts a comprehensive investigation of the expressiveness of parameterized quantum circuits (PQCs) in the context of multivariate function approximation. Furthermore, the authors demonstrate that PQCs have more favorable expressiveness than their classical counterparts (deep ReLU nets, etc.). The reviewers are unanimously positive about the submission. We suggest the authors further revise the submission based on the feedback.