PaperHub
Rating: 5.0/10 · Rejected (3 reviewers)
Scores: 6, 6, 3 (min 3, max 6, std 1.4)
Confidence: 4.0 · Correctness: 2.7 · Contribution: 2.7 · Presentation: 2.7
ICLR 2025

Limitations of measure-first protocols in quantum machine learning

OpenReview · PDF
Submitted: 2024-09-25 · Updated: 2025-02-05
TL;DR

We demonstrate a learning separation in a supervised learning task between quantum models that can coherently process quantum inputs and those restricted to classical representations of them.

Abstract

Keywords
quantum machine learning · machine learning · learning separation

Reviews and Discussion

Review
Rating: 6

In the paper entitled "Limitations of Measure-First Protocols in Quantum Machine Learning," the authors construct a learning task based on quantum phase states that provides a separation, in terms of sample complexity, between the fully quantum protocol and classical-shadow-based protocols. From my understanding, the construction is fundamentally based on the fact that FBQP/qpoly ≠ FBQP/poly, while the authors claim that they successfully achieve a worst-to-average-case reduction. In summary, this work theoretically studies the differences between two popular quantum machine learning paradigms, which advances our understanding of quantum models to some extent.

Strengths

  1. This paper clearly demonstrates its main results and the comparison to related work.
  2. The authors rigorously proved the separation in sample complexity and running time by implementing a polynomial reduction to the Hidden Matching problem, which provides new insights into the field of quantum machine learning.

Weaknesses

Here are some weaknesses and questions on this paper.

  1. Is the proved separation just an instance of FBQP/qpoly ≠ FBQP/poly [S. Aaronson et al., 2023], where FBQP/poly represents classical-shadow-based algorithms (including those that measure multiple states via Bell measurements)?
  2. The worst-to-average reduction seems very natural; if my understanding is correct, the classical shadow methods may fail for every x (as shown in Eq.~7), due to the fact that FBQP/qpoly ≠ FBQP/poly.
  3. The main contribution of this paper relies on Theorem~2. I know the proof idea is there, but the proof details in Appendix B are not very easy to follow.

Questions

  1. It would be helpful to provide the explicit construction of U_x (also the learned measurement operator).
  2. The problem is quite artificial (although Y. Liu et al., Nat. Phys. 2021 is also an artificial construction); it would be perfect if the phase states could be replaced by other, more practical quantum states (such as the ground state or thermal state of a physical system).
Comment

We thank the reviewer for their insightful comments and suggestions. In response, we have clarified the distinction between our work and Aaronson et al. (2023), emphasizing that our separation holds for efficiently preparable quantum states, which is relevant for practical quantum machine learning. We also expanded on the worst-to-average reduction, noting that our result is broader than Aaronson's and requires proof techniques such as Yao's lemma. Regarding the artificial nature of the problem, we have highlighted how our separation could extend to more physically relevant quantum states, like ground states of local Hamiltonians, though this requires new techniques beyond the scope of the current work.

We hope these revisions and clarifications address your concerns and improve the clarity of our arguments. We trust that these changes will help in reconsidering the evaluation of our work. Thank you again for your thoughtful feedback.

Comment

- Is the proved separation just an instance of [S. Aaronson et al., 2023], where FBQP/poly represents classical-shadow-based algorithms (including those that measure multiple states via Bell measurements)?

We thank the reviewer for this insightful question. While our results are related to those of Aaronson et al. (2023), we would like to emphasize that they are not merely an instance of their work. Aaronson's results rely heavily on the fact that quantum advice states require an exponentially large description. In contrast, a key aspect of our work is that the separation between measurement-first protocols and fully quantum protocols holds even for quantum states that can be efficiently prepared on a quantum device, which admits a polynomially-sized description in the form of the circuit preparing it. This distinction is crucial because, without it, the separation would lack relevance to real-world quantum machine learning applications. With this consideration, we provide compelling evidence that such separations can extend to a broader range of learning tasks, as long as the states involved are sufficiently "rich." Inspired by this question, we have added a brief explanation of this significant difference in the revised version of the manuscript.

Furthermore, to bring our separation closer to practical machine learning scenarios, we employ proof strategies involving Yao's principle to specialize our result to the average case, and we also account for potential noise in the data. We would also like to refer the reviewer to Part 2 of our general response to reviewer RYau for even more details.

- The worst-to-average reduction seems very natural; if my understanding is correct: the classical shadow methods may fail for every x (as shown in Eq.~7), due to the fact that FBQP/qpoly ≠ FBQP/poly.

We thank the reviewer for this comment. We would like to note that the result from Aaronson would imply that for every f there is at least one x on which classical methods (e.g., shadows) must fail. Critically, in our average-case reduction we prove a broader result: for every x, there is at least a fraction of f's on which a classical method must fail. We believe this is not straightforward per se and requires proof techniques relying on tools such as Yao's lemma.

- The main contribution of this paper relies on the Theorem~2. I know the proof idea is there, but the proof details in the Appendix B is not very easy to follow.

We apologize for any difficulties the reviewer may have encountered in following our proof. If possible, could you kindly highlight any specific points that were unclear, so we can provide further clarification or elaboration? Nonetheless, we would like to emphasize that we believe the claim is genuinely nontrivial and subtle, and as such, it is unlikely that a much more streamlined proof could be provided without sacrificing essential rigor or clarity.

- It would be helpful to provide the explicit construction of U_x (also the learned measurement operator).

We thank the reviewer for this comment. We provided the explicit construction in the revised version of the paper.

- The problem is quite artificial (although Y. Liu et al., Nat. Phys. 2021 is also an artificial construction); it would be perfect if the phase states could be replaced by other, more practical quantum states (such as the ground state or thermal state of a physical system).

We appreciate the reviewer’s insight and would like to clarify that although the proof in our current work is somewhat artificial, we strongly believe that the claim is much more broadly applicable. We conjecture that a formal proof of separation for a more general class of states, such as the ground state of a sufficiently complex local Hamiltonian, is achievable. However, this would require the development of new proof techniques, which goes beyond the scope of our current work.

In this regard, we refer the reviewer to Part 2 of our addressing of the Questions/Weaknesses in the review of Reviewer Mm3W. To briefly reiterate, building on the work of Aaronson et al. (2023), we speculate that phase states could be replaced by the ground states of sufficiently complex local Hamiltonians. This would allow us to extend our separation result between measurement-first protocols and fully quantum protocols to more physically relevant problems, such as predicting the ground state properties of local Hamiltonians. While this is indeed an exciting direction, it would require further investigation and would likely constitute a separate study.

Furthermore, we note that pseudo-random phase states have recently been demonstrated to be realizable as Hamiltonian phase states (see [arXiv:2410.08073]), which are states that can be prepared by time evolution under a sparse Ising Hamiltonian. This brings us closer to the physical realization of such states and reinforces the broader relevance of our separation result.

Comment

Thanks to the authors for their response, which addressed my questions and concerns.

Comment

We thank the reviewer for the time spent on this review. We are particularly happy that they appreciate the soundness of our proofs and the relevance of our result within the field of quantum machine learning.

Review
Rating: 6

This paper investigates a fundamental question in quantum machine learning (QML): whether quantum advantages can persist when quantum data is first measured and converted to classical information before processing. The authors establish a formal separation between two types of QML protocols:

  1. "Measure-first" protocols: Those that first measure quantum states using a fixed measurement strategy (though possibly complex and coherent) before processing

  2. "Fully-quantum" protocols: Those that can process quantum states coherently and maintain quantum data throughout the entire learning & deployment process

The main contribution is proving that there exists a learning task involving quantum measurements where measure-first protocols provably require exponentially more data than fully-quantum protocols, even when restricted to efficiently preparable quantum states. This is shown by constructing a specific learning problem based on the Hidden Matching problem and quantum pseudorandom states.
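As a point of reference, the communication primitive underlying the construction can be sketched in a few lines of numpy. The following is an illustrative simulation of the standard quantum one-way strategy for the Hidden Matching problem (Bar-Yossef et al.), not the paper's full learning task: Alice encodes her string x in a phase state, and Bob's projective measurement in a basis adapted to his matching M recovers one parity x_i ⊕ x_j exactly.

```python
import numpy as np

def hidden_matching_quantum(x, matching, rng):
    """Simulate the O(log n)-qubit one-way protocol for Hidden Matching.

    Alice sends the phase state (1/sqrt(n)) * sum_i (-1)^{x_i} |i>.
    Bob measures in the basis {(|i> + |j>)/sqrt(2), (|i> - |j>)/sqrt(2)}
    over the edges (i, j) of his matching; the '+' outcome has nonzero
    amplitude only if x_i == x_j, and the '-' outcome only if x_i != x_j,
    so the parity b = x_i XOR x_j of the observed edge is read off exactly.
    """
    n = len(x)
    amp = np.array([(-1) ** xi for xi in x], dtype=float) / np.sqrt(n)
    outcomes, probs = [], []
    for (i, j) in matching:
        for sign, b in ((+1, 0), (-1, 1)):
            a = (amp[i] + sign * amp[j]) / np.sqrt(2)
            outcomes.append((i, j, b))
            probs.append(a * a)
    # The matching basis is orthonormal, so probs already sums to 1.
    k = rng.choice(len(outcomes), p=np.array(probs) / np.sum(probs))
    return outcomes[k]  # (i, j, b) with b = x_i XOR x_j
```

A classical one-way protocol for the same task needs Ω(√n) bits of communication; it is this gap that the learning task repackages.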

Strengths

  • The paper addresses a fundamental question about the nature of quantum advantages in machine learning and provides concrete evidence that some quantum tasks inherently require maintaining quantum states throughout processing.

  • The proofs are rigorous and combine multiple techniques from quantum computing, cryptography, and communication complexity.

  • Unlike previous work, the separation holds even for average-case performance (not just worst-case), efficiently preparable quantum states (not just arbitrary states), and scenarios with experimental noise and errors.

Weaknesses

  • While the paper discusses robustness to noise theoretically, no numerical simulations or experimental results are provided to demonstrate the practicality of the proposed protocols.

  • While the separation is proven rigorously, it relies on a somewhat artificial learning task constructed specifically to demonstrate the separation. It would be valuable to understand if similar separations exist for more natural learning problems.

  • The paper focuses on a specific type of quantum learning problem involving Hidden Matching. It remains unclear how broadly these limitations of measure-first protocols apply to other quantum learning scenarios.

Questions

Given the theoretical claims regarding noise robustness for the separations established in this work, could the authors add a numerical experiment showcasing the separation under noisy settings? For example, it would be beneficial to simulate your protocols with realistic noise models for near-term quantum devices. It would also be useful to see how the separation between measure-first and fully-quantum protocols changes as noise increases.

The main result (Theorem 1) is stated as an existence statement. Could the authors provide a more concrete description of the task that leads to the separation between "measurement-first" and "fully quantum" methods?

Do the authors consider the separation to hold in many natural learning problems that people are actively working on? Could the authors comment on whether the community should consider most problems to be addressable using measurement-first protocols? If not, could the authors comment on what families of problems one should expect fully-quantum protocols to be much more powerful than measurement-first protocols? Providing a few concrete examples of widely-studied quantum machine learning problems where they expect their results might be relevant would also be useful in this context.

Comment

We sincerely thank the reviewer for their thoughtful comments and suggestions. In response, we have made several important clarifications that we believe strengthen our work. First, we have provided a more concrete description of the task leading to the separation between "measurement-first" and "fully quantum" methods, clarifying its connection to Theorem 1, and we have made cosmetic revisions to improve the clarity and readability of the presentation. Additionally, we have expanded the discussion regarding the relevance of our results to natural quantum learning problems. We elaborated on cases where fully-quantum protocols are essential, especially in learning problems related to (F)BQP/qpoly, and highlighted a promising avenue for future research: the potential for a separation in learning tasks involving the ground states of sufficiently complex local Hamiltonians. We hope that these clarifications, along with the more detailed conclusion, will give the reviewer a clearer understanding of the broader applicability of our results. With these revisions and clarifications, we hope the value and novelty of our work is now more apparent, and we hope the reviewer might reconsider their evaluation of our paper.

Comment

- Do the authors consider the separation to hold in many natural learning problems that people are actively working on? Could the authors comment on whether the community should consider most problems to be addressable using measurement-first protocols? If not, could the authors comment on what families of problems one should expect fully-quantum protocols to be much more powerful than measurement-first protocols? Providing a few concrete examples of widely-studied quantum machine learning problems where they expect their results might be relevant would also be useful in this context.

We thank the reviewer for this insightful question. Indeed, recent years have seen significant results where measurement-first protocols provide surprisingly strong theoretical guarantees in various physically-relevant tasks, such as learning phases of gapped Hamiltonians (e.g., see [1,2]). Our result, however, highlights that there are learning tasks where a fully quantum protocol is essential. In particular, our results indicate that ML problems related to the hard problems in (F)BQP/qpoly are the learning problems where fully-quantum protocols are required.

As a consequence of this, we would like to draw the reviewer’s attention to an interesting connection of our result with more physically relevant tasks. According to an important result by [3], any quantum computation that can be solved with polynomial-sized quantum state advice can also be solved using a different quantum circuit with advice derived from the ground state of a local Hamiltonian. Formally, this implies that any task in BQP/qpoly can be tackled within BQP using a quantum state from the ground state of a local Hamiltonian as advice. Based on this, we speculate that learning problems where the input data is the ground state of a sufficiently complex local Hamiltonian might exhibit a separation between measure-first and fully quantum protocols. Although we recognize the distinctions between relational and decisional problems, and we acknowledge that rigorously proving such learning separations involves significant complexities, we believe a promising direction for exploring the limitations of measure-first protocols in physically relevant tasks would be to investigate the learning properties of local Hamiltonian ground states. This is an intriguing avenue for future research, though it would likely require a dedicated, separate project.

We already touched upon this in the previous version of the conclusion, but for completeness, we have expanded the conclusion section in the revised version to provide a more in-depth discussion on this matter.

[1] Huang, Hsin-Yuan, et al. "Provably efficient machine learning for quantum many-body problems." Science 377.6613 (2022).

[2] Huang, Hsin-Yuan, et al. "Quantum advantage in learning from experiments." Science 376.6598 (2022).

[3] Scott Aaronson and Andrew Drucker. "A full characterization of quantum advice". Proceedings of the forty-second ACM symposium on Theory of computing.

Comment

- While the paper discusses robustness to noise theoretically, no numerical simulations or experimental results are provided to demonstrate the practicality of the proposed protocols. Given the theoretical claims regarding noise robustness for the separations established in this work, could the authors add a numerical experiment showcasing the separation under noisy settings? For example, it would be beneficial to simulate your protocols with realistic noise models for near-term quantum devices. It would also be useful to see how the separation between measure-first and fully-quantum protocols changes as noise increases.

We appreciate the reviewer’s suggestions for numerical simulations and experimental validation. We would like to mention that we are already collaborating with an experimental team on this front. However, there are significant challenges in conducting such experiments even just in ideal simulations. For instance, demonstrating that the separation holds "for all" MF protocols requires identifying the optimal MF protocol for the given task, which is a complex problem. While we have some ideas on how to address this, it is a substantial undertaking that we believe warrants a separate, dedicated project. Moreover, we would like to clarify that this is a theoretical paper, where we provide rigorous theoretical guarantees on the robustness to noise. While investigating the effects of different noise models on our protocols is certainly an interesting direction, we believe it falls outside the scope of this work. In particular, since our primary focus is on establishing the theoretical separation between measure-first and fully-quantum protocols, rather than on noise analysis itself, we have chosen not to include numerical simulations. Investigating other aspects would present a completely different set of challenges and convey a different message, which is outside the scope of our current work.

- The main result (Theorem 1) is stated in terms of an existence statement. Could the author provide a more concrete description regarding the task that leads to the separation between "measurement-first" vs "fully quantum" methods?

We thank the reviewer for their comment, and we are glad to have the opportunity to improve the presentation of the task that leads to the separation. In the initial manuscript, the task that results in the separation between "measurement-first" and "fully quantum" methods is outlined in Section 3.1, with a high-level overview provided in Section 1.1. To enhance clarity, in the revised version of the paper, we explicitly state that the task referenced in Theorem 1 is the same as the one defined in Section 3.1, and we have made several cosmetic changes to improve readability.

Comment

We thank the reviewer for taking the time to provide their feedback. We are pleased that they recognize the relevance of our work within the field and acknowledge our use of diverse proof techniques, which distinguishes our results from previous works.

Comment

Dear Reviewer,

The authors have provided their rebuttal to your comments/questions. Given that we are not far from the end of author-reviewer discussions, it will be very helpful if you can take a look at their rebuttal and provide any further comments. Even if you do not have further comments, please also confirm that you have read the rebuttal. Thanks!

Best wishes, AC

Comment

I would like to thank the authors for their responses. Their responses have addressed my questions.

Review
Rating: 3

The authors design a quantum machine learning task that exhibits a sample and time complexity separation between two classes of quantum algorithms. Both act on input training data consisting of quantum data and classical labels, with the goal of producing a classical description of a quantum circuit that can then be deployed on unseen quantum data to produce samples from a distribution related to the training data. The main difference between the two classes is that the more powerful class is allowed to process the quantum-classical training data in "one go," while the weaker class is hamstrung to first turn the quantum data into classical data, one training data-point at a time, without looking at the classical labels, before being allowed to process this now-classical data together with the classical labels through a quantum algorithm to produce a classical description of the deployment quantum circuit. At deployment time, the weaker class is further forced to measure the unseen quantum data before feeding it into the quantum circuit produced at the end of the training phase.

Unsurprisingly, there is a complexity separation between the fully quantum algorithm and the hamstrung quantum algorithm which seems to essentially be a restatement of the intractability of the Hidden Matching Communication problem. Finally, the authors claim to extend the complexity separation to efficiently preparable training data sets by using pseudo-random functions.

Strengths

  • Originality: The authors venture to create a quantum machine learning task with average-case complexity separations between measurement-first and fully-quantum algorithms.
  • Quality: The authors seem to know the techniques of the sub-field well (including those related to POVMs, the HM problem, Yao's principle, QPRFs, ...), as well as other related work (including classical shadows, shadow tomography, ...).
  • Clarity: The authors do try to provide diagrammatic, high-level, and low-level explanations.
  • Significance: The authors do aim to produce a significant result by hoping to shed light on the role of information loss due to measurement when performing quantum machine learning in the average-case setting with realistic training data.

Weaknesses

Unfortunately, the high aims and strong potential of the "Strengths" section seem not to be met. I believe this is mainly due to the overly restrictive setting of the weakened class of "measurement-first" quantum algorithms. The most confusing and vague constraint is: "that the measurement strategy cannot depend on the specific target concept". Looking at Definitions 5 and 7 for clarity, this seems to be the following constraint on the hamstrung class:

Training: Quantum data + classical labels -----[Point-wise measurement]------> M(Quantum data) + classical labels = Classical data + classical labels ----[Quantum Algorithm]----> Classical description of Quantum circuit

Deployment: Quantum data -----[Point-wise measurement]------> M(Quantum data) = Classical data ----[Quantum Algorithm given by circuit above]----> Sample

Therefore it seems that the main difference between the two classes is that the weaker one only ever operates on classical data during both training and deployment, while the stronger one is allowed to operate on quantum data. This is a difference in data capacity and not really a separation between interesting algorithmic classes.
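Schematically, the restriction described above amounts to the following (an illustrative Python sketch; all function names are hypothetical stand-ins, and the "measurement" and "learner" are toys standing in for a POVM and a quantum learning algorithm):

```python
def measure_first_protocol(fixed_measurement, learner):
    """Caricature of the measure-first restriction: the measurement is
    chosen once, independently of the target concept and of the labels."""
    def train(quantum_data, labels):
        # Training: point-wise, label-blind measurement of each data point,
        # then a (possibly quantum) learner acting only on classical outcomes.
        classical_data = [fixed_measurement(rho) for rho in quantum_data]
        circuit = learner(classical_data, labels)
        # Deployment: unseen quantum data is also measured first.
        def deploy(rho_new):
            return circuit(fixed_measurement(rho_new))
        return deploy
    return train

# Toy usage: a "measurement" that sums amplitudes and a threshold "learner".
train = measure_first_protocol(
    fixed_measurement=lambda rho: sum(rho),
    learner=lambda xs, ys: (lambda m: m > sum(xs) / len(xs)),
)
deploy = train([[0.1, 0.2], [0.7, 0.9]], labels=[0, 1])
```

The fully-quantum class would instead receive `quantum_data` (and, at deployment, `rho_new`) directly, with no `fixed_measurement` bottleneck.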

Secondly, there do seem to be quite a few unclear statements or mistakes:

  • Lines 157-159 are unclear
  • Lines 205-210 claim a significant difference between this work and Jerbi et al.'s, but given the above characterization it is not clear that this is the case.
  • Line 229 introduces f'; why the prime?
  • \pi_x is not anywhere clearly connected to \Lambda_x
  • z and x are used inconsistently/confusingly in multiple places
  • Eq (3) and Eq (5): the training data should not include x; perhaps g_i(x) is somehow meant (however, see below)?
  • Eq (3) and Eq (5): (y, b) ~ \pi_x and not (x, y, b)
  • Line 285: perhaps \tilde{\pi}_x should be \tilde{\pi}_{T_x}, since the x dependence is surely only through T_x
  • Eq (8): how can x appear as an unbound function variable on the LHS and as a bound variable on the RHS?
  • Line 370: the use of Aaronson et al. is at the very heart of the whole paper, yet this is not reproduced anywhere in the paper
  • Line 434: please remind me what non-uniform means in this context.
  • Eq (9): the notation of dot and unfilled function brackets is not defined/explained.
  • Lines 475-480: unclear, especially given that there are supposed to be different random functions for each data-point.
  • Eq (11): if g_i(x) is of size n+1, doesn't this change the learning problem in unclear ways, not least because the dimensions no longer match (2n + 1 != 2n + 2)?
  • Eq (29): \pi_x(f) -> \pi_x(f^{(k)})
  • Line 787: is query access -> has query access

Questions

  • How can my following perceived weakness from above be remedied: "This is a difference between data and not really about a separation between interesting algorithmic classes."
  • Is my understanding correct that the essence of the separation between the two classes boils down to the inability to compress the unseen quantum data during deployment time? What would happen if the measurement restriction during training were modified to allow the measurements to depend on the labels? Or if we must keep them blind of the labels, joint measurements across all data-points were allowed?
  • Line 289: Why is the "more general" case not studied, when it is arguably much more interesting than the restrictions under consideration?
  • Theorem 5: What are the assumptions of Yao's principle? Are they satisfied?
  • Line 810: How can Bob output 1 when y is in R_f(x) without knowledge of f?
Comment

We sincerely thank the reviewer for their detailed comments and suggestions, which have greatly helped us to better articulate the significance of our work. In our reply, we have addressed several points of concern and clarified key aspects of our results. Specifically, we have provided a broader context for the "measure first, ask questions later" paradigm to demonstrate the nontrivial nature of our separation results. We have also explained why our reduction is both meaningful and distinct from a mere restatement of known results, particularly in the context of efficiently preparable states and real-world machine learning scenarios.

We hope that the explanations provided in our reply may lead the reviewer to a more favorable opinion of our work as we have clarified that: (i) while the result might initially appear unsurprising, the broader context of the power of MF protocols underscores that establishing such a separation is far from trivial, and (ii) we emphasize that our result is not merely a restatement of the classical intractability of the hidden matching problem but represents a significant contribution that bridges multiple foundational concepts in quantum machine learning. Thank you for your careful consideration, and we hope these clarifications will help in re-evaluating the merit of our work.

Comment

Thank you for attempting to address my concerns.

First let me step back and make the following orientating remark: I do acknowledge that the worst-to-average aspect, the analysis of the noisy scenario and the consideration of an efficiently preparable input state takes the HM result into new interesting territory.

I worry that I may be completely misunderstanding the setting. My current understanding hinges on the fact that the learning task is to learn x (which is why I thought it was a typo that x was included in the training data and it is how I understood the necessity of introducing g_i: to make the learning task non-trivial). Once x is learned then U_x and hence \Lambda_x (the POVM) can easily be constructed (a bias that should be afforded to the learning algorithm since the data has well defined structure). I have also understood any particular f to be a random "scrambling" mechanism to hide the fixed, to-be-learned x.

Now, this paper's aim is to bring insight into the field of QML; it is first and foremost an ML paper and should be judged on the quality/non-triviality of the ML task, the naturalness and capacities of the learning algorithms, and the manner in which the task teases out performance separations between the algorithms. This is the source of my dissatisfaction with the paper: HM is a communication task that I think has unsatisfactorily been recast as a learning problem. However, I realise I am not clear on what the learning task is. I currently think the task should be to learn x. If it is not this, and it is to turn a blind eye to the known structure of the input data (that x is the only actual unknown data, not the construction steps of U; a familiar cryptographic/data-compression lens on ML: computational intractability/cost, not obfuscation, is the right adversarial/interesting setting) and to pretend we don't know how the (y,b)'s were constructed, then what is the task? If we are given x but the MF algorithm is not allowed to use it, then what can the MF algorithm do at all? What is the best MF attempt? It would seem that without the task of learning x there is no learning task. There might as well not be any training stage at all. The best MF attempt would then be to focus only on the deployment stage. It seems the MF algorithm must then learn x and f at deployment? To justify this assessment:

  • The authors themselves state: "Importantly, the classical description cannot be tailored to the specific concept being learned; otherwise, the learning task could be effectively solved at the measurement stage, negating the need for further analysis." This is a great way to state my reservation. My position is that, of course, the classical description should be allowed to be tailored to the specific concept being learned! This is what learning is all about, tailoring responses to inputs. Therefore if I insist that this is a minimum power that should be granted the MF algorithm, then the authors already agree that this "negat[es] the need for further analysis."
  • The left side of Figure 1, itself suggests that the measurement stage (the yellow box) should at least be allowed to simultaneously analyse T_x and \rho^*, which of course does not match the actual restrictions placed on the MF algorithm. The MF algorithm should more accurately be drawn with two disconnected yellow boxes, which shows the artificiality and the classicality of MF's setting.

It is my above understanding (that the training data should not contain x, that Bob does not have access to x, and the mistakes in Eq 8 and the mismatch of dimensions in Eq 11) that led me to assign a score of 1 for soundness. However, now it seems that the mistakes and mismatch were typos or not serious, and that it was intended all along for the training data to include x? If this is the case, then yes, the soundness of the math is fine, but now I contend that the soundness of the learning task is poor.

Comment

I have had time to revisit my comment about Line 810 and the authors' response, which states:

there is no "Bob" present in this context ... we note that the distinguisher is provided with knowledge of both f and x, and can thus efficiently check whether y is in R_f(x) ... the distinguisher algorithm itself is not situated within the communication complexity setup of the HM problem.

However, this surely can't be the case, because in the paper the authors state: "In particular, if Eq. 31 does not hold, then one can construct a one-way communication protocol for HM"

In other words, HM is situated in the proof via proof by contradiction. That is, if (30) holds (by assumption, which will be shown to produce a contradiction), we check the two cases for (31): if (31) holds then we break the PRF, and if (31) does not hold we break HM. In particular, if (31) does not hold, we enter the HM setting and use the measure-first protocol to break HM. As the authors state: "Bob perform steps (2) − (3)...[and] step (6)". Bob does not have access to f in HM (I do see that he does have access to x here, but only because he has to generate T_x^M, which is unrelated to my concern that x should not be part of the training data in the ML task setting).

In conclusion, I now worry that Theorem 3 is not sound in its current form.

Finally, I have a minor new request for clarification:

Theorems 2 and 3 refer to the untrainability of MF for $\epsilon \cdot \delta > c$ and $\epsilon \cdot \delta \cdot p > c$. Is this supposed to be $(1-\epsilon) \cdot (1-\delta) > c$ and $(1-\epsilon) \cdot (1-\delta) \cdot p_{succ} > c$?

Please do address my previous and above concerns before the end of the discussion period.

评论

Dear Authors,

Please take a close look at the replies from the Reviewer RYau. In particular, further clarifications were required and please try to respond accordingly. Thanks.

Best wishes, AC

评论

We thank the reviewer for their response and for acknowledging that our techniques take the HM result into new and interesting territory.

While we fully respect the reviewer’s judgment regarding the quality and non-triviality of the ML task, as well as the naturalness/capacities of the learning algorithms and the manner in which the task teases out performance separations between the algorithms, we believe certain details may have been overlooked. We are grateful for the reviewer’s detailed outline of their understanding and for raising specific questions about these aspects, as this allows us to directly address areas where potential misunderstandings may have arisen.

评论

First, allow us to elaborate further on our learning setting and clarify the potential misunderstanding surrounding it. The reviewer correctly notes that the goal is to learn $x$ and reproduce the corresponding measurements, with the $g_i$ functions ensuring the non-triviality of the task. However, the interpretation of the role of the $f$ functions as "random scrambling mechanisms" to obscure $x$ is not accurate. Instead, the $f$ functions (or, more precisely, their associated phase states) serve as inputs to the learning problem. To provide a clearer analogy: in supervised learning, the data typically consists of (data point, label) pairs. In our setting, each phase state associated with a given $f$ represents a data point. We apologize for any confusion caused by our notation.

The key challenge in this learning task (i.e., its non-triviality) does not arise from the $f$ functions scrambling $x$, but rather from the fundamental limitations of any measurement strategy to efficiently compress the (pseudorandom) phase states into succinct classical descriptions. Importantly, these phase states do admit such a succinct classical description (e.g., the circuit that prepares them), which contains sufficient information to approximately and efficiently recover the measurements associated with all $\Lambda_x$ for a subset of $f$ values (i.e., in the average-case setting). However, our main result demonstrates that no measurement strategy can efficiently compress the phase states into such descriptions.

Our primary goal is to use the HM task as a tool to study the limitations of quantum-to-classical data compression in a machine learning context. What distinguishes our approach from previous results on quantum-to-classical data compression limitations is the incorporation of several key considerations ubiquitous in machine learning settings. Specifically, we extend the analysis to the average-case scenario, analyze the robustness of the task under noise, and consider efficiently preparable input states that have succinct representations but are provably infeasible to identify in polynomial time. These aspects, as the reviewer acknowledged, take the HM problem into new and interesting territory.

In summary, while the $g_i$ functions ensure the non-triviality of the learning task, the primary barrier lies in the impossibility of efficiently compressing pseudorandom phase states, not in the $f$ functions obscuring $x$. We hope this clarification resolves any misunderstandings and strengthens the reviewer’s confidence in the contributions of our work.

评论

Next, allow us to clarify some misunderstandings regarding the naturalness and capacities of the learning algorithms and the manner in which the task reveals performance separations between them. We hope to address these points and resolve any remaining concerns.

Reviewer: If we are given $x$ but the MF algorithm is not allowed to use it, then what can the MF algorithm do at all? What is the best MF attempt?

We thank the reviewer for this insightful question, which led us to reflect on the exposition of our result and prompted improvements in the revised version of the manuscript (see Appendix D). To clarify, consider an MF protocol defined by $(M, A)$, where $A$ corresponds to the blue box in Fig. 1 and $M$ corresponds to the yellow box in Fig. 1. It is important to emphasize that the learning algorithm $A$ in the MF protocol can utilize $x$ -- whether $x$ is provided directly via the data or learned via the $g_i$. However, the key restriction lies in the measurement strategy $M$, which is responsible for compressing the quantum phase states into a succinct classical description. This $M$ must remain independent of $x$ and always perform the same operations for all $x$. One might naturally ask whether relaxing this restriction to allow $M$ to depend on $x$ would improve performance. However, if $M$ were allowed to read $x$ and consequently have its measurements/steps depend on it, then it could simulate any fully quantum (FQ) protocol, and no separation between MF and FQ protocols would remain. This restriction is therefore crucial to defining the MF protocol in a meaningful way and preserving the separation between MF and FQ protocols. Next, let us consider two examples that illustrate the potential power and capabilities of MF protocols, as well as precisely how they are permitted to utilize the $x$ obtained from the dataset.

APPROACH 1:
In this approach, the measurement strategy $M$ employs the classical shadow method from [1] to obtain a classical representation of the input quantum states. The learning algorithm $A$ then uses the data $\hat{T}_x$ to determine $x$ and outputs a hypothesis $h_x$ as follows: $h_x$ uses the classical shadow $\hat{\rho}^*$ (Fig. 1) to estimate the probabilities of outcomes corresponding to $\Lambda_x$. Based on these estimates, $h_x$ generates samples from the estimated probability distribution of $\Lambda_x$. While classical shadows can, in principle, be used to approximate the required distributions, this would require exponentially many copies of the quantum state due to the exponential precision needed to achieve small total variation distance. Crucially, our main result shows that no improvement to the classical shadow protocol can make it feasible to obtain a succinct classical description of the phase states that is sufficient for the ML task at hand.
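To give a concrete feel for the precision bottleneck in Approach 1, here is a small self-contained numerical sketch (our own illustration, not code from the paper): estimating each of the $2^n$ outcome probabilities of a measurement from $m$ independent samples incurs a total variation error on the order of $\sqrt{2^n/m}$, so pushing the error below a fixed $\epsilon$ requires $m \sim 2^n/\epsilon^2$ samples.

```python
import numpy as np

# Toy illustration (not the paper's protocol): estimating a distribution
# over 2^n measurement outcomes from independent samples. The empirical
# total-variation (TV) error scales roughly as sqrt(2^n / m) for m samples,
# so driving it below a fixed epsilon needs m on the order of 2^n / epsilon^2.

def empirical_tv(n_qubits: int, m_samples: int, rng) -> float:
    d = 2 ** n_qubits
    # A random target distribution over the 2^n outcomes.
    p = rng.dirichlet(np.ones(d))
    # Draw m samples and form the empirical estimate.
    counts = rng.multinomial(m_samples, p)
    p_hat = counts / m_samples
    return 0.5 * np.abs(p - p_hat).sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for n in (4, 8):
        coarse = empirical_tv(n, 10 * 2 ** n, rng)    # m ~ 10 * 2^n
        fine = empirical_tv(n, 1000 * 2 ** n, rng)    # m ~ 1000 * 2^n
        print(f"n={n}: TV with few samples {coarse:.3f}, with many {fine:.3f}")
```

Doubling $n$ squares the sample budget needed for the same TV error, which is exactly the exponential blow-up referred to above.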

APPROACH 2:
In this approach, the measurement strategy $M$ attempts to identify a polynomially-sized description of the circuit $U$ that approximately prepares the input phase state (i.e., $U|0\rangle\langle 0|U^\dagger \approx \rho$). The learning algorithm $A$ again uses the data $T_x$ to determine $x$ and outputs a hypothesis $h_x$ that works as follows: $h_x$ utilizes the circuit description of $U$ provided by $M$ to prepare the phase state and then implements the measurement $\Lambda_x$. However, while pseudorandom phase states can be prepared by polynomial-sized circuits, our results -- using tools from the HM problem and pseudorandom quantum states -- prove that no measurement strategy $M$ can efficiently infer the circuit that prepares these states. While this protocol might initially appear efficient, our results highlight fundamental obstacles that prevent it from succeeding.

In both approaches, we emphasize a clear delineation of dependencies on $x$: the behaviour of $M$ is entirely independent of $x$ (e.g., $M$ consistently runs the classical shadow protocol or attempts to find a preparation circuit, regardless of the specific $x$). However, the learning algorithm $A$ and the hypothesis $h_x$ it produces are allowed to depend on $x$, as $A$ learns from the data and obtains $x$.

[1] Huang, Hsin-Yuan, Richard Kueng, and John Preskill. "Predicting many properties of a quantum system from very few measurements." Nature Physics 16.10 (2020): 1050-1057.

评论

Finally, allow us to address the justifications provided by the reviewer in their bullet-pointed list:

Reviewer:

  • The authors themselves state: "Importantly, the classical description cannot be tailored to the specific concept being learned; otherwise, the learning task could be effectively solved at the measurement stage, negating the need for further analysis." This is a great way to state my reservation. My position is that, of course, the classical description should be allowed to be tailored to the specific concept being learned! This is what learning is all about: tailoring responses to inputs. Therefore, if I insist that this is a minimum power that should be granted to the MF algorithm, then the authors already agree that this 'negat[es] the need for further analysis'.

Our response: Recall that an MF protocol consists of two components: $M$, which performs measurements and does not depend on $x$, and $A$, which is a learning algorithm that can depend on $x$. As we aimed to illustrate in the examples above, the learning in an MF protocol -- i.e., tailoring its responses to the input -- occurs entirely within the learning algorithm $A$, which operates on the classical description of the phase states provided by $M$.

If we were to allow $M$ to also depend on $x$, then $M$ could effectively simulate any fully quantum (FQ) protocol. This would entirely erase the distinction we aim to investigate, as our goal is to explore the inherent separation between MF and FQ protocols. This restriction on $M$ is not arbitrary but foundational to our study, as it ensures that the measurement-first approach adheres to its defined paradigm and allows for meaningful comparison with fully quantum protocols.

Reviewer:

  • The left side of Figure 1 itself suggests that the measurement stage (the yellow box) should at least be allowed to simultaneously analyze $T_x$ and $\rho^*$, which of course does not match the actual restrictions placed on the MF algorithm. The MF algorithm should more accurately be drawn with two disconnected yellow boxes, which shows the artificiality and the classicality of MF's setting. It is my above understanding (that the training data should not contain $x$, that Bob does not have access to $x$, and the mistakes in Eq. 8, the mismatch of dimensions in Eq. 11) that led me to assign a score of 1 for soundness. However, now it seems that the mistakes and mismatches were typos or not serious, and that it was intended all along for the training data to include $x$? If this is the case, then yes, the soundness of the math is fine, but now I contend that the soundness of the learning task is poor.

Our response: We greatly appreciate the reviewer’s detailed feedback, as it helped us identify and clarify potential points of confusion. Upon reflection, we agree that Figure 1 in the original manuscript could lead to misinterpretation. To address this, we will revise the figure in a future updated version of the paper to make the protocol clearer, which we hope will resolve the reviewer’s concerns. Allow us to explain the changes we plan to implement.

In the previous version of Figure 1, an arrow was drawn to indicate that $T_x$ and $\rho^*$ were inputs to the measurement strategy $M$. This was indeed misleading for two reasons:

  1. It could give the impression that $T_x$ (training data) and $\rho^*$ (deployment phase data) are simultaneously accessible to $M$, whereas they pertain to distinct phases of the protocol.

  2. More importantly, $M$ in the MF protocol does not receive the labels that allow one to recover $x$; it only processes the quantum states $\rho$ from $T_x$. The labels $x$, or partial information about $x$ through the $g_i$, are accessed exclusively by the learning algorithm $A$.

In the updated figure, we will explicitly highlight this distinction by clearly separating the roles of $M$ and $A$ and indicating which of these has access to the labels $x$ (or partial information about $x$ through the $g_i$).

To further clarify: it was always intended that the training data includes $x$ (or partial information about it). We hope that this updated explanation, along with the revisions to Figure 1 and the illustrative examples discussed earlier, resolves any misunderstandings and provides a clearer picture of the protocol structure.

评论

We thank the reviewer once again for their detailed feedback and insightful questions, which have allowed us to better articulate the contributions and implications of our work. In particular, we addressed misunderstandings regarding the roles of $M$ and $A$ in MF protocols, clarified how learning occurs within the MF framework, and explained why $M$ must remain independent of $x$ to maintain a meaningful separation from FQ protocols. We also provided examples of MF strategies to illustrate their capabilities and the challenges they face, highlighting the non-triviality of the learning task and the connection to fundamental limitations in quantum-to-classical data compression. We hope these clarifications resolve the reviewer’s concerns and provide a clearer understanding of our work’s significance. We respectfully ask the reviewer to reconsider their evaluation in light of these points and hope our responses and explanations lead to a higher opinion of our work.

评论

We thank the reviewer for raising their concern, which has given us the opportunity to further clarify the correctness of our proof.

We firmly believe that Theorem 3 is true in its current form and also that the proof is correct, barring one potential misunderstanding that we clarify next. We acknowledge that this misunderstanding may have arisen due to our phrasing used in the proof and we clarify this below and in the updated version of the manuscript. Specifically, we currently write:

"Bob performs steps (2)−(3)...[and] step (6)."

A more accurate phrasing would be:

"Bob performs steps (2)−(3)...[and] the first part of step (6)."

To clarify further, Bob only executes the portion of step (6) where they output the sample $(x, y, b)$. Importantly, Bob does not verify whether $x \in R_f(y)$ for correctness, as such verification is unnecessary for the HM problem.

Allow us to reiterate the key reasoning behind our proof.

Since our proof of non-learnability is by contradiction, we begin by assuming that the learning task is $(\epsilon, \delta, p)$-MF learnable on pseudorandom phase states for $(1-\epsilon) \cdot (1-\delta) \cdot p > c$, where $c > 7/8$. We then show that this assumption leads to a contradiction.

  1. In the first part of our proof, we demonstrate that under this assumption, the distinguisher algorithm will output $1$ with probability $> c$ for all pseudorandom phase states.
  2. At this point, there are two possible scenarios:
    • The distinguisher also outputs $1$ with probability $> c$ over truly random phase states.
    • The distinguisher outputs $1$ with probability $\leq c$ over truly random phase states.
  3. We show that the first scenario leads to a contradiction with the hardness of the HM problem. Specifically, if the distinguisher outputs $1$ with probability $> c$ for all truly random phase states, we construct a protocol for the HM problem with success probability $> 7/8$ -- a result that has been proven impossible in the literature.

The protocol for the HM problem is as follows: Bob performs steps (2)−(3), Alice performs steps (4)−(5), and then sends $\hat{\psi}_f$ to Bob, who executes the first part of step (6) to output the sample $(x, y, b)$. Now, supposing the distinguisher outputs $1$ with probability $> c$ over truly random phase states, this implies that the pair $(y, b)$ sampled by Bob is correct for the HM problem with probability $> 7/8$, which contradicts the hardness of the HM problem.

Thus, we establish that the distinguisher must output $1$ with probability $\leq c$ for all truly random phase states. This implies that the distinguisher behaves strictly differently on pseudorandom phase states versus truly random phase states, violating the pseudorandomness assumption.

We hope this explanation adequately addresses the reviewer’s concerns and clarifies the correctness of the proof. Thank you again for the opportunity to elaborate on this crucial aspect of our work.

The reviewer raises the following concern:

"Theorems 2 and 3 refer to the untrainability of MF for $\epsilon \cdot \delta > c$ and $\epsilon \cdot \delta \cdot p > c$. Is this supposed to be $(1-\epsilon) \cdot (1-\delta) > c$ and $(1-\epsilon) \cdot (1-\delta) \cdot p_{succ} > c$?"

We thank the reviewer for pointing this out, as this is indeed the correct formulation of the theorems. Fortunately, this does not affect the validity of the theorems or the correctness of the proofs. We have updated the manuscript to reflect the correct formulation.

评论

Thank you for the detailed incorporation of the feedback.

First let me offer another two minor corrections:

  • pg 16: "leading A_f to most of the time output 0" -> surely it should be 50/50?
  • pg 16: "where they obtain the a sample ..." -> "where they obtain a sample ..."

Right so I think we are getting closer to understanding each other.

  • The learning task is to learn x. So the data point can't just be f (as stated in the response). Perhaps it is helpful to think that the data is (g(x), f) and the label is (y, b). Yes f is part of the data-point but its role IS to scramble x AND IT IS NOT LEARNABLE, since it is a completely random part of the data AND more importantly, a new random f is generated at deployment.
  • As we agree, for the learning to be non-trivial, g cannot be the identity, so the whole work should be presented with a non-trivial g from the outset (the identity case should not even be considered if designing a meaningful ML task is the goal).
  • The real result is that efficient classical-compression of quantum data is not possible. But this is HM and the authors attempts at turning it into a learning problem is not convincing. This can be seen in the two example MF strategies: for the classical shadows example it needs exponential data at deployment (under the version where x is given); for the circuit-learning perspective compression-DECOMPRESSION is the task which again is not a learning task from the traditional supervised ML point of view. It could be seen as a GENERATIVE task, but then, the separation between MF and fully-quantum is due to the difference between polynomial-sized classical (and therefore not actually about MF) and quantum random seed data.

My suggestion going forward is to rewrite this interesting work as an extension of HM (if this has not already been done in the literature) or if the authors really want to connect it to ML then somehow present it as a generative ML task (which now that I think about it is related to the other reviewer's connection to complexity classes with advice).

评论

We thank the reviewer for their reply and for pointing out the two minor corrections.


Reviewer: pg 16: "leading A_f to most of the time output 0" → surely it should be 50/50?


Our response: It should indeed be the case that A_f outputs 0 with probability greater than 1/8. We will clarify this in the updated version of the manuscript.

Allow us to address the other points below.


Reviewer: The learning task is to learn x. So the data point can't just be f (as stated in the response). Perhaps it is helpful to think that the data is (g(x), f) and the label is (y, b). Yes, f is part of the data point, but its role IS to scramble x AND IT IS NOT LEARNABLE, since it is a completely random part of the data AND, more importantly, a new random f is generated at deployment.


Our response: A more accurate description would be that the data can be thought of as f with labels (g(x), y, b), where y and b depend on both f and x.


Reviewer: As we agree, for the learning to be non-trivial, g cannot be the identity, so the whole work should be presented with a non-trivial g from the outset (the identity case should not even be considered if designing a meaningful ML task is the goal).


Our response: To ensure complete transparency and avoid introducing any "artificial" learning elements, we chose to present results in the main text with g as the identity. This approach highlights the fundamental difficulties of the problem without attributing any success to extraneous assumptions about g. That said, we agree that for many real-world machine learning tasks, non-trivial g plays an important role. We address this broader perspective in the appendix of our manuscript.


Reviewer: The real result is that efficient classical compression of quantum data is not possible. But this is HM, and the authors' attempts at turning it into a learning problem are not convincing. This can be seen in the two example MF strategies: for the classical shadows example, it needs exponential data at deployment (under the version where x is given); for the circuit-learning perspective, compression-DECOMPRESSION is the task, which again is not a learning task from the traditional supervised ML point of view. It could be seen as a GENERATIVE task, but then, the separation between MF and fully-quantum is due to the difference between polynomial-sized classical (and therefore not actually about MF) and quantum random seed data.


Our response: At the risk of being repetitive, we emphasize that our primary interest lies in exploring the fundamental limits of "measure-first protocols" within the context of supervised learning—a question that we believe remains meaningful and highly relevant to the QML community. We have demonstrated an impossibility result for a well-defined class of problems involving efficiently generatable quantum states and have provided detailed reasoning on why and how such classical approaches fail. Furthermore, through these discussions, we clarified the motivations for framing the task as we did, along with the boundaries of our definitions and the broader implications.

We recognize that the reviewer may feel that alternative formulations (e.g., generative tasks or compression-decompression tasks) might have framed the problem differently or addressed other related questions. While we respect this perspective, we believe our chosen focus and scope are valid and important, particularly given the separation we aim to show between measure-first and fully quantum protocols. At this stage, we also believe we have addressed any misunderstandings or concerns about the soundness of our results and do not see how they could justifiably merit a low score for soundness.

That said, we respect the reviewer’s final judgment on the suitability of our work for ICLR, which we continue to see as an appropriate venue given the relevance of the topic, the technical content, and the improvements made throughout this exchange. Regardless of the decision, we are deeply grateful to the reviewer for their detailed feedback and for raising insightful points that have helped significantly improve both the clarity and rigor of our work.

评论

- Eq (8): how can x appear as an unbound function variable on the LHS and as a bound variable on the RHS?

We thank the reviewer for finding this notational error on our part. x should not appear on the RHS. We fixed it in the revised version of the paper.

- Line 370: the use of Aaronson et al. is the very heart of the whole paper, yet this is not reproduced anywhere in the paper

For completeness, we now include a description of Aaronson’s U_x in the Appendix.

- Line 434: please remind me what non-uniform means in this context.

Non-uniform here means there does not need to be an efficient Turing machine that, on input $n$, outputs the quantum circuit for instance size $n$.

- Eq (9): the notation of dot and unfilled function brackets is not defined/explained.

Thank you for the remark. We explained the notation below Eq. 9.

- Lines 475-480: is unclear, especially given that there are supposed to be different random functions for each data-point.

Each data point corresponds to a specific function. The distinguisher evaluates the performance of the MF protocols based on the given function. If the MF protocols were effective on pseudorandom states (PRS), there would be a noticeable difference in performance: poor performance when the functions are truly random and good performance when they are pseudorandom. Since such a difference cannot occur, the MF protocols will also fail on pseudorandom functions. We further clarify the point in the main text too.

- Eq (11): if g_i(x) is of size n+1, doesn't this change the learning problem in unclear-ways, not least because the dimensions no longer match (2n + 1 != 2n + 2)

This change only affects the length of the output bitstring: instead of $(x, y, b) \in \{0,1\}^{2n+1}$, it becomes $(g_i(x), y, b) \in \{0,1\}^{2n+2}$. This increase of the length of the bitstring by 1 has no further consequences for the learning problem. In particular, the $(y,b)$ variables still follow the same probability distribution determined by the POVM $\Lambda_x$, so the learning problem remains fundamentally the same.

- Eq (29): \pi_x(f) -> \pi_x(f^{(k)})

We thank the reviewer for finding this typo, we fixed it in the revised version.

- Line 787: is query access -> has query access

We thank the reviewer for finding this error; we fixed it in the revised version.

评论

- Lines 157-159 is unclear

Currently is says:

”Directly applying these techniques to our learning setting that is focused on learning a $2^n$-outcome POVM by estimating the probability of each possible measurement outcome $j \in \{1,2,\ldots,2^n\}$ requires exponential precision $\epsilon$, achievable only with an exponential amount of resources.”

To provide more clarity, we changed it to

“As we want to learn the distribution associated to a $2^n$-outcome POVM with an $\epsilon$ error in TV, directly applying these techniques to estimate each of the $2^n$ outcomes would require exponential precision for each. In their upper bound estimates, this results in the requirement of an exponential number of samples.”

We hope this is clearer.

- Lines 205-210 claims a significant difference between this work and Jerbi et al.'s, but given the above characterization it is not clear that this is the case.

In Jerbi et al., the target is a quantum state $U|\psi\rangle$, and the goal is to learn the unitary $U$. Measurements are performed on the evolved state $U|\psi\rangle$, rather than on the initial state $|\psi\rangle$ alone. This distinction is critical and highlights a fundamental difference between the two works.

- Line 229 introduces f', why the prime?:

We thank the reviewer for noticing this typo, we removed the prime.

- \pi_x is not anywhere clearly connected to \Lambda_x

$\pi_x$ is a probability distribution over the bitstrings $(x, y, b) \in \{0, 1\}^{2n+1}$. Specifically, the probability distribution of the $(y,b)$ variables corresponds directly to the outcome of applying the POVM $\Lambda_x$ on the input state $|\psi_f\rangle$. This establishes a clear link between $\pi_x$ and $\Lambda_x$. We also highlight this in Section 3.1 below Definition 7.

- z and x are used inconsistently/confusingly in multiple places

This issue has been resolved by clarifying the notation. We now denote $j$ for the orthonormal basis of the POVM, and $z$ as the label for the quantum states, specifically $z = (x, y, b)$.

- Eq (3) and Eq (5): the training data should not include x, perhaps g_i(x) is somehow meant (however see below)?

We highlight in Appendix A that there are multiple scenarios to consider. In particular, $g_i(x)$ can be any function of $x$ that "leaks" information about $x$. One particularly simple-to-understand example is $g_i(x) = x$, which we have chosen to go with in the main text. We kindly refer to Appendix A for further details.

- Eq (3) and Eq (5): (y, b) ~ \pi_x and not (x,y,b)

Given how $\pi_x$ is defined, the correct notation is indeed $(x,y,b) \sim \pi_x$. However, note that only $(y,b)$ follows the probability distribution given by the POVM $\Lambda_x$ on the input states $|\psi_f\rangle$.

- Line 285: perhaps $\tilde{\pi}_x$ should be $\tilde{\pi}_{T_x}$, since the $x$ dependence is surely only through $T_x$.

Of course, the measure-first protocol (and the fully quantum one too) only receives information about $x$ through $T_x$. However, we feel it would overcomplicate the notation to write $\tilde{\pi}_{T_x}$ instead of $\tilde{\pi}_x$.

评论

- Theorem 5: What are the assumptions of Yao's principle? Are they satisified?

We refer to Yao's principle as stated in, for example, https://nerva.cs.uni-bonn.de/lib/exe/fetch.php/teaching/ws1819/vl-aau/lecturenotes05.pdf. The assumptions are that $\mathbb{A}$ and $\mathbb{X}$ are random variables following probability distributions over the sets $\mathcal{A}$ and $\mathcal{X}$, respectively. Our setting perfectly matches these conditions.

- Line 810: How can Bob output 1 when y is in R_f(x) without knowledge of f?

The reviewer appears to be referring to a step in the "distinguisher algorithm," whose goal is to differentiate between truly random and pseudorandom phase states. However, there seems to be a misunderstanding, as there is no "Bob" present in this context. We can understand the source of the confusion, as we do relate the performance of the distinguisher algorithm to the lower bounds of the Hidden Matching (HM) problem. Nevertheless, the distinguisher algorithm itself is not situated within the communication complexity setup of the HM problem.

Importantly, the use of pseudorandom states and distinguisher algorithms is a key distinction that sets our result apart from being essentially a restatement of the intractability of the HM problem. Finally, we note that the distinguisher is provided with knowledge of both $f$ and $x$, and can thus efficiently check whether $y$ is in $R_f(x)$.

评论

- How can my following perceived weakness from above be remedied: "This is a difference between data and not really about a separation between interesting algorithmic classes."

We agree that our results can be interpreted as demonstrating a separation in solving a learning task based on the type of data provided (classical vs quantum). However, we do not view this as a weakness of our work. On the contrary, the converse claim -- that there is no separation in value -- is precisely at the core of the "measure first, ask questions later" approach, which we set out to investigate and challenge. Addressing this separation is a significant and timely question within the quantum machine learning (QML) community, as highlighted in our general response above. We believe our work contributes to this ongoing discussion by providing a rigorous framework and evidence for when and why such separations emerge. To properly investigate this, it is necessary to study two classes of algorithms: one that can access the full quantum states and another that only receives classical descriptions of these states. Importantly, the classical description cannot be tailored to the specific concept being learned; otherwise, the learning task could be effectively solved at the measurement stage, negating the need for further analysis. This constraint underscores the significance and the necessity to study the two "algorithmic classes" (i.e., measure-first vs fully-quantum) as defined in our paper.

- Is my understanding correct that the essence of the separation between the two classes boils down to the inability to compress the unseen quantum data during deployment time? What would happen if the measurement restriction during training were modified to allow the measurements to depend on the labels?

We agree with the reviewer’s observation that the separation result stems from the measurement restrictions during deployment. However, we do not view this as a significant limitation of our results. In fact, previous notable works [2,3] have shown that fixed measurement schemes during deployment can still yield highly powerful models, challenging the perceived necessity of quantum computation and information in quantum machine learning (QML). Our primary goal was to examine the limitations of common MF protocols found in the literature, and we demonstrate that a separation exists for these variants of MF protocols. Exploring how far "quasi"-MF schemes can go is a very interesting related question and a natural next step, though it lies beyond the scope of our work. We have given thought to this direction and have the intuition that separations may also be established in more general cases -- e.g., when deployment involves a fully quantum strategy (rather than a "measure first, ask later" scheme). However, proving such separations formally would require significant additional work and novel results, which fall outside the current scope of this paper. Lastly, we note that our separation result holds even when measurements are performed on multiple input states, and we have clarified this point in the revised version of the paper.

- Line 289: Why is the "more general" case not studied, when it is arguably much more interesting than the restrictions under consideration?

We believe that our current result already makes a significant contribution by demonstrating a learning separation between models that are of particular interest to the community. This addresses what we consider to be the principal and most fundamental question in this context. The more general cases are natural and obvious follow-ups, which we fully intend to explore in future work.

Comment

For different reasons, we must respectfully disagree with the assessment that our results are essentially a restatement of the intractability of the Hidden Matching (HM) problem. While we completely agree that we reduce the intractability of a learning problem to a lower bound in communication complexity to derive our result, we do not see this as a weakness per se, as the reduction itself is far from trivial. In fact, we would argue that reducing to a known separation is not inherently a limitation -- after all, many foundational separations in quantum computing rely on reductions to established claims.

The reviewer’s statement appears to frame this as a shortcoming, and we interpret this to mean that the reduction is considered straightforward or uninteresting. On this point, we must respectfully disagree. Our results specifically address physical quantum states that are efficiently preparable on quantum hardware. It is important to note that the HM problem arises in the context of communication complexity, and its lower bounds do not directly apply in scenarios where states are efficiently preparable. In particular, if the quantum state can be efficiently prepared, the preparation circuit itself can be treated as a message in the communication protocol, thereby bypassing the standard lower bounds.

To address this challenge, we rigorously demonstrated that the learning problem remains classically intractable even for efficiently preparable states by leveraging pseudorandom states and imposing a time efficiency requirement on the learner -- a standard consideration in machine learning contexts. This additional step is crucial because, without it, the separation would have no meaningful relevance to real-world cases. With it, however, we provide compelling evidence that such separations extend more broadly, as long as the states under consideration are sufficiently “rich”.

Furthermore, due to our focus on a machine learning scenario, there are additional key aspects that set our work apart from the traditional HM problem: our analysis emphasizes both average-case correctness and robustness to noise, which are critical considerations in practical learning tasks that are not addressed by the HM problem. Covering all these common ML considerations in our result required us to apply the various techniques from the sub-fields listed by the reviewer in the “strengths” section.

In summary, we believe that the reduction we employ is not only nontrivial but also highly relevant to practical contexts, offering insights beyond what a straightforward restatement of the HM problem would provide. We hope this explanation helps clarify our perspective.

We are also surprised by the score of 1 for soundness (which we understand as technical correctness and absence of mistakes), as we are not aware of any errors in our proofs, and the reviewer has not identified any specific issues (we find that the “unclear statements or mistakes” listed by the reviewer were mostly typos and points we could have explained more clearly).

Comment

We thank the reviewer for their thoughtful feedback and appreciate their acknowledgment of our understanding of the various techniques employed in our proof. While we are naturally disappointed that the reviewer finds our results "unsurprising," we believe we can understand how one might arrive at this opinion. At first glance, it may appear that we simply compare a stronger model with a weaker one and then demonstrate the expected result—that they differ. However, we respectfully argue that the situation is far more nuanced, and we would like to take this opportunity to clarify why we believe our results are more surprising than they may initially seem.

The primary critique seems to be that our measure-first (MF) protocol, as defined, is too “hamstrung” (which we take to mean deliberately weakened) in comparison to the fully-quantum (FQ) protocol. While we understand this perspective, we would like to emphasize that recent advancements in the field have demonstrated that establishing a clear separation between these two protocols is far more challenging than it might initially seem.

To illustrate, it is worth beginning with the remarkable successes in the literature on shadow tomography and classical shadows. These breakthroughs have led leading researchers in the field to suggest that, in many cases, one can “measure first, ask questions later.” Indeed, a statement of this nature made during a conference explicitly inspired us to investigate the matter further.

Reflecting on the general evidence supporting this sentiment, MF protocols have shown surprisingly strong capabilities in a range of physical learning tasks. For instance, in a recent milestone work on predicting properties of gapped Hamiltonians using a variant of an MF protocol [2], it was shown that an algorithm employing a fixed measurement procedure (i.e., classical shadows of ground states [1]) could accurately predict numerous observables on the ground states of Hamiltonians.

Similarly, we would like to draw the reviewer’s attention to another recent work by experts in the field [3]. This study demonstrated that for many quantum input data learning tasks, classical convolutional neural networks, provided with classical shadow representations of quantum states as input (i.e., a MF protocol), achieved performance levels comparable to quantum neural networks (i.e., a FQ protocol). In fact, the authors explicitly raised the question of whether any task could conclusively demonstrate a clear separation between classical and quantum approaches.

To further contextualize this, consider our setting in quantum machine learning. For example, if in our task the labels corresponded to expectation values of local observables, there would be no separation between MF and FQ protocols due to the provable guarantees of classical shadows [1]. Thus, in light of this substantial body of evidence, we argue that a provable difference between MF and FQ protocols is far from obvious. Moreover, identifying a task that does exhibit such a difference is not straightforward.
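To make the classical-shadows guarantee we invoke here concrete, below is a minimal single-qubit simulation of the random-Pauli-measurement estimator of [1]: a fixed, label-independent measurement scheme whose recorded outcomes suffice to predict any single-qubit observable. The chosen state, observable, and snapshot count are illustrative assumptions, not the paper's construction.

```python
import numpy as np

# Single-qubit Pauli matrices and identity
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [X, Y, Z]

def classical_shadow_estimate(rho, obs, n_snapshots, rng):
    """Estimate Tr(obs @ rho) for a single qubit by simulating measurements
    in uniformly random Pauli bases and averaging the inverted-channel
    snapshots rho_hat = 3|b><b| - I of the classical-shadows protocol."""
    total = 0.0
    for _ in range(n_snapshots):
        basis = PAULIS[rng.integers(3)]      # random measurement basis
        _, vecs = np.linalg.eigh(basis)      # eigenvectors as columns
        probs = np.clip(
            [np.real(vecs[:, b].conj() @ rho @ vecs[:, b]) for b in range(2)],
            0.0, 1.0)
        b = rng.choice(2, p=probs / np.sum(probs))   # simulated outcome
        ket = vecs[:, b:b + 1]
        rho_hat = 3 * (ket @ ket.conj().T) - I2      # one shadow snapshot
        total += np.real(np.trace(obs @ rho_hat))
    return total / n_snapshots

rng = np.random.default_rng(0)
rho0 = np.array([[1, 0], [0, 0]], dtype=complex)     # |0><0|
est = classical_shadow_estimate(rho0, Z, 20000, rng)
print(abs(est - 1.0) < 0.1)   # Tr(Z|0><0|) = 1, so this prints True
```

The key point for the discussion above is that the measurement bases are drawn independently of any label or concept: this is exactly the kind of fixed, "measure first" data collection for which no MF-vs-FQ separation can arise when labels are local expectation values.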

We hope this clarification inspires the reviewer to appreciate our perspective and see further value in our work. Even if the final outcome aligns with what one might naively expect, it is important to note that recent (and often surprising) results in the field suggest otherwise.

References:

[1] Huang, Hsin-Yuan, et al. "Provably efficient machine learning for quantum many-body problems." Science 377.6613 (2022).

[2] Huang, Hsin-Yuan, et al. "Quantum advantage in learning from experiments." Science 376.6598 (2022).

[3] Bermejo, Pablo, et al. "Quantum convolutional neural networks are (effectively) classically simulable." arXiv preprint arXiv:2408.12739 (2024).

Comment

Dear reviewers, dear Area Chairs,

We sincerely appreciate the invaluable feedback provided by the reviewers. Their insightful comments have not only allowed us to clarify key aspects of our work but also to enhance it significantly in the final revised version.

In particular, we have made the following improvements to the manuscript:

  1. We clarified how our results could, in principle, extend to a broader scenario where the advice states are ground states of local Hamiltonians rather than phase states. While we emphasize this point in the Conclusion, we also underscore here that our work represents the first rigorous separation between measure-first (MF) and fully-quantum (FQ) protocols in a supervised learning task.

  2. To further illustrate the relevance of our findings, we connected them to existing literature highlighting the significant power of MF protocols. Additionally, we expanded the manuscript with Appendix D, which includes two illustrative examples of MF learning protocols. These examples showcase the strong capabilities enabled by the MF learning algorithm, and in discussing their limitations, we underline how our work significantly advances beyond the hidden matching (HM) problem and Aaronson’s results [1]. We respectfully believe that we have thoroughly expanded on the quality and significance of the machine learning task, the naturalness and capacities of the learning algorithms, and how the task highlights performance distinctions between them, both in our responses and the revised manuscript.

  3. We made several adjustments throughout the manuscript to improve readability and enhance the clarity of the text.

We hope this summary effectively conveys our dedicated efforts to refine the manuscript in response to the reviewers' valuable feedback. Once again, we are truly grateful to the reviewers for their input.

[1]: Aaronson et al., A Qubit, a Coin, and an Advice String Walk Into a Relational Problem, 2023

AC Meta-Review

This paper studied limitations for quantum machine learning algorithms that use fixed measurement schemes on the input quantum states. This is motivated by both recent advances in randomized measurement protocols as well as machine learning for quantum states. In particular, the authors proved limitations of measure-first protocols in the average case, improving on the state-of-the-art which only focuses on worst-case scenarios.

There were adequate discussions during the rebuttal, in particular between the authors and reviewer RYau. Technical contents were clarified and the paper was subsequently improved by the authors. However, the current results read more like an extension of the hidden matching problem and a contribution to complexity theory (worst-case to average-case), and their nature is closer to a quantum information problem than a machine learning problem. The paper is also purely theoretical, and another reviewer agreed with its limited practical application. Considering all this, the decision is to reject this paper at ICLR 2025.

For future versions of the paper, if the authors continue to submit to machine learning venues, more effort should be devoted to its connection to machine learning, in particular practical applications. Otherwise, it might be better suited to a quantum information/computation venue.

Additional Comments on Reviewer Discussion

There were adequate discussions during reviewer discussions. This is also reflected in the metareview above.

Final Decision

Reject