PaperHub
Rating: 6.5/10 (individual ratings: 6, 8, 6, 6; min 6, max 8, std 0.9)
Poster · 4 reviewers
Confidence: 3.3
Correctness: 3.0 · Contribution: 2.5 · Presentation: 3.0
ICLR 2025

QuaDiM: A Conditional Diffusion Model For Quantum State Property Estimation

OpenReview · PDF
Submitted: 2024-09-26 · Updated: 2025-03-06

Abstract

Keywords
quantum, property estimation, machine learning

Reviews & Discussion

Official Review (Rating: 6)

This paper addresses the challenge of Quantum state property estimation (QPE) in quantum many-body physics, focusing on predicting characteristics like correlation and entanglement entropy from measurement data. The authors introduce QuaDiM, a non-autoregressive generative model using diffusion models, which avoids the need for an intrinsic qubit ordering and treats all qubits equally and unbiasedly. QuaDiM learns to map physical variables to ground state properties during offline training and can sample from learned distributions for unseen variables to predict unknown quantum states. Empirical results on the 1D anti-ferromagnetic Heisenberg model with up to 100 qubits show that QuaDiM outperforms baseline models, particularly auto-regressive approaches, under limited training data and reduced inference sample complexity.

Strengths

I am not an expert in quantum physics, so I can only provide my insights from the perspective of general generative models.

First of all, the paper provides empirical evidence that models like diffusion models, which can capture the correlations between qubits, can outperform sequential-based models such as transformers and RNNs. This observation is an important finding in itself and can potentially steer research focus in the relevant AI4Science area.

Weaknesses

As mentioned earlier, I would like to raise some concerns from a generative model perspective:

Figure 3 is not informative to me; I can barely discern any difference between QuaDiM and LLM4QPE visually.

Based on Figure 2, the methodological novelty appears limited.

Could you please provide details about the computational resources used for the experiments? For example, the number of GPUs used, the training time, and the model size.

It would be beneficial if the authors could provide the standard deviation over seeds in Tables 1 and 2.

Minor typo:

Line 72: there is a double 'the'.

Questions

Please correct me if I am wrong in any of my previous comments :)

Comment

For training and inference in our experiments, a single 2080Ti GPU is used. Training our QuaDiM (the main part is a Transformer with 4 heads, 4 layers, and a hidden dimension of 128, plus a conditional embedding layer (FFN) encoding the physical parameters, totaling $\approx 800{,}000$ parameters) requires nearly 9 hours. We report the inference speed under the task of predicting correlations with $L=100$, $M_{in}=1000$, and $M_{out}=1000$. The results are as follows.

| Method | RMSE | Generated samples per sec. |
|---|---|---|
| CS | 0.0547 | - |
| LLM4QPE | 0.0531 | 14.6 |
| QuaDiM ($T_f=2000$) | 0.0478 | 5.7 |
| QuaDiM ($T_f=1000$) | 0.0537 | 8.1 |
| QuaDiM ($T_f=500$) | 0.0541 | 12.7 |
| QuaDiM ($T_f=100$) | 0.0882 | 37.4 |

As shown in the table above, when reducing inference to $T_f=500$ diffusion steps on a single GPU (2080Ti), QuaDiM achieves a lower RMSE score compared to the classical shadow baseline while demonstrating an inference speed comparable to LLM4QPE.

We acknowledge that the sampling time of diffusion-based models poses a significant challenge when applied to quantum many-body problems (as we already discussed in the limitation and conclusion section of our paper). This limitation is inherent to the nature of diffusion-based approaches. In our future work, we seek to explore the incorporation of techniques such as consistency training, potentially enabling one-step sampling for diffusion-based applications in the domain of quantum many-body problems. We remain optimistic about these possibilities and appreciate the reviewers' understanding of this ongoing effort.

Q4: It would be beneficial if the authors could provide the standard deviation over seeds in Tables 1 and 2.

A4: We sincerely thank the reviewers for the suggestion. In the Appendix, we have provided results including standard deviations, computed over four different random seed initializations. To comply with the page limit set by the ICLR committee, the new tables with standard deviations have been temporarily placed in the Appendix. Once the paper is officially published, the tables in the main text will be updated accordingly.

Q5: Minor typo: Line 72: there is a double 'the'.

A5: Thank you for your feedback. We have corrected this in the updated version of the paper.

Comment

Note that while diffusion models and transformers are established in the ML community, their adaptation to the quantum many-body domain is non-trivial. Unlike typical applications such as language modeling, quantum measurement data involve non-sequential, exponentially large dimensions with complex entanglement [9]. QuaDiM helps alleviate these issues by introducing a non-autoregressive generative framework for QPE. Here, we would like to briefly reiterate our contributions and innovations:

  • Non-Autoregressive Diffusion-Based Approach: QuaDiM is the first-ever (to the best of our knowledge) non-autoregressive conditional generative model specifically designed for QPE. Unlike standard approaches that rely on sequential dependencies (e.g., auto-regressive models), QuaDiM employs diffusion models to iteratively denoise Gaussian noise into the target quantum state distribution, with the hope of encouraging equal treatment of all qubits and removing bias introduced by sequential modeling. This framework is novel in the QPE context, addressing the limitations of existing machine learning models that fail to capture non-sequential, high-dimensional entanglement structures. (A minimal sketch of this denoising objective is given after this list.)
  • Scalability to Large Systems: Our empirical evaluation extends to quantum systems of up to 100 qubits (up to $2^{100}$ dimensions). This involved a large-scale data collection process from classical simulation. Notably, the simulation and data collection process utilizing the Matrix Product State (MPS) algorithm required nearly two weeks of computation on a cluster equipped with 4 Intel Xeon Gold 6248 CPUs (total cores: 80, total threads: 160). Moreover, with reduced sample complexity, our model outperforms state-of-the-art baselines.
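For concreteness, below is a minimal sketch of this conditional denoising objective (our illustration under stated assumptions; `denoiser`, `embed_tokens`, and `cond_mlp` are hypothetical names, not the paper's code). In Diffusion-LM-style training, the network is conditioned on the physical parameters and predicts the clean qubit-token embeddings from their noised version, denoising all $L$ positions jointly:

```python
import torch
import torch.nn.functional as F

def diffusion_training_step(denoiser, embed_tokens, cond_mlp,
                            measurements, phys_params, alpha_bar):
    """One training step of a conditional diffusion model over qubit tokens.

    measurements: (B, L) integer measurement outcomes in {0..K-1}
    phys_params:  (B, P) physical variables of the Hamiltonian
    alpha_bar:    (T,) tensor, alpha_bar[t] = prod_{s<=t} (1 - beta_s)
    """
    B = measurements.shape[0]
    x0 = embed_tokens(measurements)               # (B, L, d) continuous embeddings
    t = torch.randint(0, len(alpha_bar), (B,))    # random diffusion step per sample
    a = alpha_bar[t].view(B, 1, 1)
    noise = torch.randn_like(x0)
    xt = a.sqrt() * x0 + (1 - a).sqrt() * noise   # forward (noising) process
    cond = cond_mlp(phys_params)                  # condition on physical variables
    pred = denoiser(xt, t, cond)                  # predicts the clean embeddings x0
    return F.mse_loss(pred, x0)                   # all L qubits denoised jointly
```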

We again sincerely appreciate the reviewers’ thoughtful feedback. We hope that this clarification will aid in further understanding and help to underscore QuaDiM's potential as a valuable contribution to the field.

[1] Gebhart V, Santagati R, Gentile A A, et al. Learning quantum systems. Nature Reviews Physics, 2023, 5(3): 141-156.

[2] Carrasquilla J. Machine learning for quantum matter. Advances in Physics: X, 2020, 5(1).

[3] Chen Z, Newhouse L, Chen E, et al. ANTN: Bridging autoregressive neural networks and tensor networks for quantum many-body simulation. NeurIPS, 2023.

[4] Tang Y, Xiong H, Yang N, et al. Towards LLM4QPE: Unsupervised Pretraining of Quantum Property Estimation and A Benchmark. ICLR, 2024.

[5] Wang Z, Liu C, Zou N, et al. Infusing Self-Consistency into Density Functional Theory Hamiltonian Prediction via Deep Equilibrium Models. NeurIPS, 2024.

[6] Carrasquilla J, Torlai G, Melko R G, et al. Reconstructing quantum states with generative models. Nature Machine Intelligence, 2019.

[7] Xiao T, Huang J, Li H, et al. Intelligent certification for quantum simulators via machine learning. npj Quantum Information, 2022.

[8] García-Pérez G, Rossi M A C, Sokolov B, et al. Learning to measure: Adaptive informationally complete generalized measurements for quantum algorithms. PRX Quantum, 2021.

[9] Huang H Y, Kueng R, Torlai G, et al. Provably efficient machine learning for quantum many-body problems. Science, 2022.

[10] Lewis L, Huang H Y, Tran V T, et al. Improved machine learning algorithm for predicting ground state properties. Nature Communications, 2024.

[11] Huang H Y, Broughton M, Cotler J, et al. Quantum advantage in learning from experiments. Science, 2022.

Q3: Could you please provide details about the computational resources used for the experiments? For example, the number of GPUs used, the training time, and the model size.

A3: Thanks for your questions. Applying classical machine learning methods to quantum-related problems primarily faces challenges due to the significant complexity involved in simulating quantum systems. Our simulations reached up to 100 qubits (dimension $2^{100}$).

The computational bottleneck is not training and inference with deep learning models, but generating synthetic data, i.e., classically generating target states and simulating quantum measurements. (Needless to say, this classical bottleneck does not occur in actual experiments.) Even using the Matrix Product State (MPS) algorithm to accelerate the simulation, collecting the data took nearly two weeks on a cluster equipped with 4 Intel Xeon Gold 6248 CPUs (total cores: 80, total threads: 160).
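For intuition, here is a self-contained toy sketch (ours, not the paper's pipeline) of this classical data-generation step: exact diagonalization of the 1D antiferromagnetic Heisenberg chain for small $L$, followed by simulated projective measurements. The paper's actual pipeline uses MPS for $L=100$ and Pauli-6 POVM measurements; for brevity we diagonalize directly and sample plain $Z$-basis shots.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def heisenberg_ground_state(L, J=1.0):
    """Ground state of the 1D antiferromagnetic Heisenberg chain (open boundary).

    Exact diagonalization is only feasible for small L; MPS/DMRG is required at scale.
    """
    dim = 2 ** L
    H = np.zeros((dim, dim), dtype=complex)
    for i in range(L - 1):
        for P in (X, Y, Z):           # J (X_i X_{i+1} + Y_i Y_{i+1} + Z_i Z_{i+1})
            ops = [I2] * L
            ops[i], ops[i + 1] = P, P
            H += J * reduce(np.kron, ops)
    vals, vecs = np.linalg.eigh(H)
    return vecs[:, 0]                  # lowest-energy eigenvector

def sample_z_basis(psi, n_shots, rng=None):
    """Simulate projective Z-basis measurements: outcomes ~ |psi|^2 (Born rule)."""
    rng = rng or np.random.default_rng()
    p = np.abs(psi) ** 2
    p = p / p.sum()                    # guard against floating-point drift
    return rng.choice(len(psi), size=n_shots, p=p)

psi0 = heisenberg_ground_state(L=6)
shots = sample_z_basis(psi0, n_shots=1000)
```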

Comment

We sincerely appreciate the time and effort you have taken to provide thoughtful feedback and helpful suggestions. We are glad to address your questions in detail. We have addressed the typos in the revised version of the PDF and added necessary clarifications along with additional numerical experiments in the Appendix. The changes made in the PDF are highlighted in blue for easy reference.

Q1: Figure 3 is not informative to me; I can barely discern any difference between QuaDiM and LLM4QPE visually.

A1: Thank you for the valuable feedback. In Figure 3 of our main paper, our intention is to present a comparative visualization of the performance of our proposed model, QuaDiM, and the baseline LLM4QPE under varying numbers of samples $M_{out}$. Each subplot corresponds to a specific $M_{out}$, and the root mean squared error (RMSE) values, which indicate the predictive accuracy (lower is better), are highlighted in the upper-right corner of each subplot for reference.

We recognize that the color contrast in Figure 3 may not be sufficiently distinct. To address this, we include an additional plot, Figure 6, in the Appendix, where we present a more fine-grained visualization of the prediction performance of QuaDiM. In this figure, each point represents the absolute error (lower is better) between the predicted correlation and the ground truth for all pairs of qubits across different $M_{out}$ values.

From Figure 6, it can be seen that QuaDiM consistently outperforms LLM4QPE and achieves lower variance. Specifically, the absolute errors for QuaDiM are more narrowly concentrated around lower values, highlighting its consistent predictive performance. In contrast, LLM4QPE exhibits wider variability.

These results highlight the practical advantages of QuaDiM, especially in scenarios where sampling a large number of measurements is computationally expensive. We hope this additional clarification enhances your understanding of our model's strengths.

Q2: Based on Figure 2, the methodological novelty appears limited.

A2: We sincerely thank the reviewer for the feedback. We would like to highlight that quantum state property estimation (QPE) is a challenging problem in the field of quantum physics and computing [1,2], with implications for quantum simulation, cryptography, and hardware validation. By investigating the critical challenge of efficiently predicting quantum state properties in large-scale systems (up to $2^{100}$ dimensions in our work), QuaDiM highlights the practical feasibility of deep learning in quantum sciences.

First, we emphasize applying deep learning frameworks to QPE, a challenging task in quantum many-body physics that has garnered increasing attention in recent years. Many advanced works have been published at AI conferences, focusing on both the adaptation and application of deep learning methods such as autoregressive models [3], pretraining strategies [4], and deep equilibrium models [5]. Beyond these, a series of contributions have emerged from both experimental [6,7,8] and theoretical [9,10,11] perspectives. We aspire for this paper to serve as a source of inspiration for researchers across both the AI and quantum computing domains.

Comment

Dear Reviewer kHhj,

I hope this message finds you well. As the rebuttal phase deadline is approaching, we would like to kindly request your further feedback to ensure a comprehensive revision and to ensure our clarifications have addressed your concerns effectively. Your feedback is invaluable, and we greatly appreciate your time and insights.

Best regards

Comment

We sincerely thank the reviewer for the additional feedback. Your suggestions are invaluable in helping us refine our manuscript, and we will do our best to address your concerns thoroughly.

It is worth noting that predicting properties of quantum states using deep learning techniques is indeed a cutting-edge research area, and the efficiency of these techniques in this domain remains an open question. We would like to briefly share some of our observations here and maintain an open stance for further discussion with the reviewer.

From a theoretical perspective [1,2], for learning and predicting correlations (equivalent to the expectation of local observables) and two-body entropy as investigated in our paper, the required $M_{in}$ does not scale exponentially and is independent of the number of qubits $L$. However, in general cases, reconstruction of the full representation of quantum states (known as quantum tomography, an orthogonal research direction to our work) typically requires an exponentially large number of training measurements.

From a numerical perspective, recent work [3] based on autoregressive models has explored new deep learning methodologies to reduce $M_{in}$, achieving SOTA results. In our paper, we approached this challenge from the perspective of the non-sequential nature of the distribution corresponding to quantum states. Our experiments demonstrated that the non-autoregressive diffusion model achieves a reduction in $M_{in}$ compared to the SOTA autoregressive model when predicting correlations and entropy of the 1D Heisenberg model (as shown in Figure 5).

Additionally, we provide the RMSE (lower is better) of predicting correlations for both our model and the baselines under varying $M_{in}$ values. Due to time constraints, we report results only for $L=100$, $M_{out}=1000$ in the following table.

| Method | $M_{in}=200$ | $M_{in}=400$ | $M_{in}=600$ | $M_{in}=800$ | $M_{in}=1000$ |
|---|---|---|---|---|---|
| CS | 0.1296 | 0.0814 | 0.0682 | 0.0598 | 0.0547 |
| RNN | 0.1679 | 0.1132 | 0.0976 | 0.0853 | 0.0806 |
| LLM4QPE | 0.1358 | 0.0781 | 0.0633 | 0.0565 | 0.0531 |
| QuaDiM | 0.1224 | 0.0735 | 0.0597 | 0.0526 | 0.0478 |

These discussions and results will be included in the revised version of our paper. Once again, we deeply appreciate the reviewer's feedback. In the limited time remaining for the rebuttal process, we are eager to further discuss any remaining concerns or suggestions you might have. Thank you for your constructive engagement and valuable insights!

[1] Huang H Y, Kueng R, Torlai G, et al. Provably efficient machine learning for quantum many-body problems. Science, 2022.

[2] Lewis L, Huang H Y, Tran V T, et al. Improved machine learning algorithm for predicting ground state properties. Nature Communications, 2024.

[3] Tang Y, Xiong H, Yang N, et al. Towards LLM4QPE: Unsupervised Pretraining of Quantum Property Estimation and A Benchmark. ICLR, 2024.

Comment

Thanks for the response.

I have one last question: can you report a plot of the number of training samples vs. evaluation results (hopefully it won't be too complicated, and I am sorry for the last-minute request)? I am curious about this. I slightly suspect that the training does not need such a large scale of data ($2^{100}$), since the qubits have structure.

Comment

Dear Reviewer kHhj,

I hope this message finds you well. I would like to kindly ask whether our latest responses adequately addressed your concerns and clarified the points raised in your previous question. Your detailed and constructive feedback has been invaluable in shaping our work, and we deeply value your insights.

Our primary goal is to ensure that the paper meets the rigorous standards of ICLR while also contributing meaningfully to the intersection of deep learning and quantum physics. If there are any remaining concerns or areas where further clarification might help, we would be more than happy to address them in the spirit of collaboration and continuous improvement.

Thank you once again for your time and effort in reviewing our responses to your last question. We greatly appreciate your guidance in helping us refine our work.

Warm regards

Comment

Thank you for the clarification. After revisiting [1], which shares the same technical backbone (diffusion model), and reviewing the response regarding training scalability, I realized that the task difficulty of [1] is significantly higher than what the author proposed. Therefore, the criticism directed at [1] seems unfair, although I agree that [1] cannot yet scale to hundreds of qubits. I suggest the author clarify this point in the paper. It took me some time to understand the fundamental differences, and I hope future readers will not have to go through the same process.

I will increase the score once the manuscript is updated.

[1] Yuchen Zhu, Tianrong Chen, Evangelos A Theodorou, Xie Chen, and Molei Tao. Quantum state generation with structure-preserving diffusion model. arXiv preprint arXiv:2404.06336, 2024

Comment

In summary, we argue that the approach in [1] is fundamentally limited in practical applications due to the infeasibility of using full density matrices as input. Our work, in contrast, is well-aligned with recent advancements in QPE and provides practical solutions to bridge quantum states and classical generative models. To provide further clarity, we will include additional discussion about [1] in the revised version of our paper. Due to the page limitations set by the ICLR committee, we have temporarily placed the discussion in Appendix G, but it will be moved into the main text once the paper is officially published.

Once again, we sincerely thank the reviewers for their valuable comments, which have greatly contributed to improving the quality and clarity of our paper.

[1] Yuchen Zhu, Tianrong Chen, Evangelos A Theodorou, Xie Chen, and Molei Tao. Quantum state generation with structure-preserving diffusion model. arXiv preprint arXiv:2404.06336, 2024

[2] Cramer M, Plenio M B, Flammia S T, et al. Efficient quantum state tomography. Nature Communications, 2010, 1(1): 149.

[3] Torlai G, Mazzola G, Carrasquilla J, et al. Neural-network quantum state tomography. Nature Physics, 2018, 14(5): 447-450.

[4] Huang H Y, Kueng R, Torlai G, et al. Provably efficient machine learning for quantum many-body problems. Science, 2022.

[5] Lewis L, Huang H Y, Tran V T, et al. Improved machine learning algorithm for predicting ground state properties. Nature Communications, 2024.

[6] Tang Y, Xiong H, Yang N, et al. Towards LLM4QPE: Unsupervised Pretraining of Quantum Property Estimation and A Benchmark. ICLR, 2024.

[7] Wu Y D, Zhu Y, Wang Y, et al. Learning quantum properties from short-range correlations using multi-task networks. Nature Communications, 2024, 15(1): 8796.

Comment

Thanks for the careful clarification. I have increased the score.

Comment

We sincerely thank the reviewers for their thoughtful feedback and the time and effort they have devoted to reviewing our work.

Before addressing the specific concerns, we would like to clarify that we were aware of the method proposed in [1] prior to writing this paper, and we have discussed it in our paper (Line 140).

However, after thorough analysis, we contend that their method falls short of providing a viable approach for applying machine learning (diffusion-based models, in that paper) to quantum physics. Their technique, which involves using the density matrix (a $2^L \times 2^L$ complex-valued matrix) of quantum states directly as input, is largely impractical due to a foundational requirement in learning quantum states: the necessity to first gather quantum measurement data via quantum measurements on the states themselves.

Given this practical limitation, the numerical experiments in [1] are restricted to toy demos of systems with only $L=4$ qubits (in contrast to our experiments, which scale up to $L=100$ qubits). To further enhance the reviewers' understanding, we provide a detailed discussion below on the impracticality of [1] and the key distinctions between their approach and ours. We are deeply grateful for the reviewers' patience in reading our response.

  1. The impracticality of using full density matrix as input in [1]:
    One of the principal features distinguishing classical systems from quantum many-body systems is that quantum systems require exponentially many parameters in the system size to fully specify the state [2].

    To obtain information from a real quantum system, measurement is required (such as the Pauli measurements used in our paper). Measurement results in discrete outcomes due to the collapse of quantum states. Obtaining the full density matrix to construct the training set in [1] typically demands an exponentially large number of quantum measurements on an actual quantum computer, followed by extensive post-processing of these measurement outcomes, which generally entails exponential overhead [2]. As a result, in [1] each sample in the training set (i.e., a full density matrix representing a specific quantum state) would require exponential storage space and computational resources, making this approach impractical for real-world applications. (A worked storage estimate is given after this list.)

  2. The orthogonality of our task (quantum state property estimation, QPE) to quantum state tomography (QST):
    The task in [1] is more aligned with QST (however, it is not a typical QST approach, as [1] explicitly requires the density matrix to be used as training input, whereas QST typically does not impose such a constraint), which seeks to reconstruct the full density matrix. Neural network-based QST [3] generally approximates the probability distribution over the outcomes of an informationally complete measurement using a variational manifold represented by a neural network. However, [1] bypasses measurement data entirely and directly uses the density matrix as input. To the best of our knowledge, this approach is impractical, as it contradicts the fundamental constraints of data acquisition, i.e., quantum measurement in quantum systems.

  3. Our focus on quantum state property estimation (QPE):
    Unlike QST, QPE specifically targets the prediction of specific properties of quantum states without reconstructing the full density matrix. This task has recently garnered significant attention in the quantum physics and machine learning communities, with several cutting-edge theoretical [4,5] and empirical [6,7] studies already published. In terms of model design, unlike [1]'s approach of directly modeling the density matrix using generative models, our work delves into bridging the gap between quantum states and the classical joint distributions modeled by generative models. Additionally, we explore how the continuous latent variables of diffusion models can be decoded into discrete quantum measurement data while preserving physical validity. We have elaborated on these points in response A3 to Reviewer 9ozo, and in responses A7 and A8 to Reviewer oe1d. This discussion has also been incorporated into Appendix C of the revised version of our paper. We appreciate the reviewers' patience in reviewing this material.

  4. The Potential of Adapting Our Model to QST:
    We also note that a similar concern was raised by another reviewer regarding whether our model could be used to obtain the log probabilities of each measurement outcome. For a detailed discussion from our preliminary and immature view, please refer to our response to Reviewer 9ozo in A7.
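As a worked estimate for point 1 above (our arithmetic, assuming 16 bytes per double-precision complex entry), a full density matrix holds $2^{2L}$ complex numbers:

$$L = 4:\ 2^{8} \times 16\ \text{bytes} = 4\ \text{KB}; \qquad L = 100:\ 2^{200} \times 16\ \text{bytes} \approx 2.6 \times 10^{61}\ \text{bytes},$$

far beyond any conceivable storage, which is why density-matrix-as-input pipelines cannot scale past toy systems.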

Official Review (Rating: 8)

This paper introduces QuaDiM, a non-autoregressive diffusion model for quantum state property estimation (QPE). Unlike traditional autoregressive approaches that impose a (potentially biased) sequential ordering on qubits, QuaDiM treats all qubits equally through an iterative denoising process. The model is evaluated on the 1D anti-ferromagnetic Heisenberg model for systems up to 100 qubits, demonstrating superior performance in predicting correlation and entanglement entropy compared to baseline methods, especially with limited measurement data.

Strengths

  1. The proposed method of using diffusion model for quantum state tomography is novel. The paper effectively adapts diffusion models to quantum systems by introducing a token embedding function that maps discrete measurement outcomes to continuous features, allowing seamless integration with the diffusion process.
  2. The method seems to scale well to large systems. The model demonstrates strong performance on systems up to 100 qubits, which is significant for quantum computing applications.

Weaknesses

  1. While the paper mentions T=2000 denoising steps, there's no ablation study on how the number of steps affects performance versus computational cost.
  2. The paper fixes the embedding dimension at d=128 without analyzing how different dimensions affect the model's performance and computational requirements.
  3. The paper only considers Pauli-6 POVM, while evaluations on other POVM measurements may be beneficial.
  4. Although the authors made reasonable efforts, it is unclear to non-physicists how wave functions become probability distributions. Adding a section in the appendix discussing POVM measurements is preferable.

Questions

  1. How does the choice of diffusion schedule (β_t values) affect the model's performance? Would adaptive scheduling improve results?
  2. The paper uses Pauli-6 POVM measurements. How would the model perform with other measurement bases, and could this be made basis-independent?
  3. Could the authors comment on if the model can be used to obtain the log probabilities of each measurement outcome?
  4. POVM probabilities can be non-physical. Does the model always learn physical solutions?
  5. Could the model be modified to predict other quantum properties beyond correlation and entanglement entropy? What architectural changes would be needed?

Details of Ethics Concerns

No concerns.

Comment

As shown, QuaDiM still outperforms the baselines in this scenario. We hope this clarification addresses the reviewers’ concerns. Thank you again for the valuable feedback, and we look forward to further improving our work based on your comments.

[1] Renes J M, Blume-Kohout R, Scott A J, et al. Symmetric informationally complete quantum measurements. Journal of Mathematical Physics, 2004, 45(6): 2171-2180.

Q5: How does the choice of diffusion schedule (β_t values) affect the model's performance? Would adaptive scheduling improve results?

A5: We sincerely thank the reviewers for the feedback. In our experiments, we follow the implementation in [1,2] and adopt the square-root noise schedule for the noise coefficient, i.e., $\overline{\alpha}_t = 1-\sqrt{t/T + c}$, where $\overline{\alpha}_t = \prod_{s=0}^{t}(1-\beta_s)$ and $c$ is a small constant set to $c=0.0001$. We have clarified and explicitly stated this in the updated version of the paper.
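For concreteness, a minimal sketch (ours; not the paper's code) of this square-root schedule and the per-step $\beta_t$ it implies:

```python
import numpy as np

def sqrt_noise_schedule(T, c=1e-4):
    """Square-root schedule: alpha_bar_t = 1 - sqrt(t/T + c) (Li et al., 2022).

    Returns (alpha_bar, beta), where beta_t = 1 - alpha_bar_t / alpha_bar_{t-1}
    follows from alpha_bar_t = prod_{s<=t} (1 - beta_s).
    """
    t = np.arange(T)                              # t/T stays below 1, so alpha_bar > 0
    alpha_bar = 1.0 - np.sqrt(t / T + c)
    prev = np.concatenate(([1.0], alpha_bar[:-1]))
    beta = 1.0 - alpha_bar / prev
    return alpha_bar, beta

alpha_bar, beta = sqrt_noise_schedule(T=2000)
```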

[1] Ho J, Jain A, Abbeel P. Denoising diffusion probabilistic models. Advances in neural information processing systems, 2020.

[2] Li X, Thickstun J, Gulrajani I, et al. Diffusion-lm improves controllable text generation. Advances in Neural Information Processing Systems, 2022.

Q6: The paper uses Pauli-6 POVM measurements. How would the model perform with other measurement bases, and could this be made basis-independent?

A6: Thank you for your valuable question. In our response to A4, we have included results based on tetrahedral POVM. Regarding your inquiry about whether our current method can be made basis-independent, we admit that our proposed model has not yet achieved this.

This limitation arises because different measurement protocols typically imply the use of different bases. As noted in our response in A3, the decomposition of the same quantum state generally differs depending on the chosen basis. Consequently, the coefficients $\Psi$ of the wave function $|\Phi\rangle$, as well as the form of the corresponding classical joint distribution $p$, also differ. This means that switching to a different IC-POVM would require retraining the model.

By the way, we have noticed some recent pioneering research [1,2] exploring adaptive measurement strategies to find optimal measurement basis for specific quantum systems. These works may provide inspiration for our future efforts to make the model basis-independent. We remain open to further discussion with the reviewer on this intriguing new direction.

[1] García-Pérez G, Rossi M A C, Sokolov B, et al. Learning to measure: Adaptive informationally complete generalized measurements for quantum algorithms. PRX Quantum, 2021, 2(4): 040342.

[2] Glos A, Nykänen A, Borrelli E M, et al. Adaptive POVM implementations and measurement error mitigation strategies for near-term quantum devices. arXiv preprint arXiv:2208.07817, 2022.

Q7: Could the authors comment on if the model can be used to obtain the log probabilities of each measurement outcome?

A7: Thank you for your comments. We would like to emphasize that the focus of this paper is to predict certain properties of quantum systems by training on measurement data and physical parameters. On the other hand, obtaining the log probabilities of each measurement outcome typically corresponds to quantum state tomography [1], which is orthogonal to our research and aims to reconstruct the full density matrix of the quantum state. This can be achieved through approaches such as supervised learning [2], where the model is provided with paired data in the form of (measurement outcome, probability), or through optimization-based methods, such as variational Monte Carlo [3,4].

In our preliminary and immature view, to adapt to quantum state tomography tasks, the proposed model architecture might not require modification. However, the data collection process and the loss function might need to be adjusted based on the specific task. For instance, under a supervised learning paradigm, training data might need to be collected in the form of (measurement outcome, probability), and the loss function could be replaced with a supervised loss function, such as Mean Squared Error (MSE) loss.

We hope that the novel techniques proposed in our paper could inspire researchers in this field to explore a broader range of quantum state learning tasks in the future.

[1] Torlai G, Mazzola G, Carrasquilla J, et al. Neural-network quantum state tomography. Nature Physics, 2018, 14(5): 447-450.

[2] Zhu Y, Wu Y D, Bai G, et al. Flexible learning of quantum states with generative query neural networks. Nature Communications, 2022, 13(1): 6222.

[3] Chen Z, Newhouse L, Chen E, et al. ANTN: Bridging autoregressive neural networks and tensor networks for quantum many-body simulation. Advances in Neural Information Processing Systems, 2023, 36: 450-476.

Comment

We may consider a system of $L$ qubits. It can be described by the wave function:

$$|\mathbf{\Phi}\rangle = \sum_{\sigma_1=1}^{K}\cdots\sum_{\sigma_L=1}^{K} \mathbf{\Psi}(\sigma_1,\ldots,\sigma_L)|\sigma_1,\ldots,\sigma_L\rangle,$$

where $\mathbf{\Psi}: \mathbb{Z}^{L}\rightarrow \mathbb{C}$ maps a fixed configuration $\sigma=(\sigma_1,\ldots,\sigma_L)$ of $L$ qubits to a complex amplitude satisfying $\sum_{\sigma_1=1}^{K}\cdots\sum_{\sigma_L=1}^{K} |\mathbf{\Psi}(\sigma_1,\ldots,\sigma_L)|^2=1$, and $\sigma_i \in \{1,\ldots,K\}$ is one of the $K$ possible outcomes of a quantum measurement on the $i$-th qubit. It is formulated in a complex Hilbert space where the vector representation of the quantum state is $|\mathbf{\Phi}\rangle \in \mathbb{C}^{K^L}$ and its density matrix is $|\mathbf{\Phi}\rangle\langle\mathbf{\Phi}|\in \mathbb{C}^{K^L\times K^L}$, which becomes astronomical for large $L$. Performing quantum measurements independently on $L$ qubits is easy to implement. The most common strategy is to combine $L$ single-qubit measurement operators into $\mathbf{\Pi}_{k,1}\otimes\cdots\otimes\mathbf{\Pi}_{k,L}$, where $\otimes$ is the Kronecker product. Such a measurement procedure outputs a measurement string $\sigma=(\sigma_1,\ldots,\sigma_L)$, where $\sigma_i \in \{1,\ldots,K\}$, with probability $|\mathbf{\Psi}(\sigma_1,\ldots,\sigma_L)|^2$. Define $p(\sigma_1,\ldots,\sigma_L)=|\mathbf{\Psi}(\sigma_1,\ldots,\sigma_L)|^2$. We can thus reformulate the wave function of a quantum state as a **classical joint distribution**. It is a valid joint distribution since $\sum_{\sigma_1}\cdots\sum_{\sigma_L}p(\sigma_1,\ldots,\sigma_L)=\sum_{\sigma_1}\cdots\sum_{\sigma_L}|\mathbf{\Psi}(\sigma_1,\ldots,\sigma_L)|^2=1$ and $p(\sigma_1,\ldots,\sigma_L)\geq 0$. Thus, we connect the quantum wave function with a classical joint distribution and attempt to approximate this joint distribution using neural network methods, including autoregressive approaches [2,3,4] and our proposed non-autoregressive method QuaDiM.

If there are any further unclear aspects in our explanation, we sincerely hope the reviewer can point them out, and we will do our best to provide clarifications.

[1] Nielsen M A, Chuang I L. Quantum Computation and Quantum Information. Cambridge University Press, 2010.

[2] Carrasquilla J, Torlai G, Melko R G, et al. Reconstructing quantum states with generative models. Nature Machine Intelligence, 2019.

[3] Tang Y, Xiong H, Yang N, et al. Towards LLM4QPE: Unsupervised Pretraining of Quantum Property Estimation and A Benchmark. ICLR, 2024.

[4] Xiao T, Huang J, Li H, et al. Intelligent certification for quantum simulators via machine learning. npj Quantum Information, 2022.

**Q4: The paper only considers Pauli-6 POVM, while evaluations on other POVM measurements may be beneficial.**

**A4:** Thank you for your comments. We emphasize the use of Pauli-6 POVM measurements for data collection because **this measurement protocol is easy to implement on current quantum devices (NISQ devices)** and is **informationally complete (IC)**. This means that all the information of the quantum state can be recovered classically with a sufficiently large number of IC-POVM measurements. In other words, given the probability of each measurement outcome of an IC-POVM, the quantum state can be uniquely determined.

To further validate our method, here we consider another type of IC-POVM, the **tetrahedral POVM** [1], to collect measurement data. The corresponding measurement operators are $\{\frac{1}{4}(\mathbf{I}+\mathbf{s}^{(a)}\cdot \mathbf{P})\}_{a\in\{0,1,2,3\}}$, where $\mathbf{I}$ is the identity matrix, $\mathbf{P}$ represents the ensemble of Pauli operators $(X, Y, Z)$, and $\mathbf{s}^{(0)}=(0,0,1)$, $\mathbf{s}^{(1)}=(\frac{2\sqrt{2}}{3},0,-\frac{1}{3})$, $\mathbf{s}^{(2)}=(-\frac{\sqrt{2}}{3},\sqrt{\frac{2}{3}},-\frac{1}{3})$, $\mathbf{s}^{(3)}=(-\frac{\sqrt{2}}{3},-\sqrt{\frac{2}{3}},-\frac{1}{3})$. It is easy to check that $K=4$ for the tetrahedral POVM.

Due to time constraints, we fixed $L=10$ with $M_{in}=1000$, re-ran the simulations to collect data, and re-trained our model and the baselines for predicting the correlations. The numerical results are reported as follows:

| Method | M=1000 | M=10000 |
|---------|--------|---------|
| CS | 0.0512 | 0.0164 |
| RBFK | 0.0735 | - |
| NTK | 0.0747 | - |
| RNN | 0.0514 | 0.0163 |
| LLM4QPE | 0.0503 | 0.0141 |
| QuaDiM | 0.0433 | 0.0107 |
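As a quick numerical sanity check (ours), one can verify that the four tetrahedral operators above are positive semi-definite and sum to the identity, as the POVM definition requires:

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

s = [(0, 0, 1),
     (2 * np.sqrt(2) / 3, 0, -1 / 3),
     (-np.sqrt(2) / 3, np.sqrt(2 / 3), -1 / 3),
     (-np.sqrt(2) / 3, -np.sqrt(2 / 3), -1 / 3)]

# Pi_a = (1/4)(I + s^(a) . (X, Y, Z)) for a = 0..3, so K = 4 outcomes per qubit.
povm = [0.25 * (I2 + sx * X + sy * Y + sz * Z) for sx, sy, sz in s]

assert np.allclose(sum(povm), I2)                                   # sum_a Pi_a = I
assert all(np.all(np.linalg.eigvalsh(P) >= -1e-12) for P in povm)   # each Pi_a PSD
```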
Comment

Q3: Although the authors made reasonable efforts, it is unclear to non-physicists how wave functions become probability distributions. Adding a section in the appendix discussing POVM measurements is preferable.

A3: Thank you for your valuable comments. We will add the necessary definitions and notations related to quantum wave functions and POVM measurements to facilitate a clearer understanding for the reviewers. The additional content has already been incorporated into the appendix of the updated version of the paper.

We aim to explain the fundamental concepts of quantum computing as clearly and step-by-step as possible using mathematical notation. We sincerely appreciate the reviewer's patience in taking the time to read through the following content. (For a comprehensive discussion, we refer the reviewers to Section 2.1 of the book [1].)

Note that in Section 2 of the paper we have already provided a brief description of the fundamental concepts of quantum computing. We will further elaborate on the details below.

A single qubit -- the smallest unit of quantum computing -- is mathematically represented as a vector $|\psi\rangle=\alpha|0\rangle+\beta|1\rangle$ parameterized by two complex numbers satisfying $|\alpha|^2+|\beta|^2=1$. Operations on a qubit must preserve this norm, and thus are described by $2 \times 2$ unitary matrices. Of these, some of the most important are the Pauli operators; it is useful to list them here:

$$X \equiv\left[\begin{array}{cc} 0 & 1 \\ 1 & 0 \end{array}\right], \quad Y \equiv\left[\begin{array}{cc} 0 & -i \\ i & 0 \end{array}\right], \quad Z \equiv\left[\begin{array}{cc} 1 & 0 \\ 0 & -1 \end{array}\right].$$

One can do some linear algebra and check that $|0\rangle = \big[\begin{smallmatrix}1\\0\end{smallmatrix}\big]$ and $|1\rangle = \big[\begin{smallmatrix}0\\1\end{smallmatrix}\big]$ are the eigenvectors of $Z$, $|+\rangle=\frac{1}{\sqrt{2}}\big[\begin{smallmatrix}1\\ 1\end{smallmatrix}\big]$ and $|-\rangle=\frac{1}{\sqrt{2}}\big[\begin{smallmatrix}1\\ -1\end{smallmatrix}\big]$ are the eigenvectors of $X$, and $|\boldsymbol{i}_{+}\rangle=\frac{1}{\sqrt{2}}\big[\begin{smallmatrix}1\\ \boldsymbol{i}\end{smallmatrix}\big]$ and $|\boldsymbol{i}_{-}\rangle=\frac{1}{\sqrt{2}}\big[\begin{smallmatrix}1\\ -\boldsymbol{i}\end{smallmatrix}\big]$ are the eigenvectors of $Y$. The same qubit can be decomposed into different orthonormal bases. For example,

$$\begin{aligned} |\psi\rangle &= \alpha|0\rangle + \beta|1\rangle \\ &= \frac{1}{\sqrt{2}}(\alpha+\beta)|+\rangle + \frac{1}{\sqrt{2}}(\alpha-\beta)|-\rangle \\ &= \frac{1}{\sqrt{2}}(\alpha-\beta\boldsymbol{i})|\boldsymbol{i}_{+}\rangle + \frac{1}{\sqrt{2}}(\alpha+\beta\boldsymbol{i})|\boldsymbol{i}_{-}\rangle. \end{aligned}$$

A positive operator-valued measure (POVM) describes the testing or manipulation of a physical system to yield a numerical result. A POVM is given by a set of measurement operators $\{\mathbf{\Pi}_k\}_{k=0}^{K-1}$ satisfying $\sum_{k} \mathbf{\Pi}_k = \mathbf{I}$, with each $\mathbf{\Pi}_k$ positive semi-definite, where $K$ is the total number of measurement operators. In this paper, we consider the Pauli-6 POVM (also named randomized single-qubit Pauli measurements in some literature), whose measurement operators are $\{\frac{1}{3}|0\rangle\langle 0|, \frac{1}{3}|1\rangle\langle 1|, \frac{1}{3}|+\rangle\langle +|, \frac{1}{3}|-\rangle\langle -|, \frac{1}{3}|\boldsymbol{i}_+\rangle\langle \boldsymbol{i}_+|, \frac{1}{3}|\boldsymbol{i}_-\rangle\langle \boldsymbol{i}_-|\}$. It is easy to check that these operators satisfy the POVM definition and $K=6$. The reason for choosing the Pauli-6 POVM is that this measurement protocol is easy to implement on current quantum devices (NISQ devices) and is informationally complete (IC): all the information of the quantum state can be recovered classically from a sufficiently large number of IC-POVM measurements. In other words, given the probability of each of the measurement outcomes of an IC-POVM, we can uniquely determine the quantum state.

Measuring a qubit leads to collapse of the qubit and produces an outcome $k$ with probability $p(k)$ satisfying the Born rule, which states that $p(k) = \mathrm{tr}(\rho \mathbf{\Pi}_k)$, where $\rho=|\psi\rangle\langle\psi|$ and $\langle\psi|$ is the transpose conjugate of $|\psi\rangle$.
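To make this concrete, a short numerical example (ours) that builds the six Pauli-6 operators and evaluates the Born rule $p(k)=\mathrm{tr}(\rho\mathbf{\Pi}_k)$ for an example single-qubit state:

```python
import numpy as np

# Eigenvectors of Z, X, Y respectively.
ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
plus = (ket0 + ket1) / np.sqrt(2)
minus = (ket0 - ket1) / np.sqrt(2)
i_plus = (ket0 + 1j * ket1) / np.sqrt(2)
i_minus = (ket0 - 1j * ket1) / np.sqrt(2)

# Pauli-6 POVM: {(1/3)|v><v|} over the six eigenvectors; K = 6 outcomes.
povm = [np.outer(v, v.conj()) / 3
        for v in (ket0, ket1, plus, minus, i_plus, i_minus)]
assert np.allclose(sum(povm), np.eye(2))   # the POVM operators sum to the identity

psi = np.array([0.6, 0.8j])                # example state with |alpha|^2 + |beta|^2 = 1
rho = np.outer(psi, psi.conj())
p = np.array([np.trace(rho @ Pi).real for Pi in povm])  # Born rule: p(k) = tr(rho Pi_k)
assert np.isclose(p.sum(), 1.0)            # a valid probability distribution
```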

Comment

Thank you for taking the time to provide detailed comments and valuable suggestions. We are happy to address your questions step by step. We have addressed the typos in the revised version of the PDF and added necessary clarifications along with additional numerical experiments in the Appendix. The changes made in the PDF are highlighted in blue for easy reference.

Q1: While the paper mentions T=2000 denoising steps, there's no ablation study on how the number of steps affects performance versus computational cost.

A1: We sincerely thank the reviewers for their valuable suggestions. The slow sampling speed is indeed a significant concern when it comes to diffusion models (which we have already mentioned in the conclusion and limitation section of the paper).

To evaluate this, we fix the number of diffusion steps during training for QuaDiM while shrinking the inference steps $T_f$ using the approach introduced in DDIM (Song et al., 2020). We evaluate the model's performance under different inference step settings and compare it with both a learning-free baseline (classical shadow, CS) and a learning-based SOTA model, LLM4QPE. Due to time constraints, we only report results for the task of predicting correlations with $L = 100$, $M_{in} = 1000$, and $M_{out} = 1000$.

| Method | RMSE | Generated samples per sec. |
|---|---|---|
| CS | 0.0547 | - |
| LLM4QPE | 0.0531 | 14.6 |
| QuaDiM ($T_f=2000$) | 0.0478 | 5.7 |
| QuaDiM ($T_f=1000$) | 0.0537 | 8.1 |
| QuaDiM ($T_f=500$) | 0.0541 | 12.7 |
| QuaDiM ($T_f=100$) | 0.0882 | 37.4 |

As shown in the table above, when reducing inference to $T_f=500$ diffusion steps on a single GPU (2080Ti), QuaDiM achieves a lower RMSE score compared to the classical shadow baseline while demonstrating an inference speed comparable to LLM4QPE.
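For reference, a minimal sketch (ours; not the paper's code) of the DDIM-style step reduction used above: inference runs on an evenly spaced subsequence of the $T=2000$ training timesteps.

```python
import numpy as np

def ddim_timesteps(T_train=2000, T_f=500):
    """Evenly spaced subsequence of training timesteps used at inference.

    Deterministic DDIM updates on this subsequence trade accuracy for a
    roughly T_train / T_f speedup (cf. Song et al., 2020).
    """
    step = T_train // T_f
    return np.arange(T_train - 1, -1, -step)   # e.g. 1999, 1995, ..., 3 for T_f = 500

taus = ddim_timesteps()
assert len(taus) == 500
```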

We acknowledge that the sampling time of diffusion-based models poses a significant challenge when applied to quantum many-body problems (as we already discussed in the limitation and conclusion section of our paper). This limitation is inherent to the nature of diffusion-based approaches. In our future work, we seek to explore the incorporation of techniques such as consistency training, potentially enabling one-step sampling for diffusion-based applications in the domain of quantum many-body problems. We remain optimistic about these possibilities and appreciate the reviewers' understanding of this ongoing effort.

Q2: The paper fixes the embedding dimension at d=128 without analyzing how different dimensions affect the model's performance and computational requirements.

A2: We sincerely thank the reviewer for raising this question. In all our experiments, we set the embedding hidden dimension to $d = 128$ (as well as other Transformer parameters, such as the number of heads to 4 and the number of layers to 4) primarily for two reasons: first, it follows the default settings used in the prior work LLM4QPE proposed by Tang et al.; and second, based on empirical observations, $d = 128$ often strikes a balance between good generalization performance and avoiding excessive dimensionality.

To further investigate, we evaluate the model's performance on the task of predicting correlations under a fixed dataset configuration $L = 10$, $M_{in} = 1000$, $M_{out} = 1000$ with different $d$ values from $\{64, 128, 256, 512\}$; the resulting RMSE scores (lower is better) are 0.0518, 0.0432, 0.0449, and 0.0457, respectively. As the results show, setting $d = 128$ achieves the best performance, confirming its suitability in this context.

We hope that these additional results provide more insight into the rationale behind our parameter choices. Should the reviewers have further concerns or suggestions, we would be delighted to address them in detail.

Comment

[4] Sehayek D, Golubeva A, Albergo M S, et al. Learnability scaling of quantum states: Restricted Boltzmann machines. Physical Review B, 2019, 100(19): 195125.

Q8: POVM probabilities can be non-physical. Does the model always learn physical solutions?

A8: Thank you for your insightful question. In this work, we chose the physically implementable and informationally complete Pauli-6 POVM [1]. In our response to A4, we further supplemented the results using the tetrahedral POVM, which is also informationally complete but more challenging to implement physically. The results demonstrate that our model outperforms the baselines under both measurement protocols.

Since both POVMs in our experiments are informationally complete, theoretically, a sufficient amount of measurement records (meaning the exact probabilities of all possible measurement outcomes are available) uniquely determines the measured quantum state. This ensures that adequate measurement data enable the model to learn physical patterns. Consequently, our numerical experiments did not exhibit non-physical behavior, and the model always learned physical solutions.

We reasonably speculate that the reviewer's concern may relate to using informationally incomplete POVMs (e.g., measuring only with Pauli Z) for data collection and model training, which might result in the model failing to capture the complete physical patterns. This negative impact is indeed possible and would require sophisticated neural network design and the incorporation of physical priors to achieve comparable performance to informationally complete POVMs, as studied in [2]. However, exploring the use of informationally incomplete POVMs is beyond the scope of this paper, and we are happy to investigate this in future research.

[1] Huang H Y, Kueng R, Preskill J. Predicting many properties of a quantum system from very few measurements. Nature Physics, 2020, 16(10): 1050-1057.

[2] Koutný D, Ginés L, Moczała-Dusanowska M, et al. Deep learning of quantum entanglement from incomplete measurements. Science Advances, 2023, 9(29).

Q9: Could the model be modified to predict other quantum properties beyond correlation and entanglement entropy? What architectural changes would be needed?

A9: Thank you for your insightful comments. In this paper, we focus primarily on two specific tasks: predicting correlation and entanglement entropy. These tasks are chosen because they are well-studied in the literature and have relatively established benchmarks [1,2].

Broadly speaking, beyond these two tasks, our model is capable of predicting other properties of quantum states through quantum measurements. For example, it can estimate linear functions of quantum states, such as symmetry-breaking phases and expectation values of local observables, as well as non-linear functions of quantum states, such as purity and the energy variance of local Hamiltonians.

We also anticipate that the model could be further adapted to enhance its representational capacity to capture more complex patterns of quantum states and even quantum processes. For instance, future work could involve designing new embedding layers that incorporate information about the observables used in quantum measurements as additional features. Moreover, new layers and multimodal approaches could be introduced to feed information about quantum processes into the model by representing them as graph topologies. These adjustments would enable the model to better understand and leverage the intricate structure of quantum systems.

In future work, we aim to extend our research to explore additional properties of quantum states and further evaluate the versatility of our model across diverse quantum systems. Thank you again for highlighting this direction, which we believe will be a valuable avenue for future exploration.

[1] Huang H Y, Kueng R, Preskill J. Predicting many properties of a quantum system from very few measurements. Nature Physics, 2020, 16(10): 1050-1057.

[2] Tang Y, Xiong H, Yang N, et al. Towards LLM4QPE: Unsupervised Pretraining of Quantum Property Estimation and A Benchmark. The Twelfth International Conference on Learning Representations (ICLR), 2024.

Comment

Dear Reviewer 9ozo,

I hope this message finds you well. As the rebuttal deadline is approaching, we would greatly appreciate your valuable feedback at your earliest convenience. Your insights are highly important to us, and we are eager to address your comments thoroughly.

Best regards

Comment

Dear Reviewer 9ozo,

Thank you very much for your thoughtful and constructive feedback, and for considering the possibility of raising the score. We greatly appreciate your support and are encouraged by your positive comments regarding our work.

We fully understand that your final decision is contingent on the review process, and we deeply value the careful consideration you’ve given our paper. If there are any further aspects that may require clarification or additional discussion, please don’t hesitate to let us know. We are more than happy to provide any additional information that might help in your final assessment.

Once again, thank you for your time and effort in reviewing our submission. Your input is invaluable, and we are truly grateful for your support.

Best regards

On behalf of all authors

Comment

Thanks for the detailed rebuttal. I would like to keep recommending acceptance of the paper. In the meantime, I may consider raising the score before the deadline.

Official Review (Rating: 6)

The paper proposes Quantum State Property Estimation using Diffusion Models (QuaDiM), a conditional diffusion model designed to learn and generate distributions of quantum states, and accordingly quantum properties, for a given Hamiltonian parameter, from measurement data. Traditionally, auto-regressive models have been used for this task, modeling the quantum state array as a sequential structure despite the lack of inherent sequential ordering in quantum states. Unlike these approaches, QuaDiM denoises and shapes all states in the array simultaneously, as is typical in diffusion models, allowing it to effectively capture non-sequential interactions between spatially separated states. The proposed structure essentially includes an embedding that allows discrete quantum states to be treated as continuous diffusive evolution within a (conditional) diffusion model. The authors validate that the proposed QuaDiM outperforms existing mainstream models, including RNN-based models and LLM4QPE, for large-scale quantum problems (e.g., L = 100).

Strengths

  • The proposed QuaDiM outperforms the baselines based on sequential models, e.g., RNN-based and transformer-based ones. It is intriguing, especially to handle very large-scale quantum states (e.g., L = 100) as a benchmark.

  • The paper is generally well-written. I am not an expert in quantum computing, but I was able to follow this paper well and found it interesting to read.

Weaknesses

  • This paper is essentially an application paper that applies a diffusion model combined with a text embedding structure to the task of quantum state property estimation. It effectively addresses an interesting topic in the field of quantum computing through appropriate structural selection; however, it is uncertain whether this topic is broadly interesting to the ICLR audience.

  • I am not fully convinced why a diffusion model is the most suitable choice for the (non-sequential) quantum property estimation task. To me, this task fundamentally seems manageable by other generative models, such as conditional VAEs, GANs, or energy-based models. The authors mention that these generative models can only handle a single specific quantum state and cannot generalize to unseen states, but I am curious why a diffusion model is capable of generalizing to unseen states. While it is true that diffusion models demonstrate superior performance in terms of both fidelity and diversity, achieving high performance with diffusion models generally requires a large amount of training data. Therefore, the authors should clarify why the proposed QuaDiM would be particularly beneficial for quantum state property estimation, compared to other generative models, in a practical setting (e.g., limited observation, as the authors mentioned).

  • The authors claim that the primary motivation behind the proposed QuaDiM model is to eliminate the sequential handling of quantum states present in existing models. However, QuaDiM employs an embedding function to represent discrete quantum states as continuous variables, following the approach in [1], which uses token and positional embeddings commonly used in text generation. It seems that this embedding, originally designed for text generation, inherently carries a sequential bias, which makes the motivation and effectiveness of the proposed QuaDiM less convincing to me.

  • The authors should explicitly mention the number of parameters and training time of the models used to ensure a fair comparison. Although this is not a critical issue, the network architecture and number of parameters of the main competitor, LLM4QPE, and the proposed QuaDiM appear to be nearly identical.

  • Some minor issues: In Table 2, for L = 70, M = 100 and QuaDiM (ours), is the result 0.0597 correct? Line 048, Fig. 2? Line 072, the the?, ...


[1] Li, X., Thickstun, J., Gulrajani, I., Liang, P. S., & Hashimoto, T. B. (2022). Diffusion-LM improves controllable text generation. Advances in Neural Information Processing Systems, 35, 4328-4343.

Questions

  • Could the authors clarify why diffusion models have a suitable structure for learning non-sequential quantum state estimation compared to other generative models?

  • Could the authors explain why the embedding method used in QuaDiM has a structure suitable for non-sequential quantum state estimation?

Comment
  1. Reasoning Behind Positional Embeddings
    Traditional autoregressive models used for modeling quantum states introduce sequential bias through predefined sampling orders (e.g., left-to-right), as evidenced by [1,2]. In contrast, our diffusion-based framework generates all measurement outcomes simultaneously, avoiding sequential dependencies during sampling.
    • Positional embeddings in our context are employed to capture the structural information among qubits. Additionally, we further investigate the model's RMSE in predicting correlations when using a relative positional embedding (following the implementation in [3]) and no positional embedding at all. Due to time constraints, we only report results for $L=100$, $M_{in}=1000$. The experimental results are provided below (lower is better) and are included in the Appendix of the PDF. They show that absolute and relative positional embeddings yield comparable performance, but removing positional information altogether significantly degrades the model's ability to predict quantum correlations. This underscores the embeddings' role in preserving spatial relationships among qubits. (A sketch of an absolute positional embedding is given after the table.)
    • We acknowledge that the current positional encoding method may not perfectly reflect quantum system-specific structures. As a step forward, we plan to explore customized positional encodings tailored to quantum systems in future work.
| | Absolute Position Embedding | Relative Position Embedding | No Position Embedding |
|---|---|---|---|
| $M_{out}=100$ | 0.1686 | 0.1681 | 0.4527 |
| $M_{out}=1000$ | 0.0478 | 0.0482 | 0.3269 |
| $M_{out}=10000$ | 0.0125 | 0.0139 | 0.2895 |
| $M_{out}=20000$ | 0.0098 | 0.0110 | 0.2148 |
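For reference, a minimal sketch (ours) of an absolute positional embedding: whether learned (as in Diffusion-LM) or sinusoidal (shown here for self-containedness), it assigns one fixed $d$-dimensional vector per qubit index, which is added to the token embeddings.

```python
import numpy as np

def sinusoidal_position_embedding(L, d):
    """Absolute positional embedding: one fixed d-dim vector per qubit index."""
    pos = np.arange(L)[:, None]            # qubit index 0..L-1
    i = np.arange(d // 2)[None, :]
    angles = pos / (10000 ** (2 * i / d))
    pe = np.zeros((L, d))
    pe[:, 0::2] = np.sin(angles)           # even dimensions
    pe[:, 1::2] = np.cos(angles)           # odd dimensions
    return pe                              # added elementwise to token embeddings

pe = sinusoidal_position_embedding(L=100, d=128)
```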

In conclusion, we appreciate the reviewer’s thoughtful concerns regarding the use of embeddings inspired by language models for quantum measurement data. By highlighting the relationship between quantum systems and classical joint distributions, as well as drawing analogies to text corpora, we hope our statement clarifies the rationale behind this design choice. Our empirical results further demonstrate the effectiveness of positional embeddings in capturing structural information among qubits, while recognizing that there is room to refine these embeddings to better align with quantum-specific characteristics.

We sincerely thank the reviewer for their valuable feedback, which has provided us with meaningful directions for future research to enhance the adaptability and rigor of our approach.

[1] Bortone M, Rath Y, Booth G H. Impact of conditional modelling for a universal autoregressive quantum state. Quantum, 2024, 8: 1245.

[2] Ibarra-García-Padilla E, Lange H, Melko R G, et al. Autoregressive neural quantum states of Fermi Hubbard models. arXiv preprint arXiv:2411.07144, 2024.

[3] Shaw P, Uszkoreit J, Vaswani A. Self-attention with relative position representations. arXiv preprint arXiv:1803.02155, 2018.

Q4: The authors should explicitly mention the number of parameters and training time of the models used to ensure a fair comparison. Although this is not a critical issue, the network architecture and number of parameters of the main competitor, LLM4QPE, and the proposed QuaDiM appear to be nearly identical.

A4: The main component of our proposed QuaDiM is the transformer, which shares a similar structure with the SOTA baseline LLM4QPE, a transformer-based autoregressive model. For a fair comparison, the transformer in both is fixed at 4 heads, 4 layers, and a hidden dimension of 128, along with a conditional embedding FFN to encode the physical parameters, totaling \approx 800,000 parameters. On a single 2080Ti GPU, for $L=100$ with a batch size of 64 and 100,000 iterations, the training time of both QuaDiM and LLM4QPE is nearly 9 hours.

Q5: Some minor issues: In Table 2, for L = 70, M = 100 and QuaDiM (ours), is the result 0.0597 correct? Line 048, Fig. 2? Line 072, the the?

A5: Thank you for the feedback. We have corrected these errors in the updated version of the paper: 0.0597 has been corrected to 0.5970, and the duplicated "the" has been deleted.

Q6: Could the authors clarify why diffusion models have a suitable structure for learning non-sequential quantum state estimation compared to other generative models?

A6: We sincerely thank the reviewer for the question. Kindly refer to our response in A2 for details.

Q7: Could the authors explain why the embedding method used in QuaDiM has a structure suitable for non-sequential quantum state estimation?

A7: Thank you for your thoughtful question. Please see our detailed response provided in A3 for further clarification.

Comment

[3] Bortone M, Rath Y, Booth G H. Impact of conditional modelling for a universal autoregressive quantum state[J]. Quantum, 2024, 8: 1245.

[4] Ibarra-García-Padilla E, Lange H, Melko R G, et al. Autoregressive neural quantum states of Fermi Hubbard models[J]. arXiv preprint arXiv:2411.07144, 2024.

[5] Vahdat A, Kautz J. NVAE: A deep hierarchical variational autoencoder[J]. Advances in neural information processing systems, 2020.

[6] Saxena D, Cao J. Generative adversarial networks (GANs) challenges, solutions, and future directions[J]. ACM Computing Surveys (CSUR), 2021, 54(3): 1-42.

[7] Carleo G, Troyer M. Solving the quantum many-body problem with artificial neural networks[J]. Science, 2017, 355(6325): 602-606.

[8] Deng D L, Li X, Das Sarma S. Quantum entanglement in neural network states[J]. Physical Review X, 2017, 7(2): 021021.

Q3: The authors claim that the primary motivation behind the proposed QuaDiM model is to eliminate the sequential handling of quantum states present in existing models. However, QuaDiM employs an embedding function to represent discrete quantum states as continuous variables, following the approach in [1], which uses token and positional embeddings commonly used in text generation. It seems that this embedding, originally designed for text generation, inherently carries a sequential bias, which makes the motivation and effectiveness of the proposed QuaDiM less convincing to me.

A3: We appreciate the reviewer's insightful feedback on the use of embeddings inspired by language models for quantum measurement data. We would like to clarify our reasoning and demonstrate the appropriateness of this approach.

  1. Relation Between Quantum States and Classical Joint Distributions
    Consider a quantum state represented as a wave function
    $$|\psi\rangle = \sum_{\sigma_1=1}^{K}\cdots\sum_{\sigma_L=1}^{K} \Phi(\sigma_1,\ldots,\sigma_L)|\sigma_1,\ldots,\sigma_L\rangle,$$
    where $\Phi(\sigma_1,\ldots,\sigma_L)$ is the amplitude of the basis $(\sigma_1,\ldots,\sigma_L)$ satisfying the normalization condition (by the Born rule of quantum mechanics) $\sum_{\sigma_1=1}^{K}\cdots\sum_{\sigma_L=1}^{K} |\Phi(\sigma_1,\ldots,\sigma_L)|^2 = 1$. The quantum state is inherently probabilistic: measurement outcomes follow the joint distribution
    $$P(\sigma_1, \sigma_2, \dots, \sigma_L) = |\Phi(\sigma_1, \sigma_2, \dots, \sigma_L)|^2,$$
    which is a valid probability distribution since $P(\sigma_1,\ldots,\sigma_L)\geq 0$ and $\sum_{\sigma_1}\cdots\sum_{\sigma_L}P(\sigma_1,\ldots,\sigma_L)=1$. This formulation mirrors classical joint distributions frequently encountered in NLP tasks, where dependencies among tokens (qubits) are encoded probabilistically. Representing quantum measurement data as continuous embeddings aligns with this probabilistic perspective and facilitates modeling the complex dependencies among qubits (see the numerical sketch after this list).
  2. Analogies Between Quantum Measurement Data and Text Corpora
    • A quantum measurement string (e.g., $\sigma = (\sigma_1, \dots, \sigma_L)$) is analogous to a sentence in a corpus, encoding interdependencies among qubits.
    • The possible measurement outcomes for each qubit correspond to tokens, akin to words in a vocabulary $\mathcal{V}$ (with $|\mathcal{V}|=K$, where $K$ is the total number of possible measurement outcomes, 6 in our experiments), while the quantum system's physical condition can be likened to contextual information that shapes the sentence.
    • Given these parallels, applying language-model-inspired embeddings to quantum measurement data is a natural extension, facilitating scalable and generalizable representation learning.
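
As the numerical sketch referenced above, the following toy Python/NumPy example (with hypothetical 2-qubit amplitudes, not data from the paper) shows how any normalized wave function induces a valid joint distribution via the Born rule, from which measurement "sentences" can be sampled:

```
import numpy as np

rng = np.random.default_rng(0)

# a random normalized 2-qubit wave function Phi(sigma_1, sigma_2) with K = 2 outcomes
Phi = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
Phi /= np.linalg.norm(Phi)               # enforce sum |Phi|^2 = 1

P = np.abs(Phi) ** 2                      # Born rule: joint distribution over outcomes
assert np.all(P >= 0) and np.isclose(P.sum(), 1.0)

# sample measurement "sentences" (sigma_1, sigma_2) from the joint distribution
idx = rng.choice(4, size=5, p=P.ravel())
strings = [np.unravel_index(i, (2, 2)) for i in idx]
```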
Comment

1. Clarification of Motivations and Contributions

To address your concerns more effectively, we would like to begin by briefly restating the motivations behind our work.

  1. Eliminating Autoregressive Bias: QuaDiM leverages diffusion models to provide a novel non-autoregressive method for quantum state property estimation (QPE), a challenging quantum many-body problem [1,2]. Recent studies [3,4] have found that the inherent constraints of autoregressive methods—specifically, the pre-defined factorization order (e.g., left-to-right or right-to-left in 1D chains, or zig-zag orders in 2D systems)—often introduce biases into the model. This bias affects the ability of variational models, including deep learning models, to accurately describe correlations among qubits and constrains their expressivity in learning quantum systems. Unlike autoregressive methods, QuaDiM eliminates the need for a predefined qubit ordering.
  2. Generalization to unseen states: Diffusion models learn to progressively refine Gaussian noise into the target distribution conditioned on the hidden representations of physical variables, with the aim of capturing the entire distribution of quantum states rather than a specific set of states. This mechanism naturally enables generalization to unseen quantum states, since we can sample new measurement outcomes from the trained model conditioned on physical variables outside the training set (even ones that are hard to prepare on a real quantum computer or hard to simulate classically).
  3. Large-scale experiments and practical feasibility: QuaDiM demonstrates superior scalability and generalization compared to both the learning-free classical shadow (CS) baseline and the learning-based SOTA LLM4QPE (an autoregressive model). It accurately predicts quantum properties such as correlation and entanglement entropy for systems of up to 100 qubits (up to $2^{100}$ dimensions). Moreover, our results show that QuaDiM achieves lower prediction errors with less training data and fewer samples during inference, making it more practical for real-world scenarios where quantum measurements are expensive and time-consuming, both in quantum experiments ($1.60 per second on IBM quantum computers) and in classical simulations.

2. Potential Advantages Over VAEs, GANs, and Energy-Based Models

Our choice of diffusion models was primarily motivated by their intuitive alignment with the requirements (three points above) of quantum property estimation; the following represents our preliminary analysis, and we remain open to further discussions and deeper explorations with the reviewer on this topic.

  1. Limitations of VAEs, GANs and Energy-Based Models:

    • VAEs are effective for learning latent representations but often struggle with accurately modeling sharp or highly complex distributions [5].
    • While GANs can model complex distributions, their training is notoriously unstable due to issues like mode collapse [6]. This instability is particularly problematic for quantum systems, where capturing the full range of quantum state correlations is crucial.
    • Energy-based models (such as RBMs [7,8] for modeling quantum states) are powerful but computationally expensive, especially when scaling to quantum systems with a large number of qubits. Diffusion models provide a more computationally efficient alternative while retaining the ability to model complex distributions.
  2. Practical Efficiency of Diffusion Models:

    • As shown in our experiments, QuaDiM achieves state-of-the-art performance even with a limited number of training samples (Tables 1 and 2 and Figure 5). The iterative denoising process encourages high fidelity and diversity in the generated quantum states, outperforming baselines in predictive accuracy.

In conclusion, we deeply appreciate the reviewer’s insightful feedback, which has allowed us to reflect on and further clarify the motivations and design choices behind QuaDiM.

While our work demonstrates the potential of diffusion models for quantum state property estimation and highlights their advantages in terms of scalability, generalization, and efficiency, we acknowledge that this is an emerging and dynamic research area. We hope that our contributions can serve as a foundation to inspire further exploration and innovation, including the application of alternative generative models.

We remain open to constructive discussions and suggestions from the reviewer to strengthen this line of research further. Thank you again for your valuable input.

[1] Gebhart V, Santagati R, Gentile A A, et al. Learning quantum systems[J]. Nature Reviews Physics, 2023, 5(3): 141-156.

[2] Carrasquilla J. Machine learning for quantum matter[J]. Advances in Physics: X, 2020, 5(1).

Comment

[5] Wang Z, Liu C, Zou N, et al. Infusing Self-Consistency into Density Functional Theory Hamiltonian Prediction via Deep Equilibrium Models[J]. NeurIPS, 2024.

[6] Carrasquilla J, Torlai G, Melko R G, et al. Reconstructing quantum states with generative models[J]. Nature Machine Intelligence, 2019.

[7] Xiao T, Huang J, Li H, et al. Intelligent certification for quantum simulators via machine learning[J]. npj Quantum Information, 2022.

[8] García-Pérez G, Rossi M A C, Sokolov B, et al. Learning to measure: Adaptive informationally complete generalized measurements for quantum algorithms[J]. PRX quantum, 2021.

[9] Huang H Y, Kueng R, Torlai G, et al. Provably efficient machine learning for quantum many-body problems[J]. Science, 2022.

[10] Lewis L, Huang H Y, Tran V T, et al. Improved machine learning algorithm for predicting ground state properties[J]. Nature Communications, 2024.

[11] Huang H Y, Broughton M, Cotler J, et al. Quantum advantage in learning from experiments[J]. Science, 2022.

Q2: I am not fully convinced why a diffusion model is the most suitable choice for the (non-sequential) quantum property estimation task. To me, this task fundamentally seems to be manageable by other generative models, such as conditional VAEs, GANs, or energy models. The authors mention that these generative models can only handle a single specific quantum state and cannot generalize to unseen states, but I am curious why a diffusion model is capable of generalizing to unseen states. While it is true that diffusion models demonstrate superior performance in terms of both fidelity and diversity, achieving high performance for diffusion models generally requires a large amount of training data. Therefore, the authors would clarify why the proposed QuaDiM would be particularly beneficial for quantum state property estimation, compared to other generative models, in a practical setting (e.g., limited observation as the authors mentioned).

A2: We sincerely thank the reviewer for their thoughtful and constructive feedback.

It is worth noting that predicting properties of quantum states using deep learning techniques is indeed a cutting-edge research area [1]. We acknowledge that generative models like VAEs, GANs, and energy-based methods remain valuable tools in quantum machine learning. Our work with QuaDiM demonstrates the potential of non-autoregressive approaches and aims to inspire further exploration of these methods for QPE.

In the following response, we do our best to clarify the motivation behind our choice of diffusion models, highlight their potential advantages over alternative generative methods, and provide further insights into their suitability for quantum state property estimation tasks. We remain open to further discussion with the reviewer.

Comment

We are truly grateful for your time, detailed feedback, and thoughtful suggestions. We would do our best to provide clarifications step by step. We have addressed the typos in the revised version of the PDF and added necessary clarifications along with additional numerical experiments in the Appendix. The changes made in the PDF are highlighted in blue for easy reference.

Q1: This paper is essentially an application paper that applies a diffusion model combined with a text embedding structure to the task of quantum state property estimation. It effectively addresses an interesting topic in the field of quantum computing through appropriate structural selection; however, it is uncertain whether this topic is broadly interesting to the ICLR audience.

A1: Thank you for the feedback. We will do our best to address the reviewer's concerns.

Firstly, we would like to highlight that learning quantum states holds critical importance for both the quantum computing and deep learning communities.

Significance of Quantum Applications for Machine Learning

  • Applying machine learning methods to quantum many-body problems is a rapidly growing area of interest [1,2]. Quantum state property estimation (QPE) is a challenging problem in the field of quantum physics and computing, with implications for quantum simulation, cryptography, and hardware validation. By addressing the critical challenge of efficiently predicting quantum state properties in large-scale systems, QuaDiM highlights the potential feasibility of diffusion-based models in quantum sciences.
  • Examining the application of deep learning approaches in the context of QPE - one of the challenging problems in quantum many-body physics - from both experimental [6,7,8] and theoretical [9,10,11] perspectives holds indispensable value. Our validation of the proposed method's feasibility is based on empirical and experimental evidence.
  • Note that many advanced works have been published in learning quantum states in AI conferences, focusing on both the adaptation and application of deep learning methods such as autoregressive models [3], pretraining strategies [4], and deep equilibrium models [5].

Next, we would like to reiterate our contributions and innovations:

  • Non-Autoregressive Diffusion-Based Approach: QuaDiM is, to the best of our knowledge, the first non-autoregressive conditional generative model specifically designed for QPE. Unlike standard approaches that rely on sequential dependencies (e.g., auto-regressive models), QuaDiM iteratively denoises Gaussian noise into the target quantum state distribution, with the aim of treating all qubits equally and removing the bias introduced by sequential modeling. This framework is novel in the QPE context.
  • Scalability to Large Systems: Our empirical evaluation extends to quantum systems of up to 100 qubits (up to $2^{100}$ dimensions). This involved a large-scale data collection process from classical simulation. Notably, the simulation and data collection process utilizing the Matrix Product State (MPS) algorithm required nearly two weeks of computation on a cluster equipped with 4 Intel Xeon Gold 6248 CPUs (total cores: 80, total threads: 160). Moreover, with reduced sample complexity, our model outperforms state-of-the-art models.
  • Generalizability Across Quantum States and Properties: A significant strength of QuaDiM is its generalizability. Once trained, QuaDiM can predict a variety of quantum state properties—including correlation and entanglement entropy—without the need for retraining. Moreover, it is not limited to a single quantum state but can generalize to previously unseen quantum systems. This includes states that are difficult to prepare on current noisy intermediate-scale quantum (NISQ) devices or impossible to simulate classically. Such flexibility highlights QuaDiM’s potential for addressing broader quantum tasks efficiently.

In conclusion, QuaDiM introduces a novel approach with its non-autoregressive architecture in solving quantum-many body problems, with its scalability and generalizability. We believe its interdisciplinary perspective and empirical results provide a meaningful step forward, and we hope it will be considered as a valuable contribution for the ICLR audience.

[1] Gebhart V, Santagati R, Gentile A A, et al. Learning quantum systems[J]. Nature Reviews Physics, 2023, 5(3): 141-156.

[2] Carrasquilla J. Machine learning for quantum matter[J]. Advances in Physics: X, 2020, 5(1).

[3] Chen Z, Newhouse L, Chen E, et al. Antn: Bridging autoregressive neural networks and tensor networks for quantum many-body simulation. NeurIPS, 2023.

[4] Tang Y, Xiong H, Yang N, et al. Towards LLM4QPE: Unsupervised Pretraining of Quantum Property Estimation and A Benchmark. ICLR, 2024.

Comment

Dear Reviewer MPMV,

I hope this message finds you well. As the rebuttal period is quickly approaching its deadline, we kindly request your valuable feedback to ensure a comprehensive revision. Your insights are crucial to us, and we look forward to your response.

Best regards

Comment

Dear Reviewer MPMV,

As the discussion window is approaching its closure, we are eager to hear back from you regarding our submission. We wish to ensure that our responses have adequately addressed your concerns. Should there be any remaining issues or new concerns that have arisen, please do not hesitate to inform us. We are more than willing to provide further clarifications and engage in discussions to enhance the quality and relevance of our work.

Thank you very much for your review and support.

Best regards

Comment

I appreciate the authors' detailed responses, which have clarified my questions regarding the motivation behind the use of the diffusion model and positional encoding. I now recognize this paper as a solid contribution to the application of quantum science, and accordingly have increased the review score. That said, while the proposed algorithm is well-defined and thoughtfully constructed, the absence of substantial advancements in ML theory or notable algorithmic innovations makes it challenging to provide a higher evaluation.

Comment

Dear Reviewer MPMV,

We sincerely thank you for your insightful feedback and for considering an increase in your score. Your detailed suggestions have greatly contributed to improving the clarity and depth of our manuscript. We appreciate the time and effort you have invested in reviewing our work and providing constructive recommendations.

Best regards

Review
6

The paper proposes QuaDiM, a new framework based on diffusion models that learns the ground state of quantum systems conditioned on system parameters and enables quantum property estimation. QuaDiM uses a transformer-based architecture, where at inference time, measurement outcomes are first generated as continuous-valued hidden vectors and then decoded into discrete values. Numerical experiments are performed on the 1D anti-ferromagnetic Heisenberg model with up to 100 qubits, and predicted properties include correlation and entanglement entropy.

Strengths

  • The paper is very readable, with clear and understandable notations and easy-to-follow reasoning across the paper.
  • The proposed methods achieve SOTA performance on predicting correlation and entanglement entropy on 1D anti-ferromagnetic Heisenberg model with a reasonably large system size, justifying the effectiveness of the proposed approach.

Weaknesses

  • While the motivation for using non-autoregressive models is briefly discussed in the introduction, there could be more discussion and empirical evidence to support the claims about the sub-optimal performance of auto-regressive methods. While the measurements are not like language, the underlying physics model can still have some order structure (like a chain) and locality information is often important. Therefore, it's not completely obvious why diffusion models are necessarily better.
  • QuaDiM lacks machine learning technical novelty. The conditional diffusion model is already well studied in the literature, and the score neural network used seems to be a standard transformer as well. Given that ICLR is an ML conference, it would be more interesting to see applications and inspiration that can be drawn from QuaDiM to more general problems.
  • The experimental setup is not clearly presented. For example, the generation protocol of the system parameter $x \in \mathbb{R}^{L-1}$ is not described in the experiments section, not to mention the training and test set splitting process. This makes it hard to understand at what level QuaDiM extrapolates to "unseen parameters".
  • The experiment design is relatively weak. It would be interesting to consider more complicated property estimation problems, such as the estimation of entanglement negativity, two site observables with long ranges in between, or even the estimation of system parameters in the Hamiltonian. Second, it would be more convincing if QuaDiM is examined on a more complicated physics model, such as some with non-trivial long-range entanglement. It would be more convincing if the extrapolation of QuaDiM to critical state of the physics model could be tested and examined.
  • The benchmarks are not representative enough. For example, [1-3] are also works that model quantum states that should enable property estimation as downstream tasks, but are not discussed or compared.

References:

[1] Fitzek, David, et al. "RydbergGPT." arXiv preprint arXiv:2405.21052 (2024).

[2] Du, Yuxuan, et al. "Shadownet for data-centric quantum system learning." arXiv preprint arXiv:2308.11290 (2023).

[3] Zhang, Yuan-Hang, and Massimiliano Di Ventra. "Transformer quantum state: A multipurpose model for quantum many-body problems." Physical Review B 107.7 (2023): 075147.

Questions

  • Can you discuss more on in what cases diffusion models would be strictly better than auto-regressive models, in the quantum state modeling setting?
  • I am not completely clear on how continuous latent vectors generated by QuaDiM are decoded back into measurement outcomes. Can you elaborate more on this?
  • Why QuaDiM will generate measurements consistently sampled from a valid quantum state without violating structure constraints. How is this enforced in the learning algorithm?
  • Does QuaDiM extrapolate to an unseen system parameter $x$ distribution, likely with disjoint support from the training distribution, e.g., trained on $x \in [0, 0.8]$, tested on $x \in [0.9, 1]$?
  • Can you comment on QuaDiM's training efficiency? First, it seems that a relatively large $M_{in}$ is required for learning one ground state to good accuracy, and $M_{out}$ is not a small number either. Can $M_{in}$ and $M_{out}$ be improved? Second, how many distinct pairs of (ground state, measurements) are needed for learning the mapping between Hamiltonian parameters and system ground state correctly, and how does this scale with the number of qubits?
Comment

Note that while diffusion models and transformers are well established in the ML community, their adaptation to the quantum many-body domain is non-trivial. Unlike typical applications such as language modeling, quantum measurement data involve non-sequential, exponentially large dimensions with complex entanglement [9]. QuaDiM aims to alleviate these issues by introducing a non-autoregressive generative framework for QPE. Here, we briefly reiterate our contributions and innovations:

  • Non-Autoregressive Diffusion-Based Approach: QuaDiM introduces the first non-autoregressive conditional generative model specifically designed for QPE. Unlike standard approaches that rely on sequential dependencies (e.g., auto-regressive models), QuaDiM employs diffusion models to iteratively denoise Gaussian noise into the target quantum state distribution, with the hope of encouraging equal treatment of all qubits and removing bias introduced by sequential modeling. This framework is novel in the QPE context, addressing the limitations of existing machine learning models that fail to capture non-sequential, high-dimensional entanglement structures.
  • Scalability to Large Systems: Our empirical evaluation extends to quantum systems of up to 100 qubits (up to $2^{100}$ dimensions). This involved a large-scale data collection process from classical simulation. Notably, the simulation and data collection process utilizing the Matrix Product State (MPS) algorithm required nearly two weeks of computation on a cluster equipped with 4 Intel Xeon Gold 6248 CPUs (total cores: 80, total threads: 160). Moreover, with reduced sample complexity, our model outperforms state-of-the-art baselines.

We again sincerely appreciate the reviewers’ thoughtful feedback. We hope that this clarification will aid in further understanding and help to underscore QuaDiM's potential as a valuable contribution to the field.

[1] Gebhart V, Santagati R, Gentile A A, et al. Learning quantum systems[J]. Nature Reviews Physics, 2023, 5(3): 141-156.

[2] Carrasquilla J. Machine learning for quantum matter[J]. Advances in Physics: X, 2020, 5(1).

[3] Chen Z, Newhouse L, Chen E, et al. Antn: Bridging autoregressive neural networks and tensor networks for quantum many-body simulation. NeurIPS, 2023.

[4] Tang Y, Xiong H, Yang N, et al. Towards LLM4QPE: Unsupervised Pretraining of Quantum Property Estimation and A Benchmark. ICLR, 2024.

[5] Wang Z, Liu C, Zou N, et al. Infusing Self-Consistency into Density Functional Theory Hamiltonian Prediction via Deep Equilibrium Models. NeurIPS, 2024.

[6] Carrasquilla J, Torlai G, Melko R G, et al. Reconstructing quantum states with generative models[J]. Nature Machine Intelligence, 2019.

[7] Xiao T, Huang J, Li H, et al. Intelligent certification for quantum simulators via machine learning[J]. npj Quantum Information, 2022.

[8] García-Pérez G, Rossi M A C, Sokolov B, et al. Learning to measure: Adaptive informationally complete generalized measurements for quantum algorithms[J]. PRX quantum, 2021.

[9] Huang H Y, Kueng R, Torlai G, et al. Provably efficient machine learning for quantum many-body problems[J]. Science, 2022.

[10] Lewis L, Huang H Y, Tran V T, et al. Improved machine learning algorithm for predicting ground state properties[J]. Nature Communications, 2024.

[11] Huang H Y, Broughton M, Cotler J, et al. Quantum advantage in learning from experiments[J]. Science, 2022.

Comment

In contrast, non-autoregressive diffusion models avoid this manually defined ordering bias in both the training and sampling phases. They directly model the joint probability distribution corresponding to the quantum state and sample from it, i.e., the sampled configurations of qubits $(\sigma_1,\ldots,\sigma_L)$ are generated simultaneously. This lack of bias offers hope for better predicting quantum state correlations and entropies while reducing the required number of training and decoding samples.

To further compare our model with the autoregressive baselines, we define an alternative ordering of qubits (from right to left, i.e., from the largest to the smallest index, as opposed to the left-to-right ordering used throughout the paper) and re-train the autoregressive baselines. The RMSE results for predicting the correlations of all subsystems of size two on the test dataset, evaluated under the predefined right-to-left order, are presented below. For comparison, we also include the experimental results already reported in the paper. Due to time constraints, we only report results for $L=70$ and $L=100$ across various sample counts. Note that <- denotes right to left, and -> denotes left to right.

| Method | L=70, M=100 | L=70, M=1000 | L=70, M=10000 | L=70, M=20000 | L=100, M=100 | L=100, M=1000 | L=100, M=10000 | L=100, M=20000 |
|---|---|---|---|---|---|---|---|---|
| RNN(<-) | 0.2197 | 0.0721 | 0.0216 | 0.0165 | 0.2276 | 0.0763 | 0.0264 | 0.0159 |
| RNN(->) | 0.2137 | 0.0739 | 0.0240 | 0.0153 | 0.2325 | 0.0806 | 0.0251 | 0.0163 |
| LLM4QPE(<-) | 0.1865 | 0.0538 | 0.0157 | 0.0108 | 0.1773 | 0.0542 | 0.0149 | 0.0122 |
| LLM4QPE(->) | 0.1814 | 0.0527 | 0.0155 | 0.0116 | 0.1759 | 0.0531 | 0.0152 | 0.0114 |
| QuaDiM | 0.1679 | 0.0473 | 0.0117 | 0.0092 | 0.1686 | 0.0478 | 0.0125 | 0.0098 |

As can be observed, the prediction performance of autoregressive baselines is inferior to that of the diffusion models regardless of whether the predefined order is from left to right or from right to left. This observation aligns with the findings in [1,2], which highlight that the predefined factorization order imposes constraints on the expressivity of these models when learning quantum systems.

We hope this clarification addresses the reviewers' concerns. Thank you again for the valuable feedback, and we look forward to further improving our work based on your comments.

[1] Gebhart V, Santagati R, Gentile A A, et al. Learning quantum systems[J]. Nature Reviews Physics, 2023, 5(3): 141-156.

[2] Carrasquilla J. Machine learning for quantum matter[J]. Advances in Physics: X, 2020, 5(1).

[3] Bortone M, Rath Y, Booth G H. Impact of conditional modelling for a universal autoregressive quantum state[J]. Quantum, 2024, 8: 1245.

[4] Ibarra-García-Padilla E, Lange H, Melko R G, et al. Autoregressive neural quantum states of Fermi Hubbard models[J]. arXiv preprint arXiv:2411.07144, 2024.

Q2: QuaDiM lacks machine learning technical novelty. The conditional diffusion model is already well studied in the literature, and the score neural network used seems to be a standard transformer as well. Given that ICLR is an ML conference, it would be more interesting to see applications and inspiration that can be drawn from QuaDiM to more general problems.

A2: We sincerely thank the reviewer for the feedback. We would like to highlight that learning quantum states holds critical importance for both the quantum computing and deep learning communities [1,2].

Firstly, we emphasize that QuaDiM is proposed to address the challenges of quantum state property estimation (QPE), a foundational quantum many-body problem [1]. In recent years, many cutting-edge works have been published at AI conferences, focusing on both the adaptation and application of deep learning methods such as autoregressive models [3], pretraining strategies [4], and deep equilibrium models [5]. Beyond these, a series of contributions have emerged from both experimental [6,7,8] and theoretical [9,10,11] perspectives. We aspire for this paper to serve as a source of inspiration for researchers across both the AI and quantum computing domains.

Comment

Thank you for the time, thorough comments, and nice suggestions. We are pleased to clarify your questions step-by-step. We have addressed the typos in the revised version of the PDF and added necessary clarifications along with additional numerical experiments in the Appendix. The changes made in the PDF are highlighted in blue for easy reference.

Q1: While the motivation for using non-autoregressive models is briefly discussed in the introduction, there could be more discussion and empirical evidence to support the claims about the sub-optimal performance of auto-regressive methods. While the measurements are not like language, the underlying physics model can still have some order structure (like a chain) and locality information is often important. Therefore, it's not completely obvious why diffusion models are necessarily better.

A1: Thank you for the insightful questions and constructive feedback. To address your concerns more effectively, we would like to begin by briefly restating the motivation behind our work.

Many advanced studies [1,2] have applied autoregressive machine learning models (e.g., RNNs and transformers) to quantum state modeling, with applications including reconstructing quantum states’ density matrices, and predicting certain nonlinear properties of quantum states (the focus of this paper) such as correlations and entanglement entropies.

Intuitively, these methods treat the discrete data obtained from quantum measurements as analogous to a corpus in language modeling, where each qubit’s post-measurement collapsed result corresponds to a token. Consequently, deep learning models commonly used in natural language processing (e.g., RNNs and transformers) are leveraged to model quantum states (or a family of quantum states).

However, recent studies [3,4] have found that the inherent constraints of autoregressive methods—specifically, the pre-defined factorization order (e.g., left-to-right or right-to-left order in 1D chains, or zig-zag orders in 2D systems)—often introduce biases into the model. This bias influences the ability of variational models including deep learning models to accurately describe correlations among qubits and constrains the expressivity of these models in learning quantum systems.

This observation aligns with the experimental phenomena we observed. Specifically, for 1D chain-like Heisenberg systems, the performance of traditional autoregressive methods (trained and sampled under a pre-defined ordering) is often inferior to non-autoregressive diffusion models, when it comes to predicting certain quantum state properties such as correlations and entropies using limited measurement data (as listed in Table 1 and 2).

Motivated by these findings, we propose the use of non-autoregressive diffusion models as an alternative for modeling quantum states. The choice of diffusion models as the backbone is based on their SOTA performance across various tasks, including image and video generation and language modeling. In the subfield of quantum state modeling, diffusion models share similarities with autoregressive models while also exhibiting unique characteristics:

The similarity is that both approaches aim to approximate the classical joint probability distribution $p(\sigma_1,\ldots,\sigma_L)$ corresponding to the wave function $|\psi\rangle=\sum_{\sigma_1=1}^{K}\cdots\sum_{\sigma_L=1}^{K} \Phi(\sigma_1,\ldots,\sigma_L)|\sigma_1,\ldots,\sigma_L\rangle$ of a (family of) quantum state during training, written as $p(\sigma_1,\ldots,\sigma_L) = |\Phi(\sigma_1,\ldots,\sigma_L)|^2$, where $\Phi(\sigma_1,\ldots,\sigma_L)$ is the amplitude of the basis $(\sigma_1,\ldots,\sigma_L)$. According to the Born rule of quantum mechanics, the joint distribution $p$ is a valid probability distribution since $p(\sigma_1,\ldots,\sigma_L)\geq 0$ and $\sum_{\sigma_1}\cdots\sum_{\sigma_L}p(\sigma_1,\ldots,\sigma_L)=1$.

The key difference is as follows. The essence of any autoregressive model is the application of the product rule of probability to factorize a joint probability distribution of random variables (qubits) into a causal product of conditional probability distributions, one for each variable conditioned on the realizations of preceding variables: $p(\sigma_1,\ldots,\sigma_L) = \prod_{i=1}^{L} p(\sigma_i|\sigma_{<i})$. However, this factorization requires manually defining a fixed order of the qubits (e.g., left-to-right or right-to-left in 1D chains) prior to training. During inference, the model also needs to generate or sample one qubit at a time following the pre-defined order.
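
To make this contrast concrete, here is a schematic Python/NumPy sketch (with a toy stand-in distribution, not our actual architecture): autoregressive sampling must commit to a qubit ordering and draw one outcome at a time, whereas a diffusion-style sampler emits the whole configuration at once.

```
import numpy as np

K, L = 6, 8
rng = np.random.default_rng(0)

def p_cond(prefix):
    # toy stand-in for the learned conditional p(sigma_i | sigma_<i)
    logits = rng.normal(size=K)
    return np.exp(logits) / np.exp(logits).sum()

# autoregressive: one qubit at a time, in a pre-defined (left-to-right) order
sigma_ar = []
for i in range(L):
    sigma_ar.append(rng.choice(K, p=p_cond(sigma_ar)))

# non-autoregressive (diffusion-style): all L positions are produced jointly,
# e.g. by denoising Gaussian noise into per-position outcome probabilities Z
Z = rng.dirichlet(np.ones(K), size=L)            # stand-in for the denoised output, (L, K)
sigma_joint = [rng.choice(K, p=Z[l]) for l in range(L)]
```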

Comment

In contrast, our proposed method, while being a generative model, targets the prediction of specific properties of the quantum state without reconstructing the full density matrix. Additionally, our optimization relies on the more general gradient descent methods commonly used in machine learning, aiming to make the trained model's distribution approximate the classical joint distribution corresponding to the quantum state’s wavefunction. Therefore, we did not include [3] as a new baseline in our study.

Regarding [2], we were unable to locate its implementation code. If the code becomes available, we will include it in our experiments in future updates.

We have included [1] as a new baseline. From a methodological perspective, [1] is quite similar to the existing baseline LLM4QPE in our paper. Both utilize autoregressive Transformers, employ KL divergence as the loss function, and learn from large quantum measurement datasets to approximate the classical joint distribution of the quantum state. Predictions of specific quantum properties are then made by sampling from this learned distribution.

For a fair comparison, we configure RydbergGPT's Transformer with 4 heads, 4 layers, and a hidden dimension of 128, the default configuration in our experiments. We also add a linear projection layer at the Transformer's output (6 output units with softmax) to make the model outputs compatible with the Pauli-6 measurements used in our work. Due to time constraints, we report the RMSE of predicted correlations below for $L=100$ and $M_{in}=1000$.

| Method | $M_{out}=100$ | $M_{out}=1000$ | $M_{out}=10000$ | $M_{out}=20000$ |
|---|---|---|---|---|
| RydbergGPT | 0.1843 | 0.0562 | 0.0170 | 0.0121 |
| Ours | 0.1686 | 0.0478 | 0.0125 | 0.0098 |

[1] Fitzek, David, et al. "RydbergGPT." arXiv preprint arXiv:2405.21052 (2024).

[2] Du, Yuxuan, et al. "Shadownet for data-centric quantum system learning." arXiv preprint arXiv:2308.11290 (2023).

[3] Zhang, Yuan-Hang, and Massimiliano Di Ventra. "Transformer quantum state: A multipurpose model for quantum many-body problems." Physical Review B 107.7 (2023): 075147.

Q6: Can you discuss more on in what cases diffusion models would be strictly better than auto-regressive models, in the quantum state modeling setting?

A6: We sincerely thank the reviewers for the insightful questions. Our proposal to use non-autoregressive diffusion models for modeling quantum states instead of autoregressive approaches is based on the observation: pre-defined factorization orders (e.g., left-to-right or right-to-left orders in 1D chains, or zig-zag orders in 2D systems) may introduce biases into the model. These biases could impact the ability of variational models, including deep learning-based approaches, to accurately capture correlations among qubits and may constrain the expressivity of these models in learning quantum systems. This observation aligns with the perspectives presented in recent state-of-the-art works [1,2].

In contrast, non-autoregressive diffusion models avoid this manually defined ordering bias during both the training and sampling phases. They directly model the joint probability distribution corresponding to the quantum state and sample from it, such that the configurations of all qubits are generated simultaneously.

Our preliminary hypothesis is that, for both 1D systems (as specifically studied in this paper) and 2D topologies of quantum systems, non-autoregressive diffusion models offer a general advantage. We hope that our work will inspire future research to examine more specific cases and instances to further validate this hypothesis. We remain open to continued discussions with the reviewers to explore this open-ended question in greater depth.

[1] Bortone M, Rath Y, Booth G H. Impact of conditional modelling for a universal autoregressive quantum state[J]. Quantum, 2024, 8: 1245.

[2] Ibarra-García-Padilla E, Lange H, Melko R G, et al. Autoregressive neural quantum states of Fermi Hubbard models[J]. arXiv preprint arXiv:2411.07144, 2024.

Comment

Q4: The experiment design is relatively weak. It would be interesting to consider more complicated property estimation problems, such as the estimation of entanglement negativity, two site observables with long ranges in between, or even the estimation of system parameters in the Hamiltonian. Second, it would be more convincing if QuaDiM is examined on a more complicated physics model, such as some with non-trivial long-range entanglement. It would be more convincing if the extrapolation of QuaDiM to critical state of the physics model could be tested and examined.

A4: Thank you for your valuable feedback. We would like to clarify that the prediction errors we report for correlations (equivalent to expectations of local observables) and entanglement entropies are averaged over different sites, including pairs of sites separated by long distances. Specifically:

For the task of predicting the correlation function, our model outputs a symmetric matrix $V$ of size $L\times L$ for a given sample, where $L$ is the total number of qubits. Here, $V_{ij}$ represents the predicted correlation function between the $i$-th and $j$-th qubits. Denoting the true correlation matrix as $V^{'}$, the MSE is $\frac{1}{L^2}\sum_{ij} (V_{ij}-V^{'}_{ij})^2$, and the reported error is the root of this value averaged over the samples of the test dataset (a sketch of this metric follows below). This incorporates correlation functions between pairs of quantum sites at all distances.

For the task of predicting the entanglement entropies in our paper, the output is likewise a symmetric matrix, and the reported error is averaged over site pairs at all distances as well as over the samples of the test dataset.
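
As referenced above, here is a minimal Python/NumPy sketch of the correlation metric (following the per-sample root-then-average reading of the text; `V_pred_list` and `V_true_list` are illustrative names for the predicted and ground-truth correlation matrices over the test set):

```
import numpy as np

def correlation_rmse(V_pred_list, V_true_list):
    # per sample: sqrt( 1/L^2 * sum_ij (V_ij - V'_ij)^2 ), then average over the test set
    errs = []
    for V_pred, V_true in zip(V_pred_list, V_true_list):
        L = V_pred.shape[0]
        errs.append(np.sqrt(np.sum((V_pred - V_true) ** 2) / L**2))
    return float(np.mean(errs))
```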

We will highlight this in the updated version of the paper.

Additional Numerical Results for Long-range XY Model.

To further demonstrate the effectiveness of our proposed model, we have supplemented our experiments with results on a more physically general and classically challenging system: the long-range XY model in a transverse field, whose Hamiltonian is given by

$$H_{XY} = \sum_{i<j}J_{ij}\left(X_i X_j + Y_i Y_j\right) + \sum_j Z_j,$$

where $J_{ij}=J_0/|i-j|^{a}$ with $a\in(1,2)$. This model features long-range interactions between every pair of quantum sites, leading to complex dynamics that are hard to simulate on classical computers. We restrict the system size to $L=10$ due to memory limitations. The ground states of quantum systems with different physical conditions $J_{ij}$ are calculated by eigenvalue decomposition. Following Xiao et al. (npj Quantum Information, 2022), we randomly sample a series of $J_{ij}$ and conduct classical simulations to collect the data. This process is almost the same as in our paper (including that described in A3), and we re-train the models using the collected data. The experimental results for $L=10$, with $M_{in}$ fixed at 1000 and $M_{out}\in\{1000,10000\}$, are presented below. Due to time constraints, we report the performance of our model and the baselines on the test dataset for predicting correlations only.

| Method | M=1000 | M=10000 |
|---|---|---|
| CS | 0.2575 | 0.0517 |
| RBFK | 0.1158 | - |
| NTK | 0.1039 | - |
| RNN | 0.2234 | 0.0502 |
| LLM4QPE | 0.2139 | 0.0482 |
| QuaDiM | 0.1986 | 0.0367 |

It can be observed that our proposed QuaDiM consistently outperforms the baselines. We appreciate your insightful comments, and we hope this clarification addresses your concerns.

Q5: The benchmarks are not representative enough. For example, [1-3] are also works that model quantum states that should enable property estimation as downstream tasks, but are not discussed or compared.

A5: We sincerely appreciate the reviewer's valuable suggestions. We have carefully reviewed the three references provided, and we recognize that the methodologies discussed in these papers offer significant insights for our future work. We are happy to cite these references in the updated version of the paper.

Upon thorough analysis, we believe that the task addressed in [3] is orthogonal to our work, and the optimization methodologies differ substantially. Specifically, the approach in [3] is more aligned with neural network quantum states (NNQS), relying on variational Monte Carlo optimization (minimizing ground state energies) to approximate the wavefunction of the ground state. It requires two separate projection layers to represent the real and imaginary components of the complex amplitude independently.
Comment

Q3: The experimental setup is not clearly presented. For example, the generation protocol of the system parameter $\mathbf{x}\in\mathbb{R}^{L-1}$ is not described in the experiments section, not to mention the training and test set splitting process. This makes it hard to understand at what level QuaDiM extrapolates to "unseen parameters".

A3: We thank the reviewer for their insightful question. In Section 4.1, we briefly described the meaning of the system parameter $\mathbf{x} \in \mathbb{R}^{L-1}$. Below, we provide additional details on how $\mathbf{x}$ is generated in our simulation experiments, as well as the rules used to partition the training and test datasets. These clarifications have also been added to the revised version of the paper.

First, we would like to emphasize that, unlike neural network quantum states (e.g., [3, 4]), where the neural network is trained to approximate the distribution of a single quantum state and aims to reconstruct the state's density matrix, our method learns classical representations of a family of quantum states. This approach reduces measurement costs and even allows us to study the properties of states for which no measurement data are available, including states that cannot currently be prepared on modern NISQ hardware.

The parameter $\mathbf{x}$ is a sequence of real numbers representing the coupling strengths among qubits. These values serve as parameters of the Hamiltonian $H(\mathbf{x})$ (Eq. 8 in the paper), given by

$$H(\mathbf{x}) = \sum_i x_{i} \left(X_i X_{i+1} + Y_i Y_{i+1} + Z_i Z_{i+1}\right),$$

where different values of $\mathbf{x}$ define different dynamics of the quantum system. Each of the $L-1$ elements in $\mathbf{x}$ is uniformly sampled from the range $[0, 2]$. In our study, we perform simulated experiments on the one-dimensional anti-ferromagnetic Heisenberg model, scaling up to 100 qubits. Direct simulation of such a large quantum system, involving the manipulation and storage of up-to-$2^{100}$-dimensional complex vectors, is impractical due to computational constraints. To overcome this, we utilize Matrix Product State (MPS) methods. Specifically, we leverage the Julia implementation provided by the ITensor library and its DMRG algorithm to calculate the ground state. Part of the code used to generate the system parameters and the ground state is provided below to help the reviewer better understand the process.

```
for i = 1:samples
    # initialize the simulation: couplings J_i ~ Uniform[0, 2]
    J = 2 * rand(N-1)
    sites = siteinds("S=1/2", N)
    # build the Hamiltonian
    os = OpSum()
    for j = 1:N-1
        os += J[j], "X",j,"X",j+1
        os += J[j], "Y",j,"Y",j+1
        os += J[j], "Z",j,"Z",j+1
    end
    H = MPO(os, sites)
    psi0 = productMPS(sites, n -> isodd(n) ? "Up" : "Dn")
    nsweeps = 5
    maxdim = [10,20,100,100,200]
    cutoff = [1E-12]
    # run the DMRG algorithm to obtain the ground state
    energy, psi = dmrg(H, psi0; nsweeps, maxdim, cutoff)
    # generate the random Pauli-6 POVM bases used for measurement (X:0, Y:1, Z:2)
    random_basis_matrix = rand(0:2, NM, N)
    # generate the samples with randomized measurements
    storage_samples = sample_povm(deepcopy(psi), NM, N, sites, deepcopy(random_basis_matrix))
    # compute the correlation matrix averaged over X, Y, and Z
    corr_matrix_zz = correlation_matrix(deepcopy(psi),"Z","Z")
    corr_matrix_yy = correlation_matrix(deepcopy(psi),"Y","Y")
    corr_matrix_xx = correlation_matrix(deepcopy(psi),"X","X")
    corr_matrix = (corr_matrix_zz .+ corr_matrix_yy .+ corr_matrix_xx) ./ 3.0
    # compute the entanglement entropy
    entropy = calcuate_entropy_all(deepcopy(psi), N)
end
```

Once the ground state is obtained, we perform Pauli-6 POVM measurements to produce the corresponding measurement record $R\in\mathbb{Z}^{L\times M_{in}}$. The pairs $(\mathbf{x}, R)$ are then used as inputs for model training, while the labels are derived via exact diagonalization (for $L \leq 10$) or classical shadow methods (for $L > 10$). The process described above is used for generating both training and test dataset samples; the difference is that for the test dataset, measurement records $R$ are no longer required. For the training dataset, we pre-sample $N^{tr} = 100$ different $\mathbf{x}$ values, while the test dataset includes $N^{te} = 20$ samples. The training and test datasets are denoted $\mathcal{D}^{tr}$ and $\mathcal{D}^{te}$, respectively. During training, the model uses the pairs $(\mathbf{x}, R)\in \mathcal{D}^{tr}$ as inputs. At inference (test) time, the trained model generalizes to the ground states of Hamiltonians corresponding to previously "unseen" system parameters $\mathbf{x}\in \mathcal{D}^{te}$: given a new input $\mathbf{x} \in \mathcal{D}^{te}$, the model generates measurement data and predicts properties of the quantum system, such as correlation and entropy, through post-processing. A compact sketch of this dataset protocol follows below.
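
As referenced above, the dataset protocol can be summarized in the following compact Python sketch (the simulator call is a stub standing in for the DMRG + POVM pipeline in the Julia code above; all names here are illustrative):

```
import numpy as np

L, M_in = 100, 1000          # number of qubits, measurements per ground state
N_tr, N_te = 100, 20         # distinct parameter vectors for training / testing
rng = np.random.default_rng(0)

def sample_x():
    # each coupling strength x_i is drawn uniformly from [0, 2]
    return rng.uniform(0.0, 2.0, size=L - 1)

def simulate_and_measure(x, m):
    # stub for the DMRG + Pauli-6 POVM pipeline above; returns random
    # outcomes in {0, ..., 5} so that this sketch runs end-to-end
    return rng.integers(0, 6, size=(L, m))

# training set: pairs (x, R), with R the measurement record of shape (L, M_in)
D_tr = [(x, simulate_and_measure(x, M_in)) for x in (sample_x() for _ in range(N_tr))]

# test set: only the unseen parameters x; no measurement records are required
D_te = [sample_x() for _ in range(N_te)]
```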
Comment

Over the past few years, a series of theoretical [1,2] and experimental [3,4] studies have emerged in this field, with $M_{in}$ and $M_{out}$ often reaching orders of thousands—comparable to the scales discussed in our paper. Developing an effective method to characterize quantum states while minimizing $M_{in}$ and $M_{out}$ is a highly non-trivial challenge and remains an active research direction. The specific analysis is as follows:

From a theoretical perspective [1,2], for learning and predicting the correlations (equivalent to expectations of local observables) and two-body entropies investigated in our paper, the required $M_{in}$ and $M_{out}$ are independent of the number of qubits $L$. However, in general cases, the required scales increase exponentially with the size of the subsystem $A$ on which the operator acts. Meanwhile, the number of distinct ground states in the training dataset (fixed to 100 in our work) scales logarithmically with the number of qubits.

From a numerical perspective, recent work [4] based on autoregressive models has explored new deep learning methodologies to reduce $M_{in}$ and $M_{out}$ and achieves SOTA results. In our paper, we approached this problem from the perspective of the non-sequential nature of the distribution corresponding to quantum states. Our numerical experiments demonstrate that the non-autoregressive diffusion model achieves a reduction in $M_{in}$ compared to the SOTA autoregressive model when predicting the correlation and entropy of the 1D Heisenberg model (as shown in Figure 5). Furthermore, under the same $M_{out}$, our model achieves better predictive performance (as presented in Tables 1 and 2).

We sincerely hope these explanations address the reviewer's concerns and welcome any further discussions to refine and enhance our work.

[1] Huang H Y, Kueng R, Torlai G, et al. Provably efficient machine learning for quantum many-body problems[J]. Science, 2022.

[2] Lewis L, Huang H Y, Tran V T, et al. Improved machine learning algorithm for predicting ground state properties[J]. Nature Communications, 2024.

[3] Xiao T, Huang J, Li H, et al. Intelligent certification for quantum simulators via machine learning[J]. npj Quantum Information, 2022.

[4] Tang Y, Xiong H, Yang N, et al. Towards LLM4QPE: Unsupervised Pretraining of Quantum Property Estimation and A Benchmark. ICLR, 2024.

Comment

Thank you for the detailed responses to all my questions and most of my concerns have been addressed already. I have several follow-up comments regarding some minor presentation issues.

  • Regarding answer A3, thank you for the clear explanation, which indeed resolves my questions. I suggest that you also add the detailed protocol for generating the Hamiltonian parameter $x$ (i.e., sampling from a uniform distribution between 0 and 2) to improve clarity and paper reproducibility. I don't think it has been mentioned in the current paper draft yet (but correct me if I am wrong). You should also clarify that this is in-distribution generalization, as it contrasts with the OOD generalization experiments mentioned in answer A9.

  • Regarding answer A5, it's nice to see the comparison with RydbergGPT. I suggest that you also add this benchmark to the main experiments in the main text, or at least discuss it (same apply to Shadow-Net) to make the numerical experiments section more convincing.

  • Regarding answer A9, it's nice to see additional results on OOD generalization. I also suggest that you present the detailed setting in the appendix and emphasize that this is out-of-distribution generalization in contrast with previous results. From my perspective, this is way more impressive than in-distribution generalization and should be included in the main text if space permits.

In general, I think this work is a good contribution to the field of deep learning and quantum sciences. I agree that despite there's not much innovation on the diffusion model algorithm, it's still a clever application to an important task in quantum sciences. I am also looking forward to seeing its applications in learning more complicated conditions instead of just Hamiltonian parameters.

Due to the above, I have raised my rating and modified the scoring.

Comment

Dear Reviewer oe1d,

We deeply appreciate your thoughtful feedback and your willingness to consider raising your score. Your specific suggestions have been instrumental in enhancing our manuscript.

Thank you again for your valuable review and support of our work.

Best regards

Comment

Sampling from $Z$ involves selecting an outcome $k$ at each position $l$ according to $Z_{b,l,k}$. For each sample $b$, the joint probability of a single measurement string is
$$p(\sigma_1=k_1,\ldots,\sigma_L=k_L) = \prod_{l} Z_{b,l,k_l} = \prod_{l} p(\sigma_l=k_l).$$
The first equality holds due to the non-autoregressive nature of the diffusion model. Since $\sum_{k} Z_{b,l,k} = \sum_{k} p(\sigma_l=k)=1$, the following holds:
$$\sum_{\sigma_1}\cdots\sum_{\sigma_L}p(\sigma_1, \ldots, \sigma_L)=\sum_{\sigma_1}\cdots\sum_{\sigma_L}\prod_l p(\sigma_l) = \prod_l\sum_{k}p(\sigma_l=k)= 1.$$
This completes the proof.
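
A quick numerical check of this factorization (toy Python/NumPy sketch with illustrative sizes) confirms that the product of per-position distributions is itself properly normalized:

```
import numpy as np

B, L, K = 1, 3, 6
rng = np.random.default_rng(0)

# per-position outcome probabilities Z[b, l, :], each summing to 1 over the K outcomes
Z = rng.dirichlet(np.ones(K), size=(B, L))

# joint probability of every string (k_1, k_2, k_3) under the factorized model
joint = np.einsum('i,j,k->ijk', Z[0, 0], Z[0, 1], Z[0, 2])
assert np.isclose(joint.sum(), 1.0)   # the product distribution sums to one
```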

Q9: Does QuaDiM extrapolate to an unseen system parameter $x$ distribution, likely with disjoint support from the training distribution, e.g., trained on $x\in [0,0.8]$, tested on $x\in [0.9,1]$?

A9: We appreciate the reviewer's suggestion and apologize for not providing sufficient clarity regarding the distribution of the data used in the paper. We will clarify this in the revised version of the paper.

In our current setup, the physical parameters $x$ for both training and fine-tuning are sampled from the same distribution over $[0, 2]$. Addressing out-of-distribution (OOD) data is a direction we consider for future work. Even so, we have conducted some preliminary experiments on OOD data to provide additional insights into our model's performance.

Specifically, we divide the sampled physical parameters $x$ into two segments: the training set is limited to $[0, 1.5]$, while the test set exclusively spans $[1.5, 2]$. In alignment with the setting of the paper, we set $N^{tr} = 100$ and $N^{te} = 20$. We report the RMSE of predicted correlations for both the SOTA baseline LLM4QPE and our model QuaDiM under both OOD and non-OOD conditions (in the latter, both training and fine-tuning samples are drawn from the same distribution over $[0, 2]$), with $M_{in} = 1000$ and $M_{out} = 10000$. The results are presented in the table below.

| Method | L=70, no OOD | L=70, OOD | L=100, no OOD | L=100, OOD |
|---|---|---|---|---|
| LLM4QPE | 0.0155 | 0.0526 | 0.0152 | 0.0598 |
| QuaDiM | 0.0117 | 0.0417 | 0.0125 | 0.0465 |

From the results, it is evident that our model continues to outperform the baseline to some extent even under OOD conditions. However, the prediction accuracy drops significantly for both.

Incorporating additional strategies—such as embedding prior knowledge of physical systems into the model design [1]—may further enhance the model's generalization capability on OOD data. Whether our proposed model retains its benefits under broader OOD scenarios remains an open question, which we plan to explore in future work.

We hope this response addresses your concerns. Please feel free to reach out with any further questions regarding OOD problems.

[1] Ma H, Sun Z, Dong D, et al. Tomography of Quantum States from Structured Measurements via quantum-aware transformer[J]. arXiv preprint arXiv:2305.05433, 2023.

Q10: Can you comment on QuaDiM's training efficiency? First, it seems that a relatively large $M_{in}$ is required to learn one ground state to good accuracy, and $M_{out}$ is not a small number either. Can $M_{in}$ and $M_{out}$ be improved? Second, how many distinct pairs of (ground state, measurements) are needed to learn the mapping between Hamiltonian parameters and the system's ground state correctly, and how does this scale with the number of qubits?

A10: Thank you for the thoughtful and constructive comments. Predicting properties of quantum states with deep learning techniques is indeed a cutting-edge research area, and the efficiency of such techniques in this domain remains an open question. We briefly share some of our observations here and remain open to further discussion with the reviewer.

The ultimate goal in this field is to accurately characterize the properties of quantum states using as few identical copies and measurements as possible. In our context, $M_{in}$ refers to the number of measurements required for training the model.

On the other hand, $M_{out}$ denotes the number of samples drawn from the trained generative model. It serves as a measure of how well the model, trained with a given $M_{in}$, approximates the classical probability distribution corresponding to the quantum state. In Table 1, Table 2, and Figure 5 of our paper, we report the performance of our model compared to the baselines across different values of $M_{out}$ and $M_{in}$.
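To illustrate the role of $M_{out}$: once trained, the model is sampled $M_{out}$ times and properties are estimated as empirical averages over the generated measurement strings, so the Monte Carlo error of the estimate shrinks as $M_{out}$ grows. The toy sketch below shows only this general pattern; the random `samples` and the outcome-to-eigenvalue table `eig` are placeholders, not the paper's actual estimator:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder for M_out measurement strings generated by the model
# (outcomes in {0,...,5}); real samples would come from the trained model.
M_out, L = 10_000, 100
samples = rng.integers(0, 6, size=(M_out, L))

# Hypothetical mapping from POVM outcomes to scalar eigenvalues; the true
# mapping depends on the measurement bases and the target observable.
eig = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
vals = eig[samples]                             # (M_out, L)

# Empirical two-point estimate between sites i and j.
i, j = 0, 50
corr_est = (vals[:, i] * vals[:, j]).mean()
```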

Comment

Q7: I am not completely clear on how continuous latent vectors generated by QuaDiM are decoded back into measurement outcomes. Can you elaborate more on this?

A7: Thank you for your feedback. To decode the continuous latent representations output by QuaDiM (denoted $H \in \mathbb{R}^{B \times L \times d}$, where $B$ is the batch size and $d$ is the hidden dimension) back into discrete measurement outcomes (denoted $M \in \mathbb{Z}^{B \times L}$ with each entry $M_{ij} \in \{1,\ldots,6\}$), the decoding step at $t = 0$ applies a softmax followed by sampling. Specifically, we first apply a projection layer (a linear transformation) to map $H$ to $Z \in \mathbb{R}^{B \times L \times 6}$, where 6 is the total number of possible measurement outcomes. Next, we apply the softmax along the last axis of $Z$ (converting the continuous latent representation into probabilities) and then sample from these probabilities to obtain the measurement outcomes $M \in \mathbb{Z}^{B \times L}$.
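For concreteness, here is a minimal PyTorch sketch of this softmax+sampling decoding step. The shapes, the `proj` layer name, and the random `H` are our own illustration, not the released implementation:

```python
import torch
import torch.nn as nn

B, L, d, K = 32, 100, 128, 6            # batch, qubits, hidden dim, outcomes

proj = nn.Linear(d, K)                  # projection layer: latents -> logits

H = torch.randn(B, L, d)                # continuous latents from QuaDiM at t = 0
Z = torch.softmax(proj(H), dim=-1)      # per-site probabilities over the 6 outcomes

# Sample one outcome per (sample, site) from the categorical distribution;
# values lie in {0,...,5}, i.e., the six possible measurement outcomes.
M = torch.distributions.Categorical(probs=Z).sample()   # shape (B, L)
```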

By the way, at denoising steps with $t > 0$, we incorporate the rounding trick proposed in [1] (as discussed in Section 3.2.3 of the paper), because we empirically found that it improves the convergence speed of the model. This step also decodes the continuous latent representation into discrete measurement outcomes. Below, we briefly describe the core of the rounding trick; for more details, please refer to the original paper.

The main task is still to decode the continuous latent representations output by QuaDiM, $H \in \mathbb{R}^{B \times L \times d}$, back into discrete measurement outcomes $M \in \mathbb{Z}^{B \times L}$; the $l_2$ distance is used as the decoding metric. Specifically, a unique $d$-dimensional embedding is assigned to each of the six possible discrete measurement outcomes. This embedding matrix $E \in \mathbb{R}^{6 \times d}$ is updated during model training and stored in memory.

For each $d$-dimensional vector in the continuous latent representation $H$ produced by QuaDiM, we compute its $l_2$ distance to each of the six embedding vectors. The measurement outcome corresponding to the nearest embedding vector (i.e., the one with the smallest distance) is selected as the decoding result.

This process (when $t > 0$) can be viewed, to some extent, as performing $k$-nearest-neighbor search with $k=1$ on the continuous latent representations, and it is highly efficient in a Python implementation when vectorized.
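A vectorized sketch of this nearest-embedding rounding is shown below, again with hypothetical shapes; `E` stands in for the learned embedding matrix described above:

```python
import torch

B, L, d, K = 32, 100, 128, 6
E = torch.randn(K, d)                   # learned outcome embeddings (placeholder)
H = torch.randn(B, L, d)                # continuous latents at a step t > 0

# Pairwise l2 distances between every latent vector and every outcome embedding.
dists = torch.cdist(H.reshape(-1, d), E)        # (B*L, K)
M = dists.argmin(dim=-1).reshape(B, L)          # nearest embedding = decoded outcome
H_rounded = E[M]                                # snap latents onto the embedding grid
```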

[1] Li X, Thickstun J, Gulrajani I, et al. Diffusion-lm improves controllable text generation[J]. Advances in Neural Information Processing Systems, 2022.

Q8: Why will QuaDiM generate measurements that are consistently sampled from a valid quantum state without violating structural constraints? How is this enforced in the learning algorithm?

A8: Thanks for your insightful comments. To help the reviewer follow the argument, we first briefly review the necessary notation (please refer to the preliminaries section of the paper for details).

Given the trained model's output $Z$, we obtain a sample $(\sigma_1, \sigma_2, \ldots, \sigma_L)$ by sampling from $Z$. The sampling process corresponds to a joint distribution $p(\sigma_1, \sigma_2, \ldots, \sigma_L)$ that should satisfy the normalization constraint of the quantum wave function (arising from the Born rule in quantum mechanics). That is, we should prove the following:

$$\sum_{\sigma_1}\cdots\sum_{\sigma_L} p(\sigma_1,\ldots,\sigma_L) = \sum_{\sigma_1}\cdots\sum_{\sigma_L} \|\Phi(\sigma_1,\ldots,\sigma_L)\|^2 = 1$$

where $\Phi(\sigma_1,\ldots,\sigma_L)$ is the complex-valued amplitude.

To prove that generating measurements from the probability distribution specified by the output $Z$ of QuaDiM's last layer is consistent with the classical joint distribution $p$ corresponding to the quantum wave function $|\psi\rangle$, we need to establish that $Z$ represents $p$ and that the sampling process preserves this representation.

As discussed in our response A7, before sampling, a projection layer and a softmax operation are applied to produce $Z \in \mathbb{R}^{B \times L \times 6}$, where 6 is the total number of possible measurement outcomes. This guarantees that $\sum_{k} Z_{b,l,k} = 1$ for each $b$ and $l$.

Comment

Dear Reviewers,

As the rebuttal phase comes to a close, we kindly ask the reviewers for any further responses regarding our updates and clarifications. Your additional feedback would be immensely valuable in helping us ensure the quality of this work and address any remaining concerns.

We also extend our gratitude to the reviewers for their detailed feedback, constructive criticism, and valuable suggestions. The insightful reviews provided a broader perspective and helped refine our work. Overall, the reviewers recognized the novelty of QuaDiM in its application to quantum state property estimation (QPE) using a non-autoregressive diffusion model. Specifically, the reviewers deemed our approach “important for AI4Science area” (kHhj) and “effective” for large systems (9ozo, MPMV), and highlighted its strength in addressing the limitations of autoregressive methods while demonstrating state-of-the-art performance in predicting quantum properties such as correlation and entanglement entropy at system sizes up to 100 qubits (oe1d, 9ozo, MPMV).

Contributions and Innovations

To address concerns regarding the novelty and audience fit for ICLR, we reiterate the key contributions and innovations of QuaDiM:

  1. Non-Autoregressive Diffusion Model for QPE: QuaDiM introduces a novel application of conditional diffusion models to quantum systems, eliminating the sequential bias inherent in autoregressive approaches by allowing simultaneous sampling of qubit configurations and accurate property estimation.
  2. Scalability to Large Quantum Systems: QuaDiM successfully scales to systems with up to 100 qubits (a Hilbert space of dimension up to $2^{100}$), which represents a significant advance for quantum many-body problems.
  3. Generalization Across Parameters: QuaDiM demonstrates strong extrapolation capabilities, effectively predicting quantum properties for unseen Hamiltonian parameters during inference, as detailed in the additional OOD experiments.
  4. Adaptation of Diffusion Models to Quantum Systems: Unlike classical diffusion model applications, QuaDiM adapts to the non-sequential and entangled nature of quantum data, introducing techniques like token embeddings and specialized denoising schedules.

Value of Deep Learning in Quantum Many-Body Problems

QPE is pivotal in quantum computing and condensed matter physics: it directly impacts the ability to characterize quantum systems, which is crucial for advancing quantum technologies. By leveraging deep learning, QuaDiM addresses the challenges of modeling complex quantum states, enabling scalable and efficient property prediction even under limited measurement data. This work bridges machine learning and quantum physics, highlighting the potential for interdisciplinary innovation.

Rebuttal Highlights and Clarifications

In response to reviewer feedback, we have strengthened our manuscript by addressing the following concerns:

  1. Empirical Evidence Supporting Non-Autoregressive Models: We clarified how autoregressive methods could introduce bias through pre-defined qubit orderings and demonstrated the superior performance of QuaDiM through additional experiments under various ordering schemes.
  2. Experimental Setup and Generalization: We expanded the explanation of parameter generation and dataset splitting, highlighting the model’s ability to generalize to unseen parameter distributions. Results from new OOD experiments further support this claim.
  3. Extended Benchmarks: We included comparisons to new baselines, such as RydbergGPT, as suggested by reviewers, while highlighting the orthogonality of other approaches like neural quantum states.
  4. Additional Physics Models: We supplemented the experiments with results on the long-range XY model, further showcasing QuaDiM’s robustness in capturing complex quantum interactions.
  5. Additional Measurement Protocols: We supplemented the experiments with results on another type of IC-POVM, the tetrahedral POVM, showcasing QuaDiM's feasibility under a different measurement strategy commonly used in quantum physics.
  6. Numerical Ablations: We conducted ablation studies on the number of denoising steps and embedding dimensions, balancing computational efficiency with model performance.
  7. Clarifications on POVM Measurements and Physical Constraints: We added an appendix section explaining the mapping from wavefunctions to classical joint probabilities for readers who may not be familiar with quantum computing.

We sincerely hope that the additional experiments, clarifications, and revisions provided address your concerns. If there are any questions or further points you would like us to elaborate on, we would greatly appreciate your guidance. Your insights have been invaluable to the development of this work, and we look forward to any further feedback you may have during the remaining rebuttal period.

Comment

Dear Reviewers 9ozo and MPMV, if you have not already, could you please take a look at the authors' rebuttal? Thank you for this important service. -AC

AC Meta-Review

The paper proposes to apply diffusion models to create a framework for learning the ground states of quantum systems conditioned on system parameters. It also enables the estimation of properties of the quantum state. Empirical successes, including remarkable scalability to large quantum systems, were demonstrated using classically simulated data on the 1D anti-ferromagnetic Heisenberg model. Overall, both the reviewers and I feel that this work, as a submission to the area "applications to physical sciences", makes physical contributions worthy of acceptance, especially given that applications of GenAI to quantum many-body problems are relatively nascent but practically useful.

Additional Comments from Reviewer Discussion

Most of the reviewers' concerns seem to have been resolved post-rebuttal.

Final Decision

Accept (Poster)