QVAE-Mole: The Quantum VAE with Spherical Latent Variable Learning for 3-D Molecule Generation
We propose the first fully (conditional) quantum VAE for 3-D data (molecule) generation with a von Mises-Fisher distributed latent space.
Abstract
Reviews and Discussion
This paper proposes a fully quantum VAE framework, QVAE-Mole, for 3D molecule generation. It introduces a quantum encoding scheme and adopts a von Mises-Fisher distributed latent space. A conditional version, QCVAE-Mole, is also presented for property-specified generation. Experiments show that the model outperforms other quantum or hybrid methods, and achieves competitive performance with fewer parameters compared to classic methods.
Strengths
I'm not an expert in this direction, but the approach looks novel. This paper proposes the first fully quantum VAE for 3D molecule generation, which offers a potential quantum advantage, especially in the NISQ era. It adopts a von Mises-Fisher (vMF) distributed latent space to match the inherent coherence of the quantum system, which is more suitable than the normal distribution used in previous methods. The model outperforms all other quantum (or hybrid) methods and achieves results comparable to several classic methods with significantly fewer parameters.
Weaknesses
I have no major concerns about this article, just the following two minor shortcomings:
- I think some background knowledge needs to be described in a bit more detail, at least in the appendix (for laymen like me).
- The atom stability and molecule stability metrics are compared in most 3D molecular generation papers, while this paper does not show these results.
Questions
See Weaknesses Section.
Limitations
See Weaknesses Section.
Parameterized Quantum Circuits.
Parameterized quantum circuits (PQCs) consist of parameterized gates and offer a concrete way to implement quantum machine learning algorithms. Specifically, the common parameterized quantum gates include the single-qubit rotation gates:
$$R_x(\theta) = \begin{bmatrix} \cos\frac{\theta}{2} & -i\sin\frac{\theta}{2} \\ -i\sin\frac{\theta}{2} & \cos\frac{\theta}{2} \end{bmatrix}, \quad R_y(\theta) = \begin{bmatrix} \cos\frac{\theta}{2} & -\sin\frac{\theta}{2} \\ \sin\frac{\theta}{2} & \cos\frac{\theta}{2} \end{bmatrix}, \quad R_z(\theta) = \begin{bmatrix} e^{-i\theta/2} & 0 \\ 0 & e^{i\theta/2} \end{bmatrix}.$$
The parameters (e.g., $\theta$) in the quantum gate can be either learnable parameters for optimizers or classical information that we want to encode. A quantum machine learning model can be constructed using a sequence of parameterized quantum gates, which transform the initial quantum state into the output quantum state. By measuring the output of the quantum circuit, we convert quantum information into classical information, which can be used to calculate the cost function of the optimization task. Classical optimizers then minimize the cost function by adjusting the parameters of the quantum gates.
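The measure-then-optimize loop described above can be sketched with a one-qubit state-vector simulation. This is a minimal illustration rather than the authors' circuit: the choice of an $R_y$ gate, the $Z$ observable as cost, the parameter-shift gradient, and the function names `ry`, `cost`, `parameter_shift_grad`, and `train` are all assumptions made for the example.

```python
import numpy as np

# Pauli-Z observable used as the (assumed) cost operator.
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def ry(theta):
    """Matrix of the single-qubit rotation gate R_y(theta)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)


def cost(theta):
    """Expectation value <psi|Z|psi> for |psi> = R_y(theta)|0>."""
    psi = ry(theta) @ np.array([1, 0], dtype=complex)
    return float(np.real(np.vdot(psi, Z @ psi)))


def parameter_shift_grad(theta):
    """Gradient of the cost from two shifted circuit evaluations."""
    return 0.5 * (cost(theta + np.pi / 2) - cost(theta - np.pi / 2))


def train(theta0=0.3, lr=0.4, steps=100):
    """Plain gradient descent run by a classical optimizer loop."""
    theta = theta0
    for _ in range(steps):
        theta -= lr * parameter_shift_grad(theta)
    return theta
```

Here the classical loop only ever sees measured expectation values, which is the division of labor the paragraph describes; on hardware, `cost` would be estimated from repeated measurements instead of computed exactly.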
Thank you for acknowledging our work; your suggestions have been immensely helpful. Below is our detailed response.
W1: I think some background knowledge needs to be described in a bit more detail, at least it should be in the appendix (For some laymen like me).
Thank you for your suggestion. We will rewrite the quantum preliminaries section; here we provide more details to make our paper more readable. Due to word count limitations, part of the answer is included in the Official Comment below.
Single-qubit quantum state. In quantum computing, the fundamental building blocks of computation are qubits (short for quantum bits), which are the quantum analog of classical bits. Unlike classical bits, which can only take on one of two possible values (0 or 1), a qubit can exist in a superposition of the two states, represented by the vector $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$, where $|0\rangle$ and $|1\rangle$ represent the two basis states of one qubit, and $\alpha$ and $\beta$ are complex numbers that satisfy the normalization condition $|\alpha|^2 + |\beta|^2 = 1$. When $|\psi\rangle$ is measured, it collapses to either the $|0\rangle$ or the $|1\rangle$ state with probability $|\alpha|^2$ or $|\beta|^2$, respectively.
Mathematically, the quantum state of one qubit can be denoted as a complex 2-dimensional vector, e.g., $|0\rangle = [1, 0]^\top$, $|1\rangle = [0, 1]^\top$, and $|\psi\rangle = [\alpha, \beta]^\top$. The Bloch sphere is a sphere of radius 1, which is a useful tool for visualizing the state of a single qubit. Any pure state of one qubit can be represented by a point on the surface of the sphere.
Multi-qubit quantum state. Multi-qubit quantum states are an extension of single-qubit quantum states, and an $n$-qubit quantum state can be represented as a complex $2^n$-dimensional vector in Hilbert space. This is why quantum systems are often described as living in a $2^n$-dimensional Hilbert space. More specifically, a two-qubit system can be represented as $|\psi\rangle = \alpha_{00}|00\rangle + \alpha_{01}|01\rangle + \alpha_{10}|10\rangle + \alpha_{11}|11\rangle$, where $|ij\rangle$ denotes the tensor product $|i\rangle \otimes |j\rangle$.
Quantum circuits. Quantum circuits are constructed using quantum gates, which are analogous to classical logic gates. Some commonly used single-qubit gates include the Pauli-X gate, the Pauli-Y gate, and the Pauli-Z gate. They can be represented by the unitary matrices $X = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$, $Y = \begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix}$, $Z = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$. The Controlled-NOT (CNOT) gate is a two-qubit gate that flips the second qubit (target) if the first qubit (control) is in the $|1\rangle$ state. When a quantum gate acts on a quantum state $|\psi\rangle$, it transforms this state to another quantum state $|\psi'\rangle$ according to the mathematical operation $|\psi'\rangle = U|\psi\rangle$, where $U$ represents the unitary matrix associated with the quantum gate.
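As a small worked instance of the gate action $|\psi'\rangle = U|\psi\rangle$, the sketch below builds the Pauli and CNOT matrices in NumPy and applies them to a two-qubit state vector. The helper names (`is_unitary`, `run_circuit`, `measurement_probabilities`) and the particular two-gate circuit are illustrative assumptions, not taken from the paper.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
# CNOT with the first qubit as control, in the basis |00>, |01>, |10>, |11>.
CNOT = np.array(
    [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex
)
KET0 = np.array([1, 0], dtype=complex)


def is_unitary(gate):
    """Check that U^dagger U equals the identity."""
    return np.allclose(gate.conj().T @ gate, np.eye(gate.shape[0]))


def run_circuit():
    """Start in |00>, flip the control with X, then apply CNOT."""
    state = np.kron(KET0, KET0)        # |00>
    state = np.kron(X, I2) @ state     # X on the first qubit gives |10>
    state = CNOT @ state               # control is |1>, so target flips: |11>
    return state


def measurement_probabilities(state):
    """Born rule: probability of each basis state is |amplitude|^2."""
    return np.abs(state) ** 2
```

Calling `measurement_probabilities(run_circuit())` puts all probability on the last basis state, $|11\rangle$, because the control qubit was set to $|1\rangle$ before the CNOT.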
W2: The atom stability and molecule stability metrics are compared in most 3D molecular generation papers, while this paper does not show these results.
It should be noted that the baselines consist of two categories. One category is the classic generation model for 3-D molecules. The other category includes the quantum model SQ-VAE and the hybrid model QGAN, which are limited to generating molecular graphs and cannot generate 3-D molecules. Therefore, we adopt the commonly used metrics Valid, Unique, Novel, which can be used for both 2-D and 3-D molecular generation, to evaluate the effectiveness of different types of molecular generation methods.
Moreover, atom stability measures the proportion of atoms that have the right valency, and molecule stability measures the proportion of generated molecules for which all atoms are stable. In our case, all valid molecules are stable (that is, molecule stability = Valid), so we additionally report only atom stability below.
| Method | Atom stability (%, ↑) |
|---|---|
| Dataset | 99.0 |
| MLP-VAE (classical) | 88.6 |
| E-NFs (classical) | 85.0 |
| G-SchNet (classical) | 95.7 |
| G-SphereNet (classical) | 94.7 |
| EDM (classical SOTA) | 98.5 |
| SQ-VAE (quantum) | 86.2 |
| QGAN-HG (hybrid) | 90.2 |
| P2-QGAN-HG (hybrid) | 69.1 |
| QVAE-Mole (ours) | 94.3 |
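The two stability definitions given above the table can be sketched in a few lines. The allowed-valence table, the `(element, valence)` input format, and the function name `stability` are assumptions for illustration only; the actual evaluation code used in the rebuttal is not shown in this thread.

```python
# Illustrative allowed valences for the QM9 element set (an assumption).
ALLOWED_VALENCE = {"H": {1}, "C": {4}, "N": {3}, "O": {2}, "F": {1}}


def stability(molecules):
    """Return (atom_stability, molecule_stability) as fractions.

    molecules: list of molecules, each a list of (element, valence) pairs.
    An atom is stable if its valence is allowed for its element; a molecule
    is stable only if every one of its atoms is stable.
    """
    stable_atoms = total_atoms = stable_molecules = 0
    for mol in molecules:
        flags = [val in ALLOWED_VALENCE.get(elem, set()) for elem, val in mol]
        stable_atoms += sum(flags)
        total_atoms += len(flags)
        stable_molecules += all(flags)
    return stable_atoms / total_atoms, stable_molecules / len(molecules)
```

A single under-bonded atom leaves atom stability high but makes the whole molecule unstable, which is why molecule stability is the stricter of the two metrics.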
We hope this response facilitates your understanding of our work and eases your concerns. We look forward to receiving your feedback soon.
Thank you for the rebuttal and clarification. I believe the paper merits recognition among peers in related communities. I have increased my rating.
Dear Reviewer,
Thank you for your response. We are truly appreciative of your positive feedback and the time you have dedicated to evaluating our paper. Your insights are extremely valuable to us, and we are dedicated to integrating the suggested improvements into the final version during the rebuttal phase. We are very grateful for your support and guidance.
Best regards
This paper introduces a Variational Autoencoder (VAE) with a von Mises-Fisher (vMF) latent space for 3D molecular (conditional) generation. This approach leverages the capabilities of quantum computing, particularly within the Noisy Intermediate-Scale Quantum (NISQ) era, to achieve efficient and effective molecular generation.
Strengths
The paper is well-structured, featuring efficient schematic diagrams and theoretical explanations.
Weaknesses
Formulas 8 and 9 can be represented using more rigorous mathematical notation.
Questions
- Please clarify the term "fully" and, if necessary, explain how it compares to its counterparts.
- Is the latent space defined as discrete [1]? If so, please provide a rationale for this choice.
[1] Van Den Oord, Aaron, and Oriol Vinyals. "Neural discrete representation learning." Advances in neural information processing systems 30 (2017).
Limitations
The emphasis on the QM9 dataset may constrain the generalizability of the results to other molecular datasets or different types of data.
The current state of quantum machine learning
Due to the current limitations of quantum hardware, the number of layers and parameters in quantum methods are severely restricted. Quantum machine learning (QML) models, particularly quantum generative models, are still in their infancy compared to well-developed classical neural models with vast numbers of parameters. Consequently, the performance of current QML models may not yet match that of SOTA classical counterparts.
To support the points above, we collect the following facts:
- Quantum versions of classical ML algorithms running on NISQ devices rarely take SOTA classical algorithms as their baselines, e.g., QCNN [1], QGAN [2], QLSTM [3]. On the one hand, conducting experiments on NISQ devices is in itself a significant contribution to the implementation of quantum algorithms. On the other hand, NISQ devices are difficult to obtain, and there is a significant gap between the physical qubit connectivity topology and quantum algorithm design, making it challenging to deploy quantum algorithms on NISQ devices. Additionally, running on NISQ devices faces the challenge of substantial quantum noise.
- Quantum algorithms running on simulators include classical baselines but rarely perform better than them [4]. For example, [5] (NeurIPS 2020) proposed a quantum RNN for classification, evaluated on MNIST, whose results reach only 94% accuracy (a simple classical fully connected layer can achieve accuracy better than 95%). [6] (ICML 2023a) proposed a quantum molecular embedding algorithm and applied it to molecular property prediction, with results showing nearly a 40% gap from the classical SOTA baselines. [7] (ICML 2023b) proposed a quantum Quadratic Assignment Problem (QAP) solver and applied it to the Traveling Salesman Problem (TSP), with results falling short of the classical nearest insertion algorithm (a heuristic algorithm proposed in 1997).
The general limitation underscores the need for further research to address the deficiencies of NISQ devices, particularly regarding the practicality of real quantum computers and the challenges posed by quantum noise. Once these issues are resolved, we can increase the circuit depth and number of parameters in quantum models, potentially matching the performance of SOTA classical networks with vast numbers of parameters.
References
[1] Quantum convolutional neural networks. Nature Physics 2019.
[2] Experimental quantum generative adversarial networks for image generation. Physical Review Applied 2021.
[3] Quantum long short-term memory. ICASSP 2022.
[4] Better than classical? The subtle art of benchmarking quantum machine learning models, arXiv 2024.
[5] Recurrent quantum neural networks. NeurIPS 2020.
[6] Quantum 3D graph learning with applications to molecule embedding. ICML 2023.
[7] Towards quantum machine learning for constrained combinatorial optimization: a quantum QAP solver. ICML 2023
Thank you for acknowledging our work; your suggestions have been immensely helpful. Below is our detailed response.
W1: Formulas 8 and 9 can be represented using more rigorous mathematical notation.
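Thanks for your suggestion; we will revise this.
The von Mises-Fisher (vMF) distribution is a probability distribution on the unit sphere in $\mathbb{R}^d$.
- Formula 8 is written as $q(z \mid \mu, \kappa) = \frac{1}{Z_{d,\kappa}} \exp(\kappa \langle \mu, z \rangle)$, where $\mu$ denotes the mean direction and $\kappa$ denotes the concentration parameter; $\kappa$ is commonly set as a constant during training. The normalization constant $Z_{d,\kappa}$ is equal to $\int_{S^{d-1}} \exp(\kappa \langle \mu, z \rangle)\, dz$, where $S^{d-1}$ is the sample space $\{z \in \mathbb{R}^d : \|z\| = 1\}$.
- Formula 9 is written as $z \sim \mathrm{vMF}(\mu, \kappa)$, which represents that the latent variable $z$ is sampled from the vMF distribution with mean direction $\mu$ in the latent space.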
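To make the sampling step in Formula 9 concrete, here is a minimal sketch for the special case $d = 3$ (the ordinary sphere), where the cosine of the angle to the mean direction has a closed-form inverse CDF. The restriction to $d = 3$, the Householder reflection used to move the pole onto $\mu$, and the name `sample_vmf_s2` are assumptions made for this example; the paper's latent dimension and sampler may differ.

```python
import numpy as np


def sample_vmf_s2(mu, kappa, n, rng):
    """Draw n samples from vMF(mu, kappa) on the unit sphere in R^3."""
    mu = np.asarray(mu, dtype=float)
    mu = mu / np.linalg.norm(mu)

    # w = cos(angle to the pole) has density proportional to exp(kappa * w)
    # on [-1, 1]; invert its CDF in a numerically stable form.
    u = rng.random(n)
    w = 1.0 + np.log(u + (1.0 - u) * np.exp(-2.0 * kappa)) / kappa

    # Uniform azimuth around the pole e3 = (0, 0, 1).
    phi = 2.0 * np.pi * rng.random(n)
    r = np.sqrt(np.clip(1.0 - w**2, 0.0, None))
    x = np.stack([r * np.cos(phi), r * np.sin(phi), w], axis=1)

    # Householder reflection that maps e3 onto mu.
    v = np.array([0.0, 0.0, 1.0]) - mu
    if np.linalg.norm(v) < 1e-12:
        return x
    householder = np.eye(3) - 2.0 * np.outer(v, v) / np.dot(v, v)
    return x @ householder.T
```

Every sample has unit norm by construction, which is the property that lets a vMF latent vector be loaded as quantum-state amplitudes without an extra classical normalization layer.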
Q1: Please clarify the term "fully" and, if necessary, explain how it compares to its counterparts.
Thank you. "Fully" means our method only uses quantum parameters. The counterparts are the hybrid quantum-classical methods, which use a mix of classical parameters from classical model layers and quantum parameters from quantum model layers. The hybrid method is hard to deploy to the real quantum computer because it needs to communicate frequently between quantum devices and classical devices, which is time-consuming. Generally speaking, designing a hybrid model is relatively trivial and does not clarify the role of the quantum layer within the overall model, whereas designing an effective fully quantum model is challenging and innovative.
Q2: Is the latent space defined as discrete? If so, please provide a rationale for this choice.
Thanks, the latent space of QVAE is NOT defined as discrete. We adopt von Mises-Fisher (vMF) distribution as latent prior, which lies in a hyperspherical space and is continuous.
L1: The emphasis on the QM9 dataset may constrain the generalizability of the results to other molecular datasets or different types of data.
Thanks, in line with baselines {E-NFs, G-SchNet, G-SphereNet, SQ-VAE, QGAN}, we only choose QM9 dataset as our evaluation benchmark. However, our approach can further extend to other bigger datasets since the number of qubits in our proposed framework comes to ( denotes the number of atoms in one molecule).
Here we add experiments on a larger 3-D dataset named GEOM. Compared to QM9, GEOM stands out as a larger-scale dataset of molecular conformers, comprising 430,000 molecules, with up to 181 atoms and an average of 44.4 atoms per molecule. The molecules in this dataset exhibit larger sizes and more intricate structures.
In line with EDM, on GEOM we report the atom stability and the Wasserstein distance between the energy histograms of the dataset and the generated molecules. We report only two baselines, since the other baselines do not include experiments on the GEOM dataset; given the limited time for rebuttal, we leave replicating their models on the new dataset for future work.
| Method | Atom stability (%, ↑) | Wasserstein distance (↓) |
|---|---|---|
| Dataset | 86.5 | 0 |
| MLP-VAE | 41.2 | 5.21 |
| EDM (classical SOTA) | 81.3 | 1.41 |
| QVAE-Mole (ours) | 69.1 | 3.12 |
It can be seen that our method outperforms MLP-VAE. Although our method falls short of EDM (the classical SOTA baseline), this result demonstrates that our approach can achieve reasonable generation results on larger and more complex datasets. (For a discussion on the performance comparison between quantum algorithms and classical algorithms, see Supplement to L1 in the official comment below for details.)
We hope this response addresses your concerns. We look forward to receiving your feedback soon.
Thanks for your rebuttal. I have switched the rating.
Dear Reviewer,
Thank you for your response and for taking the time to review our paper. We greatly appreciate your positive evaluation and the confidence you have shown in our work. Your feedback is invaluable to us, and we are committed to incorporating the suggested improvements during the rebuttal phase into the final version of the paper. We sincerely appreciate your support and guidance.
Best regards
The authors introduce the first Variational Autoencoder (VAE) and Conditional Variational Autoencoder (CVAE) formulated entirely as parameterized quantum circuits (PQC), as opposed to hybrid methods that combine learnable parameters in quantum circuits with classical learnable parameters. In previous hybrid models, classical parameters were necessary to translate between the normalization constraint of quantum states, where the norm evaluates to unity, and latent normal distributions, which are not constrained by norm. The authors suggest using the von Mises-Fisher distribution on a hypersphere as the prior latent space distribution, which automatically normalizes the samples to unity. This approach allows them to formulate the entire VAE as a PQC.
To construct a CVAE, the authors encode conditions into the initial quantum states. Experiments on the QM9 dataset demonstrate that using the von Mises-Fisher distribution as a latent prior improves the performance of the proposed VAE. Furthermore, the use of the CVAE helps to align sample properties with the conditions, compared to unconditional sampling.
The authors compare the share of valid, unique, and novel VAE samples with classical and hybrid quantum methods. The proposed VAE outperforms the hybrid and two classical models but performs worse than three other classical models.
Strengths
- The von Mises-Fisher distribution is introduced as the prior for the latent space. This distribution is preferable for quantum states since it satisfies the constraint that the norm of sampled vectors is unity.
- On the quantum simulator, the proposed method can generate samples faster than all methods that have better metrics.
- The authors compare the proposed method to many baselines; however, the description of the baselines lacks details.
Weaknesses
- Lack of novelty: The paper combines the 3D representation, PQC (parametric quantum circuit) and dimension reduction approach from 3D-QAE (https://arxiv.org/abs/2311.05604) and the variational autoencoder from SQ-VAE (https://arxiv.org/abs/2205.07547), with the only major modifications being that the latent prior is chosen as the von Mises-Fisher distribution and that the 3D points can have a type (the element), which is one-hot encoded.
- The data reported for the outperformed baseline SQ-VAE cannot be found in the respective paper; for E-NFs and QGAN (https://arxiv.org/pdf/2101.03438), the metrics differ from those reported. It is not stated whether these models were retrained or, if so, which hyperparameters were chosen or how this retraining could be reproduced for validation. The experiment section of the MLP-VAE paper does not contain any application to molecular or 3D data at all, and it is unclear how hyperparameters were chosen for the encoder and decoder networks for this specific task.
Since the method lacks major novelty, detailed comparison to the baselines and ablations are essential.
- For the conditional generation: a) It is not stated how the properties, like logP values, of generated samples are obtained (this typically requires MD simulation or quantum chemical calculations). b) It is not stated how the equality of continuous properties is defined for Table 2. The evaluation here seems to be problematic. For example, the logP values in the violin plot in Figure 5 are negative for a significant portion; rounding all these values up, as described in the caption of Table 2, seems to be an arbitrary decision. This would mean that values are rounded to either 0, if negative, or 1, if positive, instead of rounding them to the nearest integer, which would result in worse performance than the values reported in Table 2. c) The data does not support the results presented for logP in Table 2: an improvement for QCVAE-Mole from 2.6% to 45.6% for logP=1 is reported. The violin plot in Figure 5 shows very low support of the distribution for logP=1, and thus the data does not support the findings reported in Table 2.
- It is not reported explicitly how the metrics validity, novelty and uniqueness are calculated.
- Line 219: It is stated without proof or explanation that the KL term in the ELBO loss is constant. This does not seem to be the case since the KL divergence should also depend on the location of the mean and not only on the variance.
- Line 180: “In common VAEs, both the prior and posterior are defined as normal distributions.”: The priors can be defined at will; often one chooses normal distributions, but the VAE does not fundamentally depend on this choice and alternative priors have been suggested in the literature.
- Line 190: “where we take the absolute value of to transform it from the complex domain to the real domain.” How is this operation defined? If it is defined as taking the absolute value of each amplitude (each vector component in the qubit basis), the norm of the amplitude vector (not of the quantum state) might not be unity anymore.
- Line 348: “We also observe that utilizing normal distribution leads to better performance in classic VAE, which proves classic data tends to follow a normal distribution.”: Here the use of the word “proves” is perhaps not the best choice, as the experiments do not necessarily “prove” this statement. How is “classic data” defined? The input data follows the data distribution, only the latent space follows a normal distribution, and which distribution is more suitable in latent space highly depends on the complexity of the encoder. Maybe rather: “which indicated that in the setting at hand, imposing a normal distribution as latent prior in comparison to the von Mises-Fisher distribution is beneficial for classical variational autoencoders.”
Technical remarks:
- Line 289: “To the best of our knowledge, we are the first full quantum model” -> …, we propose …
Questions
- How were the performance numbers for the baselines reported in Table 1 obtained? Were the baselines retrained and how were the hyperparameters chosen? Please also compare with the weaknesses section.
- Line 121: Why does the amplitude encoding allow the use of the exponentially large Hilbert space, in contrast to the angle encoding?
Post-rebuttal update: I raised my score to 4
Limitations
The biggest limitation of the presented method is that it is outperformed by classical approaches, e.g. EDM as reported in Table 1. The paper currently lacks a sufficient discussion of this limitation and possible further disadvantages.
References:
[1] Limitations on Quantum Dimensionality Reduction, ICALP, 2011
[2] Quantum computation and quantum-state engineering driven by dissipation, Nature Physics, 2009.
[3] Quantum state reduction: Generalized bipartitions from algebras of observables, Physical Review A, 2020.
[4] Nonlinear transformations in quantum computation, Physical Review Research, 2023.
[5] Quantum Autoencoders for Efficient Compression of Quantum Data, Quantum Science and Technology, 2017.
[6] Structure-based drug design with equivariant diffusion models. arXiv, 2022.
[7] Molecular generative model based on conditional variational autoencoder for de novo molecular design. Journal of Cheminformatics, 2018.
[8] 3D equivariant diffusion for target-aware molecule generation and affinity prediction. arXiv, 2023.
[9] Quantum convolutional neural networks. Nature Physics 2019.
[10] Experimental quantum generative adversarial networks for image generation. Physical Review Applied 2021.
[11] Quantum long short-term memory. ICASSP 2022.
[12] Better than classical? The subtle art of benchmarking quantum machine learning models, arXiv 2024.
[13] Recurrent quantum neural networks. NeurIPS 2020.
[14] Quantum 3D graph learning with applications to molecule embedding. ICML 2023.
[15] Towards quantum machine learning for constrained combinatorial optimization: a quantum QAP solver. ICML 2023
Further discussion of "quantum advantage in the future" in molecule generation.
Computational approaches aim to sample from regions of the whole space of molecular and solid-state compounds, called chemical space, which could be on the order of larger than [1]. In this case, classical models suffer from the curse of dimensionality, especially when facing large molecular systems [2]. However, an $n$-qubit quantum system possesses a $2^n$-dimensional Hilbert space, which means that the space grows exponentially with the number of qubits. We are able to access a huge Hilbert space with few qubits, thus holding the promise of tackling large-scale molecular generation tasks.
Compared to the curse of dimensionality faced by classical algorithms, quantum algorithms only need to increase the number of qubits, and the scale of problems they can handle will grow exponentially. [3] involves quantum annealing for molecular optimization, which outperforms other molecular optimization methods, finding molecules with better properties in 1/20th to 1/10th of the time previously required. Furthermore, when we can have more than 100, or even 1000 qubits in the future, quantum algorithms will theoretically be able to simulate the massive molecular system [4,5]. Quantum computing will undoubtedly have advantages in terms of scale in the future, but its actual effectiveness remains to be verified, especially with the need for noise-tolerant hardware.
References:
[1] Quantum generative models for small molecule drug discovery. IEEE Transactions on Quantum Engineering, 2021.
[2] Mol-CycleGAN: a generative model for molecular optimization. Journal of Cheminformatics, 2020.
[3] Q-Drug: a Framework to bring Drug Design into Quantum Space using Deep Learning. arXiv:2308.13171, 2023.
[4] Progress toward larger molecular simulation on a quantum computer: Simulating a system with up to 28 qubits accelerated by point-group symmetry. Physical Review A, 2022.
[5] Molecular quantum dynamics: A quantum computing perspective. Accounts of Chemical Research, 2021.
Supplement to W5: Detailed proof for "the KL term in the ELBO loss is constant"
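The vMF distribution is defined on a $(d-1)$-dimensional hypersphere, with its sample space as $S^{d-1} = \{x \mid x \in \mathbb{R}^d, \|x\| = 1\}$, and its probability density function is given by:
$$p(x) = \frac{e^{\langle \xi, x \rangle}}{Z_{d,\|\xi\|}}, \qquad Z_{d,\|\xi\|} = \int_{S^{d-1}} e^{\langle \xi, x \rangle}\, dS^{d-1},$$
where $\xi \in \mathbb{R}^d$ is a predefined parameter vector. As we can see, it is a distribution centered on $\xi$ across the space $S^{d-1}$. A more common notation for the vMF distribution is
$$p(x) = C_{d,\kappa}\, e^{\kappa \langle \mu, x \rangle},$$
where $\mu = \xi / \|\xi\|$, $\kappa = \|\xi\|$, and $C_{d,\kappa} = 1 / Z_{d,\|\xi\|}$. When $\kappa = 0$, the vMF distribution is uniform on the sphere.
vMF-VAE selects the uniform distribution on the sphere ($\kappa = 0$) as the prior $q(z)$, and chooses the posterior distribution as the vMF distribution:
$$p(z \mid x) = C_{d,\kappa}\, e^{\kappa \langle \mu(x), z \rangle}.$$
To simplify, here $\kappa$ is a hyperparameter (the larger the $\kappa$, the more concentrated the distribution is around $\mu(x)$), thus the only parameter of $p(z \mid x)$ comes from $\mu(x)$. Now we can calculate the KL divergence term:
$$\int p(z \mid x) \log \frac{p(z \mid x)}{q(z)}\, dz = \int C_{d,\kappa}\, e^{\kappa \langle \mu(x), z \rangle} \left( \kappa \langle \mu(x), z \rangle + \log C_{d,\kappa} - \log C_{d,0} \right) dz = \kappa \left\langle \mu(x), \mathbb{E}_{z \sim p(z \mid x)}[z] \right\rangle + \log C_{d,\kappa} - \log C_{d,0}.$$
We know that the mean direction of the vMF distribution is aligned with $\mu(x)$, and its norm depends only on $d$ and $\kappa$. Substituting into the equation above, we know that the KL divergence term only depends on $d$ and $\kappa$. Once these two parameters (dimension $d$ and hyperparameter $\kappa$) are determined, the KL divergence term becomes a constant (when $\kappa \neq 0$, it is greater than 0).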
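The claim that the KL term does not depend on the mean direction can be checked numerically in the special case $d = 3$, where $C_{3,\kappa} = \kappa / (4\pi \sinh\kappa)$ and the mean resultant length is $\coth\kappa - 1/\kappa$. The sketch below is only a sanity check under that assumption; the function names and the midpoint quadrature grid are choices made for the example, not part of the paper.

```python
import numpy as np


def vmf_kl_closed_form_s2(kappa):
    """KL(vMF(mu, kappa) || uniform) on the unit sphere in R^3 (d = 3)."""
    mean_length = 1.0 / np.tanh(kappa) - 1.0 / kappa
    log_c_kappa = np.log(kappa) - np.log(4.0 * np.pi * np.sinh(kappa))
    log_c_zero = -np.log(4.0 * np.pi)
    return kappa * mean_length + log_c_kappa - log_c_zero


def vmf_kl_quadrature_s2(mu, kappa, n_theta=1000, n_phi=256):
    """Same KL computed by midpoint quadrature over the sphere."""
    mu = np.asarray(mu, dtype=float)
    mu = mu / np.linalg.norm(mu)
    theta = (np.arange(n_theta) + 0.5) * np.pi / n_theta
    phi = (np.arange(n_phi) + 0.5) * 2.0 * np.pi / n_phi
    t, p = np.meshgrid(theta, phi, indexing="ij")
    z = np.stack([np.sin(t) * np.cos(p), np.sin(t) * np.sin(p), np.cos(t)], axis=-1)
    area = np.sin(t) * (np.pi / n_theta) * (2.0 * np.pi / n_phi)
    log_post = kappa * (z @ mu) + np.log(kappa) - np.log(4.0 * np.pi * np.sinh(kappa))
    log_prior = -np.log(4.0 * np.pi)
    return float(np.sum(np.exp(log_post) * (log_post - log_prior) * area))
```

Evaluating `vmf_kl_quadrature_s2` for two different mean directions at the same $\kappa$ should agree with `vmf_kl_closed_form_s2(kappa)` up to quadrature error, which illustrates why only the reconstruction term carries gradient information about $\mu(x)$ when $\kappa$ is fixed.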
Supplement to L1: The performance does not match that of classical methods.
Due to the current limitations of quantum hardware, the number of layers and parameters in quantum methods are severely restricted. Quantum machine learning (QML) models, particularly quantum generative models, are still in their infancy compared to well-developed classical neural models with vast numbers of parameters. Consequently, the performance of current QML models may not yet match that of SOTA classical counterparts.
To support the points above, we collect the following facts:
- Quantum versions of classical ML algorithms running on NISQ devices rarely take SOTA classical algorithms as their baselines, e.g., QCNN [9], QGAN [10], QLSTM [11]. On the one hand, conducting experiments on NISQ devices is in itself a significant contribution to the implementation of quantum algorithms. On the other hand, NISQ devices are difficult to obtain, and there is a significant gap between the physical qubit connectivity topology and quantum algorithm design, making it challenging to deploy quantum algorithms on NISQ devices. Additionally, running on NISQ devices faces the challenge of substantial quantum noise.
- Quantum algorithms running on simulators include classical baselines but rarely perform better than them [12]. For example, [13] (NeurIPS 2020) proposed a quantum RNN for classification, evaluated on MNIST, whose results reach only 94% accuracy (a simple classical fully connected layer can achieve accuracy better than 95%). [14] (ICML 2023a) proposed a quantum molecular embedding algorithm and applied it to molecular property prediction, with results showing nearly a 40% gap from the classical SOTA baselines. [15] (ICML 2023b) proposed a quantum Quadratic Assignment Problem (QAP) solver and applied it to the Traveling Salesman Problem (TSP), with results falling short of the classical nearest insertion algorithm (a heuristic algorithm proposed in 1997).
The general limitation underscores the need for further research to address the deficiencies of NISQ devices, particularly regarding the practicality of real quantum computers and the challenges posed by quantum noise. Once these issues are resolved, we can increase the circuit depth and number of parameters in quantum models, potentially matching the performance of SOTA classical networks with vast numbers of parameters.
Supplement to W2 & Q1: Baselines and Metrics
Here we provide a detailed description of each baseline; we will add this part to the Appendix.
- MLP-VAE: We implement the original vanilla VAE (https://arxiv.org/abs/1312.6114) using a three-layer perceptron for both the encoder and decoder. While the original paper does not cover molecular or 3D data, we use the same data preprocessing scheme as our proposed QVAE, without normalization (see Section 3.1 and lines 202-209 in our paper for details). To ensure a fair comparison, we maintain the same input data dimensions, hidden state dimensions, and training configuration as in our QVAE.
- E-NFs, G-SchNet, G-SphereNet, EDM, QGAN: Each method provides its pretrained models in its Git repository. We use their model checkpoints to generate molecule samples for evaluation. Specifically, since G-SchNet and EDM offer 10,000 generated molecules in their repositories, we directly use these samples for evaluation.
- SQ-VAE (https://arxiv.org/pdf/2112.12563): This method does not provide its code. Since it also addresses the molecule generation task, we choose to replicate it on TorchQuantum based on the description in the original paper, including the quantum circuits, the quantum measurement scheme, and the training configuration. For fairness, we use the same number of qubits and quantum parameters as in our method for the replication.
It should be noted that the baselines consist of two categories. One category is the classic generation model for 3-D molecules. The other category includes the quantum model SQ-VAE and the hybrid model QGAN, which are limited to generating molecular graphs and cannot generate 3-D molecules. Therefore, we adopt the commonly used metrics Valid, Unique, Novel, which can be used for both 2-D and 3-D molecular generation, to evaluate the effectiveness of different types of molecular generation methods. Here we provide the detailed definition of each metric:
- Valid: This metric measures the percentage of generated molecules that are chemically valid, defined as the percentage of molecular graphs that do not violate chemical valency rules.
- Unique: This metric measures the ratio of unique molecules among the generated set. This metric ensures that the model is not generating the same molecule multiple times, promoting a variety of different structures.
- Novel: This metric assesses the fraction of generated molecules that do not appear in the training data. A higher novelty score indicates that the model can generate new, previously unseen molecules, which is crucial for discovering new compounds.
Note that it is unreasonable to consider novelty and uniqueness without validity (https://arxiv.org/pdf/2203.17003 also points out this issue). In the extreme case, a model's validity may be only 1% while its valid molecules are all distinct from each other and from the training set, resulting in 100% for both uniqueness and novelty. Thus, we adopt Unique×Valid and Novel×Valid as metrics instead.
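The extreme case described above is easy to reproduce with a small sketch. It assumes generated molecules arrive as canonical identifiers (for example canonical SMILES) with `None` marking an invalid sample, that uniqueness is computed over valid molecules, and that novelty is computed over unique valid molecules; these conventions and the name `generation_metrics` are assumptions, since the exact evaluation script is not part of this thread.

```python
def generation_metrics(generated, training_set):
    """Compute Valid, Unique, Novel and their validity-weighted products.

    generated: list of canonical molecule identifiers, None if invalid.
    training_set: iterable of canonical identifiers seen during training.
    """
    valid = [m for m in generated if m is not None]
    unique = set(valid)
    novel = unique - set(training_set)

    validity = len(valid) / len(generated)
    uniqueness = len(unique) / len(valid) if valid else 0.0
    novelty = len(novel) / len(unique) if unique else 0.0
    return {
        "valid": validity,
        "unique": uniqueness,
        "novel": novelty,
        "unique_x_valid": uniqueness * validity,
        "novel_x_valid": novelty * validity,
    }
```

With one valid, previously unseen molecule among 100 samples, `unique` and `novel` both come out as 1.0 while `unique_x_valid` and `novel_x_valid` drop to 0.01, which is the distortion the product metrics are meant to remove.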
Supplement to W3-a: How to calculate properties like logP.
We use rdkit to compute the properties SA, QED, and logP of each generated molecule. Specifically, here we use
from rdkit.Contrib.SA_Score import sascorer
from rdkit.Chem.QED import qed
from rdkit.Chem import AllChem, Crippen
SAscore = sascorer.calculateScore(mol)
SAscore = round((10-SAscore)/9,2)
QEDscore = qed(mol)
logP = Crippen.MolLogP(mol)
For homo-lumo gap, we use the method in https://github.com/divelab/DIG/blob/dig-stable/dig/ggraph3D/utils/eval_prop_utils.py.
Thanks for your review and feedback. Below is our detailed response. Due to word count limitations, part of the answer and the references are included in the Official Comment below.
W1: About novelty (SQ-VAE (https://arxiv.org/abs/2205.07547) and 3D-QAE).
Thanks, and we would like to humbly point out that you may have some misunderstandings that may affect the interpretation of our work.
- First, the SQ-VAE mentioned in our paper, which also serves as one of our baselines, is the work that proposes a scalable quantum generative autoencoder (SQ-VAE) (https://arxiv.org/pdf/2112.12563). It is not the same paper as the one you referred to. The paper you referred to neither has any quantum components nor targets molecule generation, so we include it neither as related work nor as a baseline.
- Second, the dimension reduction approach does not originate from the 3D-QAE paper. Instead, tracing out a subsystem is a common technique for reducing the dimensionality of a quantum state [1~3]. Moreover, it has already been applied in many fields. For example, [4] uses it to implement nonlinear transformations of quantum states, and [5] uses it to compress data efficiently.
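To make the "tracing out a subsystem" step concrete, here is a minimal sketch in plain Python. It is not the authors' implementation; the function name `trace_out_last_qubits` and the choice to trace out the trailing qubits are assumptions made for illustration. It computes the reduced density matrix of a pure state after discarding part of the system.

```python
import math

def trace_out_last_qubits(state, n_qubits, n_traced):
    """Reduced density matrix after tracing out the last `n_traced` qubits.

    `state` is a list of 2**n_qubits amplitudes of a pure state.
    Hypothetical helper for illustration, not the paper's code.
    """
    dim_keep = 2 ** (n_qubits - n_traced)
    dim_drop = 2 ** n_traced
    rho = [[0j] * dim_keep for _ in range(dim_keep)]
    for i in range(dim_keep):
        for j in range(dim_keep):
            # Sum over the basis states of the discarded subsystem.
            rho[i][j] = sum(
                state[i * dim_drop + e] * state[j * dim_drop + e].conjugate()
                for e in range(dim_drop)
            )
    return rho

# Bell state (|00> + |11>) / sqrt(2): tracing out one qubit leaves the
# maximally mixed single-qubit state.
bell = [1 / math.sqrt(2), 0.0, 0.0, 1 / math.sqrt(2)]
reduced = trace_out_last_qubits(bell, n_qubits=2, n_traced=1)
```

The retained subsystem has dimension 2**(n_qubits - n_traced), which is the sense in which tracing out reduces dimensionality.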
For the position and contribution of our work, please see general response above.
W2 and Q1: Questions of baselines.
As mentioned above, the SQ-VAE referenced in our paper is not the same as the paper you referred to. We give a detailed description of the baseline settings; see Supplement to W2 & Q1 in the official comment below.
W3-a: How to calculate properties like logP.
In line with many recent works in AI4drug [6~8], we utilize external chemical libraries, especially rdkit.Chem; see Supplement to W3-a in the official comment below for details.
W3-b: How equality of continuous properties is defined.
We believe there are some misunderstandings regarding the experimental setup, and we apologize for any confusion. In the Single Condition section, we trained four different models, targeting SA, QED, logP, and the homo-lumo gap, respectively. Each column in Table 2 indicates the percentage of molecules whose property value, when rounded to the nearest condition value, matches the specified condition. For instance, for the condition logP = 0.0, the results show that 57.8% of generated molecules have logP values within the range [-0.5, 0.5), rather than values exactly equal to 0. Similarly, the logP = 1.0 column shows that, with logP = 1.0 as the input condition, 45.6% of generated molecules have logP values in the range [0.5, 1.5). Table 2 aims to demonstrate that the QCVAE model can increase the proportion of generated molecules with the desired property when given a single condition. We will include details of the experimental setup in our revised paper.
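The matching rule described here can be sketched as a half-open bin test. This is a minimal illustration, assuming the bin is centred on the condition value; the function names and the sample values are hypothetical and not taken from the paper.

```python
def in_condition_range(value, target, width):
    """True if `value` lies in the half-open bin [target - width/2, target + width/2)."""
    half = width / 2
    return target - half <= value < target + half

def hit_rate(values, target, width):
    """Percentage of property values that fall inside the bin around `target`."""
    if not values:
        return 0.0
    hits = sum(in_condition_range(v, target, width) for v in values)
    return 100.0 * hits / len(values)

# Made-up logP values, for illustration only.
example_logp = [-0.5, -0.2, 0.1, 0.49, 0.5, 1.2]
rate = hit_rate(example_logp, target=0.0, width=1.0)
```

With a bin width of 1, a value of exactly 0.5 counts towards the logP = 1.0 condition and not towards logP = 0.0, matching the ranges [-0.5, 0.5) and [0.5, 1.5).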
W3-c: The violin plot does not support the results presented for logP in Table 2.
We believe there are still some misunderstandings. First, we want to clarify that Figure 5 and Table 2 report results from two different experiments. Table 2 presents the results under a single condition, while Figure 5 shows results for multiple conditions given simultaneously. In the section Multiple Conditions, we train and evaluate the proposed QCVAE-Mole under multiple conditions, meaning we simultaneously provide the four properties: SA, QED, LogP, and gap. Details can be found in lines 335-341 of our paper.
W4: It is not reported explicitly how the metrics validity, novelty and uniqueness are calculated.
Thanks for the suggestion, we will add this part, see Supplement to W2 & Q1 in official comment below.
W5: It is stated without proof or explanation that the KL term in the ELBO loss is constant.
Thanks, here we provide the detailed proof, see Supplement to W5 in official comment below.
W6: Writing suggestion for Line 180.
Thanks, we will modify this according to your suggestion.
W7: Line 190: “where we take the absolute value of to transform it from the complex domain to the real domain.” How is this operation defined?
We perform the transformation by computing the absolute value (modulus) of each complex number, implemented as $|a + bi| = \sqrt{a^2 + b^2}$ for any complex number of the form $a + bi$. This operation ensures that the amplitude vector still has unit norm.
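A small sketch of this operation, assuming the modulus is applied element-wise to the amplitude vector (the helper names and the example amplitudes are illustrative, not from the paper). It also checks that the norm is unchanged.

```python
import math

def to_real_amplitudes(amplitudes):
    """Map each complex amplitude a + bi to its modulus sqrt(a**2 + b**2)."""
    return [abs(z) for z in amplitudes]

def l2_norm(vector):
    """Euclidean norm of a real or complex vector."""
    return math.sqrt(sum(abs(x) ** 2 for x in vector))

# Illustrative complex amplitudes with unit norm (not taken from the paper).
psi = [0.5 + 0.5j, 0.5j, -0.5 + 0j, 0j]
real_psi = to_real_amplitudes(psi)
```

Because each squared modulus equals the squared magnitude of the original entry, `l2_norm(real_psi)` equals `l2_norm(psi)`.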
W8: Question with Line 348:
Thanks for the suggestions. We admit that using "prove" here is not appropriate, and we will modify this.
Q2: Amplitude encoding vs. angle encoding.
- Angle encoding represents classical data through rotation gates acting on individual qubits. For a vector $x = (x_1, \dots, x_n)$, each component $x_i$ is used to parameterize a rotation, such as $R_y(x_i)$.
- Amplitude encoding can take full advantage of the exponential size of the Hilbert space associated with quantum states. Given a vector $x \in \mathbb{R}^{2^n}$, where $\|x\|_2 = 1$, it can be encoded as the quantum state $|\psi\rangle = \sum_{i} x_i |i\rangle$. Here, each $x_i$ represents the amplitude of the corresponding basis state $|i\rangle$.
- As we can see, for an $n$-qubit quantum system, amplitude encoding can encode $2^n$-dimensional classical data, while angle encoding can only encode $n$-dimensional classical data.
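The difference in capacity between the two encodings can be sketched as follows. This is an illustrative comparison only: the use of an $R_y$ rotation for angle encoding and all function names are assumptions, and the amplitude-encoding helper expects an input whose length is already a power of two.

```python
import math

def qubits_for_amplitude_encoding(dim):
    """Smallest n with 2**n >= dim (at least one qubit)."""
    return max(1, (dim - 1).bit_length())

def qubits_for_angle_encoding(dim):
    """One qubit per classical feature."""
    return dim

def amplitude_encode(x):
    """Normalize x (length must be a power of two) into state amplitudes."""
    if len(x) & (len(x) - 1) != 0 or len(x) == 0:
        raise ValueError("length must be a power of two")
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0:
        raise ValueError("cannot encode the zero vector")
    return [v / norm for v in x]

def angle_encode(x):
    """Per-qubit amplitudes of R_y(x_i)|0> = cos(x_i/2)|0> + sin(x_i/2)|1>.

    Choosing R_y is an assumption; other rotation gates work similarly.
    """
    return [(math.cos(v / 2), math.sin(v / 2)) for v in x]

amplitudes = amplitude_encode([3.0, 4.0, 0.0, 0.0])
```

For example, 128 classical values need 7 qubits with amplitude encoding but 128 qubits with angle encoding.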
L1: The performance does not match that of classical methods.
Please refer to Supplement to L1 for a detailed discussion.
We hope this response could clear up your misunderstanding and address your concerns. We believe that this work contributes to the quantum ML community and also marks a step towards integrating quantum ML with science. We would sincerely appreciate it if you could reconsider your rating and wish to receive your further feedback soon.
Dear Reviewer CAJp,
Thank you very much again for your feedback and the efforts you have put into reviewing our work.
As the discussion window nears its end, we eagerly anticipate your further feedback on our submission. We have already received positive responses from the other three reviewers. In particular, Reviewers hKnP and 5Ebq have raised their scores to 7. We hope that our responses can also address your concerns and clarify potential misunderstandings. If there is any further clarification you seek or further discussion you would like to have, we are here to respond in the time remaining. Should our clarifications meet your expectations, we would be truly grateful for your reconsideration of the score.
With gratitude,
Authors
Theoretical contribution lacks novelty
In the 3D QAE paper mentioned ("Rathi et al.: Fully Quantum Auto-Encoding of 3D Point Clouds"), a fully-Quantum AE for 3D point clouds is introduced. The embedding of the initial state used in the paper at hand (lines 122-132) closely resembles this approach, with the addition that the atom type is one-hot encoded. Also the model architecture seems to be very similar, pointing out what parts exactly are different would benefit the paper. Thus, it seems that the presented approach can be summarized as making the fully-Quantum 3D AE from Rathi et al. a fully-Quantum 3D CVAE.
Regarding the SQ-VAE paper: Sorry for the confusion, the link was pointing to a paper with another approach with the same name. Our remarks were referring to the SQ-VAE paper by Li et al. that you mentioned. There, a Quantum Variational Autoencoder for molecules is already introduced. It uses optional fully-connected layers that render it 'not fully-Quantum'. It does not act on 3D coordinates and uses a Gaussian latent distribution.
Thus, I think that the work combines the two approaches: The fully-Quantum model architecture and embedding from Rathi et al. and the Quantum VAE from Li et al., with the normal latent distribution replaced by the vMF distribution. It would benefit the paper to mention this more explicitly.
More extensive ablations
As the theoretical novelty of the work is limited, extensive ablations and detailed explanations about how the values for baselines are obtained are required, e.g. model architectures, retraining procedure, hyperparameter optimization for the specific task.
Since the presented approach does not outperform classical baselines, it is required to discuss in detail why the contribution is still valuable. It is not guaranteed that QML approaches will automatically become more powerful when quantum computers become better. What has to happen concretely such that the QML approach will actually be better than classical ML? Are more and less noisy qubits enough or does one also need a larger dataset or larger molecules to really leverage the quantum advantage and thus be of any practical use? This could have been shown with ablation studies for trends (performance in dependence of the size of the dataset/molecule/number of qubits).
W3-b: Results of conditioned sampling
This explanation makes the meaning of the table's entries clearer. It would be very important to add the precise definition of the range (e.g. [-0.5, 0.5] for logP = 0) for the other conditions (SA, QED, ...) as well, to explain why this range was chosen (length 1 is arbitrary; why not e.g. [-0.1, 0.1]?), and also to report novelty and diversity for these four models as done in Table 1. There could be a mode collapse towards similar samples, especially when a condition is given.
On the proof of constancy of KL
By reading [1], where constancy of the KL term is shown as well, it became clear to me that the notation $\mathrm{vMF}(\cdot, 0)$ denotes the uniform distribution on the hypersphere, i.e. without any mean direction. In mathematics, the dot often denotes a function argument, thus it would be important to clarify this notation.
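For readers following this point, a short sketch of why the KL term does not depend on the mean direction, using the standard vMF density. The symbols $\mu$ (mean direction), $\kappa$ (concentration), $d$ (ambient dimension) and $I_v$ (modified Bessel function of the first kind) are introduced here for illustration and may differ from the paper's notation.

$$
q(z) = C_d(\kappa)\, e^{\kappa \mu^\top z}, \qquad
C_d(\kappa) = \frac{\kappa^{d/2-1}}{(2\pi)^{d/2}\, I_{d/2-1}(\kappa)}, \qquad
p(z) = \frac{\Gamma(d/2)}{2\pi^{d/2}}
$$

$$
\mathrm{KL}(q \,\|\, p)
= \mathbb{E}_q\!\left[\log q(z) - \log p(z)\right]
= \log C_d(\kappa) + \kappa\, \frac{I_{d/2}(\kappa)}{I_{d/2-1}(\kappa)} + \log \frac{2\pi^{d/2}}{\Gamma(d/2)}
$$

The second equality uses $\mathbb{E}_q[\mu^\top z] = I_{d/2}(\kappa) / I_{d/2-1}(\kappa)$. The result depends only on $\kappa$ and $d$, so for a fixed concentration it is a constant with respect to $\mu$.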
Conclusion
The additional explanations on the results and baselines make this part clearer to me, however, as mentioned above explanations on how the seemingly arbitrary thresholds were chosen would benefit the paper and make the results more convincing. Since I still regard the theoretical novelty as limited and it is unclear and not discussed whether the approach is scalable in the sense that it might outperform classical baselines with improvements in quantum computing, more extensive ablations would be necessary to accept the paper for NeurIPS. I raised my score to 4.
[1] Xu and Durrett, Spherical Latent Spaces for Stable Variational Autoencoders, 2018. (https://arxiv.org/pdf/1808.10805)
Regarding the ablation studies for trends.
In line with baselines {E-NFs, G-SchNet, G-SphereNet, SQ-VAE, QGAN}, we only choose QM9 dataset as our evaluation benchmark. However, our approach can further extend to other bigger datasets since the number of qubits in our proposed framework comes to ( denotes the number of atoms in one molecule).
Here we add experiments on a larger 3-D dataset named GEOM. Compared to QM9, GEOM is a larger-scale dataset of molecular conformers, comprising 430,000 molecules, with up to 181 atoms and an average of 44.4 atoms per molecule. The molecules in this dataset are larger and have more intricate structures. We use 11 qubits for this dataset (7 for QM9).
In line with EDM, for this benchmark we report the atom stability and the Wasserstein distance between the energy histograms of the dataset and the generated molecules. We report only two baselines, since the other baselines do not include experiments on the GEOM dataset. Due to the very limited time remaining for the rebuttal, we leave replicating their models on the new dataset for future work, since adapting and retraining them requires considerable effort.
| Method | Atom stability (↑) | Wasserstein distance (↓) | Time (s) |
|---|---|---|---|
| Dataset | 86.5 | 0 | - |
| MLP-VAE | 41.2 | 5.21 | 0.07 |
| EDM (classical SOTA) | 81.3 | 1.41 | 1.32 |
| QVAE-Mole (ours) | 69.1 | 3.12 | 0.12 |
It can be seen that our method outperforms MLP-VAE. Although our method falls short of EDM (the classical SOTA baseline), this result demonstrates that our approach can achieve reasonable generation results on a larger and more complex dataset. Additionally, even when running on a simulator, our method is faster than EDM in terms of generation time (0.12 s versus 1.32 s). It should be noted that our generative model is a VAE with only a few hundred quantum parameters, whereas EDM is a diffusion model. Although the experimental results do not demonstrate a definite quantum advantage on larger-scale data, we believe that, as quantum generative models are still in their infancy and we are early explorers in this field, these results are in line with expectations.
W3-b: add the precise definition of the range of each condition, explain why this range was chosen and report the novelty and diversity.
Thanks. The choices of the ranges are in line with "MGCVAE: Multi-Objective Inverse Design via Molecular Graph Conditional Variational Autoencoder" [1], which also sets the condition range of logP to a width of 1 ([-0.5, 0.5), [0.5, 1.5)) and the ranges of SA and QED to a width of 0.1 ([0.25, 0.35), [0.35, 0.45), ...). Although they did not provide a detailed explanation for this setting, we think a reasonable explanation is that the condition range is determined by the total range of the property. For most molecules in QM9, the logP property ranges over [-6, 5], the SA score and QED properties both range over [0, 1), and the gap property ranges over [2, 12]. The condition range is thus about 1/10 of the entire range. We will add the precise definition of the ranges in the revised version according to your suggestion.
In addition, we further report the Valid, Unique, and Novel metrics for conditional generation. Here Unique* and Novel* denote Unique×Valid and Novel×Valid, respectively, for the reason mentioned above.
| Condition | SA = 0.4 | SA = 0.5 | QED = 0.3 | QED = 0.4 | logP = 0.0 | logP = 1.0 | gap = 3.0 | gap = 4.0 |
|---|---|---|---|---|---|---|---|---|
| Range | [0.35, 0.45) | [0.45, 0.55) | [0.25, 0.35) | [0.35, 0.45) | [-0.5, 0.5) | [0.5, 1.5) | [2.5, 3.5) | [3.5, 4.5) |
| QVAE | 29.8 | 19.8 | 40.2 | 52.5 | 49.8 | 2.6 | 0.1 | 3.1 |
| QCVAE | 44.1 | 23.4 | 42.8 | 75.2 | 57.8 | 45.6 | 6.4 | 22.7 |
| Improvement (QCVAE - QVAE) | 14.3 | 3.6 | 2.6 | 22.7 | 8.0 | 43.0 | 6.3 | 19.6 |
| Valid, Unique*, Novel* (QVAE) | 78.1%, 27.4%, 57.4% | 78.1%, 27.4%, 57.4% | 78.1%, 27.4%, 57.4% | 78.1%, 27.4%, 57.4% | 78.1%, 27.4%, 57.4% | 78.1%, 27.4%, 57.4% | 78.1%, 27.4%, 57.4% | 78.1%, 27.4%, 57.4% |
| Valid, Unique*, Novel* (QCVAE) | 68.3%, 20.9%, 53.3% | 74.4%, 23.1%, 54.3% | 75.2%, 26.2%, 55.2% | 82.3%, 20.4%, 34.0% | 77.3%, 20.2%, 57.2% | 61.8%, 20.5%, 29.8% | 80.2%, 17.3%, 48.7% | 65.1%, 21.5%, 29.6% |
It can be observed that, when conditions are given, the Valid metric remains relatively stable, fluctuating between roughly 62% and 82%. The Unique* metric shows a slight decline, but overall it aligns with expectations; the decline may also be partly due to the fluctuations in the Valid metric. However, we observe that when the proportion of generated molecules with the desired property increases significantly, there is a noticeable drop in the Novel* metric. This may be because the model relies too heavily on the conditional input, causing it to generate molecules that closely match typical examples in the training data at the expense of novel structures.
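The row of differences between QCVAE and QVAE, and the spread of the QCVAE Valid values, can be reproduced directly from the numbers in the table above. The snippet below is only an arithmetic check on the reported values; the variable names are illustrative.

```python
# Values copied from the table, in column order:
# SA=0.4, SA=0.5, QED=0.3, QED=0.4, logP=0.0, logP=1.0, gap=3.0, gap=4.0
qvae = [29.8, 19.8, 40.2, 52.5, 49.8, 2.6, 0.1, 3.1]
qcvae = [44.1, 23.4, 42.8, 75.2, 57.8, 45.6, 6.4, 22.7]

# Percentage-point gain from conditioning, rounded to one decimal.
improvement = [round(c - q, 1) for q, c in zip(qvae, qcvae)]

# Valid (%) reported for QCVAE under each condition.
qcvae_valid = [68.3, 74.4, 75.2, 82.3, 77.3, 61.8, 80.2, 65.1]
valid_range = (min(qcvae_valid), max(qcvae_valid))
```

The largest gains appear for logP = 1.0, QED = 0.4 and gap = 4.0, which are also the columns where Novel* drops most.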
[1] Lee M, Min K. MGCVAE: multi-objective inverse design via molecular graph conditional variational autoencoder[J]. Journal of chemical information and modeling, 2022.
In mathematics, the dot often denotes a function argument, thus it would be important to clarify this notation.
Thanks for the suggestion, we will clarify this notation in the revised version.
Thank you for the advice and the opportunity to engage in this valuable discussion. We understand your concerns, and we have made our best effort to address these issues and provide additional ablation studies. We hope our response could ease your concerns and we will include all these experimental results and discussions in our final version. We would sincerely appreciate it if you could reconsider your rating. As the discussion period is coming to a close, we may not have enough time to respond to your further questions, sorry for that.
Thanks for acknowledging our efforts and the advice for more extensive ablations. Below we respond to your comments.
I think that the work combines the two approaches. It would benefit the paper to mention this more explicitly.
Thank you for your suggestion. We admit that our work draws inspiration from these related works, and we have already mentioned and cited them in lines 34-44 and line 124. Here, we further highlight the connections and differences between our work and both 3D-QAE and SQ-VAE, and we will incorporate these points into our revised paper.
Firstly, regarding 3D-QAE, the similarity lies in the use of amplitude encoding to encode 3D information into quantum states. We acknowledge this, but it is important to note that amplitude encoding is also a common operation in quantum information processing. The fundamental difference between our work and 3D-QAE is that the quantum AE is primarily designed for information compression, whereas our QCVAE has generative capabilities, which come from the design of our intermediate latent space and the resampling module. In addition, the parameterized quantum circuits (PQCs) we use are entirely different. In 3D-QAE, a simple PQC and its inverse form the encoder and decoder. However, we found that directly using the inverse of the encoder as the decoder may even harm the performance of our quantum VAE. Thus, in our method, we designed a different hardware-efficient PQC in which, even though the circuit structures of the encoder and decoder are the same, their quantum parameters are optimized independently.
Secondly, regarding SQ-VAE, we share a similar conceptual goal of using quantum VAEs for molecular generation, but our methodological framework is fundamentally different. Specifically, our input and output are tailored to 3D molecular structures, which is reflected in the encoding of the input information. The sampling of latent variables and the final measurement of the quantum circuit are also different. Moreover, we propose a fully quantum neural network capable of multi-conditional control as encoder/decoder, while SQ-VAE uses a hybrid quantum-classical layer. In defining the latent space, we employ the von Mises-Fisher (vMF) distribution to exploit the inherent properties of quantum states, while they simply imitate a classical VAE using a Gaussian distribution without providing additional insight. We would like to emphasize the importance of our quantum neural network with multi-conditional control and of the vMF latent space tailored to quantum characteristics.
We will add the above-detailed comparison with the existing 3D-QAE and SQ-VAE to our revised manuscript. Thank you again for your suggestion.
What has to happen concretely such that the QML approach will actually be better than classical ML?
We think the current factors hindering the further improvement of QML performance are: (1) The limited resources of simulators based on classical computers. (2) The significant quantum noise present in the available quantum hardware. Based on (1)(2), the depth of quantum circuits and the number of quantum parameters in current QML methods are significantly constrained. Once these issues are resolved, we can increase the circuit depth and number of parameters in quantum models, potentially matching the performance of SOTA classical networks with vast numbers of parameters.
The paper aims to realize 3D molecule generation on quantum hardware and proposes parameterized quantum circuits for 3D molecule generation. The 3D coordinates and atomic types are explicitly encoded as the initial quantum state and input into the network. The paper selects the classic generative model Variational Autoencoder (VAE) to encode the quantum state into the latent space and decode the samples to generate novel molecules, and names the proposed architectures QVAE-Mole and, for conditional generation, QCVAE-Mole. In order to inherently meet the constraints of the quantum system, the von Mises-Fisher (vMF) distribution replaces the normal distribution used by the VAE in the proposed architecture. The paper conducts experiments on the QM9 dataset and compares the results with classical and quantum methods. The results show that the proposed method achieves a good balance between performance and speed.
Strengths
- The paper proposes a method to encode atom type, geometric information, and constrained unit form information into the initial quantum state through amplitude encoding. The paper briefly explains the reasons of choosing amplitude encoding instead of angle encoding.
- The paper proposes to use vMF in a hyperspherical space, which can inherently satisfy the constraints of quantum systems. Ablation studies show that architectures with vMF distribution perform better than normal distribution except the metric Novel x Valid.
- The paper verifies the effectiveness of single condition generation from four properties: SA, QED, logP and gap, and multiple conditions generation. The trained conditional generation architecture performs better than the random generation of QVAE-Mole, which shows the effectiveness of conditional generation.
Weaknesses
- Compared with classical molecule generation methods, the performance of the proposed method still lags behind. For the metric Unique x Valid, the proposed method only outperforms the MLP-VAE proposed in 2013. For the metric Novel x Valid, the performance of the proposed method is still lower than that of most classical methods.
- Compared with the quantum methods QGAN-HG and P2-QGAN-HG, the speed of the proposed method still lags behind.
- It is unclear what Valid, Unique and Novel mean and how to define these metrics. The ambiguity of the evaluation metrics will lead to unfair comparisons.
Questions
- Why do the methods proposed in Table 1 have zero classical parameters? Would moving some quantum parameters to classical parameters speed up the model?
- In line 170: "To convert to latent space, here we discard the information contained in the subsystem via tracing out the state of qubits." Why would you do this? What is the intuition behind the compression method mentioned in [1]?
- In line 348: " ..., which proves classical data tends to follow a normal distribution.". Do VAE and QVAE use different datasets? Please clarify this.
- In Figure 6 a), what is the difference between N-VAE and N-QVAE besides the quantum implementation? Why can N-QVAE generate more effective molecules than classical methods?
- What role do the four loss functions and fidelity loss in the Appendix D section play in the design? What is the contribution of each loss? Are there any hyperparameters to balance the losses?
[1]. Quantum autoencoders for efficient compression of quantum data.
Limitations
The paper addresses the limitations of the hardware in Section 5. However, some limitations should be explicitly pointed out:
- How to improve the performance to obtain similar performance to classical methods.
- How to accelerate the method to make it as fast as other quantum-based methods.
Q5: What role do the four loss functions and fidelity loss in the Appendix D section play in the design? What is the contribution of each loss? Are there any hyperparameters to balance the losses?
Thanks. Our classic loss function consists of four parts: the 3-D coordinate loss, the atomic classification loss, the constraint loss, and the auxiliary loss. Together, they form the reconstruction loss, which reflects the physical meaning of the information. Specifically, the 3-D coordinate loss supervises the reconstruction of the molecule's 3-D positions via the geometric distance error, and the atomic classification loss supervises the reconstruction of atom types via a weighted cross entropy. The constraint loss is used to constrain the sum of the probabilities of all atom types to be the same for each atom, and we use an MSE loss here. Since we add padding entries to the input data, the auxiliary loss is designed to supervise the reconstruction of these zero entries.
On the other hand, the fidelity loss does not consider the physical meaning of the output quantum vector. Instead, it treats the encoder input and the decoder output as two quantum states and defines the loss by computing the fidelity between them.
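As a rough illustration of a fidelity-based loss between two pure states, consider the sketch below. The form `1 - fidelity` and the function names are assumptions made here; the paper's exact definition may differ.

```python
def fidelity(psi, phi):
    """Fidelity |<psi|phi>|^2 between two pure states given as amplitude lists."""
    overlap = sum(a.conjugate() * b for a, b in zip(psi, phi))
    return abs(overlap) ** 2

def fidelity_loss(psi_in, psi_out):
    # Assumed form: 1 - fidelity, so identical states give zero loss.
    return 1.0 - fidelity(psi_in, psi_out)

s = 0.5 ** 0.5
same = fidelity_loss([1.0, 0.0], [1.0, 0.0])        # identical states
orthogonal = fidelity_loss([1.0, 0.0], [0.0, 1.0])  # orthogonal states
half = fidelity([s, s], [1.0, 0.0])                 # |+> against |0>
```

Such a loss only compares the two state vectors as a whole and is blind to which entries carry coordinates, atom types, or padding.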
In our paper, we experimentally compared using only the classical loss with using only the fidelity loss. Regarding the four components of the classical loss, there are indeed hyperparameters to balance them: we can assign a weight to each component, so that the final loss becomes the weighted sum of the four components. The best hyperparameter configuration can be determined through grid search. Due to the limited time during the rebuttal period and the time-consuming nature of grid search, we plan to include this part in the appendix as an ablation study in the future.
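A minimal sketch of the weighted combination and of the search space a grid search would cover. The weight names, the candidate values, and the example loss values are hypothetical and chosen only for illustration.

```python
from itertools import product

def total_classical_loss(coord_loss, type_loss, constraint_loss, aux_loss,
                         weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four classical loss components."""
    terms = (coord_loss, type_loss, constraint_loss, aux_loss)
    return sum(w * t for w, t in zip(weights, terms))

# Hypothetical candidate weights; a grid search would train one model per
# configuration and keep the one with the best validation metric.
candidate_values = (0.1, 1.0, 10.0)
weight_grid = list(product(candidate_values, repeat=4))

example = total_classical_loss(0.5, 0.2, 0.1, 0.05, weights=(1.0, 2.0, 0.5, 0.5))
```

Even three candidate values per weight give 81 configurations, which is consistent with the remark that the search is time-consuming.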
L1: 1. How to improve the performance to obtain similar performance to classical methods. 2. How to accelerate the method to make it as fast as other quantum-based methods.
- The general limitation underscores the need for further research to address the deficiencies of NISQ devices, particularly regarding the practicality of real quantum computers and the challenges posed by quantum noise. Once these issues are resolved, we can increase the circuit depth and number of parameters in quantum models, potentially matching the performance of SOTA classical networks with vast numbers of parameters.
- As previously mentioned, the faster simulation speed of other quantum methods on classical computers does not necessarily indicate better performance. (See the answer to W2 for details.)
We hope this response answers your questions and addresses your concerns. We look forward to receiving your further feedback soon.
Reference:
[1]. Quantum autoencoders for efficient compression of quantum data
[2]. Chapter 14 of Deep learning[M]. MIT press, 2016.
[3]. Wikipedia of autoencoder https://en.wikipedia.org/wiki/Autoencoder#cite_note-:12-1
[4]. Nonlinear principal component analysis using autoassociative neural networks[J]. AIChE journal, 1991.
Thanks for your rebuttal and clarification. My concerns about the evaluation (W3 and Q4) and the loss function (Q5) still exist.
- For the evaluation metrics, the explanation is similar to the description in Appendix G. Please provide details about the methods and tools you used to calculate these metrics, and analyze in detail why your method outperforms the classical methods. My concerns about the experimental results in Table 1 still exist.
- For the loss function, why do you use both the atomic classification loss and the constraint loss at the same time? How do you distinguish the padding part of the input and output molecules?
How to distinguish the padding part of the input and output molecules?
Since the dimension of both our input and output must be $2^n$ ($n$ is the number of qubits), and the number of entries obtained by encoding atomic information is determined by the number of atoms and the number of atomic types, we add padding entries to fill the remaining positions. During training, the number of atoms is known, so the trailing positions in the output vector beyond the encoded atomic entries are also padding entries. (For generation, there are some differences since the number of atoms is arbitrary; please refer to lines 202-209 in the paper for details.)
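The padding idea can be sketched as follows. The per-atom layout used here (three coordinates followed by a one-hot atom type) is an assumption for illustration only; the paper's actual encoding may contain additional terms, and `encode_molecule` is a hypothetical helper.

```python
import math

def encode_molecule(coords, type_ids, n_types, n_qubits):
    """Flatten (x, y, z, one-hot type) per atom, zero-pad to 2**n_qubits, normalize.

    Returns the amplitude vector and the number of non-padding entries.
    The entry layout is an assumption, not the paper's exact scheme.
    """
    dim = 2 ** n_qubits
    entries = []
    for (x, y, z), t in zip(coords, type_ids):
        one_hot = [1.0 if k == t else 0.0 for k in range(n_types)]
        entries.extend([x, y, z] + one_hot)
    if len(entries) > dim:
        raise ValueError("molecule does not fit into the given number of qubits")
    n_real = len(entries)
    padded = entries + [0.0] * (dim - n_real)   # trailing padding entries
    norm = math.sqrt(sum(v * v for v in padded))
    return [v / norm for v in padded], n_real

# Two atoms, three atom types, four qubits (16 amplitudes): 12 real entries
# followed by 4 padding zeros.
vec, n_real = encode_molecule(
    coords=[(0.0, 0.0, 0.0), (1.1, 0.0, 0.0)],
    type_ids=[0, 2],
    n_types=3,
    n_qubits=4,
)
```

Because the number of atoms is known during training, `n_real` tells which trailing positions of the output vector should be compared against zero.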
We hope this response could answer your questions and address your concerns, looking forward to receiving your further feedback soon.
Thank you, we deeply appreciate your time and effort in reviewing our paper. Below is our detailed response.
For the evaluation metrics, the explanation is similar to the description in Appendix G. Please provide details about the methods and tools you used to calculate these metrics, and analyze in detail why your method outperforms the classical methods. My concerns about the experimental results in Table 1 still exist.
W3: It is unclear what Valid, Unique and Novel mean and how to define these metrics.
Thanks. For Valid, we directly use the method in https://github.com/divelab/DIG/blob/dig-stable/dig/ggraph3D/utils/eval_validity_utils.py, which is implemented with RDKit. This method constructs chemical bonds based on the distances between atoms, then checks whether the bonds violate the chemical valency rules to compute Valid. Moreover, this method can convert a molecule from its atom types and 3-D coordinates to canonical SMILES (Simplified Molecular Input Line Entry System), a linear string notation for chemical molecules in which each molecule has a unique representation. Once the generated molecules are represented as canonical SMILES, we compute the Unique metric by checking whether the SMILES strings of any two molecules are identical. Similarly, we convert the training data to canonical SMILES and compute the Novel metric by checking whether each generated molecule's SMILES string matches any string in the training set. Note that the methods and tools we use here are in line with the baselines {G-SchNet, G-SphereNet, EDM} as well as many recent works in AI4Drug [1~3].
Formally, let the set of generated molecules be denoted as , and the set of Canonical SMILES strings for molecules that pass the validity check mentioned above be denoted as . Denote the set of Canonical SMILES strings of the training data molecules as , then:
- Valid =
- Unique =
- Novel =
Note that it is unreasonable to consider novelty and uniqueness without validity (https://arxiv.org/pdf/2203.17003 also points out this issue). In the extreme case, a model's validity could be only 1% while these valid molecules are all distinct from each other and from the training set, resulting in 100% for both uniqueness and novelty. Thus, we adopt Unique×Valid and Novel×Valid as metrics instead.
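One common way to compute these metrics from canonical SMILES strings is sketched below. The convention used (uniqueness among valid molecules, novelty among unique valid molecules) is an assumption made here and may not match the authors' exact formulas; `generation_metrics` is a hypothetical helper.

```python
def generation_metrics(generated_smiles, training_smiles):
    """Compute Valid, Unique, Novel and their products with Valid.

    `generated_smiles` has one entry per generated molecule: its canonical
    SMILES string if it passed the validity check, otherwise None.
    Assumed convention: Unique is measured among valid molecules and
    Novel among unique valid molecules.
    """
    n_generated = len(generated_smiles)
    valid = [s for s in generated_smiles if s is not None]
    unique = set(valid)
    novel = unique - set(training_smiles)

    validity = len(valid) / n_generated if n_generated else 0.0
    uniqueness = len(unique) / len(valid) if valid else 0.0
    novelty = len(novel) / len(unique) if unique else 0.0
    return {
        "valid": validity,
        "unique": uniqueness,
        "novel": novelty,
        "unique_x_valid": uniqueness * validity,
        "novel_x_valid": novelty * validity,
    }

# Toy example with made-up molecules.
toy = generation_metrics(["CCO", "CCO", "CC", None], training_smiles={"CC"})

# The extreme case from the text: 1 valid molecule out of 100.
extreme = generation_metrics(["CCO"] + [None] * 99, training_smiles=set())
```

In the extreme case, Unique and Novel are both 100% while Unique×Valid and Novel×Valid are only 1%, which is the motivation for reporting the products.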
[1] Structure-based drug design with equivariant diffusion models[J]. arXiv, 2022.
[2] Geometric latent diffusion models for 3d molecule generation[C]. ICML, 2023.
[3] 3d equivariant diffusion for target-aware molecule generation and affinity prediction[J]. ICLR, 2023.
Q4: In Figure 6 a), what is the difference between N-VAE and N-QVAE besides the quantum implementation? Why can N-QVAE generate more effective molecules than classical methods?
From a theoretical perspective, it is known that quantum mechanics can produce atypical patterns in data, i.e., quantum mechanics can produce statistical patterns that are computationally difficult for a classical computer to produce [4]. From an experimental perspective, we found that compared to N-VAE, N-QVAE can generate relatively reasonable 3-D coordinates, ensuring that the distances between atoms fall within the range of chemical bond lengths. However, the atom numbers and atom types generated by N-QVAE are relatively homogeneous, resulting in lower scores for novel and unique compared to N-VAE. On the other hand, the S-QVAE proposed in the paper, which uses a spherical latent space, can generate relatively diverse atom numbers and types while maintaining reasonable 3-D coordinates distribution, thus outperforming several classical methods like MLP-VAE. Thanks for your suggestion, we will include a more detailed analysis in the revised version.
[4] Quantum machine learning, Nature, 2017.
For the loss function, why do you use both the atomic classification loss and the constraint loss at the same time?
Thanks. In the input vector, the atomic type information is encoded using a one-hot representation, which is then concatenated. Due to the normalization requirement, this one-hot encoding is transformed to . As for the output vector, can only supervise the distribution of atomic types for each generated atom. However, we design to recover the atomic type information for the entire molecule from the output quantum vector, which is also strictly normalized. Therefore, we want the sum of the probabilities for all atom types for each atom to the expected value of , ensuring consistency between the input and output vectors. To achieve this, we further introduce .
Thank you for acknowledging our work. Your rating has provided us with great encouragement, and your detailed feedback and suggestions have been immensely helpful. Below is our detailed response. Due to word count limitations, part of the answer and references are included in the Official Comment below.
W1: Compared with classical molecule generation methods, the performance of the proposed method still lags behind.
Thanks. Since quantum machine learning technology is still in its infancy and quantum computers have not yet reached a mature stage, most quantum machine learning methods currently cannot surpass the existing SOTA classic methods. See (1) Current state of Quantum machine learning in general response for detailed discussion.
W2: Compared with quantum methods QGAN-HG and P2-QGAN-HG, the speed of the proposed method still lags behind.
Table 1 lists the runtime of all methods on classical computers, where quantum methods are tested using a classical quantum simulator. It is well known that quantum algorithms cannot be efficiently simulated by classical computers, so comparing the runtime of different types of quantum algorithms on a classical computer may be unfair in some sense. We report the speed only as a reference and to demonstrate that, even on simulators, our methods have a speed advantage over the SOTA classic model.
QGAN-HG and P2-QGAN-HG indeed show higher efficiency on classical computers. This is because they are hybrid classical-quantum methods that contain only a small number of quantum parameters. In contrast, our approach consists entirely of quantum circuits with only quantum parameters, and simulating the execution of these circuits takes more time. We designed a fully quantum parameterization to enable deployment on real quantum computers, as incorporating classical parameters would result in significant time consumption due to communication between quantum and classical devices.
W3: It is unclear what Valid, Unique and Novel mean and how to define these metrics.
Thanks for the suggestion. We will add a detailed definition of each metric to our paper.
- Valid: This metric measures the percentage of generated molecules that are chemically valid, defined as the percentage of molecular graphs that do not violate chemical valency rules.
- Unique: This metric measures the ratio of unique molecules among the generated set. It ensures that the model is not generating the same molecule multiple times, promoting a variety of different structures.
- Novel: This metric assesses the fraction of generated molecules that do not appear in the training data. A higher novelty score indicates that the model can generate new, previously unseen molecules, which is crucial for discovering new compounds.
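As a rough illustration of the three definitions above, the following minimal sketch computes the metrics over molecules represented as canonical strings. The function name `generation_metrics` and the `is_valid` predicate are hypothetical placeholders, not part of the paper's code. The sketch also assumes a common convention in which Unique is computed among valid molecules and Novel among unique valid molecules; the exact denominators used in the paper may differ.

```python
from typing import Callable, Dict, Iterable


def generation_metrics(
    generated: Iterable[str],
    training: Iterable[str],
    is_valid: Callable[[str], bool],
) -> Dict[str, float]:
    """Compute Valid, Unique and Novel ratios for a set of generated molecules.

    Assumption: molecules are canonical strings, so equal strings mean equal molecules.
    Assumption: `is_valid` stands in for a chemical valency check.
    """
    generated = list(generated)
    # Valid: generated molecules that pass the valency check.
    valid = [m for m in generated if is_valid(m)]
    # Unique: distinct molecules among the valid ones (assumed denominator).
    unique = set(valid)
    # Novel: unique valid molecules that do not appear in the training data.
    novel = unique - set(training)
    return {
        "valid": len(valid) / len(generated) if generated else 0.0,
        "unique": len(unique) / len(valid) if valid else 0.0,
        "novel": len(novel) / len(unique) if unique else 0.0,
    }
```

In practice the `is_valid` placeholder would be replaced by a real valency check from a cheminformatics toolkit.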
Q1: Why do the methods proposed in Table 1 have zero classical parameters? Would moving some quantum parameters to classical parameters speed up the model?
As we mentioned earlier, using classical parameters would hinder the deployment of algorithms on quantum computers because the hybrid model requires frequent communication between quantum and classical devices, resulting in significant time costs. Generally speaking, designing a hybrid model is relatively trivial and does not clarify the role of the quantum layer within the overall model, whereas designing an effective fully quantum model is challenging and innovative.
Q2: In line 170: "To convert to latent space, here we discard the information contained in the subsystem via tracing out the state of qubits." Why would you do this? What's the intuition behind the compression method mentioned in [1]?
Thanks. [1] aims to propose a quantum AE, and traditionally, AEs are used for dimension reduction or feature learning [2,3,4]. In classical neural networks, dimensionality reduction is straightforward, as it can be achieved with a linear layer. However, Parameterized Quantum Circuits alone cannot reduce dimensionality, because they act by unitary matrix multiplication, which preserves dimensionality. In quantum systems, a common method for dimensionality reduction is to lose information by discarding certain qubits, which is achieved via the tracing-out operation. This approach allows us to effectively extract information about the remaining quantum subsystem.
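The tracing-out operation can be sketched numerically with NumPy. This is a minimal illustration for a pure state, not the circuit used in the paper. The function name `trace_out_last_qubits` is hypothetical, and the sketch assumes that the discarded qubits are the last ones and that the first qubit is the most significant bit of the basis index.

```python
import numpy as np


def trace_out_last_qubits(state, n_qubits: int, n_discard: int) -> np.ndarray:
    """Return the reduced density matrix of the kept qubits of a pure state.

    Assumption: the kept qubits come first (most significant bits), and the
    last `n_discard` qubits are traced out.
    """
    n_keep = n_qubits - n_discard
    # Rows index the kept subsystem, columns index the discarded subsystem.
    psi = np.asarray(state, dtype=complex).reshape(2**n_keep, 2**n_discard)
    # rho_keep[a, a'] = sum_b psi[a, b] * conj(psi[a', b])
    return psi @ psi.conj().T


# A Bell state on 2 qubits: tracing out one qubit leaves a maximally mixed qubit.
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2.0)
rho = trace_out_last_qubits(bell, n_qubits=2, n_discard=1)
```

The reduced state lives in a space of dimension 2^n_keep rather than 2^n_qubits, which is the sense in which discarding qubits reduces dimensionality even though each gate is unitary.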
Q3: In line 348: "..., which proves classical data tends to follow a normal distribution." Do VAE and QVAE use different datasets? Please clarify this.
Thanks for pointing this out. Indeed, VAE and QVAE use the same dataset, but the input data for QVAE undergoes additional normalization to meet the requirements of a quantum system. What we want to convey is that, for data without normalization, imposing a normal distribution rather than a von Mises-Fisher distribution as the latent prior is beneficial for classical variational autoencoders. We acknowledge that using the word 'prove' here is not appropriate and might cause some misunderstanding. We will revise this.
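The normalization requirement can be illustrated with a generic amplitude-encoding sketch: a quantum state vector must have unit L2 norm, so a classical feature vector is zero-padded to length 2^n and rescaled. This is an assumption-laden illustration, not the paper's exact encoding scheme, and the function name `to_amplitudes` is hypothetical.

```python
import numpy as np


def to_amplitudes(features, n_qubits: int) -> np.ndarray:
    """Zero-pad a real feature vector to length 2**n_qubits and L2-normalize it.

    Assumption: plain amplitude encoding; the paper's encoding may differ.
    """
    x = np.asarray(features, dtype=float).ravel()
    dim = 2**n_qubits
    if x.size > dim:
        raise ValueError("feature vector does not fit into the given qubits")
    padded = np.zeros(dim)
    padded[: x.size] = x
    norm = np.linalg.norm(padded)
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero vector")
    # After rescaling, the squared amplitudes sum to 1, as a quantum state requires.
    return padded / norm
```

Because only the direction of the vector survives this rescaling, the encoded data naturally lies on a unit sphere, which is consistent with the motivation given for a spherical latent space.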
Q4: In Figure 6 a), what is the difference between N-VAE and N-QVAE besides the quantum implementation? Why can N-QVAE generate more effective molecules than classical methods?
Thank you. In N-QVAE, the input data undergoes additional normalization to meet the requirements of a quantum system. Additionally, as you mentioned, N-QVAE uses Parameterized Quantum Circuits instead of classical neural networks for the encoder and decoder. The input dimension, latent dimension, latent prior, loss function, and training strategy are consistent with those used in N-VAE. N-QVAE can generate more effective molecules, possibly due to the advantages of Parameterized Quantum Circuits over a simple multi-layer perceptron.
General Response by Authors
We express our gratitude to all the reviewers for dedicating their time and providing valuable comments. They acknowledged that our work is well-structured (VsUh, 5Ebq, hKnP), contributive (VsUh, hKnP), effective (VsUh, 5Ebq), and presents a novel approach (hKnP). While the overall feedback from the reviewers is positive, reviewer CAJp has some reservations regarding the novelty and performance of this paper, as well as the lack of detailed descriptions of baselines. To clarify potential misunderstandings that might affect the evaluation, we first restate the position and the contribution of our work within the field of quantum machine learning.
The position of this work is to explore a quantum version of VAE for 3-D data generation (to the best of our knowledge, for the first time in the literature), especially for molecule generation, with potential supremacy on future quantum computers. Like many works in the field of quantum ML, e.g. QCNN [1], QGAN [2], and QLSTM [3], we follow the architecture of the classic design, the VAE in our case. Though our paper draws some inspiration from other works and incorporates common techniques in quantum machine learning, proposing a quantum counterpart together with detailed quantum circuits compatible with NISQ devices is still highly nontrivial. We list the following efforts as contributions:
- We propose the first fully quantum VAE (to the best of our knowledge) for 3-D data generation and its detailed quantum circuits compatible with NISQ devices. For the generated quantum vector, we fulfill its inherent and strict normalization requirement via the von Mises-Fisher (vMF) distribution in a spherical latent space. In addition, we provide a theoretical analysis of the expressive power of our designed quantum circuit.
- To the best of our knowledge, our method presents the first quantum conditional VAE framework and is capable of conditional 3-D molecule generation. This is attributed to two main factors: 1) We designed a quantum state encoding scheme specifically for 3-D molecular data, ensuring maximum preservation of the original information. 2) By employing angle encoding, we integrated conditional vectors into our proposed QVAE framework for training and generation. This approach endows our model with conditional generative capabilities, enabling it to learn more specific 3-D molecular representations under given conditional information.
- We carefully conducted all the experiments in a TorchQuantum-based simulation environment, in line with many QML works [4~6]. Extensive experimental results demonstrate that our model outperforms all other quantum (or hybrid) methods and delivers comparable results to several classical methods.
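For readers unfamiliar with angle encoding, the idea can be sketched in NumPy: each entry of a conditional vector is used as the rotation angle of a single-qubit gate applied to |0>. This is a minimal sketch that assumes RY rotations and one qubit per condition entry; the specific gates and layout in our circuits may differ, and the names `ry` and `angle_encode` are illustrative only.

```python
import numpy as np


def ry(theta: float) -> np.ndarray:
    """Matrix of the single-qubit RY(theta) rotation gate."""
    c, s = np.cos(theta / 2.0), np.sin(theta / 2.0)
    return np.array([[c, -s], [s, c]])


def angle_encode(condition) -> np.ndarray:
    """Encode each condition value as an RY angle on its own qubit.

    Assumption: one qubit per entry, each starting in |0>, no entangling gates.
    Returns the product state as a vector of length 2**len(condition).
    """
    zero = np.array([1.0, 0.0])
    state = np.array([1.0])
    for theta in condition:
        # RY(theta)|0> = [cos(theta/2), sin(theta/2)]
        state = np.kron(state, ry(theta) @ zero)
    return state
```

Since each RY gate is unitary, the encoded state is automatically normalized regardless of the condition values.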
In the following response, we provide detailed answers to all the questions and comments point-by-point. In particular, we have provided the details of the baselines in the official comment to reviewer CAJp. We deeply appreciate the suggestions for improving this paper. If you have any further questions, please let us know so that we can provide a timely follow-up response.
References:
[1] Quantum convolutional neural networks. Nature Physics 2019.
[2] Experimental quantum generative adversarial networks for image generation. Physical Review Applied 2021.
[3] Quantum long short-term memory. ICASSP 2022.
[4] Recurrent quantum neural networks. NeurIPS 2020.
[5] Quantum 3D graph learning with applications to molecule embedding. ICML 2023.
[6] Towards quantum machine learning for constrained combinatorial optimization: a quantum QAP solver. ICML 2023.
This work proposes a quantum VAE method to generate molecular configurations in 3D. The majority of the concerns reviewers initially had were resolved during the rebuttal process, and my overall impression is that the strengths outweigh the weaknesses, hence the recommendation of acceptance. The authors should account for these discussions in a revision.