MOFDiff: Coarse-grained Diffusion for Metal-Organic Framework Design
We develop a generative model for metal organic frameworks using a coarse-grained diffusion approach to discover carbon capture materials and validate them through molecular simulations.
Abstract
Reviews and Discussion
The paper presents a coarse grained diffusion model for generating metal organic frameworks. Coarse structures are generated by a diffusion model, after which 3d atomic structures are assembled, oriented using a heuristic optimization procedure, and optimized using a force field. The coarse grained building blocks are represented using a learned (contrastive) representation. The generative model is evaluated on an inverse design problem using GCMC simulation.
Strengths
For the most part, the paper is easy to follow. There are several illustrative figures that provide a good overview.
The paper addresses a very important and difficult problem, and the proposed method is sensible and (at least to some degree) effective.
There are several technical novelties, including the particular way the coarse graining and reassembly is conducted.
The method is validated with GCMC simulation.
Weaknesses
There are many "moving parts" in the modeling pipeline. More ablation studies or comparisons of different modeling choices would be enlightening and would give some insight into how sensitive the results are to these choices.
Some technical details are not described in detail. The paper builds on several existing techniques, which are referenced but not described technically. This is just a minor point, but in my view the paper could perhaps be more self contained.
Questions
In the coarse grained structure, are building blocks only represented using their identity and position? Did you consider including other features such as their orientation?
How sensitive is the representation learning to the choice of fingerprint? Is ECFP4 the only fingerprint you have considered, and why did you choose that?
Abstract: What does "predicting scores in E(3)" mean? At this point it is not clear which scores this refers to.
"small geometric variations in 3D orientation": I assume "orientation" here means something other than rotation/translation?
Would there be room to include a short technical description of the metal-oxo / MOFid algorithms that are used? Similarly, a sentence that technically describes ECFP4?
Could you include a very brief technical description of the gemnet-oc architecture?
The first paragraph in section 3 is difficult to read - would you consider including a "concept figure" that outlines the overall training + generative process?
The first paragraph in section 3 is difficult to read - would you consider including a "concept figure" that outlines the overall training + generative process?
Thank you for the great suggestion. We have added Figure 11 to illustrate the training and sampling processes. It is currently placed in Appendix B (page 18) due to the page limitation. We welcome further feedback on the new figure.
We look forward to further discussions if you have additional questions or suggestions.
Reference:
[1] Yim, Jason, et al. "SE (3) diffusion model with application to protein backbone generation." arXiv preprint arXiv:2302.02277 (2023).
We thank reviewer 1BzN for helpful feedback and comments. We address each of the reviewer’s concerns below.
There are many "moving parts" in the modeling pipeline. More ablation studies or comparisons of different modeling choices would be enlightening and would give some insight into how sensitive the results are to these choices.
We appreciate your feedback regarding the complexity of our modeling pipeline. To provide a more comprehensive comparison, we elaborate on some early experiments and decision-making:
- We experimented with an autoencoder-based approach for building block embedding. However, this method significantly underperformed compared to the contrastive learning embedding technique we ultimately employed.
- We also explored incorporating orientation into the building block representation and experimented with learning orientation diffusion. This approach, however, yielded significantly inferior results compared to our current assembly algorithm. More details on this will be provided in our subsequent response.
We are now exploring alternatives to ECFP4 for building block identity. We will update the results as they finish. We are happy to further discuss what other comparisons or analyses might be interesting to include.
Some technical details are not described in detail. The paper builds on several existing techniques, which are referenced but not described technically. This is just a minor point, but in my view the paper could perhaps be more self contained.
Thank you for the concrete feedback. We have included a description of the metal-oxo/MOFid algorithms, the ECFP4 fingerprint, and the GemNet-OC architecture in the paper in Appendix B.3 (page 20). We welcome further feedback on this section.
In the coarse grained structure, are building blocks only represented using their identity and position? Did you consider including other features such as their orientation?
Thank you for the insightful question. Yes, in the coarse-grained structure, the building blocks are represented only by their identities and positions. In early experiments, we also attempted to include orientation as part of the building block representation and to diffuse the orientation of the building blocks using SO(3) diffusion [1]. However, we could not obtain good performance: the trained model was unable to recover accurate orientations through the reverse diffusion process. We believe this is because the orientation of building blocks is ill-defined. In these failed attempts, we used principal component analysis to obtain a canonical orientation for each building block. Unlike amino acids, which have a natural definition of orientation based on their atoms [1], the geometry of the building blocks is much more diverse. There exist many almost-2D or near-isotropic building blocks, which makes the orientation ambiguous. Further, the layout of the building blocks is delicate and requires accurate alignment between the building blocks to render a valid MOF.
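The ambiguity described above can be illustrated with a toy example. The sketch below is not the code used in the paper; it assumes a hypothetical planar, four-fold symmetric building block and shows that its covariance matrix, the quantity PCA diagonalizes, is unchanged by an in-plane rotation, so PCA cannot single out a canonical in-plane axis.

```python
import math

def covariance(points):
    """3x3 covariance matrix of (x, y, z) points, centered at their centroid."""
    n = len(points)
    center = [sum(p[i] for p in points) / n for i in range(3)]
    centered = [[p[i] - center[i] for i in range(3)] for p in points]
    return [[sum(c[i] * c[j] for c in centered) / n for j in range(3)]
            for i in range(3)]

def rotate_z(points, angle):
    """Rotate points about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]

# Hypothetical planar building block (arbitrary units), not taken from the paper.
square = [(1.0, 1.0, 0.0), (-1.0, 1.0, 0.0), (-1.0, -1.0, 0.0), (1.0, -1.0, 0.0)]
rotated = rotate_z(square, math.radians(30.0))

cov_a = covariance(square)
cov_b = covariance(rotated)

# The two point sets differ, yet their covariance matrices agree, so the
# in-plane principal axes are degenerate and any choice is arbitrary.
max_diff = max(abs(cov_a[i][j] - cov_b[i][j]) for i in range(3) for j in range(3))
print(max_diff)
```

Because the two in-plane eigenvalues coincide, any pair of orthogonal in-plane directions is an equally valid set of principal axes, which is one way a PCA-based canonical orientation becomes ill-defined.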
How sensitive is the representation learning to the choice of fingerprint? Is ECFP4 the only fingerprint you have considered, and why did you choose that?
In the current paper, we have only considered the ECFP4 fingerprint because of its simplicity, popularity, and satisfactory results. We agree with the reviewer that the choice of building block identity is worth further consideration. As also suggested by reviewer yqBG, we plan to experiment with a learned identity for building block representation. We will update the results as they finish.
Abstract: What does "predicting scores in E(3)" mean? At this point it is not clear which scores this refers to.
We apologize for the lack of clarity. We have revised the abstract in the updated version.
"small geometric variations in 3D orientation": I assume "orientation" here means something other than rotation/translation?
Thanks for pointing out this lack of clarity. Yes, since the GemNet-OC encoder is SE(3) invariant, the building block embedding is invariant to global translation/rotation. We have revised this sentence in the updated version.
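A minimal sketch of what invariance to global translation/rotation means in practice is given below. It does not reproduce GemNet-OC; it only assumes, for illustration, that an encoder consumes internal geometric quantities such as interatomic distances, and checks that those distances are unchanged by a rigid motion. The coordinates and function names are hypothetical.

```python
import math
from itertools import combinations

def pairwise_distances(points):
    """Sorted list of all interatomic distances (an SE(3)-invariant descriptor)."""
    return sorted(math.dist(p, q) for p, q in combinations(points, 2))

def rigid_transform(points, angle, shift):
    """Rotate about the z-axis by `angle` radians, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in points]

# Hypothetical four-atom building block (arbitrary units).
block = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0), (2.1, 1.2, 0.0), (0.7, 0.3, 1.1)]
moved = rigid_transform(block, math.radians(50.0), (3.0, -2.0, 5.0))

d_original = pairwise_distances(block)
d_moved = pairwise_distances(moved)

# Distances agree up to floating-point error, so any function of them
# is unaffected by where the block sits or how it is globally rotated.
max_diff = max(abs(a - b) for a, b in zip(d_original, d_moved))
print(max_diff)
```

The remaining "small geometric variations" are therefore internal ones, such as slightly different bond lengths or angles between instances of the same building block.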
Would there be room to include a short technical description of the metal-oxo / MOFid algorithms that are used? Similarly, a sentence that technically describes ECFP4?
We have included a short description of the metal-oxo / MOFid algorithms and ECFP4 in Appendix B.3.
Could you include a very brief technical description of the gemnet-oc architecture?
We have included a brief description of the GemNet-OC architecture in Appendix B.3.
The paper introduces a novel method for generating metal-organic frameworks (MOFs) using a coarse-grained diffusion model called MOFDiff. MOFDiff employs a coarse-grained (CG) representation derived from MOF building blocks. A graph neural network (GNN) encoder trained via contrastive learning is used to map these building blocks into a latent space (z). MOFDiff consists of four main components:
- A periodic contrastive GNN encoder for latent embedding
- An MLP for lattice parameter (L) and building block count (K) prediction
- A periodic GNN denoiser for diffusion-based MOF generation
- Another MLP for MOF property prediction (e.g., CO2 capacity)
The denoising diffusion process consists of two steps, first on the CG based building block types and then on their 3D coordinates. Then the all atom structure is recovered using an assembly algorithm, followed by simple force field (UFF) based relaxation. MOFDiff is trained on a dataset of ~304k MOFs and shows capability in generating valid and diverse MOFs. Overall, this paper leverages a diffusion model over a CG representation of MOFs to efficiently generate complex new structures, with proven efficacy in carbon capture applications.
Strengths
- The CG representation of building blocks for the diffusion model makes MOFDiff computationally efficient to train and sample new MOFs. The all atom
- Underlying physical symmetries are appropriately handled with a contrastively trained GNN encoder for the building blocks.
- Optimizing MOFs in the latent space rather than decoding repeatedly is an efficient inverse design strategy demonstrated in the paper.
Weaknesses
- As acknowledged in the paper, the validity of the MOFs decreases as the number of building blocks increases; this could limit applicability to larger and more complex MOFs.
- The authors use a MOF dataset with fewer than 20 building blocks, but the paper doesn't discuss the size extensivity of the GNN encoder and denoising module in detail.
- There is limited discussion on the synthetic accessibility of the designed MOFs. Finally, the building blocks are constrained to a certain set to allow coarse graining; this may lead to less diverse MOFs.
Questions
- Why was UFF chosen as the force field? Would a MOF-specific force field result in better structures?
- ECFP4 was chosen as the similarity measure for contrastive learning. Would a better similarity measure, e.g., learned embeddings (ChemBERTa), result in a better GNN encoder?
- How difficult would it be to extend the current framework to allow novel building blocks?
Corrections:
- On page 19, there is a mismatch between Py-G and pytorch references.
We thank reviewer yqBG for helpful feedback and comments. We address each of the reviewer’s concerns below.
As acknowledged in the paper, the validity of the MOFs decreases as the number of building blocks increases; this could limit applicability to larger and more complex MOFs.
We recognize that as the complexity of MOFs increases, the validity of our approach may diminish. In future work, we aim to refine our methodology to enhance its applicability to more complex MOFs. This could involve incorporating the lattice parameters into the diffusion process, using known templates for guidance, and enhancing building block embeddings.
The authors use a MOF dataset with fewer than 20 building blocks, but the paper doesn't discuss the size extensivity of the GNN encoder and denoising module in detail.
It is straightforward to apply the current model to MOFs with a larger number of building blocks. In this paper, we limit the size of MOFs under the hypothesis that MOFs with extremely large primitive cells may be difficult to synthesize.
There is limited discussion on the synthetic accessibility of the designed MOFs. Finally, the building blocks are constrained to a certain set to allow coarse graining; this may lead to less diverse MOFs.
Determining the synthetic accessibility of MOFs is very challenging, and there is unfortunately no widely accepted and generally applicable method. To the best of our capability, we define chemically informed validity criteria and use molecular simulations to filter for promising MOF candidates. We look forward to attempting the synthesis of top candidates in future endeavors.
The building blocks used in this work are extracted from the training dataset, BW-DB. This building block space contains 242,000 distinctive building blocks that enable a broad generative scope and satisfying performance in our tasks. While we focus on the BW-DB dataset in this paper due to the availability of gas adsorption labels, the space of possible building blocks can be expanded by incorporating new datasets. Our fully automated pipeline allows us to extract new building blocks without the need to curate the building blocks for template compatibility.
Why was UFF chosen as the force field? Would a MOF-specific force field result in better structures?
Both UFF [1] and UFF4MOF [2] are widely used in existing MOF literature. It has been shown in previous work [3] that they have similar performance. We chose UFF over UFF4MOF to be consistent with the well-established GCMC simulation protocol for gas adsorption, which uses the UFF force field. In addition, through manual inspections, we find the relaxed structures from UFF reasonable.
ECFP4 was chosen as the similarity measure for contrastive learning. Would a better similarity measure, e.g., learned embeddings (ChemBERTa), result in a better GNN encoder?
Thanks for the great suggestion. We agree that a more chemically informed building block identity has the potential to enhance our model. We are now exploring the ChemBERTa [4] embedding instead of ECFP4 for building block representation and will update further results when they finish.
On page 19, there is a mismatch between Py-G and pytorch references.
Thank you for pointing this out. We have corrected this in the revised manuscript.
We look forward to further discussions if you have additional questions or suggestions.
Reference:
[1] Boyd, Peter G., et al. "Data-driven design of metal–organic frameworks for wet flue gas CO2 capture." Nature 576.7786 (2019): 253-256.
[2] Nandy, Aditya, et al. "A database of ultrastable MOFs reassembled from stable fragments with machine learning models." Matter 6.5 (2023): 1585-1603.
[3] Boyd, Peter G., et al. "Force-field prediction of materials properties in metal-organic frameworks." The journal of physical chemistry letters 8.2 (2017): 357-363.
[4] Ahmad, Walid, et al. "Chemberta-2: Towards chemical foundation models." arXiv preprint arXiv:2209.01712 (2022).
Thanks for providing a detailed response but my rating remains the same.
Thank you for your response, time, and concrete feedback!
The authors present MOFDiff, a diffusion model that generates coarse-grained (CG) representations of metal-organic framework (MOF) structures. MOFs are represented hierarchically, with sets of atoms grouped to represent building blocks. Each MOF is represented with K building blocks such that K << N, where N is the number of atoms in the periodic cell. This is particularly important as typical MOF unit cells may contain on the order of hundreds of atoms. The authors first embed the building blocks using a model trained with a contrastive learning loss. The latent vector is then used to condition a denoising diffusion model to generate CG representations of MOFs. CG representations are converted back to MOFs using a novel assembly algorithm. The authors define a validity criterion for their generated MOFs and show they are capable of generating valid, novel, diverse MOFs. MOFDiff is also capable of guided inverse design and of optimizing MOFs for CO2 separation in carbon capture.
Strengths
- The paper is easy to follow and the authors motivate their reasoning for model and design choices
- The contrastive representation learning of building blocks allows the authors to learn a meaningful latent space of building blocks including metal nodes and linkers. This extends previous work which only modified linkers and as a result, generated low-diversity samples
- The proposed assembly algorithm to orient multiple building blocks using gradient-based optimization is novel and provides a strong justification for the authors' data representation choices
Weaknesses
- In the generation process, it is unclear why the latent vector Z can be sampled from the standard normal distribution. The latent vector is generated using the OrbNet encoder trained with the contrastive learning loss. Without further regularization, like in the case of variational autoencoders using the KL divergence, the latent vector should not conform to the distribution.
- It would be useful to compare with previous work [1] on the inverse design of MOFs. The authors mention and cite a source on the low diversity of the generated MOFs with template-based systems such as SmVAE, but a comparison isn’t present in the experiments section.
- The validity and novelty of generated structures are low compared to template-based models as in [1].
- In general, while the authors include relevant work in the paper, the authors should have a related work section to place their work in the context of the field.
- The following papers could potentially be related to the present work [2], [3].
[1] Yao, Z., Sánchez-Lengeling, B., Bobbitt, N. S., Bucior, B. J., Kumar, S. G. H., Collins, S. P., ... & Aspuru-Guzik, A. (2021). Inverse design of nanoporous crystalline reticular materials with deep generative models. Nature Machine Intelligence, 3(1), 76-86.
[2] Zhou, M., & Wu, J. (2022). Inverse design of metal–organic frameworks for C2H4/C2H6 separation. npj Computational Materials, 8(1), 256.
[3] Park, Junkil, et al. "Computational design of metal–organic frameworks with unprecedented high hydrogen working capacity and high synthesizability." Chemistry of Materials 35.1 (2022): 9-16.
Questions
- In Figure 5, are only novel MOFs (i.e. not in the reference dataset) used to generate the histogram?
- In the representation learning of building blocks, the authors mention small geometric variations of the building. Are these variations in the coordinate space? In other words, what transformations are used to provide positive samples in the contrastive loss?
- How are the coordinates of the building blocks assigned?
In general, while the authors include relevant work in the paper, the authors should have a related work section to place their work in the context of the field. The following papers could potentially be related to the present work [2], [3].
Thank you for the thoughtful suggestion of adding a related work section, and for pointing out additional related works. We agree more context for the present work will enhance the manuscript. We have added a related work section (Appendix A) to discuss the relevant works in more detail, as well as include additional related works pointed out by the reviewer. Due to the page limitation, the related work section is in the appendix. We welcome further feedback on this related work section.
In Figure 5, are only novel MOFs (i.e. not in the reference dataset) used to generate the histogram?
Yes, only MOFs that are valid, novel, and unique are included in Figure 5.
In the representation learning of building blocks, the authors mention small geometric variations of the building. Are these variations in the coordinate space? In other words, what transformations are used to provide positive samples in the contrastive loss?
The geometric variation naturally exists in the dataset because the same building block appears in the same or different MOFs multiple times with geometric differences (reflected in Figure 3). Therefore, we don’t need to define transformations to provide positive samples. Instead, we use any two 3D building block structures as a positive pair if they share the same ECFP4 fingerprint and the same number of connection points.
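The pairing rule described above can be sketched as follows. This is a minimal illustration rather than the released implementation: the record layout, the string placeholders standing in for ECFP4 fingerprints, and the function name `positive_pairs` are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

def positive_pairs(instances):
    """Pair every two instances that share a fingerprint and a connection count.

    `instances` is a list of dicts with hypothetical keys 'id', 'fingerprint'
    (a hashable stand-in for an ECFP4 fingerprint) and 'n_connections'.
    """
    groups = defaultdict(list)
    for inst in instances:
        key = (inst["fingerprint"], inst["n_connections"])
        groups[key].append(inst["id"])
    pairs = []
    for ids in groups.values():
        pairs.extend(combinations(ids, 2))
    return pairs

# Hypothetical building block instances extracted from different MOFs.
instances = [
    {"id": "a", "fingerprint": "fp_linker_1", "n_connections": 2},
    {"id": "b", "fingerprint": "fp_linker_1", "n_connections": 2},
    {"id": "c", "fingerprint": "fp_linker_1", "n_connections": 3},
    {"id": "d", "fingerprint": "fp_node_1", "n_connections": 4},
    {"id": "e", "fingerprint": "fp_linker_1", "n_connections": 2},
]

pairs = positive_pairs(instances)
print(pairs)
```

In this toy data, instance "c" shares the fingerprint of "a", "b", and "e" but has a different number of connection points, so it forms no positive pair with them.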
How are the coordinates of the building blocks assigned?
The coordinates of the building blocks are generated through the reverse diffusion process. The reverse diffusion process starts by randomly sampling the building block coordinates and identities. From this noisy structure, the learned score networks iteratively denoise the noisy structure to a final coarse-grained MOF structure. The all-atom MOF structure is then recovered from the CG structure through the assembly algorithm and force field relaxation.
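The general idea of starting from noise and repeatedly following a score function can be sketched with a one-dimensional toy. This is not the MOFDiff sampler: an analytic Gaussian score stands in for the learned score networks, plain unannealed Langevin dynamics stands in for the actual sampling schedule, and all parameter values are chosen only for illustration.

```python
import math
import random

def score(x, mu=2.0, sigma=1.0):
    """Score (gradient of the log-density) of N(mu, sigma^2).

    Stand-in for a learned score network; purely illustrative.
    """
    return (mu - x) / sigma ** 2

def langevin_sample(rng, n_steps=200, step=0.05):
    """Draw one sample by iteratively denoising a random starting point."""
    x = rng.gauss(0.0, 3.0)  # random initialization, far from the target
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        x += step * score(x) + math.sqrt(2.0 * step) * noise
    return x

rng = random.Random(0)
samples = [langevin_sample(rng) for _ in range(1000)]
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
print(sample_mean, sample_var)
```

The samples concentrate around the target mean of 2.0 regardless of the broad initialization, which is the one-dimensional analogue of noisy coordinates being driven toward plausible building block positions.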
We look forward to further discussions if you have additional questions or suggestions.
Reference:
[1] Yao, Z., Sánchez-Lengeling, B., Bobbitt, N. S., Bucior, B. J., Kumar, S. G. H., Collins, S. P., ... & Aspuru-Guzik, A. (2021). Inverse design of nanoporous crystalline reticular materials with deep generative models. Nature Machine Intelligence, 3(1), 76-86.
[2] Jablonka, K. M. (2023). mofchecker (Version 1.0.0) [Computer software]. https://doi.org/10.5281/zenodo.1234
[3] Bucior, Benjamin J., et al. "Identification schemes for metal–organic frameworks to enable rapid search and cheminformatics analysis." Crystal Growth & Design 19.11 (2019): 6682-6697.
[4] Boyd, Peter G., et al. "Data-driven design of metal–organic frameworks for wet flue gas CO2 capture." Nature 576.7786 (2019): 253-256.
We thank reviewer nQJ6 for helpful feedback and comments. We address each of the reviewer’s concerns below.
In the generation process, it is unclear why the latent vector Z can be sampled from the standard normal distribution. The latent vector is generated using the OrbNet encoder trained with the contrastive learning loss. Without further regularization, like in the case of variational autoencoders using the KL divergence, the latent vector should not conform to the distribution.
We are sorry for the confusion. We hope to clarify that the MOFDiff model is indeed a variational autoencoder model trained with the KL regularization loss (Equation 12). This allows sampling MOF structures by sampling latent vectors from a Gaussian prior. The contrastive loss is not used for CG diffusion of MOFs. Instead, it is only used for learning the building block embedding. Only the building block encoder is trained with the contrastive loss. The building block encoder is then frozen and used to obtain learned embedding for all building blocks in the dataset. These building block embeddings are then used to represent CG MOF structures and diffusion-based generative modeling.
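For readers unfamiliar with why a KL term permits sampling from a Gaussian prior, the sketch below gives the standard closed-form KL divergence between a diagonal Gaussian posterior and N(0, I). It is the textbook expression and may differ in weighting or notation from Equation 12 of the paper; the function name and the `log_var` parameterization are assumptions.

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))

# A posterior equal to the prior incurs no penalty.
kl_matched = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])

# Shifting the mean by one unit in one dimension costs 0.5 nats.
kl_shifted = kl_to_standard_normal([1.0], [0.0])

print(kl_matched, kl_shifted)
```

Minimizing this term during training pulls the encoder's posterior toward N(0, I), which is what justifies drawing latent vectors from the standard normal at generation time.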
We hope the response above clarifies the confusion. We are more than happy to provide additional clarification if needed.
It would be useful to compare with previous work [1] on the inverse design of MOFs. The authors mention and cite a source on the low diversity of the generated MOFs with template-based systems such as SmVAE, but a comparison isn’t present in the experiments section.
We have attempted to conduct more experiments on the SmVAE model proposed in [1]. However, although the VAE source code of [1] is released, the MOF deconstructor and reconstructor used in [1] are not clearly stated or publicly available. Without the code for these steps, we are unable to run SmVAE for a new dataset, or recover the all-atom structures for the dataset or generated samples of SmVAE.
While a direct side-by-side comparison is not feasible, we were able to download the top 9 candidate MOFs released by [1] (GMOF-1 to GMOF-9) for CO2/N2 separation. We ran our molecular simulation workflow on these MOFs and report their gas adsorption properties in Table 1. The best GMOF attains a working capacity of 2.20 mol/kg, a CO2 uptake at the adsorption stage of 2.53 mol/kg, and a CO2/N2 selectivity of 11.46. In comparison, the MOFs generated by MOFDiff exhibit better carbon capture performance. However, it is important to acknowledge that this comparison is not entirely rigorous due to the differences in training, dataset, simulation, and optimization settings. All of our code will be open-sourced.
In addition, we hope to highlight the methodological difference between our method and SmVAE. Our method generates 3D MOF structures without templates and thus offers a distinct perspective compared to existing template-based methods such as SmVAE in addressing the diversity of computational MOF design.
The validity and novelty of generated structures are low compared to template-based models as in [1].
In the previous response, we highlighted the significant obstacles to making an apples-to-apples comparison to [1]. We wish to clarify that the validity and novelty statistics reported in [1] and in our manuscript are not directly comparable for the following reasons:
- The criterion for validity/novelty is not clearly defined in [1]. Our validity criterion is defined as the simultaneous satisfaction of (1) matched connection, (2) successful force field relaxation, and (3) passing all criteria defined in MOFChecker [2]. Our novelty is defined through MOFid [3]. Our code will be open-sourced so the validity/novelty criterion can be reproduced in future works.
- The training data set is different. The SmVAE model [1] was trained on a customized database of “around 2 million MOFs”. Our model is trained on BW-DB (~300k MOFs) with the original labels for gas adsorption properties [4].
- The generation scheme is different. The SmVAE model generates MOFs based on predefined templates. Our method generates 3D structures without templates. The SmVAE model and our method also use different sampling protocols.
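The validity and novelty criteria listed above can be sketched as a simple filter. This is a hedged illustration only: the three boolean fields are placeholders for the outcomes of the connection matching, force field relaxation, and MOFChecker steps (no real MOFChecker or MOFid API is called), and treating uniqueness as MOFid deduplication is an assumption not stated in this response.

```python
from dataclasses import dataclass

@dataclass
class CandidateChecks:
    """Hypothetical record of precomputed check outcomes for one generated MOF."""
    connections_matched: bool
    relaxation_succeeded: bool
    passes_mofchecker: bool
    mofid: str  # placeholder identifier string, not a real MOFid

def is_valid(c):
    """Valid only if all three criteria hold simultaneously."""
    return c.connections_matched and c.relaxation_succeeded and c.passes_mofchecker

def filter_valid_novel_unique(candidates, training_mofids):
    """Keep valid candidates whose identifier is absent from training and not yet seen."""
    seen = set()
    kept = []
    for c in candidates:
        if not is_valid(c) or c.mofid in training_mofids or c.mofid in seen:
            continue
        seen.add(c.mofid)
        kept.append(c)
    return kept

training_mofids = {"id-train-1"}
candidates = [
    CandidateChecks(True, True, True, "id-new-1"),     # kept
    CandidateChecks(True, False, True, "id-new-2"),    # relaxation failed
    CandidateChecks(True, True, True, "id-train-1"),   # not novel
    CandidateChecks(True, True, True, "id-new-1"),     # duplicate
]
kept = filter_valid_novel_unique(candidates, training_mofids)
print([c.mofid for c in kept])
```

Since each criterion is a strict conjunction, statistics computed this way are only comparable with another method's if that method applies the same checks, which is the point made in the first bullet.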
Thank you for the detailed feedback.
We are sorry for the confusion. We hope to clarify that the MOFDiff model is indeed a variational autoencoder model trained with the KL regularization loss (Equation 12). This allows sampling MOF structures by sampling latent vectors from a Gaussian prior. The contrastive loss is not used for CG diffusion of MOFs. Instead, it is only used for learning the building block embedding. Only the building block encoder is trained with the contrastive loss. The building block encoder is then frozen and used to obtain learned embedding for all building blocks in the dataset. These building block embeddings are then used to represent CG MOF structures and diffusion-based generative modeling.
Yes, this clarifies my confusion about the latent encoding. Thank you for including a reference to the appropriate section in the appendix, as it makes finding the information significantly easier.
We have attempted to conduct more experiments on the SmVAE model proposed in [1]. However, although the VAE source code of [1] is released, the MOF deconstructor and reconstructor used in [1] are not clearly stated or publicly available. Without the code for these steps, we are unable to run SmVAE for a new dataset, or recover the all-atom structures for the dataset or generated samples of SmVAE
While unfortunate that such a comparison is not possible, this response is valid.
Thank you for your response, time, and valuable suggestions!
The paper presents a method for generating Metal-Organic Framework (MOF) structures with target properties such as carbon capture.
The proposed method consists of a contrastive learning framework for embedding the MOF blocks based on their ECFP4 fingerprint. These representations are then used to work with a coarse-grained representation of the MOF. A latent variable model is then trained to generate a new MOF. Sampling from the latent variable allows for estimating the lattice structure and the number of building blocks. Then a diffusion-based decoder is used to produce the building block types and their location in the unit cell. The building blocks are then oriented and connections are established between them. Finally, a force field relaxation is computed to finalize the position and orientation of all atoms in the MOF.
The MOFs are validated based on well-established domain knowledge. The quality of the generated MOF structures is assessed for key application tasks such as carbon capture using well-established Monte Carlo simulation.
The proposal goes beyond the limitations of template-based methods and has computational advantages in proposing novel MOF structures with desired properties.
Strengths
The paper clearly demonstrates the strong technical background in Material Science of the authors. The focus is on achieving concrete results in MOF generation that can have an impact on key applications of MOF materials. Correspondingly, the validation of the results is substantive.
The selected Machine Learning components, such as the choice of the latent variable model, diffusion model, and contrastive learning are precisely selected to overcome specific challenges towards the goal of generating MOFs.
Weaknesses
The above-given strengths are closely related to the weaknesses of the paper. Clearly, a very good application paper that makes important advances for computational material science may have limited contribution to the field of Machine Learning (ML).
While a strong contribution to Material Science, from the ML perspective the work is a good combination of existing methods but does not go much further than that. The paper tells a story about many of the design decisions that were taken, but many other choices could have been made with even more recent approaches.
In that sense, the main weakness is the alignment to the ICLR conference.
Questions
One potentially problematic aspect is that the number of blocks K and the lattice structure L are computed based on the sample z which also conditions the diffusion of the attributes and the locations of the blocks. This can also lead to invalid MOF structures as the block's location and orientation may not match the lattice geometry. You identify this as well in your last sentence.
- Why not incorporate these in the diffusion process?
- Or alternatively, why not compute the lattice structure based on the atoms and the bond in the unit cell? Wouldn't the atoms (block) and the bonds uniquely identify the lattice structure of the crystal?
- Why generate from an uninformed prior N(0, I)? It seems unreasonable to expect that sampling from such a distribution would give good coverage of the vast space of possible MOF configurations. Would it not be more effective to condition on a number of building blocks or present partial coarse structures? Possibly many other well-understood properties of the MOFs?
- Your generation process is limited to using the building blocks present in the training data. How broad of a coverage does this give the generating process? Are there many other MOFs possible with building blocks not present in the training data?
Reference:
[1] Yao, Zhenpeng, et al. "Inverse design of nanoporous crystalline reticular materials with deep generative models." Nature Machine Intelligence 3.1 (2021): 76-86.
[2] Park, Hyunsoo, et al. "Inverse design of metal-organic frameworks for direct air capture of CO2 via deep reinforcement learning." (2023).
[3] Hoogeboom, Emiel, et al. "Equivariant diffusion for molecule generation in 3d." International conference on machine learning. PMLR, 2022.
[4] Gruver, Nate, et al. "Protein Design with Guided Discrete Diffusion." arXiv preprint arXiv:2305.20009 (2023).
[5] Xie, Tian, et al. "Crystal diffusion variational autoencoder for periodic material generation." arXiv preprint arXiv:2110.06197 (2021).
[6] Yim, Jason, et al. "SE (3) diffusion model with application to protein backbone generation." arXiv preprint arXiv:2302.02277 (2023).
[7] Jiao, Rui, et al. "Crystal Structure Prediction by Joint Equivariant Diffusion on Lattices and Fractional Coordinates." NeurIPS (2023).
Thank you for all your detailed answers.
I do believe there are still some limitations to the proposed approach. Nevertheless, I also think that it is a good contribution to the state of the art. So I have increased my score to 8.
Thank you for your response, time, and valuable suggestions!
We deeply appreciate your suggestions on further improvements over the current model, which we respond to below.
One potentially problematic aspect is that the number of blocks K and the lattice structure L are computed based on the sample z which also conditions the diffusion of the attributes and the locations of the blocks. This can also lead to invalid MOF structures as the block's location and orientation may not match the lattice geometry. You identify this as well in your last sentence. Why not incorporate these in the diffusion process? Or alternatively, why not compute the lattice structure based on the atoms and the bond in the unit cell? Wouldn't the atoms (block) and the bonds uniquely identify the lattice structure of the crystal?
Thank you for the great suggestion. Indeed, as we conclude in our paper, we agree that incorporating the lattice parameters into the diffusion process is a promising future direction for resolving the mismatch between the blocks and the lattices, and could significantly improve the current model.
However, we find that incorporating lattice diffusion into MOFDiff presents significant challenges. Existing work on inorganic crystals [7] has explored lattice diffusion. Its formulation diffuses the lattice parameters to N(0, I) and represents the atom positions in fractional coordinates, because Cartesian coordinates become numerically unstable for small lattice sizes and the Cartesian scale of the system changes over the course of the diffusion process.
Applying a similar strategy to MOFs is more complex. First, fractional coordinates may not be ideal for representing the coarse-grained coordinates: they cannot accurately represent the size of the building blocks, which is essential information, and building block sizes vary drastically. Moreover, because the building block identities and their respective sizes evolve during the reverse diffusion process, maintaining accurate and meaningful representations of the lattice and building blocks may be challenging. Given these challenges, we believe that designing a lattice diffusion process for MOFs requires careful consideration and novel approaches that address their unique characteristics. We value your input and recognize the importance of this issue. We are committed to addressing it in our future research.
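To illustrate the point about fractional coordinates, the following minimal sketch places two building-block centroids at the same fractional positions in two cubic cells of different size. The cell edges and positions are hypothetical values chosen for illustration, and the helper names are not taken from the MOFDiff code.

```python
import numpy as np

def frac_to_cart(frac, lattice):
    """Map fractional coordinates (N, 3) to Cartesian ones.

    Rows of `lattice` are the cell vectors (assumed convention).
    """
    return np.asarray(frac) @ np.asarray(lattice)

def centroid_distance(frac, lattice):
    """Cartesian distance between the first two points."""
    cart = frac_to_cart(frac, lattice)
    return float(np.linalg.norm(cart[1] - cart[0]))

# Two centroids at fixed fractional positions (hypothetical values).
frac = np.array([[0.0, 0.0, 0.0],
                 [0.5, 0.0, 0.0]])

small_cell = 10.0 * np.eye(3)  # cubic cell with a 10 Angstrom edge
large_cell = 30.0 * np.eye(3)  # cubic cell with a 30 Angstrom edge

d_small = centroid_distance(frac, small_cell)
d_large = centroid_distance(frac, large_cell)
print(d_small, d_large)
```

The fractional inputs are identical, yet the physical separation available to the two blocks differs by the ratio of the cell edges, so a model that only sees fractional coordinates needs the lattice to reason about whether blocks of a given size fit.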
Why generate from an uninformed prior N(0, I)? It seems unreasonable to expect that sampling from such a distribution would give good coverage of the vast space of possible MOF configurations. Would it not be more effective to condition on a number of building blocks or present partial coarse structures? Possibly many other well-understood properties of the MOFs?
Our first experiment samples MOFs from the Gaussian prior N(0, I) of the VAE model. For a well-trained VAE, sampling from the prior should generate new data instances that reflect the general characteristics of the training data. In this unconditional generation experiment, we aim to evaluate our model's capability to recover the training distribution in a general sense. In other words, we aim to validate that MOFDiff can generate valid MOFs (Figure 6) with a wide spectrum of structural properties that resemble the training distribution (Figure 5). In contrast, a poorly trained model may suffer from "mode collapse" and consequently fail to capture the training distribution.
Your generation process is limited to using the building blocks present in the training data. How broad of a coverage does this give the generating process? Are there many other MOFs possible with building blocks not present in the training data?
Our method operates over 242,000 distinct building blocks from the BW-DB dataset. Our experimental results demonstrate that the current space of building blocks already enables structurally diverse generation, as well as effective inverse design for carbon capture applications. While we focus on the BW-DB dataset in this paper due to the availability of gas adsorption labels, it is straightforward to expand the space of possible building blocks (and the scope of MOF generation) by incorporating new datasets.
We look forward to further discussions if you have additional questions or suggestions.
We thank reviewer e39L for helpful feedback and comments. We address each of the reviewer’s concerns below.
While a strong contribution to Material Science, from the ML perspective the work is a good combination of existing methods but does not go much further than that. The paper tells a story about many of the design decisions that were taken, but many other choices could have been made with even more recent approaches. In that sense, the main weakness is the alignment to the ICLR conference.
Thank you for appreciating the materials science and ML application aspects of our paper. We believe our paper also makes methodological contributions and is aligned with the ICLR audience for the following reasons:
- Our paper addresses the important and difficult task of MOF design, for which ML methods have great potential. We believe our model design makes novel and non-trivial contributions from an ML perspective:
  - We propose contrastive learning for embedding the enormous space of building blocks to be used for generative modeling. In particular, in early experiments, we found autoencoding-based embedding has a much inferior performance compared to contrastive learning.
  - We combine learning and algorithmic methods to solve a task that is hard to solve with a pure learning approach. In early experiments, we attempted a pure learning approach of SO(3) diffusion for building block orientation, but couldn’t get good results.
  - Coarse-grained diffusion. Our diffusion model operates over CG coordinates and a vast space of learned building block identities. This is novel compared to existing diffusion models for small molecules, proteins, or crystals [3,4,5], where the atoms/amino acids have discrete types.
- Our paper provides scientifically meaningful tasks for the ML community to make rapid progress on MOF design.
- We will open-source our code for the entire pipeline of MOF decomposition, MOF coarse-graining, diffusion model, all-atom MOF reconstruction, relaxation, and molecular simulation. We believe our code will make MOF modeling and design much more accessible to the broader ML community.
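For readers less familiar with contrastive embedding, the idea mentioned in the list above can be sketched with a generic InfoNCE-style loss. This is an illustration only: the exact loss, encoder, and definition of positive pairs used in the paper are not restated in this thread, and the function name, temperature, and random embeddings below are assumptions.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss: row i of `anchors` should match row i of `positives`."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature                      # (B, B) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # shift for numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))             # matching pairs lie on the diagonal

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))                            # stand-in building-block embeddings

# Positives that are slightly perturbed copies of the anchors give a low loss.
aligned = info_nce(emb, emb + 0.01 * rng.normal(size=emb.shape))
# Positives paired with the wrong anchors (reversed order) give a high loss.
shuffled = info_nce(emb, emb[::-1].copy())
print(aligned, shuffled)
```

Minimizing such a loss pulls embeddings of equivalent building blocks together and pushes different ones apart, which yields a continuous identity space that a diffusion model can denoise.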
In summary, we believe the methods, tasks, and codebase proposed in our paper will be interesting to the ICLR audience, especially the AI4Science community. In particular, our method offers new perspectives for the ML modeling of multi-scale molecular and materials systems that are ubiquitous in biological and physical science.
More details on the orientation diffusion attempt:
In early experiments, we also explored diffusing the orientation of the building blocks based on SO(3) diffusion, which has shown promising results in protein diffusion models [6]. However, we could not get good performance. We believe this is because the orientation of building blocks is ill-defined. In our failed attempts, we used principal component analysis to obtain canonical orientations for the building blocks. Unlike amino acids, which have a natural definition of orientation based on their atoms, building blocks have much more diverse geometry. There exist many almost-2D or near-isotropic building blocks, which make the orientation ambiguous. Further, the layout of the building blocks is delicate and requires accurate alignment between the building blocks to render a valid MOF.
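The ambiguity of PCA-based canonical orientations can be seen in a small sketch. The two point sets below are hypothetical stand-ins for building blocks, and `pca_axes` and `top_gap` are illustrative helpers rather than functions from the MOFDiff code.

```python
import numpy as np

def pca_axes(coords):
    """Return covariance eigenvalues (descending) and principal axes (as columns)."""
    centered = coords - coords.mean(axis=0)
    cov = centered.T @ centered / len(coords)
    evals, evecs = np.linalg.eigh(cov)        # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order]

def top_gap(evals):
    """Relative gap between the two largest eigenvalues (near 0 means ambiguous axes)."""
    return float((evals[0] - evals[1]) / evals[0])

# Hypothetical elongated block: atoms lie roughly along x, so the long axis is well defined.
rod = np.array([[-3.0, 0.0, 0.0], [-1.0, 0.2, 0.0],
                [1.0, -0.2, 0.1], [3.0, 0.0, -0.1]])

# Hypothetical isotropic block: octahedron vertices, so all eigenvalues coincide.
octa = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                 [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)

rod_evals, rod_axes = pca_axes(rod)
octa_evals, octa_axes = pca_axes(octa)
print(top_gap(rod_evals), top_gap(octa_evals))
```

When eigenvalues coincide, any rotation within the degenerate subspace is an equally valid set of principal axes, and even well-separated axes are only defined up to sign, so a PCA frame does not give a unique orientation target for such blocks.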
The paper introduces a novel method for generating metal-organic frameworks, which is then applied to the design of materials with carbon capture properties. The Reviewers were enthusiastic about the paper and uniformly support acceptance.
One concern is the alignment between the paper and the ICLR community. I believe that making progress in difficult applications with innovative approaches has always been the cornerstone of our field. The model presented in the paper is technologically advanced (if not a bit too complex perhaps). The use of conditional denoising on lattice parameters that denoises embeddings of building blocks seems to be a novel contribution to the field of machine learning for chemistry. The strong execution in the paper and clear exposition make this paper in my opinion interesting for people outside of this narrow field.
All in all, it is my pleasure to recommend acceptance of the paper. Thank you for your submission. Please remember to address any remaining comments by Reviewers.
Why not a higher score
While it is interesting to many folks in the community, I think it is not as broadly interesting as would be necessary for the spotlight category.
Why not a lower score
It is a sound application paper with some innovation in methodology.
Accept (poster)