PaperHub
Rating: 6.5 / 10
Poster · 4 reviewers
Individual ratings: 6, 6, 6, 8 (min 6, max 8, std 0.9)
Confidence: 3.8
Correctness: 3.0 · Contribution: 2.5 · Presentation: 2.8
ICLR 2025

Periodic Materials Generation using Text-Guided Joint Diffusion Model

OpenReview · PDF
Submitted: 2024-09-26 · Updated: 2025-02-28
TL;DR

We introduce a novel text-guided diffusion model for generating periodic materials. Our model jointly produces atom fractional coordinates, types, and lattice structures using a periodic E(3)-equivariant graph neural network (GNN).

Abstract

Keywords

ML4Materials, Diffusion Models, Crystal Materials, Material Generation, AI4Science

Reviews and Discussion

Official Review
Rating: 6

This paper presents a text-guided diffusion model for periodic material generation. Compared to the baseline methods, the proposed text-guided approach achieves more valid and stable performance on benchmark datasets.

Strengths

  1. Integrating text information to guide crystal structure generation is a novel practice in the field of material structure generation.
  2. From the experimental results, the method proposed in this paper achieves better performance compared to baseline methods. Additionally, the proposed method also demonstrates improved generation efficiency.

Weaknesses

  1. Although combining text information in the field of material generation is an innovative approach, the text-guided diffusion model is not a new model framework. Therefore, the contribution of this paper is limited from the perspective of architectural innovation.
  2. TGDMat(Short) does not always outperform baseline methods, and the improvement of TGDMat(Long) over baseline methods is also marginal. This indicates that the model's results are heavily dependent on how the text information is constructed, and richer textual information can lead to slightly better outcomes than the baseline.
  3. As metrics commonly used to evaluate generative performance, the authors should consider including diversity and novelty as additional metrics for comparison.

Questions

See Weaknesses.

Comment

TGDMat(Short) does not always outperform baseline methods.

In Table 3, the text-guided baseline models such as CDVAE+, SyMat+, and DiffCSP+ are based on the long variant of their respective models. As a result, TGDMat(Short) does not consistently outperform these baseline methods, although it does outperform them in 7 out of 21 cases, ranking as the second-best model overall. To provide a more equitable comparison, we have separated the results of the short variants in the following tables for both tasks.

Gen Task (on Shorter Prompts):

Perov Dataset:

| Method | Comp Validity | Struct Validity | Cov-R | Cov-P | #Element | Density | Energy |
|---|---|---|---|---|---|---|---|
| CDVAE+(Short) | 98.17 | 100 | 99.40 | 99.01 | 0.0706 | 0.1395 | 0.0246 |
| SyMat+(Short) | 96.94 | 100 | 99.22 | 98.40 | 0.0192 | 0.1827 | 0.2633 |
| DiffCSP+(Short) | 98.21 | 100 | 99.61 | 98.39 | 0.0123 | 0.1193 | 0.0266 |
| TGDMat(Short) | 98.28 | 100 | 99.71 | 99.24 | 0.0108 | 0.0947 | 0.0237 |

Carbon Dataset:

| Method | Comp Validity | Struct Validity | Cov-R | Cov-P | #Element | Density | Energy |
|---|---|---|---|---|---|---|---|
| CDVAE+(Short) | – | 100 | 99.34 | 82.96 | – | 0.1398 | 0.2804 |
| SyMat+(Short) | – | 100 | 99.52 | 97.20 | – | 0.1206 | 3.7422 |
| DiffCSP+(Short) | – | 100 | 99.65 | 97.29 | – | 0.0811 | 0.0870 |
| TGDMat(Short) | – | 100 | 99.81 | 91.77 | – | 0.0681 | 0.0865 |

MP Dataset:

| Method | Comp Validity | Struct Validity | Cov-R | Cov-P | #Element | Density | Energy |
|---|---|---|---|---|---|---|---|
| CDVAE+(Short) | 87.05 | 100 | 99.36 | 99.60 | 0.9930 | 0.6420 | 0.2970 |
| SyMat+(Short) | 88.08 | 99.9 | 98.59 | 99.47 | 0.5031 | 0.3917 | 0.3622 |
| DiffCSP+(Short) | 84.57 | 100 | 99.52 | 99.85 | 0.3310 | 0.3800 | 0.1379 |
| TGDMat(Short) | 86.60 | 100 | 99.79 | 99.88 | 0.3337 | 0.3296 | 0.1189 |

We observe that among the 19 metrics reported over the three datasets in the tables above, TGDMat(Short) outperforms all other baselines on 16.

CSP Task (on Shorter Prompts):

| Method | #Samples | Perov Match Rate | Perov RMSE | Carbon Match Rate | Carbon RMSE | MP Match Rate | MP RMSE |
|---|---|---|---|---|---|---|---|
| CDVAE+(Short) | 1 | 48.97 | 0.1063 | 22.65 | 0.2640 | 40.33 | 0.1037 |
| CDVAE+(Short) | 20 | 89.54 | 0.0423 | 89.61 | 0.2188 | 70.22 | 0.0876 |
| SyMat+(Short) | 1 | 49.39 | 0.0985 | 23.71 | 0.2567 | 40.84 | 0.1027 |
| SyMat+(Short) | 20 | 92.10 | 0.0255 | 90.86 | 0.2069 | 71.31 | 0.0875 |
| DiffCSP+(Short) / TGDMat(Short) | 1 | 56.54 | 0.0583 | 24.13 | 0.2424 | 52.22 | 0.0597 |
| DiffCSP+(Short) / TGDMat(Short) | 20 | 98.25 | 0.0137 | 88.28 | 0.2252 | 80.97 | 0.0443 |

In the CSP task, we observe that TGDMat(Short) performs better than all the baselines across all datasets in both setups: generating 1 and 20 samples per test material. Please note that in the CSP task, since atom types are given, DiffCSP+(Short) and TGDMat(Short) share the same architecture.

Marginal Improvement of TGDMat(Long) over baseline methods.

We respectfully disagree with the reviewer’s assessment that TGDMat(Long) offers only marginal improvements.

Gen Task (on Long Prompts):

For metrics where baselines already achieve over 98% accuracy (e.g., Structural and Compositional Validity for Perov and Carbon, or the Coverage metrics), the scope for further improvement is inherently limited. However, for other metrics, TGDMat(Long) demonstrates significant enhancements. Specifically, the average improvement in property statistics is as follows:

| | Perov | Carbon | MP |
|---|---|---|---|
| Avg. improvement by TGDMat(Long) | 35.61% | 38.91% | 14.57% |

Moreover, for the MP-20 dataset, compositional validity is improved by 5%, which is a notable advancement. Overall, TGDMat’s superior performance across seven metrics on the MP-20 dataset underscores its potential to generate novel materials that are experimentally synthesizable.

CSP Task (on Long Prompts):

While prior unconditional diffusion models demonstrate improved Match Rates and lower RMSE when generating 20 samples (k = 20) per test material, they largely fail on both metrics when generating only one sample (k = 1) per test material. Using text guidance (Long variant) during the reverse denoising process, with just one generated sample per test material, the text-guided variants outperform their respective vanilla models, thereby reducing computational overhead.

Overall, these findings highlight the impact of our key contributions in TGDMat: 1) Joint Diffusion on A, X, and L, 2) Discrete Diffusion on A, and 3) Text-Guided Diffusion.

Finally, we would like to thank Reviewer mJeo once again for these valuable comments. We will reflect these comments in the revised manuscript. We believe that our responses above address all of Reviewer mJeo's concerns and contribute to further strengthening our work.

Sincerely, The Authors

Comment

Thank you for your detailed reply. I increased the rating score.

Comment

Dear Reviewer mJeo,

We sincerely thank you for your positive feedback and for raising the score to 6. We have carefully incorporated all your suggested changes into our revised manuscript, and we believe these contributions significantly enhance the research at the intersection of AI and material science.

We would be grateful if you could review the revised version and let us know if it is possible to change the decision from borderline accept to accept.

Your feedback would be highly appreciated. If you feel that additional experiments or results would further strengthen the manuscript, we are more than willing to provide them and actively engage in any further discussions.

Thank you once again for your valuable insights and support. We look forward to your response.

Thank you,

The Authors

Comment

Dear Reviewer mJeo,

Thank you very much for providing us with valuable feedback. We appreciate the detailed comments. Below, we have provided point-by-point responses to each of your comments.

Weaknesses

  • Regarding Architectural Innovation of TGDMat.

While text-guided diffusion models are widely used in fields such as image, video, and molecule generation, their application in periodic material generation remains largely unexplored. This study is the first to introduce text-guided diffusion for material generation. To achieve this, we have made the following novel contributions:

  • Dataset Curation: At the start of this work, no textual data was available for materials in benchmark databases such as Perov, Carbon, and MP. To address this, we curated textual data for these material databases, including both long and short textual prompts. Details of this curation process are provided in Section 4.2 and Appendix D: Textual Dataset. We plan to release these datasets to the community, anticipating that they will facilitate further exploration and research.

  • Simple Yet Effective Fusion Approach: In TGDMat, we employ a straightforward yet highly effective method to integrate text-based contextual representations into the denoising network. Specifically, at each denoising step, we take the contextual representation C_p derived from the textual description via a pre-trained MatSciBERT model and concatenate it with the input atom features.
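As an informal illustration of this concatenation step (the shapes, names, and dimensions below are ours, chosen only for the sketch; the real context vector is a MatSciBERT embedding):

```python
import numpy as np

def fuse_text_context(atom_feats, c_p):
    """Append the pooled text embedding c_p to every atom's feature vector.

    atom_feats: (N, d_a) per-atom features at the current denoising step
    c_p:        (d_t,)   contextual text representation (e.g. from MatSciBERT)
    Returns:    (N, d_a + d_t) array passed on to the denoising GNN layers.
    """
    n_atoms = atom_feats.shape[0]
    c_tiled = np.broadcast_to(c_p, (n_atoms, c_p.shape[0]))  # same c_p for all atoms
    return np.concatenate([atom_feats, c_tiled], axis=-1)

# Toy usage: 5 atoms with 16-dim features, an 8-dim text context.
fused = fuse_text_context(np.zeros((5, 16)), np.ones(8))
print(fused.shape)  # (5, 24)
```

The same context vector is shared by all atoms, so the guidance acts globally while the GNN layers remain responsible for local structure.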

Our innovation, however, goes beyond text-guided diffusion. We further enhance the base diffusion model by introducing Discrete Diffusion for Atom Types. Prior works like CDVAE and DiffCSP used continuous diffusion on atom types (representing each type as a probability distribution over k classes and applying DDPM to learn that distribution). However, atom types are discrete data, and it is well established in the literature [1] that applying a continuous diffusion model to discrete features is unreasonable and produces suboptimal results. Hence, we use discrete diffusion on atom types in TGDMat: we treat A as N discrete variables belonging to k classes and leverage the discrete diffusion model D3PM.

[1] Andrew Campbell et al. A continuous time framework for discrete denoising models. Advances in Neural Information Processing Systems 35 (2022), 28266–28279.
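A minimal sketch of what a uniform-transition discrete forward process looks like (the class count k and schedule value beta_t below are placeholders of our choosing, not the paper's actual settings):

```python
import numpy as np

def uniform_transition(k, beta_t):
    """D3PM-style forward transition matrix over k atom-type classes:
    keep the current type with probability 1 - beta_t, otherwise
    resample uniformly over all k classes."""
    return (1.0 - beta_t) * np.eye(k) + beta_t * np.ones((k, k)) / k

def noise_atom_types(a_onehot, Q_t):
    """One forward step q(a_t | a_{t-1}) applied to one-hot atom types,
    yielding a categorical distribution over the k classes per atom."""
    return a_onehot @ Q_t

Q = uniform_transition(k=4, beta_t=0.2)
a0 = np.eye(4)[[0, 2]]            # two atoms, of types 0 and 2
probs = noise_atom_types(a0, Q)   # each row is a valid distribution
```

Because the transition matrix is a proper stochastic matrix, every noised row remains a normalized categorical distribution, which is what the reverse model is trained to denoise.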

  • Regarding Diversity and Novelty as additional metrics for comparison.
  • Regarding Novelty: For evaluation, we adopted the benchmark metrics introduced by CDVAE and subsequently used by prior works such as SyMat and DiffCSP. As outlined by Xie et al. in the CDVAE paper (Section 5.2: "Material Generation"), these seven metrics in the Generation Task aim to assess the novelty, validity, and property statistics of the generated materials.

  • Regarding Diversity: Since our paper focuses on conditional material generation, the goal is not to achieve diverse random generation but rather to produce constrained and targeted outputs that align with the criteria specified in the text descriptions of the test dataset. Consequently, the diversity of the generated materials depends on the diversity of the test dataset—if the test dataset is diverse, the generated materials will reflect that diversity; otherwise, they will not.

Official Review
Rating: 6

The paper focuses on a text-guided diffusion model for periodic material generation. It first designs a diffusion model that can jointly model atom types, coordinates, and lattice structures for periodic materials, then proposes incorporating material structures and properties through text to enhance generation performance. In the experiments, it demonstrates improvements by incorporating text into existing diffusion models, such as SyMat and DiffCSP, followed by showing the superiority of the proposed TGDMat over existing baselines.

Strengths

  • This paper proposes joint learning of the three types of crystal structural information. A text-guided denoising network can be the right approach to this problem.
  • The paper is well-organized, self-contained with a comprehensive and detailed reference section.
  • The design of textual description in the reverse diffusion process is convincing according to the supplementary material.
  • The paper provides the computational advantage of integrating text knowledge during reverse diffusion compared to other baseline models.

Weaknesses

  • Motivation should be clarified: (a) Providing guidance in diffusion models for improved generation is important, but it is unclear why text must be used to incorporate such guidance. The guidance information in line 70 could also be modeled as feature vectors, reducing the need for text embedding models, which may not always provide accurate text-based embeddings. Additionally, text-based models may struggle to capture subtle numerical differences. Is there any prior work on using feature vectors for guidance? What are the advantages of using text rather than directly using numerical values or feature vectors? (b) The idea of studying joint diffusion over all three components of crystal structures: the lattices, atom types, and coordinates, by employing a periodic-E(3)-equivariant denoising model, kind of the same as the DiffCSP. Though exploring both crystal structure prediction (CSP) and random generation task (Gen), the text guided denoising network seems to be straightforward by further considering the CSP model architecture. It is not clear how much of this part of work is based on previous research and what is new.

  • Method design should have rationales: The authors applied the diffusion process independently to three variables: lattice (line 210), atom types (line 248), and atom coordinates (line 272). However, in the reverse process, the denoising models use all variables with conditions for noise estimation (line 289). Given that lattice noise is independent of atom types/coordinates and vice versa, is there a reason for not simply using the lattice variable (plus C_p) for lattice denoising?

  • Experiments should be improved: The paper lacks an evaluation of whether the generated materials meet the text descriptions. Coverage may reflect this to some extent, but it is too general and lacks details on specific properties mentioned in the descriptions. Property statistics are at the distribution level between two sets and lack point-to-point comparisons. Also, the paper lacks the ablation study for the denoising model to demonstrate the individual importance of each component (or the authors could highlight the part if I missed it).

Questions

  • In line 48, to address issues in local message passing, how about using Transformers?
  • I am not sure if I missed anything, but Figure 2 shows an ‘SE(3)-Equivariant GNN model’. Also, are the diffusion and reverse processes in Figure 2 correct?
  • The paper conducts joint diffusion on lattices, atom types, and coordinates, is there any ablation study for the separately and jointly learning of crystal geometry?
  • The CDVAE employs SE(3) equivariant GNNs adapted with periodicity to ensure the invariance of materials. Why not use it and what’s the difference between SE(3) and periodic E(3)-equivariant denoising model backbone?
  • Noticing there would be several generated samples in Table 4, how would one select the best sample if the ground truth was not available?
Comment

Ablation study for the separately and jointly learning of crystal geometry.

In response to the reviewer's feedback, we conducted an ablation study in which we use three diffusion models to learn A, X, and L separately. During sampling, we sample A, X, and L separately and merge them together. We fuse the textual representation in the same way in all three diffusion models. We present the results in the following tables and compare with TGDMat:

Perov Dataset:

| Method | Comp Validity | Struct Validity | Cov-R | Cov-P | #Element | Density | Energy |
|---|---|---|---|---|---|---|---|
| TGDMat (learning A, X, L separately) | 90.10 | 85.43 | 85.77 | 83.51 | 0.341 | 0.591 | 0.376 |
| TGDMat (jointly learning A, X, L) | 98.63 | 100 | 99.87 | 99.52 | 0.0090 | 0.049 | 0.018 |

Carbon Dataset:

| Method | Comp Validity | Struct Validity | Cov-R | Cov-P | #Element | Density | Energy |
|---|---|---|---|---|---|---|---|
| TGDMat (learning A, X, L separately) | – | 75.64 | 80.95 | 82.29 | – | 0.435 | 0.584 |
| TGDMat (jointly learning A, X, L) | – | 100 | 99.91 | 92.43 | – | 0.043 | 0.063 |

MP Dataset:

| Method | Comp Validity | Struct Validity | Cov-R | Cov-P | #Element | Density | Energy |
|---|---|---|---|---|---|---|---|
| TGDMat (learning A, X, L separately) | 73.18 | 77.01 | 82.99 | 72.41 | 0.861 | 0.887 | 0.634 |
| TGDMat (jointly learning A, X, L) | 92.97 | 100 | 99.89 | 99.95 | 0.289 | 0.308 | 0.115 |

We observe significant performance degradation in all metrics across datasets when A, X, and L are learned separately. We will add these results to the revised manuscript.

What’s the difference between SE(3) and periodic E(3)-equivariant denoising model backbone?

A model is SE(3)-equivariant if it respects invariance to permutation, translation, and rotation. For crystals, we additionally need invariance to periodic transformations: since the atoms in the unit cell repeat periodically, infinitely many times along the lattice vectors, there are many choices of unit cell and coordinate matrix that represent the same material. We denote a model with all of these invariances as a periodic E(3)-equivariant denoising model (or GNN). The SE(3)-equivariant GNNs adapted with periodicity proposed by CDVAE are similar, in that they also follow invariance to permutation, translation, rotation, and periodicity. More details about the physical symmetry of crystal structures are provided in Appendix C: "INVARIANCES IN CRYSTAL STRUCTURE."
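The periodic part of this invariance can be checked numerically: shifting fractional coordinates by any integer lattice translation describes the same crystal, so all minimum-image pairwise distances are unchanged. A minimal self-contained sketch (ours, not the paper's code):

```python
import numpy as np

def pairwise_distances(frac, lattice):
    """Pairwise distances under the minimum-image convention.

    frac:    (N, 3) fractional coordinates
    lattice: (3, 3) matrix whose rows are the lattice vectors
    """
    diff = frac[:, None, :] - frac[None, :, :]  # fractional offsets
    diff -= np.round(diff)                      # wrap into [-0.5, 0.5)
    return np.linalg.norm(diff @ lattice, axis=-1)

rng = np.random.default_rng(0)
frac = rng.random((4, 3))
lattice = np.diag([3.0, 4.0, 5.0])

# The same crystal described with a different choice of periodic images:
shifted = frac + rng.integers(-2, 3, size=frac.shape)
d_orig = pairwise_distances(frac, lattice)
d_shift = pairwise_distances(shifted, lattice)
# d_orig and d_shift are identical.
```

A periodic E(3)-equivariant denoiser is built so that its predictions depend only on such invariant geometry, not on the particular unit-cell representation.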

How would one select the best sample if the ground truth was not available?

In the absence of ground truth, selecting the best sample among generated materials involves checking that the samples adhere to fundamental physical and chemical principles, such as proper bonding patterns, reasonable interatomic distances, and compliance with crystal symmetry constraints. One can also use tools such as DFT to calculate formation energy, energy above the convex hull, or phonon stability to rank the samples. However, at present there is no single agreed-upon method for choosing the best sample.

Finally, we would like to thank Reviewer ijYE once again for these valuable comments. We will reflect these comments in the revised manuscript. We believe that our responses above address all of Reviewer ijYE's concerns and contribute to further strengthening our work.

Sincerely, The Authors

Comment

Evaluation of whether the generated materials meet the text descriptions.

Quantitative results on the alignment of the generated structures with the given prompts, in terms of compositions, properties, and other factors, are provided in the "F.3 CORRECTNESS OF GENERATED MATERIALS" section of the Appendix (Page 23, Line 1229). This is also referenced in the "Additional Results" section (Page 10, Line 527) of the main manuscript. The details are as follows:

To ensure the fidelity of our model's outputs with respect to the global attributes specified in the text prompt, we randomly generated 1000 materials (sampled from all three datasets) based on their respective textual descriptions (both Long and Short) and assessed the percentage of generated materials that matched the global features outlined in the prompt. Specifically, we matched the formula, space group, crystal system, and dimensions of the generated materials against the textual descriptions. Moreover, we examined whether properties such as formation energy and band gap matched the criteria specified in the prompt (positive/negative, zero/nonzero).
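Schematically, such a match-rate evaluation could be computed as below (an illustrative sketch with hypothetical attribute names, not the evaluation code used in the paper):

```python
def match_rate(generated, prompted):
    """Percentage of generated materials whose global attributes agree with
    those stated in their text prompt. Both arguments are equal-length lists
    of dicts keyed by attribute name (e.g. 'formula', 'space_group')."""
    keys = generated[0].keys()
    n = len(generated)
    return {k: 100.0 * sum(g[k] == p[k] for g, p in zip(generated, prompted)) / n
            for k in keys}

# Toy example with two generated materials and their prompt attributes.
gen = [{"formula": "SrTiO3", "space_group": 221},
       {"formula": "BaTiO3", "space_group": 99}]
ref = [{"formula": "SrTiO3", "space_group": 221},
       {"formula": "BaTiO3", "space_group": 221}]
rates = match_rate(gen, ref)  # {'formula': 100.0, 'space_group': 50.0}
```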

Results for TGDMat(Long):

| Global Feature | % Matched (Perov) | % Matched (Carbon) | % Matched (MP) |
|---|---|---|---|
| Formula | 97.50 | 98.20 | 70.54 |
| Space Group | 87.00 | 80.79 | 67.88 |
| Crystal System | 92.60 | 91.55 | 73.54 |
| Formation Energy | 95.49 | – | 92.88 |
| Band Gap | – | 98.61 | 96.73 |

Results for TGDMat(Short):

| Global Feature | % Matched (Perov) | % Matched (Carbon) | % Matched (MP) |
|---|---|---|---|
| Formula | 90.70 | 92.56 | 65.22 |
| Space Group | 86.51 | 80.50 | 58.77 |
| Crystal System | 83.19 | 81.64 | 72.77 |
| Formation Energy | 90.33 | – | 91.00 |
| Band Gap | – | 95.90 | 93.33 |

In general, using longer text on the Perov-5 and Carbon-24 datasets, the generated materials meet the specified criteria effectively. However, on the MP-20 dataset, which is more intricate due to its complex structures and compositions, performance tends to decline. Additionally, with shorter prompts, overall performance drops across all datasets compared to longer text inputs. This is because the longer text, provided by robocrystallographer, offers a comprehensive range of both global and local information, thereby enhancing the generation capabilities of TGDMat.

Questions

To address issues in local message passing, how about using Transformers?

Replacing GNNs with Transformers as the backbone network is an exciting research direction, but implementing Transformers comes with its own challenges [1]. One key obstacle is the scalability and efficiency of graph transformers, which require significant memory and computational resources, particularly when using global attention mechanisms. These challenges become even more pronounced in deeper architectures, which are more prone to overfitting and over-smoothing. Additionally, graph transformers often struggle to generalize across graph structures. As a result, in many popular graph machine learning applications, graph transformers have not yet fully replaced the message-passing (GNN) framework. We consider this an important area for future exploration.

However, in response to the reviewer's suggestion, we conducted an additional study in which we replace the backbone of TGDMat with Matformer [2], a popular transformer model for crystal property prediction. We present the results on the Gen task in the following tables and compare them with TGDMat:

Perov Dataset:

| Method | Comp Validity | Struct Validity |
|---|---|---|
| TGDMat (Matformer backbone) | 93.77 | 90.13 |
| TGDMat (GNN backbone) | 98.63 | 100 |

Carbon Dataset:

| Method | Comp Validity | Struct Validity |
|---|---|---|
| TGDMat (Matformer backbone) | – | 89.26 |
| TGDMat (GNN backbone) | – | 100 |

MP Dataset:

| Method | Comp Validity | Struct Validity |
|---|---|---|
| TGDMat (Matformer backbone) | 81.96 | 84.37 |
| TGDMat (GNN backbone) | 92.97 | 100 |

We observe significant performance degradation in all metrics across datasets. This needs further exploration, and we leave it as future work.

[1] Shehzad, A. Graph transformers: A survey. arXiv preprint arXiv:2407.09777.

[2] Yan, Keqiang, et al. Periodic graph transformers for crystal material property prediction. NeurIPS 35 (2022): 15066–15080.

Regarding Typo in Fig-2.

We apologize for the mistake and appreciate you pointing it out. The direction of the arrows for the diffusion and reverse processes was incorrect. For the diffusion/forward process, the arrow should point from M_0 to M_T, while for the denoising/reverse process, it should point from M_T to M_0. Also, the GNN model should be labeled a periodic E(3)-equivariant GNN model. We will correct this error and update the revised manuscript accordingly.

Comment

Dear Reviewer ijYE,

Thank you very much for providing us with valuable feedback. We appreciate the detailed comments. Below, we have provided point-by-point responses to each of your comments.

Weaknesses:

Utility of Text Guidance over Feature-Vector Guidance

The literature [1] has studied extensively that value guidance, which specifies a single or a handful of target properties or features as feature vectors, can be insufficient to capture intricate conditions. In contrast, textual descriptions allow us to encompass such conditions adeptly and flexibly, and produce better results than value-based guidance. Also, since we use MatSciBERT, which is pre-trained on a huge corpus of articles from the materials domain, we can encode more enriched and robust contextual information about the tokens.

[1] Luo, Yanchen, et al. "Text-guided diffusion model for 3d molecule generation." (2024).

However, in response to the reviewer's question, we conducted an additional experiment in which we fed all relevant conditional information (e.g., formula, space group, crystal symmetry, bond lengths, and property values) as feature vectors to guide the diffusion model. Here are the results we observed for the Gen task:

Perov Dataset:

| Method | Comp Validity | Struct Validity | Cov-R | Cov-P | #Element | Density | Energy |
|---|---|---|---|---|---|---|---|
| TGDMat (conditions as features) | 96.55 | 98.73 | 99.18 | 97.06 | 0.0149 | 0.1200 | 0.0290 |
| TGDMat (conditions as text emb.) | 98.63 | 100 | 99.87 | 99.52 | 0.0090 | 0.049 | 0.018 |

Carbon Dataset:

| Method | Comp Validity | Struct Validity | Cov-R | Cov-P | #Element | Density | Energy |
|---|---|---|---|---|---|---|---|
| TGDMat (conditions as features) | – | 99.56 | 99.32 | 92.17 | – | 0.104 | 0.087 |
| TGDMat (conditions as text emb.) | – | 100 | 99.91 | 92.43 | – | 0.043 | 0.063 |

MP Dataset:

| Method | Comp Validity | Struct Validity | Cov-R | Cov-P | #Element | Density | Energy |
|---|---|---|---|---|---|---|---|
| TGDMat (conditions as features) | 83.85 | 99.61 | 98.97 | 98.3 | 0.377 | 0.375 | 0.126 |
| TGDMat (conditions as text emb.) | 92.97 | 100 | 99.89 | 99.95 | 0.289 | 0.308 | 0.115 |

Across the three datasets, we did not observe performance improvements from feature-vector guidance over text guidance. We will add these results to the revised manuscript.

Key differences between DiffCSP and TGDMat

The key differences between DiffCSP and our proposed TGDMat are as follows:

| | DiffCSP | TGDMat |
|---|---|---|
| Tasks | Only the CSP task | Both CSP and Gen tasks |
| Diffusion on atom types | N/A | Discrete diffusion (D3PM) |
| Model category | Unconditional; unable to specify the criteria required by the user | Conditional; able to specify the criteria required by the user (in text format) |
| Text-guided diffusion | No | Yes |

However, note that the goal of this paper is not to introduce a new diffusion model to replace existing models like DiffCSP or CDVAE for periodic material generation. Instead, we focus on demonstrating that conditional models can outperform traditional unconditional models, such as DiffCSP. Specifically, we show that incorporating textual conditions through text-guided diffusion leads to better performance compared to using unconditional models like DiffCSP. Additionally, we enhance DiffCSP by integrating discrete diffusion over atom types in our proposed TGDMat framework.

Regarding Diffusion Method Design

In the diffusion model, the forward process is non-parametric: we simply add noise to the three variables (atom coordinates, atom types, and the lattice) independently, with no learning involved. During the denoising process, however, a backbone equivariant graph neural network (EGNN) is used to predict the noise at each time step. At any given time t, the EGNN takes the atom types A_t, atom coordinates X_t, and lattice L_t together as input, then performs message passing and aggregation to generate node and graph representations, which are used to predict the noise. Thus, the denoising process depends on all three variables.
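To make the data flow concrete, here is a schematic sketch of a denoiser that, like the EGNN described above, receives all three noisy variables jointly (the arithmetic inside is a placeholder for real message passing, not the paper's model):

```python
import numpy as np

def toy_denoiser(A_t, X_t, L_t, t):
    """Schematic stand-in for the periodic E(3)-equivariant backbone: one
    network receives all three noisy variables jointly and predicts noise
    for the coordinates and lattice plus logits over atom types. The
    formulas below are placeholders, chosen only to show the shapes."""
    eps_X = np.tanh(X_t) + 0.01 * A_t.sum(axis=1, keepdims=True)  # uses A and X
    eps_L = 0.01 * L_t * (t / 1000.0)                             # uses L and t
    logits_A = A_t + X_t.sum(axis=1, keepdims=True)               # uses A and X
    return eps_X, eps_L, logits_A

rng = np.random.default_rng(1)
A_t = np.eye(5)[rng.integers(0, 5, size=8)]  # 8 atoms, 5 type classes (one-hot)
X_t = rng.random((8, 3))                     # noisy fractional coordinates
L_t = np.diag([4.0, 4.0, 4.0])               # noisy lattice matrix
eps_X, eps_L, logits_A = toy_denoiser(A_t, X_t, L_t, t=500)
```

The point of the sketch is the signature: even though the forward noising of A, X, and L is independent, the single denoiser conditions each prediction on all three inputs.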

Comment

I thank the authors for delivering additional results and discussions quickly to answer my questions. Incorporating the results and discussions can improve the soundness of the work. I increase the soundness score from 2 to 3. My evaluation on the contribution remains. I may put the overall score between 6 and 7, however, based on the assessment on novelty and contribution, it cannot reach 8 in my evaluation. So I'd keep the overall score the same.

Comment

We sincerely thank the reviewer for their positive feedback and for increasing the soundness score to 3. We also appreciate the reviewer’s acknowledgment that the overall score of the paper should exceed 6 (above the borderline).

Below, we summarize the novelty and contributions of our work:

  • Exploring Text-Guided Diffusion for Periodic Material Generation
  • Curating Text Datasets for Benchmark Databases
  • A Simple Yet Effective Fusion Approach to Integrate Text into the Denoising Process
  • Introducing Discrete Diffusion for Atom Types
  • Joint Diffusion on A, X, and L

We believe these contributions play a meaningful role in advancing research at the intersection of AI and material science. If the reviewer requires additional experiments or results to improve the work further, we are happy to provide them and actively participate in the discussion.

As suggested, we will incorporate the results and discussions into the revised manuscript.

Comment

Dear Reviewer,

Thank you for your valuable feedback and thoughtful comments.

In response, we have uploaded the revised manuscript and made the following updates as per your suggestions:

  • Assessing how well the generated materials align with the provided text prompts (Section 5.4)
  • Comparing the utility of text-guidance versus feature vector-guidance (Appendix F.7)
  • Ablation study on the joint learning of crystal geometry (Appendix F.8)
  • Providing a detailed discussion on the key differences between DiffCSP and TGDMat (Section 3 and Appendix B.5)
  • Correcting a typo in Figure 2 regarding the direction of forward and reverse diffusions.

We hope these revisions adequately address all the concerns you raised and improve the quality of our manuscript. If there are any remaining issues or additional clarifications needed, please let us know, and we would be happy to address them. Otherwise, we kindly request that you consider revising the score based on these updates.

We look forward to your response.

Thank you,

The Authors

Official Review
Rating: 6

The paper introduces a text-guided diffusion model for generating 3D periodic materials. The work leverages a periodic-E(3)-equivariant graph neural network (GNN) to jointly generate atom types, fractional coordinates, and lattice structures, while integrating textual descriptions at each denoising step.

Strengths

  1. The model effectively learns the joint distribution of atom coordinates, types, and lattice structure through an end-to-end diffusion network, which is a significant improvement over existing models that handle these aspects separately.
  2. Incorporating textual descriptions as a condition during the denoising process enhances the model's ability to generate materials that meet specific user-defined criteria, making it more versatile and user-friendly.

Weaknesses

  1. The paper does not provide sufficient detail on how the contextual representation of long, detailed text data is fused into the denoising network to generate text-guided variants of baseline models.
  2. The visualization of de novo generated materials lacks a thorough discussion on whether the results align with the given prompts. Additionally, the alignment of generated materials with general textual conditions is not adequately discussed, which is crucial for validating the model's performance.
  3. There is a need for a more comprehensive ablation study comparing the effects of long, detailed descriptions versus short prompts on the model's guidance performance. This would help understand the robustness and versatility of the model under different input conditions.

Questions

  1. Could the authors provide more details on how the contextual representation of long, detailed text data is integrated into the denoising network? What specific techniques or architectures are used to achieve this?
  2. Could the authors include an ablation study comparing long and short text prompts on the baselines in Table 1 or 2? How would the results differ, and what insights could be gained from such a study?
  3. Can the authors provide more detailed visualizations of the generated materials and explicitly discuss how well these results align with the given text prompts? Are there any metrics or qualitative assessments to evaluate this alignment?
Comment

Ablation study comparing long and short text prompts on the baseline models.

We begin by presenting the ablation study results comparing long and short text prompts on both tasks, followed by key insights and observations.

Gen Task:

Perov

| Method | Comp Validity | Struct Validity | Cov-R | Cov-P | #Element | Density | Energy |
|---|---|---|---|---|---|---|---|
| CDVAE | 98.29 | 100 | 99.25 | 98.39 | 0.0731 | 0.1462 | 0.0291 |
| CDVAE+(Short) | 98.37 | 100 | 99.40 | 99.01 | 0.0706 | 0.1395 | 0.0246 |
| CDVAE+(Long) | 98.45 | 100 | 99.53 | 99.09 | 0.0609 | 0.1276 | 0.0223 |
| SyMat | 96.83 | 100 | 99.16 | 98.29 | 0.0193 | 0.1991 | 0.2827 |
| SyMat+(Short) | 96.94 | 100 | 99.22 | 98.40 | 0.0192 | 0.1827 | 0.2633 |
| SyMat+(Long) | 97.88 | 100 | 99.71 | 98.79 | 0.0172 | 0.1755 | 0.2566 |
| DiffCSP | 98.15 | 100 | 99.28 | 98.08 | 0.0132 | 0.1281 | 0.0267 |
| DiffCSP+(Short) | 98.21 | 100 | 99.61 | 98.39 | 0.0123 | 0.1193 | 0.0266 |
| DiffCSP+(Long) | 98.44 | 100 | 99.85 | 98.53 | 0.0119 | 0.1071 | 0.0241 |
| TGDMat(Short) | 98.28 | 100 | 99.71 | 99.24 | 0.0108 | 0.094 | 0.023 |
| TGDMat(Long) | 98.63 | 100 | 99.87 | 99.52 | 0.009 | 0.049 | 0.018 |

Carbon

| Method | Comp Validity | Struct Validity | Cov-R | Cov-P | #Element | Density | Energy |
|---|---|---|---|---|---|---|---|
| CDVAE | – | 100 | 99.35 | 82.66 | – | 0.1539 | 0.2889 |
| CDVAE+(Short) | – | 100 | 99.34 | 82.96 | – | 0.1398 | 0.2804 |
| CDVAE+(Long) | – | 100 | 99.82 | 84.76 | – | 0.1377 | 0.2660 |
| SyMat | – | 100 | 99.42 | 97.17 | – | 0.1234 | 3.9628 |
| SyMat+(Short) | – | 100 | 99.52 | 97.20 | – | 0.1206 | 3.7422 |
| SyMat+(Long) | – | 100 | 99.90 | 97.63 | – | 0.1171 | 3.8620 |
| DiffCSP | – | 99.9 | 99.49 | 97.27 | – | 0.0861 | 0.0876 |
| DiffCSP+(Short) | – | 100 | 99.61 | 97.29 | – | 0.0811 | 0.087 |
| DiffCSP+(Long) | – | 100 | 99.93 | 97.33 | – | 0.0763 | 0.0853 |
| TGDMat(Short) | – | 100 | 99.81 | 91.77 | – | 0.0681 | 0.0865 |
| TGDMat(Long) | – | 100 | 99.9 | 92.43 | – | 0.043 | 0.063 |

MP

| Method | Comp Validity | Struct Validity | Cov-R | Cov-P | #Element | Density | Energy |
|---|---|---|---|---|---|---|---|
| CDVAE | 86.30 | 100 | 99.15 | 99.49 | 1.4921 | 0.7085 | 0.3039 |
| CDVAE+(Short) | 87.05 | 100 | 99.36 | 99.60 | 0.9930 | 0.6420 | 0.2970 |
| CDVAE+(Long) | 87.42 | 100 | 99.57 | 99.81 | 0.9720 | 0.6388 | 0.2977 |
| SyMat | 87.96 | 99.9 | 98.30 | 99.37 | 0.5236 | 0.4012 | 0.3877 |
| SyMat+(Short) | 96.94 | 99.9 | 98.59 | 99.47 | 0.5031 | 0.3917 | 0.3622 |
| SyMat+(Long) | 97.88 | 99.9 | 99.01 | 99.95 | 0.4865 | 0.3879 | 0.3489 |
| DiffCSP | 83.25 | 100 | 99.41 | 99.76 | 0.3411 | 0.3802 | 0.1497 |
| DiffCSP+(Short) | 84.57 | 100 | 99.52 | 99.85 | 0.3310 | 0.3800 | 0.1379 |
| DiffCSP+(Long) | 85.07 | 100 | 99.81 | 99.89 | 0.3122 | 0.3799 | 0.1355 |
| TGDMat(Short) | 86.60 | 100 | 99.79 | 99.88 | 0.3337 | 0.3296 | 0.1189 |
| TGDMat(Long) | 92.97 | 100 | 99.89 | 99.95 | 0.289 | 0.308 | 0.115 |

CSP Task:

| Method | #Samples | Perov Match Rate | Perov RMSE | Carbon Match Rate | Carbon RMSE | MP Match Rate | MP RMSE |
|---|---|---|---|---|---|---|---|
| CDVAE | 1 | 45.31 | 0.1138 | 17.09 | 0.2969 | 33.90 | 0.1045 |
| CDVAE | 20 | 88.51 | 0.0464 | 88.37 | 0.2286 | 66.95 | 0.1026 |
| CDVAE+(Short) | 1 | 48.97 | 0.1063 | 22.65 | 0.2640 | 40.33 | 0.1037 |
| CDVAE+(Short) | 20 | 89.54 | 0.0423 | 89.61 | 0.2188 | 70.22 | 0.0876 |
| CDVAE+(Long) | 1 | 49.25 | 0.1055 | 23.73 | 0.2590 | 41.80 | 0.1021 |
| CDVAE+(Long) | 20 | 89.73 | 0.0417 | 89.77 | 0.2053 | 72.56 | 0.0840 |
| SyMat | 1 | 47.32 | 0.1074 | 20.81 | 0.2655 | 33.92 | 0.1039 |
| SyMat | 20 | 90.25 | 0.0316 | 89.29 | 0.2184 | 71.03 | 0.0945 |
| SyMat+(Short) | 1 | 49.39 | 0.0985 | 27.71 | 0.2567 | 40.84 | 0.1027 |
| SyMat+(Short) | 20 | 92.10 | 0.0255 | 90.86 | 0.2069 | 71.31 | 0.0875 |
| SyMat+(Long) | 1 | 50.88 | 0.0963 | 28.18 | 0.2510 | 43.17 | 0.1016 |
| SyMat+(Long) | 20 | 92.30 | 0.0201 | 91.65 | 0.1870 | 72.96 | 0.0820 |
| DiffCSP | 1 | 52.02 | 0.0760 | 17.54 | 0.2759 | 51.49 | 0.0631 |
| DiffCSP | 20 | 98.60 | 0.0128 | 88.47 | 0.2192 | 77.93 | 0.0492 |
| DiffCSP+(Short) | 1 | 56.54 | 0.0583 | 24.13 | 0.2424 | 52.22 | 0.0597 |
| DiffCSP+(Short) | 20 | 98.25 | 0.0137 | 88.28 | 0.2252 | 80.97 | 0.0443 |
| DiffCSP+(Long) | 1 | 90.46 | 0.020 | 44.63 | 0.226 | 55.15 | 0.057 |
| DiffCSP+(Long) | 20 | 98.59 | 0.007 | 95.27 | 0.153 | 82.02 | 0.039 |

Observations:

  1. For both tasks, across all datasets, text guidance outperforms the vanilla diffusion models on almost all metrics.
  2. Our experiments suggest that text-guided models using shorter prompts already outperform the vanilla baseline models; performance improves further when longer prompts are used.
  3. For the CSP task, using text guidance during the reverse denoising process, the text-guided variants outperform their respective vanilla models with just one generated sample per test material, thereby reducing computational overhead.
  4. Our proposed TGDMat (Long) stands out as the leading model compared to all baseline models and their text-guided variants across the three benchmark datasets. Specifically, for the Gen task, TGDMat (Long) outperforms the closest baseline DiffCSP+ (Long) because we leverage discrete diffusion on atom types, which is better suited to learning discrete variables such as atom types.
  5. Finally, the results indicate that using shorter prompts, TGDMat (Short) shows a slight decrease in overall performance compared to the longer variant TGDMat (Long). Nonetheless, its performance remains superior or comparable to the baseline models (both vanilla and text-guided variants).

We will add these comprehensive results to the revised manuscript (Appendix F.3 in the revised version).

Comment

Dear Reviewer D75u,

Thank you very much for providing us with valuable feedback. We appreciate the detailed comments. Below, we have provided point-by-point responses to each of your comments.

Questions:

Details on how the contextual representation of long, detailed text data is integrated into the denoising network?

As described in Section 4.3.2, Text Guided Denoising Network (Lines 313–318, Equation 5), at each timestep t of reverse diffusion we concatenate the textual representation C_p with each input atom feature. Following the same approach, we also developed text-guided versions of the baseline models, named CDVAE+, SyMat+, and DiffCSP+, in which the contextual representation of the detailed text data is integrated into the denoising networks of these models.
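To illustrate, the conditioning step can be sketched as follows. This is a minimal, hypothetical sketch using plain Python lists rather than the actual tensor operations in the model; it only shows how one text embedding C_p is broadcast to all atoms and concatenated with each atom's features at a denoising step.

```python
def condition_atom_features(atom_feats, text_emb):
    """Concatenate the shared text embedding C_p with every per-atom
    feature vector, as done at each reverse-diffusion timestep.

    atom_feats: list of N per-atom feature vectors
    text_emb:   one text embedding vector, broadcast to all N atoms
    """
    return [list(feats) + list(text_emb) for feats in atom_feats]

# Two atoms with 2-d features and a 3-d text embedding:
conditioned = condition_atom_features([[0.1, 0.2], [0.3, 0.4]], [1.0, 2.0, 3.0])
# each conditioned atom now carries d_atom + d_text = 5 features
```

In a real model the concatenated features would feed the denoising GNN; the same wrapping applies to the baselines' denoising networks (CDVAE+, SyMat+, DiffCSP+).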

How well generated materials align with the given text prompts?

Quantitative results on the alignment of the generated structures with the given prompts, in terms of compositions, properties, and other factors, are provided in the "F.3 CORRECTNESS OF GENERATED MATERIALS" section of the Appendix (Page 23, Line 1229). This is also referenced in the "Additional Results" section (Page 10, Line 527) of the main manuscript. In the revised manuscript, this has been incorporated into Section 5.4 of the main manuscript.

Following are the details :

To ensure the fidelity of our model’s outputs with respect to the global attributes specified in the text prompt, we randomly generated 1000 materials (sampled from all three datasets) from their respective textual descriptions (both Long and Short) and measured the percentage of generated materials that matched the global features outlined in the prompt. Specifically, we matched the Formula, Space Group, Crystal System, and Dimensions of the generated materials against the textual descriptions. Moreover, we examined whether properties such as formation energy and band gap met the criteria specified in the text prompt (positive/negative, zero/nonzero).
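The matching procedure above can be sketched as follows. This is a minimal illustration, not the actual evaluation code; the attribute keys (e.g. `formula`, `space_group`) are placeholders for whatever attributes are extracted from the generated structure and parsed from the prompt.

```python
def prompt_match_rate(generated, prompts, keys):
    """For each attribute key, return the percentage of generated
    materials whose attribute equals the value specified in the
    corresponding text prompt."""
    counts = {k: 0 for k in keys}
    for gen, prompt in zip(generated, prompts):
        for k in keys:
            if gen.get(k) == prompt.get(k):
                counts[k] += 1
    n = len(prompts)
    return {k: 100.0 * counts[k] / n for k in keys}

# Hypothetical usage on two generated materials:
gen = [{"formula": "SrTiO3", "space_group": 221},
       {"formula": "C", "space_group": 194}]
pr = [{"formula": "SrTiO3", "space_group": 221},
      {"formula": "C", "space_group": 191}]
rates = prompt_match_rate(gen, pr, ["formula", "space_group"])
# rates == {"formula": 100.0, "space_group": 50.0}
```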

Results for TGDMat(Long):

| Global Feature | % of Matched Materials (Perov) | % of Matched Materials (Carbon) | % of Matched Materials (MP) |
|---|---|---|---|
| Formula | 97.50 | 98.20 | 70.54 |
| Space Group | 87.00 | 80.79 | 67.88 |
| Crystal System | 92.60 | 91.55 | 73.54 |
| Formation Energy | 95.49 | – | 92.88 |
| Band Gap | – | 98.61 | 96.73 |

Results for TGDMat(Short):

| Global Feature | % of Matched Materials (Perov) | % of Matched Materials (Carbon) | % of Matched Materials (MP) |
|---|---|---|---|
| Formula | 90.70 | 92.56 | 65.22 |
| Space Group | 86.51 | 80.50 | 58.77 |
| Crystal System | 83.19 | 81.64 | 72.77 |
| Formation Energy | 90.33 | – | 91.00 |
| Band Gap | – | 95.90 | 93.33 |

In general, using longer text on the Perov-5 and Carbon-24 datasets, the generated materials meet the specified criteria effectively. However, on the MP-20 dataset, which is more intricate due to its complex structures and compositions, performance tends to decline. Additionally, shorter prompts hurt overall performance across all datasets compared to longer text inputs. This is because the longer text, produced by Robocrystallographer, provides a comprehensive range of information, both global and local, thereby enhancing the generation capabilities of TGDMat.

Comment

Dear Reviewer,

Thank you for your valuable feedback and constructive comments.

In our rebuttal, we have provided additional experiments and enhanced explanations, and we hope we have addressed all the concerns raised by the reviewer. We are open to further discussions and are happy to clarify any remaining doubts.

If there are still any outstanding issues, we kindly request you to share them with us. Otherwise, we would greatly appreciate it if you could consider revising the score.

We look forward to your response.

Thank you,

The Authors

Comment

Dear Reviewer D75u,

Thanks again for your insightful and thoughtful comments!

As the reviewer-author discussion period is closing soon (November 26 at 11:59 pm AoE), we would like to gently remind you that we are eagerly awaiting your feedback on our response.

We have updated our revised manuscript, where we made the following updates as per your suggestions:

  • Assessing how well the generated materials align with the provided text prompts (Section 5.4).
  • Comprehensive and detailed results for both Gen and CSP tasks across three benchmark datasets (Appendix F.3).

If there are any remaining concerns, we kindly request you to share them with us. Otherwise, we would greatly appreciate it if you could consider revising the score.

We look forward to your response.

Thank you,

The Authors

Comment

Dear Reviewer D75u,

As the reviewer-author discussion period is closing soon (November 26 at 11:59 pm AoE), we would like to gently remind you that we are eagerly awaiting your feedback on our response and revised manuscript.

We are happy to inform you that all of the three reviewers now lean towards acceptance. Your insights and evaluation play a crucial role in deciding the ultimate fate of our work, and we are eagerly awaiting your response to the revised manuscript.

regards,

Authors

Comment

Thank you for the rebuttal

I appreciate the authors' efforts during the rebuttal, especially the new comparison regarding the long and short textual prompts.

I have raised the score to 6.

Comment

Dear Reviewer D75u,

We sincerely thank you for your positive feedback and for raising the score to 6. We have carefully incorporated all your suggested changes into our revised manuscript, and we believe these contributions significantly enhance the research at the intersection of AI and material science.

We would be grateful if you could review the revised version and let us know if it is possible to change the decision from borderline accept to accept.

Your feedback would be highly appreciated. If you feel that additional experiments or results would further strengthen the manuscript, we are more than willing to provide them and actively engage in any further discussions.

Thank you once again for your valuable insights and support. We look forward to your response.

Regards,

The Authors

Review
8

This paper presents TGDMat, a novel text-guided diffusion model for generating periodic crystal materials. TGDMat jointly models the generation of atom types, coordinates, and lattices of materials using separate diffusion processes. Specifically, it uses a Denoising Diffusion Probabilistic Model for lattice modeling and the discrete diffusion model D3PM for modeling atom types. For atom coordinate modeling, the authors mainly follow DiffCSP and use a score matching objective. Note that they apply diffusion to fractional coordinates instead of Cartesian coordinates, which cannot reflect the periodicity of crystal materials. To ensure geometric symmetry, TGDMat uses the CSPNet proposed by DiffCSP, which ensures periodic E(3) invariance for periodic crystals.

For the text-guided component, material descriptions are generated using Robocrystallographer software. Text embeddings, produced by a language model, are then concatenated with node embeddings to guide material generation.

Strengths

  1. The overall performance is good, as shown in Table 3. TGDMat shows significant improvement on the random material generation task compared to previous methods. Additionally, the authors demonstrate the importance of incorporating text-guidance by showing the improved performance of baselines in Table 1 and Table 2.
  2. The method is overall sound and well-engineered. The authors have employed state-of-the-art diffusion methods for each modality associated in the whole material generation process. The source code is attached as a supplementary material.
  3. The textual annotation for the material dataset is a nice additional contribution, and is potentially impactful. It can stimulate future research for joint text and material modeling.

Weaknesses

  1. It seems that a large proportion of the methodology is borrowed from the previous work DiffCSP. This includes the diffusion process for atom coordinates, the diffusion for lattice, and the GNN backbone.
  2. The authors are suggested to use the \citep command instead of \cite for citations to improve the readability.
  3. The proposed method has achieved significant performance on the employed evaluation metrics, like validity and coverage. Does this mean the model can be readily employed for practical material discovery in industry? If yes, can you include further discussion on this application? If not, what is a barrier? Are there any other evaluation metrics that should be measured, like the novelty of the generated material compared to the training set, before application?

Questions

  1. How large is the proposed model and the compared baselines for material generation?
Comment

Dear Reviewer cwRr,

Thank you very much for providing us with valuable feedback. We appreciate the detailed comments. Below, we have provided point-by-point responses to each of your comments.

Weaknesses:

Key differences between DiffCSP and TGDMat

Following are the key differences between DiffCSP and our proposed TGDMat

| | DiffCSP | TGDMat |
|---|---|---|
| Tasks | Only the CSP task | Both CSP and Gen tasks |
| Diffusion on atom types | N/A | Discrete diffusion (D3PM) |
| Model category | Unconditional; unable to specify the criteria required by the user | Conditional; able to specify the criteria required by the user (in text format) |
| Text-guided diffusion | No | Yes |

However, note that the goal of this paper is not to introduce a new diffusion model to replace existing models like DiffCSP or CDVAE for periodic material generation. Instead, we focus on demonstrating that conditional models can outperform traditional unconditional models such as DiffCSP. Specifically, we show that incorporating textual conditions through text-guided diffusion leads to better performance than unconditional models like DiffCSP. Additionally, we enhance DiffCSP by integrating discrete diffusion over atom types in our proposed TGDMat framework.

Suggestion: Use of \citep command.

Thanks for the suggestion. We will update the revised manuscript.

Regarding Practical Deployment of the Model in Industry.

Our proposed method demonstrates significant potential for practical material discovery in industry by generating valid, diverse, and structurally plausible materials aligned with user-provided textual descriptions. This capability positions our model as an efficient tool for creating “initial templates” of materials tailored for applications such as battery materials, solar cells, or catalysts, significantly reducing the time and computational resources required for exploratory studies.

However, some barriers remain. One challenge is the potential mismatch between generated structures and experimental ground truth, which arises from the model's inherent approximations. Generated structures therefore require further validation and refinement using computational methods, such as density functional theory (DFT), to ensure their physical and chemical feasibility before full deployment in industrial workflows.

Questions:

Model Size of all Baselines and TGDMat

| Model | # Parameters | Model Size |
|---|---|---|
| CDVAE | 4,920,414 | 18.771 MB |
| SyMat | 3,385,601 | 12.915 MB |
| DiffCSP | 12,294,656 | 46.923 MB |
| TGDMat | 12,432,228 | 47.448 MB |

We would like to once again express our gratitude to Reviewer cwRr for their valuable comments and suggestions. We will incorporate these insights into the revised manuscript. We believe our responses above effectively address all of Reviewer cwRr's concerns and further enhance the quality of our work.

Sincerely,
The Authors

Comment

Dear Reviewer,

Thank you for your valuable feedback and constructive comments.

In our rebuttal, we have provided additional experiments and enhanced explanations, and we hope we have addressed all the concerns raised by the reviewer. We are open to further discussions and are happy to clarify any remaining doubts.

If there are still any outstanding issues, we kindly request you to share them with us. Otherwise, we would greatly appreciate it if you could consider revising the score.

We look forward to your response.

Thank you,

The Authors

Comment

Thank you for the response. I maintain my original rating, as I did not see any updates to your submission.

ICLR allows authors to revise their manuscript during the rebuttal to resolve the reviewers' concerns. You can leverage this opportunity to improve your manuscript by incorporating the reviewers' comments. To help the reviewers recognize your updates, you can use colored text in your revised manuscript.

Comment

Dear Reviewer,

Thank you for your valuable feedback and thoughtful comments.

In response, we have uploaded the revised manuscript and made the following updates as per your suggestions:

  • Added a detailed discussion on the key differences between DiffCSP and TGDMat (Section 3 and Appendix B.5).
  • Replaced \cite commands with \citep for citations to improve the readability.
  • Included a model size comparison of baselines and TGDMat (Table 7, Appendix, Lines 1205–1212).

We hope these revisions adequately address all the concerns you raised. If there are any remaining issues or additional clarifications needed, please let us know, and we would be happy to address them. Otherwise, we kindly request you to consider revising the score based on these updates.

We look forward to your response.

Thank you,

The Authors

Comment

Dear Reviewer cwRr,

As the reviewer-author discussion period is nearing its conclusion (November 26 at 11:59 pm AoE), we would like to kindly remind you that we are eagerly awaiting your feedback on our revised manuscript.

We greatly value your insightful comments, which have been instrumental in improving our work. Your suggestions have been thoughtfully incorporated into the revision, and we hope the updated manuscript addresses all the concerns you raised.

If there are any remaining issues or additional clarifications required, please do not hesitate to let us know—we would be happy to address them promptly. Otherwise, we kindly ask you to consider revising your score based on the updates.

Thank you once again for your time and effort in reviewing our work. We look forward to hearing from you.

Best regards,
The Authors

Comment

Thank you for the revision. I think this paper would be a nice contribution to the ICLR conference. I have raised my scores accordingly.

Comment

We sincerely thank the reviewers for their valuable insights and constructive feedback on our work. We have revised the main manuscript and appendix, with all changes highlighted in blue. A summary of the major updates is outlined as follows:

Key Updates:

  1. Additional Experiments:
    We have conducted all the additional experiments requested by the reviewers, including:

    • Assessing how well the generated materials align with the provided text prompts (Section 5.4).
    • Comprehensive and detailed results for both Gen and CSP tasks across three benchmark datasets (Appendix F.3).
    • Comparing the utility of text guidance versus feature vector guidance (Appendix F.7).
    • Ablation study on the joint learning of crystal geometry (Appendix F.8).
  2. Presentation Improvements:
    We have implemented several enhancements to improve the clarity and readability of the manuscript, including:

    • Providing a detailed discussion on the key differences between DiffCSP and TGDMat (Section 3 and Appendix B.5).
    • Replacing \cite commands with \citep for more consistent citation formatting.
    • Correcting a typo in Figure 2 regarding the direction of forward and reverse diffusions.

We believe these revisions comprehensively address the reviewers’ concerns and significantly improve the quality of our work.

AC Meta-Review

This paper presents TGDMat, a text-guided diffusion model for generating periodic crystal materials. The proposed method integrates atom types, coordinates, and lattice parameters into a unified diffusion framework, guided by text-based descriptions. The reviewers appreciated the technical depth and relevance of the work, highlighting the innovative use of textual descriptions for guiding material generation. However, concerns were raised about the limited novelty compared to prior work, clarity in presenting the role of text guidance, and the robustness of the evaluation metrics. The authors addressed these issues through extensive rebuttals, additional experiments, and manuscript revisions, which convinced the reviewers of the method's contributions and applicability. With all reviewers leaning towards acceptance, the consensus supports the recommendation of accepting the paper as a poster presentation.

Additional Notes from Reviewer Discussion

The discussion phase primarily focused on the method's novelty, the role of textual guidance compared to feature vector-based guidance, and the evaluation of generated materials' alignment with text prompts. The authors provided substantial clarifications, emphasizing the advantages of text embeddings in capturing complex material attributes, supported by additional experiments showing the superior performance of text-guided models. They also included ablation studies on joint versus separate learning of crystal parameters and a detailed comparison with related methods like DiffCSP. These updates addressed most reviewer concerns, leading to an improved consensus. Despite some residual doubts about broader applicability, the reviewers agreed that the work is a meaningful contribution to the field.

Final Decision

Accept (Poster)