Structure prediction for molecular crystals is a longstanding challenge because the often minuscule free energy differences between polymorphs depend sensitively on the description of the electronic structure, the statistical mechanics of the nuclei and the cell, and thermal expansion. The importance of each of these effects has been established individually, but rigorous free energy calculations that account for all of them simultaneously have not been computationally viable.
Here we reproduce the experimental stabilities of polymorphs of prototypical compounds -- benzene, glycine, and succinic acid -- by computing rigorous first-principles Gibbs free energies, at a fraction of the cost of conventional methods. This is achieved by a bottom-up approach, which involves generating machine-learning potentials to calculate surrogate free energies and subsequently calculating true first-principles free energies using inexpensive free energy perturbations.
With this approach, accounting for all relevant physical effects is no longer a daunting task, which lays the foundation for structure prediction of more complex systems of industrial importance.
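The free energy perturbation step mentioned above can be illustrated with the textbook exponential-averaging (Zwanzig) estimator, F_target - F_surrogate = -k_B T ln < exp[-(U_target - U_surrogate) / k_B T] >, where the average runs over configurations sampled with the surrogate potential. The following is a minimal sketch under that assumption only; it is not taken from the archive and may differ from the exact workflow used for the published results. The function name `fep_correction` and its arguments are hypothetical, and energies are assumed to be in eV.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def fep_correction(u_surrogate, u_target, temperature):
    """Exponential-averaging estimate of F_target - F_surrogate (in eV).

    u_surrogate, u_target: energies (eV) of the SAME configurations, which are
    assumed to have been sampled with the surrogate (e.g. machine-learning)
    potential. temperature is in kelvin. Hypothetical helper for illustration.
    """
    beta = 1.0 / (K_B_EV_PER_K * temperature)
    exponents = [-beta * (ut - us) for us, ut in zip(u_surrogate, u_target)]
    # Shift by the largest exponent before exponentiating to avoid overflow.
    shift = max(exponents)
    mean_exp = sum(math.exp(x - shift) for x in exponents) / len(exponents)
    return -(shift + math.log(mean_exp)) / beta


# Toy numbers, not archive data: a constant offset between the two energy
# models is recovered exactly as the free energy difference.
u_ml = [-1.00, -1.10, -0.90]
u_dft = [u + 0.01 for u in u_ml]
delta_f = fep_correction(u_ml, u_dft, temperature=300.0)
```

The estimator converges quickly only when the surrogate and target energies differ by little over the sampled configurations, which is why an accurate machine-learning potential makes the perturbation inexpensive.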
This Materials Cloud archive contains first-principles training, validation, and test data for polymorphs of benzene, succinic acid, and glycine underlying the above-mentioned machine-learning potentials.
For each compound the archive provides two sets of data: the first based on DFT calculations with the semi-local PBE functional and the Tkatchenko-Scheffler dispersion correction, and the second based on DFT calculations with the hybrid PBE0 functional and the many-body dispersion correction of Tkatchenko et al. For each compound and both levels of electronic-structure theory, structure datasets in libAtoms extended XYZ format provide representative, thermalised configurations and the associated configurational energies, atomic forces, and stresses on the simulation cell. The configurations are extracted from a combination of classical temperature replica exchange molecular dynamics simulations and path-integral molecular dynamics simulations for a representative set of perturbed unit cells based on the experimental structures of forms I, II, Ihp and V' of benzene, alpha- and beta-succinic acid, and alpha-, beta-, gamma-, and delta-glycine. The detailed data provenance and dataset architecture are described in the attached README file.
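As an illustration of the extended XYZ layout, each frame consists of an atom-count line, a comment line of key=value pairs (typically including Lattice and Properties), and one line per atom. The sketch below parses such frames using only the Python standard library. The function name `read_extxyz_frames`, the toy H2 frame, and the key names `energy` and `forces` are assumptions made for illustration; the actual keys and units used in the archive are those documented in the README. In practice, a library reader such as `ase.io.read` is likely the more convenient choice.

```python
import shlex


def read_extxyz_frames(lines):
    """Yield one dict per frame from an iterable of extended-XYZ text lines.

    Hypothetical helper: handles the common subset of the format only.
    """
    it = iter(lines)
    for first in it:
        if not first.strip():
            continue  # tolerate blank lines between frames
        n_atoms = int(first)
        # Comment line: whitespace-separated key=value pairs; values may be quoted.
        info = {}
        for token in shlex.split(next(it)):
            key, sep, value = token.partition("=")
            info[key] = value if sep else True
        # Properties describes the per-atom columns as name:type:count triples.
        fields = info["Properties"].split(":")
        columns = [(fields[i], fields[i + 1], int(fields[i + 2]))
                   for i in range(0, len(fields), 3)]
        atoms = []
        for _ in range(n_atoms):
            parts = next(it).split()
            atom, pos = {}, 0
            for name, kind, count in columns:
                raw = parts[pos:pos + count]
                pos += count
                if kind == "R":
                    vals = [float(x) for x in raw]
                elif kind == "I":
                    vals = [int(x) for x in raw]
                else:
                    vals = raw  # strings (S) and anything else stay as text
                atom[name] = vals[0] if count == 1 else vals
            atoms.append(atom)
        yield {"info": info, "columns": columns, "atoms": atoms}


# Toy frame, not archive data; key names here are assumptions.
example = """2
Lattice="5.0 0.0 0.0 0.0 5.0 0.0 0.0 0.0 5.0" Properties=species:S:1:pos:R:3:forces:R:3 energy=-1.5
H 0.0 0.0 0.0 0.1 0.0 0.0
H 0.0 0.0 0.74 -0.1 0.0 0.0
"""
frames = list(read_extxyz_frames(example.splitlines()))
```

Values on the comment line are kept as strings here, so a field such as the configurational energy would still need to be converted with `float` before use.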