Machine learning has proven extremely successful in accelerating the understanding, design, and characterization of materials and molecules. A major factor in this success has been the development of representations of atomic structures that reflect the physics-based symmetries of the underlying interactions. Most descriptions of atomic properties, and even of global observables, rely on decompositions into atomic contributions that are then learned in an atom-centered framework. However, many quantities from quantum mechanical calculations, such as single-particle Hamiltonian matrices written in an atomic-orbital basis, are associated with multiple atomic centers.
Following the introduction of equivariant N-center structural descriptors (see the reference below), which generalize the very successful atom-centered density correlation features to properties indexed by N atoms, we present benchmarks showing how the construction can be applied to efficiently learn the matrix elements of the (effective) single-particle Hamiltonian in an atom-centered orbital basis.
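To make the two-center indexing concrete, the following minimal sketch splits a square atomic-orbital matrix into blocks labeled by atom pairs (i, j), which is the kind of target a two-center model would learn. It is an illustration only, not the code distributed with this record: the function name `atom_pair_blocks`, the per-element basis-function counts (taken here as 5 for H and 14 for C, N, and O in def2-SVP), the atom-by-atom orbital ordering, and the random symmetric matrix standing in for a Fock matrix are all assumptions.

```python
import numpy as np

# Assumed number of def2-SVP basis functions per element
# (H: 2s1p -> 5; C, N, O: 3s2p1d -> 14, with spherical d functions).
N_BASIS = {"H": 5, "C": 14, "N": 14, "O": 14}


def atom_pair_blocks(matrix, symbols):
    """Split a square AO matrix into blocks indexed by atom pairs (i, j).

    Assumes the orbitals are ordered atom by atom, following `symbols`.
    """
    sizes = [N_BASIS[s] for s in symbols]
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    n_ao = int(offsets[-1])
    if matrix.shape != (n_ao, n_ao):
        raise ValueError("matrix size does not match the assumed basis")

    blocks = {}
    for i in range(len(symbols)):
        for j in range(len(symbols)):
            rows = slice(offsets[i], offsets[i + 1])
            cols = slice(offsets[j], offsets[j + 1])
            blocks[(i, j)] = matrix[rows, cols]
    return blocks


# Synthetic stand-in for the Fock matrix of one water molecule (O, H, H).
symbols = ["O", "H", "H"]
rng = np.random.default_rng(0)
a = rng.normal(size=(24, 24))
fock = 0.5 * (a + a.T)  # symmetric, like a real Fock matrix

blocks = atom_pair_blocks(fock, symbols)
```

Diagonal blocks (i, i) correspond to a single center, while off-diagonal blocks (i, j) couple two centers; for a symmetric matrix, block (j, i) is the transpose of block (i, j).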
This record includes the dataset comprising the Fock and overlap matrices, computed in the def2-SVP basis, of 1000 distorted water molecules, up to 4500 ethanol structures, and a subset of QM7-CHNO molecules. We also provide scripts to generate the two-center representations and to fit linear and sparse kernel models for the Hamiltonians.
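Because the atomic-orbital basis is not orthogonal, a Fock matrix F and its overlap matrix S are typically used together to obtain orbital energies from the generalized eigenvalue problem FC = SCε. The sketch below solves it by symmetric (Löwdin) orthogonalization using NumPy only. It is not one of the scripts provided with this record: the function name `orbital_energies` is hypothetical, and the matrices are small synthetic stand-ins rather than entries of the dataset.

```python
import numpy as np


def orbital_energies(fock, overlap):
    """Solve F C = S C eps via symmetric (Loewdin) orthogonalization.

    Assumes `fock` is symmetric and `overlap` is symmetric positive definite.
    """
    # Build S^(-1/2) from the eigendecomposition of the overlap matrix.
    s_vals, s_vecs = np.linalg.eigh(overlap)
    s_inv_sqrt = (s_vecs / np.sqrt(s_vals)) @ s_vecs.T

    # Transform to the orthogonalized basis and diagonalize.
    fock_orth = s_inv_sqrt @ fock @ s_inv_sqrt
    energies, c_orth = np.linalg.eigh(fock_orth)

    # Back-transform the coefficients to the original (non-orthogonal) basis.
    coeffs = s_inv_sqrt @ c_orth
    return energies, coeffs


# Synthetic stand-ins: a random symmetric "Fock" matrix and a random
# symmetric positive-definite "overlap" matrix.
rng = np.random.default_rng(0)
n = 6
a = rng.normal(size=(n, n))
fock = 0.5 * (a + a.T)
b = rng.normal(size=(n, n))
overlap = b @ b.T + n * np.eye(n)

energies, coeffs = orbital_energies(fock, overlap)
```

The returned coefficients satisfy the orthonormality condition CᵀSC = I, and the energies are sorted in ascending order. The same routine could be applied to a predicted Fock matrix to compare its eigenvalues against those of the reference calculation.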