In recent years, there has been a surge of interest in predicting computed activation barriers in order to accelerate the automated exploration of reaction networks. Consequently, various predictive approaches have emerged, ranging from graph-based models to methods based on the three-dimensional structure of reactants and products. In tandem, many representations have been developed to predict experimental targets, and these may hold promise for barrier prediction as well. Here, we bring together these efforts and benchmark various methods (Morgan fingerprints, the DRFP, the CGR representation-based Chemprop, SLATM_d, B²R²_l, EquiReact and the language model BERT + RXNFP) for the prediction of computed activation barriers on three diverse datasets.
This record contains the data supporting the article "Benchmarking machine-readable vectors of chemical reactions on computed activation barriers". It accompanies the GitHub repository https://github.com/lcmd-epfl/benchmark-barrier-learning, which contains the code and a duplicate copy of the data.