Developing Large Language Models for Quantum Chemistry Simulation Input Generation


This repository contains all code used to run the experiments described in the study "Developing Large Language Models for Quantum Chemistry Simulation Input Generation". Alongside the code for our system architecture, it includes the datasets described in the study, which can be used for further research, and several general-purpose helper classes. To reproduce the results of the study, refer to the Scripts folder, which documents the scripts used to run our experiments and gather data. For details on the classes and how to use them in your own research, refer to the Classes folder; it shows, for instance, how to use our rule-based system to generate different calculations. The datasets we used can be inspected and extracted from the Data folder, where each available dataset is explained. The Orca Output folder stores all output files gathered from running ORCA calculations. Note that to use the code in this repository, you must configure your own OpenAI API key in your system path. Moreover, to use retrieval-augmented generation (RAG), you must scrape the ORCA input library with our provided script and add the ORCA manual to the Documents/Regular folder. We do not publish these materials here because we are not their authors.
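
The two setup requirements above (an API key and a populated Documents/Regular folder) can be checked before running anything. The following is a minimal sketch, not part of the repository: the function name check_setup is hypothetical, and it assumes the key is supplied through the OPENAI_API_KEY environment variable, which the description does not state explicitly.

```python
import os
from pathlib import Path


def check_setup(repo_root, env=None):
    """Return a list of setup problems; an empty list means both checks passed."""
    env = os.environ if env is None else env
    problems = []
    # Assumption: the key is read from the OPENAI_API_KEY environment variable.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    # Folder named in the description; the ORCA manual must be added by the user.
    manual_dir = Path(repo_root) / "Documents" / "Regular"
    if not manual_dir.is_dir() or not any(manual_dir.iterdir()):
        problems.append(f"no documents found in {manual_dir}")
    return problems
```

Calling check_setup(".") from the repository root returns a list of messages that can be printed before starting an experiment. The check only confirms that the folder is non-empty; it does not verify that the file placed there is the ORCA manual.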

Identifier
DOI https://doi.org/10.34894/WNRHA4
Related Identifier IsCitedBy https://doi.org/10.26434/chemrxiv-2024-9g2w2
Metadata Access https://dataverse.nl/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.34894/WNRHA4
Provenance
Creator Pollice, Robert; Jacobs, Pieter Floris
Publisher DataverseNL
Contributor Groningen Digital Competence Centre; Pollice, Robert; Jacobs, Pieter Floris; DataverseNL network
Publication Year 2024
Rights CC-BY-4.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by/4.0
OpenAccess true
Contact Groningen Digital Competence Centre (rug.nl)
Representation
Resource Type Python code, ORCA input files, User prompts; Dataset
Format application/zip; application/octet-stream; text/plain
Size 1886040 bytes; 1612 bytes; 265 bytes
Version 1.0
Discipline Chemistry; Natural Sciences
Spatial Coverage Groningen, The Netherlands
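
The Metadata Access entry above is an OAI-PMH GetRecord request. As a rough sketch of how that URL is assembled and retrieved programmatically, the snippet below uses only the Python standard library. The function names build_getrecord_url and fetch_record_xml are hypothetical; the endpoint, verb, metadata prefix, and identifier are taken from the record itself.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_ENDPOINT = "https://dataverse.nl/oai"  # endpoint from the Metadata Access entry


def build_getrecord_url(identifier, metadata_prefix="oai_datacite"):
    """Build an OAI-PMH GetRecord URL for a single record."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": identifier,
    }
    # Keep ':' and '/' unescaped so the URL matches the form shown in the record.
    return f"{OAI_ENDPOINT}?{urlencode(params, safe=':/')}"


def fetch_record_xml(identifier, timeout=30):
    """Download the raw XML response (requires network access)."""
    with urlopen(build_getrecord_url(identifier), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For this dataset, build_getrecord_url("doi:10.34894/WNRHA4") reproduces the Metadata Access URL listed above, and fetch_record_xml with the same argument would return the DataCite-formatted XML, assuming the server is reachable.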