GaMS-Instruct-DH is an instruction-following dataset for fine-tuning Slovene large language models. It consists of prompt-response pairs, some of which also contain a context field and a field listing the source of the information included in the response.
The dataset focuses on prompts from the fields of digital humanities and museum documentation. Its primary goal is to provide a resource that allows large language models already used in digital humanities to be extended to Slovene and to similar but less-resourced languages (e.g., Bosnian).
Version 1.0 includes approximately 10,000 prompt-response pairs, which were compiled entirely by hand by a team of linguists and digital humanities experts.
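To make the record structure concrete (a prompt, a response, an optional context, and a source field), the following is a minimal Python sketch of how one pair might be represented and serialized. The field names `prompt`, `response`, `context`, and `source`, the JSON Lines layout, and the choice to treat `source` as optional are assumptions made for illustration; they are not taken from the dataset's actual schema, which should be checked against the distributed files.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Record:
    """One prompt-response pair. Field names are hypothetical."""
    prompt: str
    response: str
    context: Optional[str] = None  # present only for some pairs
    source: Optional[str] = None   # where the response's information comes from


def to_jsonl_line(record: Record) -> str:
    """Serialize a record to one JSON line, omitting unset optional fields."""
    data = {key: value for key, value in asdict(record).items() if value is not None}
    # ensure_ascii=False keeps Slovene characters (č, š, ž) readable in the output.
    return json.dumps(data, ensure_ascii=False)


def from_jsonl_line(line: str) -> Record:
    """Parse one JSON line back into a Record; missing optional fields become None."""
    data = json.loads(line)
    return Record(
        prompt=data["prompt"],
        response=data["response"],
        context=data.get("context"),
        source=data.get("source"),
    )


if __name__ == "__main__":
    # Placeholder strings only; these are not real dataset entries.
    example = Record(
        prompt="<prompt text>",
        response="<response text>",
        source="<source of the information>",
    )
    print(to_jsonl_line(example))
```

Omitting unset optional fields keeps pairs without context compact, and a loader written this way accepts records both with and without the extra fields.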