A Human-Annotated Dataset for Language Modeling and Named Entity Recognition in Medieval Documents


This is an open dataset of sentences from 19th- and 20th-century letterpress reprints of documents from the Hussite era. The dataset contains a corpus for language modeling and human annotations for named entity recognition (NER).

Identifier
PID http://hdl.handle.net/11234/1-4936
Related Identifier https://nlp.fi.muni.cz/projects/ahisto/ner-dataset
Related Identifier http://hdl.handle.net/11234/1-5024
Related Identifier https://starfos.tacr.cz/en/project/TL03000365
Metadata Access http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11234/1-4936
Provenance
Creator Novotný, Vít; Luger, Kristýna; Štefánik, Michal; Vrabcová, Tereza; Horák, Aleš
Publisher Masaryk University, Brno
Publication Year 2022
Rights Public Domain Dedication (CC Zero); http://creativecommons.org/publicdomain/zero/1.0/; PUB
OpenAccess true
Contact lindat-help(at)ufal.mff.cuni.cz
Representation
Language Czech; English; German; Latin
Resource Type corpus
Format text/plain; charset=utf-8; application/zip; downloadable_files_count: 2
Discipline Linguistics