The news dataset for discriminating between Bosnian, Croatian and Serbian SET...
The SETimes.HBS dataset consists of parallel documents written in Bosnian, Croatian and Serbian, harvested from the now-inactive setimes.com website, which published news in the...
The Twitter user dataset for discriminating between Bosnian, Croatian, Monten...
The Twitter-HBS dataset consists of Twitter users, their tweets, and the label of their predominantly used language: Bosnian, Croatian, Montenegrin, or Serbian. Among the...
A Human-Annotated Dataset of Scanned Images and OCR Texts from Medieval Docum...
This is an open dataset of scanned images and OCR texts from 19th and 20th century letterpress reprints of documents from the Hussite era. The dataset contains human annotations...
A Human-Annotated Dataset of Scanned Images and OCR Texts from Medieval Docum...
These are supplementary materials for an open dataset of scanned images and OCR texts from 19th and 20th century letterpress reprints of documents from the Hussite era. The...