- The Twitter user dataset for discriminating between Bosnian, Croatian, Monten...
The Twitter-HBS dataset consists of Twitter users, their tweets, and the label of their predominantly used language: Bosnian, Croatian, Montenegrin, or Serbian. Among the...
- The news dataset for discriminating between Bosnian, Croatian and Serbian SET...
The SETimes.HBS dataset consists of parallel documents written in Bosnian, Croatian and Serbian, harvested from the now-inactive setimes.com website publishing news in the...
- A Human-Annotated Dataset of Scanned Images and OCR Texts from Medieval Docum...
These are supplementary materials for an open dataset of scanned images and OCR texts from 19th- and 20th-century letterpress reprints of documents from the Hussite era. The...
- A Human-Annotated Dataset of Scanned Images and OCR Texts from Medieval Docum...
This is an open dataset of scanned images and OCR texts from 19th- and 20th-century letterpress reprints of documents from the Hussite era. The dataset contains human annotations...