27 datasets found

Keywords: word normalisation

  • Serbian Twitter training corpus ReLDI-NormTagNER-sr 3.0

    ReLDI-NormTagNER-sr 3.0 is a manually annotated corpus of Serbian tweets. It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation,...
  • Serbian Twitter training corpus ReLDI-NormTag-sr 1.0

    ReLDI-NormTag-sr 1.0 is a manually annotated corpus of Serbian tweets. It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation, word...
  • Croatian Twitter training corpus ReLDI-NormTagNER-hr 3.0

    ReLDI-NormTagNER-hr 3.0 is a manually annotated corpus of Croatian tweets. It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation,...
  • CMC training corpus Janes-Norm 1.0

    Janes-Norm is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is meant as a gold-standard training and testing dataset for tokenisation,...
  • CMC training corpus Janes-Tag 1.1

    Janes-Tag is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is meant as a gold-standard training and testing dataset for tokenisation, sentence...
  • CMC training corpus Janes-Norm 1.1

    Janes-Norm is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is meant as a gold-standard training and testing dataset for tokenisation,...
  • CMC training corpus Janes-Tag 1.0

    Janes-Tag is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is meant as a gold-standard training and testing dataset for tokenisation, sentence...