3 datasets found

Keywords: under-resourced languages

  • Amharic WIC Corpus

    A substantially cleaned version of the existing morphologically annotated WIC Corpus.
  • Somali Web Corpus

    Somali web corpus, crawled by SpiderLing in January 2016. Encoded in UTF-8, cleaned, and deduplicated.
  • Tigrinya Web Corpus

    Tigrinya web corpus, crawled by SpiderLing in January 2016. Encoded in UTF-8, cleaned, and deduplicated.

You can also access this registry through its API (see the API documentation).