4 datasets found

Keywords: web corpora

  • Amharic WIC Corpus

    A substantially cleaned version of the existing morphologically annotated WIC Corpus.
  • Somali Web Corpus

    Somali web corpus. Crawled by SpiderLing in January 2016. Encoded in UTF-8, cleaned, deduplicated.
  • CEHugeWebCorpus

    This corpus was originally created for performance testing of the CorpusExplorer server infrastructure (see diskurslinguistik.net / diskursmonitor.de). It includes the filtered...
  • Tigrinya Web Corpus

    Tigrinya web corpus. Crawled by SpiderLing in January 2016. Encoded in UTF-8, cleaned, deduplicated.
You can also access this registry using the API (see API Docs).