Amharic WIC Corpus
Substantially cleaned version of the existing morphologically annotated WIC Corpus.
Somali Web Corpus
Somali web corpus. Crawled by SpiderLing in January 2016. Encoded in UTF-8, cleaned, deduplicated.
CEHugeWebCorpus
This corpus was originally created for performance testing of the CorpusExplorer server infrastructure (see: diskurslinguistik.net / diskursmonitor.de). It includes the filtered...
Tigrinya Web Corpus
Tigrinya web corpus. Crawled by SpiderLing in January 2016. Encoded in UTF-8, cleaned, deduplicated.