The Montenegrin web corpus meWaC was built by crawling the .me top-level domain in 2019. The corpus was near-deduplicated at the paragraph level, normalised by transliteration into the Latin script, and morphosyntactically annotated, lemmatised and dependency-parsed with a prototype version of the classla pipeline (https://pypi.org/project/classla/). Each document is accompanied by URL and title metadata.
The corpus is available in CoNLL-U format and as a vertical file (with the registry file included) for mounting on CQP-compatible concordancers.
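
For readers who want to process the CoNLL-U release without extra dependencies, the following is a minimal sketch of a reader that relies only on the standard CoNLL-U layout (ten tab-separated columns, comment lines starting with "#", blank lines between sentences). The function name read_conllu and the sample sentence are illustrative assumptions and are not taken from the corpus; the exact comment keys under which meWaC stores its URL and title metadata are not stated here, so the sketch simply collects whatever "key = value" comments it finds.

```python
import io
from typing import Dict, Iterator, List, TextIO, Tuple

# The ten standard CoNLL-U columns, in order.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")

Sentence = Tuple[Dict[str, str], List[Dict[str, str]]]


def read_conllu(stream: TextIO) -> Iterator[Sentence]:
    """Yield (comments, tokens) pairs, one per sentence (hypothetical helper)."""
    comments: Dict[str, str] = {}
    tokens: List[Dict[str, str]] = []
    for raw in stream:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if tokens:
                yield comments, tokens
            comments, tokens = {}, []
        elif line.startswith("#"):
            # Comment lines carry metadata, usually as "key = value".
            key, _, value = line[1:].partition("=")
            comments[key.strip()] = value.strip()
        else:
            cols = line.split("\t")
            if len(cols) == len(FIELDS):
                tokens.append(dict(zip(FIELDS, cols)))
    # Emit a final sentence if the input lacks a trailing blank line.
    if tokens:
        yield comments, tokens


# Invented sample for illustration only; it is not an excerpt from meWaC.
SAMPLE = (
    "# sent_id = example.1\n"
    "# text = Ovo je primjer.\n"
    "1\tOvo\tovaj\tDET\t_\t_\t3\tnsubj\t_\t_\n"
    "2\tje\tbiti\tAUX\t_\t_\t3\tcop\t_\t_\n"
    "3\tprimjer\tprimjer\tNOUN\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
    "\n"
)

for comments, tokens in read_conllu(io.StringIO(SAMPLE)):
    print(comments.get("text"), [t["lemma"] for t in tokens])
```

To read an actual file, the same function can be given a handle opened with open(path, encoding="utf-8"), where path points to a local copy of the CoNLL-U file. Because the reader yields one sentence at a time, it should not need to hold the whole corpus in memory.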