Annotated corpus of Slovenian language-related news articles MetaLangNEWS-Sl

A comprehensive corpus of news articles on the topic of language, published in major Slovenian daily newspapers and news portals over the five-year period from January 1, 2015 to January 1, 2020. The corpus is designed to facilitate research on metalanguage (‘language about language’), linguistic ideologies, language policy and planning, and the specific contemporary debates on defining, naming, and standardising language that are ongoing in post-Yugoslav societies. The corpus has been tagged using the CLASSLA-StanfordNLP models for morphosyntactic annotation and lemmatisation of standard Slovenian. It is available as plain text, as XML with full metadata, and in tagged CoNLL-U format. MetaLangNEWS-Sl is complemented by a separate corpus of citizen metalanguage comments, i.e. online comments on the news articles, available as MetaLangNEWS-COMMENTS-Sl (http://hdl.handle.net/11356/1362). Parallel versions from Croatia (http://hdl.handle.net/11356/1369) and Serbia (http://hdl.handle.net/11356/1371) are also available.
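
For readers who want to work with the tagged version, the following is a minimal sketch of reading CoNLL-U data using only the Python standard library. It relies on the general CoNLL-U conventions (ten tab-separated columns per token, comment lines starting with `#`, a blank line between sentences). The function name `read_conllu` and the sample sentence are illustrative assumptions, not taken from the corpus, and the actual files may carry additional comment lines or metadata.

```python
# Hypothetical helper: a minimal CoNLL-U reader, not part of the corpus distribution.
CONLLU_FIELDS = [
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
]


def read_conllu(lines):
    """Yield sentences as lists of token dicts from an iterable of lines."""
    sentence = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comments (e.g. "# text = ...") are skipped here.
            continue
        columns = line.split("\t")
        if len(columns) != len(CONLLU_FIELDS):
            # Skip malformed lines rather than guessing at their structure.
            continue
        sentence.append(dict(zip(CONLLU_FIELDS, columns)))
    if sentence:
        # Handle input that does not end with a blank line.
        yield sentence


# Invented sample for illustration only; unknown fields are left as "_".
SAMPLE = (
    "# text = Jezik je živ.\n"
    "1\tJezik\tjezik\tNOUN\t_\t_\t_\t_\t_\t_\n"
    "2\tje\tbiti\tAUX\t_\t_\t_\t_\t_\t_\n"
    "3\tživ\tživ\tADJ\t_\t_\t_\t_\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t_\t_\t_\t_\t_\t_\n"
    "\n"
)

sentences = list(read_conllu(SAMPLE.splitlines()))
lemmas = [token["lemma"] for token in sentences[0]]
```

With a real file, the same generator could be fed an open file handle (opened with `encoding="utf-8"`), so the corpus never has to be loaded into memory at once.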

Identifier
PID http://hdl.handle.net/11356/1360
Related Identifier https://ikss.zrc-sazu.si/en/programi-in-projekti/re-imagining-language-nation-and-collective-identity-in-the-21st-century#v
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1360
Provenance
Creator Bogetić, Ksenija; Batanović, Vuk
Publisher ZRC SAZU; Regional Linguistic Data Initiative Centre ReLDI
Publication Year 2020
Rights Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0); https://creativecommons.org/licenses/by-nc-sa/4.0/; PUB
OpenAccess true
Contact info(at)clarin.si
Representation
Language Slovenian; Slovene
Resource Type corpus
Format application/zip; text/plain; charset=utf-8; downloadable_files_count: 3
Discipline Linguistics
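
The Metadata Access entry above is an OAI-PMH `GetRecord` request. As a rough sketch, the URL can be assembled from its parts as shown below; the function name `oai_get_record_url` is hypothetical, and the assumption that other records in the repository follow the same `oai:www.clarin.si:<handle>` identifier pattern is not confirmed by this record.

```python
from urllib.parse import urlencode

# Base endpoint copied from the Metadata Access field of this record.
OAI_ENDPOINT = "http://www.clarin.si/repository/oai/request"


def oai_get_record_url(handle, metadata_prefix="oai_dc"):
    """Build an OAI-PMH GetRecord URL for a handle such as '11356/1360'.

    Assumes the identifier pattern 'oai:www.clarin.si:<handle>' seen in this record.
    """
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": f"oai:www.clarin.si:{handle}",
    }
    # Keep ':' and '/' unescaped so the result matches the form used in the record.
    return f"{OAI_ENDPOINT}?{urlencode(params, safe=':/')}"


url = oai_get_record_url("11356/1360")
```

No network request is made here; the sketch only reconstructs the URL string, which can then be fetched with any HTTP client.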