Wordlist of the Contemporary Corpus of Lithuanian Language in the Face of War in Ukraine

PID

We present the comparative wordlist based on the Corpus of the Contemporary Lithuanian Language (CCLL2 version 2, pre-2020), supplemented by the media (courtesy of the news media company 15min – www.15min.lt) and social networks lexicons of the war in Ukraine period (Feb 2022 to Feb 2024).

For a fair comparison, all word counts have been normalized as if they were 100m words in each source. CCLL2 has 162m words, wartime media – 36m words and wartime social networks – 2m words. The term "word" does not apply here to punctuation, numbers, dates, URL's, hashtags, popular English words, etc.

The data itself is in the form of a tab-separated-values (TSV) text file consisting of the following columns: word(token), CCLL2 count, CCLL2 docs, media count, media docs, social networks count, social networks docs. Where "docs" mean number (normalized) of documents with a particular word. All words are written as case-insensitive using capital letters.

Identifier
PID http://hdl.handle.net/20.500.11821/57
Metadata Access https://clarin.vdu.lt/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:clarin.vdu.lt:20.500.11821/57
Provenance
Creator Dadurkevičius, Virginijus
Publisher SITTI
Publication Year 2024
Rights PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT; https://clarin.vdu.lt/licenses/eula/PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT.htm; PUB
OpenAccess true
Contact info(at)clarin.vdu.lt
Representation
Language Lithuanian
Resource Type lexicalConceptualResource
Format application/zip; text/plain; charset=utf-8; downloadable_files_count: 1
Discipline Linguistics