DIGIRES COVID-19 Corpus v.1

DIGIRES COVID-19 Corpus v.1 consists of 351 Lithuanian media articles about the COVID-19 pandemic. The corpus was compiled from various publicly available Lithuanian online media sources. It contains 351 files in plain text format (TXT) with UTF-8 encoding. Each article consists of a title (on the first line) followed by the article body. The files are divided into two subcorpora: 1) "unreliable", which contains articles identified by professional fact-checkers as fake news; 2) "reliable", which contains trustworthy articles.

Subcorpus     Files   Word tokens
Reliable        175         67902
Unreliable      176        118747
Total           351        186649
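
The file layout described above (UTF-8 plain text, title on the first line, body on the remaining lines) can be read with a few lines of Python. This is a minimal sketch, not part of the corpus distribution: the names `Article`, `parse_article` and `load_subcorpus` are hypothetical, and the assumption that each subcorpus is unpacked into its own directory of `.txt` files should be checked against the actual archive.

```python
from pathlib import Path
from typing import NamedTuple


class Article(NamedTuple):
    """One corpus article: the title line and the remaining body text."""
    title: str
    body: str


def parse_article(text: str) -> Article:
    """Split raw file text into the title (first line) and the body (the rest)."""
    # Drop a UTF-8 byte-order mark if the file happens to start with one.
    text = text.lstrip("\ufeff")
    # Everything before the first newline is the title; the rest is the body.
    title, _, body = text.partition("\n")
    return Article(title.strip(), body.strip())


def load_subcorpus(directory: Path) -> list[Article]:
    """Read every .txt file in a directory as UTF-8.

    Assumption: one directory per subcorpus; adjust to the real archive layout.
    """
    return [
        parse_article(path.read_text(encoding="utf-8"))
        for path in sorted(directory.glob("*.txt"))
    ]
```

For example, `load_subcorpus(Path("reliable"))` would return the parsed articles of one subcorpus, where `"reliable"` is a placeholder for wherever the archive was extracted.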

Identifier
PID http://hdl.handle.net/20.500.11821/53
Related Identifier https://digires.lt/
Metadata Access https://clarin.vdu.lt/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:clarin.vdu.lt:20.500.11821/53
Provenance
Creator Amilevičius, Darius; Utka, Andrius; Meidutė, Aistė; Ruzaitė, Jūratė
Publisher Vytautas Magnus University
Publication Year 2023
Rights PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT; https://clarin.vdu.lt/licenses/eula/PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT.htm; PUB
OpenAccess true
Contact info(at)clarin.vdu.lt
Representation
Language Lithuanian
Resource Type corpus
Format text/plain; application/zip; text/plain; charset=utf-8; downloadable_files_count: 2
Discipline Linguistics