DIGIRES COVID-19 ML Dataset v.1


DIGIRES COVID-19 ML dataset v.1 is a tab-separated (.tsv) file prepared for training machine learning algorithms. The training dataset was compiled from various publicly available Lithuanian online media sources. It contains 351 records and has the following attributes:

"Title": the title of a news article
"Text": the text of the article
"Label": a label that marks the article as 1 (unreliable) or 0 (reliable)

The labels are defined as follows: 1) "unreliable" marks articles that professional fact checkers identified as fake news; 2) "reliable" marks trustworthy articles.
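As an illustration of how a file with these three attributes could be read, here is a minimal Python sketch. It assumes the .tsv file starts with a header row naming the columns exactly "Title", "Text" and "Label", and that fields are not quoted. Neither assumption is confirmed by this record. The helper name read_records and the file name in the comment are hypothetical, and the two-row sample is a made-up stand-in, not data from the dataset.

```python
import csv
import io

# Column names as listed in the description; assumed to appear in a header row.
EXPECTED_COLUMNS = ["Title", "Text", "Label"]


def read_records(handle):
    """Parse tab-separated rows into dicts and check that Label is 0 or 1."""
    # QUOTE_NONE treats quotation marks inside article text as ordinary characters.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    records = []
    for row in reader:
        label = int(row["Label"])
        if label not in (0, 1):
            raise ValueError(f"unexpected label: {label}")
        records.append({"title": row["Title"], "text": row["Text"], "label": label})
    return records


# Made-up two-row stand-in for the real file (1 = unreliable, 0 = reliable).
sample = (
    "Title\tText\tLabel\n"
    "A headline\tSome article text\t0\n"
    "Another headline\tOther article text\t1\n"
)
records = read_records(io.StringIO(sample))
print(len(records), [r["label"] for r in records])

# For the downloaded file (hypothetical name), something like:
# with open("digires_covid19.tsv", encoding="utf-8", newline="") as f:
#     records = read_records(f)
```

If the real file has no header row or uses different column names, the column check above will fail early, which is the intended behaviour of the sketch.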

Classes       Labels   Word tokens
Reliable:     175      67902
Unreliable:   176      118747
Total         351      186649
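
The figures in the table can be cross-checked with a few lines of Python. The sketch below uses only the numbers given above. The dictionary layout and variable names are illustrative choices, not part of the dataset.

```python
# Record and word-token counts copied from the table above.
stats = {
    "reliable": {"records": 175, "tokens": 67902},
    "unreliable": {"records": 176, "tokens": 118747},
}

total_records = sum(s["records"] for s in stats.values())
total_tokens = sum(s["tokens"] for s in stats.values())

# The totals should match the "Total" row: 351 records, 186649 word tokens.
assert total_records == 351
assert total_tokens == 186649

for name, s in stats.items():
    share = s["records"] / total_records      # fraction of all records in this class
    mean_tokens = s["tokens"] / s["records"]  # average article length in word tokens
    print(f"{name}: {share:.1%} of records, {mean_tokens:.1f} word tokens per record")
```

The classes are almost perfectly balanced by record count, but unreliable articles are longer on average (about 675 word tokens per record against about 388), so a model trained on this data could pick up on article length as a cue.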

PID http://hdl.handle.net/20.500.11821/54
Related Identifier https://digires.lt/
Metadata Access https://clarin.vdu.lt/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:clarin.vdu.lt:20.500.11821/54
Creator Amilevičius, Darius; Utka, Andrius; Meidutė, Aistė; Ruzaitė, Jūratė
Publisher Vytautas Magnus University
Publication Year 2023
Rights PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT; https://clarin.vdu.lt/licenses/eula/PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT.htm; PUB
OpenAccess true
Contact info(at)clarin.vdu.lt
Language Lithuanian
Resource Type toolService
Format text/plain; application/zip; text/plain; charset=utf-8; downloadable_files_count: 2
Discipline Linguistics