CLARIN - Repositories

Lithuanian-English Cybersecurity Termbase v.0.1

The bilingual termbase is a TBX export of the online termbase https://www.terminologue.org/csterms/. The termbase includes terms for 233 cybersecurity concepts.

DELFI.lt corpus

DELFI.lt is a corpus of articles published by the news portal DELFI.lt from March 2014 to November 2016. Metadata was collected along with the articles: author, title, date,...

EMVAKA

Two Lithuanian language children’s corpora, collected during the EMVAKA project, consist of the Lithuanian language production by children aged 7–13: (1) spoken (73 files, c....

English-Lithuanian Comparable Vaccination Corpus

Two news portals were selected for comparable corpora building: the Lithuanian portal DELFI and the English portal The Guardian. The compiled corpora comprise 135 Lithuanian...

Lithuanian Spelling Checker V.1.0.45 for macOS

Lithuanian spelling checker for macOS 2020-04-10 version 1.0.45

Lithuanian morphologically annotated corpus - MATAS v1.0

MATAS corpus (version 1.0). DESCRIPTION: Manually checked, morphologically annotated corpus MATAS. FORMATS: 1. CoNLL-U (CONLLU, conllu); 2. SketchEngine - tab delimited word per...

Lithuanian 3-gram dataset

Dataset of 3-grams with frequencies extracted from the Delfi.lt corpus (~ 70 million words, period: March 2014 - November 2016). First, the corpus was split into sentences, then symbol...

Lithuanian 1-gram dataset

Dataset of 1-grams with frequencies extracted from the Delfi.lt corpus (~ 70 million words, period: March 2014 - November 2016). First, the corpus was split into sentences, then...

DIGIRES COVID-19 Corpus v.1

DIGIRES COVID-19 Corpus v.1 consists of 351 Lithuanian media articles about the COVID-19 pandemic. The corpus was compiled from various public Lithuanian online media sources....

Database of Lithuanian Multiword Expressions

Database of Lithuanian multiword expressions (MWE) contains bi-gram and tri-gram MWEs that occurred in the DELFI.lt corpus (http://tekstynas.mwe.lt/) at least 10 times. In the...

Lithuanian font family AISTIKA

Original TrueType font designed and hinted in Lithuania. The font complies with the ISO/IEC 10646 (Unicode) standard and has the full set of casual and accented Lithuanian...

Lithuanian Corpus of the EU Primary and Secondary Law Acts of the Period 2015...

A 274,460-word corpus comprising selected primary and secondary law acts of the EU from the period 2015-2017. The corpus was compiled from documents containing words with the root...

Frequency lists of pivot words and GSE counts

The resource contains data used to estimate the number of words in Lithuanian texts indexed by the selected Global Search Engines (GSE), namely Google (by Alphabet Inc.), Bing...

Dual Pronoun Translation Concordances

The resource offers two data sets: concordances of dual pronoun translations from Lithuanian into English (942 concordance lines) and translations of English pronouns into...

LITIS v.1

Corpus of user-generated comments collected from two Lithuanian portals: www.delfi.lt and www.lrytas.lt. Each comment is in a separate file (TXT). Each file contains: a comment,...

Wordlist of the Contemporary Corpus of Lithuanian language

Frequency wordlists of word forms of the Contemporary Corpus of Lithuanian language. Corpus structure...

Lithuanian 2-gram dataset

Dataset of 2-grams with frequencies extracted from the Delfi.lt corpus (~ 70 million words, period: March 2014 - November 2016). First, the corpus was split into sentences, then symbol...

Corpus KLASIUS v.02

900 extracts for the corpus were collected from manuals and publications for secondary school students included in the compulsory bibliographic descriptions of the university...

Wordlist of Lemmas from the Joint Corpus of Lithuanian

The resource is a wordlist of lemmas from the Joint Corpus of Lithuanian (JCL). The JCL merges three corpora: 1) the Vilnius University corpus compiled from the Lithuanian...

Lithuanian Spelling Checker V.1.0.45 for Linux

Lithuanian spelling checker for Linux 2020-04-07 version 1.0.45

4,731 datasets found