Exploring genealogical blends_Online Corpus
The online corpus supplement to the paper "Exploring genealogical blends: the Surinamese Creole Cluster and the Virgin Islands Dutch Creole Cluster", published in the CLARIN... -
s.morfcorpus.6ec19594.20131227-2309
WMT 2013 Crawled News monolingual corpus, Czech, segmented by Morfessor -
Psycholinguistic Experiment Video
This is a video recording that is being used in psycholinguistic experiments. -
Prague Dependency Treebank 2.0 Sample Data
This is a small sample dataset from PDT 2.0. As such it can be released under a very permissive CC-BY license. -
Interaction and dialogue with large-scale textual data: Parliamentary speeche...
Prof. Dr. Andreas Blätte's keynote talk at the CLARIN Annual Conference 2015. Additional material, including the presented 3D visualisations, is available via... -
Sign Language Interaction
This is a sign language interaction recording made for scientific purposes. -
Replication of part of the IFA corpus
The IFA Spoken Language corpus is a free (GPL) database of hand-segmented Dutch speech. It was constructed with off-the-shelf software using speech from 8 speakers in a variety... -
TXM_0.7.7_Win64.exe
TXM 0.7.7 setup file for 64-bit Windows. TXM is a free and open-source (GPL v3) textual corpora analysis platform. It combines five key components: a) the ability to import and... -
Časování sloves v bengálštině (Verb Conjugation in Bengali)
Description of verbal paradigms in Bengali. The description is written in Czech. -
Language Learning Stimulus Video
This is a video recording that is used for studying language learning by young children. -
Syntactically annotated Czech legal texts
Two legal texts, manually annotated for syntax according to the Prague Dependency Treebank framework. Dependency trees are presented as images. The annotation editor TrEd was... -
Orthography-based dating and localisation of Middle Dutch charters
In this study we build models for the localisation and dating of Middle Dutch charters. First, we extract character trigrams and use these to train a machine learner (K Nearest... -
Annotated Route Description
This file set, consisting of a video stream, an audio stream and a multimodal annotation file, is frequently used as a showcase of how to do complex multimodal annotations with the... -
Sign Language Recording
This is a sign language recording made for scientific purposes. -
Wikipedia paths
Wikipedia category embedding starting at the top category Biology for English, French and Czech. English data are not complete. -
HELLO CAMPANIA! Philippines Collection
The Philippines collection contains data for 66 speakers: 32 first generation (G1), 28 second generation (G2), 6 homeland (G0). The collection contains three folders for each... -
TED-ELH Parallel Corpus
The corpus contains aligned parallel scripts of TED Talks in English, Lithuanian, and Hebrew. It contains spoken language data. -
English-Lithuanian Parallel Cybersecurity Corpus - DVITAS v2.0
English-Lithuanian parallel corpus DVITAS v2 includes original English texts on cybersecurity and their Lithuanian translations aligned on the sentence level. Version 1 of the... -
English-Lithuanian Parallel Cybersecurity Corpus - DVITAS
English-Lithuanian parallel corpus DVITAS includes original English texts on cybersecurity and their Lithuanian translations aligned on the sentence level. The corpus was... -
Lithuanian-English Cybersecurity Termbase v.0.1
The bilingual termbase is a TBX export of the online termbase https://www.terminologue.org/csterms/. The termbase includes terms for 233 cybersecurity concepts.