CLARIN - Repositories

Interaction and dialogue with large-scale textual data: Parliamentary speeche...

Prof. Dr. Andreas Blätte's keynote talk at the CLARIN Annual Conference 2015. Additional material, including the presented 3D visualisations, is available via...

SemMyv - Semantic Database for Erzya

This SQLite database contains Erzya lemmas and their frequencies in a large corpus. The lemmas are linked to each other based on the syntactic relations they have had in the...
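A database of this kind, with lemma frequencies plus lemma-to-lemma syntactic links, could be queried along the lines of the minimal sketch below, which uses Python's standard `sqlite3` module. The schema (tables `lemmas` and `relations` and all their columns), the function `related_lemmas`, and the inserted rows are hypothetical placeholders for illustration; the actual layout of SemMyv is not described here and may differ.

```python
import sqlite3

# HYPOTHETICAL schema: the real SemMyv table and column names may differ.
SCHEMA = """
CREATE TABLE lemmas (
    id INTEGER PRIMARY KEY,
    lemma TEXT UNIQUE NOT NULL,
    frequency INTEGER NOT NULL      -- corpus frequency of the lemma
);
CREATE TABLE relations (
    head_id INTEGER REFERENCES lemmas(id),
    dependent_id INTEGER REFERENCES lemmas(id),
    relation TEXT NOT NULL,         -- syntactic relation label, e.g. 'obj'
    pair_count INTEGER NOT NULL     -- how often the pair occurred in this relation
);
"""


def related_lemmas(conn, lemma, relation):
    """Return (dependent lemma, pair_count) tuples linked to `lemma`
    by `relation`, most frequent first."""
    rows = conn.execute(
        """
        SELECT d.lemma, r.pair_count
        FROM relations AS r
        JOIN lemmas AS h ON h.id = r.head_id
        JOIN lemmas AS d ON d.id = r.dependent_id
        WHERE h.lemma = ? AND r.relation = ?
        ORDER BY r.pair_count DESC
        """,
        (lemma, relation),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Placeholder rows, not real corpus data.
    conn.executemany(
        "INSERT INTO lemmas (id, lemma, frequency) VALUES (?, ?, ?)",
        [(1, "verb_a", 120), (2, "noun_b", 80), (3, "noun_c", 45)],
    )
    conn.executemany(
        "INSERT INTO relations VALUES (?, ?, ?, ?)",
        [(1, 2, "obj", 12), (1, 3, "obj", 30)],
    )
    print(related_lemmas(conn, "verb_a", "obj"))
```

Keeping relations in a separate table keyed by lemma ids lets one lemma take part in many labelled links without duplicating its frequency data.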

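Haur Hezkuntzako ipuin-bilduma (Collection of early childhood education stories)

A collection of the stories worked with in the association of ikastolas (Basque-medium schools) of the Basque Country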

Sign Language Interaction

This is a sign language interaction recording made for scientific purposes.

Natas - Python 3 library for processing historical English

This library will have methods for processing historical English corpora, especially for studying neologisms. The first functionalities to be released relate to normalization of...

SemSms - Semantic Database for Skolt Sami

This SQLite database contains Skolt Sami lemmas and their frequencies in a large corpus. The lemmas are linked to each other based on the syntactic relations they have had in the...

Syntax Maker - The NLG tool for Finnish

Syntax Maker is a natural language generation tool that automatically generates syntactically correct sentences in Finnish. The tool is especially useful in the case of...

Replication of part of the IFA corpus

The IFA Spoken Language corpus is a free (GPL) database of hand-segmented Dutch speech. It was constructed with off-the-shelf software using speech from 8 speakers in a variety...

B2 eta C1 mailetako azterketen etiketatzea eta analisia (Tagging and analysis of B2 and C1 level exams)

We have collected exams written by language learners. They are tests at the B2 and C1 levels of the European framework (CEFR), with 20 samples from each section. These were tagged, and then, using the assigned tags, analysis...

TXM_0.7.7_Win64.exe

TXM 0.7.7 setup file for 64-bit Windows. TXM is a free and open-source (GPL v3) textual corpora analysis platform. It combines five key components: a) the ability to import and...

Časování sloves v bengálštině (Verb conjugation in Bengali)

Description of verbal paradigms in Bengali. The description is written in Czech.

Language Learning Stimulus Video

This is a video recording that is used for studying language learning by young children.

HD graduondokoa (Magia argibideak) (HD postgraduate course: magic trick instructions)

A set of instructions for performing magic tricks

Finnish Locative Cases for Nouns

Picking the right locative case in Finnish can be quite the challenge. Some words seem to prefer an internal locative case, such as Suomessa ('in Finland'), while other words are...

Comparison of the usage of nouns by female and male members of the Polish par...

Dataset based on the Polish Parliamentary Corpus: utterances from male and female Members of Parliament (MPs), extracted from the current (8th) term of the Sejm, between...

Syntactically annotated Czech legal texts

Two legal texts manually annotated for syntax according to the Prague Dependency Treebank framework. Dependency trees are presented as images. The annotation editor TrEd was...

Orthography-based dating and localisation of Middle Dutch charters

In this study we build models for the localisation and dating of Middle Dutch charters. First, we extract character trigrams and use these to train a machine learner (K Nearest...
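The trigram-plus-nearest-neighbour setup described above can be illustrated with a minimal plain-Python sketch. It is not the study's implementation, and several choices are assumptions: features are raw character-trigram counts with spaces padding the text edges, similarity is cosine similarity, and a label is picked by majority vote over the k most similar training texts. The training strings and the labels "A" and "B" are made-up placeholders, not charter data.

```python
import math
from collections import Counter


def char_trigrams(text):
    """Count overlapping character trigrams, padding with spaces so that
    the first and last letters also form trigrams."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, text, k=3):
    """Predict a label for `text` from (text, label) training pairs by
    majority vote over the k most similar training texts."""
    query = char_trigrams(text)
    scored = sorted(
        ((cosine(query, char_trigrams(t)), label) for t, label in train),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Placeholder training data: (text, region label).
    train = [
        ("ende dat", "A"), ("ende dat wi", "A"), ("ende", "A"),
        ("xyz qrs", "B"), ("xyz", "B"),
    ]
    print(knn_predict(train, "xyz qr", k=3))
```

For dating rather than localisation, the same neighbour search could return the neighbours' years instead of region labels; how the study combines them is not stated in the truncated description.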

CLIN26-Bracmat-poster.pdf

Linguistic and algebraic expressions can be analysed with similar pattern matching (PM) methods, suggesting a trove of useful methods for Natural Language Processing (NLP). For...

HABE-IXA euskarazko idazmen proben corpusa / HABE-IXA Basque written test corpus

This corpus contains essays written in official HABE exams for assessing students' knowledge of the Basque language. We have collected 120 essays in each of the B1, B2, C1 and...

Finnish Semantic Relatedness Model

This semantic model captures the relatedness of Finnish words as word vectors. It can be used in various tasks, such as metaphor interpretation. For...
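In vector models of this kind, relatedness between two words is commonly measured as the cosine similarity of their vectors. The minimal sketch below assumes the vectors have already been loaded into a plain Python dict; the model's actual file format, dimensionality and loading API are not described here, and the three-dimensional vectors for kissa ('cat'), koira ('dog') and auto ('car') are invented toy values, not taken from the model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_related(vectors, word, n=2):
    """Rank the other words in `vectors` by cosine similarity to `word`."""
    target = vectors[word]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:n]


# Toy vectors for illustration only; real model vectors have many more dimensions.
toy_vectors = {
    "kissa": [0.9, 0.1, 0.0],  # 'cat'
    "koira": [0.8, 0.2, 0.1],  # 'dog'
    "auto": [0.0, 0.1, 0.9],   # 'car'
}

if __name__ == "__main__":
    print(most_related(toy_vectors, "kissa"))
```

Cosine similarity ignores vector length, so frequent and rare words with similar contexts can still score as closely related.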

4,731 datasets found