-
Replication Data for: On the role of ecological validity in language and spee...
Dataset abstract: This dataset contains the results from 40 language and speech researchers who completed a survey. In the first part of the survey, respondents were asked to... -
Biber et al.'s (2016) set of 150 BNC items for the analysis of dispersion mea...
This dataset contains frequencies for a set of 150 word forms in the BNC. The set of items was compiled by Biber et al. (2016) for the purpose of analyzing the behavior of... -
WiKNN Text Classifier
WiKNN is an online text classifier service for Polish and English texts. It supports hierarchical labelled classification of user-submitted texts with Wikipedia categories.... -
Wittgenstein Archives at the University of Bergen (WAB): WiTTLex - The WiTTFi...
WiTTLex - The WiTTFind Lexicon of Wittgenstein’s Philosophical Nachlass, with Frequency Lists and Indication of the Words’ Sources in the Nachlass. WiTTLex is an electronic... -
UHR's Termbase for Norwegian higher education institutions
This is a collection of 2000 administrative terms with English - Norwegian bokmål/Norwegian bokmål - English and English - Norwegian nynorsk/Norwegian nynorsk - English... -
WordReference (2020-11-10)
A large corpus of native and non-native written speech in four languages. The WordReference corpus is a very large corpus (170M+ words) of native and non-native natural written... -
OpenEDGeS (2021-05-24)
The public license subset of the EDGeS Diachronic Bible Corpus, a diachronically and synchronically parallel corpus of Bible translations in Dutch, English, German and Swedish,... -
Europarl – Swedish-English (2013-11-17)
Part of the European Parliament Proceedings Parallel Corpus. -
LSI (2020-08-25)
Linguistic Survey of India -
The English-Swedish Parallel Corpus (ESPC) (2022-11-15)
ESPC is a combined comparable and parallel corpus suitable for cross-linguistic research of various kinds. The English-Swedish Parallel Corpus (ESPC) was compiled in the 1990s in a... -
English Models (Morphium + WSJ) for MorphoDiTa
English models for MorphoDiTa, providing morphological analysis, morphological generation and part-of-speech tagging. The morphological dictionary is created from Morphium and... -
Bosworth-Toller’s Anglo-Saxon Dictionary online
Description: This is an online edition of An Anglo-Saxon Dictionary, or a dictionary of "Old English". The dictionary records the state of the English language as it was used... -
Khresmoi Summary Translation Test Data 2.0
This package contains data sets for development (Section dev) and testing (Section test) of machine translation of sentences from summaries of medical articles between Czech,... -
Khresmoi Summary Translation Test Data 1.1
This package contains data sets for development and testing of machine translation of sentences from summaries of medical articles between Czech, English, French, and German. -
Khresmoi Query Translation Test Data 2.0
This package contains data sets for development and testing of machine translation of medical queries between Czech, English, French, German, Hungarian, Polish, Spanish and... -
Prep for Adventure: A game for the acquisition of English prepositions
The presented game is designed to teach the six most frequent English prepositions (to, of, in, for, on, and with) at the A1 to A2 levels of proficiency. Prep for Adventure is a... -
English Model (CoNLL-2003) for NameTag
English model for NameTag, a named entity recognition tool. The model is trained on CoNLL-2003 training data. Recognizes PER, ORG, LOC and MISC named entities. Achieves... -
Motion Encoding Lexicalization Patterns: Portuguese and English Learners
General Information: Data collector: Jean Costa Silva (University of Georgia). Date of collection: September-December 2022. Manner of collection: Online questionnaire via... -
Background data for: Advancing our understanding of dispersion measures in co...
Dataset description: This dataset contains background data and supplementary material for Sönning (forthcoming), a study that looks at the behavior of dispersion measures when... -
Background data for: Some obstacles to replication in corpus linguistics
This dataset contains tabular files recording occurrences and frequencies of modal verbs in the Brown family corpora; nine modal verbs (can, could, may, might, must, shall,...