MULTEXT-East "1984" annotated corpus 4.0


The novel "1984" by George Orwell is the central component of the MULTEXT-East corpus. This parallel, sentence-aligned corpus contains the English original of the novel (about 100,000 words) and its translations into a number of languages.

This version of the corpus contains the linguistically annotated texts. Each word is tagged with its lemma and its MULTEXT(-East) morphosyntactic description (MSD), i.e., a fine-grained, feature-structure-based PoS tag.
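To illustrate what "feature-structure-based" means in practice, here is a minimal sketch of expanding an MSD into attribute-value pairs. It assumes the usual MULTEXT-East positional convention: the first character gives the category, each following character gives the value of a category-specific attribute in a fixed order, and "-" marks an attribute that does not apply. The attribute table below (`NOUN_ATTRIBUTES`) and the function name `decode_msd` are hypothetical and illustrative only. The real attribute order and value codes are defined per language in the MULTEXT-East specifications and will differ from this subset.

```python
from typing import Dict, List, Tuple

# One attribute: its name plus a mapping from one-letter code to value name.
Attribute = Tuple[str, Dict[str, str]]

# HYPOTHETICAL, illustrative noun attributes; the actual per-language
# tables in the MULTEXT-East specifications differ.
NOUN_ATTRIBUTES: List[Attribute] = [
    ("Type", {"c": "common", "p": "proper"}),
    ("Gender", {"m": "masculine", "f": "feminine", "n": "neuter"}),
    ("Number", {"s": "singular", "p": "plural"}),
    ("Case", {"n": "nominative", "g": "genitive", "d": "dative", "a": "accusative"}),
]

# Category code -> (category name, ordered attribute list).
CATEGORIES: Dict[str, Tuple[str, List[Attribute]]] = {
    "N": ("Noun", NOUN_ATTRIBUTES),
}


def decode_msd(msd: str) -> Dict[str, str]:
    """Expand a positional MSD string into a feature dictionary."""
    if not msd:
        raise ValueError("empty MSD")
    category, attributes = CATEGORIES[msd[0]]
    if len(msd) - 1 > len(attributes):
        raise ValueError(f"too many positions in MSD {msd!r}")
    features = {"Category": category}
    # Positions after the first character map to attributes in order;
    # trailing positions may be omitted, so zip stops at the shorter list.
    for (name, values), code in zip(attributes, msd[1:]):
        if code == "-":  # attribute not applicable for this word
            continue
        features[name] = values[code]
    return features


print(decode_msd("Ncmsn"))
print(decode_msd("Nc-p"))
```

Under these assumed tables, `Ncmsn` expands to a common masculine singular nominative noun, while `Nc-p` leaves Gender unspecified and omits Case entirely.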

The structurally annotated texts are available as a separate submission (http://hdl.handle.net/11356/1044), which covers a somewhat different set of languages.

Identifier
PID http://hdl.handle.net/11356/1043
Related Identifier https://doi.org/10.1007/s10579-011-9174-8
Related Identifier http://nl.ijs.si/ME/Vault/V4/
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1043
Provenance
Creator Erjavec, Tomaž; Barbu, Ana-Maria; Derzhanski, Ivan; Dimitrova, Ludmila; Garabík, Radovan; Ide, Nancy; Kaalep, Heiki-Jaan; Kotsyba, Natalia; Krstev, Cvetana; Oravecz, Csaba; Petkevič, Vladimír; Priest-Dorman, Greg; QasemiZadeh, Behrang; Radziszewski, Adam; Simov, Kiril; Tufiş, Dan; Zdravkova, Katerina
Publisher Jožef Stefan Institute
Publication Year 2010
Funding Reference info:eu-repo/grantAgreement/EC/FP7/211938
Rights MULTEXT-East licence; https://nl.ijs.si/ME/mte-licence.txt; ACA
OpenAccess true
Contact info(at)clarin.si
Representation
Language Bulgarian; Czech; English; Estonian; Persian (Farsi); Hungarian; Macedonian; Polish; Romanian (Moldavian, Moldovan); Slovak; Slovenian (Slovene); Serbian
Resource Type corpus
Format application/zip; text/plain; charset=utf-8; downloadable_files_count: 1
Discipline Linguistics