UFAL Parallel Corpus of North Levantine 1.0

Dataset

This is the first release of the UFAL Parallel Corpus of North Levantine, compiled by the Institute of Formal and Applied Linguistics (ÚFAL) at Charles University within the Welcome project (https://welcome-h2020.eu/). The corpus consists of 120,600 multiparallel sentences in English, French, German, Greek, Spanish, and Standard Arabic, selected from the OpenSubtitles2018 corpus [1] and manually translated into North Levantine Arabic. It was created to train machine translation systems for North Levantine and the other languages.

Identifier
PID	http://hdl.handle.net/11234/1-5033
Related Identifier	http://ufal.mff.cuni.cz/ufal-parallel-corpus-of-north-levantine
Metadata Access	http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11234/1-5033
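
The Metadata Access URL above is an OAI-PMH GetRecord request that returns the record in Dublin Core (oai_dc). As a minimal sketch, the request could be rebuilt and its response parsed roughly as follows. The helper names (build_get_record_url, fetch_record, parse_dublin_core) are hypothetical and not part of any LINDAT tooling; the sketch assumes the response uses the standard Dublin Core element namespace.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Endpoint and identifier taken from the Metadata Access URL in this record.
OAI_ENDPOINT = "http://lindat.mff.cuni.cz/repository/oai/request"
RECORD_ID = "oai:lindat.mff.cuni.cz:11234/1-5033"

# Standard Dublin Core element namespace (assumed to be used by the response).
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_get_record_url(identifier=RECORD_ID, prefix="oai_dc"):
    """Build an OAI-PMH GetRecord URL; urlencode percent-encodes ':' and '/'."""
    query = urlencode(
        {"verb": "GetRecord", "metadataPrefix": prefix, "identifier": identifier}
    )
    return f"{OAI_ENDPOINT}?{query}"


def fetch_record(url, timeout=30):
    """Download the raw XML response (hypothetical helper; needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_dublin_core(xml_text):
    """Collect every non-empty Dublin Core element into {name: [values, ...]}."""
    root = ET.fromstring(xml_text)
    prefix = "{" + DC_NS + "}"
    fields = {}
    for elem in root.iter():
        if elem.tag.startswith(prefix) and elem.text and elem.text.strip():
            name = elem.tag[len(prefix):]
            fields.setdefault(name, []).append(elem.text.strip())
    return fields
```

Calling parse_dublin_core(fetch_record(build_get_record_url())) would then yield a dictionary keyed by Dublin Core element name (for example "title", "creator", "language"), with repeated elements kept as lists.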

Provenance
Creator	Sellat, Hashem; Saleh, Shadi; Krubiński, Mateusz; Pospíšil, Adam; Zemánek, Petr; Pecina, Pavel
Publisher	Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Publication Year	2023
Funding Reference	info:eu-repo/grantAgreement/EC/H2020/870930
Rights	Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0); http://creativecommons.org/licenses/by-nc-sa/4.0/; PUB
OpenAccess	true
Contact	lindat-help(at)ufal.mff.cuni.cz

Representation
Language	English; French; Spanish (Castilian); Greek, Modern (1453-); German
Resource Type	corpus
Format	text/plain (charset=utf-8); application/octet-stream; 13 downloadable files
Discipline	Linguistics