CALEM (Comprehensive Arabic LEMmas)

PID

Comprehensive Arabic LEMmas is a lexicon covering a large list of Arabic lemmas and their corresponding inflected word forms (stems) with details (POS + Root). Each lexical entry represents a lemma followed by all its possible stems and each stem is enriched by its morphological features especially the root and the POS. It is composed of 164,845 lemmas representing 7,200,918 stems, detailed as follow: 757 Arabic particles 2,464,631 verbal stems 4,735,587 nominal stems The lexicon is provided as an LMF conformant XML-based file in UTF8 encoding, which represents about 1,22 Gb of data. Citation: – Namly Driss, Karim Bouzoubaa, Abdelhamid El Jihad, and Si Lhoussain Aouragh. “Improving Arabic Lemmatization Through a Lemmas Database and a Machine-Learning Technique.” In Recent Advances in NLP: The Case of Arabic Language, pp. 81-100. Springer, Cham, 2020.

Identifier
PID http://hdl.handle.net/11372/LRT-5102
Related Identifier http://arabic.emi.ac.ma/alelm/?page_id=273/#Lexicon
Metadata Access http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11372/LRT-5102
Provenance
Creator Namly, Driss; Bouzoubaa, Karim; El Jihad, Abdelhamid
Publisher ALELM
Publication Year 2020
Rights Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0); http://creativecommons.org/licenses/by-nc-sa/4.0/; PUB
OpenAccess true
Contact lindat-help(at)ufal.mff.cuni.cz
Representation
Language Arabic
Resource Type lexicalConceptualResource
Format text/plain; charset=utf-8; text/xml; downloadable_files_count: 1
Discipline Linguistics