Forum corpus Janes-Forum 1.0

Janes-Forum is an annotated corpus of Slovene forum posts from the websites med.over.net, avtomobilizem.com, and kvarkadabra.net, covering the period 2001-02 to 2015-01. The corpus is structured into forums, threads, and posts, together with their metadata. The texts are tokenised, sentence-segmented, word-normalised, morphosyntactically tagged, lemmatised, and annotated with named entities. To protect privacy and comply with the wishes of the platform owners, usernames are not included in the metadata, and named entities of the types 'person', 'person derivative' and 'company name' have been removed from the texts.

Identifier
PID http://hdl.handle.net/11356/1139
Related Identifier https://doi.org/10.4312/slo2.0.2016.2.67-99
Related Identifier https://nl.ijs.si/janes/viri/avtomatsko-oznaceni-korpusi/#Janes-Forum
Related Identifier https://doi.org/10.1007/s10579-018-9425-z
Related Identifier http://nl.ijs.si/janes/
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1139
Provenance
Creator Erjavec, Tomaž; Ljubešić, Nikola; Fišer, Darja
Publisher Jožef Stefan Institute
Publication Year 2017
Rights Creative Commons - Attribution 4.0 International (CC BY 4.0); https://creativecommons.org/licenses/by/4.0/; PUB
OpenAccess true
Contact info(at)clarin.si
Representation
Language Slovenian; Slovene
Resource Type corpus
Format application/zip; text/plain; charset=utf-8; downloadable_files_count: 2
Discipline Linguistics
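
The Metadata Access URL above appears to be a standard OAI-PMH GetRecord request. As a minimal sketch, it could be rebuilt from its parts as follows; the function name `oai_get_record_url` and the constant names are illustrative assumptions, not part of the record, and no network request is made.

```python
from urllib.parse import urlencode

# Values copied from the record above; the constant names are hypothetical.
OAI_BASE = "http://www.clarin.si/repository/oai/request"
OAI_IDENTIFIER = "oai:www.clarin.si:11356/1139"


def oai_get_record_url(base: str, identifier: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH GetRecord URL (illustrative helper, not an official API)."""
    query = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": identifier,
    }
    # Keep ':' and '/' unescaped so the result matches the form used in the record.
    return base + "?" + urlencode(query, safe=":/")


url = oai_get_record_url(OAI_BASE, OAI_IDENTIFIER)
print(url)
```

Fetching the resulting URL with any HTTP client should return the Dublin Core (`oai_dc`) description of this entry, assuming the repository endpoint is still served at that address.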