Written corpus ccKres 1.0

Dataset

The ccKres corpus consists of 9,376 documents. Each document records its source (e.g. newspapers, magazines), year of publication, text type (fiction, newspaper) and, where known, its title and author. The corpus is POS-tagged and lemmatised, and is encoded in XML following the TEI P5 (Text Encoding Initiative) guidelines. ccKres contains approximately 9% of Kres, a balanced corpus of Slovene: http://eng.slovenscina.eu/korpusi/kres.
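
Because the corpus is POS-tagged, lemmatised and encoded in TEI XML, each token can in principle be read together with its annotations. The following is a minimal Python sketch under stated assumptions: it assumes tokens are stored as `w` elements in the TEI namespace with `lemma` and `msd` attributes, and the sample fragment and its tag values are invented for illustration. Check the element and attribute names against the downloaded files before relying on it.

```python
import xml.etree.ElementTree as ET

# Standard TEI P5 namespace.
TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical fragment for illustration only; the real ccKres markup,
# attribute names and tag values may differ.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><p><s>
    <w lemma="biti" msd="Gp-ste-n">Je</w>
    <w lemma="lep" msd="Ppnmeid">lep</w>
    <w lemma="dan" msd="Somei">dan</w>
    <c>.</c>
  </s></p></body></text>
</TEI>"""


def read_tokens(xml_text):
    """Return (word form, lemma, tag) triples for every <w> element."""
    root = ET.fromstring(xml_text)
    tokens = []
    for w in root.iter(f"{{{TEI_NS}}}w"):
        # Assumed attribute names: "lemma" and "msd".
        tokens.append((w.text or "", w.get("lemma"), w.get("msd")))
    return tokens


tokens = read_tokens(SAMPLE)
```

For the full corpus files, `ET.iterparse` would avoid loading a whole document into memory at once.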

Identifier
PID	http://hdl.handle.net/11356/1034
Related Identifier	http://eng.slovenscina.eu/korpusi/proste-zbirke
Metadata Access	http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1034
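
The Metadata Access URL above is an OAI-PMH `GetRecord` request. As a rough sketch, it can be assembled from the handle part of the PID; the function name `get_record_url` is hypothetical, and the `oai:www.clarin.si:` identifier prefix is simply copied from the URL in this record.

```python
from urllib.parse import urlencode

# Endpoint taken from the Metadata Access field of this record.
BASE = "http://www.clarin.si/repository/oai/request"


def get_record_url(handle, prefix="oai_dc"):
    """Build an OAI-PMH GetRecord URL for a handle such as '11356/1034'."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": prefix,
        "identifier": f"oai:www.clarin.si:{handle}",
    }
    # Keep ':' and '/' unescaped so the result matches the URL in the record.
    return f"{BASE}?{urlencode(params, safe=':/')}"


url = get_record_url("11356/1034")
```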

Provenance
Creator	Logar, Nataša; Erjavec, Tomaž; Krek, Simon; Grčar, Miha; Holozan, Peter
Publisher	Centre for Language Resources and Technologies, University of Ljubljana
Publication Year	2013
Rights	Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0); https://creativecommons.org/licenses/by-nc-sa/4.0/; PUB
OpenAccess	true
Contact	info(at)clarin.si

Representation
Language	Slovenian; Slovene
Resource Type	corpus
Format	application/zip; application/gzip; text/plain; charset=utf-8; downloadable_files_count: 3
Discipline	Linguistics