KER - Keyword Extractor - Dataset

KER is a keyword extractor designed for scanned texts in Czech and English. It is based on the standard tf-idf algorithm, with idf tables trained on Wikipedia texts. To reduce data sparsity, texts are preprocessed with MorphoDiTa, a morphological dictionary and tagger.
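
The tf-idf scoring described above can be illustrated with a minimal Python sketch. This is not KER's actual code: the function names (`build_idf`, `normalize`, `extract_keywords`) are hypothetical, a tiny in-memory corpus stands in for the Wikipedia-trained idf tables, and simple lowercasing with punctuation stripping stands in for MorphoDiTa lemmatization.

```python
import math
from collections import Counter


def normalize(text):
    # Assumption: lowercasing and stripping punctuation stands in for
    # the MorphoDiTa lemmatization step; it is NOT real lemmatization.
    stripped = (word.strip(".,;:!?()\"'").lower() for word in text.split())
    return [word for word in stripped if word]


def build_idf(documents):
    # documents: list of token lists from a background corpus
    # (a stand-in for the Wikipedia texts mentioned in the description).
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def extract_keywords(text, idf, top_n=3):
    tokens = normalize(text)
    if not tokens:
        return []
    # Assumption: terms missing from the idf table get the highest known
    # idf, i.e. they are treated as maximally rare.
    default_idf = max(idf.values(), default=0.0)
    term_freq = Counter(tokens)
    total = len(tokens)
    scores = {
        term: (count / total) * idf.get(term, default_idf)
        for term, count in term_freq.items()
    }
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda term: (-scores[term], term))[:top_n]


if __name__ == "__main__":
    background = [
        normalize("the cat sat on the mat"),
        normalize("the dog sat on the log"),
        normalize("the bird flew over the house"),
    ]
    idf_table = build_idf(background)
    print(extract_keywords("The cat chased the cat", idf_table, top_n=2))
```

In this sketch a word that occurs in every background document, such as "the", receives an idf of zero and therefore never ranks as a keyword, which is the effect the idf tables are meant to achieve.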

Identifier
PID	http://hdl.handle.net/11234/1-1650
Metadata Access	http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11234/1-1650

Provenance
Creator	Libovický, Jindřich
Publisher	Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Publication Year	2016
Rights	Apache License 2.0; http://opensource.org/licenses/Apache-2.0; PUB
OpenAccess	true
Contact	lindat-help(at)ufal.mff.cuni.cz

Representation
Language	Czech; English
Resource Type	toolService
Format	application/x-gzip; application/octet-stream; text/plain; charset=utf-8; downloadable_files_count: 3
Discipline	Linguistics