cSMTiser: word standardisation

Word standardisation for non-standard language, such as that found in user-generated content, using cSMTiser (https://github.com/clarinsi/csmtiser), a tool for text normalisation via character-level machine translation. The tool was trained on the Janes-Norm dataset (http://hdl.handle.net/11356/1084) and background resources.
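
To illustrate what "character-level" means here, the sketch below shows one common way to prepare text for a character-level translation system: each character becomes its own token, and spaces are replaced by a visible symbol so that word boundaries survive. This is a minimal illustration, not cSMTiser's actual code; the function names and the underscore as the space symbol are assumptions made for this example.

```python
def to_char_level(text: str, space_symbol: str = "_") -> str:
    """Split text into space-separated characters.

    Original spaces are replaced by `space_symbol` (an assumed convention)
    so they can be restored later. Input that already contains the symbol
    would not round-trip correctly.
    """
    return " ".join(space_symbol if ch == " " else ch for ch in text)


def from_char_level(tokens: str, space_symbol: str = "_") -> str:
    """Invert to_char_level: join the characters and restore spaces."""
    return "".join(" " if tok == space_symbol else tok for tok in tokens.split(" "))


if __name__ == "__main__":
    encoded = to_char_level("jst sm")
    print(encoded)
    print(from_char_level(encoded))
```

A translation model trained on pairs of such sequences (non-standard input, standard output) can then learn character-level rewrites rather than whole-word substitutions.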

Identifier
PID http://hdl.handle.net/11356/1169
Related Identifier https://github.com/clarinsi/csmtiser
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1169
Provenance
Creator Ljubešić, Nikola; Perovšek, Matic; Erjavec, Tomaž
Publisher Jožef Stefan Institute
Publication Year 2017
OpenAccess true
Contact info(at)clarin.si
Representation
Language Slovenian; Slovene
Resource Type toolService
Format application/octet-stream; downloadable_files_count: 0
Discipline Linguistics
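
The Metadata Access URL listed above is an OAI-PMH GetRecord request. As a rough sketch, it can be assembled from its parts with Python's standard library. The helper name build_getrecord_url is hypothetical; the endpoint and parameter values are copied from this record. The sketch only builds the URL and does not contact the server.

```python
from urllib.parse import urlencode

# Endpoint and identifier as given in this record's Metadata Access field.
OAI_ENDPOINT = "http://www.clarin.si/repository/oai/request"
RECORD_ID = "oai:www.clarin.si:11356/1169"


def build_getrecord_url(identifier: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH GetRecord URL (hypothetical helper for illustration)."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "metadataPrefix": metadata_prefix,
            "identifier": identifier,
        },
        safe=":/",  # keep ':' and '/' unescaped, as in the URL shown in this record
    )
    return f"{OAI_ENDPOINT}?{query}"


if __name__ == "__main__":
    print(build_getrecord_url(RECORD_ID))
```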