-
Digital Publishing for the Humanities - New Technologies and Ideas
Slides presented by the speakers at the International Conference "Digital Publishing for the Humanities - New Technologies and Ideas" - Rome, Bibliotheca Hertziana - Max Planck... -
Databasing on Demand and Federated Search in Manuscript Databases
This seminar showed how to create and link databases into a federated database system. The challenges that have to be overcome in order to build such a system were also shown.... -
Carniolan Provincial Assembly corpus Kranjska 1.0
The corpus contains meeting proceedings of the Carniolan Provincial Assembly from 1861 to 1913 (Obravnave deželnega zbora kranjskega / Bericht über die Verhandlungen des... -
Spoken corpus Gos 2.0 (transcriptions)
The spoken corpus Gos 2.0 is the reference speech corpus of the Slovenian language. This second edition contains about 300 hours of speech, or 2.4 million words, 127 thousand... -
Slovenian parliamentary corpus (1990-2022) siParl 3.0
The siParl corpus contains minutes of the Assembly of the Republic of Slovenia for the 11th legislative period 1990-1992, minutes of the National Assembly of the Republic of... -
Training corpus SUK 1.0
The SUK training corpus contains about 1 million tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, and lemmatisation, with... -
CMC training corpus Janes-Norm 3.0
Janes-Norm is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC) consisting of about 20,000 short texts (280,000 words), mostly tweets but also blogs,... -
CMC training corpus Janes-Tag 3.0
Janes-Tag is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC) consisting of about 15,000 short texts (190,000 words), mostly tweets but also blogs,... -
Corpus of 1968 Slovenian literature Maj68 2.0
The Maj68 corpus contains 1,521 texts by 198 known authors published between 1964 and 1972 in the periodicals "Tribuna", "Problemi" and "Problemi. Literatura." The texts contain... -
Linguistically annotated multilingual comparable corpora of parliamentary deb...
ParlaMint 3.0 is a multilingual set of 26 comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2022, with the individual corpora... -
Multilingual comparable corpora of parliamentary debates ParlaMint 3.0
ParlaMint 3.0 is a multilingual set of 26 comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2022, with the individual corpora... -
Corpus of term-annotated texts RSDO5 1.1
The RSDO5 corpus was compiled in order to serve as a training set for automatic term identification. It consists of 12 texts with 250,000 words and almost 38,000 manually... -
Collection of Slovenian paremiological units Pregovori 1.0
This corpus collects and annotates the extensive and highly valuable diachronic collection of Slovenian proverbs, 50 years and more in the making at the ZRC SAZU Institute of... -
Spoken corpus Gos VideoLectures 4.2 (transcription)
Gos VideoLectures is an add-on to the Gos reference corpus of spoken Slovene (http://hdl.handle.net/11356/1040), and covers public academic speech. It can be used for training... -
Spoken corpus Gos 1.1
Gos is a corpus of spoken Slovene that includes the transcripts of approximately 120 hours of speech recorded in various situations: radio and TV shows, school lessons and... -
Training corpus ssj500k 2.3
The ssj500k training corpus contains about 500,000 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, and lemmatisation.... -
Multilingual comparable corpora of parliamentary debates ParlaMint 2.1
ParlaMint 2.1 is a multilingual set of 17 comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2020, with each corpus being about 20... -
Linguistically annotated multilingual comparable corpora of parliamentary deb...
ParlaMint 2.1 is a multilingual set of 17 comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2020, with each corpus being about 20... -
Corpus of Slovenian school texts SBSJ 1.0
Corpus of Slovenian school texts is a lemmatized and POS-tagged specialized corpus, which includes 428 short school texts written primarily by primary-school students from 1st... -
Linguistically annotated multilingual comparable corpora of parliamentary deb...
ParlaMint is a multilingual set of comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2020, with each corpus being about 20 million...