-
Analysis of the toxicity of Spanish politics on Twitter during the pandemic...
List of the tweets analysed in the research for the article "La toxicidad de la política española en Twitter durante la pandemia de la COVID-19", which is published in the journal... -
Pre-trained POS tagging models for German social media
Pre-trained POS tagging models for the HunPos tagger (Halácsy et al. 2007), the biLSTM-char-CRF tagger (Reimers & Gurevych 2017), and Online-Flors (Yin et al. 2015).... -
German Twitter Titling Corpus
The German Twitter Titling Corpus consists of 1904 stance-annotated tweets, collected in June/July 2018, that mention 24 German politicians with a doctoral degree. The Addendum... -
GermEval-2018 Corpus (DE)
This dataset comprises the training and test data (German tweets) from the GermEval 2018 Shared Task on Offensive Language Detection. -
Twitter accounts of the candidates in the 2023 German state election of Berlin
The research project SPARTA (Society, Politics and Risk with Twitter Analysis; funded by dtec.bw; dtec.bw is funded by the European Union - NextGenerationEU) collected tweets to... -
Diplomats' Twitter Survey 2021
The data are based on a survey aimed at diplomats who use Twitter in their work. The survey was answered by 108 Icelandic, Swedish, Finnish, Danish and Estonian... -
Dataset: tweets and events linked to the paper 'Open-domain extraction of fut...
Input data and output of research conducted in the study described in the paper: F. Kunneman and A. Van den Bosch (2016), Open-domain extraction of future events from Twitter,... -
Dataset: output related to the paper 'Event detection in Twitter: A machine-l...
This dataset features the output of intermediate steps and the final output of the research that is described in the paper: F. Kunneman and A. Van den Bosch (2014), Event... -
Dataset: input and results related to the paper 'Anticipointment detection in...
This dataset features the training models, emotion classifications and emotion patterns before and after events, related to the paper: F. Kunneman, M. van Mulken and A. Van den... -
Dataset: tweets and analyses related to the paper 'The (Un)Predictability of ...
This dataset features all the tweetids and labels that were used to model the language of 24 hashtags, and test the performance on predicting the hashtags in unseen tweets. This... -
Dataset: tweets and analysis related to the paper 'Signaling sarcasm: From hy...
This dataset features training and test tweets as well as insights into the classifier model related to the paper: Kunneman, F.A., Liebrecht, C.C., Mulken, M.J.P. van &... -
Dataset: Events and periodicity analysis related to the paper 'Automatically ...
This dataset features information on all the events that were automatically extracted from Twitter and used as input to periodicity detection, as described in the paper: F.... -
Data: Timely identification of event start dates from Twitter
This directory features data that is discussed in the paper: F. Kunneman, A. Hürriyetoglu, N. Oostdijk and A. Van den Bosch (2014), Timely identification of event start dates... -
Sarcastic Soulmates: Intimacy and irony markers in social media messaging
We study the use of sarcasm on Twitter and show that a computer has more difficulty detecting sarcasm shared among peers than sarcasm shared with any interested audience.... -
Calibrating Twitter Data: Issue Salience and Issue Ownership in Social Media ...
This questionnaire consists of questions about the knowledge of the views of different political parties and the use of Twitter for voicing political opinions. -
Replication Data for: Climate Nags: Affect and the Convergence of Global Risk...
This data set contains the IDs of the 1,186,322 tweets used in "Climate Nags: Affect and the Convergence of Global Risk in Online Networks" (published in Continuum, 2023). The... -
Tweet code-switching corpus Janes-Preklop 1.0
Janes-Preklop is a corpus of Slovene tweets that is manually annotated for code-switching (the use of words from two or more languages within one sentence or utterance),... -
Twitter corpus Janes-Tweet 1.0
Janes-Tweet is an annotated corpus of almost 10 million tweets posted from 2013-06 to 2017-06 by approx. 9,000 users that tweet mostly in Slovene. The corpus is structured into... -
Tweet comma corpus Janes-Vejica 1.0
Janes-Vejica is a corpus of Slovene tweets where commas are annotated with the reason for their (in)correct use, according to the supplied typology. The corpus was sampled from... -
CMC shortening corpus Janes-Kratko 1.0
Janes-Kratko is a corpus of Slovene tweets manually annotated with shortening phenomena according to the supplied typology covering different types of spelling, lexical and...