The corpus used in the paper
- Jennifer Ling, Roman Klinger: “An Empirical, Quantitative Analysis of the Differences between Sarcasm and Irony”. Semantic Sentiment and Emotion Workshop, ESWC, Crete, Greece, 2016.
is made available on this website.
The corpus consists of four subcorpora: irony, sarcasm, regular, and figurative. The figurative subcorpus covers both ironic and sarcastic Tweets and has been subsampled to obtain a balanced corpus. Each training subcorpus consists of 30,000 Tweets; each test subcorpus consists of 3,000 Tweets.
The file format is tab-separated, with one Tweet per line, and contains the following columns:
- screen name
- geo location
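As a rough illustration of reading this format, the following Python sketch splits tab-separated lines into fields. The function name `parse_rows` is hypothetical, and the sketch deliberately makes no assumption about which position each column occupies; check the downloaded files for the actual column order.

```python
import csv
from typing import Iterable, List


def parse_rows(lines: Iterable[str]) -> List[List[str]]:
    """Split tab-separated lines into lists of fields, one list per Tweet.

    `lines` can be an open file object or any iterable of strings.
    QUOTE_NONE keeps quotation marks inside fields as literal characters.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Skip completely empty lines, which csv.reader yields as empty lists.
    return [row for row in reader if row]
```

With a real file this could be called as `parse_rows(open(path, encoding="utf-8", newline=""))`, where `path` points to one of the subcorpus files.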
We cannot make the Tweet texts publicly available. However, you can retrieve them through the Twitter REST API using the Tweet IDs.
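Lookup endpoints typically accept only a limited number of IDs per request, so the Tweet IDs usually need to be grouped into batches first. The sketch below shows one way to do that. The helper name `batch_ids` is hypothetical, and the default batch size of 100 is an assumption that should be checked against the current API documentation. The actual API request is intentionally left out.

```python
from typing import Iterable, Iterator, List


def batch_ids(ids: Iterable[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `batch_size` Tweet IDs.

    The default of 100 is an assumed per-request limit, not a value
    taken from the corpus description.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[str] = []
    for tweet_id in ids:
        batch.append(tweet_id)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Emit the final, possibly shorter, batch.
    if batch:
        yield batch
```

Each yielded batch can then be passed to whichever API client you use to fetch the corresponding Tweet texts.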