Irony Sarcasm Analysis Corpus

The corpus used in the paper

Jennifer Ling, Roman Klinger: “An Empirical, Quantitative Analysis of the Differences between Sarcasm and Irony”. Semantic Sentiment and Emotion Workshop, ESWC, Crete, Greece, 2016

is made available on this website.

The corpus consists of four subcorpora: irony, sarcasm, regular, and figurative. The figurative subcorpus combines ironic and sarcastic Tweets and has been subsampled to obtain a balanced corpus. Each training subcorpus consists of 30,000 Tweets; each test subcorpus consists of 3,000 Tweets.

The files are tab-separated, with one Tweet per line, and contain the following columns:

date
name
screen name
ID
geo location
place
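
The column layout above can be read with a few lines of Python. This is a minimal sketch, not part of the original release: it assumes the files have no header row and that the fields appear in exactly the order listed. The key names, the read_corpus helper, and the sample line are made up for illustration.

```python
import csv
import io

# Key names are our own choice; the order follows the column list above
# (assumption: the files carry no header row).
COLUMNS = ["date", "name", "screen_name", "id", "geo_location", "place"]


def read_corpus(handle):
    """Read tab-separated lines into a list of dicts, one per Tweet."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        # Pad short lines so that every row has all six keys.
        fields = fields + [""] * (len(COLUMNS) - len(fields))
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


# Invented sample line, only to show the expected shape of the data.
sample = "2015-01-01\tExample User\texample_user\t1234567890\t\tSomewhere\n"
rows = read_corpus(io.StringIO(sample))
print(rows[0]["id"])
```

For a real file, pass an open file handle instead of the StringIO object, for example open(path, encoding="utf-8", newline=""), where path points to one of the unpacked subcorpus files.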

We cannot make the Tweet texts publicly available. However, you can obtain them via the Twitter REST API using the Tweet IDs.
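
When retrieving texts by ID, it is usually more efficient to request several Tweets at once. The sketch below only groups IDs into batches and makes no network call. The batch_ids helper is hypothetical, and the default batch size of 100 is an assumption that should be checked against the current API documentation.

```python
def batch_ids(tweet_ids, batch_size=100):
    """Yield consecutive slices of tweet_ids with at most batch_size items.

    batch_size=100 is an assumed limit, not taken from the corpus page.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(tweet_ids), batch_size):
        yield tweet_ids[start:start + batch_size]


# Placeholder IDs for illustration; real IDs come from the ID column.
example_ids = [str(n) for n in range(250)]
batches = list(batch_ids(example_ids))
print(len(batches))
```

Each batch can then be joined with commas and passed to whichever lookup endpoint or client library you use.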

Download a zip file with all data here.
Download a zip file with all data INCLUDING THE TEXT here (password needed).