Irony Sarcasm Analysis Corpus

The corpus used in the paper

  • Jennifer Ling, Roman Klinger: “An Empirical, Quantitative Analysis of the Differences between Sarcasm and Irony”. Semantic Sentiment and Emotion Workshop, ESWC, Crete, Greece, 2016

is made available on this website.

The corpus consists of four subcorpora: irony, sarcasm, regular and figurative. The figurative subcorpus combines ironic and sarcastic Tweets and has been subsampled to obtain a balanced corpus. Each training subcorpus consists of 30,000 Tweets; each test subcorpus consists of 3,000 Tweets.

The files are tab-separated, with one Tweet per line, and contain the following columns:

  • date
  • name
  • screen name
  • ID
  • geo location
  • place
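
The column layout above can be read with a few lines of Python. This is a minimal sketch, assuming the columns appear in the order listed, that the files have no header row, and that empty fields (for example a missing geo location) are left blank. The field names in COLUMNS and the sample line are made up for illustration and are not taken from the corpus.

```python
import csv
import io
from typing import Dict, Iterator, List, TextIO

# Assumed column order, following the list above.
# The key names are our own labels, not part of the file format.
COLUMNS: List[str] = ["date", "name", "screen_name", "id", "geo_location", "place"]


def read_corpus(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per Tweet line from a tab-separated corpus file."""
    # QUOTE_NONE keeps quotation marks in user names as literal characters.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Pad short rows so every record has all six keys.
        row = row + [""] * (len(COLUMNS) - len(row))
        yield dict(zip(COLUMNS, row))


# Invented sample line; real files would be opened with open(path, encoding="utf-8").
sample = "2015-05-01\tJane Doe\tjanedoe\t123456789\t\t\n"
records = list(read_corpus(io.StringIO(sample)))
print(records[0]["id"])
```

The ID is kept as a string here, since Tweet IDs are large integers and are only needed as lookup keys.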

We cannot make the Tweet texts publicly available. However, you can obtain them through the Twitter REST API using the Tweet IDs.
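
When retrieving texts by ID, it is usually more practical to request several IDs at once than one at a time. The sketch below only shows the batching step and leaves out the network call, because this page does not name a specific endpoint or client library. The helper name batch_ids and the default batch size of 100 are assumptions; check the limit against the current API documentation before relying on it.

```python
from typing import Iterable, Iterator, List


def batch_ids(tweet_ids: Iterable[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Group Tweet IDs into lists of at most batch_size items.

    batch_size=100 is an assumed per-request limit, not a documented one.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[str] = []
    for tweet_id in tweet_ids:
        batch.append(tweet_id)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter, batch


# Illustration with made-up IDs "0" to "249".
batches = list(batch_ids([str(i) for i in range(250)], batch_size=100))
print([len(b) for b in batches])
```

Each batch can then be passed to whichever lookup call your API client provides. Tweets that have been deleted or made private since collection will not be returned, so the retrieved set may be smaller than the ID list.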