USAGE Corpus
The USAGE corpus consists of annotations of Amazon reviews for different product categories in German and English. The reviews themselves are not part of this data publication. The annotations are fine-grained and cover aspects and subjective phrases. In addition, each subjective phrase is annotated with its polarity and with the aspects it targets. The corpus consists of 622 English and 611 German reviews for coffee machines, cutlery, microwaves, toasters, trash cans, vacuum cleaners, washing machines and dishwashers. The English part is annotated with more than 8000 aspects and 5000 subjective phrases, the German part with more than 6000 aspects and around 5000 subjective phrases (the exact counts depend on the annotator). Each review is independently annotated by two annotators.
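To make the annotation layers described above more concrete, the following is a minimal sketch of how aspects, subjective phrases, polarity and the target relation could be modelled in memory. It is not the file format of the distribution: the class names, the character-offset spans, the polarity label strings and the example sentence are all assumptions made for illustration. Please consult the readme in the tarball for the actual layout.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Span:
    """Character offsets into the review text (hypothetical representation)."""
    start: int  # inclusive
    end: int    # exclusive


@dataclass(frozen=True)
class Aspect:
    span: Span
    text: str


@dataclass(frozen=True)
class SubjectivePhrase:
    span: Span
    text: str
    polarity: str  # assumed label set, e.g. "positive" / "negative" / "neutral"


@dataclass(frozen=True)
class TargetRelation:
    """Links a subjective phrase to the aspect it evaluates."""
    aspect: Aspect
    phrase: SubjectivePhrase


def aspect_polarities(relations: List[TargetRelation]) -> List[Tuple[str, str]]:
    """Return (aspect text, polarity) pairs derived from the target relations."""
    return [(r.aspect.text, r.phrase.polarity) for r in relations]


# Made-up example sentence, not taken from the corpus.
review = "The handle feels cheap."
aspect = Aspect(Span(4, 10), review[4:10])
phrase = SubjectivePhrase(Span(11, 22), review[11:22], "negative")
relations = [TargetRelation(aspect, phrase)]

print(aspect_polarities(relations))
```

Because each review is annotated by two annotators, one such set of objects per annotator would be needed in practice; the sketch shows a single annotator's view only.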
For detailed information, read the LREC 2014 paper.
The data is available under the Open Data Commons Attribution License (ODC-By) v1.0.
Download here:
- USAGE Corpus v1.0.2: Corrected typo in readme
- USAGE Corpus v1.0.1: Added licence information to readme
- USAGE Corpus v1.0.0: Original distribution
The original data location is here. The data was originally hosted here (both links point to the same files).
For copyright reasons, we cannot publish the Amazon reviews themselves. Therefore, the tarball contains tools to retrieve the review texts from the original websites. Please do not redistribute these reviews. A version of the corpus that includes the review text also exists, but it cannot be made publicly available.
Please also note our machine translation quality corpus, which consists of the German sentences in this corpus, automatically translated to English.
Phrase dictionaries
If you are not interested in the full annotations, the phrases extracted from the German and English corpora with polarity annotations might be an interesting resource for you:
IGGSA Shared Task Test Corpus
In addition, more data has been annotated following the structure of the USAGE corpus. The original data location is here. Detailed information is in the IGGSA shared task paper.
You can also download it here:
- IGGSA Shared Task Test Corpus v1.0.0: Original distribution