Please see the corpus website for an introduction to this data set.
The version numbers in parentheses indicate the versions we used; other versions may or may not work.
We recommend a miniconda environment; a possible setup is sketched after the dependency list.
- python (3.5.2)
- numpy (1.12.0)
- scipy (0.18.1)
- gensim (0.13.4.1)
- scikit-learn (0.18.1)
- tensorflow (1.0.0)
- matplotlib (2.0.0)
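For convenience, here is one way such an environment could be created. This is a sketch, not part of the repository: the environment name `million-posts` is arbitrary, and these old package builds may no longer be available on every conda channel or platform.

```sh
# Sketch only: create an isolated environment with the versions pinned above.
# The name "million-posts" is arbitrary; availability of these old builds
# may vary by conda channel and platform.
conda create -n million-posts python=3.5.2 \
    numpy=1.12.0 scipy=0.18.1 scikit-learn=0.18.1 matplotlib=2.0.0
source activate million-posts  # "conda activate million-posts" on newer conda
# gensim and tensorflow at these exact versions are often easier to get
# via pip (use tensorflow-gpu==1.0.0 instead for GPU support):
pip install gensim==0.13.4.1 tensorflow==1.0.0
```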
To obtain reproducible results, parallel execution is disabled at several points in the code. Enabling it would make things run considerably faster, but the results would no longer be exactly reproducible. As it stands, all results are exactly reproducible, with the exception of the LSTM part.
Total duration: about 4.5 hours on a machine with the following specs:
- Intel Core i7-6900K, 8x 3.20GHz
- 64 GB RAM (4x16GB DDR4-2133)
- 1TB SSD
- NVIDIA Titan X (Pascal)
To run everything, simply execute

```sh
./run.sh
```
The script will ask whether you want to download the corpus (this requires `wget` or `curl`, and `bzip2`).
If you are interested only in certain parts, comment out what you don't need in `run.sh` and in `src/run_evaluation.py` (in particular, you can comment out entries from `methodmodules` if you want to run only certain methods; see the sketch below).
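For illustration only, here is a hypothetical sketch of what disabling a method via `methodmodules` might look like; the actual names and structure in `src/run_evaluation.py` may differ.

```python
# Hypothetical sketch; the real methodmodules in src/run_evaluation.py may
# be structured differently. Commenting out an entry skips that method.
methodmodules = [
    bow,      # bag-of-words
    mnb,      # multinomial naive Bayes
    # nbsvm,  # commented out: NBSVM is skipped in this run
    bocid,
    d2v,      # doc2vec
    lstm,
]
```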
The code produces some output in the directories `logs`, `plots` and `tables`. We have included our results here, so you can see what to expect. The code also creates a directory `models`, which will be about 500 MB in size. The entire `experiments` folder, including the downloaded corpus, will be almost 1 GB in size. The table below shows the results we obtained:
| Category | Measure | BOW | MNB | NBSVM | BOCID | D2V | LSTM |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Negative Sentiment | Precision | 0.5521 | 0.5637 | 0.5660 | 0.5345 | 0.5842 | 0.5349 |
| | Recall | 0.5109 | 0.4867 | 0.4512 | 0.5452 | 0.5624 | 0.7197 |
| | F1 | 0.5307 | 0.5224 | 0.5021 | 0.5398 | 0.5731 | 0.6137 |
| Positive Sentiment | Precision | 0.1000 | 0.0000 | 0.2353 | 0.0662 | 0.0397 | 0.0000 |
| | Recall | 0.0698 | 0.0000 | 0.0930 | 0.2093 | 0.4651 | 0.0000 |
| | F1 | 0.0822 | 0.0000 | 0.1333 | 0.1006 | 0.0731 | 0.0000 |
| Off-topic | Precision | 0.2754 | 0.6190 | 0.3969 | 0.2252 | 0.2065 | 0.2742 |
| | Recall | 0.2379 | 0.0224 | 0.1328 | 0.5121 | 0.6241 | 0.2638 |
| | F1 | 0.2553 | 0.0433 | 0.1990 | 0.3128 | 0.3103 | 0.2689 |
| Inappropriate | Precision | 0.1627 | 0.0000 | 0.1765 | 0.1516 | 0.1340 | 0.1964 |
| | Recall | 0.1122 | 0.0000 | 0.0495 | 0.3993 | 0.5776 | 0.1089 |
| | F1 | 0.1328 | 0.0000 | 0.0773 | 0.2198 | 0.2175 | 0.1401 |
| Discriminating | Precision | 0.1847 | 0.0000 | 0.2683 | 0.1301 | 0.1111 | 0.1136 |
| | Recall | 0.1028 | 0.0000 | 0.0780 | 0.2943 | 0.3936 | 0.1418 |
| | F1 | 0.1321 | 0.0000 | 0.1209 | 0.1804 | 0.1733 | 0.1262 |
| Feedback | Precision | 0.6554 | 0.7465 | 0.7356 | 0.5094 | 0.5240 | 0.6307 |
| | Recall | 0.5803 | 0.4074 | 0.5219 | 0.6879 | 0.7056 | 0.6287 |
| | F1 | 0.6156 | 0.5271 | 0.6106 | 0.5853 | 0.6014 | 0.6297 |
| Personal Stories | Precision | 0.6981 | 0.5491 | 0.6916 | 0.5762 | 0.6247 | 0.6380 |
| | Recall | 0.5920 | 0.4578 | 0.4788 | 0.7120 | 0.8123 | 0.6658 |
| | F1 | 0.6407 | 0.4993 | 0.5658 | 0.6369 | 0.7063 | 0.6516 |
| Arguments Used | Precision | 0.6105 | 0.5086 | 0.6064 | 0.5642 | 0.5657 | 0.5685 |
| | Recall | 0.5215 | 0.3170 | 0.4628 | 0.6106 | 0.6614 | 0.6458 |
| | F1 | 0.5625 | 0.3906 | 0.5250 | 0.5865 | 0.6098 | 0.6047 |
| Wins | Precision | 2 | 2 | 2 | 0 | 1 | 1 |
| | Recall | 0 | 0 | 0 | 0 | 7 | 1 |
| | F1 | 0 | 0 | 1 | 3 | 2 | 2 |