In collaboration with Wikipedia editors, we developed a 13-category taxonomy of the semantic intentions behind edits to Wikipedia articles. Using labeled article edits, we built a computational classifier of edit intentions that achieves a micro-averaged F1 score of 0.621.
conda create --name wiki_edit_intention python=3.5
source activate wiki_edit_intention
pip install mwapi
pip install revscoring
You might also need to install some dependencies (e.g., scipy, numpy, and scikit-learn).
To generate the features for each revision, please run:
python ./feat_src/wiki_edit_main.py edit_intention_dataset.csv
This will generate an ARFF file named "edit_intention_dataset.feats.arff".
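As a quick sanity check on the generated file, the attribute names in its header can be listed with a few lines of standard-library Python. This is a minimal sketch that assumes the file follows the usual ARFF layout (`@attribute` declarations followed by a `@data` section); the helper name `arff_attribute_names` is hypothetical and is not part of this repository.

```python
def arff_attribute_names(lines):
    """Return the attribute names declared in an ARFF header.

    `lines` is any iterable of text lines (for example, an open file).
    Hypothetical helper for inspection only; not part of this repository.
    """
    names = []
    for line in lines:
        stripped = line.strip()
        lower = stripped.lower()
        if lower.startswith("@data"):
            # The header ends where the data section begins.
            break
        if lower.startswith("@attribute"):
            rest = stripped[len("@attribute"):].strip()
            if not rest:
                continue  # malformed declaration, skip it
            if rest[0] in ("'", '"'):
                # Quoted name: take everything up to the closing quote.
                end = rest.find(rest[0], 1)
                names.append(rest[1:end] if end != -1 else rest[1:])
            else:
                # Unquoted name: the first whitespace-separated token.
                names.append(rest.split()[0])
    return names
```

To inspect the generated file, open "edit_intention_dataset.feats.arff" in text mode and pass the file object to the function.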
To predict the edit intentions for a set of revisions, please run:
python ./pred_src/wiki_model.py edit_intention_dataset.feats.arff test_file_to_be_predicted.arff
To retrieve the content of each revision, please use:
https://en.wikipedia.org/wiki/WP:Labels?diff=<replace_with_revision_id>
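When many revisions need to be inspected, the URL pattern above can be filled in programmatically. The following is a small sketch; the function name `revision_diff_url` is illustrative and not part of this repository, and it assumes revision IDs are positive integers.

```python
# URL pattern taken from the line above; only the revision ID varies.
DIFF_URL_TEMPLATE = "https://en.wikipedia.org/wiki/WP:Labels?diff={rev_id}"


def revision_diff_url(rev_id):
    """Build the diff URL for one revision ID (hypothetical helper)."""
    rev_id = int(rev_id)  # accepts ints or numeric strings
    if rev_id <= 0:
        raise ValueError("revision IDs are assumed to be positive integers")
    return DIFF_URL_TEMPLATE.format(rev_id=rev_id)
```

For example, calling `revision_diff_url("123456")` returns the pattern with `123456` substituted for the placeholder.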
The mapping between edit intentions and their numeric labels is shown below:
{
    'counter-vandalism': 0,
    'fact-update': 1,
    'refactoring': 2,
    'copy-editing': 3,
    'other': 4,
    'wikification': 5,
    'vandalism': 6,
    'simplification': 7,
    'elaboration': 8,
    'verifiability': 9,
    'process': 10,
    'clarification': 11,
    'disambiguation': 12,
    'point-of-view': 13
}
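To turn numeric labels back into readable intention names, the mapping can be inverted. This is a minimal sketch: the names `INTENTION_TO_LABEL`, `LABEL_TO_INTENTION`, and `decode_labels` are illustrative, and it assumes predictions are available as integer label indices. The output format of the prediction script is not described here, so adapt the input handling as needed.

```python
# Mapping copied from the dictionary above (14 entries, including 'other').
INTENTION_TO_LABEL = {
    'counter-vandalism': 0,
    'fact-update': 1,
    'refactoring': 2,
    'copy-editing': 3,
    'other': 4,
    'wikification': 5,
    'vandalism': 6,
    'simplification': 7,
    'elaboration': 8,
    'verifiability': 9,
    'process': 10,
    'clarification': 11,
    'disambiguation': 12,
    'point-of-view': 13,
}

# Inverse lookup: numeric label -> intention name.
LABEL_TO_INTENTION = {label: name for name, label in INTENTION_TO_LABEL.items()}


def decode_labels(label_ids):
    """Translate an iterable of numeric labels into intention names.

    Hypothetical helper; raises KeyError for labels outside 0..13.
    """
    return [LABEL_TO_INTENTION[int(label_id)] for label_id in label_ids]
```

For instance, `decode_labels([3, 9])` yields the names for copy-editing and verifiability.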
To use our trained word embeddings for Wikipedia article revisions, please download them from this link: https://goo.gl/An7DZP (wiki_revision_trained_embedding.bin)
If you use our tools for your work, please cite the following paper:
- Yang, Diyi, Aaron Halfaker, Robert Kraut, and Eduard Hovy. "Identifying Semantic Edit Intentions from Revisions in Wikipedia." In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 2000-2010. 2017.
This project is licensed under the MIT License - see the LICENSE file for details.