ML Methods
Structure still WIP. Ideas for paper: classification (FA, GA, Stub, etc.), regression (0-100%), or a mix of both (somehow use class confidences to estimate a value).
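A rough sketch of the "mix of both" idea (purely an assumption about how it could work): assign each quality class a nominal score and take the probability-weighted average of those scores under the classifier's confidences.

```python
# Hypothetical sketch: turn class confidences into a 0-100% quality estimate
# by averaging nominal per-class scores weighted by predicted probabilities.

# Nominal score per class (assumption, ordered Stub -> FA).
class_scores = {"Stub": 0, "Start": 20, "C": 40, "B": 60, "GA": 80, "FA": 100}

def expected_quality(class_probs: dict) -> float:
    """class_probs maps class name -> predicted probability (summing to 1)."""
    return sum(p * class_scores[c] for c, p in class_probs.items())

# Example: a classifier torn between GA and FA yields a score near 90%.
print(expected_quality({"Stub": 0.0, "Start": 0.0, "C": 0.0, "B": 0.05, "GA": 0.45, "FA": 0.5}))
```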
RNN + LSTM (need to investigate more)
Uses the whole article as input, but takes a long time to train; infeasible for real-time prediction.
Results: Achieved accuracy of 60-70%
Dataset: Wikimedia Foundation English/Russian/French
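A minimal sketch of this kind of RNN/LSTM classifier over raw article text, assuming Keras/TensorFlow; the placeholder data and layer sizes are illustrative, not the cited setup.

```python
# LSTM classifier over whole-article word sequences (sketch with placeholder data).
import numpy as np
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

texts = ["example article text ...", "another article ..."]  # placeholder articles
labels = np.array([0, 1])                                     # placeholder quality classes

tokenizer = Tokenizer(num_words=20000)
tokenizer.fit_on_texts(texts)
X = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=1000)  # pad/truncate long articles

model = Sequential([
    Embedding(input_dim=20000, output_dim=128),
    LSTM(64),                        # single recurrent layer over the word sequence
    Dense(6, activation="softmax"),  # e.g. FA/GA/B/C/Start/Stub
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(X, labels, epochs=3, batch_size=32)  # training on full articles is what makes this slow
```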
[2] Dalip, 2009. Automatic quality assessment of content created collaboratively by web communities: A case study of wikipedia
Support Vector Regression
Information gain measure (infogain) was used to evaluate the impact of the chosen features.
The performance of the method was evaluated using mean squared error (MSE) and Normalized Discounted Cumulative Gain at top k (NDCG@k).
Results: MSE of 0.82 when using all features
Dataset: English Wikipedia (2009, ~2,683,000 articles)
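A minimal scikit-learn sketch of support vector regression over article features, scored with MSE, plus mutual information as a stand-in for the infogain ranking; the feature matrix and targets here are placeholders, not Dalip's feature set.

```python
# Support Vector Regression over article features, evaluated with MSE (sketch).
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.feature_selection import mutual_info_regression

X = np.random.rand(500, 20)   # placeholder: 20 structural/text features per article
y = np.random.rand(500) * 5   # placeholder: quality score per article

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.1))
model.fit(X_train, y_train)
print("MSE:", mean_squared_error(y_test, model.predict(X_test)))

# Mutual information as an information-gain-style ranking of feature impact.
print(mutual_info_regression(X_train, y_train))
```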
[5] Saengthongpattana, 2014. Assessing the quality of Thai Wikipedia articles using concept and statistical features
Decision Tree and Naive Bayes, with Naive Bayes achieving the best results. However, they used "Concept" features, which are a much more manual and subjective measure.
Dataset: Thai Wikipedia (2014, ~85000 articles)
SVM with RBF kernel
Results: F-score of 0.8568
Dataset: ~12000 articles, 30/70 featured/low-quality
Method: 5-fold cross validation
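A minimal sketch of an RBF-kernel SVM evaluated with 5-fold cross-validation and F-score, assuming precomputed article features (placeholder data below).

```python
# SVM (RBF kernel) with 5-fold cross-validation, scored by F1 (sketch).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = np.random.rand(1000, 15)             # placeholder article features
y = np.random.randint(0, 2, size=1000)   # 0 = low-quality, 1 = featured

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
scores = cross_val_score(clf, X, y, cv=5, scoring="f1")
print("Mean F-score:", scores.mean())
```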
Several ML approaches were experimented with. Accuracy and AUC were evaluated. Some hyperparameters are specified in the paper.
Method | Accuracy |
---|---|
Linear Regression | 25% |
Multinomial Logistic Regression | 60% |
KNN | 55% |
CART | 48% |
SVM | 61% |
Random Forest | 64% (58% w/o readability scores) |
Some features are more important than others, with difficult_words, content_length, num_references, and num_page_links being the most important; see Fig. 3 in the article for the full list.
Dataset: ~20,000 articles with qualities of FA, GA, B, C, Start, and Stub, close to evenly distributed.
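A minimal random-forest sketch showing how per-feature importances (like those in Fig. 3) can be read off the trained model; the feature names below are the ones highlighted above, used on placeholder data.

```python
# Random forest on article features, reporting feature importances (sketch).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

feature_names = ["difficult_words", "content_length", "num_references", "num_page_links"]
X = np.random.rand(2000, len(feature_names))   # placeholder feature values
y = np.random.randint(0, 6, size=2000)         # FA/GA/B/C/Start/Stub as 0..5

rf = RandomForestClassifier(n_estimators=200, random_state=0)
print("CV accuracy:", cross_val_score(rf, X, y, cv=5).mean())

rf.fit(X, y)
for name, importance in zip(feature_names, rf.feature_importances_):
    print(f"{name}: {importance:.3f}")
```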
[12] Bassani, 2019. Automatically assessing the quality of Wikipedia contents
Method | Accuracy | MSE |
---|---|---|
Decision Tree | 47.4% | 1.883 |
K-NN | 42.4% | 2.123 |
Logistic Regression | 49.7% | 1.359 |
Naive Bayes | 30.4% | 3.573 |
Random Forest | 59.2% | 1.167 |
Support Vector Classifier | 50.6% | 1.358 |
Neural Networks | 50.3% | 1.204 |
Gradient Boosting | 61.8% | 0.919 |
Dataset: 400 articles randomly chosen from each quality class, 2800 in total
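Bassani reports both accuracy and MSE for each classifier, which suggests the quality classes are encoded as ordered integers; a minimal gradient-boosting sketch of that evaluation on placeholder data (not the paper's features):

```python
# Gradient boosting evaluated with both accuracy and MSE over ordinal class labels (sketch).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, mean_squared_error

X = np.random.rand(2800, 25)             # placeholder features (2800 articles, as in the paper)
y = np.random.randint(0, 7, size=2800)   # quality classes as ordered integers 0..6

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
gb = GradientBoostingClassifier().fit(X_train, y_train)
pred = gb.predict(X_test)
print("Accuracy:", accuracy_score(y_test, pred))
print("MSE:", mean_squared_error(y_test, pred))  # penalizes predictions far from the true class
```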
[13] Wang, 2019. A deep learning-based quality assessment model of collaboratively edited documents: A case study of Wikipedia
Method | Accuracy |
---|---|
Stacked LSTMs | 71.9%
DNN | 68.7%
CNN | 63.4%
CNN + LSTM | 67.3%
LSTM w/ Dropout | 67.9%
Basic LSTM | 69.0%
Bidirectional LSTM | 69.7%
For comparison, the non-deep-learning baselines:
Method | Accuracy |
---|---|
Decision Tree | 71.1% |
SVM | 70.8% |
K-NN | 66.3% |
Naive Bayes | 59.9% |
Note that there were only three classes (high/medium/low quality)
Dataset: 3294 articles from English Wikipedia (Wikimedia Downloads)
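A minimal Keras sketch of the stacked-LSTM idea (two recurrent layers, the first returning its full sequence); sizes and sequence length are illustrative, not Wang's configuration.

```python
# Stacked LSTM classifier: two LSTM layers, the first returning the full sequence (sketch).
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.models import Sequential

model = Sequential([
    Input(shape=(1000,)),              # padded/truncated word-index sequences
    Embedding(input_dim=20000, output_dim=128),
    LSTM(64, return_sequences=True),   # first LSTM feeds its whole output sequence forward
    LSTM(64),                          # second LSTM reduces it to a single summary vector
    Dense(3, activation="softmax"),    # high/medium/low quality
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```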
[14] Velichety, 2019. Quality assessment of peer-produced content in knowledge repositories using big data and social networks: The case of implicit collaboration in wikipedia
Method: Ten-fold Cross Validation, with Hyper-parameter optimization
Algorithms Experimented With: "logistic regression, C5.0, Adaboost, Bayesian networks, etc., and found that C5.0 gives the best results in this case."
Accuracy: 84.7% (note that only four classes were used)
Dataset: 4.7 Million articles (Entire English Wikipedia)
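C5.0 itself is an R/commercial implementation; as a rough stand-in, here is a scikit-learn decision tree with ten-fold cross-validation and grid-search hyperparameter optimization on placeholder data.

```python
# Decision tree (stand-in for C5.0) with 10-fold CV and grid-search tuning (sketch).
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

X = np.random.rand(5000, 20)             # placeholder collaboration/network features
y = np.random.randint(0, 4, size=5000)   # four quality classes, as in the paper

grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [5, 10, 20, None], "min_samples_leaf": [1, 5, 20]},
    cv=10,
    scoring="accuracy",
)
grid.fit(X, y)
print("Best params:", grid.best_params_, "CV accuracy:", grid.best_score_)
```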
[16] Couto, 2021. Assessing the quality of health-related Wikipedia articles with generic and specific metrics
No ML methods were experimented with.
[19] Dang, 2016. Quality assessment of Wikipedia articles without feature engineering
Deep learning approach with four hidden layers, where the authors feed the articles themselves to the networks rather than hand-crafted features.
"In this paper, we applied the unsupervised learning algorithm called Paragraph Vector, recently known as Doc2Vec that learns vector representations for variable-length pieces of texts and overcomes the disadvantages of bag-of-words by taking into account the order and semantics of words."
"In this approach every word and every paragraph are mapped to a unique vector."
Accuracy: 55.5%
Dataset: 30,000 Wikipedia articles (English Wikipedia)
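A minimal gensim sketch of the Paragraph Vector (Doc2Vec) step: learn one vector per article, then feed those vectors to a downstream classifier. The toy corpus and the logistic-regression head are placeholders, not the paper's four-hidden-layer network.

```python
# Doc2Vec article embeddings fed to a simple classifier (sketch with toy data).
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.linear_model import LogisticRegression

texts = ["first example article text", "second example article text"]  # placeholder articles
labels = [0, 1]                                                         # placeholder quality classes

docs = [TaggedDocument(words=t.split(), tags=[i]) for i, t in enumerate(texts)]
d2v = Doc2Vec(docs, vector_size=100, window=5, min_count=1, epochs=20)  # one vector per article

X = [d2v.dv[i] for i in range(len(texts))]  # learned paragraph vectors
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(X))
```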
[33] Blumenstock, 2008. Size matters: Word count as a measure of quality on Wikipedia
Method | Accuracy |
---|---|
None (>2000 words = Featured) | 96.94% |
MLP | 97.15% |
K-NN | 96.94% |
Random Forest | 95.8% |
Note that there were only two classes (featured/non-featured)
Dataset: 11067 articles (1554 Featured / 9513 Random)
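The "None" row above is just a word-count threshold; a trivial sketch of that rule:

```python
# Word-count baseline: predict "featured" when the article exceeds 2000 words (sketch).
def predict_featured(article_text: str, threshold: int = 2000) -> bool:
    return len(article_text.split()) > threshold

print(predict_featured("some article text " * 1500))  # True (about 4500 words)
```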
[45] Lipka, 2010. Identifying featured articles in Wikipedia: Writing style matters
The method used was ten-fold cross-validation. Domain transfer was also experimented with, but yielded overall worse results.
Method | F-Score |
---|---|
SVM (character trigram) | 0.964 |
SVM (POS trigram) | 0.941 |
Naive Bayes | 0.904
Note that there were only two classes (featured/non-featured)
Dataset: 760 articles of English Wikipedia (400 from History, 360 from Biology)
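A minimal scikit-learn sketch of the character-trigram SVM idea (character 3-gram features into a linear SVM, scored by F1 with ten-fold cross-validation); the vectorizer settings and toy corpus are assumptions, not Lipka's exact setup.

```python
# Linear SVM over character-trigram features, F1 with 10-fold CV (sketch, toy corpus).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

texts = ["featured article text ...", "ordinary article text ..."] * 50  # placeholder corpus
labels = [1, 0] * 50                                                     # 1 = featured, 0 = non-featured

clf = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(3, 3)),  # character trigrams of the raw text
    LinearSVC(),
)
print("Mean F-score:", cross_val_score(clf, texts, labels, cv=10, scoring="f1").mean())
```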
[52] De La Calzada, 2010. On measuring the quality of wikipedia articles
The experimentation was mostly user-centered, and we're not interested in that part. However, the stabilized and controversial models were tested with an SVM classifier, achieving accuracies of ~78% for Stabilized and ~92% for Controversial.
Dataset: 96 Wikipedia articles for each classifier
[64] Stvilia, 2005. Information quality in a community-based encyclopedia
C4.5 Decision Tree with 10-fold cross validation achieved ~90% precision and recall, with two classes (featured/random).
Dataset: 1070 (236/834 Featured/Random) Wikipedia articles
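C4.5 itself is typically run through Weka; as a rough stand-in, a scikit-learn decision tree with the entropy criterion, ten-fold cross-validation, and precision/recall scoring on placeholder data.

```python
# Entropy-criterion decision tree (stand-in for C4.5), 10-fold CV with precision/recall (sketch).
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_validate

X = np.random.rand(1070, 19)             # placeholder quality-metric features
y = np.random.randint(0, 2, size=1070)   # 1 = featured, 0 = random

tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
scores = cross_validate(tree, X, y, cv=10, scoring=["precision", "recall"])
print("Precision:", scores["test_precision"].mean())
print("Recall:", scores["test_recall"].mean())
```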