Machine Learning Methods

Structure is still a work in progress. Ideas for the paper: classification (FA, GA, Stub, etc.), regression (0-100%), or a mix of both (e.g., using class confidences to estimate a continuous value).
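Here is a minimal sketch of the "mix of both" idea. It assumes a classifier that outputs per-class confidences: each class is mapped to a point on the 0-100 scale, and the estimate is the confidence-weighted average. The class order and the evenly spaced scores are hypothetical choices and are not taken from any of the papers below.

```python
# Hypothetical ordering of quality classes, lowest to highest (assumption).
CLASSES = ["Stub", "Start", "C", "B", "GA", "FA"]


def class_scores(classes):
    """Assign evenly spaced scores on a 0-100 scale to the ordered classes."""
    step = 100 / (len(classes) - 1)
    return {name: i * step for i, name in enumerate(classes)}


def expected_quality(probabilities, classes=CLASSES):
    """Collapse per-class confidences into one 0-100 estimate (expected score)."""
    total = sum(probabilities.values())
    if total <= 0:
        raise ValueError("confidences must sum to a positive value")
    scores = class_scores(classes)
    # Divide by the total so confidences that do not sum exactly to 1 still work.
    return sum(scores[name] * p for name, p in probabilities.items()) / total
```

A classifier that is torn between GA and FA then yields a score between 80 and 100 rather than a hard label.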

[1] Dang, 2017. An end-to-end learning solution for assessing the quality of Wikipedia articles.

RNN + LSTM (needs further investigation)

Uses the whole article as input, but takes a long time to train, which makes it infeasible for real-time prediction.

Results: Achieved accuracy of 60-70%

Dataset: Wikimedia Foundation English/Russian/French

[2] Dalip, 2009. Automatic quality assessment of content created collaboratively by web communities: A case study of Wikipedia

Support Vector Regression

The information gain measure (InfoGain) was used to evaluate the impact of the chosen features.
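To recall what the InfoGain ranking measures, here is a minimal sketch for a discrete (or already binned) feature. It computes the class entropy minus the weighted class entropy within each feature value. The sketch assumes that continuous features are binned beforehand, and the function names are illustrative.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in class entropy obtained by splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Entropy remaining after the split, weighted by group size.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional
```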

The performance of the method was evaluated using mean squared error (MSE) and Normalized Discounted Cumulative Gain at top k (NDCG@k).
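The following sketch implements both evaluation measures. It assumes that the true quality scores also serve as relevance grades for NDCG. It uses the linear-gain form rel / log2(rank + 1) with 1-based ranks. NDCG has other variants, such as exponential gain 2^rel - 1, and these notes do not record which one Dalip et al. used.

```python
from math import log2


def mse(true_values, predicted_values):
    """Mean squared error between true and predicted quality scores."""
    pairs = list(zip(true_values, predicted_values))
    return sum((t - p) ** 2 for t, p in pairs) / len(pairs)


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k relevance values."""
    # enumerate() is 0-based, so rank + 2 equals the 1-based rank + 1.
    return sum(rel / log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(true_values, predicted_values, k):
    """NDCG@k: rank items by predicted score, use true scores as relevance."""
    order = sorted(range(len(predicted_values)),
                   key=lambda i: predicted_values[i], reverse=True)
    ranked = [true_values[i] for i in order]
    best = dcg_at_k(sorted(true_values, reverse=True), k)
    return dcg_at_k(ranked, k) / best if best > 0 else 0.0
```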

Results: MSE of 0.82 when using all features

Dataset: English Wikipedia (2009, ~2683000 articles)

[5] Saengthongpattana, 2014. Assessing the quality of Thai Wikipedia articles using concept and statistical features

Decision Tree and Naive Bayes, with Naive Bayes achieving the best results. However, they used "concept" features, which are a much more manual and subjective measure.

Dataset: Thai Wikipedia (2014, ~85000 articles)

[6] Su, 2016. A psycho-lexical approach to the assessment of information quality on Wikipedia

SVM with RBF kernel

Results: F-score of 0.8568

Dataset: ~12000 articles, split 30/70 between featured and low-quality
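For reference, the RBF kernel compares two feature vectors as exp(-γ‖x − y‖²). The F-score is the harmonic mean of precision and recall. The sketch below computes both in plain Python. The γ default and the "featured" positive label are placeholders. These notes also do not record whether the reported F-score is for the featured class alone or averaged over both classes.

```python
from math import exp


def rbf_kernel(x, y, gamma=0.1):
    """RBF (Gaussian) kernel: exp(-gamma * ||x - y||^2); gamma is a placeholder."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-gamma * squared_distance)


def f1_score(true_labels, predicted_labels, positive="featured"):
    """F1 score (harmonic mean of precision and recall) for one positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```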

[9] Dang, 2017. Measuring quality of collaboratively edited documents: The case of Wikipedia

Method: 5-fold cross-validation

Several ML approaches were tested and evaluated by accuracy and AUC. Some hyperparameters are specified in the paper.
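Here is a minimal sketch of k-fold cross-validation with k = 5, as in [9]. It is written against a generic, hypothetical `fit_predict(train_X, train_y, test_X)` callable, so any of the classifiers listed below could be plugged in. The shuffling seed and the fold assignment are assumptions, since the paper's exact splits are not known.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    if not 1 < k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_validate(fit_predict, features, labels, k=5, seed=0):
    """Mean accuracy over k folds; fit_predict is a hypothetical callable."""
    scores = []
    for train, test in k_fold_indices(len(labels), k, seed):
        predictions = fit_predict([features[i] for i in train],
                                  [labels[i] for i in train],
                                  [features[i] for i in test])
        correct = sum(p == labels[i] for p, i in zip(predictions, test))
        scores.append(correct / len(test))
    return sum(scores) / len(scores)
```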

Method	Accuracy
Linear Regression	25%
Multinomial Logistic Regression	60%
KNN	55%
CART	48%
SVM	61%
Random Forest	64% (58% w/o readability scores)

Some features are more important than others, with difficult_words, content_length, num_references and num_page_links being the most important. For the full ranking, see Fig. 3 of the article.

Dataset: ~20000 articles with quality classes FA, GA, B, C, Start and Stub, close to evenly distributed.

[12] Bassani, 2019. Automatically assessing the quality of Wikipedia contents

Method	Accuracy	MSE
Decision Tree	47.4%	1.883
K-NN	42.4%	2.123
Logistic Regression	49.7%	1.359
Naive Bayes	30.4%	3.573
Random Forest	59.2%	1.167
Support Vector Classifier	50.6%	1.358
Neural Networks	50.3%	1.204
Gradient Boosting	61.8%	0.919

Dataset: 400 articles randomly chosen from each of the seven quality classes, 2800 in total
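Reporting MSE alongside accuracy for classifiers only makes sense if the quality classes are treated as ordered. The sketch below illustrates that idea. It assumes the seven classes are the English Wikipedia scale (Stub, Start, C, B, GA, A, FA), encoded as ranks 0 to 6. Bassani's exact encoding is not recorded in these notes.

```python
# Assumed ordinal encoding of the seven quality classes (lowest to highest).
ORDER = ["Stub", "Start", "C", "B", "GA", "A", "FA"]
RANK = {name: i for i, name in enumerate(ORDER)}


def accuracy_and_mse(true_classes, predicted_classes):
    """Accuracy plus MSE computed on ordinal class ranks (distance-aware error)."""
    t = [RANK[c] for c in true_classes]
    p = [RANK[c] for c in predicted_classes]
    n = len(t)
    accuracy = sum(a == b for a, b in zip(t, p)) / n
    mse = sum((a - b) ** 2 for a, b in zip(t, p)) / n
    return accuracy, mse
```

MSE penalizes far misses more than near misses. As a result, models with similar accuracy can differ noticeably in MSE, as with the Support Vector Classifier (50.6%, 1.358) and Neural Networks (50.3%, 1.204) above.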

[13] Wang, 2019. A deep learning-based quality assessment model of collaboratively edited documents: A case study of Wikipedia

Deep Learning Methods

Method	Accuracy
Stacked LSTMs	71.9%
DNN	68.7%
CNN	63.4%
CNN + LSTM	67.3%
LSTM w/ Dropout	67.9%
Basic LSTM	69.0%
Bidirectional LSTM	69.7%

Non-Deep Learning Methods

Method	Accuracy
Decision Tree	71.1%
SVM	70.8%
K-NN	66.3%
Naive Bayes	59.9%

Note that there were only three classes (high/medium/low quality)

Feature Set Accuracy (see Table 5 in [13])

Dataset: 3294 articles from English Wikipedia (Wikimedia Downloads)

[14] Velichety, 2019. Quality assessment of peer-produced content in knowledge repositories using big data and social networks: The case of implicit collaboration in Wikipedia

Method: Ten-fold cross-validation with hyperparameter optimization

Algorithms experimented with: "logistic regression, C5.0, Adaboost, Bayesian networks, etc., and found that C5.0 gives the best results in this case."

Accuracy: 84.7% (note the reduced number of classes: 4)

Dataset: 4.7 million articles (entire English Wikipedia)

[16] Couto, 2021. Assessing the quality of health-related Wikipedia articles with generic and specific metrics

No ML methods were experimented with.

[19] Dang, 2016. Quality assessment of Wikipedia articles without feature engineering

Deep learning approach with four hidden layers, in which the authors feed the articles themselves to the neural network instead of hand-engineered features.

"In this paper, we applied the unsupervised learning algorithm called Paragraph Vector, recently known as Doc2Vec that learns vector representations for variable-length pieces of texts and overcomes the disadvantages of bag-of-words by taking into account the order and semantics of words."

"In this approach every word and every paragraph are mapped to a unique vector."

Accuracy: 55.5%

Dataset: 30000 English Wikipedia articles

[33] Blumenstock, 2008. Size matters: Word count as a measure of quality on Wikipedia

Method	Accuracy
None (>2000 words = Featured)	96.94%
MLP	97.15%
K-NN	96.94%
Random Forest	95.8%

Note that there were only two classes (featured/non-featured)

Dataset: 11067 articles (1554 featured / 9513 random)
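The "None" row is a pure threshold rule. The sketch below assumes that words are counted by splitting on whitespace. Blumenstock's actual counting procedure, such as how wiki markup is handled, is not reproduced here.

```python
def word_count(text):
    """Count words as whitespace-separated tokens (a simplifying assumption)."""
    return len(text.split())


def predict_featured(text, threshold=2000):
    """Threshold baseline: label an article featured if it has > threshold words."""
    return word_count(text) > threshold
```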

[45] Lipka, 2010. Identifying featured articles in Wikipedia: Writing style matters

Ten-fold cross-validation was used. Domain transfer was also tested, but yielded worse results overall.

Method	F-Score
SVM (character trigram)	0.964
SVM (POS trigram)	0.941
Naive Bayes	0.904

Note that there were only two classes (featured/non-featured)

Dataset: 760 articles of English Wikipedia (400 from History, 360 from Biology)
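Character trigram features represent an article by the frequencies of its overlapping three-character substrings. The sketch below covers only the extraction step. The relative-frequency normalization is an assumption, and the resulting vectors would still need to be fed to an SVM.

```python
from collections import Counter


def char_trigrams(text):
    """Relative frequencies of overlapping character 3-grams in text."""
    grams = [text[i:i + 3] for i in range(len(text) - 2)]
    counts = Counter(grams)
    total = sum(counts.values())
    # Texts shorter than three characters produce no trigrams.
    return {g: c / total for g, c in counts.items()} if total else {}
```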

[52] De La Calzada, 2010. On measuring the quality of Wikipedia articles

The experiments were mostly user-centered, which is outside our scope. However, the stabilized and controversial models were tested with an SVM classifier, achieving accuracies of ~78% for the stabilized model and ~92% for the controversial model.

Dataset: 96 Wikipedia articles for each classifier

[64] Stvilia, 2005. Information quality in a community-based encyclopedia

C4.5 decision tree with 10-fold cross-validation achieved ~90% precision and recall, with two classes (featured/random).

Dataset: 1070 Wikipedia articles (236 featured / 834 random)
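C4.5 chooses splits by gain ratio. This is the information gain divided by the split information, i.e. the entropy of the partition itself, which penalizes features with many distinct values. The following is a self-contained sketch for discrete features only. It omits C4.5's handling of continuous attributes, missing values and pruning.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gain_ratio(feature_values, labels):
    """C4.5 split criterion: information gain divided by split information."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy(feature_values)  # entropy of the partition itself
    return gain / split_info if split_info > 0 else 0.0
```

For example, a feature with a unique value per article separates the classes perfectly. It still gets a lower gain ratio than a balanced binary feature with the same gain.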
