Skip to content

aml-spring-19/Moderating-Science-Subreddit

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

23 Commits
 
 
 
 
 
 

Repository files navigation

Moderating the science subreddit /r/science

Authors: Arnaud Stiegler, Redouane Dziri

The dataset is a csv of about 30k reddit comments made in /r/science between Jan 2017 and June 2018. 10k of the comments were removed by moderators; the original text for these comments was recovered using the pushshift.io API. Each comment is a top-level reply to the parent post and has a comment score of 14 or higher.

(find the data here: https://www.kaggle.com/areeves87/rscience-popular-comment-removal)

This project aims at accurately classifying removed comments, leveraging some NLP tools (using scikit-learn and nltk) , with the following questions in mind:

Can we help reduce moderator burnout by automating comment removal? What features are most predictive of popular comments getting removed?

About

Using NLP to moderate Science Subreddit

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published