Moderating the science subreddit /r/science

Authors: Arnaud Stiegler, Redouane Dziri

The dataset is a CSV of about 30k Reddit comments made in /r/science between January 2017 and June 2018. 10k of the comments were removed by moderators; the original text of these comments was recovered using the pushshift.io API. Each comment is a top-level reply to the parent post and has a comment score of 14 or higher.

(The data is available here: https://www.kaggle.com/areeves87/rscience-popular-comment-removal)

This project aims to accurately classify removed comments using NLP tools (scikit-learn and nltk), with the following questions in mind:

Can we help reduce moderator burnout by automating comment removal? What features are most predictive of popular comments getting removed?
