- Get familiar with the R programming language
- Review Linear Regression, Logistic Regression
- Learn new Machine Learning Techniques: Random Forest, Support Vector Machine
- Introduction to digital trace data
- Learn how to use the Twitter API
- Practice text mining techniques: structural topic modeling, sentiment analysis
- Salganik, M. (2019). Bit by bit: Social research in the digital age. Princeton University Press. (optional)
- Wickham, H., & Grolemund, G. (2016). R for data science: Import, tidy, transform, visualize, and model data. O'Reilly Media. (required, also available for free)
- Silge, J., & Robinson, D. (2017). Text mining with R: A tidy approach. O'Reilly Media. (required, available as an online book)
- Additional articles and reports on GitHub
- Introduction to Data Science & Ethical Issues in Data Science (Weeks 1-2)
- R Programming Language & Machine Learning 101 (object-oriented language, regression review, support vector machine, principal component analysis, and deep learning) (Weeks 3-8)
- Text Mining (digital trace data, scraping Twitter and forums, using APIs, topic modeling) (Weeks 9-11)
- Final Project (Weeks 12-16)
- NCANDS - The National Child Abuse and Neglect Data System (NCANDS) is a voluntary data collection system that gathers information from all 50 states, the District of Columbia, and Puerto Rico about reports of child abuse and neglect.
- Kaggle - Kaggle is an online community of data scientists and machine learning practitioners. It allows users to find and publish data sets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges.
- OpenNYC - Open Data is free public data published by New York City agencies and other partners.
- RedPillWomen Subreddit: https://www.reddit.com/r/RedPillWomen/. Ask the professor for the dataset.
- Podcasting Subreddit: https://www.reddit.com/r/podcasting/. Ask the professor for the dataset.
- Gab.com dataset. Ask the professor for the dataset.
- Twitter hashtags: Scraping Twitter using R packages. This topic will be covered in the Text Mining part of the class.
- NYC Teachers' #sickout during the COVID-19 pandemic.
- Existing tweet datasets (more than 800 million tweets): https://tweetsets.library.gwu.edu/
- DataCamp has free resources for R: https://www.datacamp.com/users/sign_up?redirect=%2Fcourses%2Ffree-introduction-to-r%2Fcontinue
- Guest speakers will visit class and share their data science journeys.
- Join the class #DataHunterS2020 Slack channel: https://join.slack.com/t/datahunters2020/signup
- Join the R-Ladies Community Slack channel if you want to reach out to women in data science: https://rladies-community-slack.herokuapp.com/
- Listen to the Data Skeptic podcast: https://dataskeptic.com/

If you have more questions, please raise an issue or email [email protected].