In this project we explored data science job openings listed on Glassdoor. We analyzed the postings to see which roles were most in demand, where and in which industries these jobs were located, what salary ranges were offered for different positions, and which skills companies were looking for in candidates.
See here for the source of our data and here for the data collection process used to scrape the Glassdoor website.
To help you navigate this project and find our analysis, the repo is organized as follows:
This folder contains our original data files as well as the code used to process and clean them. In this code we cleaned up variables, handled missing values, and added new features to our data set. We also used Cost of Living Index (COI) data to scale salaries by cost of living.
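The cost-of-living scaling described above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual cleaning code: the function name `adjust_salary`, the baseline index of 100, and the sample metro areas and figures are all assumptions made up for the example.

```python
def adjust_salary(salary, col_index, baseline=100.0):
    """Scale a nominal salary to a baseline cost-of-living level.

    Assumes the index is expressed relative to `baseline` (100 here),
    so a metro with index 150 is treated as 50% more expensive.
    """
    if col_index <= 0:
        raise ValueError("cost-of-living index must be positive")
    return salary * baseline / col_index


# Hypothetical postings: (metro area, salary in USD, cost-of-living index)
postings = [
    ("Metro A", 120_000, 150.0),
    ("Metro B", 90_000, 90.0),
]

# Map each metro area to its cost-of-living-adjusted salary
adjusted = {
    metro: round(adjust_salary(salary, idx))
    for metro, salary, idx in postings
}
print(adjusted)
```

With these made-up numbers, the nominally higher salary in "Metro A" ends up lower than the one in "Metro B" after adjustment, which is the kind of comparison the scaling is meant to support.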
This folder stores the tidy data set used for all of our analysis, along with a data dictionary that lists every variable in the data set with a brief description of each.
The bulk of our analysis is in this folder. We started by looking at the different roles within the field of data science. Next, we looked at the 5 industries with the most data science roles in our data set, followed by an analysis of Glassdoor ratings and counts for companies with data science roles. We then looked at salary ranges for data science roles across metro areas, positions, and the top 5 industries. Lastly, we performed text analysis to see which skills a candidate should work on when looking for data science roles.
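As a rough sketch of the kind of skill-focused text analysis mentioned above, the snippet below counts how many job descriptions mention each skill keyword. It is not the analysis used in this project: the `SKILLS` list, the `count_skills` helper, and the sample descriptions are hypothetical, and simple keyword matching is only one possible approach.

```python
import re
from collections import Counter

# Hypothetical skill keywords; a real list would be longer
SKILLS = ["python", "sql", "r", "spark", "tableau"]


def count_skills(descriptions, skills=SKILLS):
    """Count the number of descriptions that mention each skill."""
    counts = Counter()
    for text in descriptions:
        # Tokenize into lowercase words so that "r" only matches the
        # standalone token, not the letter inside other words
        tokens = set(re.findall(r"[a-z+#]+", text.lower()))
        counts.update(skill for skill in skills if skill in tokens)
    return counts


# Made-up job descriptions for illustration only
descriptions = [
    "Experience with Python and SQL required; Spark is a plus.",
    "Strong SQL skills, plus R or Python for modeling.",
    "Build dashboards in Tableau and write SQL queries.",
]

skill_counts = count_skills(descriptions)
for skill, n in skill_counts.most_common():
    print(f"{skill}: {n}")
```

Each description contributes at most one count per skill, so the result reflects how many postings ask for a skill rather than how often the word is repeated.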
This folder contains our final report and slide deck for our presentation. Our video presentation can be found here.