Leonardo Cavalcante Araújo
Data Analytics Full-Time FEB2021, Paris, March 2021
Individual project developed in an afternoon, using a Glassdoor database found on the Kaggle website.
The project had two distinct objectives:
- Derive statistically significant insights from the database.
- Build a regression model for a chosen variable (in this project, we chose linear regression to predict the probability of a crime occurring on a given date under given circumstances).
Steps developed:
- Database search and download, finally settling on an open-source database found at this Kaggle link.
- Data exploration and cleaning.
- Data analysis & visualisations: using Python, Matplotlib and Seaborn.
- Hypothesis testing: to test for statistically significant events.
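A minimal sketch of the cleaning and hypothesis-testing steps, assuming the data sits in a pandas DataFrame with hypothetical `gender` and `base_pay` columns (the real column names in the Kaggle file may differ). It uses Welch's two-sample t-test (`scipy.stats.ttest_ind` with `equal_var=False`), which does not assume equal variances between groups; this is one reasonable choice, not necessarily the test used in the project. The toy data is synthetic and says nothing about the Glassdoor results.

```python
import numpy as np
import pandas as pd
from scipy import stats


def clean(df: pd.DataFrame, group_col: str, value_col: str) -> pd.DataFrame:
    """Drop rows missing either column and normalise the group labels."""
    out = df.dropna(subset=[group_col, value_col]).copy()
    # Harmonise labels such as "Male " and "male" into one category
    out[group_col] = out[group_col].str.strip().str.lower()
    return out


def welch_t_test(df, group_col, value_col, group_a, group_b, alpha=0.05):
    """Welch's two-sample t-test of value_col between two groups."""
    a = df.loc[df[group_col] == group_a, value_col]
    b = df.loc[df[group_col] == group_b, value_col]
    result = stats.ttest_ind(a, b, equal_var=False)
    p_value = float(result.pvalue)
    # Returns (t statistic, p-value, whether p < alpha)
    return float(result.statistic), p_value, p_value < alpha


# Synthetic toy data for illustration only, NOT the Glassdoor dataset
rng = np.random.default_rng(0)
pay = np.concatenate([rng.normal(100, 10, 50), rng.normal(85, 10, 50)])
toy = pd.DataFrame({
    "gender": ["Male "] * 50 + ["female"] * 50 + [None],
    "base_pay": np.append(pay, 95.0),
})

cleaned = clean(toy, "gender", "base_pay")
t_stat, p_value, significant = welch_t_test(cleaned, "gender", "base_pay", "male", "female")
print(f"t = {t_stat:.2f}, p = {p_value:.3g}, significant at 5%: {significant}")
```

A small p-value only says the difference in means is unlikely under the null hypothesis of equal means; it does not by itself explain the cause of the gap.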
Next steps to be developed:
- Linear regression using OLS (Ordinary Least Squares): choose a suitable target variable to predict, possibly the salary.
- Assumption testing: verify that the assumptions of the OLS model hold.
- Presentation: build the Google Slides deck.
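The planned OLS step and its assumption checks could look roughly like this sketch. It fits OLS with `numpy.linalg.lstsq` and checks three common assumptions: normality of residuals (Shapiro-Wilk), homoscedasticity (the studentized Breusch-Pagan LM statistic, computed by hand here) and independence (Durbin-Watson). The predictor `years_experience` and the salary target are hypothetical, and the data is synthetic. In practice, `statsmodels` provides these diagnostics ready-made.

```python
import numpy as np
from scipy import stats


def fit_ols(X, y):
    """Ordinary least squares with an intercept column prepended."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    return beta, residuals, design


def r_squared(y, residuals):
    """Share of the variance of y explained by the fit."""
    return 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)


def check_assumptions(design, residuals):
    """Common diagnostics for the OLS assumptions."""
    n, k = design.shape
    # Normality of residuals: Shapiro-Wilk (small p suggests non-normal residuals)
    _, shapiro_p = stats.shapiro(residuals)
    # Homoscedasticity: studentized Breusch-Pagan, LM = n * R^2 from
    # regressing the squared residuals on the same regressors
    e2 = residuals ** 2
    gamma, *_ = np.linalg.lstsq(design, e2, rcond=None)
    lm = n * r_squared(e2, e2 - design @ gamma)
    bp_p = stats.chi2.sf(lm, df=k - 1)
    # Independence: Durbin-Watson, values near 2 mean no first-order autocorrelation
    dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)
    return {
        "shapiro_p": float(shapiro_p),
        "breusch_pagan_p": float(bp_p),
        "durbin_watson": float(dw),
    }


# Synthetic data: hypothetical years_experience -> salary, NOT Glassdoor values
rng = np.random.default_rng(1)
years_experience = rng.uniform(0, 10, 200)
salary = 30 + 5 * years_experience + rng.normal(0, 2, 200)

beta, residuals, design = fit_ols(years_experience, salary)
diagnostics = check_assumptions(design, residuals)
print("intercept, slope:", beta.round(2))
print("R^2:", round(r_squared(salary, residuals), 3))
print(diagnostics)
```

Computing the diagnostics by hand keeps the sketch dependent only on NumPy and SciPy; with `statsmodels` installed, its fitted-model results and diagnostic functions would be the more idiomatic route.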
- Repository "https://github.com/leo-cavalcante/glassdoor-pay-gap": contains the main Python notebooks produced to carry out the analysis, visualisations and predictive models.
- Individual project.
Here you may find the relevant links for the main documents produced during this project: