This project contains Python scripts that I created as a personal exercise in:
· Data acquisition & wrangling · Data visualisation · Predictive modelling
The dataset is available at the following URL and captures green taxi activity in NYC in 2015: https://data.cityofnewyork.us/Transportation/2015-Green-Taxi-Trip-Data/gi8d-wdg5/about
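As a starting point, the sketch below reads a CSV export of the dataset with pandas, parses the timestamps and derives a trip duration. It is a minimal illustration, not the code in this repository. The column names (`lpep_pickup_datetime`, `Lpep_dropoff_datetime`, `Trip_distance`, `Tip_amount`) and the helper `load_trips` are assumptions; check them against the header of the file you actually download. The two sample rows are made up.

```python
import io

import pandas as pd


def load_trips(source) -> pd.DataFrame:
    """Read a green-taxi trip CSV into a DataFrame with parsed timestamps.

    Assumption: the export has pickup/dropoff columns whose names match
    'lpep_pickup_datetime' / 'lpep_dropoff_datetime' once lower-cased.
    """
    df = pd.read_csv(source)
    # Normalise headers so that e.g. 'Lpep_dropoff_datetime' and
    # 'lpep_dropoff_datetime' end up with the same name.
    df.columns = df.columns.str.strip().str.lower()
    for col in ("lpep_pickup_datetime", "lpep_dropoff_datetime"):
        df[col] = pd.to_datetime(df[col])
    # Derived feature: trip duration in minutes.
    df["duration_min"] = (
        df["lpep_dropoff_datetime"] - df["lpep_pickup_datetime"]
    ).dt.total_seconds() / 60
    return df


# Usage with two made-up rows standing in for the downloaded CSV.
sample = io.StringIO(
    "lpep_pickup_datetime,Lpep_dropoff_datetime,Trip_distance,Tip_amount\n"
    "2015-06-01 08:00:00,2015-06-01 08:12:00,2.5,1.0\n"
    "2015-06-01 09:30:00,2015-06-01 09:45:00,4.0,0.0\n"
)
trips = load_trips(sample)
print(trips[["trip_distance", "duration_min"]])
```

In practice `source` would be the path of the downloaded file instead of the in-memory sample.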
CONTENT: The two main scripts are:
- exploratory_analysis.py: I used this script to perform basic data analysis and visualization.
- predictive_model.py: I used this script to apply a Gradient Boosting Classifier to the dataset.
  In this script, I merge the original dataset with a dataset of daily precipitation in New York, which can be found at www.noaa.gov
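The exploratory step in `exploratory_analysis.py` can be illustrated with a small aggregation: counting trips and averaging trip distance per pickup hour. This is a hedged sketch, not the script's actual content; the function `hourly_profile` and the column names `lpep_pickup_datetime` and `trip_distance` are hypothetical, and the three trips are made up.

```python
import pandas as pd


def hourly_profile(trips: pd.DataFrame) -> pd.DataFrame:
    """Count trips and average trip distance per pickup hour (0-23).

    Assumes 'lpep_pickup_datetime' is a datetime column and
    'trip_distance' is numeric (hypothetical column names).
    """
    hour = trips["lpep_pickup_datetime"].dt.hour.rename("hour")
    # Named aggregation: one output column per (name, function) pair.
    return trips.groupby(hour)["trip_distance"].agg(
        n_trips="count", mean_distance="mean"
    )


# Usage with three made-up trips.
trips = pd.DataFrame(
    {
        "lpep_pickup_datetime": pd.to_datetime(
            ["2015-03-01 08:05", "2015-03-01 08:40", "2015-03-01 18:10"]
        ),
        "trip_distance": [1.0, 3.0, 5.0],
    }
)
profile = hourly_profile(trips)
print(profile)
```

To visualise the result, `profile["n_trips"].plot.bar()` draws a bar chart of trips per hour, provided matplotlib is installed.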
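Joining the taxi trips with daily precipitation amounts to matching each pickup date with one weather row. A minimal sketch of that join, assuming a weather table with one row per day and columns `DATE` and `PRCP` (a common layout for NOAA daily-summary CSV exports, but verify it against your file; precipitation units also depend on the export settings). The helper `add_daily_precipitation` is hypothetical and the values are made up.

```python
import pandas as pd


def add_daily_precipitation(trips: pd.DataFrame, weather: pd.DataFrame) -> pd.DataFrame:
    """Attach the precipitation of each trip's pickup date.

    Assumptions (hypothetical schema): `trips` has a datetime column
    'lpep_pickup_datetime'; `weather` has one row per day with a 'DATE'
    column and a 'PRCP' (precipitation) column.
    """
    trips = trips.copy()
    # Truncate pickup timestamps to midnight so they line up with daily rows.
    trips["date"] = trips["lpep_pickup_datetime"].dt.normalize()
    daily = pd.DataFrame(
        {"date": pd.to_datetime(weather["DATE"]), "precipitation": weather["PRCP"]}
    )
    # Left join keeps every trip; validate="m:1" raises MergeError if the
    # weather table contains the same date twice.
    return trips.merge(daily, on="date", how="left", validate="m:1")


# Usage with made-up values.
trips = pd.DataFrame(
    {
        "lpep_pickup_datetime": pd.to_datetime(
            ["2015-03-01 08:05", "2015-03-02 18:10", "2015-03-02 23:59"]
        ),
        "trip_distance": [1.0, 5.0, 2.0],
    }
)
weather = pd.DataFrame({"DATE": ["2015-03-01", "2015-03-02"], "PRCP": [0.0, 0.5]})
merged = add_daily_precipitation(trips, weather)
print(merged[["lpep_pickup_datetime", "precipitation"]])
```

A left join is used so that trips on days missing from the weather file are kept, with `NaN` precipitation, rather than silently dropped.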
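The README does not say what `predictive_model.py` predicts, so the sketch below uses a hypothetical binary target `tipped` and a hypothetical feature set (`trip_distance`, `hour`, `precipitation`). It assumes scikit-learn's `GradientBoostingClassifier`, with the library's default hyperparameters written out explicitly, and trains it on synthetic data generated in the script. It only shows the shape of such a pipeline (split, fit, score); it is not the project's model and its accuracy says nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

FEATURES = ["trip_distance", "hour", "precipitation"]  # hypothetical feature set


def fit_tip_model(df: pd.DataFrame, target: str = "tipped", seed: int = 0):
    """Fit a gradient boosting classifier and return it with held-out accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[FEATURES], df[target], test_size=0.25, random_state=seed, stratify=df[target]
    )
    # These values are scikit-learn's defaults, spelled out for readability.
    model = GradientBoostingClassifier(
        n_estimators=100, learning_rate=0.1, max_depth=3, random_state=seed
    )
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy


# Synthetic demo data (made up): longer, drier trips are more likely to be tipped.
rng = np.random.default_rng(42)
n = 1000
demo = pd.DataFrame(
    {
        "trip_distance": rng.exponential(3.0, n),
        "hour": rng.integers(0, 24, n),
        "precipitation": rng.exponential(0.1, n),
    }
)
logit = 0.8 * demo["trip_distance"] - 10 * demo["precipitation"] - 1.5
demo["tipped"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

model, acc = fit_tip_model(demo)
print(f"held-out accuracy: {acc:.2f}")
print(dict(zip(FEATURES, model.feature_importances_.round(3))))
```

Stratifying the split keeps the class balance of `tipped` similar in the training and test sets, which matters when one class is rare.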