Marcos V. O. Assis ([email protected])
A simple web app that showcases predictions served by the API deployed on GCP.
- Perform a descriptive analysis to generate insights about São Paulo's properties.
- Propose a regression model that predicts a property's value from its features, such as area (size), number of rooms, and geographic location.
- 'Properties Dataset' - House price data of Sao Paulo (Kaggle)
- 'IBGE Dataset' - 2010 Demographic Census Information Base: Universe Results by Census Sector (Base de informações do Censo Demográfico 2010: Resultados do Universo por setor censitário)
- 'Addresses Dataset' - Address dataset of all Brazilian states (UF)
- 'Geolocation Dataset' - Mesh of Census Sectors - São Paulo (Malha de Setores Censitários IBGE - São Paulo)
- Perform a descriptive analysis of the 'Properties Dataset'.
- Merge 'Properties Dataset' and 'IBGE Dataset':
  - First, merge the 'Properties Dataset' with the 'Addresses Dataset' to add latitude and longitude features.
  - Create a geopandas Point feature in the dataset from the latitude and longitude coordinates.
  - Import the 'Geolocation Dataset' and map the Census Sector IDs: represent the census sectors as polygons and determine which sector polygon contains each address's coordinate point (latitude and longitude).
  - Finally, merge the 'Properties Dataset' with the 'IBGE Dataset' on the Census Sector ID.
- Implement and test different regression algorithms to predict property value from different feature sets.
- Discuss the results achieved and draw conclusions.
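
The merge steps above (point creation, point-in-polygon lookup, join on the Census Sector ID) could be sketched roughly as follows. This is a minimal illustration on toy data, not the notebook's actual code: the column names (`latitude`, `longitude`, `sector_id`, `mean_income`), the square polygons, and the `EPSG:4326` CRS are all assumptions made for the example. With the real data, the sectors would presumably be loaded from the shapefile with `gpd.read_file(...)` instead.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Polygon

# Toy stand-in for the properties data after the address merge
# (column names are hypothetical).
properties = pd.DataFrame({
    "address": ["Rua A, 10", "Rua B, 20", "Rua C, 30"],
    "latitude": [0.5, 0.5, 5.0],
    "longitude": [0.5, 1.5, 5.0],
})

# Toy stand-in for the census-sector mesh: two adjacent unit squares.
# "sector_id" is a hypothetical name for the Census Sector ID column.
sectors = gpd.GeoDataFrame(
    {"sector_id": ["S1", "S2"]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(1, 0), (2, 0), (2, 1), (1, 1)]),
    ],
    crs="EPSG:4326",  # assumed CRS; both layers must share the same one
)

# Build a Point geometry per property (x = longitude, y = latitude).
points = gpd.GeoDataFrame(
    properties,
    geometry=gpd.points_from_xy(properties["longitude"], properties["latitude"]),
    crs="EPSG:4326",
)

# Spatial join: attach the sector whose polygon contains each point.
# how="left" keeps properties that fall outside every sector (ID is NaN).
joined = gpd.sjoin(points, sectors, how="left", predicate="within")
joined = joined.drop(columns="index_right", errors="ignore")

# Toy stand-in for the census table, keyed by the same sector ID.
census = pd.DataFrame({"sector_id": ["S1", "S2"], "mean_income": [1000.0, 2000.0]})

# Ordinary attribute merge on the Census Sector ID.
merged = joined.merge(census, on="sector_id", how="left")
print(merged[["address", "sector_id", "mean_income"]])
```

A left join is used in both steps so that properties with no matching sector stay in the table and can be inspected or dropped explicitly later.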
- Implemented the prediction solution as an API, using Python and Flask.
- Deployed the API to Google Cloud Platform (GCP), using Docker and Google Cloud Run.
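
For illustration, a prediction endpoint built with Flask might look like the sketch below. It is not the contents of `src/app/main.py`: the `/predict` route, the feature names in `FEATURES`, the response key, and the `_StubModel` class are all hypothetical. The stub stands in for a trained model loaded from the `models` folder so that the example runs on its own.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical feature order expected by the model.
FEATURES = ["area", "rooms", "latitude", "longitude"]


class _StubModel:
    """Placeholder with a scikit-learn style predict(); not a real model."""

    def predict(self, rows):
        # Pretend the value is simply 1000 per unit of area.
        return [1000.0 * row[0] for row in rows]


# In a real service this would be a model unpickled once at startup.
model = _StubModel()


@app.route("/predict", methods=["POST"])
def predict():
    # Parse the JSON body; fall back to an empty dict if it is absent or invalid.
    payload = request.get_json(silent=True) or {}

    missing = [name for name in FEATURES if name not in payload]
    if missing:
        return jsonify({"error": f"missing features: {missing}"}), 400

    try:
        row = [float(payload[name]) for name in FEATURES]
    except (TypeError, ValueError):
        return jsonify({"error": "features must be numeric"}), 400

    value = model.predict([row])[0]
    return jsonify({"predicted_value": float(value)})
```

The endpoint can be exercised without starting a server through Flask's test client, for example `app.test_client().post("/predict", json={"area": 50, "rooms": 2, "latitude": -23.5, "longitude": -46.6})`. When the container runs on Cloud Run, the server has to listen on the port given in the `PORT` environment variable.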
- enderecos.csv - Address dataset file (Download here)
- 35SEE250GC_SIR.{dbf,prj,shp,shx} - SP Geolocation dataset (all files needed) (Download here)
- data_sale_censo.csv - Merged dataset (use it to start directly from the 'Predicting Property Values' section)
- models - Folder containing the pickle files for Voting and Gradient Boosting trained models.
- src/app/main.py - API file
- Dockerfile
- requirements.txt
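
As a rough idea of how pickle files like those in the `models` folder could be produced and reloaded, the sketch below trains a Gradient Boosting regressor and a Voting regressor on synthetic data, saves both, and restores one. Everything specific here is an assumption: the two synthetic features, the base estimators inside the voting ensemble, the file names, and the temporary output directory. The project's real feature sets and hyperparameters are not shown in this README.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: value depends on area and number of rooms.
rng = np.random.default_rng(0)
n = 300
area = rng.uniform(30, 300, n)
rooms = rng.integers(1, 6, n)
X = np.column_stack([area, rooms])
y = 5000 * area + 20000 * rooms + rng.normal(0, 10000, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Two candidate models; the voting ensemble averages its members' predictions.
models = {
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
    "voting": VotingRegressor([
        ("gb", GradientBoostingRegressor(random_state=0)),
        ("lr", LinearRegression()),
    ]),
}

# Hypothetical output location; a temp dir keeps the example side-effect free.
models_dir = Path(tempfile.mkdtemp()) / "models"
models_dir.mkdir()

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = mean_absolute_error(y_test, model.predict(X_test))
    with open(models_dir / f"{name}.pkl", "wb") as fh:
        pickle.dump(model, fh)

# Reload one model, as an API process would do at startup.
with open(models_dir / "voting.pkl", "rb") as fh:
    restored = pickle.load(fh)

print(scores)
print(restored.predict([[100.0, 3]]))
```

Pickled scikit-learn models are generally only safe to load with the same library version that wrote them, which is one reason to pin versions in `requirements.txt`.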