Smart Columbus Parking Prediction
This repository contains the code needed to train a model that predicts parking availability in the city of Columbus, Ohio. It includes notebooks describing the process used to arrive at this model version, as well as the Dockerfile and Helm charts needed to package and deploy it as an API to a Kubernetes cluster.
This code and model are specifically tuned to the parking situation in Columbus, but could be used as a base to build more generic parking availability models.
For an example of the data transformation Columbus used to turn parking transaction data into the parking occupancy data needed for this model, see the parking prediction orchestrator project.
Requires:
- Python >= 3.5
- Poetry
- Docker
- Helm (if using Kubernetes)
Install the project's dependencies with Poetry:

```
pip3 install poetry
poetry install --dev
```
If you are on macOS Catalina and fbprophet fails to install, try the following:

```
pip3 install poetry
brew install gcc@7
CXX=/usr/local/Cellar/gcc@7/7.5.0_2/bin/g++-7 CC=/usr/local/Cellar/gcc@7/7.5.0_2/bin/gcc-7 poetry install
```
The project also depends on the Microsoft ODBC driver for SQL Server. On macOS, it can be installed using Homebrew as follows:
```
brew tap microsoft/mssql-release https://github.com/Microsoft/homebrew-mssql-release
brew update
HOMEBREW_NO_ENV_FILTERING=1 ACCEPT_EULA=Y brew install msodbcsql17 mssql-tools
```
If you plan to run the repository's notebooks, enable IPython widgets to avoid problems:

```
poetry run jupyter nbextension enable --py widgetsnbextension
```
The app does not need to be built to run locally, but deploying it elsewhere requires a Docker image.
Step 1: Build the Docker image:

```
docker build . -t parking_prediction:latest
```

Step 2: Run the Docker image:

```
docker run parking_prediction:latest
```
This Python project uses pytest for its tests. Run them with:

```
poetry run pytest
```
With the above dependencies installed, the notebooks can be run using:

```
poetry run jupyter notebook
```
To run the API locally:

```
export QUART_APP=app:app
export QUART_DEBUG=true # if you want debug messages on slow calls, etc.
poetry run quart run
```
Check the `chart/values.yaml` file for the necessary configuration values, and add them to a custom values file for your deployment. Then run:

```
helm upgrade --install predictive-parking ./chart --values your_values.yaml
```
This repository is architected so that only a few places need to change in order to upgrade the parking availability prediction model. Those files are as follows:
- the `app.model` Python module. This is where you should implement all feature engineering code, model implementation code, etc. Specifically, the following classes must be defined (an illustrative sketch appears after this list):
  - `ModelFeatures`: a `pydantic` model specifying all of the features expected by your model.
    - This class must also provide a static `from_request` method for converting `APIPredictionRequest` objects into `ModelFeatures`.
  - `ParkingAvailabilityModel`: the actual trained model. It should include a `predict` method that takes a `ModelFeatures` object `features` and returns prediction values as an iterable of `float`s, where the `i`-th `float` gives the parking availability prediction (as a probability) for the parking zone with the `i`-th ID in `features.zone_id`.
- the `train.py` script, which contains code to retrieve training data, train a model, compare its performance to its recent predecessors, and upload newly-trained models to cloud storage. When updating the model, changes may be necessary here to control:
  - how features are derived from the retrieved dataset,
    - Ideally, this would be done by converting dataset records into `APIPredictionRequest`s and calling `ModelFeatures.from_request` on the requests (see the second sketch after this list). If the training dataset diverges from the production data in structure, however, this can be an alternative location for said code.
  - the core training procedure,
  - how the model is packaged into a self-contained, serializable object for storage purposes.

  Other code modifications in `train.py` should only be necessary when a fundamental change has occurred in our data sources, how model performance is evaluated, etc.
- unit tests for `app.model` in `tests/test_model.py`.
  - These should largely be left unmodified or expanded upon (a hypothetical example appears after this list).
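For orientation, here is a minimal sketch of what the two required `app.model` classes could look like. The `APIPredictionRequest` stand-in, the feature fields, and the placeholder prediction logic below are illustrative assumptions, not the repository's actual definitions:

```python
from datetime import datetime
from typing import Iterable, List

from pydantic import BaseModel


class APIPredictionRequest(BaseModel):
    # Illustrative stand-in for the API's request model; the real class and
    # its fields live in the app code and may differ.
    zone_ids: List[str]
    timestamp: datetime


class ModelFeatures(BaseModel):
    """Features expected by the model; the fields here are purely illustrative."""

    zone_id: List[str]      # one prediction is returned per zone ID
    hour_of_day: List[int]  # hypothetical time-of-day feature
    day_of_week: List[int]  # hypothetical day-of-week feature

    @staticmethod
    def from_request(request: APIPredictionRequest) -> "ModelFeatures":
        # Convert an incoming request into model features (attribute names assumed).
        n = len(request.zone_ids)
        return ModelFeatures(
            zone_id=request.zone_ids,
            hour_of_day=[request.timestamp.hour] * n,
            day_of_week=[request.timestamp.weekday()] * n,
        )


class ParkingAvailabilityModel:
    """Wraps the trained estimator; predict returns one probability per zone."""

    def __init__(self, estimator=None):
        self._estimator = estimator  # the fitted model object restored from storage

    def predict(self, features: ModelFeatures) -> Iterable[float]:
        # The i-th value is the availability probability for the zone with the
        # i-th ID in features.zone_id. A real implementation would delegate to
        # self._estimator; this placeholder returns a constant so the sketch runs.
        return [0.5 for _ in features.zone_id]
```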
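And a sketch of the "ideal" feature-derivation path described for `train.py`, i.e. reusing the request-to-features conversion during training. The import path and the training record field names are assumptions about the dataset's shape:

```python
# Hypothetical helper for train.py: derive features by routing training
# records through the same request -> features path used in production.
from app.model import APIPredictionRequest, ModelFeatures  # import path assumed


def features_from_training_records(records):
    """Convert raw training records into a list of ModelFeatures."""
    feature_rows = []
    for record in records:
        request = APIPredictionRequest(
            zone_ids=[record["zone_id"]],            # field name assumed
            timestamp=record["occupancy_datetime"],  # field name assumed
        )
        feature_rows.append(ModelFeatures.from_request(request))
    return feature_rows
```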
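Finally, a hypothetical shape for a test in `tests/test_model.py`, using the illustrative field names from the first sketch rather than the repository's real schema:

```python
# Hypothetical pytest sketch; field names mirror the illustrative ModelFeatures
# above, not necessarily the repository's real schema.
from app.model import ModelFeatures, ParkingAvailabilityModel


def test_predict_returns_one_probability_per_zone():
    features = ModelFeatures(
        zone_id=["zone-1", "zone-2"],
        hour_of_day=[8, 8],
        day_of_week=[2, 2],
    )

    predictions = list(ParkingAvailabilityModel().predict(features))

    assert len(predictions) == len(features.zone_id)
    assert all(0.0 <= p <= 1.0 for p in predictions)
```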
Status: This project is in the release phase.
Release Frequency: This project is complete and will not be updated further beyond critical bug fixes.
Release History: See CHANGELOG.md
Retention: Indefinitely
This project is licensed under the Apache 2.0 License - see the LICENSE.MD file for more details.
Follow the guidelines in the main organization repo.
- Dr. Dan Moore for his data science expertise in refining the model and for his detailed notebooks explaining the process.
- Yuxiao Zhao for the initial model design
- Ben Brewer, Tim Regan and the rest of the Smart Columbus OS team for turning this into a usable and performant API.