Predicting NCAA basketball tournament games using machine learning, for Kaggle's contest (sponsored by Google Cloud).
For more detail, check out my writeup.
You can run the IPython script in a Jupyter notebook locally or you can run it in my kernel on Kaggle's cloud resources.
Install Docker or Docker Toolbox
- Note: If you use Docker Toolbox, you must clone the repo underneath /Users (MacOS) or C:\Users (Windows), or else docker volume mounting WILL NOT WORK.
Once done, create a new docker machine with more processing power, disk space, and RAM than the default machine (or as much as you can afford):
$ docker-machine create -d virtualbox --virtualbox-disk-size "50000" --virtualbox-cpu-count "4" --virtualbox-memory "8092" docker2
Now pull the docker image with all the python, ipython, and jupyter dependencies my notebook requires. We will run the notebook in this docker container.
This image is large, ~15GB, and will take a while to download/extract. Go grab a snack :)
$ docker pull kaggle/python
Extract the following datasets: input\ZippedData\DataFiles.zip to input\, and then extract input\ZippedData\Stage2UpdatedDataFiles.zip to input\
You will need to overwrite files of the same name when you extract the second dataset.
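The two extractions above can also be scripted. The following is a minimal sketch using Python's standard zipfile module; the helper name extract_in_order is hypothetical and not part of this repo, and the commented usage assumes you run it from the repository root.

```python
import zipfile
from pathlib import Path


def extract_in_order(archives, dest):
    """Extract each archive into dest, in the order given.

    Files from later archives replace same-named files from earlier ones,
    which mirrors the manual "overwrite" step described above.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            # extractall writes each member to disk, replacing existing files
            zf.extractall(dest)
    return dest


# Usage (assumed layout, taken from the paths named in this README):
# zipped = Path("input") / "ZippedData"
# extract_in_order(
#     [zipped / "DataFiles.zip", zipped / "Stage2UpdatedDataFiles.zip"],
#     "input",
# )
```

The order of the list matters: the Stage 2 archive must come second so that its updated files are the ones left in input\.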
Start your larger, more powerful, docker container.
$ docker-machine start docker2
Run the Jupyter notebook by running start.sh. If you are running Windows, you must run start.sh from within the Docker Quickstart Terminal.
$ sh start.sh
This will open a locally hosted Jupyter notebook in your web browser. Open the March Madness notebook, located at nbs/script.ipynb, and run all cells.
Running the whole script takes a couple of minutes on my machine.
Once done, the script will output predictions in nbs\predictions.csv
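Before using the predictions file, it can be worth a quick sanity check. The sketch below assumes the file has a header row with ID and Pred columns, as in Kaggle's usual submission format; this README does not confirm the column names, so adjust them if your file differs. The function name check_predictions is hypothetical.

```python
import csv


def check_predictions(path, id_col="ID", pred_col="Pred"):
    """Return the number of prediction rows in the CSV at path.

    Raises ValueError if any prediction is not a probability in [0, 1].
    The column names are assumptions, not confirmed by the notebook.
    """
    count = 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            p = float(row[pred_col])
            if not 0.0 <= p <= 1.0:
                raise ValueError(f"{row[id_col]}: prediction {p} is outside [0, 1]")
            count += 1
    return count


# Usage (path taken from this README):
# print(check_predictions("nbs/predictions.csv"))
```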
Pressing Ctrl-C (twice) in the terminal where you started the Jupyter notebook server kills the server.
Stop the docker machine when you are all done:
$ docker-machine stop docker2
To generate a bracket using the nbs\predictions.csv generated by running the IPython notebook, run sh makebracket.sh
This will run in the docker container and generate an output.png file for this year's bracket. Your docker machine docker2 must be running for this script to run.
- Scikit-learn - Python Machine Learning library
- Kaggle - NCAA data from 1985-2018, initial basic logistic regression notebook forked from Kaggle
- Kevin Dorosh - Main Dev
This project is licensed under the BSD License - see the 3-Clause BSD website for details
The data provided is subject to all provisions set by the NCAA. It is not to be used for gambling - for more details, read the Kaggle rules.
- Kaggle for running this tournament
- Andrew Ng and his MOOC that I took to learn ML basics
- My professor Ming Chow, who enabled this self-driven study