This project was developed for the Statoil/C-CORE Iceberg Classifier Challenge on Kaggle.
There are three main files:
- models.py contains the model classes.
- train.py loads and prepares the training and cross-validation datasets and feeds them in batches to an ensemble of models. It uses k-fold cross-validation to estimate the test error. It also saves model checkpoints, a model description, and a log for viewing in TensorBoard. Each run writes these to its own timestamped folder, which makes it easier to keep track of experiments.
- test.py loads and prepares the test data. It then loads the model from the latest checkpoint created by train.py and iterates through the test data. Inspired by the work of Yarin Gal, I used dropout during inference to create a confidence interval around each prediction. This interval is treated as a measure of the model's certainty in its prediction, and it is used to adjust the predictions to achieve a lower log-loss. Finally, it creates a CSV file ready to be submitted to Kaggle.
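
The k-fold estimate of the test error mentioned for train.py can be sketched roughly as follows. This is a NumPy-only illustration, not the code in train.py: the names `k_fold_indices`, `log_loss`, `cross_validate`, and the `fit_predict` callback are hypothetical and stand in for whatever the real training loop does.

```python
import numpy as np


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) index pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)   # shuffle once, then split
    folds = np.array_split(order, k)     # k nearly equal folds
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx


def log_loss(y_true, p, eps=1e-7):
    """Binary log-loss; probabilities are clipped to avoid log(0)."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))


def cross_validate(fit_predict, X, y, k=5):
    """Average validation log-loss over k folds.

    fit_predict(X_train, y_train, X_val) is a hypothetical callback that
    trains a model and returns predicted probabilities for X_val.
    """
    scores = []
    for train_idx, val_idx in k_fold_indices(len(X), k):
        p_val = fit_predict(X[train_idx], y[train_idx], X[val_idx])
        scores.append(log_loss(y[val_idx], p_val))
    return float(np.mean(scores)), scores
```

Every sample is used for validation exactly once, so the mean of the per-fold scores gives an estimate of the error on unseen data.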
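
The dropout-at-inference idea used in test.py can be illustrated with a small NumPy sketch. It is not the code in test.py. The toy network in `make_toy_forward`, the percentile-based interval, and above all the `adjust` rule are assumptions made for illustration; the rule test.py actually uses to adjust predictions is not described here.

```python
import numpy as np


def make_toy_forward(w1, w2, drop_rate=0.5):
    """Build a toy one-hidden-layer classifier whose dropout stays ON at inference."""
    def forward(x, rng):
        h = np.maximum(x @ w1, 0.0)                # ReLU hidden layer
        mask = rng.random(h.shape) >= drop_rate    # fresh random mask on every call
        h = h * mask / (1.0 - drop_rate)           # inverted-dropout scaling
        return 1.0 / (1.0 + np.exp(-(h @ w2)))     # sigmoid probability per sample
    return forward


def mc_dropout_predict(forward, x, n_passes=50, seed=0):
    """Run several stochastic forward passes; return the mean and a 95% interval."""
    rng = np.random.default_rng(seed)
    samples = np.stack([forward(x, rng) for _ in range(n_passes)])
    mean = samples.mean(axis=0)
    lo, hi = np.percentile(samples, [2.5, 97.5], axis=0)
    return mean, lo, hi


def adjust(mean, lo, hi, max_width=0.5):
    """Hypothetical adjustment: pull uncertain predictions toward 0.5.

    A wide interval means low trust, so the prediction is shrunk toward 0.5,
    which limits the log-loss penalty when the model is confidently wrong.
    """
    trust = np.clip(1.0 - (hi - lo) / max_width, 0.0, 1.0)
    return 0.5 + trust * (mean - 0.5)
```

With this rule, a prediction whose interval has zero width is left unchanged, and one whose interval is at least `max_width` wide becomes 0.5.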