Correct train / test data #3

Open
wants to merge 5 commits into
base: master


@Ogaday Ogaday commented Oct 31, 2018

Prevent two types of data leakage. See the commit messages.

Because `auto_arima` is run on the whole dataset, the algorithm
parameters leak: the test set was used to choose the ARIMA parameters,
so the accuracy score on the test set might be inflated. Kaggle keeps a
private leaderboard for the same reason. To prevent this, I've split
the data into train and test sets before tuning the parameters.
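A minimal sketch of the idea, assuming the series is a plain Python list in time order. The `select_window` helper and the moving-average forecaster are hypothetical stand-ins for `auto_arima`, used only to keep the example self-contained and are not the code in this PR. The point is that the tuning step only ever sees `train`.

```python
def moving_average_forecast(history, window):
    """Forecast the next value as the mean of the last `window` points."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def one_step_mae(series, start, window):
    """Mean absolute error of one-step-ahead forecasts for series[start:]."""
    errors = [
        abs(series[i] - moving_average_forecast(series[:i], window))
        for i in range(start, len(series))
    ]
    return sum(errors) / len(errors)


def select_window(train, candidates):
    """Hypothetical stand-in for auto_arima: tune on the training data only."""
    start = max(candidates)  # every candidate gets a full window of history
    return min(candidates, key=lambda w: one_step_mae(train, start, w))


series = [float(i % 7) + 0.1 * i for i in range(120)]  # synthetic data
split = int(len(series) * 0.8)
train, test = series[:split], series[split:]

# The parameter is chosen before the test set is touched.
best_window = select_window(train, candidates=[1, 2, 3, 7, 14])

# The test set is used once, for the final walk-forward score.
test_mae = one_step_mae(series, split, best_window)
print(best_window, round(test_mae, 3))
```

Changing any value in `test` cannot change `best_window` here, which is the property that tuning on the whole dataset loses.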

The train set and test set overlapped. This commit ensures that the
test data starts where the train data ends.
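As a rough illustration of a non-overlapping split, assuming the observations are already in time order. The `chronological_split` helper is a hypothetical name for this sketch and is not taken from the PR. Using a single index for both slices guarantees that the test data begins exactly where the train data stops.

```python
def chronological_split(series, train_fraction=0.8):
    """Split an ordered series so that test starts right after train ends."""
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be between 0 and 1")
    split = int(len(series) * train_fraction)
    train = series[:split]   # indices 0 .. split - 1
    test = series[split:]    # indices split .. end, no shared rows
    return train, test


observations = list(range(10))  # synthetic, already in time order
train, test = chronological_split(observations, train_fraction=0.8)

# Every observation lands in exactly one of the two sets.
assert train + test == observations
assert not set(train) & set(test)
```

An overlap typically appears when the two slices are computed from different indices, so deriving both from one `split` value removes that class of mistake.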