renaming "test" split to "dev" #59
Comments
I think the name is already correct, i.e., "test", but consider the following cases.

Case 1: test has labels. The current behavior should work as expected, i.e., the performance score can be calculated directly from the test set.

Case 2: test has no labels (unseen target). If the test set has no labels, then the model will concatenate train and dev (split_strategy = train) and predict the target for each audio file in the test set (maybe mark the test database with split_strategy = predict?). There is no performance score on the test set; instead, the output is a prediction file (CSV) with a header.
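The two cases above can be sketched as follows. This is an illustrative sketch, not Nkululeko's actual API: `run_experiment` and `majority_label` are hypothetical names, and the "model" is a toy majority-class predictor standing in for a real classifier.

```python
def majority_label(data):
    """Toy 'model': predict the most frequent label seen in (file, label) pairs."""
    labels = [y for _, y in data]
    return max(set(labels), key=labels.count)

def run_experiment(train, dev, test, test_has_labels):
    if test_has_labels:
        # Case 1: the score can be computed directly on the labelled test set.
        pred = majority_label(train)
        return sum(pred == y for _, y in test) / len(test)
    # Case 2: no test labels -> fold dev into train (split_strategy = train)
    # and return per-file predictions instead of a score.
    pred = majority_label(train + dev)
    return [(f, pred) for f, _ in test]
```

In case 2, the returned (file, prediction) pairs would then be written out as the CSV prediction file mentioned above.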
@felixbur So, consider the following (it is common, e.g., in the ComParE challenge). There are three splits given by the authors of the dataset, so there are several possibilities for building a model:
The output of the last option will be a CSV file containing file names and predicted labels. This file is usually submitted to the organizers to obtain the score on the test set.
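The submission file could look like the sketch below, using only the standard `csv` module. The column names "file" and "prediction" are assumptions for illustration, not a format mandated by ComParE or Nkululeko.

```python
import csv
import io

# Hypothetical (file, predicted label) pairs from the trained model.
predictions = [("test_0001.wav", "negative"), ("test_0002.wav", "positive")]

# Write a CSV with a header row, as described for the unlabelled-test case.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["file", "prediction"])
writer.writeheader()
writer.writerows({"file": f, "prediction": p} for f, p in predictions)
print(buf.getvalue())
```

In practice the buffer would be a real file handle opened with `newline=""`, and the resulting CSV is what gets sent to the challenge organizers.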
A simple workaround may be to keep the current test split as it is, but provide an extra option stating whether the test set has labels, to distinguish dev (has labels) from test (unseen):

```ini
[DATA]
test.has_labels = False
```

By default, it is assumed that the test set has labels.
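Such a boolean option could be read with Python's standard `configparser`, with the fallback encoding the proposed default (test is assumed labelled). Note that `test.has_labels` is the suggestion from this thread, not an existing Nkululeko setting.

```python
import configparser

# INI fragment as proposed above; in practice this would come from the
# experiment's configuration file.
ini_text = """
[DATA]
test.has_labels = False
"""

config = configparser.ConfigParser()
config.read_string(ini_text)

# fallback=True encodes the proposed default: assume the test set has labels
# whenever the option is omitted.
has_labels = config.getboolean("DATA", "test.has_labels", fallback=True)
print(has_labels)
```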
@felixbur One more suggestion: after getting the best model, the user should be allowed to use both.
Nkululeko only knows two splits: train and test.
But it would be more correct to name the "test" split "dev" (short for development), as we essentially always use it to optimize a model.
Any thoughts?