Tuning - Augmentation Subsets Support #35
Conversation
☂️ Python Coverage: Overall Coverage
New Files: no new covered files. Modified Files: (none listed)

Test Results: 4 files, 4 suites, 1h 0m 32s ⏱️. Results for commit 4abe9f8. ♻️ This comment has been updated with the latest results.
LGTM
Won't have time to review the code, just a conceptual question: what is the behavior given that we now have both the is_active and subset options for augmentations? Do inactive augmentations become active? It would be good to document this somewhere.
The _subset sets
Can we also tune the size of the selected subset?
Not really. The way tuner params work is that they override existing Config params each trial. Since the size of the subset is not a Config param, we can't override it. A quick solution could be to check for a special tuner param key and then generate a random int in the desired range to use as the subset size, but this solution seems a bit dirty. Do you have something nicer in mind?
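To make that "quick solution" concrete, here is a minimal sketch; the key suffix, value format, and function name are assumptions for illustration, not the project's actual API.

import random

# Sketch of the "special tuner param key" idea described above. The suffix,
# value format, and function name are assumptions for illustration only.
SUBSET_SIZE_SUFFIX = "_subset_size"

def resolve_trial_params(tuner_params):
    """Replace any '<name>_subset_size: [low, high]' entry with a random int."""
    resolved = {}
    for key, value in tuner_params.items():
        if key.endswith(SUBSET_SIZE_SUFFIX):
            low, high = value
            resolved[key] = random.randint(low, high)  # subset size for this trial
        else:
            resolved[key] = value
    return resolved

# Example: the augmentation subset size would vary between 2 and 5 each trial.
trial_params = resolve_trial_params({"augmentations_subset_size": [2, 5]})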
"Subset sampling currently only supported for augmentations" | ||
) | ||
whole_set_indices = self._augs_to_indices(whole_set) | ||
subset = random.sample(whole_set_indices, subset_size) |
How are we preventing it from selecting a subset that was already selected in a previous run?
For this question I assume that typically we would use the augmentation tuning with all the other tunings disabled, so the results reflect purely the augmentations. Would that be correct?
Good point, there isn't really any prevention in place for selecting the same subset in subsequent runs. We could change it to first get all possible combinations and then loop through them each run. But then the number of trials should also be set to the number of combinations so we go through all of them. I guess it depends on what the typical use case for this aug subset sampling would be.
As you mentioned below, perhaps looping over the whole power set is a more sensible implementation, since normally we shouldn't be limited by the number of augmentations used and should instead use all of those that produce better results. But to see whether an augmentation works you normally have to train the model for more epochs (at the beginning hard augs could produce worse results, but with time perhaps those are the ones that actually improve accuracy on the test set), and going over the whole power set could take a while (maybe add the ability to limit the minimum size of the subset to prune smaller ones). See the sketch below.
CC: @tersekmatija on what the intended use case of aug subsampling would be.
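A rough illustration of the enumeration idea above; the helper and variable names are hypothetical, not the project's implementation.

from itertools import combinations

# Sketch only: build every subset of the augmentation indices with at least
# `min_size` elements, so each trial can consume the next unseen combination
# and no subset is ever repeated.
def enumerate_subsets(aug_indices, min_size=1):
    subsets = []
    for size in range(min_size, len(aug_indices) + 1):
        subsets.extend(combinations(aug_indices, size))
    return subsets

all_subsets = enumerate_subsets(list(range(4)), min_size=2)
# The number of trials would then be set to len(all_subsets) so the tuner
# walks through the whole (pruned) power set exactly once.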
Thinking more about the augmentation tuning, I believe this will actually be more complex and will need more optimization. For now, I can think of a few optimizations if we work with some assumptions about the augmentations' effect on the training.
My idea: I think we can assume the order of augmentations should not (reasonably) matter, so if augmentation a improves the results together with augmentation b, it should do so regardless of which is applied first.
This would decrease the number of possibilities from the n!/(n-k)! ordered selections of k augmentations to the (n choose k) unordered ones.
However, as an extension of the above, if augmentation a improves the results on its own, we can assume it also improves them in combination with the others.
This would bring the number of combinations all the way down to n single-augmentation trials.
More digestible in code:
def select_augmentations(all_augmentations):
    best_augmentations = []
    for a in all_augmentations:
        # improves(a): placeholder for "a trial with only augmentation a enabled
        # beats the no-augmentation baseline on the validation metric"
        if improves(a):
            best_augmentations.append(a)
    return best_augmentations
Someone would need to double-check my math on this though (and test whether the assumption even holds in the first place).
Math looks correct :) The challenge here is if you use more augmentations at once. If you use a and b and they improve the accuracy, you don't know whether the contribution was made by a or b. This means that in any case you need to run at least n+1 (where n=80) combinations to find out which show improvement and which do not. Furthermore, I think the challenge then might be that single augmentations improve the performance, but a combination of them decreases it (the image gets too corrupted to be useful).
I still think it should be up to the user to define which augmentations are reasonable, and then merely test a few combinations to check whether they are useful or not. I think in the above scenario it's also hard to define what k is?
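A minimal sketch of the n+1 trials described above (one baseline run plus one run per augmentation); run_trial is a hypothetical helper, not part of the project.

# Sketch only: `run_trial` is a hypothetical helper that trains with the given
# augmentations and returns the validation metric; it is not the project's API.
def find_useful_augmentations(all_augmentations, run_trial):
    baseline = run_trial([])              # trial 1: no extra augmentations
    useful = []
    for aug in all_augmentations:         # trials 2 .. n + 1
        if run_trial([aug]) > baseline:   # keep augs that beat the baseline alone
            useful.append(aug)
    return useful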
I think instead of tuning the subset size, we can add another option like
Yeah, I'd skip it for now. You can still achieve this with the active flag on the aug if you wanted to?
Co-authored-by: Martin Kozlovsky <[email protected]>
Adds the ability to randomly select a subset of augmentations each tuning run. It can be defined in the config like this:
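The exact config snippet is not reproduced here. Purely as a hypothetical illustration (every key and name below is an assumption, not the project's documented schema), such an entry might pair a pool of augmentation names with a subset size under a tuner param key ending in _subset:

# Hypothetical illustration only: key names, structure, and augmentation names
# are assumptions, not the project's documented config schema.
tuner_params = {
    "trainer.preprocessing.augmentations_subset": [
        ["Rotate", "Flip", "ColorJitter", "MotionBlur"],  # pool to sample from
        2,                                                # subset size per trial
    ],
}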
Note: Currently this is only supported for augmentations.