Update doc for `metric_for_best_model` when `save_strategy="best"` #35389
base: main
Conversation
```python
self.assertIn("`args.metric_for_best_model` must be provided", str(context.exception))

# Case 4: Metric name not provided and save_best_strategy is "steps" (i.e., not "best").
with tempfile.TemporaryDirectory() as tmpdir:
```
not critical / minor: tbh, it seems a bit out of place for `test_save_best_checkpoint` (as well as the previous case). I would probably move it into a separate test. Or should it otherwise at least call train and check the actual checkpoint saved?
I guess I agree that it logically does seem a bit out of place. I think cases 3 and 4 could be grouped into their own methods, since the point isn't so much to test `save_strategy = "best"` itself but more to test the behavior related to `metric_for_best_model`.

I'm not sure if actually running training would be necessary, though. Case 3 simply checks whether a ValueError is properly thrown at Trainer initialization time, and case 4 simply checks whether the `__post_init__` method of `TrainingArguments` properly initializes `metric_for_best_model` to `"loss"` when `save_strategy != "best"` and `load_best_model_at_end = True`. To me, neither of these seems to require training/evaluation; Trainer instantiation seems sufficient.
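A minimal sketch of what such a split-out test could look like, assuming the current `TrainingArguments` argument names; the method name and structure are illustrative, not the PR's actual code:

```python
import tempfile

from transformers import TrainingArguments


# Hypothetical method that would live inside the existing Trainer test class.
def test_metric_for_best_model_defaults_to_loss(self):
    # Case 4: metric_for_best_model not provided, save_strategy is "steps"
    # (i.e., not "best"), and load_best_model_at_end=True, so __post_init__
    # should default metric_for_best_model to "loss".
    with tempfile.TemporaryDirectory() as tmpdir:
        args = TrainingArguments(
            output_dir=tmpdir,
            save_strategy="steps",
            eval_strategy="steps",  # must match save_strategy for load_best_model_at_end
            load_best_model_at_end=True,
        )
        self.assertEqual(args.metric_for_best_model, "loss")
```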
Agreed, I would also split it into a separate test (or two tests). And, yes, we are testing the init here; that's why it was looking out of place.
No strong opinion. We can split it into separate tests for cases 3 and 4.
```diff
@@ -477,7 +477,7 @@ class TrainingArguments:
     metric_for_best_model (`str`, *optional*):
         Use in conjunction with `load_best_model_at_end` to specify the metric to use to compare two different
         models. Must be the name of a metric returned by the evaluation with or without the prefix `"eval_"`. Will
-        default to `"loss"` if unspecified and `load_best_model_at_end=True` (to use the evaluation loss).
+        default to `"loss"` if unspecified, `load_best_model_at_end=True`, and `save_strategy!="best"`.
```
my 2cs (disclaimer: I'm not very familiar with the whole scope of the initial change, or the reason behind it!): it's a bit hard to read and understand what is going on here and why. E.g., why can't it default to `loss` when `save_strategy == "best"`? What is the major difference from the `load_best_model_at_end` (and `save_strategy != "best"`) case?

Again, apologies if I'm missing some obvious context here. Please feel free to disregard my comment / question then.
I didn't find the place where we set `metric_for_best_model = "loss"` when `save_strategy != "best"`. Can you explain a bit why you changed the description?
@shcheklein That was a design decision made here (#31817 (comment)). It was deemed easier to debug if we don't add a hard-coded value and instead raise an error.

@SunMarc Hmm, I'm starting to think that maybe the problem is that we're not able to set `load_best_model_at_end = True` when `save_strategy = "best"`, since `load_best_model_at_end` requires `eval_strategy == save_strategy` but `eval_strategy` doesn't have a `"best"` option.

This means that if we want to use `save_strategy = "best"`, then we have to have `load_best_model_at_end = False`, which in turn means that the `__post_init__` method of `TrainingArguments` sets `metric_for_best_model` to `"loss"` only when `save_strategy != "best"` and `load_best_model_at_end = True`: https://github.com/huggingface/transformers/blob/main/src/transformers/training_args.py#L1676:L1679

The modified docstring aims to add a bit more detail as to when `metric_for_best_model` is set to a default of `"loss"`.
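For reference, a rough paraphrase of the defaulting logic at the linked lines (a sketch, not the exact source of `training_args.py`):

```python
# Inside TrainingArguments.__post_init__ (paraphrased sketch):
if self.load_best_model_at_end and self.metric_for_best_model is None:
    # Only reachable when save_strategy != "best", since
    # load_best_model_at_end requires eval_strategy == save_strategy
    # and eval_strategy has no "best" option.
    self.metric_for_best_model = "loss"
```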
Should we also add `"best"` as an option for `eval_strategy` then?
Thanks! Left a few comments.
What does this PR do?
Updates the docstrings for `TrainingArguments.metric_for_best_model` and `Trainer._determine_best_metric`, and adds a new test.
Specifically, when `save_strategy="best"` we need to specify a value for `metric_for_best_model`. This clashes with the previous logic that `metric_for_best_model` would default to `"loss"`.
Brought up in this comment: #31817 (comment)
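To illustrate the documented behavior (a minimal sketch; model/dataset setup and the `Trainer` call are omitted, and the metric name is only an example of something a `compute_metrics` function might return):

```python
from transformers import TrainingArguments

# With save_strategy="best", metric_for_best_model must be set explicitly;
# per the discussion above, Trainer raises a ValueError at init time otherwise.
args = TrainingArguments(
    output_dir="out",                  # hypothetical path
    save_strategy="best",
    eval_strategy="steps",
    metric_for_best_model="accuracy",  # assumed metric name
    greater_is_better=True,
)
```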
Before submitting
- [ ] Did you read the contributor guideline, Pull Request section?
- [ ] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@muellerzr @SunMarc (cc @shcheklein, author of the comment)