Update doc for metric_for_best_model when save_strategy="best". #35389

Open · wants to merge 5 commits into main

Conversation

seanswyi (Contributor)

What does this PR do?

Updates the docstrings for TrainingArguments.metric_for_best_model and Trainer._determine_best_metric, and adds a new test.

Specifically, when save_strategy="best", a value for metric_for_best_model must be specified explicitly. This clashes with the previous behavior of metric_for_best_model defaulting to "loss".

Brought up in this comment: #31817 (comment)
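For concreteness, a minimal sketch of the documented behavior (argument values are placeholders, not taken from the PR):

from transformers import TrainingArguments

# With save_strategy="best", the checkpoint to keep is chosen by metric_for_best_model,
# so a metric must be named explicitly; leaving it unset raises a ValueError when the
# Trainer is initialized.
args = TrainingArguments(
    output_dir="out",                  # placeholder path
    eval_strategy="steps",
    save_strategy="best",
    metric_for_best_model="accuracy",  # placeholder metric name
)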

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@muellerzr @SunMarc (cc. @shcheklein - Author of comment)

@seanswyi changed the title from "Fix/update metric for best model default" to "Update doc for metric_for_best_model when save_strategy="best"." Dec 22, 2024
@seanswyi closed this Dec 22, 2024
@seanswyi reopened this Dec 22, 2024
self.assertIn("`args.metric_for_best_model` must be provided", str(context.exception))

# Case 4: Metric name not provided and save_strategy is "steps" (i.e., not "best").
with tempfile.TemporaryDirectory() as tmpdir:
Contributor

Not critical / minor: tbh, it seems a bit out of place in test_save_best_checkpoint (as well as the previous case). I would probably move it into a separate test. Or should it otherwise at least call train and check the checkpoint that was actually saved?

Contributor Author

I guess I agree that it logically does seem a bit out of place. I think cases 3 and 4 could be grouped into their own methods since the point isn't so much to test the save_strategy = "best" itself but more to test the behavior related to metric_for_best_model.

I'm not sure if actually running training would be necessary, though. Case 3 is simply to check whether a ValueError is being properly thrown at Trainer initialization time, and case 4 is also simply to check whether the __post_init__ method of TrainingArguments properly initializes metric_for_best_model to "loss" when save_strategy != "best" and load_best_model_at_end = True. To me, neither of these seem to require training/evaluation and Trainer instantiation seems sufficient.
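For illustration, such a split might look roughly like the sketch below (test names and the get_regression_trainer helper are assumptions modeled on the existing trainer tests and their imports, not the final code):

def test_best_checkpoint_requires_metric_for_best_model(self):
    # Former case 3: Trainer initialization should raise when save_strategy="best"
    # and no metric_for_best_model is provided.
    with tempfile.TemporaryDirectory() as tmpdir:
        with self.assertRaises(ValueError) as context:
            get_regression_trainer(
                output_dir=tmpdir,
                eval_strategy="steps",
                save_strategy="best",
            )
        self.assertIn("`args.metric_for_best_model` must be provided", str(context.exception))

def test_metric_for_best_model_defaults_to_loss(self):
    # Former case 4: with save_strategy != "best" and load_best_model_at_end=True,
    # TrainingArguments.__post_init__ should fall back to "loss".
    with tempfile.TemporaryDirectory() as tmpdir:
        args = TrainingArguments(
            output_dir=tmpdir,
            eval_strategy="steps",
            save_strategy="steps",
            load_best_model_at_end=True,
        )
        self.assertEqual(args.metric_for_best_model, "loss")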

Contributor

Agreed, I would also split it into a separate test (or two tests). And, yes, we are testing the init here, that's why it was looking out of place.

Member

No strong opinion. We can split it into separate tests for cases 3 and 4.

@@ -477,7 +477,7 @@ class TrainingArguments:
metric_for_best_model (`str`, *optional*):
Use in conjunction with `load_best_model_at_end` to specify the metric to use to compare two different
models. Must be the name of a metric returned by the evaluation with or without the prefix `"eval_"`. Will
- default to `"loss"` if unspecified and `load_best_model_at_end=True` (to use the evaluation loss).
+ default to `"loss"` if unspecified, `load_best_model_at_end=True`, and `save_strategy!="best"`.
Contributor

My 2 cents (disclaimer: I'm not very familiar with the whole scope of the initial change, or the reason behind it!): it's a bit hard to read and understand what is going on here and why. E.g. why can't it default to loss when save_strategy == "best"? What is the major difference from load_best_model_at_end (and save_strategy != "best")?

Again, apologies if I'm missing some obvious context here. Please feel free to disregard my comment / question then.

Member

I didn't find the place where we set metric_for_best_model = "loss" when save_strategy != "best". Can you explain a bit why you changed the description?

Contributor Author

@shcheklein That was a design decision made here (#31817 (comment)). It was deemed easier to debug if we raise an error rather than fall back to a hard-coded default.


@SunMarc Hmm I'm starting to think that maybe the problem is that we're not able to set load_best_model_at_end = True when save_strategy = "best" since load_best_model_at_end requires eval_strategy == save_strategy but eval_strategy doesn't have a "best" option.

This means that if we want to use save_strategy = "best" then we have to set load_best_model_at_end = False. In turn, when save_strategy != "best" and load_best_model_at_end = True, the __post_init__ method of TrainingArguments sets metric_for_best_model to "loss". https://github.com/huggingface/transformers/blob/main/src/transformers/training_args.py#L1676:L1679

The modified docstring aims to add a bit more detail as to when the metric_for_best_model is set to a default of "loss".
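As a quick sketch of that interaction (values are placeholders; this only mirrors the behavior described above, not the library source):

import tempfile
from transformers import TrainingArguments

with tempfile.TemporaryDirectory() as tmpdir:
    # load_best_model_at_end requires eval_strategy == save_strategy, and eval_strategy
    # has no "best" option, so pairing it with save_strategy="best" is rejected when the
    # arguments are validated.
    try:
        TrainingArguments(
            output_dir=tmpdir,
            eval_strategy="steps",
            save_strategy="best",
            load_best_model_at_end=True,
        )
    except ValueError as err:
        print(f"Rejected as expected: {err}")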

Member

Should we also add "best" for eval_strategy then?

Member

@SunMarc left a comment

Thanks! Left a few comments.
