Refactor Callbacks #60

HCookie · 2024-09-24T09:47:49Z

Split into separate files
Use list in config to add callbacks
Provide legacy config enabled approach
Fix ruff issues

New Usage

Set config.diagnostics.callbacks to a list of callback names to include
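
As a rough illustration of the name-based approach, the sketch below resolves a list of callback names through a lookup table. The registry, the callback names, and the stand-in classes are all hypothetical and are not taken from anemoi-training.

```python
"""Hypothetical sketch: turn a list of callback names into callback objects."""


class LearningRateLogger:
    """Stand-in for a real callback class (hypothetical)."""


class RolloutEval:
    """Stand-in for a real callback class (hypothetical)."""


# Hypothetical registry mapping config names to callback classes.
CALLBACK_REGISTRY = {
    "learning_rate": LearningRateLogger,
    "rollout_eval": RolloutEval,
}


def build_callbacks(names):
    """Return one callback instance per name, failing loudly on unknown names."""
    unknown = [name for name in names if name not in CALLBACK_REGISTRY]
    if unknown:
        raise ValueError(f"Unknown callbacks: {unknown}")
    return [CALLBACK_REGISTRY[name]() for name in names]


# Plays the role of `config.diagnostics.callbacks` in this sketch.
callbacks = build_callbacks(["learning_rate", "rollout_eval"])
```

Rejecting unknown names up front means a typo in the config fails at startup rather than silently dropping a callback.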

Closes #59, #45

📚 Documentation preview 📚: https://anemoi-training--60.org.readthedocs.build/en/60/

FussyDuck · 2024-09-24T09:47:54Z

All committers have signed the CLA.

HCookie · 2024-09-24T09:57:36Z

At the moment, this is the proposed refactor; I have yet to complete an exhaustive test of the changes.

JesperDramsch · 2024-09-24T10:33:13Z

Great work, thank you for taking this on.

I was thinking that it might be nice to make this fully configurable through instantiate.

For example, no one is really using the stochastic weight averaging as far as I know, so having specific config entries for this is a bit of feature bloat.

Then the list of callbacks would just look like this:

callbacks:
  swa:
    _target_: pytorch_lightning.callbacks.stochastic_weight_avg.StochasticWeightAveraging
    swa_lr: 1e-4
    swa_epoch_start: 123
    annealing_epochs: 5
    annealing_strategy: cos
    device: null
  blabla:
    _target_: blabla_callback
    blabla: bla

This makes it more extensible and actually removes some of our less-used config entries.
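
To make the `_target_` idea concrete, here is a stdlib-only sketch of the pattern: import the dotted path, then call it with the remaining keys as keyword arguments. In practice Hydra's `instantiate` would do this job; the helper below is a simplified imitation, and the targets are standard-library stand-ins chosen only so the sketch runs without extra dependencies.

```python
"""Stdlib-only imitation of the `_target_` pattern (not Hydra itself)."""
import importlib


def instantiate(spec):
    """Import the object named by `_target_` and call it with the other keys."""
    kwargs = {key: value for key, value in spec.items() if key != "_target_"}
    module_name, _, attr = spec["_target_"].rpartition(".")
    target = getattr(importlib.import_module(module_name), attr)
    return target(**kwargs)


# Stand-in targets from the standard library; a real config would point
# at callback classes instead.
callbacks_config = {
    "timer": {"_target_": "datetime.timedelta", "seconds": 90},
    "ratio": {"_target_": "fractions.Fraction", "numerator": 1, "denominator": 4},
}

callbacks = [instantiate(spec) for spec in callbacks_config.values()]
```

Because every entry carries its own import path and arguments, adding a new callback needs a config change only, with no dedicated config keys in the code.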

Additionally, we can keep the standard callbacks, like model checkpoints, as "permanent callbacks" (I don't think we have to make everything optional).

One idea I also had is that we could make a special list for "plot_callbacks" in the same style. Then we can easily keep the super convenient "plots.enabled = False" as a shortcut to disable them?
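
A minimal sketch of how these three groups could be combined, assuming a plain-dict config. The key layout and the callback names (`model_checkpoint`, `plot_loss`, `plot_sample`) are hypothetical and only illustrate the shortcut.

```python
"""Hypothetical sketch of assembling callbacks from three groups."""


def collect_callbacks(diagnostics):
    """Combine permanent, optional, and plot callbacks from a plain-dict config."""
    # Permanent callbacks are always included, e.g. model checkpoints.
    selected = ["model_checkpoint"]
    # Optional callbacks come from the user-supplied list.
    selected += list(diagnostics.get("callbacks", []))
    plots = diagnostics.get("plots", {})
    # `plots.enabled = False` drops every plot callback in one go.
    if plots.get("enabled", True):
        selected += list(plots.get("callbacks", []))
    return selected


config = {
    "callbacks": ["swa"],
    "plots": {"enabled": False, "callbacks": ["plot_loss", "plot_sample"]},
}
selected = collect_callbacks(config)
```

Keeping the plot callbacks in their own list is what lets a single flag switch them all off without touching the other entries.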

JesperDramsch

Hi @HCookie, thanks for taking on the callbacks!

It's already much better, great work on that. I think we can take the refactor even further and make the callbacks (almost?) fully modular, which would be incredible for future extensibility.

One comment regarding the file names: so far we haven't used the <xyz>-ing.py naming style. "checkpointing" in particular could be confused with activation checkpointing (although that is, and honestly will stay, confusing). Can we rename these, please?

src/anemoi/training/diagnostics/callbacks/plotting.py

src/anemoi/training/diagnostics/callbacks/learning_rate.py

src/anemoi/training/diagnostics/callbacks/__init__.py

src/anemoi/training/config/diagnostics/eval_rollout.yaml

src/anemoi/training/diagnostics/callbacks/__init__.py

sahahner · 2024-10-28T10:33:04Z

In general, this looks good to me. The new layout of the config files is intuitive. Thank you for the work that you have put into this.
I have one concern: the configuration of the callbacks is not traceable via MLflow, because the list of targets is cut off after a certain number of characters in the MLflow parameters.
Is there a way to work around this?

HCookie · 2024-10-28T10:41:50Z

The configuration of the callbacks is not traceable via MLFlow, as the list of targets is cut after a certain number of characters in the mlflow parameters.

That issue with MLflow is addressed in #91, so once that is merged, the config will be accessible either as a dump or fully expanded.
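
One way to picture the "fully expanded" option is to flatten the nested config into dotted keys, so each parameter value stays short. This is a sketch only and is not necessarily what #91 implements; the `my_package.*` targets and the `every_n_epochs` key are hypothetical.

```python
"""Hypothetical sketch: expand a nested config into short dotted-key parameters."""


def flatten(value, prefix=""):
    """Yield (dotted_key, leaf_value) pairs for nested dicts and lists."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, (list, tuple)):
        items = enumerate(value)
    else:
        # Leaf value: emit it under the key built so far.
        yield prefix, value
        return
    for key, child in items:
        yield from flatten(child, f"{prefix}.{key}" if prefix else str(key))


config = {
    "callbacks": [
        {"_target_": "my_package.CallbackA", "every_n_epochs": 5},
        {"_target_": "my_package.CallbackB"},
    ]
}
params = dict(flatten(config))
```

Each resulting entry could then be logged as its own parameter, for example by passing the dictionary to `mlflow.log_params`, instead of logging the whole list as one long string.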

sahahner

Thank you for incorporating the requested changes. This looks good to me now.

sahahner

Looks good to me.

* Refactor Callbacks
  - Split into separate files
  - Use list in config to add callbacks
  - Split out plotting callbacks config
* Refactor rollout (#87)
  - New rollout central function

---------

Co-authored-by: Mario Santa Cruz <[email protected]>
Co-authored-by: Sara Hahner <[email protected]>

Refactor Callbacks

b12fac8

- Split into separate files
- Use list in config to add callbacks
- Provide legacy config enabled approach
- Fix ruff issues

HCookie requested a review from JesperDramsch September 24, 2024 09:47

HCookie self-assigned this Sep 24, 2024

HCookie added 2 commits September 24, 2024 09:49

Update changelog

29a8477

Fix TypeError

15824be

HCookie removed the request for review from JesperDramsch September 24, 2024 09:57

HCookie added 5 commits September 25, 2024 08:13

Move to hydra.instantiate

4077bf4

Merge remote-tracking branch 'origin/develop' into fix/refactor_callb…

494d39d

…acks

Add __all__

fe37c02

Add to base config

2d8275c

Fix nested list

230eb0e

HCookie marked this pull request as ready for review September 25, 2024 09:41

HCookie requested a review from JesperDramsch September 25, 2024 10:42

HCookie added 2 commits September 26, 2024 14:31

Fix nested get issue

5547b20

Fix type checking

1d80cfb

HCookie mentioned this pull request Oct 1, 2024

Rollout video of variable dynamics #65

Draft

HCookie and others added 3 commits October 1, 2024 15:29

Merge branch 'develop' into fxi/refactor_callbacks

e79dfc7

feat: edge plot in callbacks

96ab74c

feat: set default extra callbacks

4aeb1a5

JesperDramsch previously requested changes Oct 1, 2024

pre-commit-ci bot and others added 7 commits October 2, 2024 10:54

[pre-commit.ci] auto fixes from pre-commit.com hooks

816b3af

for more information, see https://pre-commit.ci

fix: typing & refactoring

644038f

fix: remove list comprehension

8356cd4

Refactor according to PR

930e4d2

- Prefill config with callbacks
- Warn on deprecations for old config
- Expand config enabled
- Add back SWA
- Fix logging callback
- Add flag to disable checkpointing
- Add testing

Update deprecation warning

52ea91f

Merge branch 'fxi/refactor_callbacks' into feature/graph-features-cal…

0dd81b7

…lback

Merge pull request #71 from ecmwf/feature/graph-features-callback

332f746

[feature] Fix trainable attribute callbacks

Apply suggestions from code review

d6e1d9c

Co-authored-by: Sara Hahner <[email protected]>

sahahner reviewed Oct 25, 2024

src/anemoi/training/diagnostics/callbacks/plot.py (outdated, resolved)

Fix init args issue in RolloutPlots

6073d84

sahahner reviewed Oct 25, 2024

src/anemoi/training/config/diagnostics/plot/detailed.yaml (outdated, resolved)

HCookie and others added 5 commits October 25, 2024 14:27

Add rollout_eval config

f1d883f

Add training mode to rollout step

66bd306

Force LongRolloutPlots to plot in serial

8dfe25d

Add warning to LongRolloutPlots when async

942e06f

Merge branch 'develop' into fxi/refactor_callbacks

8e6ab30

Fix assert calculation

84072a6

JPXKQX previously approved these changes Oct 28, 2024

Merge branch 'develop' into fxi/refactor_callbacks

42b59e5

HCookie dismissed JPXKQX’s stale review via 42b59e5 October 28, 2024 13:18

JPXKQX previously approved these changes Oct 28, 2024

HCookie dismissed JPXKQX’s stale review via a3f7e00 October 28, 2024 13:39

Apply post_processors before plotting in LongRolloutPlots

30dfd45

HCookie force-pushed the fxi/refactor_callbacks branch from a3f7e00 to 30dfd45 Compare October 28, 2024 13:51

HCookie and others added 3 commits October 28, 2024 14:01

Fix reference to batch

eebaf16

Merge branch 'develop' into fxi/refactor_callbacks

b31da0e

Fix debug config

8b2a30e

sahahner previously approved these changes Oct 29, 2024

mchantry added the ATS approved Approved by ATS label Oct 29, 2024

Merge remote-tracking branch 'origin/develop' into fix/refactor_callb…

7fe2c05

…acks

HCookie dismissed sahahner’s stale review via 7fe2c05 October 29, 2024 12:14

sahahner approved these changes Oct 29, 2024

HCookie merged commit 6433fa3 into develop Oct 29, 2024
116 checks passed

HCookie deleted the fxi/refactor_callbacks branch October 29, 2024 13:18
