Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

15 remapping of one input variable to multiple new ones #21

Merged

Conversation

sahahner
Copy link
Member

@sahahner sahahner commented Aug 9, 2024

Remapping of one prognostic input variable to multiple new ones using a preprocessor.
A postprocessor maps the multiple variables back to the original one.

Solves issue #15

Implemented options: map a variable x in degrees ( in the range [0, 360] ) to cos(x) and sin(x).

Preprocessor:
The model should handle the remapped variables, and the training loss must be calculated on the remapped set of variables. The validation loss should be calculated on the original variable, which will also be output in inference. Therefore, this remapping should be implemented as a preprocessor that allows the remapping of one variable to several new ones.

Data Indices:
The preprocessors are the first layers of the model, so the input variables are the same.
After the preprocessors, the set of variables has changed. Therefore, this information needs to be included in the data indices.

The changes to the data indices need to be incorporated into anemoi-training, which is why the pytests in anemoi-training/develop fail. The changes to anemoi-training are incorporated by the following PR: ecmwf/anemoi-training#17. When this branch is checked out, all tests pass.


📚 Documentation preview 📚: https://anemoi-models--21.org.readthedocs.build/en/21/

@sahahner sahahner linked an issue Aug 9, 2024 that may be closed by this pull request
@FussyDuck
Copy link

FussyDuck commented Aug 9, 2024

CLA assistant check
All committers have signed the CLA.

@codecov-commenter
Copy link

codecov-commenter commented Aug 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.83%. Comparing base (af4d9dc) to head (fba0546).
Report is 4 commits behind head on develop.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop      #21   +/-   ##
========================================
  Coverage    99.82%   99.83%           
========================================
  Files           21       22    +1     
  Lines         1158     1213   +55     
========================================
+ Hits          1156     1211   +55     
  Misses           2        2           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mishooax
Copy link
Member

mishooax commented Aug 9, 2024

Hi Sara! @sahahner
Nice work - can this also be used to do a many-to-many map, e.g., (u,v) winds to (wind speed, wind direction)?

@sahahner
Copy link
Member Author

sahahner commented Aug 9, 2024

Hi Sara! @sahahner Nice work - can this also be used to do a many-to-many map, e.g., (u,v) winds to (wind speed, wind direction)?

In my opinion, mapping n variables to n variables should be done by another preprocessor, as this conversion can be done in-place to save memory.

The new preprocessor considers starting from only one variable. Together with @da-ewanp we are talking about extending it to n-to-m mappings in a follow-up PR.

@MeraX
Copy link

MeraX commented Aug 13, 2024

Hi Sara,

thank you for this PR. I wonder if it also works for forcings. We would highly appreciate that. Our case is the directional forcing variable SSO_THETA (Angle of sub-gridscale orography). We would like the model to use sin(SSO_THETA) and sin(SSO_THETA) as forcing data instead of SSO_THETA. The means we would only need the forward transformation in the training and inference but not the inverse transformation. Would your implementation support the transformation of forcings?

Could your implementation also be used derive cos_latitude, sin_latitude, cos_longitude, and sin_longitude on the fly?
If yes, could a remapping be used to derive cos_julian_day, sin_julian_day, cos_local_time, sin_local_time, and insolation as well, i.e. would it be possible to make the datetime available to the remapper?
Such kind of preprocessors could be an elegant solution for anemoi-inference to support unstructured grids.

As a final remark, I had a brief look at your code and noted that you have an in_place argument but you don't use it. Perhaps you could add assert not in_place and note in the doc string that in-place is not possible.

@sahahner
Copy link
Member Author

sahahner commented Aug 13, 2024

Thank you Marek for the feedback.
Good to hear that this functionality is useful for others as well.

I wonder if it also works for forcings. We would highly appreciate that. Our case is the directional forcing variable SSO_THETA (Angle of sub-gridscale orography). We would like the model to use sin(SSO_THETA) and sin(SSO_THETA) as forcing data instead of SSO_THETA. The means we would only need the forward transformation in the training and inference but not the inverse transformation. Would your implementation support the transformation of forcings?

The forcing variables are actually included in the training to enable the rollout.
Right now the functionality to apply remappers to forcing variables is not implemented, but I think it fits this PR.
I am working on including that functionality in this PR.

Could your implementation also be used derive cos_latitude, sin_latitude, cos_longitude, and sin_longitude on the fly? If yes, could a remapping be used to derive cos_julian_day, sin_julian_day, cos_local_time, sin_local_time, and insolation as well, i.e. would it be possible to make the datetime available to the remapper? Such kind of preprocessors could be an elegant solution for anemoi-inference to support unstructured grids.

If I can include the feature in this PR, this should also work for datetime variables.

@sahahner
Copy link
Member Author

As a final remark, I had a brief look at your code and noted that you have an in_place argument but you don't use it. Perhaps you could add assert not in_place and note in the doc string that in-place is not possible.

I left this argument, as we call all preprocessors in a loop and the in_place is set to the same value for all of them.
I do not want to enforce this variable to False (which would be the consequence of an assert), as the other preprocessors can be applied in-place to save memory. Also, I do not expect this remapper to be included by all users.

Do you @MeraX have a recommendation on how to solve this? I would be happy to incorporate some solution for that.

@MeraX
Copy link

MeraX commented Aug 13, 2024

Hi Sara, thank you for your quick reply. I think that a non-functional argument confuses the users or at least me. If you want to execute processors that support in_place to operate in-place you could do something like the following.

import inspect
for preprocessor in preprocessors:
    if "in_place" in inspect.getargspec(func):
        preprocessor(x, in_place=True)
    else:
        x = preprocessor(x)

Of course, one could find a better way to identify capabilities of preprocessors other than inspecting their signature.

Or perhaps, it could help to simply rename that argument. If I understand the code correctly, in_place is an optimisation that allows the preprocessor to reuse the memory of the input array. So the argument could perhaps be named in_place_if_possible or allow_in_place or allow_memory_reuse or something alike. That would make it clear that one should not expect remapper(x, allow_in_place=True) to modify x. One should rather always do x = remapper(x, allow_in_place=True).

@sahahner
Copy link
Member Author

Hi @MeraX,

I implemented the remapping of forcing variables. This should work on the datetime variables as well since at the place, where the preprocessors are applied, they are treated as any other variable. Please give it a try!

Thank you for the proposals for handling the in_place input. We are still deciding on how to handle these cases.
For now, I included a warning about using in_place=True in the remapper and added a comment to the docstrings as you proposed as well.

Copy link
Member

@JesperDramsch JesperDramsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the changes. LGTM

@JesperDramsch JesperDramsch merged commit 80787ce into develop Sep 9, 2024
49 of 102 checks passed
@JesperDramsch JesperDramsch deleted the 15-remapping-of-one-input-variable-to-multiple-new-ones branch September 9, 2024 15:05
japols pushed a commit that referenced this pull request Sep 26, 2024
* feat: remapper and change to data indices when mapping one variable to several

* tests: update imputer and normalizer tests

* feat: include remapper as a preprocessor. update init for all preprocessors. add tests for the remapper.

* tests: update tests for index collection

* documentation and changelog

* feat: enable remapper for forcing variables

* tests: include remapping forcing variable and do not test with remapping variables at the end

* comment and warning about using in_place=True in remapper as this is not possible

* comments: incorporate changes/documentation requested by jesper

* change order of function inputs preprocessors, documentation for data indices and remapper

* style: dict in config files for defining the variables to be remapped. structure and additional assert in index collection.

* args in preprocessors
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remapping of one input variable to multiple new ones
6 participants