Challenge 25 - Regional Reanalysis for Europe with Machine Learning #10

Open
EsperanzaCuartero opened this issue Feb 24, 2023 · 20 comments
Labels
Stream 2 Machine Learning for Earth Sciences

Comments

@EsperanzaCuartero
Contributor

EsperanzaCuartero commented Feb 24, 2023

Challenge 25 - Regional Reanalysis for Europe with Machine Learning

Stream 2 - Machine Learning for Earth Science

Goal

The main objective of this challenge is to develop a downscaling technique using Machine Learning (ML) tools that can generate finer-resolution reanalysis information from coarser-grid reanalysis data.

Mentors and skills

  • Mentors: Mohanad Albughdadi, Matthew Chantry, Andras Horanyi, Cornel Soci
  • Skills required:
    • Theoretical and practical knowledge of Machine Learning tools
    • Experience handling large datasets
    • Writing programs for Machine Learning
    • Experience in plotting results

Note: Only nationals from European Union (EU) Member States and countries associated with the EU’s Digital Europe Programme (currently Iceland, Norway, and Liechtenstein) are eligible to participate (see Terms and Conditions).


Challenge description

ECMWF/C3S’s flagship global reanalysis is ERA5, which covers the period from 1940 to the present at a horizontal resolution of 31km with global coverage (it also includes lower-resolution uncertainty information). While ERA5 is much appreciated by its users (more than 100 000 registered users in the CDS), they are also very interested in higher resolution and enhanced detail in various parts of the globe. For this, CERRA (Copernicus Regional Reanalysis for Europe) provides detailed information at 5.5km horizontal resolution (CERRA also includes ensemble uncertainty information at 11km horizontal resolution). CERRA covers the period from September 1984 to June 2021. We will provide assistance to pre-process the ERA5 and CERRA datasets from the CDS.

The CERRA reanalysis includes a data assimilation system and a limited-area numerical weather prediction model. They have been used to produce high-resolution data using lateral boundary conditions from ERA5. The value of the regional reanalyses with respect to global reanalysis comes from the additional surface observations assimilated, the improved (i.e. more local details) description of the surface characteristics and the use of higher resolution tailor-made regional numerical weather prediction models.

The ultimate goal of this challenge in Code for Earth is to produce a model capable of downscaling ERA5 using regional forcings (orography or land-sea mask) towards the CERRA high-resolution analysis. A successful model would be capable of producing accurate CERRA estimates much faster than running the regional reanalysis system. As a proof of concept, we will target a limited number of parameters, starting with 2m temperature. The results would be compared to the original CERRA dataset and several baseline models. Possible methodologies for downscaling could be conditional Generative Adversarial Models or Diffusion models.
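
One of the simplest baselines for such a comparison is plain interpolation of the coarse field onto a finer grid. A minimal sketch, assuming a regular grid and an integer refinement factor (the real ERA5-to-CERRA ratio of 31km to 5.5km is not an integer, and the two grids are not aligned this simply); the function name `bilinear_upsample` and the list-of-rows field layout are illustrative choices, not part of the challenge material:

```python
def bilinear_upsample(coarse, factor):
    """Upsample a 2D field (list of rows) by an integer factor.

    Assumes a regular grid with at least 2 rows and 2 columns.
    Coarse grid points are reproduced exactly; values in between
    are bilinearly interpolated from the four surrounding points.
    """
    n_rows, n_cols = len(coarse), len(coarse[0])
    out_rows = (n_rows - 1) * factor + 1
    out_cols = (n_cols - 1) * factor + 1
    fine = []
    for i in range(out_rows):
        y = i / factor                  # position in coarse-grid units
        i0 = min(int(y), n_rows - 2)    # index of the coarse row above
        wy = y - i0                     # weight of the coarse row below
        row = []
        for j in range(out_cols):
            x = j / factor
            j0 = min(int(x), n_cols - 2)
            wx = x - j0
            top = (1 - wx) * coarse[i0][j0] + wx * coarse[i0][j0 + 1]
            bottom = (1 - wx) * coarse[i0 + 1][j0] + wx * coarse[i0 + 1][j0 + 1]
            row.append((1 - wy) * top + wy * bottom)
        fine.append(row)
    return fine
```

Any ML model would need to improve on a baseline of this kind to justify its additional cost.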

A stretch goal of the challenge would be to provide ensemble-based uncertainty estimation for the downscaled fields. That might be achieved using the ensemble uncertainty information available from ERA5 and/or CERRA. A further stretch goal of the project could be to use sparse, noisy & synthetic observations from CERRA as an additional predictor, thus mimicking the use of observations in producing CERRA from ERA5.
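
To make the second stretch goal concrete: synthetic observations could be drawn by sampling a small number of CERRA grid points and perturbing their values. A minimal sketch, assuming Gaussian noise and uniformly random station locations (both are assumptions made here for illustration; the challenge text specifies neither), with a hypothetical helper name `sample_synthetic_observations`:

```python
import random


def sample_synthetic_observations(field, n_obs, noise_std, seed=0):
    """Pick n_obs distinct grid points from a 2D field and add noise.

    field     : 2D field as a list of rows (e.g. 2m temperature)
    n_obs     : number of pseudo-stations to sample (sparsity)
    noise_std : standard deviation of the Gaussian perturbation
    Returns a list of (row, col, noisy_value) tuples.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    n_rows, n_cols = len(field), len(field[0])
    all_points = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    chosen = rng.sample(all_points, n_obs)  # sampling without replacement
    return [(i, j, field[i][j] + rng.gauss(0.0, noise_std)) for i, j in chosen]
```

The resulting tuples could then be rasterised into an extra input channel, or fed to the model as a point set, depending on the architecture chosen.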

From the practical point of view the challenge might consist of the following steps (take these points as guidelines):

  1. Train and validate the Machine Learning model to map from ERA5 fields to CERRA fields on the above grid, thereby creating the fine-scale structure of CERRA.
  2. Evaluate the produced dataset for the period July 2019 – June 2021 and provide data possibly up to the present.
  3. Compare the produced dataset to the CERRA data (objective and subjective verification) and discuss the strengths and weaknesses of the proposed method and the new data.
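
The evaluation and comparison steps could start from simple gridpoint scores on a held-out period. A minimal sketch, assuming fields are stored as lists of rows and that bias, MAE and RMSE are acceptable objective measures (the challenge text does not prescribe specific metrics); the names `is_evaluation_date` and `verification_scores` are hypothetical:

```python
import math
from datetime import date

# Evaluation period taken from step 2; treating both end months as
# fully included is an assumption.
EVAL_START = date(2019, 7, 1)
EVAL_END = date(2021, 6, 30)


def is_evaluation_date(day):
    """True if the date falls in the held-out evaluation period."""
    return EVAL_START <= day <= EVAL_END


def verification_scores(predicted, reference):
    """Bias, MAE and RMSE over all grid points of two same-shaped 2D fields."""
    errors = [
        p - r
        for p_row, r_row in zip(predicted, reference)
        for p, r in zip(p_row, r_row)
    ]
    n = len(errors)
    bias = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return {"bias": bias, "mae": mae, "rmse": rmse}
```

Computing the same scores for a baseline against CERRA gives a reference point for judging the ML model; subjective verification (for example, inspecting maps over complex terrain) would complement these numbers.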

Links and references

@EsperanzaCuartero EsperanzaCuartero added the Stream 2 Machine Learning for Earth Sciences label Feb 24, 2023
@EsperanzaCuartero EsperanzaCuartero changed the title from "Challenge 10 - Regional Reanalysis for Europe with Machine Learning" to "Challenge 25 - Regional Reanalysis for Europe with Machine Learning" Feb 27, 2023
@mishooax

Hi @EsperanzaCuartero - I'd like to follow this project as well, on behalf of my section (CAMS). Could you add my GitHub ID to the participants list? Many thanks.

@HoranyiAndras

Thanks Esperanza, would you add @cornelsoci to this task too (he is also a mentor)? Thank you.

@EsperanzaCuartero
Contributor Author

EsperanzaCuartero commented Feb 27, 2023 via email

@HoranyiAndras

Great, many thanks!

@EsperanzaCuartero
Contributor Author

EsperanzaCuartero commented Feb 27, 2023 via email

@cornelsoci

cornelsoci commented Feb 27, 2023 via email

@EsperanzaCuartero EsperanzaCuartero added Stream 2 Machine Learning for Earth Sciences and removed Stream 2 Machine Learning for Earth Sciences labels Mar 3, 2023
@bzah

bzah commented Mar 16, 2023

Hi, I'm interested in this challenge but I currently have no professional experience in ML techniques.
However, I'm training myself with the ECMWF's machine learning MOOC.
Do you think completing the MOOC would be a sufficient baseline to effectively address this challenge (or other ML-related challenges)?

I could cross-post on the MOOC forum; I think many people there would be interested in applying for these challenges.

@trakasa
Contributor

trakasa commented Mar 16, 2023

Hi, I'm interested in this challenge but I currently have no professional experience in ML techniques. However, I'm training myself with the ECMWF's machine learning MOOC. Do you think completing the MOOC would be a sufficient baseline to effectively address this challenge (or other ML-related challenges)?

I could cross-post on the MOOC forum; I think many people there would be interested in applying for these challenges.

Hi @bzah,
joining the challenge on regional reanalysis is a great opportunity to learn and enhance your skills in both ML and reanalysis. However, you should already have knowledge in these areas, and it is understandable if you feel that your current level of expertise may not be sufficient for you to fully participate in the challenge.
You might want to team up with friends or colleagues who have complementary skills; then you can still be a valuable member of the team, even if you don't have all the necessary skills.
I hope this helps you with any next steps. Posting something on the MOOC forum sounds like a great idea!

@TheWeatherMan93

Hi, I am interested in this challenge for many reasons, but I don't have any professional experience in Machine Learning techniques. I wish to learn more about this and to try to create other opportunities,
even when the mentioned meetings / trainings are over, if I pass, because I want to apply it in my profession, anti-hail prevention.

Best regards,

Duško Mrkonjić

@HakamShams

Hi,
I have a question regarding the challenge description:
Should the downscaling be deterministic or stochastic? In other words, should the downscaled ERA5 match the CERRA exactly or should it just be realistic and similar to the output of CERRA?

Thanks,
Hakam

@ischicker

Hi all,
one of our team members is a bachelor student who needs to finish this June. She would implement one of our ideas (and will also be an official team member). Is it planned to use the CDS data directly, or will they be provided in a more storage-friendly way (and when)?
Otherwise, we would help her implement a first-shot model with only a subset of the data so that she is able to finish the BSc thesis in time.

Thanks,
Irene

@mchantry

Hi, I am interested in this challenge for many reasons, but I don't have any professional experience in Machine Learning techniques. I wish to learn more about this and to try to create other opportunities, even when the mentioned meetings / trainings are over, if I pass, because I want to apply it in my profession, anti-hail prevention.

Best regards,

Duško Mrkonjić

Hi Duško,

Great to hear that you are interested. We think it might be best for you to team up with some people with machine learning experience, to complement your own skills.

Thanks,

Mat

@mchantry

Hi, I have a question regarding the challenge description: Should the downscaling be deterministic or stochastic? In other words, should the downscaled ERA5 match the CERRA exactly or should it just be realistic and similar to the output of CERRA?

Thanks, Hakam

Hi Hakam,

Ideally we would target a methodology capable of providing the inherent uncertainty in the downscaling mapping. However, given the limited time of the project it may be best to first target a deterministic solution before extending to an approach capable of providing calibrated uncertainty estimates.

Best,

Mat

@mchantry

Hi all, one of our team members is a bachelor student who needs to finish this June. She would implement one of our ideas (and will also be an official team member). Is it planned to use the CDS data directly, or will they be provided in a more storage-friendly way (and when)? Otherwise, we would help her implement a first-shot model with only a subset of the data so that she is able to finish the BSc thesis in time.

Thanks, Irene

Hi Irene,

Thanks for your interest. We plan to assist with data retrievals and regridding to minimise the time spent on this aspect of the task.

Best,

Mat

@tfohrmann

Hi everyone,
a question concerning this:
"A further stretch goal of the project could be to use sparse, noisy & synthetic observations from CERRA as an additional predictor, thus mimicking the use of observations in producing CERRA from ERA5."

If I am reading this right, you propose to use parts of the CERRA output as observation substitutes? What would be the problem with using actual observations? I was under the impression that, e.g., synoptic station data is stored in a database hosted by ECMWF?

Cheers,
Till

@HoranyiAndras

Hi everyone, a question concerning this: "A further stretch goal of the project could be to use sparse, noisy & synthetic observations from CERRA as an additional predictor, thus mimicking the use of observations in producing CERRA from ERA5."

If I am reading this right, you propose to use parts of the CERRA output as observation substitutes? What would be the problem with using actual observations? I was under the impression that, e.g., synoptic station data is stored in a database hosted by ECMWF?

Cheers, Till

Hi Till,
Thanks for your interest. You understood perfectly what we meant. Ideally, real observations would be the best way, but (i) access to them is limited and (ii) using them would make the problem more complicated. At this stage we would prefer to keep everything at a simpler level and see what ML can deliver for regional reanalysis.
I hope it helps, thanks, best regards
Andras

@cornelsoci

cornelsoci commented Apr 2, 2023 via email

@i-rok94

i-rok94 commented Apr 11, 2023

Hi, I am interested in participating in this challenge, but I am an Indian national. So am I eligible to participate in this challenge?

@trakasa
Contributor

trakasa commented Apr 13, 2023

Hi, I am interested in participating in this challenge, but I am an Indian national. So am I eligible to participate in this challenge?

@i-rok94
for this challenge, only nationals from European Union (EU) Member States and countries associated with the EU’s Digital Europe Programme (currently Iceland, Norway, and Liechtenstein) are eligible to participate.
For details please see also the Terms and Conditions - https://codeforearth.ecmwf.int/terms-and-conditions

@konradmayer

Project repo initialized at https://github.com/ECMWFCode4Earth/tesserugged
