Challenge #24 - CliMetLab - Machine Learning on weather and climate data #13
Comments
Hey! I am Vedant, a pre-final-year undergraduate student. I have mainly been working in machine learning and AI in general, and I have some experience developing Python libraries in these areas. I am interested in working on this challenge. It would be great if you could provide more details on the work and on how to get started.
Hi Vedant, thanks for your interest. The mentors will provide more details about the challenge as soon as possible. Best, Esperanza
Hello Vedant, depending on your background, interests and available time, you may want to focus on one of the three tasks offered here or address all of them.
Hi,
When: Wednesday, 24 March 2021 at 4 pm GMT
What: learn everything about ESoWC: how it works, the challenges this year, some tips for your proposal, and ESoWC experiences from previous participants
How: register here.
Hi! I'm interested in this challenge and have prior experience working with meteorological datasets using the Pangeo stack (Zarr, Xarray, etc.). If I understand correctly, plotting in the CliMetLab library is currently done using Magics, and you want it to be extended to allow the creation of other kinds of plots using Matplotlib? Would this require creating an additional Matplotlib driver within plotting? It would be great if you could share what you foresee in terms of plotting functionality.
@vidurmithal this is a good idea. The most important thing is that CliMetLab is seen as a framework with a plug-in architecture. So yes, support for different plotting software is a good idea, as long as you can ensure that the specifics of that software are somewhat hidden from the end user. The aim of CliMetLab is also to provide high-level functions so that users can focus on science. Of course, users could also be given access to lower-level functionality, as long as it is optional.
Thank you for your response, @b8raoult. So, if I understand correctly, for Task 1 you are looking for something like the plotting functionality built into libraries such as pandas, GeoPandas and even xarray, where calling …
Yes, that is correct. One of the challenges is to route a call to …
I received more questions by email:
Magics has so many features that replacing it is well out of scope. This task is nevertheless meant to explore this path. For CliMetLab users, it would offer a way to plot data nicely (as nicely as with Magics) with the tools they are used to (i.e. Matplotlib). For plugin developers, providing visualization code (alongside the plugin code that accesses the data) may be easier with a Matplotlib driver.
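Plume plots are one of the weather-specific plots listed for Task 1. The sketch below shows roughly what a Matplotlib-based version could look like: percentiles are taken across ensemble members and drawn as shaded bands around the median. The function names, the choice of percentiles and the axis label are assumptions for illustration, not an existing CliMetLab function, and the data in the `__main__` block is synthetic.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np


def plume_percentiles(ensemble, percentiles=(10, 25, 50, 75, 90)):
    """Percentiles across ensemble members (axis 0), one array per percentile."""
    ensemble = np.asarray(ensemble, dtype=float)
    return {p: np.percentile(ensemble, p, axis=0) for p in percentiles}


def plume_plot(steps, ensemble, ax=None):
    """Draw 10-90th and 25-75th percentile bands plus the ensemble median."""
    bands = plume_percentiles(ensemble)
    if ax is None:
        _, ax = plt.subplots()
    ax.fill_between(steps, bands[10], bands[90], alpha=0.3, label="10-90th percentile")
    ax.fill_between(steps, bands[25], bands[75], alpha=0.5, label="25-75th percentile")
    ax.plot(steps, bands[50], color="black", label="median")
    ax.set_xlabel("forecast step")
    ax.legend()
    return ax


if __name__ == "__main__":
    # Synthetic data only: 50 random-walk members over 40 six-hourly steps.
    rng = np.random.default_rng(0)
    steps = np.arange(0, 240, 6)
    members = 15 + 0.3 * rng.normal(size=(50, steps.size)).cumsum(axis=1)
    plume_plot(steps, members)
    plt.savefig("plume.png")
```

With an ensemble array of shape `(members, steps)`, `plume_plot(steps, ensemble)` returns the Matplotlib `Axes`, so users can still customize the figure with the tools they already know.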
Yes, the plugin we expect for Intake would be a 'source' plugin (not a dataset plugin); see the documentation for an example: https://climetlab.readthedocs.io/en/latest/contributing/sources.html
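A source plugin for Intake could be a thin adapter around an Intake catalog entry. The sketch below assumes the classic Intake API (`intake.open_catalog`, and catalog entries with `read()` and `to_dask()`); newer Intake releases may differ. The class name `IntakeSourceSketch` and its `to_pandas()`/`to_xarray()` methods are hypothetical, and registering it as a real CliMetLab source would follow the contributing guide linked above.

```python
class IntakeSourceSketch:
    """Hypothetical adapter exposing an Intake catalog entry through
    CliMetLab-style conversion methods (names are illustrative only)."""

    def __init__(self, entry):
        # `entry` is an Intake catalog entry, or any object providing
        # read() and to_dask().
        self._entry = entry

    @classmethod
    def from_catalog(cls, catalog_uri, name):
        # Imported lazily so the adapter can be used and tested without Intake.
        import intake

        catalog = intake.open_catalog(catalog_uri)
        return cls(catalog[name])

    def to_pandas(self):
        # In classic Intake, read() loads the whole entry into memory
        # (a pandas DataFrame for tabular sources).
        return self._entry.read()

    def to_xarray(self):
        # In classic Intake, to_dask() returns a lazily loaded container
        # (an xarray Dataset for gridded sources).
        return self._entry.to_dask()
```

For instance, `IntakeSourceSketch.from_catalog("catalog.yml", "my_dataset").to_xarray()` (with a hypothetical catalog file and entry name) would open the entry lazily. Because Intake is imported inside `from_catalog`, the adapter itself can be exercised with any object that provides `read()` and `to_dask()`.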
To elaborate on the short sentence "define appropriate configuration (chunking/compression/other) according to domain use cases, develop tools to benchmark when used on a cloud-platform, compare to other formats (N5, GRIB, netCDF, geoTIFF, etc.)", here are a few questions I have in mind. I believe that answering some of them would fulfill Task 3.
Regarding Task 1: https://xskillscore.readthedocs.io/en/stable/api/xskillscore.roc.html#xskillscore.roc provides a calculation for ROC curves.
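For reference, the sketch below spells out what an ROC calculation involves for a probabilistic forecast of a binary event: for each probability threshold, the hit rate (hits / observed events) is plotted against the false alarm rate (false alarms / observed non-events), and the area under the curve summarizes skill. It is plain Python written for illustration, not xskillscore's implementation; `roc_curve` and `roc_area` are hypothetical names.

```python
def roc_curve(observed, probabilities, thresholds):
    """Return (false_alarm_rates, hit_rates), one point per threshold.

    observed: 0/1 outcomes; probabilities: forecast probabilities of the event.
    The event counts as forecast when probability >= threshold. Both events
    and non-events must be present, otherwise a rate is undefined.
    """
    pairs = list(zip(observed, probabilities))
    n_events = sum(1 for obs, _ in pairs if obs)
    n_non_events = len(pairs) - n_events
    false_alarm_rates, hit_rates = [], []
    # Descending thresholds make both rates non-decreasing along the curve.
    for t in sorted(thresholds, reverse=True):
        hits = sum(1 for obs, p in pairs if obs and p >= t)
        false_alarms = sum(1 for obs, p in pairs if not obs and p >= t)
        hit_rates.append(hits / n_events)
        false_alarm_rates.append(false_alarms / n_non_events)
    return false_alarm_rates, hit_rates


def roc_area(false_alarm_rates, hit_rates):
    """Trapezoidal area under the curve, anchored at (0, 0) and (1, 1)."""
    xs = [0.0, *false_alarm_rates, 1.0]
    ys = [0.0, *hit_rates, 1.0]
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))
```

With `observed = [0, 0, 1, 1]`, `probabilities = [0.1, 0.4, 0.35, 0.8]` and those four probabilities as thresholds, the curve passes through (0, 0.5), (0.5, 0.5), (0.5, 1) and (1, 1), giving an area of 0.75. Coarser threshold sets skip points on the curve and change the trapezoidal estimate.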
Challenge #24 - CliMetLab - Machine Learning on weather and climate data
Goal
Extend a new Python ML package and help it mature.
Mentors and skills
Challenge description
CliMetLab is a Python package that aims to simplify access to climate and meteorological datasets, allowing users to focus on science instead of technical issues such as data access and data formats. It is mostly intended to be used in Jupyter notebooks and to be interoperable with all popular data analysis packages, such as NumPy, Pandas, Xarray, SciPy and Matplotlib, as well as machine learning frameworks such as TensorFlow, Keras or PyTorch. Several tasks are proposed:
Task 1: extend CliMetLab so that it offers users high-level Matplotlib-based plotting functions to produce graphs and plots relevant to weather and climate applications (e.g. plume plots, ROC curves, …).
Task 2: the Python package Intake is a lightweight set of tools for loading and sharing data in data science projects. Extend CliMetLab so that it interfaces seamlessly with Intake and allows users to access all Intake-compatible datasets.
Task 3: Xarray uses the Zarr data format to allow parallel reads and writes. Convert large, already available datasets to the Xarray-readable Zarr format, define appropriate configuration (chunking/compression/other) according to domain use cases, develop tools to benchmark when used on a cloud-platform, compare to other formats (N5, GRIB, netCDF, geoTIFF, etc.).
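Choosing a chunking configuration is largely arithmetic about access patterns. The sketch below, with the hypothetical helpers `chunk_stats` and `chunks_touched`, compares two layouts for one year of hourly data on a 0.25° global grid (721 × 1440 points, float32): one chunk per day of full maps versus full-year time series over 10 × 10 boxes. It ignores compression and Zarr metadata, so the byte counts are uncompressed sizes; an actual benchmark would still need to measure read times on the target cloud platform.

```python
import math
from typing import Sequence, Tuple


def chunk_stats(shape: Sequence[int], chunks: Sequence[int], itemsize: int) -> Tuple[int, int]:
    """Return (number of chunks, uncompressed bytes per full chunk)."""
    if len(shape) != len(chunks):
        raise ValueError("shape and chunks need the same number of dimensions")
    n_chunks = math.prod((s + c - 1) // c for s, c in zip(shape, chunks))  # ceil division
    return n_chunks, math.prod(chunks) * itemsize


def chunks_touched(chunks: Sequence[int], selection: Sequence[Tuple[int, int]]) -> int:
    """Number of chunks a read of [start, stop) per dimension has to open."""
    return math.prod((stop - 1) // c - start // c + 1
                     for (start, stop), c in zip(selection, chunks))


SHAPE = (8760, 721, 1440)          # hourly steps in a year x lat x lon (0.25 degree grid)
MAP_LAYOUT = (24, 721, 1440)       # one day of full global maps per chunk
SERIES_LAYOUT = (8760, 10, 10)     # a full year over a 10 x 10 box per chunk
POINT_SERIES = [(0, 8760), (100, 101), (200, 201)]  # one grid point, all times
SINGLE_MAP = [(0, 1), (0, 721), (0, 1440)]          # one time step, whole globe

if __name__ == "__main__":
    for name, layout in [("map", MAP_LAYOUT), ("series", SERIES_LAYOUT)]:
        n, size = chunk_stats(SHAPE, layout, itemsize=4)  # float32
        print(name, n, "chunks of", round(size / 2**20, 1), "MiB;",
              "point series touches", chunks_touched(layout, POINT_SERIES), "chunks;",
              "single map touches", chunks_touched(layout, SINGLE_MAP), "chunks")
```

The map layout gives 365 chunks of about 95 MiB each and serves a single global map from one chunk, but a point time series must open all 365 chunks; the time-series layout gives 10,512 chunks of about 3.3 MiB and reverses that trade-off. Domain use cases decide which reads should be cheap.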