ToDo list #1

Open
6 of 14 tasks
IamShubhamGupto opened this issue Nov 6, 2023 · 13 comments

@IamShubhamGupto
Member

IamShubhamGupto commented Nov 6, 2023

@IamShubhamGupto
Member Author

@LaureZanna @jbusecke We can discuss the repository, notebooks, tools, and plots here.

@IamShubhamGupto
Member Author

@suryadheeshjith

@LaureZanna
Contributor

Also tagging @NoraLoose, who has been thinking about the code/data for the website, and @Pperezhogin, who is now training ML models on CM2.6 data on the LEAP Hub; we would want to use that as one of our advanced example test cases.

@jbusecke
Collaborator

Hey everyone. Thanks for getting this started. Could we link individual issues to the todos above, so we can discuss each item in a focused way? Many thanks.

@jbusecke jbusecke pinned this issue Dec 15, 2023
@jbusecke
Collaborator

jbusecke commented Dec 15, 2023

Just finished a more thorough review. Great job @IamShubhamGupto @suryadheeshjith! Please make sure to update the todo list in the original post with new items as you see fit (I suggest including merged PRs/closed issues), so we have a nice way to track progress here too!

I think we should plan a bit further ahead: what we want to achieve here, and how that will influence the structure of the book.

  • Can we make an issue where we list the notebooks (checking them off and referencing relevant issues/PRs) that we want to
    • add from other repos
    • modify/combine from elsewhere
    • write from scratch

The main next step, IMO, should be to figure out a high-level structure. Currently the notebooks are a root-level list based on tools. Is that the organization we are aiming for? Or do we want to have different chapters:

  • Tools
  • Datasets
  • Methods
  • etc.

cc @LaureZanna

@LaureZanna
Contributor

Thanks @jbusecke, I agree: we are still missing a high-level structure. Here is a possible suggestion:

  • tools (xarray, xgcm, etc.)
  • datasets (MOM6, cam ....., some observations used for bias calculations)
  • methods (different ML models)
  • analysis (simple visualizations, like movies of the main variables and biases; advanced calculations, like ocean circulation; advanced++, like pycnocline depth for heat content, etc.)

Happy to go with something else too!
cc @NoraLoose @adcroft

@NoraLoose
Member

NoraLoose commented Dec 19, 2023

I like the high-level structure that you are proposing @jbusecke @LaureZanna!

If one of the advanced use cases is to train an ML model on CM2.6 data, we may also want to add PyTorch to the list of tools.

@NoraLoose
Member

I will ask an even more general question: What is the goal of the data-gallery?

Showcasing M2LInES work? Tutorials on how to do ML for climate science? A book to be submitted to JOSE?

Sorry if I missed earlier discussions on the end goal.

@LaureZanna
Contributor

@NoraLoose : thanks for the feedback.
The primary goals are:

  • providing tools for our community (starting within M2LInES) to visualize and analyze data, models, etc. (including avoiding redundancies while leveraging existing expertise)
  • showcasing M2LInES work outside M2LInES

There are no plans yet for educational tools like the ones for L96, but this might change.

@jbusecke
Collaborator

jbusecke commented Dec 20, 2023

Sounds like we have some convergence here. I suggest focusing on dataset-specific notebooks for now and linking the other sections to those notebooks. My reasoning is that I suspect there is a strong correlation between the individual datasets and the methods we can/want to apply.

So we could start with something like:

  • Dataset section
    • OM4 data
      • 1st cell: How to load and preprocess the data [-> relevant cell linked into tools/xarray notebook]
      • 2nd cell: Basic visualization [-> relevant cell linked into methods/visualization notebook]
      • 3rd cell: Transform to density coordinates [-> relevant cell linked to methods/density_coordinates notebook]
    • Another Dataset
      • ...
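The 3rd cell in the outline above (transforming to density coordinates) boils down to re-expressing a variable as a function of density instead of depth. Below is a minimal, stdlib-only sketch on one synthetic profile, assuming density increases strictly with depth; the function name `to_density_coords` and all numbers are hypothetical and not taken from any OM4 notebook. On real gridded xarray data, xgcm's `Grid.transform` is the tool built for this.

```python
from bisect import bisect_left


def to_density_coords(values, density, targets):
    """Linearly interpolate one profile of `values` onto `targets` density levels.

    Hypothetical helper. Assumes `density` increases strictly with depth
    (stable stratification); targets outside the profile's range map to None.
    """
    result = []
    for t in targets:
        if t < density[0] or t > density[-1]:
            result.append(None)  # this isopycnal does not exist in the profile
            continue
        i = bisect_left(density, t)  # first level whose density is >= t
        if density[i] == t:
            result.append(values[i])  # target coincides with a model level
            continue
        d0, d1 = density[i - 1], density[i]
        weight = (t - d0) / (d1 - d0)
        result.append(values[i - 1] + weight * (values[i] - values[i - 1]))
    return result


# Synthetic, made-up profile (not OM4 output).
depth = [0.0, 100.0, 200.0, 300.0]  # m
temp = [20.0, 15.0, 10.0, 5.0]  # degC
sigma = [24.0, 25.0, 26.0, 27.0]  # potential density anomaly, kg/m^3

print(to_density_coords(temp, sigma, [24.5, 26.0, 28.0]))  # [17.5, 10.0, None]
print(to_density_coords(depth, sigma, [25.5]))  # [150.0], depth of that isopycnal
```

Interpolating `depth` itself gives the depth of each isopycnal, which is the usual first step before remapping other fields.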

I think this is the most natural way to ingest (possibly existing) notebooks from people's research. Few researchers write a notebook that shows how to load all the different datasets into xarray, but everyone writes a notebook for their own dataset that loads, visualizes, and processes the data. Parsing these notebooks into their different parts and organizing those parts is the mission of this project.

A resulting methods notebook could then look like this:

  • Xarray
    • Loading data
      • Basic description, links, and a small example
      • Here are some examples of how to use this on specific datasets:
        • Xarray Loading with OM4
        • Xarray Loading with Another Dataset
    • Basic Visualization with xarray
      • Timeseries
        • Basic description, links, and a small example
        • Here are some examples that use xarray timeseries plotting on specific datasets:
          • Xarray Timeseries plot with OM4
          • Xarray Timeseries plot with Another Dataset

This keeps the reader from being overwhelmed by scrolling through 4000 lines of examples, while anyone interested in how to apply a certain step to a specific dataset can easily find it.
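The layout above amounts to a cross-reference index: each dataset notebook's cells are tagged with the tool and topic they demonstrate, and each tool page is generated by grouping those cells by topic. A minimal sketch of that idea, assuming a hypothetical hand-maintained registry; the names `DATASET_NOTEBOOKS`, `build_tool_index`, `render_tool_page`, and the link strings are illustrative placeholders, not part of this repository.

```python
from collections import defaultdict

# Hypothetical registry: dataset -> list of (tool, topic, link to cell).
DATASET_NOTEBOOKS = {
    "OM4": [
        ("xarray", "Loading data", "om4.ipynb#load"),
        ("xarray", "Timeseries", "om4.ipynb#timeseries"),
    ],
    "Another Dataset": [
        ("xarray", "Loading data", "another.ipynb#load"),
        ("xarray", "Timeseries", "another.ipynb#timeseries"),
    ],
}


def build_tool_index(notebooks):
    """Group tagged cells as tool -> topic -> [(dataset, link), ...]."""
    index = defaultdict(lambda: defaultdict(list))
    for dataset, cells in notebooks.items():
        for tool, topic, link in cells:
            index[tool][topic].append((dataset, link))
    return index


def render_tool_page(tool, index):
    """Render a plain-text outline of one tool page linking back to dataset cells."""
    lines = [tool]
    for topic, examples in index.get(tool, {}).items():
        lines.append(f"  {topic}")
        for dataset, link in examples:
            lines.append(f"    {tool} {topic} with {dataset}: {link}")
    return "\n".join(lines)


print(render_tool_page("xarray", build_tool_index(DATASET_NOTEBOOKS)))
```

Keeping the dataset notebooks as the single source of truth means the tool pages only ever hold links, so nothing is duplicated when a dataset notebook changes.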

Happy to chat about this today on slack if needed. Starting tomorrow I will be on winter break.

@jbusecke
Collaborator

jbusecke commented Feb 6, 2024

Just taking notes from our current conversation:

We decided to have two top-level sections on the website, based on this guide to writing technical docs:

  • Tutorials (tools focused), replaces the "Basic Concepts" section
  • How to Guides (datasets, and specific methods applied to datasets)

Trying to define a roadmap:

  1. Collecting source notebooks
    • Completing the list of 'source' notebooks might take longer, but we will template the steps for the existing ones.
    • We decided to copy the notebooks rather than link them (but link to the original in the header).
  2. Parsing each notebook into the tools used (so they can be linked in the Tutorials).
    • Proposed deadline: Fri 16th (meeting to check in on Tue 13).
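Steps 1 and 2 of the roadmap can be partly automated because `.ipynb` files are JSON documents with a top-level `cells` list. A minimal sketch, assuming the nbformat v4 layout; the helper names `load_notebook`, `tools_used`, and `add_source_header` and the header wording are hypothetical. `tools_used` lists the top-level packages imported in code cells (a rough proxy for "tools used"), and `add_source_header` returns a copy with a markdown cell linking to the original.

```python
import ast
import copy
import json
import uuid


def load_notebook(path):
    """Read an .ipynb file as a plain dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def _cell_source(cell):
    # nbformat stores `source` as either a string or a list of strings.
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def tools_used(nb):
    """Return sorted top-level package names imported in the notebook's code cells."""
    found = set()
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        # Drop IPython magics and shell escapes, which are not valid Python.
        lines = [line for line in _cell_source(cell).splitlines()
                 if not line.lstrip().startswith(("%", "!"))]
        try:
            tree = ast.parse("\n".join(lines))
        except SyntaxError:
            continue  # skip cells that still cannot be parsed
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                found.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                found.add(node.module.split(".")[0])  # ignore relative imports
    return sorted(found)


def add_source_header(nb, original_url):
    """Return a copy of `nb` with a markdown cell linking to the original notebook."""
    nb = copy.deepcopy(nb)
    header = {
        "cell_type": "markdown",
        "metadata": {},
        "source": [f"*Copied from the original notebook: {original_url}*"],
    }
    # nbformat 4.5+ requires every cell to have an id.
    if nb.get("nbformat_minor", 0) >= 5:
        header["id"] = uuid.uuid4().hex[:8]
    nb.setdefault("cells", []).insert(0, header)
    return nb
```

For example, `tools_used(load_notebook("some_notebook.ipynb"))` would list the packages a source notebook imports, which could then be matched against the tool pages in the Tutorials section; writing the copied notebook back out is a `json.dump` of the returned dict.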

@adcroft

adcroft commented Feb 9, 2024

If you want ideas from other notebooks that look at MOM6 output (not OM4), https://mom6-analysiscookbook.readthedocs.io/en/latest/ might be useful.

@LaureZanna
Contributor

Thanks @adcroft. @suryadheeshjith @IamShubhamGupto @jbusecke: this is great. We can get some inspiration from it, adapt some of these notebooks to our OM4 and CM2.6 datasets, and create a few more diagnostics that are relevant for M2LInES.
