
Intake conversion NearestNeighbourDistance #363

Merged: 4 commits, Aug 31, 2024

Conversation

@rbeucher rbeucher (Contributor) commented Jun 6, 2024

Following the discussion in issue #313, we propose converting the recipes to use Intake, given that the Cookbook is no longer supported and the ACCESS-NRI Intake catalog is now available.

A few months ago, @max-anu began working on this transition. This pull request contains the changes @max-anu made to the notebook specified in the title.


@navidcy navidcy (Collaborator) commented Jul 19, 2024

This is ready to be reviewed. Any volunteers @COSIMA/cosima-contrib?

@navidcy navidcy (Collaborator) commented Jul 19, 2024

Ready to review!

@@ -31,7 +31,9 @@
},
@julia-neme julia-neme (Collaborator) commented Jul 22, 2024

Line #3.    import pandas as pd

Remove as not used



@@ -31,7 +31,9 @@
},
@julia-neme julia-neme (Collaborator) commented Jul 22, 2024

Line #2.    client = Client("tcp://10.6.48.33:8786")

Curious as to what this means and why we need it. Is it better than doing Client()? I get this error when running it:

OSError: Timed out trying to connect to tcp://10.6.48.33:8786 after 30 s
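For context, a minimal sketch (assuming the dask.distributed package) of the difference: Client("tcp://...") attaches to a pre-existing scheduler at a fixed address, while a plain Client() starts a local cluster, which is usually what you want in a notebook:

```python
from dask.distributed import Client

# Client("tcp://10.6.48.33:8786") tries to attach to a specific, pre-existing
# scheduler at that address; it times out when no scheduler is running there.
# A plain Client() instead starts a local cluster on the current machine.
client = Client(processes=False)  # threads-only local cluster; no fixed address
status = client.status            # "running" once the cluster is up
client.close()
```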


@@ -31,7 +31,9 @@
},
@julia-neme julia-neme (Collaborator) commented Jul 22, 2024

Running this with a "small" ARE session crashes the kernel. I think this is because it is not efficient to open the full dataset and then subset the times...



@@ -31,7 +31,9 @@
},
@julia-neme julia-neme (Collaborator) commented Jul 22, 2024

Line #2.    cat_subset = catalog['01deg_jra55v140_iaf_cycle4']

Already run in the cell above.



@@ -31,7 +31,9 @@
},
@julia-neme julia-neme (Collaborator) commented Jul 22, 2024

Line #4.    var_search = var_search.search(path=var_search.df['path'][0])

What does this do, and why do we need it?




This is a hack to load this data from only one file (it doesn't change in time, but is saved multiple times in the model output).

It's equivalent to the n=1 argument supplied to getvar in the cookbook version.
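As a toy illustration (plain pandas with illustrative file names, not the actual catalog), the search-on-path step narrows a multi-file search result down to a single file:

```python
import pandas as pd

# Stand-in for an intake-esm search result: a static field that is saved in
# every output file shows up once per file in the catalog's dataframe.
df = pd.DataFrame({
    "path": [
        "output000/iceh.nc",
        "output001/iceh.nc",
        "output002/iceh.nc",
    ],
    "variable": ["TLAT", "TLAT", "TLAT"],
})

# The hack: search again using only the first file's path, so the data are
# read from a single file (like getvar(..., n=1) in the cookbook).
single = df[df["path"] == df["path"].iloc[0]]
print(len(single))  # 1
```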

@julia-neme julia-neme (Collaborator)

I have only commented on the Intake part, since this is what this issue is about, but I'm wondering whether we want to convert this into a model-agnostic / cf-xarray sort of example.

But my biggest concern is that I tried to run it using a small ARE session, and my kernel crashed trying to open the data (3rd cell). It is perfectly fine to open it with the cosima_cookbook. Related to my test here.

@navidcy navidcy (Collaborator) commented Jul 22, 2024

But my biggest concern is that I tried to run it using a small ARE session, and my kernel crashed trying to open the data (3rd cell). It is perfectly fine to open it with the cosima_cookbook. Related to my test here.

I'm cc'ing here @rbeucher, @anton-seaice, @dougiesquire, @AndyHoggANU in case they want to comment on this?

@julia-neme julia-neme (Collaborator) left a comment

See comments in ReviewNB

@dougiesquire dougiesquire (Collaborator) commented Jul 22, 2024

I'm cc'ing here @rbeucher, @anton-seaice, @dougiesquire, @AndyHoggANU in case they want to comment on this?

I think this is due to the way the CICE data are structured (with the 2D coordinates saved in every file). The issue has been brought up in the past on the forum in the context of the cookbook - see here. There are two solutions described in that topic:

  • open with decode_coords=False, which will drop the 2D coords. This is what was being done with the cookbook before this PR (hence why it worked, @julia-neme), but the kwarg was dropped in this PR (it's actually still mentioned in a comment).
  • open with compat="override", data_vars="minimal", coords="minimal" to skip coordinate verification. This is the better approach, as you'll keep the 2D coords in your returned Dataset.

So, pass:

xarray_combine_by_coords_kwargs=dict(
    compat="override",
    data_vars="minimal",
    coords="minimal"
)

to your to_dask() calls. I've tried this and the notebook runs on a small ARE instance.
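A self-contained sketch of why those kwargs help, with toy xarray datasets standing in for two CICE files that both carry the same 2D coordinate:

```python
import numpy as np
import xarray as xr

# Two "files" that each carry the same time-invariant 2D coordinate TLAT.
tlat = (("y", "x"), np.arange(6.0).reshape(2, 3))
ds1 = xr.Dataset({"aice": (("time", "y", "x"), np.zeros((1, 2, 3)))},
                 coords={"time": [0], "TLAT": tlat})
ds2 = xr.Dataset({"aice": (("time", "y", "x"), np.ones((1, 2, 3)))},
                 coords={"time": [1], "TLAT": tlat})

# With these kwargs, xarray takes non-dimension coords from the first dataset
# instead of comparing them element-wise across every file, which is the
# expensive step that exhausts memory on a small session.
combined = xr.combine_by_coords([ds1, ds2], compat="override",
                                data_vars="minimal", coords="minimal")
print(combined.sizes["time"])  # 2
```

Inside the catalog workflow these same three kwargs go through `xarray_combine_by_coords_kwargs`, as in the snippet above.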

Examples/Nearest_Neighbour_Distance.ipynb (review thread outdated; resolved)
@@ -31,7 +31,9 @@
},
@anton-seaice anton-seaice (Collaborator) commented Jul 22, 2024

I suggest getting rid of the cat_subset line and merging it into the var_search line,

and changing darray to ds, because it's a dataset, not a data array.

This would be similar to cell 4 in https://github.com/COSIMA/cosima-recipes/blob/4fd51150331ebb1669fdb8bc58ff4ac16b313b0e/Examples/Bathymetry.ipynb



@@ -31,7 +31,9 @@
},
@anton-seaice anton-seaice (Collaborator) commented Jul 22, 2024

We can make the var_ice chunks larger to get rid of these warnings, I think?



@navidcy navidcy (Collaborator) commented Aug 31, 2024

@anton-seaice feel free to push the changes you suggest?

@navidcy navidcy (Collaborator) left a comment

Done the suggestions by @anton-seaice and @dougiesquire.
Thanks!

@navidcy navidcy (Collaborator) commented Aug 31, 2024

Oh... @julia-neme requested changes, so she needs to approve.

@julia-neme, I ran the notebook and it seems good. The recipe takes a few minutes!

Feel free to re-review and approve/merge at your convenience!
If you find something minor that needs fixing, commit it; I'm happy with whatever you do!

@navidcy navidcy (Collaborator) commented Aug 31, 2024

I'm actually taking the liberty to merge and spare @julia-neme from looking at another intake PR.

@navidcy navidcy merged commit 602a6ea into main on Aug 31, 2024 (3 checks passed) and deleted the INTAKE_NearestNeighbourDistance branch on August 31, 2024.
6 participants