Ignore concat_dim if only one file is passed #275
I think the problem may have been that I explicitly specified https://github.com/pangeo-data/pangeo-cmip6-cloud/blob/master/zarr_from_esgf.py#L92.
You did, and this is actually a parameter that needs to change from dataset to dataset: depending on the dimensionality and the lateral dimensions, we need different time chunks to keep the chunk size in the optimal range. What do you think about some additional logic that loads the first URL and applies some simple logic to:
I think that is a reasonable approach, but in that case we probably want to move this issue back to https://github.com/pangeo-data/pangeo-cmip6-cloud?
Pinging this again: I just ran into it again. I realize that my use case (hundreds of thousands of datasets in CMIP) is at the edge of what pangeo-forge-recipes is meant for, so I am trying to work with minimal changes to it, but I wonder whether allowing 'concat_dim=None' in the pattern (and downstream) would be a viable option. It seems that some of the internals already accept None as input. I might submit a WIP PR, but wanted to ask folks here first.
Trying to test/implement this in #783.
We ran into an issue today with our prototype script that creates Zarr stores from CMIP6 netCDF files. I believe the issue is that we generally want to concatenate files along the 'time' dimension, except when a file has no time dimension.
I am wondering whether this could be handled with check-and-ignore logic in the recipe itself. Within https://github.com/pangeo-forge/pangeo-forge-recipes/blob/master/pangeo_forge_recipes/recipes/xarray_zarr.py, could we check the number of files passed and, if only one is passed, check whether the concat_dim is present? If it is not, the recipe should just default to a single chunk. This would enable smooth processing of many datasets without the need to introspect them beforehand.
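A minimal sketch of the proposed check-and-ignore logic, written as plain Python rather than against the actual internals of `xarray_zarr.py`. The helpers `resolve_concat_dim` and `resolve_chunks` are hypothetical, and `sizes` is assumed to be a mapping from dimension names to lengths for the (first) input file, such as xarray's `Dataset.sizes`.

```python
from typing import Dict, Mapping, Optional


def resolve_concat_dim(
    n_files: int, sizes: Mapping[str, int], concat_dim: str
) -> Optional[str]:
    """Hypothetical helper: return concat_dim, or None to skip concatenation.

    `sizes` maps dimension names to lengths, e.g. xarray's Dataset.sizes
    for the first input file (assumption, not the recipe's real API).
    """
    if n_files < 1:
        raise ValueError("at least one input file is required")
    if n_files > 1 or concat_dim in sizes:
        # Several files must be concatenated; a single file that has the
        # dimension can still be treated as a one-element concatenation.
        return concat_dim
    # Exactly one file and it lacks concat_dim (e.g. a time-invariant field):
    # ignore concat_dim instead of failing.
    return None


def resolve_chunks(
    sizes: Mapping[str, int], concat_dim: Optional[str], concat_chunk: int
) -> Dict[str, int]:
    """Hypothetical helper: chunk along concat_dim, else use a single chunk."""
    if concat_dim is None:
        # Each dimension gets one chunk spanning its full length.
        return dict(sizes)
    return {concat_dim: concat_chunk}


# A single time-invariant file without a 'time' dimension.
fixed = {"lat": 180, "lon": 360}
dim = resolve_concat_dim(1, fixed, "time")
print(dim, resolve_chunks(fixed, dim, 12))  # None {'lat': 180, 'lon': 360}

# Several monthly files that do share a 'time' dimension.
monthly = {"time": 1200, "lat": 180, "lon": 360}
dim = resolve_concat_dim(3, monthly, "time")
print(dim, resolve_chunks(monthly, dim, 12))  # time {'time': 12}
```

Only the single-file case without the dimension changes behavior here; multi-file inputs still concatenate along concat_dim as before, so existing recipes would be unaffected.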