Merge pull request #15 from jbusecke/beam-refactor
Start refactor to beam-refactor branch of pgf-recipes
jbusecke authored Nov 16, 2023
2 parents 143a8e2 + ae4598d commit c2d8f99
Showing 8 changed files with 530 additions and 416 deletions.
7 changes: 7 additions & 0 deletions .gitignore
@@ -128,3 +128,10 @@ dmypy.json
# Pyre type checker
.pyre/
pangeo_forge_esgf/_version.py
test_beam.ipynb
test_DAMIP_feedstock_with_beam_refactor.ipynb
test_esgf-pyclient.ipynb
test_refactor.ipynb
test_refactor.py
test_script copy.py
pangeo_forge_esgf/recipe_inputs_old.py
37 changes: 36 additions & 1 deletion README.md
@@ -1,8 +1,9 @@
# pangeo-forge-esgf

Using queries to the ESGF API to generate URLs and keyword arguments for recipe generation in pangeo-forge

## Parsing a list of instance ids using wildcards

Pangeo Forge recipes require the user to provide exact instance_ids for the datasets they want processed. Discovering these with the [web search](https://esgf-node.llnl.gov/search/cmip6/) can become cumbersome, especially when dealing with a large number of members, models, etc.

`pangeo-forge-esgf` provides some functions to query the ESGF API based on instance_id values with wildcards.
@@ -22,6 +23,7 @@ iids
```

and you will get:

```
['CMIP6.PMIP.MIROC.MIROC-ES2L.lgm.r1i1p1f2.Omon.uo.gn.v20191002',
'CMIP6.PMIP.AWI.AWI-ESM-1-1-LR.lgm.r1i1p1f1.Odec.uo.gn.v20200212',
@@ -36,3 +38,36 @@ and you will get:
```

Eventually I hope I can leverage this functionality to handle user requests in PRs that add wildcard instance_ids, but for now this might be helpful to manually construct lists of instance_ids to submit to a pangeo-forge feedstock.
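
To illustrate what a wildcard instance_id expresses, here is a minimal offline sketch that filters a local list of instance_ids with the standard library's `fnmatch`. It is not how `pangeo-forge-esgf` resolves wildcards (the package queries the ESGF API); the helper name `match_iids` is hypothetical, and the candidate list simply reuses two ids from the output above.

```python
from fnmatch import fnmatchcase


def match_iids(pattern: str, candidates: list[str]) -> list[str]:
    """Return the candidates matching a dot-separated wildcard pattern.

    Hypothetical helper for illustration only. Matching is done facet by
    facet so that ``*`` never spans a ``.`` separator.
    """
    pattern_facets = pattern.split(".")
    matches = []
    for iid in candidates:
        facets = iid.split(".")
        if len(facets) != len(pattern_facets):
            continue  # a different number of facets can never match
        if all(fnmatchcase(f, p) for f, p in zip(facets, pattern_facets)):
            matches.append(iid)
    return matches


known_iids = [
    "CMIP6.PMIP.MIROC.MIROC-ES2L.lgm.r1i1p1f2.Omon.uo.gn.v20191002",
    "CMIP6.PMIP.AWI.AWI-ESM-1-1-LR.lgm.r1i1p1f1.Odec.uo.gn.v20200212",
]

# Keep only the monthly ocean table (Omon); every other facet is left open.
monthly = match_iids("CMIP6.PMIP.*.*.lgm.*.Omon.uo.*.*", known_iids)
print(monthly)
```

Matching per facet rather than on the whole string is a deliberate choice in this sketch: it keeps a `*` in one position from accidentally absorbing neighbouring facets.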

## Generating PGF recipe input (urls) from instance_ids

```python
from pangeo_forge_esgf import get_urls_from_esgf
iids = ['CMIP6.CMIP.CSIRO-ARCCSS.ACCESS-CM2.historical.r1i1p1f1.SImon.sifb.gn.v20200817']
url_dict = await get_urls_from_esgf(iids)
url_dict['CMIP6.CMIP.CSIRO-ARCCSS.ACCESS-CM2.historical.r1i1p1f1.SImon.sifb.gn.v20200817']
```

gives

```
100%|██████████| 5/5 [00:01<00:00, 4.98it/s]
Processing responses
Processing responses: Expected files per iid
Processing responses: Check for missing iids
Processing responses: Flatten results
Processing responses: Group results
Find responsive urls
100%|██████████| 1/1 [00:00<00:00, 3.25it/s]
['https://esgf-data1.llnl.gov/thredds/fileServer/css03_data/CMIP6/CMIP/CSIRO-ARCCSS/ACCESS-CM2/historical/r1i1p1f1/SImon/sifb/gn/v20200817/sifb_SImon_ACCESS-CM2_historical_r1i1p1f1_gn_185001-201412.nc']
```
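
The snippet above uses a top-level `await`, which works in Jupyter or IPython. In a plain Python script the coroutine needs an event loop, for example via `asyncio.run`. The sketch below shows that pattern with a stand-in coroutine, `fake_get_urls` (hypothetical: it returns canned placeholder data and makes no network requests). In real use one would presumably pass `get_urls_from_esgf(iids)` to `asyncio.run` instead, assuming it behaves like a regular coroutine as the `await` above suggests.

```python
import asyncio


async def fake_get_urls(iids: list[str]) -> dict[str, list[str]]:
    """Stand-in for an async URL lookup; returns one placeholder URL per iid."""
    await asyncio.sleep(0)  # yield to the event loop, as a real request would
    return {iid: [f"https://example.invalid/{iid}.nc"] for iid in iids}


def main() -> dict[str, list[str]]:
    iids = [
        "CMIP6.CMIP.CSIRO-ARCCSS.ACCESS-CM2.historical.r1i1p1f1.SImon.sifb.gn.v20200817"
    ]
    # asyncio.run creates an event loop, runs the coroutine, and closes the loop.
    return asyncio.run(fake_get_urls(iids))


url_dict = main()
for iid, urls in url_dict.items():
    print(iid, "->", len(urls), "file(s)")
```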

or, if you want to see detailed debugging statements:

```python
from pangeo_forge_esgf import get_urls_from_esgf, setup_logging
setup_logging('DEBUG')
iids = ['CMIP6.CMIP.CSIRO-ARCCSS.ACCESS-CM2.historical.r1i1p1f1.SImon.sifb.gn.v20200817']
url_dict = await get_urls_from_esgf(iids)
url_dict['CMIP6.CMIP.CSIRO-ARCCSS.ACCESS-CM2.historical.r1i1p1f1.SImon.sifb.gn.v20200817']
```
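
`setup_logging` attaches a handler to the `pangeo_forge_esgf` logger and sets its level. The following sketch reproduces that pattern with only the standard library, on a made-up logger name (`demo_pkg`, an assumption for illustration), to show two consequences: calling the setup function repeatedly does not stack handlers, and messages below the chosen level are dropped. The function `setup_demo_logging` and its `stream` parameter are hypothetical and not part of the package.

```python
import io
import logging


def setup_demo_logging(level: str = "INFO", stream=None) -> logging.Logger:
    """Configure the hypothetical ``demo_pkg`` logger with a single handler."""
    handler = logging.StreamHandler(stream=stream)
    handler.setFormatter(logging.Formatter("%(name)s - %(levelname)s - %(message)s"))

    logger = logging.getLogger("demo_pkg")
    logger.handlers.clear()  # drop handlers left over from earlier calls
    logger.setLevel(getattr(logging, level))
    logger.addHandler(handler)
    logger.propagate = False  # demo-only: keep output out of the root logger
    return logger


buffer = io.StringIO()
setup_demo_logging("DEBUG", stream=buffer)
logger = setup_demo_logging("INFO", stream=buffer)  # second call replaces the handler

logger.debug("hidden at INFO level")
logger.info("shown at INFO level")

print(buffer.getvalue(), end="")
```

Clearing existing handlers before adding a new one is what makes it safe to re-run the setup cell in a notebook without every message being printed several times.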
33 changes: 32 additions & 1 deletion pangeo_forge_esgf/__init__.py
@@ -1 +1,32 @@
from .recipe_inputs import generate_recipe_inputs_from_iids
from .recipe_inputs import get_urls_from_esgf
import logging
import backoff #noqa #https://github.com/litl/backoff/issues/71

logging.getLogger('backoff').setLevel(logging.FATAL)
# not sure if this is needed, but I want to avoid the many backoff messages

def setup_logging(level: str = "INFO"):
    """A convenience function that sets up logging for developing and debugging recipes in Jupyter,
    iPython, or another interactive context.
    :param level: One of (in decreasing level of detail) ``"DEBUG"``, ``"INFO"``, or ``"WARNING"``.
      Defaults to ``"INFO"``.
    """
    import logging

    try:
        from rich.logging import RichHandler

        handler = RichHandler()
        handler.setFormatter(logging.Formatter("%(message)s"))
    except ImportError:
        import sys

        handler = logging.StreamHandler(stream=sys.stdout)
        handler.setFormatter(logging.Formatter("%(name)s - %(levelname)s - %(message)s"))

    logger = logging.getLogger("pangeo_forge_esgf")
    if logger.hasHandlers():
        logger.handlers.clear()
    logger.setLevel(getattr(logging, level))
    logger.addHandler(handler)
184 changes: 0 additions & 184 deletions pangeo_forge_esgf/dynamic_kwargs.py

This file was deleted.
