#145 seems to have unblocked a lot of the iids that were previously not available! Big win in general, but we need to work on some parts of our infrastructure.
The new API client hit a wall when too many dataset_ids were passed. I have implemented batching logic in the PR branch (New Async API client, jbusecke/pangeo-forge-esgf#45). This is already super slow, and it might be worth investing a day to refactor this logic to async.
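A minimal sketch of what batching plus async could look like, not the code in the PR. The names `batched`, `query_in_batches`, and `fake_fetch`, as well as the `batch_size` and `max_concurrency` defaults, are hypothetical; `fake_fetch` only stands in for the real request.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterator, List


def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


async def query_in_batches(
    dataset_ids: List[str],
    fetch_batch: Callable[[List[str]], Awaitable[Dict[str, dict]]],
    batch_size: int = 50,      # hypothetical default, tune to what the API accepts
    max_concurrency: int = 5,  # hypothetical default, caps simultaneous requests
) -> Dict[str, dict]:
    """Run `fetch_batch` on every batch concurrently and merge the results."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(batch: List[str]) -> Dict[str, dict]:
        # The semaphore keeps at most `max_concurrency` batches in flight.
        async with semaphore:
            return await fetch_batch(batch)

    partial_results = await asyncio.gather(
        *(guarded(batch) for batch in batched(dataset_ids, batch_size))
    )
    merged: Dict[str, dict] = {}
    for partial in partial_results:
        merged.update(partial)
    return merged


async def fake_fetch(batch: List[str]) -> Dict[str, dict]:
    # Hypothetical stand-in for the real HTTP request; sleeps to mimic latency.
    await asyncio.sleep(0.01)
    return {dataset_id: {"id": dataset_id} for dataset_id in batch}
```

With something like this, the batches wait on the network concurrently instead of one after another, e.g. `asyncio.run(query_in_batches(ids, fake_fetch, batch_size=5))`.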
The speed considerations will become more pertinent as we add more iids over time. In particular, the 'parsing' step, where we go from the input list (with wildcards and brackets) to a list of single iids, will produce more and more requests on each run of the deployment action.
The following steps will presumably get more manageable over time since we are pruning off the iids that are already ingested.
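To illustrate why the parsing step grows, here is a rough sketch of bracket expansion. It assumes that a bracket holds comma-separated alternatives, which is an assumption about the input syntax and not taken from the actual parser; `expand_brackets` is a hypothetical helper. Wildcards are left untouched here, since resolving them is what requires the requests mentioned above.

```python
import itertools
import re
from typing import List

# Matches one bracket group such as "[tas,pr]" and captures its contents.
_BRACKET = re.compile(r"\[([^\]]*)\]")


def expand_brackets(pattern: str) -> List[str]:
    """Expand every "[a,b]" group into one pattern per combination.

    Assumption: brackets contain comma-separated alternatives.
    Wildcards ("*") are not resolved here.
    """
    # re.split with one capture group alternates literal text and group contents.
    parts = _BRACKET.split(pattern)
    literals = parts[0::2]
    choices = [[c.strip() for c in group.split(",")] for group in parts[1::2]]

    expanded = []
    for combo in itertools.product(*choices):
        pieces = [literals[0]]
        for choice, literal in zip(combo, literals[1:]):
            pieces.append(choice)
            pieces.append(literal)
        expanded.append("".join(pieces))
    return expanded
```

Each bracket multiplies the number of patterns, so an input line with two brackets of two entries each already turns into four patterns to resolve.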
We are currently also handling this fairly inefficiently and are basically querying for the dataset info twice: once in expand_instance_id_list and then again in get_recipe_inputs_from_iid_list (which currently takes a list of instance ids).
Going forward we should probably extract something like:
{
    'instance_id': {'id': ..., 'field_a': ...},
    'other_instance_id': {'id': ..., 'field_a': ...},
    ...
}
This would make it trivial to prune off existing iids and then pass only the 'id' fields to get_recipe_inputs_from_iid_list.
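A rough sketch of that idea, assuming each record returned by the search is a dict with at least 'instance_id' and 'id' keys. The helper names and the sample values are made up for illustration; only get_recipe_inputs_from_iid_list is an existing name, and it is not called here.

```python
from typing import Dict, Iterable, List, Set


def index_by_instance_id(records: Iterable[dict]) -> Dict[str, dict]:
    """Key each dataset record by its 'instance_id', keeping the other fields."""
    return {
        record["instance_id"]: {
            key: value for key, value in record.items() if key != "instance_id"
        }
        for record in records
    }


def prune_existing(datasets: Dict[str, dict], existing_iids: Set[str]) -> Dict[str, dict]:
    """Drop every entry whose instance id has already been ingested."""
    return {iid: info for iid, info in datasets.items() if iid not in existing_iids}


def collect_ids(datasets: Dict[str, dict]) -> List[str]:
    """Return only the 'id' fields, in insertion order."""
    return [info["id"] for info in datasets.values()]


# Made-up records standing in for a single query result.
records = [
    {"instance_id": "iid_a", "id": "iid_a|node1", "field_a": 1},
    {"instance_id": "iid_b", "id": "iid_b|node2", "field_a": 2},
]
datasets = index_by_instance_id(records)
remaining = prune_existing(datasets, existing_iids={"iid_a"})
ids_to_process = collect_ids(remaining)
```

The dataset info is then fetched once, pruning becomes a plain dict filter, and `ids_to_process` is what would be handed to get_recipe_inputs_from_iid_list.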
This has led to two issues. The first concerns CMIPBQInterface.iid_list_exists; leap-data-management-utils#33 should resolve this. The second is the speed of the 'parsing' step described above.