Improve parallelization of IO for readers #99
Comments
@mgrover1 I've tried to track that down now using some GAMIC source data from our BoXPol radar. In the normal case I get the whitespace shown above in the task graph. If I remove the additional lines from xradar/xradar/io/backends/gamic.py (line 468 in 02b2d92), the call to […]. Only if I […]. That leads to the task graphs shown below:

One timestep, single moment of 15 (time: 12, azimuth: 360, range: 750):

All timesteps, single moment of 15 (time: 12, azimuth: 360, range: 750):

So as a consequence we might need to make sure no immediate dask computations are triggered before actually doing something with the data. Would it make sense to create a test repo for that?
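The "no immediate dask computations" point can be illustrated with a minimal sketch using plain dask (not xradar itself; `load_block` is a hypothetical stand-in for a GAMIC sweep read, and the shape mimics one sweep above):

```python
import dask
import dask.array as da
import numpy as np

calls = {"n": 0}

def load_block():
    # Simulates an expensive read of one sweep from a radar file.
    calls["n"] += 1
    return np.zeros((360, 750), dtype="float32")

# Lazy: wrap the reader in dask.delayed; building the graph reads nothing.
lazy = da.from_delayed(dask.delayed(load_block)(), shape=(360, 750), dtype="float32")
assert calls["n"] == 0  # no read at graph-construction time

result = lazy.mean().compute()
assert calls["n"] == 1  # the read ran exactly once, at compute time
```

A backend that calls something like `load_block()` eagerly while constructing the dataset would show `calls["n"] == 1` before `compute()` is ever invoked, which is the pattern worth guarding against.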
Yeah, let's create a test repo to try this out - this is promising! We can take a look at more testing/establishing some benchmarks to dig in here.
Maybe xradar-benchmark?
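A first benchmark in such a repo could be as simple as a wall-clock helper; a stdlib-only sketch (the `bench` helper and the dummy workload are made up here, not part of xradar):

```python
import time

def bench(fn, repeat=3):
    # Run fn several times and report the best wall-clock time,
    # which filters out one-off warm-up and scheduling noise.
    times = []
    for _ in range(repeat):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return min(times)

# Dummy workload standing in for e.g. opening a volume file.
best = bench(lambda: sum(range(100_000)))
```

Dedicated tools like asv would give regression tracking on top, but a helper like this is enough to compare an eager backend against a lazy one.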
Description
We should take a look at how we can speed up the xarray backends, and if there are more levels of parallelization possible.
I wonder if upstream enhancements to xarray (pydata/xarray#7437) might help with this, enabling us to plug in the IO directly and benefit from more parallelization here.
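On "more levels of parallelization": one option is reading independent sweeps concurrently. A stdlib-only sketch of the idea (the per-sweep reader is hypothetical; a real backend would open the corresponding HDF5 group instead):

```python
from concurrent.futures import ThreadPoolExecutor

def read_sweep(i):
    # Hypothetical per-sweep reader standing in for opening one
    # sweep group of a volume file.
    return {"sweep": i, "nrays": 360}

# IO-bound reads can overlap in threads, since file libraries
# typically release the GIL while waiting on disk.
with ThreadPoolExecutor(max_workers=4) as ex:
    sweeps = list(ex.map(read_sweep, range(12)))
```

`ex.map` preserves input order, so the sweeps come back sorted even though the reads ran concurrently.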
What I Did
I read the data with the following code: […]

This resulted in the task graph below (green is the open_dataset function), which has quite a bit of whitespace and could use some optimization.
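One way to look at the graph programmatically before rendering it (rendering via `.visualize()` needs graphviz) is to count its tasks; a dask-only sketch with an array shaped like the data above:

```python
import dask.array as da

# One chunk per timestep, mimicking (time: 12, azimuth: 360, range: 750).
arr = da.ones((12, 360, 750), chunks=(1, 360, 750))

# The low-level graph maps task keys to computations; with 12 chunks
# there are at least 12 independent tasks that can run in parallel
# before the final reduction.
graph = dict(arr.sum().__dask_graph__())
n_tasks = len(graph)
```

Whitespace in the rendered graph usually means workers sit idle between such tasks, e.g. because an eager step serialized the IO up front.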