Utility to extract, reshape, and store a subset of the data, e.g. for extracting timeseries for single PV sites from gridded NWPs #141
Labels: enhancement (New feature or request), performance (Improvements to runtime performance), usability (Make things more user-friendly)
If I put on my hat of being an energy forecasting ML researcher, then one of the "dreams" would be to be able to use a single on-disk dataset (e.g. 500 TBytes of NWPs) for multiple ML experiments, for example:

1. training a neural net that reads spatial patches of NWP imagery (many pixels, one timestep at a time);
2. extracting long timeseries for individual pixels (e.g. the grid points nearest to single PV sites).
If the data is chunked on disk to support use-case 1 (the neural net) then we might use chunks something like `y=128, x=128, t=1, c=10`. But that sucks for use-case 2 (which only wants a single pixel). So it'd be nice to have a tool to:

- extract just the subset of the data we care about (e.g. the pixels nearest to each PV site);
- reshape / re-chunk it for the new read pattern, e.g. `y=1, x=1, t=4096, c=10`;
- store it to disk as a separate dataset.
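To make the mismatch concrete, here's a rough back-of-the-envelope calculation. The chunk shapes are the ones above; the timestep count (a year of hourly data) is just an assumed example:

```python
# Rough read-amplification estimate for pulling one pixel's timeseries
# out of a store chunked for the neural-net use-case.
values_per_chunk = 128 * 128 * 1 * 10   # y * x * t * c
n_timesteps = 8760                       # ~1 year of hourly steps (assumption)

values_wanted = n_timesteps * 10                 # one pixel, all channels, all timesteps
values_read = n_timesteps * values_per_chunk     # one whole chunk per timestep

print(f"read amplification ≈ {values_read / values_wanted:,.0f}x")  # ≈ 16,384x
# With chunks of y=1, x=1, t=4096, c=10 the same read touches only ~3 chunks.
```

In other words, roughly 16,000× more data read than needed, which is why a second copy chunked for timeseries access pays for itself.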
Maybe the ideal would be for the user to be able to express these conversions in a few lines of Python, perhaps using xarray, whilst still saturating the IO (e.g. on a cloud instance with a 200 Gbps NIC, reading from and writing to object storage). The user shouldn't have to worry about parallelising stuff.
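As a rough illustration of what those "few lines of Python" might look like (the paths, variable names, and site coordinates below are made up, and this naive version does nothing clever about parallelism or saturating the NIC):

```python
import xarray as xr

# Hypothetical store path and site coordinates -- not from this issue.
ds = xr.open_zarr("gs://nwp-bucket/gridded.zarr")

sites = {"site_a": (51.5, -0.1), "site_b": (53.5, -2.2)}  # (y, x) per PV site

# Nearest grid point for each site, stacked along a new "site" dimension.
pixels = xr.concat(
    [ds.sel(y=y, x=x, method="nearest") for y, x in sites.values()],
    dim="site",
).assign_coords(site=list(sites))

# Drop the chunk encoding inherited from the source store, then re-chunk
# for timeseries reads and write a second, site-oriented copy.
for var in pixels.variables:
    pixels[var].encoding.pop("chunks", None)

pixels.chunk({"site": 1, "time": 4096}).to_zarr("gs://nwp-bucket/per-site.zarr")
```

A naive loop like this would be IO-bound and effectively serial; the point of the proposed utility is that the user writes roughly this much code while the tool handles the parallel reads and writes underneath.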
Perhaps you'd have multiple on-disk datasets (each optimised for a different read pattern). But the user wouldn't have to manually manage these multiple datasets. Instead the user would interact with a "multi-dataset" layer which would manage the underlying datasets (see #142).
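A hypothetical sketch of what that "multi-dataset" layer could look like (my illustration, not a design taken from #142): a thin wrapper that owns several copies of the same data, each chunked for a different read pattern, and routes each query to whichever copy serves it cheapest.

```python
import xarray as xr

class MultiDataset:
    """Illustrative only: route reads to the copy whose chunking suits them."""

    def __init__(self, patch_store: str, timeseries_store: str):
        self._patches = xr.open_zarr(patch_store)          # chunked y=128, x=128, t=1
        self._timeseries = xr.open_zarr(timeseries_store)  # chunked y=1, x=1, t=4096

    def spatial_patch(self, t, y_slice, x_slice):
        # Whole-image-at-one-timestep reads go to the patch-chunked copy.
        return self._patches.sel(time=t).isel(y=y_slice, x=x_slice)

    def pixel_timeseries(self, y, x):
        # Single-pixel timeseries reads go to the timeseries-chunked copy.
        return self._timeseries.sel(y=y, x=x, method="nearest")
```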