ENH: Add a reader for nexrad level2 files #147
@kmuehlbauer - I am still struggling to get this up and running... do you think there would be some utility in having a generic RadarDataStore object? For example, taking the following as input:
RadarDataStore(
    time,
    _range,
    fields,
    metadata,
    scan_type,
    latitude,
    longitude,
    altitude,
    sweep_number,
    sweep_mode,
    fixed_angle,
    sweep_start_ray_index,
    sweep_end_ray_index,
    azimuth,
    elevation,
    instrument_parameters=instrument_parameters,
)
It would return an xarray data structure? I am having trouble decoupling the entry point commonalities from the individual file parsing structures in the current backends... Some benefits of moving toward this approach would be:
@mgrover1 I haven't had time yet to check out this PR. Hopefully I can free up some time next week. If such an object makes code readability and maintenance easier, why not. How would it be integrated into the xarray backend machinery?
My thought right now is that it would be structured as:
NexradLevel2File --> RadarDataStore --> NexradBackendEntrypoint
The benefit here would be that the RadarDataStore would be the primary object that we would fit the coordinates + fields into... then we can pass that into the backend entrypoint. I can prototype this in this PR, and ping you when it is ready for feedback?
@kmuehlbauer - I ended up going with a function instead of a class... it takes in the same things the radar object did in Py-ART and returns an xarray.Dataset, with a group argument that can be used to specify which sweep to use... this way, it can be used directly with the backend entrypoint API... the backend can then add additional bits to the dataset before returning it to the user. Open to thoughts here!
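To make the idea concrete, here is a minimal sketch of such a function. It is not the code from this PR: the name `sweep_dataset`, the shortened argument list, and the dimension names are assumptions for illustration. It takes Py-ART-style dictionaries (each with a "data" entry), selects the rays of one sweep via `group`, and returns an `xarray.Dataset`. Note that it shares one range array across all sweeps.

```python
import numpy as np
import xarray as xr


def sweep_dataset(time, _range, fields, azimuth, elevation,
                  sweep_start_ray_index, sweep_end_ray_index, group=0):
    """Build an xarray.Dataset for one sweep from Py-ART-style dicts.

    Hypothetical helper: every argument except ``group`` is a dict with a
    "data" entry, and the signature is much shorter than the real one.
    """
    # Start and end ray indices are both inclusive (Py-ART convention).
    start = int(sweep_start_ray_index["data"][group])
    stop = int(sweep_end_ray_index["data"][group]) + 1
    rays = slice(start, stop)

    # Each field is a 2-D (ray, gate) array; keep only this sweep's rays.
    data_vars = {
        name: (("azimuth", "range"), np.asarray(field["data"])[rays])
        for name, field in fields.items()
    }
    coords = {
        "azimuth": np.asarray(azimuth["data"])[rays],
        "range": np.asarray(_range["data"]),
        "elevation": ("azimuth", np.asarray(elevation["data"])[rays]),
        "time": ("azimuth", np.asarray(time["data"])[rays]),
    }
    return xr.Dataset(data_vars=data_vars, coords=coords)
```

A backend entrypoint could call a helper like this with the requested `group` and attach further metadata before handing the dataset back.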
@kmuehlbauer - I am stumped as to why the Cython submodule I added is not being recognized.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #147 +/- ##
==========================================
- Coverage 90.79% 86.28% -4.51%
==========================================
Files 20 22 +2
Lines 3421 3995 +574
==========================================
+ Hits 3106 3447 +341
- Misses 315 548 +233
@mgrover1 Thanks for moving this forward. I've added a couple of comments.
I'm still not convinced we need the cythonized interpolation at all. It looks like it is only needed to conform the single sweeps to a common range resolution. This won't be needed for our sweep-based data model. Or am I missing something?
If my assumption is correct, I'd suggest shaping the code to keep the original sweep resolution and removing everything Cython-related from this PR.
@@ -24,5 +25,7 @@
from .iris import *  # noqa
from .odim import *  # noqa
from .rainbow import *  # noqa

__all__ = [s for s in dir() if not s.startswith("_")]
Why remove the __all__ here?
Not sure... it is back in. thanks!!
# range
_range = get_range_attrs()
first_gate, gate_spacing, last_gate = _find_range_params(scan_info)
duplicate of line 1072?
Yep - should be fixed
xradar/io/backends/nexrad_level2.py
Outdated
# fields
max_ngates = len(_range["data"])
available_moments = {m for scan in scan_info for m in scan["moments"]}
interpolate = _find_scans_to_interp(
Looks like this is already done above at line 1075?
See the latest commit - that should fix the duplication
xradar/io/backends/nexrad_level2.py
Outdated
warnings.warn(
    "Gate spacing is not constant, interpolating data in "
    + f"scans {interp_scans} for moment {moment}.",
It seems that we do not have to do interpolation here. AFAICT this is only needed for CfRadial1 data (like the good old Py-ART data model). We should be safe to keep the sweep resolution as is.
I removed the interpolation in the latest commit @kmuehlbauer :) thanks for the suggestion here.
@mgrover1 We knew that it would not be an easy task. I've added another couple of suggestions and ideas.
It looks like we need to tackle the different gate spacing stuff at a lower level.
@@ -22,9 +22,6 @@ jobs:
run: |
  python -m pip install --upgrade pip
  pip install black black[jupyter] ruff
AFAIK we would need to enable Jupyter notebook linting/formatting for ruff in pyproject.toml.
@@ -30,4 +30,5 @@ jobs:
  TWINE_PASSWORD: ${{ secrets.PYPI_API_TOKEN }}
run: |
  python -m build
  python setup.py build_ext --inplace
This won't be needed anymore.
@@ -8,4 +8,5 @@ recursive-include tests *
recursive-exclude * __pycache__
recursive-exclude * *.py[co]

global-include *.pyx *pxd
This can be removed too.
[build-system]
requires = [
    "setuptools>=45",
    "wheel",
    "setuptools_scm[toml]>=7.0",
    "numpy"
numpy can be removed?
    return _unpack_structure(buf[pos : pos + size], structure)


def _unpack_structure(string, structure):
FYI, @mgrover, similar decoding is already implemented in the iris/sigmet backend. I'd suggest aligning the two after this PR is merged. I'd volunteer to take this on.
scale = np.float32(msg[moment]["scale"])
mask = data <= 1
scaled_data = (data - offset) / scale
return np.ma.array(scaled_data, mask=mask)
We might also get rid of the mask and masked array here if we correctly specify missing values and/or _FillValue as attributes. This can be done as a follow-up PR.
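One possible shape for that follow-up is sketched below. It is an illustration, not code from this PR: the function name `scale_moment` is hypothetical, and the sentinel of -9999 is an assumption borrowed from the `_FillValue` set elsewhere in the diff. Raw values of 0 and 1 are treated as reserved codes, as in the `data <= 1` mask above, and are replaced by the fill value instead of being masked.

```python
import numpy as np

# Assumed sentinel, matching the -9999 _FillValue used elsewhere in the PR.
FILL_VALUE = np.float32(-9999.0)


def scale_moment(raw, offset, scale, fill_value=FILL_VALUE):
    """Scale raw counts and flag reserved codes (raw <= 1) with a fill value.

    Returns a plain float32 array plus the attributes describing it,
    rather than a numpy masked array.
    """
    raw = np.asarray(raw)
    # Convert before subtracting so unsigned raw counts cannot wrap around.
    scaled = (raw.astype("float32") - np.float32(offset)) / np.float32(scale)
    data = np.where(raw <= 1, fill_value, scaled).astype("float32")
    attrs = {"_FillValue": fill_value}
    return data, attrs
```

With the attribute in place, CF-aware decoding (for example xarray's mask-and-scale step) can turn the fill values into NaN on demand, so the reader itself does not have to carry a mask.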
    storage_options={"anon": True},
    first_dimension=None,
    group=None,
    **kwargs,
):
    # Load the data file in using NEXRADLevel2File Class
    nfile = NEXRADLevel2File(
        prepare_for_read(filename, storage_options=storage_options)
    )
Can we please make this depend on storage_options?
Suggested change:
-     storage_options={"anon": True},
-     first_dimension=None,
-     group=None,
-     **kwargs,
- ):
-     # Load the data file in using NEXRADLevel2File Class
-     nfile = NEXRADLevel2File(
-         prepare_for_read(filename, storage_options=storage_options)
-     )
+     storage_options=None,
+     first_dimension=None,
+     group=None,
+     **kwargs,
+ ):
+     # Load the data file in using NEXRADLevel2File Class
+     if storage_options is not None:
+         filename = prepare_for_read(filename, storage_options=storage_options)
+     nfile = NEXRADLevel2File(
+         filename
+     )
Maybe this is a bit more involved, as storage_options would have to be passed through to the backend reader.
# range
_range = get_range_attrs()
first_gate, gate_spacing, last_gate = _find_range_params(scan_info)
_range["data"] = np.arange(first_gate, last_gate, gate_spacing, "float32")
Oh my, it is really hard to move from the CfRadial1 to the CfRadial2 data model. AFAICS the _range dict is used for all sweeps (assuming range has been interpolated to a common grid). This would need to be done on a per-sweep basis.
This would need another round of refactoring.
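As a rough illustration of what "per sweep" could mean here, the sketch below computes one range array for each sweep instead of a single shared one. The keys in `scan_info` (`first_gate`, `gate_spacing`, `ngates`), the example values, and the `sweep_N` naming are all assumptions made for this example; real values would come from the file.

```python
import numpy as np


def sweep_range(first_gate, gate_spacing, ngates):
    """Return the range values in metres for a single sweep."""
    return first_gate + gate_spacing * np.arange(ngates, dtype="float32")


# Hypothetical per-sweep parameters with two different gate spacings.
scan_info = [
    {"first_gate": 2125.0, "gate_spacing": 250.0, "ngates": 4},
    {"first_gate": 2125.0, "gate_spacing": 1000.0, "ngates": 3},
]

# One range array per sweep: no interpolation onto a common grid is needed.
ranges = {
    f"sweep_{i}": sweep_range(s["first_gate"], s["gate_spacing"], s["ngates"])
    for i, s in enumerate(scan_info)
}
```

Because each sweep keeps its own range coordinate, sweeps with different resolutions can sit side by side as separate groups, which is why the interpolation step becomes unnecessary.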
dic["_FillValue"] = -9999
if delay_field_loading:
    dic = LazyLoadDict(dic)
    data_call = _NEXRADLevel2StagedField(nfile, moment, max_ngates, scans)
Here is the pain point: this will extract the moments of different scans (which might be at different range resolutions).
)


def create_dataset_from_fields(
This whole function assumes that the data of all sweeps is on a common range grid. It would need a refactor to work on a per-sweep basis.
Co-authored-by: Kai Mühlbauer <[email protected]>
Thanks for the suggestions - I am busy with the AMS conference today, but will follow up later this week. I appreciate the feedback! Agreed - it is not easy, but will be worth it :)
No worries, Max. I'm still getting accustomed to the code and we'll surely have further iteration cycles here. And you are absolutely right, it will be worth it.
…into add-nexrad-reader
Closing this since @kmuehlbauer refactored + submitted with #158
This is a first cut at the NEXRAD file reader... starting with level2 data. Still a work in progress for now, but I figured I would share what I have so far. I may have more questions for @kmuehlbauer about backends and how to deal with loading in the Py-ART-like dictionaries.