
ENH: Add a reader for nexrad level2 files #147

Closed
wants to merge 50 commits

Conversation

mgrover1 (Collaborator) commented Jan 5, 2024

This is a first cut at the NEXRAD file reader... starting with Level 2 data. Still a work in progress for now, but I figured I would share what I have so far. I may have more questions about backends, @kmuehlbauer, and about how to deal with loading in the Py-ART-like dictionaries.

mgrover1 marked this pull request as draft January 8, 2024 14:46
mgrover1 (Collaborator Author) commented Jan 17, 2024

@kmuehlbauer - I am still struggling to get this up and running... do you think there would be some utility in having a generic RadarDataStore object, similar to the pyart.base.Radar object, that could port dictionaries --> xarray backends?

For example, taking the following as input:

RadarDataStore(
        time,
        _range,
        fields,
        metadata,
        scan_type,
        latitude,
        longitude,
        altitude,
        sweep_number,
        sweep_mode,
        fixed_angle,
        sweep_start_ray_index,
        sweep_end_ray_index,
        azimuth,
        elevation,
        instrument_parameters=instrument_parameters,
    )

It would then return an xarray data structure. I am having trouble decoupling the entry point commonalities from the individual file-parsing structures in the current backends...

Some benefits of moving toward this approach would be:

  • The user just needs to extract dictionaries with those variables (we already have many of these in the base data model)
  • We can refactor the existing readers to use this approach, cutting down on the duplicated Variable()... code, as well as handling the different dimensions (azimuth vs. time)
  • We can add a well-defined "how to add an xradar xarray backend" section to the docs that walks through how to fit this into the data model
  • Easy to port over more readers from Py-ART

kmuehlbauer (Collaborator)

@mgrover1 I've not yet had time to check this PR out. Hopefully I can free up some time next week.

If such an object makes code readability and maintenance easier, why not? How would that be integrated into the xarray backend machinery?

mgrover1 (Collaborator Author)

My thought right now is that it would be structured as:

NexradLevel2File --> RadarDataStore --> NexradBackendEntrypoint

The benefit here would be that the RadarDataStore would be the primary object that we would fit the coordinates + fields into... then we can pass that into the backend entrypoint. I can prototype this in this PR, and ping you when it is ready for feedback?
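For what it's worth, a minimal sketch of how that chain could plug into xarray's backend machinery is below. The `BackendEntrypoint` base class and the `open_dataset` signature are xarray's real API, but `NexradLevel2File`, `RadarDataStore.from_level2`, and `to_dataset` are placeholders for whatever this PR ends up with, not existing code:

from xarray.backends import BackendEntrypoint


class NexradBackendEntrypoint(BackendEntrypoint):
    """Sketch: expose one sweep of a NEXRAD Level 2 file as an xarray.Dataset."""

    def open_dataset(self, filename_or_obj, *, drop_variables=None, group=None):
        # parse the raw file into Py-ART style dictionaries ...
        nfile = NexradLevel2File(filename_or_obj)      # hypothetical reader
        # ... fit coordinates + fields into the intermediate store ...
        store = RadarDataStore.from_level2(nfile)      # hypothetical helper
        # ... and hand back the requested sweep as a Dataset
        return store.to_dataset(group=group)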

mgrover1 (Collaborator Author)

@kmuehlbauer - I ended up going with a function instead of a class... it takes in the same things the radar object did in Py-ART and returns an xarray.Dataset, with a group argument that can be used to specify which sweep to use... this way, it can be used directly with the backend entrypoint API... additional bits can then be added to the dataset before it is returned to the user. Open to thoughts here!
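To make the idea concrete, a rough sketch of what such a dict-to-Dataset helper could look like follows, assuming Py-ART style dictionaries that each carry a "data" key plus metadata attributes; the function name and exact signature are illustrative, not the code in this PR:

import xarray as xr


def _attrs(dic):
    # everything except the array itself becomes variable attributes
    return {k: v for k, v in dic.items() if k != "data"}


def dicts_to_sweep_dataset(time, _range, fields, azimuth, elevation,
                           sweep_start_ray_index, sweep_end_ray_index, group=0):
    """Illustrative only: build one sweep's Dataset from Py-ART style dicts."""
    start = int(sweep_start_ray_index["data"][group])
    stop = int(sweep_end_ray_index["data"][group]) + 1

    coords = {
        "time": ("azimuth", time["data"][start:stop], _attrs(time)),
        "range": ("range", _range["data"], _attrs(_range)),
        "azimuth": ("azimuth", azimuth["data"][start:stop], _attrs(azimuth)),
        "elevation": ("azimuth", elevation["data"][start:stop], _attrs(elevation)),
    }
    data_vars = {
        name: (("azimuth", "range"), dic["data"][start:stop], _attrs(dic))
        for name, dic in fields.items()
    }
    return xr.Dataset(data_vars=data_vars, coords=coords)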

mgrover1 (Collaborator Author)

@kmuehlbauer - I am stumped about what I am doing wrong here that keeps it from recognizing the Cython submodule I added.


codecov bot commented Jan 26, 2024

Codecov Report

Attention: Patch coverage is 73.60862% with 147 lines in your changes missing coverage. Please review.

Project coverage is 86.28%. Comparing base (a854715) to head (5c7bd7e).

❗ Current head 5c7bd7e differs from pull request most recent head 2a1c46e. Consider uploading reports for the commit 2a1c46e to get more accurate results.

Files Patch % Lines
xradar/io/backends/nexrad_level2.py 75.76% 119 Missing ⚠️
xradar/io/backends/common.py 57.62% 25 Missing ⚠️
xradar/io/backends/nexrad_common.py 40.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #147      +/-   ##
==========================================
- Coverage   90.79%   86.28%   -4.51%     
==========================================
  Files          20       22       +2     
  Lines        3421     3995     +574     
==========================================
+ Hits         3106     3447     +341     
- Misses        315      548     +233     
Flag Coverage Δ
notebooktests ?
unittests 86.28% <73.60%> (-3.64%) ⬇️

Flags with carried forward coverage won't be shown.


mgrover1 marked this pull request as ready for review January 26, 2024 02:15
kmuehlbauer (Collaborator) left a comment

@mgrover1 Thanks for moving this forward. I've added a couple of comments.

I'm still not convinced we need the cythonized interpolation at all. It looks like it is only needed to conform the single sweeps onto a common range resolution. This won't be needed for our sweep-based data model. Or am I missing something?

So, in case my assumption is correct, I'd suggest shaping the code to keep the original sweep resolution and removing all Cython-related things from this PR.

@@ -24,5 +25,7 @@
from .iris import * # noqa
from .odim import * # noqa
from .rainbow import * # noqa

__all__ = [s for s in dir() if not s.startswith("_")]
Collaborator

Why remove the __all__ here?

Collaborator Author

Not sure... it is back in. Thanks!


# range
_range = get_range_attrs()
first_gate, gate_spacing, last_gate = _find_range_params(scan_info)
Collaborator

duplicate of line 1072?

Collaborator Author

Yep - should be fixed

# fields
max_ngates = len(_range["data"])
available_moments = {m for scan in scan_info for m in scan["moments"]}
interpolate = _find_scans_to_interp(
Collaborator

Looks like this is already done above at line 1075?

Collaborator Author

See the latest commit - that should fix the duplication

Comment on lines 1211 to 1213
warnings.warn(
"Gate spacing is not constant, interpolating data in "
+ f"scans {interp_scans} for moment {moment}.",
Collaborator

It seems that we do not have to do interpolation here. AFAICT this is only needed for CfRadial1 data (like the good old Py-ART data model). We should be safe to keep the sweep resolution as is.

Collaborator Author

I removed the interpolation in the latest commit @kmuehlbauer :) thanks for the suggestion here.

kmuehlbauer (Collaborator) left a comment

@mgrover1 We knew that it would not be an easy task. I've added another couple of suggestions and ideas.

It looks like we need to tackle the different gate spacing stuff at a lower level.

@@ -22,9 +22,6 @@ jobs:
run: |
python -m pip install --upgrade pip
pip install black black[jupyter] ruff
Collaborator

AFAIK we would need to enable Jupyter notebook linting/formatting for ruff in pyproject.toml.

@@ -30,4 +30,5 @@ jobs:
TWINE_PASSWORD: ${{ secrets.PYPI_API_TOKEN }}
run: |
python -m build
python setup.py build_ext --inplace
Collaborator

This won't be needed anymore.

@@ -8,4 +8,5 @@ recursive-include tests *
recursive-exclude * __pycache__
recursive-exclude * *.py[co]

global-include *.pyx *pxd
Collaborator

This can be removed too.


[build-system]
requires = [
"setuptools>=45",
"wheel",
"setuptools_scm[toml]>=7.0",
"numpy"
Collaborator

numpy can be removed?

return _unpack_structure(buf[pos : pos + size], structure)


def _unpack_structure(string, structure):
Collaborator

FYI @mgrover1, there is already similar decoding implemented over in the iris/sigmet backend. I'd suggest aligning this after this PR is merged. I'd volunteer to take this on.
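For reference, the shared decoding pattern both backends rely on looks roughly like this; the field list below is made up for illustration (it is not a real NEXRAD message layout), and the byte order differs between formats:

import struct

# (field_name, struct_format) pairs describing one fixed-size binary record;
# names and formats here are illustrative only
MESSAGE_HEADER = (
    ("size", "H"),
    ("channels", "B"),
    ("type", "B"),
    ("seq_id", "H"),
)


def _unpack_structure(buf, structure):
    """Unpack one binary record into a dict of named fields."""
    fmt = ">" + "".join(f for _, f in structure)  # NEXRAD Level 2 is big-endian
    values = struct.unpack(fmt, buf[: struct.calcsize(fmt)])
    return dict(zip((name for name, _ in structure), values))


# e.g. header = _unpack_structure(raw_bytes, MESSAGE_HEADER)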

scale = np.float32(msg[moment]["scale"])
mask = data <= 1
scaled_data = (data - offset) / scale
return np.ma.array(scaled_data, mask=mask)
Collaborator

We might also get rid of the mask and masked array here if we correctly specify missing values and/or _FillValues as attributes. This can be done as a follow-up PR.
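A sketch of that follow-up idea, assuming -9999 stays the fill value set on the variable attributes elsewhere in this diff: return a plain float array carrying the fill value instead of a masked array.

import numpy as np

FILL_VALUE = -9999.0  # must match the _FillValue written to the variable attrs


def scale_moment(data, offset, scale):
    """Return a plain float32 array; invalid gates carry the fill value."""
    scaled = (data.astype("float32") - offset) / scale
    # raw values 0 and 1 flag "below threshold" / "range folded" gates
    scaled[data <= 1] = FILL_VALUE
    return scaled

With _FillValue (or missing_value) present in the attributes, xarray's mask_and_scale decoding can then turn those gates into NaN, so no masked arrays are needed downstream.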

Comment on lines +1011 to +1019
storage_options={"anon": True},
first_dimension=None,
group=None,
**kwargs,
):
# Load the data file in using NEXRADLevel2File Class
nfile = NEXRADLevel2File(
prepare_for_read(filename, storage_options=storage_options)
)
Collaborator

Can we please make this depend on storage_options?

Suggested change
-    storage_options={"anon": True},
+    storage_options=None,
     first_dimension=None,
     group=None,
     **kwargs,
 ):
     # Load the data file in using NEXRADLevel2File Class
-    nfile = NEXRADLevel2File(
-        prepare_for_read(filename, storage_options=storage_options)
-    )
+    if storage_options is not None:
+        filename = prepare_for_read(filename, storage_options=storage_options)
+    nfile = NEXRADLevel2File(
+        filename
+    )

Maybe this is a bit more involved, as storage_options would have to be passed through to the backend reader.
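Passing it through should be mostly mechanical, since extra keyword arguments to xr.open_dataset are forwarded to the backend's open_dataset. A hypothetical call could then look like the following; the engine name, group naming, and the remote path are placeholders, not the API this PR registers:

import xarray as xr

ds = xr.open_dataset(
    "s3://some-bucket/path/to/KXXX_level2_volume",  # placeholder remote file
    engine="nexradlevel2",                          # placeholder engine name
    group="sweep_0",
    storage_options={"anon": True},                 # only needed for remote reads
)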

# range
_range = get_range_attrs()
first_gate, gate_spacing, last_gate = _find_range_params(scan_info)
_range["data"] = np.arange(first_gate, last_gate, gate_spacing, "float32")
Collaborator

Oh my, this is really hard to move from the CfRadial1 to the CfRadial2 data model. AFAICS the _range dict is used for all sweeps (assuming the range is interpolated to a common grid). This would need to be done on a per-sweep basis.

This would need another round of refactoring here.
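One possible direction, sketched under the assumption that each entry in scan_info carries its own gate geometry (the key names are illustrative): build the range coordinate per sweep instead of interpolating everything onto one shared grid.

import numpy as np


def sweep_range(scan):
    """Illustrative: range coordinate for one sweep from its own gate geometry."""
    first_gate = scan["first_gate"]      # metres to the centre of the first gate
    gate_spacing = scan["gate_spacing"]  # metres between gate centres
    ngates = scan["ngates"]
    return first_gate + gate_spacing * np.arange(ngates, dtype="float32")


# one range array per sweep, each kept at its native resolution;
# scan_info comes from the reader, as in the surrounding code
ranges = [sweep_range(scan) for scan in scan_info]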

dic["_FillValue"] = -9999
if delay_field_loading:
dic = LazyLoadDict(dic)
data_call = _NEXRADLevel2StagedField(nfile, moment, max_ngates, scans)
Collaborator

Here is the pain point: this will extract the moments of different scans (which might be on different range resolutions).

)


def create_dataset_from_fields(
Collaborator

This whole function assumes that all data of all sweeps is on a common range grid. This would need a refactor to work on a per-sweep basis.

mgrover1 (Collaborator Author)

> @mgrover1 We knew that it would not be an easy task. I've added another couple of suggestions and ideas.
>
> It looks like we need to tackle the different gate spacing stuff at a lower level.

Thanks for the suggestions - I am busy with the AMS conference today, but will follow up later this week. I appreciate the feedback! Agreed - it is not easy, but will be worth it :)

kmuehlbauer (Collaborator)

> Thanks for the suggestions - I am busy with the AMS conference today, but will follow up later this week. I appreciate the feedback! Agreed - it is not easy, but will be worth it :)

No worries, Max. I'm still getting accustomed to the code and we'll surely have further iteration cycles here. And you are absolutely right, it will be worth it.

kmuehlbauer mentioned this pull request Mar 6, 2024
mgrover1 (Collaborator Author)

Closing this since @kmuehlbauer refactored this work and submitted it with #158.

mgrover1 closed this Mar 28, 2024
Labels: None yet
Successfully merging this pull request may close these issues: Add NEXRAD Support
2 participants