Suggested Improvements to Python Embedding across MET #2414
Labels
alert: NEED ACCOUNT KEY
Need to assign an account key to this issue
alert: NEED CYCLE ASSIGNMENT
Need to assign to a release development cycle
alert: NEED MORE DEFINITION
Not yet actionable, additional definition required
MET: Python Embedding
priority: low
Low Priority
requestor: NCAR/RAL
NCAR Research Applications Laboratory
type: enhancement
Improve something that it is currently doing
Milestone
Describe the Enhancement
This is a general issue, with no due date, for documenting suggested changes and improvements to how Python Embedding works within MET. The primary goal is to improve the user experience and our ability to communicate and teach Python Embedding to users.
Time Estimate
TBD
Sub-Issues
Consider breaking the enhancement down into sub-issues.
[...] PYTHON_NUMPY and PYTHON_XARRAY. I think we can generalize these more to be something like PYTHON_DATAPLANE or PYTHON_GRID and PYTHON_POINT. Then, for PYTHON_DATAPLANE (or _GRID), have MET deal with deciphering whether it is an Xarray object or a NumPy N-D array. The point data is easier, and really only comes in one way. The largest confusion in my opinion is that PYTHON_NUMPY is used for point data currently, which has nothing to do with NumPy whatsoever. Alternatively, we could add PYTHON_POINT and maintain the two dataplane instances the way they are.
[...] PYTHON_NUMPY or PYTHON_XARRAY, and then the Python script and any arguments are included in the *_VAR<n>_NAME conf item. However, for point data, the *_INPUT_TEMPLATE conf item contains both the PYTHON_NUMPY string and another "=" followed by the Python script and any script arguments. It would be nice if these worked similarly: for point data we could set *_INPUT_TEMPLATE to just PYTHON_NUMPY, and then use the *_VAR<n>_NAME conf items for the Python script and any arguments to it. That would make configuring METplus wrappers for point and gridded data similar with respect to where elements of Python Embedding should be put in the configuration file. After talking to John HG, it sounds like it might not be feasible to change the way MET works, or to change the wrappers to include both items in a single conf item for gridded data, as for obs data, due to how the MET tools can be called for gridded data. Some of our discussion:
[...] MET_PYTHON_BIN_EXE to the user? I was able to run this command:
    seneca:~$ /d1/projects/MET/MET_regression/develop/NB20230123/MET-develop/bin/plot_data_plane PYTHON_NUMPY fcst.ps 'name="test.py";' -v 12
which showed the value. However, it might be nice to make a symbolic link in /usr/local/met-11.0.0/bin to the python3.8 exe that was used, or to have a utility script in /usr/local/met-11.0.0/bin that the user can run to echo the path of the Python install. This way the user can directly instantiate the same Python MET will use, and also verify whether the Python packages they need are available there or whether they will need to handle installing their own MET_PYTHON_EXE to use.
[...] point_stat with Python Embedding. David Fillmore was using Python Embedding for both fcst and obs data, with the fcst being gridded data and the obs being point data. This led to constructing a command for MET and configuring METplus for both "point" data and "gridded" data, which work differently. Maybe this calls for more documentation, but maybe it is another data point for changing how gridded and point data work so that they are more similar?
[...] plot_data_plane field_string argument? name = "TMP"; level = "P500"; convert(x) = K_to_C(x); censor_thresh = lt0; censor_val = -9999; set_attr_name = "Temperature"; set_attr_level = "500mb"; set_attr_valid = 20210708_12;. See here for the best resource: https://met.readthedocs.io/en/latest/Users_Guide/config_options.html#fcst.
[...] MET/src/libcode/vx_data2d_python/python_dataplane.cc, line 323 in 0123464. sys, os, argparse, importlib, numpy, and netCDF4 are required. Some of those are stdlib, but numpy (and maybe netCDF4) are not. (See Clarify MET Compile Time Python requirements #2490.)
[...] MET_PYTHON_EXE is set but does not exist, and warn that it is using the compile time instance.
[...] read_csv() to read it. It specifies the type of data as dtype='str', thus all columns of the MPR file are cast as strings prior to sending to MET. If a user curates their own MPR data in Python (for example, massaging JSON data into the MPR line type format), they may have columns that are not string but rightfully numeric in their type (e.g. QC, FCST, OBS). Should we allow mixed data types in this case? At a minimum, perhaps we return a more informative error message than "bad object type".
[...] type attribute appears to use CamelCase (e.g. LatLon) as opposed to all lower case (e.g. latlon). We use all lower case for grid specification strings, so why is it different for Python embedding? We should make it the same for both, or allow case-insensitive options for Python embedding to preserve backwards compatibility. See Python embedding to read geotiff data (sentinel 3 products) METplus#2702.
Completed Items
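As a rough illustration of the case-insensitive option suggested in the open item above about the grid type attribute, here is a minimal Python sketch. It is not MET code: normalize_grid_type and CANONICAL_TYPES are hypothetical names, only the LatLon spelling is taken from this issue, and Mercator is included purely as an illustrative second entry.

```python
# Hypothetical sketch: accept grid "type" values regardless of letter case.
# "LatLon" is the spelling quoted in this issue; "Mercator" is illustrative only.
CANONICAL_TYPES = ("LatLon", "Mercator")

# Map each lower-cased spelling to its canonical form.
_LOOKUP = {name.lower(): name for name in CANONICAL_TYPES}


def normalize_grid_type(value):
    """Return the canonical spelling of a grid type, ignoring case and padding."""
    key = str(value).strip().lower()
    if key not in _LOOKUP:
        raise ValueError(
            f"unrecognized grid type {value!r}; expected one of {CANONICAL_TYPES}"
        )
    return _LOOKUP[key]
```

With a lookup like this, both latlon and LatLon would resolve to a single spelling, so existing Python embedding scripts would keep working while the lower-case form used in grid specification strings is also accepted.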
[...] MET_PYTHON_BIN_EXE work. @georgemccabe is there an issue for that somewhere? This can be found in the list of environment variables set at MET compile time, and is also printed to the user as a debug statement when using Python embedding. The issue where this was added was Refine Python runtime environment #2388, and the debug statements were added later.
[...] met_point_obs.py at runtime when needed rather than expecting the user to figure out if/when it actually needs to be called. Changed in Feature #2285 read_point_data #2509.
[...] convert_point_obs inside met_point_obs. It seems that if a user is NOT setting MET_PYTHON_EXE, then MET will only find this after 5ad920b, but before that commit they need to set PYTHONPATH. Similarly, if they are using Python embedding, they will have to set PYTHONPATH prior to Bugfix: Fix the MET vx_pointdata_python library to handle MET_PYTHON_EXE for python embedding of point observations #2428 being merged. Is this right? Changed in Feature #2285 read_point_data #2509. The solution here is just to encourage users to update to a version after this PR, which removes the need for convert_point_obs.
[...] met_point_obs.py into some other directory to automatically be included in the user's python path at runtime. But we don't necessarily want to include EVERYTHING that lives in the scripts/python directory because that MIGHT cause unexpected behavior depending on the names of scripts the user chooses. Inventory the existing location of python scripts and data files in the MET repo and revise, as needed. Changed in Feature #2285 read_point_data #2509. Since a user typically won't need this anymore (i.e. to call convert_point_data), this probably won't be an issue, but I also think the Python reorg in Feature #2285 read_point_data #2509 makes it such that all the Python code a user would need/want to use in their Python embedding script will be found by MET at runtime. If running standalone the user may need to add some of these scripts/python/met directories to their PYTHONPATH.
Relevant Deadlines
NONE
Funding Source
NONE
Assignee
Labels
Projects and Milestone
Define Related Issue(s)
Consider the impact to the other METplus components.
Enhancement Checklist
See the METplus Workflow for details.
Branch name:
feature_<Issue Number>_<Description>
Pull request:
feature <Issue Number> <Description>
Select: Reviewer(s) and Development issues
Select: Repository level development cycle Project for the next official release
Select: Milestone as the next official version