Suggested Improvements to Python Embedding across MET #2414

DanielAdriaansen · 2023-01-20T17:55:04Z

Describe the Enhancement

This is a general issue with no due date to document suggested changes and/or improvements to how Python Embedding works within MET, with the primary goal of improving the user experience and ability to communicate and teach Python Embedding to users.

Time Estimate

TBD

Sub-Issues

Consider breaking the enhancement down into sub-issues.

Revisit the Python embedding strings PYTHON_NUMPY and PYTHON_XARRAY. I think we can generalize these more to be something like PYTHON_DATAPLANE or PYTHON_GRID and PYTHON_POINT. Then, for PYTHON_DATAPLANE (or _GRID), have MET deal with deciphering whether it is an Xarray object or a NumPy N-D array. The point data is easier, and really only comes in one way. The largest confusion in my opinion is that PYTHON_NUMPY is used for point data currently, which has nothing to do with NUMPY whatsoever. Alternatively, we could add PYTHON_POINT and maintain the two dataplane instances the way they are.
If --enable_python is used when building, have make_test check if the version of python specified can successfully load all the required packages from a list of required packages that is maintained somewhere. (See Enhance make_test to test Python embedding functionality Enhance make_test to test Python embedding functionality #2788)
Add to Appendix F or the contributors guide much more detailed information about running with and without MET_PYTHON_EXE set. Include wire diagrams to illustrate. Also document the data structures that are populated in the 3 different types of python embedding (grid, point, pairs).
This section in the METplus Wrappers docs: https://metplus.readthedocs.io/en/latest/Users_Guide/overview.html#metplus-components-python-requirements, is confusing because it seems that "cfgrib" is a required Python package for METplus Wrappers, but then later in chapter 4 it does not list "cfgrib", and says "Minimum Requirements To run most of the METplus wrappers, the following packages are required: dateutil (2.8)". I think there could be some documentation work in the METplus Wrappers also.
When attempting to generalize where Python Embedding should be put for METplus Wrappers configuration, I was able to determine the following. For gridded data, the *_INPUT_TEMPLATE conf item is set to just PYTHON_NUMPY or PYTHON_XARRAY, and then the Python script and any arguments is included in the *_VAR<n>_NAME conf item. However, for point data, the *_INPUT_TEMPLATE conf item contains both the PYTHON_NUMPY string, and another "=" followed by the Python script and any script arguments. It would be nice if these worked similarly, where for point data we could set the *_INPUT_TEMPLATE to just PYTHON_NUMPY, and then use the *_VAR<n>_NAME conf items for the Python script and any arguments to it, which would make configuring METplus wrappers for point and gridded data similar with respect to where elements of Python Embedding should be put in the configuration file. After talking to John HG, it sounds like it might not be feasible to change the way MET works -or- to change the wrappers to include both items in a single conf for gridded data like obs data due to how the MET tools can be called for gridded data. Some of our discussion:

Q: Hi John. Is there a good reason why point data works a little differently than gridded data for Python Embedding for METplus Wrappers? It looks like for point data, the *_INPUT_TEMPLATE conf item is used to specify the entire string "PYTHON_NUMPY=/my/script.py arg1 arg2", whereas for gridded data, the *_INPUT_TEMPLATE conf item simply only requires "PYTHON_NUMPY" (or _XARRAY), and then the _VAR_NAME conf entry requires the "/my/script.py arg1 arg2" part of the Python Embedding elements.
A: For gridded data, you typically define the script to be run along with arguments in the "field.name" config setting. There is no "field.name" setting for obs data really... since we supply all the point obs (consisting of all the types) in a single spot. So we replace the obs file name with the full python command to retrieve those obs from python.

How can we expose the value of MET_PYTHON_BIN_EXE to the user? I was able to run this command:
seneca:~$ /d1/projects/MET/MET_regression/develop/NB20230123/MET-develop/bin/plot_data_plane PYTHON_NUMPY fcst.ps 'name="test.py";' -v 12
which showed the value. However, it might be nice to make a symbolic link in /usr/local/met-11.0.0/bin to the python3.8 exe that was used, OR have a utility script in /usr/local/met-11.0.0/bin the user can run to echo the value of the Python install for them. This way the user can directly instantiate the same Python MET will use, and also verify whether the Python packages they need are available there or whether they will need to handle installing their own MET_PYTHON_EXE to use.
Add more documentation or examples of how to use point_stat with Python Embedding. David Fillmore was using Python Embedding for both fcst and obs data, with the fcst being gridded data and the obs being point data. This led to constructing a command for MET and configuring METplus for both "point" data and "gridded" data, which work differently. Maybe more documentation, but maybe this is another data point for changing how it works for gridded/point to be more similar?
There are some Python scripts located on this DTC webpage: https://dtcenter.org/community-code/model-evaluation-tools-met/sample-analysis-scripts. Is this the best place for them? Or should this be included in the Appendix F, and linked to from there? From John HG:

MET repo/RTD (https://met.readthedocs.io/en/latest/)

METplus repo/RTD (https://metplus.readthedocs.io/en/latest/)

METplus-Training repo/RTD (https://metplus-training.readthedocs.io/en/latest/)
I've been looking for more opportunities to expand METplus-Training... so perhaps that's a good landing spot.

Completed Items

Is there a way to know where the Python installation that MET was compiled with? This would help in situations where a user wanted to know if the Python installation had any particular packages installed in it to know what might be available (or not) for Python Embedding. I think this will be handled via the MET_PYTHON_BIN_EXE work. @georgemccabe is there an issue for that somewhere? This can be found in the list of environment variables set at MET compile time, and is also printed to the user as a debug statement when using Python embedding. The issue where this was added was Refine Python runtime environment #2388, and the debug statements were added later.
Fix "dash-dash" formatting in appendix F (RST problem) Changed in Feature #2285 read_point_data #2509
Differentiate Python stdlib packages from optional packages and remove stdlib packages from requirements in Appendix F (see Clarify MET Compile Time Python requirements #2490). Changed in Feature #2285 read_point_data #2509
From @DanielAdriaansen on 2/3/23, enhance MET to automatically call met_point_obs.py at runtime when needed rather than expecting the user to figure out if/when it actually needs to be called. Changed in Feature #2285 read_point_data #2509
Added 2/10/23: From Discussion python embedding; conda env METplus#2031, revise documentation to strongly suggest that users use the absolute path to their Python embedding scripts. Using relative paths or just simply the script name will produce unexpected results. Changed in Feature #2285 read_point_data #2509. Appendix F now has bolded "full path" references where appropriate.
2/17/2023: We need to clarify how to use convert_point_obs inside met_point_obs. It seems that if a user is NOT setting MET_PYTHON_EXE, then MET will only find this after 5ad920b, but if before that commit then they need to set PYTHONPATH. Similarly, if they are using Python embedding, they will have to set PYTHONPATH prior to Bugfix: Fix the MET vx_pointdata_python library to handle MET_PYTHON_EXE for python embedding of point observations #2428 being merged. Is this right? Changed in Feature #2285 read_point_data #2509. The solution here is just to encourage users to update to a version after this PR, which removes the need for convert_point_obs.
From @georgemccabe on 2/3/23, in https://github.com/dtcenter/MET/tree/main_v11.0/scripts/python consider moving met_point_obs.py into some other directory to automatically be included in the user's python path at runtime. But we don't necessarily want to include EVERYTHING that lives in the scripts/python directory because that MIGHT cause unexpected behavior depending on the names of scripts the user chooses. Inventory the existing location of python scripts and data files in the MET repo and revise, as needed. Changed in Feature #2285 read_point_data #2509. Since a user typically won't need this anymore (i.e. to call convert_point_data), then this probably won't be an issue but I also think the Python reorg in Feature #2285 read_point_data #2509 also makes it such that all the Python code a user would need/want to use in their Python embedding script will be found by MET at runtime. If running standalone the user may need to add some of these scripts/python/met directories to their PYTHONPATH.
Appendix F does not have a list of required PyEmbed packages but the METplus docs does (see here: https://metplus.readthedocs.io/en/latest/Users_Guide/installation.html#python-package-requirements) (see Clarify MET Compile Time Python requirements #2490). Changed in Feature #2285 read_point_data #2509. There is still more work to do on the METplus side, see Documentation: Review and update Requirements METplus#1978, and also the PR for Enhancement: Update use cases to use new Python directory structure in MET METplus#2115.
Update Appendix F to include what is reported if MET was compiled without Python? Changed in Feature #2285 read_point_data #2509.

Relevant Deadlines

NONE

Funding Source

NONE

Assignee

Select engineer(s) or no engineer required
Select scientist(s) or no scientist required

Labels

Select component(s)
Select priority
Select requestor(s)

Projects and Milestone

Select Repository and/or Organization level Project(s) or add alert: NEED PROJECT ASSIGNMENT label
Select Milestone as the next official version or Future Versions

Define Related Issue(s)

Consider the impact to the other METplus components.

METplus, MET, METdataio, METviewer, METexpress, METcalcpy, METplotpy

Enhancement Checklist

See the METplus Workflow for details.

The text was updated successfully, but these errors were encountered:

georgemccabe · 2023-02-10T17:41:37Z

os.path.abspath can be used to get the full path from a relative path. This could be used to add the directory containing the python embedding script if the user does not specify the full path. See: https://github.com/dtcenter/METplus/blob/aa9d7b3451bad720dd8a9928a735b3bf12318585/ush/run_metplus.py#L24

A potential fix (untested) would be to change this line from:
pyembed_module_name = sys.argv[2]
to:
pyembed_module_name = os.path.abspath(sys.argv[2])

DanielAdriaansen added this to the Backlog of Development Ideas milestone Jan 20, 2023

DanielAdriaansen self-assigned this Jan 20, 2023

JohnHalleyGotway modified the milestones: Backlog of Development Ideas, MET 11.1.0 Feb 3, 2023

DanielAdriaansen mentioned this issue Feb 17, 2023

Bugfix 2428 python from env main v11.0 #2443

Merged

15 tasks

DanielAdriaansen mentioned this issue Mar 16, 2023

Clarify MET Compile Time Python requirements #2490

Closed

1 task

JohnHalleyGotway modified the milestones: MET 11.1.0, MET 12.0.0 Jun 15, 2023

jprestop mentioned this issue Jan 11, 2024

Enhance make_test to test Python embedding functionality #2788

Open

22 tasks

DanielAdriaansen mentioned this issue May 6, 2024

Feature #2781 Convert MET NetCDF point obs to Pandas DataFrame #2877

Merged

17 tasks

JohnHalleyGotway modified the milestones: MET-12.0.0, MET-12.1.0 Oct 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Suggested Improvements to Python Embedding across MET #2414

Suggested Improvements to Python Embedding across MET #2414

DanielAdriaansen commented Jan 20, 2023 •

edited

Loading

georgemccabe commented Feb 10, 2023 •

edited

Loading

Suggested Improvements to Python Embedding across MET #2414

Suggested Improvements to Python Embedding across MET #2414

Comments

DanielAdriaansen commented Jan 20, 2023 • edited Loading

Describe the Enhancement

Time Estimate

Sub-Issues

Completed Items

Relevant Deadlines

Funding Source

Assignee

Labels

Projects and Milestone

Define Related Issue(s)

Enhancement Checklist

georgemccabe commented Feb 10, 2023 • edited Loading

DanielAdriaansen commented Jan 20, 2023 •

edited

Loading

georgemccabe commented Feb 10, 2023 •

edited

Loading