Feature 325 aggregation support (#346)
* Update README.md (#321)

update required Python version to 3.10+

* Added aggregation features

* Test

* removed folders

* Added aggregation features

* Updates settings and improved folder search algorithm; Added README

* Corrected FBIAS stat fields

* Create Aggregation.rst

creating file from: https://github.com/dtcenter/METcalcpy/blob/feature_325_aggregation_support/metcalcpy/pre_processing/aggregation/README.md
Copied Vertical Interpolation as a template

* adding aggregation

* Rename Aggregation.rst to aggregation.rst

* first pass at cleaning up warnings

* changing to 3rd person

* issue #325 CTS data from RRFS to test aggregation

* Issue #325 added background on agg_stat.py

* issue #325 added instructions for bash and csh, added links to external references

* issue #325 fix syntax for subsection

* Issue #325 fix grammar, add instructions for importing and invoked by another script

* issue #325 more fixes to grammar for import instructions

* issue #325 added corrected instructions for running via command-line (included the path to the agg_stat.py module)

* Issue #325 modify config file to specify valid paths for input and output files.

* Issue #325 modified for User's Guide instructions

* issue #325 added reformatted data for ECNT and compatible for METcalcpy agg_stat input

* Delete test/data/rrfs_cts_reformatted.data

not used for testing.  Using the ECNT data instead.

* issue #325 pytest on ECNT data reformatted with METdataio METreformat and aggregation statistics calculated

* issue #325 added latest test for ECNT aggregation

* Issue #325 address pandas future warning that causes current pytests to fail.
Replace pandas chained assignment such as:
  df['column_name'][index] = var_name

with:
  df.loc[index, 'column_name'] = var_name

* Issue #325 address pandas future warning that causes current pytests to fail.
Replace pandas chained assignment such as:
  df['column_name'][index] = var_name

with:
  df.loc[index, 'column_name'] = var_name

* Issue #325 updated input data to ECNT data, corrected the explanation of expected input format for agg_stat.

* Issue #325 modify config file to use RRFS ECNT .stat data reformatted by METdataio

* issue #325 point to actual config file via literalinclude

* issue #325 replace reference to the CTS output file with ECNT

* replace pandas append with concat

* Update unit_tests.yml

added test_reformatted_for_agg.py

* fixed syntax error with list

* issue #325 update test data with correctly reformatted ECNT line data

* issue #325 removed some unnecessary text

---------

Co-authored-by: VanderleiVargas-NOAA <[email protected]>
Co-authored-by: lisagoodrich <[email protected]>
3 people authored Feb 2, 2024
1 parent 34dcfd8 commit e76a606
Showing 24 changed files with 9,692 additions and 13 deletions.
3 changes: 2 additions & 1 deletion .github/workflows/unit_tests.yml
@@ -65,7 +65,8 @@ jobs:
 pytest test_validate_mv_python.py
 pytest test_future_warnings.py
 pytest test_sl1l2.py
-coverage run -m pytest test_agg_eclv.py test_agg_stats_and_boot.py test_agg_stats_with_groups.py test_calc_difficulty_index.py test_convert_lon_indices.py test_event_equalize.py test_event_equalize_against_values.py test_lon_360_to_180.py test_statistics.py test_tost_paired.py test_utils.py test_future_warnings.py
+pytest test_reformatted_for_agg.py
+coverage run -m pytest test_agg_eclv.py test_agg_stats_and_boot.py test_agg_stats_with_groups.py test_calc_difficulty_index.py test_convert_lon_indices.py test_event_equalize.py test_event_equalize_against_values.py test_lon_360_to_180.py test_statistics.py test_tost_paired.py test_utils.py test_future_warnings.py test_reformatted_for_agg.py
 coverage html
 - name: Archive code coverage results
2 changes: 1 addition & 1 deletion README.md
@@ -29,6 +29,6 @@ Instructions for installing the metcalcpy package locally
 Instructions for installing the metcalcpy package from PyPI
 -----------------------------------------------------------

-- activate your Python 3.8.6+ conda environment
+- activate your Python 3.10+ conda environment
 - run the following from the command line:
 - pip install metcalcpy==x.y.z where x.y.z is the version number of interest
198 changes: 198 additions & 0 deletions docs/Users_Guide/aggregation.rst
@@ -0,0 +1,198 @@
***********
Aggregation
***********

Aggregation is an option that can be applied to MET stat output (in
the appropriate format) to calculate aggregation statistics and confidence intervals.
Input data must first be reformatted using the METdataio METreformat module to
label all the columns with the corresponding statistic name specified in the
`MET User's Guide <https://met.readthedocs.io/en/develop/Users_Guide/index.html>`_
for `point-stat <https://met.readthedocs.io/en/develop/Users_Guide/point-stat.html>`_,
`grid-stat <https://met.readthedocs.io/en/develop/Users_Guide/grid-stat.html>`_, or
`ensemble-stat <https://met.readthedocs.io/en/develop/Users_Guide/ensemble-stat.html>`_ .stat output data.

Python Requirements
===================

The third-party Python packages and the corresponding version numbers are found
in the requirements.txt and nco_requirements.txt files:

**For Non-NCO systems**:

* `requirements.txt <https://github.com/dtcenter/METcalcpy/blob/develop/requirements.txt>`_

**For NCO systems**:

* `nco_requirements.txt <https://github.com/dtcenter/METcalcpy/blob/develop/nco_requirements.txt>`_


Retrieve Code
=============

Refer to the `Installation Guide <https://metcalcpy.readthedocs.io/en/develop/Users_Guide/installation.html>`_
for instructions.


Retrieve Sample Data
====================

The sample data used for this example is located in the $METCALCPY_BASE/test directory,
where **$METCALCPY_BASE** is the full path to the location of the METcalcpy source code
(e.g. /User/my_dir/METcalcpy).
The example data file is **rrfs_ecnt_for_agg.data**.
This data was reformatted from the MET .stat output using the METdataio METreformat module.
The reformatting step labels the columns with the corresponding statistics, based on the MET tool (point-stat,
grid-stat, or ensemble-stat). The ECNT linetype of
the MET ensemble-stat output has been reformatted to include the statistic names for all
`ECNT <https://met.readthedocs.io/en/develop/Users_Guide/ensemble-stat.html#id2>`_ specific columns.


Input data **must** be in this format prior to using the aggregation
module, agg_stat.py.

The example data can be copied to a working directory, or left in this directory. The location
of the data will be specified in the YAML configuration file.

Please refer to the METdataio User's Guide for instructions on reformatting MET .stat files:
https://metdataio.readthedocs.io/en/develop/Users_Guide/reformat_stat_data.html


Aggregation
===========

The agg_stat module, **agg_stat.py**, is used to calculate aggregated statistics and confidence intervals.
This module can be run as a script at the command-line, or imported in another Python script.

A required YAML configuration file, **config_agg_stat.yaml**, defines the location of
input data and the name and location of the output file.

The agg_stat module supports the ECNT linetype that is output from the MET
**ensemble-stat** tool.

The input to the agg_stat module must have the appropriate format. The ECNT linetype must first be
`reformatted via the METdataio METreformat module <https://metdataio.readthedocs.io/en/develop/Users_Guide/reformat_stat_data.html>`_
by following the instructions under the **Reformatting for computing aggregation statistics with METcalcpy agg_stat**
header.

Modify the YAML configuration file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The config_agg_stat.yaml is required to perform aggregation statistics calculations. This
configuration file is located in the $METCALCPY_BASE/metcalcpy/pre_processing/aggregation/config
directory. The $METCALCPY_BASE is the directory where the METcalcpy source code is
saved (e.g. /Users/my_acct/METcalcpy). Change directory to $METCALCPY_BASE/metcalcpy/pre_processing/aggregation/config
and modify the config_agg_stat.yaml file.

1. Specify the input and output files:

.. code-block:: yaml

  agg_stat_input: /path-to/test/data/rrfs_ecnt_for_agg.data
  agg_stat_output: /path-to/ecnt_aggregated.data

Replace *path-to* in the two settings above with the location where the input data
is stored (either a working directory or the $METCALCPY_BASE/test directory) and the
location where the output file is to be written. **NOTE**:
Use the **full path** to the input and output files (no environment variables).

2. Specify the meteorological and the stat variables:

.. code-block:: yaml

  fcst_var_val_1:
    TMP:
      - ECNT_RMSE
      - ECNT_SPREAD_PLUS_OERR

3. Specify the selected models/members:

.. code-block:: yaml

  series_val_1:
    model:
      - RRFS_GEFS_GF.SPP.SPPT

4. Specify the statistics to be aggregated. In this case, the RMSE and SPREAD_PLUS_OERR
statistics from the ECNT linetype of the ensemble-stat tool output are aggregated. The aggregated statistics
are named ECNT_RMSE and ECNT_SPREAD_PLUS_OERR (the linetype is prepended to the original statistic name):

.. code-block:: yaml

  list_stat_1:
    - ECNT_RMSE
    - ECNT_SPREAD_PLUS_OERR
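
The naming rule in step 4 can be sketched in a few lines of Python. This is only an illustration of the convention described above; the helper name ``agg_stat_names`` is hypothetical and is not part of METcalcpy.

```python
def agg_stat_names(linetype, statistics):
    """Prefix each statistic with its linetype, e.g. ECNT + RMSE gives ECNT_RMSE."""
    return [f"{linetype}_{stat}" for stat in statistics]


# The two statistics used in this example, taken from the ECNT linetype.
list_stat_1 = agg_stat_names("ECNT", ["RMSE", "SPREAD_PLUS_OERR"])
print(list_stat_1)
```

The resulting names are the ones entered under ``fcst_var_val_1`` and ``list_stat_1`` in the configuration file.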

The full **config_agg_stat.yaml** file is shown below:


.. literalinclude:: ../../metcalcpy/pre_processing/aggregation/config/config_agg_stat.yaml



**NOTE**: Use full directory paths when specifying the location of the input file and output
file.
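
The full-path requirement can be checked before running the module. The following is a minimal sketch under stated assumptions, not part of METcalcpy: the function name ``find_path_problems`` is hypothetical, POSIX-style paths are assumed, and only the ``agg_stat_input`` and ``agg_stat_output`` keys come from the configuration file.

```python
import os


def find_path_problems(params, keys=("agg_stat_input", "agg_stat_output")):
    """Return a list of messages for path settings that are not full paths."""
    problems = []
    for key in keys:
        value = params.get(key)
        if not value:
            problems.append(f"{key}: missing")
        elif "$" in value or value.startswith("~"):
            # Environment variables and '~' are not expanded in the config file.
            problems.append(f"{key}: contains an environment variable or '~'")
        elif not os.path.isabs(value):
            problems.append(f"{key}: not a full path")
    return problems


# Hypothetical settings: the output path wrongly relies on an environment variable.
example = {
    "agg_stat_input": "/path-to/test/data/rrfs_ecnt_for_agg.data",
    "agg_stat_output": "$METCALCPY_BASE/ecnt_aggregated.data",
}
print(find_path_problems(example))
```

An empty list means both settings look like full paths; the check does not verify that the files or directories exist.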


Set the Environment and PYTHONPATH
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Set METCALCPY_BASE.

bash shell:

.. code-block:: ini

  export METCALCPY_BASE=/path-to-METcalcpy

csh shell:

.. code-block:: ini

  setenv METCALCPY_BASE /path-to-METcalcpy

where *path-to-METcalcpy* is the full path to where the METcalcpy source code is located
(e.g. /User/my_dir/METcalcpy).

Set PYTHONPATH.

bash shell:

.. code-block:: ini

  export PYTHONPATH=$METCALCPY_BASE/:$METCALCPY_BASE/metcalcpy

csh shell:

.. code-block:: ini

  setenv PYTHONPATH $METCALCPY_BASE/:$METCALCPY_BASE/metcalcpy

Here $METCALCPY_BASE is the location set in the previous step.

Run the python script:
^^^^^^^^^^^^^^^^^^^^^^

To perform aggregation from the command-line, run:

.. code-block:: yaml

  python $METCALCPY_BASE/metcalcpy/agg_stat.py $METCALCPY_BASE/metcalcpy/pre_processing/aggregation/config/config_agg_stat.yaml

This will generate the file **ecnt_aggregated.data** (from the agg_stat_output setting), which contains the
aggregated statistics data.


Additionally, the agg_stat.py module can be invoked by another script or module
by importing the AggStat class:

.. code-block:: ini

  from metcalcpy.agg_stat import AggStat
  AGG_STAT = AggStat(PARAMS)
  AGG_STAT.calculate_stats_and_ci()

where PARAMS is a dictionary containing the parameters indicating the
location of input and output data. The structure is similar to the
original Rscript template from which this Python implementation was derived.

**NOTE**: Remember to use the same PYTHONPATH defined above to ensure that the agg_stat module is found by
the Python import process.
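
One way to build PARAMS is to read it from the YAML configuration file. The following is a hedged sketch, not the documented interface: it assumes that PyYAML is installed and that AggStat accepts a dictionary with the same keys as config_agg_stat.yaml. The function name ``run_aggregation`` is hypothetical.

```python
def run_aggregation(config_file):
    """Load a YAML configuration and run the aggregation with it.

    Assumes PyYAML and metcalcpy are importable (see the PYTHONPATH note).
    """
    import yaml  # PyYAML, imported lazily so this sketch loads without it
    from metcalcpy.agg_stat import AggStat

    # Read the configuration into a dictionary (the PARAMS argument).
    with open(config_file, "r") as stream:
        params = yaml.safe_load(stream)

    agg_stat = AggStat(params)
    agg_stat.calculate_stats_and_ci()

    # The output location comes from the agg_stat_output setting.
    return params["agg_stat_output"]
```

Calling ``run_aggregation`` with the full path to config_agg_stat.yaml would then be expected to write the file named by the agg_stat_output setting.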
1 change: 1 addition & 0 deletions docs/Users_Guide/index.rst
@@ -65,6 +65,7 @@ National Center for Atmospheric Research (NCAR) is sponsored by NSF.
 installation
 vertical_interpolation
 difficulty_index
+aggregation
 release-notes

 **Indices and tables**
11 changes: 6 additions & 5 deletions metcalcpy/agg_stat.py
@@ -1101,11 +1101,12 @@ def _proceed_with_axis(self, axis="1"):
 n_stats = 0

 # save results to the output data frame
-out_frame['fcst_var'][point_ind] = fcst_var
-out_frame['stat_value'][point_ind] = bootstrap_results.value
-out_frame['stat_btcl'][point_ind] = bootstrap_results.lower_bound
-out_frame['stat_btcu'][point_ind] = bootstrap_results.upper_bound
-out_frame['nstats'][point_ind] = n_stats
+out_frame.loc[point_ind, 'fcst_var'] = fcst_var
+out_frame.loc[point_ind, 'stat_value'] = bootstrap_results.value
+out_frame.loc[point_ind, 'stat_btcl'] = bootstrap_results.lower_bound
+out_frame.loc[point_ind, 'stat_btcu'] = bootstrap_results.upper_bound
+out_frame.loc[point_ind, 'nstats'] = n_stats
+

 else:
 out_frame = pd.DataFrame()
10 changes: 5 additions & 5 deletions metcalcpy/agg_stat_bootstrap.py
@@ -209,11 +209,11 @@ def _proceed_with_axis(self, axis="1"):
 index = rows_with_mask_indy_var.index[0]

 # save results to the output data frame
-out_frame['fcst_var'][index] = fcst_var
-out_frame['stat_value'][index] = bootstrap_results.value
-out_frame['stat_btcl'][index] = bootstrap_results.lower_bound
-out_frame['stat_btcu'][index] = bootstrap_results.upper_bound
-out_frame['nstats'][index] = n_stats
+out_frame.loc[index, 'fcst_var'] = fcst_var
+out_frame.loc[index, 'stat_value'] = bootstrap_results.value
+out_frame.loc[index, 'stat_btcl'] = bootstrap_results.lower_bound
+out_frame.loc[index, 'stat_btcu'] = bootstrap_results.upper_bound
+out_frame.loc[index, 'nstats'] = n_stats
 else:
 out_frame = pd.DataFrame()
 return out_frame
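
Note on the two agg_stat hunks above: both replace chained indexing (``out_frame['col'][row] = value``) with a single ``.loc`` assignment, which is the form pandas recommends and which avoids the future warning mentioned in the commit message. A minimal, self-contained sketch of the pattern follows; the frame contents and the value of ``point_ind`` are made up for illustration and do not come from METcalcpy.

```python
import pandas as pd

# Hypothetical stand-in for the out_frame used in agg_stat.py.
out_frame = pd.DataFrame({
    "fcst_var": [None, None],
    "stat_value": [float("nan"), float("nan")],
    "nstats": [0, 0],
})
point_ind = 1

# One label-based assignment per cell: row label first, column name second.
out_frame.loc[point_ind, "fcst_var"] = "TMP"
out_frame.loc[point_ind, "stat_value"] = 2.5
out_frame.loc[point_ind, "nstats"] = 10
```

With ``.loc`` the row and column are resolved in one indexing operation on the original frame, so the write does not depend on whether an intermediate ``out_frame['col']`` is a view or a copy.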
3 changes: 3 additions & 0 deletions metcalcpy/pre_processing/aggregation/.gitignore
@@ -0,0 +1,3 @@
workdir/
temp/
plots/