diff --git a/docs/Users_Guide/config_options.rst b/docs/Users_Guide/config_options.rst index 7a11bcfe3a..60f81e5ef1 100644 --- a/docs/Users_Guide/config_options.rst +++ b/docs/Users_Guide/config_options.rst @@ -494,6 +494,49 @@ Where code is running in a production context, it is worth being familiar with the binding / affinitization method on the particular system and building it into any relevant scripting. +.. _met_keep_temp_file: + +MET_KEEP_TEMP_FILE +------------------ + +The MET_KEEP_TEMP_FILE environment variable can be set to control the runtime +behavior of the MET tools. The MET tools write temporary files in several places +in the application and library code. By default, those temporary files are deleted +when they are no longer needed. However, it can be useful for development, testing, +and debugging to keep them for further inspection. Setting this environment variable +to a value of :code:`yes` or :code:`true` instructs the MET tools to retain temporary +files instead of deleting them. + +Note that doing so may fill up the temporary directory. It is the responsibility of +the user to monitor the temporary directory usage and remove temporary files that +are no longer needed. + +When running with this option, users are advised to refer to section +:numref:`config_tmp_dir` and write temporary files to a personal location rather than +the default shared :code:`/tmp` directory. + +.. _met_python_debug: + +MET_PYTHON_DEBUG +---------------- + +The MET_PYTHON_DEBUG environment variable can be set to enable debugging log messages +related to Python embedding. These log messages are disabled by default. The environment +variable can be set to a value of :code:`all` for all log messages, :code:`dataplane` +for log messages when reading gridded data, or :code:`point` for log messages when +reading point data. + +.. 
_met_python_tmp_format: + +MET_PYTHON_TMP_FORMAT +--------------------- + +The MET_PYTHON_TMP_FORMAT environment variable defines whether temporary files for +Python embedding should be written as NetCDF files or using JSON/NumPy serialization. +By default, they are written using JSON for attributes and NumPy serialization for data +to avoid NetCDF library conflicts between MET and Python. Setting this environment +variable to :code:`netcdf` enables the use of temporary NetCDF files instead. + Settings Common to Multiple Tools ================================= @@ -3770,13 +3813,22 @@ Where "job_name" is set to one of the following: * "filter" - To filter out the STAT or TCMPR lines matching the job filtering - criteria specified below and using the optional arguments below. + To filter out the STAT lines matching the job filtering criteria + specified below and using the optional arguments below. The output STAT lines are written to the file specified using the "-dump_row" argument. + Required Args: -dump_row -| + Optional Args: + + .. code-block:: none + + -set_hdr column_name value + May be used multiple times to override data written to the + output dump_row file. + +| * "summary" @@ -3805,8 +3857,8 @@ Where "job_name" is set to one of the following: * Format the -column option as LINE_TYPE:COLUMN. -| - +| + Use the -derive job command option to automatically derive statistics on the fly from input contingency tables and partial sums. @@ -3832,10 +3884,14 @@ Where "job_name" is set to one of the following: .. code-block:: none - -by column_name to specify case information - -out_alpha to override default alpha value of 0.05 - -derive to derive statistics on the fly - -column_union to summarize multiple columns + -by column_name + To specify case information. + -out_alpha + To override the default alpha value. + -derive + To derive statistics on the fly. + -column_union + To summarize multiple columns. 
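The summary job options above can be combined into a single Stat-Analysis command line. A hypothetical sketch follows; the input file name, column choice, and alpha value are placeholders for illustration, not taken from this changeset:

```none
stat_analysis -lookin point_stat_output.stat \
   -job summary -line_type CNT -column RMSE \
   -by MODEL,VX_MASK -out_alpha 0.10 -v 2
```

This would compute summary statistics for the RMSE column separately for each unique MODEL and VX_MASK combination, using a 90% confidence level instead of the default alpha value of 0.05.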
* "aggregate" @@ -3852,8 +3908,8 @@ Where "job_name" is set to one of the following: ISC, ECNT, RPS, RHIST, PHIST, RELP, SSVAR Required Args: -line_type - -| + +| * "aggregate_stat" @@ -3887,8 +3943,8 @@ Where "job_name" is set to one of the following: .. code-block:: none -out_thresh or -out_fcst_thresh and -out_obs_thresh - When -out_line_type FHO, CTC, CTS, MCTC, MCTS, - PCT, PSTD, PJC, PRC + When -out_line_type FHO, CTC, CTS, MCTC, MCTS, + PCT, PSTD, PJC, PRC Additional Optional Args for -line_type MPR: @@ -3901,14 +3957,14 @@ Where "job_name" is set to one of the following: -out_obs_wind_thresh -out_wind_logic When -out_line_type WDIR - + Additional Optional Arg for: .. code-block:: none -line_type ORANK -out_line_type PHIST, SSVAR ... -out_bin_size - + Additional Optional Args for: .. code-block:: none @@ -3917,14 +3973,14 @@ Where "job_name" is set to one of the following: -out_eclv_points * "ss_index" - + The skill score index job can be configured to compute a weighted average of skill scores derived from a configurable set of variables, levels, lead times, and statistics. The skill score index is computed using two models, a forecast model and a reference model. For each statistic in the index, a skill score is computed as: - + SS = 1 - (S[model]*S[model])/(S[reference]*S[reference]) Where S is the statistic. @@ -4135,17 +4191,19 @@ Where "job_name" is set to one of the following: "-rank_corr_flag value" "-vif_flag value" - For aggregate and aggregate_stat job types: - .. code-block:: none - "-out_stat path" to write a .stat output file for the job - including the .stat header columns. Multiple - values for each header column are written as - a comma-separated list. - "-set_hdr col_name value" may be used multiple times to explicity - specify what should be written to the header - columns of the output .stat file. + -out_stat path + To write a .stat output file for aggregate and aggregate_stat jobs + including the .stat header columns. 
Multiple input values for each + header column are written to the output as a comma-separated list + of unique values. + + -set_hdr col_name value + May be used multiple times to explicitly specify what should be + written to the header columns of the output .stat file for + aggregate and aggregate_stat jobs or output dump_row file + for filter jobs. When using the "-by" job command option, you may reference those columns in the "-set_hdr" job command options. For example, when computing statistics diff --git a/docs/Users_Guide/stat-analysis.rst b/docs/Users_Guide/stat-analysis.rst index 0a5c7ec842..92672edc26 100644 --- a/docs/Users_Guide/stat-analysis.rst +++ b/docs/Users_Guide/stat-analysis.rst @@ -604,7 +604,7 @@ The Stat-Analysis tool supports several additional job command options which may This job command option is extremely useful. It can be used multiple times to specify a list of STAT header column names. When reading each input line, the Stat-Analysis tool concatenates together the entries in the specified columns and keeps track of the unique cases. It applies the logic defined for that job to each unique subset of data. For example, if your output was run over many different model names and masking regions, specify **-by MODEL,VX_MASK** to get output for each unique combination rather than having to run many very similar jobs. .. code-block:: none - + -column_min col_name value -column_max col_name value -column_eq col_name value @@ -615,30 +615,30 @@ The column filtering options may be used when the **-line_type** has been set to a single value. These options take two arguments, the name of the data column to be used followed by a value, string, or threshold to be applied. If multiple column_min/max/eq/thresh/str options are listed, the job will be performed on their intersection. 
Each input line is only retained if its value meets the numeric filtering criteria defined, matches one of the strings defined by the **-column_str** option, or does not match any of the string defined by the **-column_str_exc** option. Multiple filtering strings may be listed using commas. Defining thresholds in MET is described in :numref:`config_options`. .. code-block:: none - + -dump_row file Each analysis job is performed over a subset of the input data. Filtering the input data down to a desired subset is often an iterative process. The **-dump_row** option may be used for each job to specify the name of an output file to which the exact subset of data used for that job will be written. When initially constructing Stat-Analysis jobs, users are strongly encouraged to use the option and check its contents to ensure that the analysis was actually done over the intended subset. .. code-block:: none - + -out_line_type name This option specifies the desired output line type(s) for the **aggregate_stat** job type. .. code-block:: none - + -out_stat file -set_hdr col_name string The Stat-Analysis tool writes its output to either the log file or the file specified using the **-out** command line option. However the **aggregate** and **aggregate_stat** jobs create STAT output lines and the standard output written lacks the full set of STAT header columns. The **-out_stat** job command option may be used for these jobs to specify the name of an output file to which full STAT output lines should be written. When the **-out_stat** job command option is used for **aggregate** and **aggregate_stat** jobs the output is sent to the **-out_stat** file instead of the log or **-out** file. -Jobs will often combine output with multiple entries in the header columns. For example, a job may aggregate output with three different values in the **VX_MASK** column, such as "mask1", "mask2", and "mask3". 
The output **VX_MASK** column will contain the unique values encountered concatenated together with commas: "mask1,mask2,mask3". Alternatively, the **-set_hdr** option may be used to specify what should be written to the output header columns, such as "-set_hdr VX_MASK all_three_masks". +Jobs will often combine output with multiple entries in the header columns. For example, a job may aggregate output with three different values in the **VX_MASK** column, such as "mask1", "mask2", and "mask3". The output **VX_MASK** column will contain the unique values encountered concatenated together with commas: "mask1,mask2,mask3". Alternatively, the **-set_hdr** option may be used to specify what should be written to the output header columns, such as "-set_hdr VX_MASK all_three_masks". When **-set_hdr** is specified for **filter** jobs, it controls what is written to the **-dump_row** output file. When using the "-out_stat" option to create a .stat output file and stratifying results using one or more "-by" job command options, those columns may be referenced in the "-set_hdr" option. When using multiple "-by" options, use "CASE" to reference the full case information string: .. code-block:: none - + -job aggregate_stat -line_type MPR -out_line_type CNT -by FCST_VAR,OBS_SID \ -set_hdr VX_MASK OBS_SID -set_hdr DESC CASE @@ -662,7 +662,7 @@ When processing input MPR lines, these options may be used to define a masking g When processing input MPR lines, these options are used to define the forecast, observation, or both thresholds to be applied when computing statistics. For categorical output line types (FHO, CTC, CTS, MCTC, MCTS) these define the categorical thresholds. For continuous output line types (SL1L2, SAL1L2, CNT), these define the continuous filtering thresholds and **-out_cnt_logic** defines how the forecast and observed logic should be combined. .. 
code-block:: none - + -out_fcst_wind_thresh thresh -out_obs_wind_thresh thresh -out_wind_thresh thresh diff --git a/internal/test_unit/R_test/test_util.R b/internal/test_unit/R_test/test_util.R index 3d4a2d3f9b..dfc869cb60 100644 --- a/internal/test_unit/R_test/test_util.R +++ b/internal/test_unit/R_test/test_util.R @@ -383,7 +383,10 @@ compareStatLty = function(stat1, stat2, lty, verb=0, strict=0){ # compare the information in the header columns for(intCol in 2:21){ listMatch = apply(data.frame(dfV1[,intCol], dfV2[,intCol]), 1, - function(a){ a[1] == a[2] }); + function(a){ + same = (a[1] == a[2]) | (is.na(a[1]) & is.na(a[2])); + same[is.na(same)] = FALSE; + return(same); }); intNumDiff = sum( !listMatch[ !is.na(listMatch) ] ); if( 0 < intNumDiff ){ if( 1 <= verb ){ diff --git a/internal/test_unit/xml/unit_python.xml b/internal/test_unit/xml/unit_python.xml index 051f709a62..f6e011dc0d 100644 --- a/internal/test_unit/xml/unit_python.xml +++ b/internal/test_unit/xml/unit_python.xml @@ -206,6 +206,7 @@ &MET_BIN;/point_stat + MET_PYTHON_EXE &MET_PYTHON_EXE; FCST_COMMAND &MET_BASE;/python/examples/read_ascii_numpy.py &DATA_DIR_PYTHON;/fcst.txt FCST OBS_COMMAND &MET_BASE;/python/examples/read_ascii_numpy.py &DATA_DIR_PYTHON;/obs.txt OBS @@ -480,21 +481,6 @@ - - &MET_BIN;/point2grid - \ - 'PYTHON_NUMPY=&MET_BASE;/python/examples/read_met_point_obs.py &OUTPUT_DIR;/pb2nc/ndas.20120409.t12z.prepbufr.tm00.nc' \ - G212 \ - &OUTPUT_DIR;/python/pb2nc_TMP.nc \ - -field 'name="TMP"; level="*"; valid_time="20120409_120000"; censor_thresh=[ <0 ]; censor_val=[0];' \ - -name TEMP \ - -v 1 - - - &OUTPUT_DIR;/python/pb2nc_TMP.nc - - - &MET_BIN;/point2grid @@ -535,6 +521,7 @@ &MET_BIN;/plot_point_obs + MET_PYTHON_EXE &MET_PYTHON_EXE; TO_GRID NONE \ @@ -561,6 +548,7 @@ > &OUTPUT_DIR;/python/ensemble_stat/input_file_list; \ &MET_BIN;/ensemble_stat + MET_PYTHON_EXE &MET_PYTHON_EXE; DESC NA OBS_ERROR_FLAG FALSE SKIP_CONST FALSE @@ -587,6 +575,7 @@ &MET_BIN;/point_stat + MET_PYTHON_EXE 
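To make the attrs contract concrete, here is a minimal sketch of the nine-item dictionary that a `set_dataplane_attrs()` helper is expected to return for Python embedding. The key names follow MET's dataplane attribute convention; every value shown is a placeholder, not data from this changeset:

```python
# Hypothetical sketch of the nine string-valued attributes describing a
# dataplane for MET Python embedding; all values below are placeholders.
def set_dataplane_attrs():
    valid_time = "20050807_120000"          # YYYYMMDD_HHMMSS
    attrs = {
        "valid":     valid_time,            # valid time
        "init":      valid_time,            # initialization time
        "lead":      "00",                  # lead time
        "accum":     "00",                  # accumulation interval
        "name":      "Fcst",                # short variable name
        "long_name": "FcstTest",            # descriptive variable name
        "level":     "Surface",             # vertical level
        "units":     "None",                # units string
        "grid":      "G212",                # grid specification (named grid)
    }
    return attrs

attrs = set_dataplane_attrs()
assert len(attrs) == 9
```

An optional `fill_value` entry may be added to this dictionary, as the commented-out line in the example script suggests.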
&MET_PYTHON_EXE; BEG_DS -1800 END_DS 1800 OUTPUT_PREFIX GRIB1_NAM_GDAS_WINDS @@ -605,6 +594,9 @@ + + MET_PYTHON_EXE &MET_PYTHON_EXE; + &MET_BIN;/plot_data_plane \ PYTHON_NUMPY \ @@ -619,6 +611,9 @@ + + MET_PYTHON_EXE &MET_PYTHON_EXE; + &MET_BIN;/pcp_combine \ -add PYTHON_NUMPY \ diff --git a/internal/test_unit/xml/unit_stat_analysis_ps.xml b/internal/test_unit/xml/unit_stat_analysis_ps.xml index 87884061ea..9fd50dcb2a 100644 --- a/internal/test_unit/xml/unit_stat_analysis_ps.xml +++ b/internal/test_unit/xml/unit_stat_analysis_ps.xml @@ -76,6 +76,7 @@ -job filter -line_type MPR -fcst_var TMP -fcst_lev Z2 -vx_mask DTC165 \ -column_str OBS_SID KDLN,KDHT,KDEN,KDLS,KDMA,KDMN,KDVT,KDEW \ -column_str_exc OBS_SID KDLN,KDHT \ + -set_hdr DESC FILTER_OBS_SID \ -dump_row &OUTPUT_DIR;/stat_analysis_ps/POINT_STAT_FILTER_OBS_SID.stat \ -v 1 diff --git a/scripts/python/examples/read_ascii_numpy.py b/scripts/python/examples/read_ascii_numpy.py index 3c6310cec2..5692472b10 100644 --- a/scripts/python/examples/read_ascii_numpy.py +++ b/scripts/python/examples/read_ascii_numpy.py @@ -5,7 +5,7 @@ ########################################### def log(msg): - dataplane.log_msg(msg) + dataplane.log_message(msg) def set_dataplane_attrs(): # attrs is a dictionary which contains attributes describing the dataplane. 
@@ -95,5 +95,5 @@ def set_dataplane_attrs(): attrs = set_dataplane_attrs() log("Attributes:\t" + repr(attrs)) -# Sets fill_value if it exists +# Sets fill_value if it exists for the dataplane #attrs['fill_value'] = 255 # for letter.txt diff --git a/scripts/python/examples/read_ascii_numpy_grid.py b/scripts/python/examples/read_ascii_numpy_grid.py index 79e6829052..7ca2b3b6b6 100644 --- a/scripts/python/examples/read_ascii_numpy_grid.py +++ b/scripts/python/examples/read_ascii_numpy_grid.py @@ -27,8 +27,11 @@ met_data = dataplane.read_2d_text_input(input_file) print("Data Shape:\t" + repr(met_data.shape)) print("Data Type:\t" + repr(met_data.dtype)) -except NameError: - print("Can't find the input file") +except NameError as ex: + print(" === ERROR from read_ascii_numpy_grid.py") + print(f" Exception: {type(ex)} {ex}") + print(f" sys.argv: {sys.argv}") + print(" Can't find the input file") # attrs is a dictionary which contains attributes describing the dataplane. # attrs should have 9 items, each of data type string: diff --git a/scripts/python/examples/read_ascii_xarray.py b/scripts/python/examples/read_ascii_xarray.py index e4ba1f9a28..f897982f07 100644 --- a/scripts/python/examples/read_ascii_xarray.py +++ b/scripts/python/examples/read_ascii_xarray.py @@ -6,7 +6,7 @@ ########################################### def log(msg): - dataplane.log_msg(msg) + dataplane.log_message(msg) log("Python Script:\t" + repr(sys.argv[0])) diff --git a/scripts/python/examples/read_met_point_obs.py b/scripts/python/examples/read_met_point_obs.py index e16ccf2d86..b8c8cb3db3 100644 --- a/scripts/python/examples/read_met_point_obs.py +++ b/scripts/python/examples/read_met_point_obs.py @@ -21,6 +21,7 @@ from datetime import datetime from met.point import met_point_tools +from met.point_nc import met_point_nc_tools from pyembed.python_embedding import pyembed_tools ARG_PRINT_DATA = 'show_data' @@ -44,14 +45,15 @@ netcdf_filename = os.path.expandvars(input_name) args = [ netcdf_filename 
] #args = { 'nc_name': netcdf_filename } - point_obs_data = met_point_tools.get_nc_point_obs() + point_obs_data = met_point_nc_tools.get_nc_point_obs() point_obs_data.read_data(point_obs_data.get_nc_filename(args)) if point_obs_data is not None: met_point_data = point_obs_data.get_point_data() met_point_data['met_point_data'] = point_obs_data - print("met_point_data: ", met_point_data) - print(met_point_data) + if os.getenv("MET_PYTHON_DEBUG", "") != "": + print("met_point_data: ", met_point_data) + print(met_point_data) if DO_PRINT_DATA: point_obs_data.dump() diff --git a/scripts/python/met/Makefile.am b/scripts/python/met/Makefile.am index fd802449dd..c24b47247b 100644 --- a/scripts/python/met/Makefile.am +++ b/scripts/python/met/Makefile.am @@ -28,6 +28,7 @@ pythonmetscripts_DATA = \ logger.py \ dataplane.py \ mprbase.py \ + point_nc.py \ point.py EXTRA_DIST = ${pythonmetscripts_DATA} diff --git a/scripts/python/met/Makefile.in b/scripts/python/met/Makefile.in index fea84eace6..6a7570efa2 100644 --- a/scripts/python/met/Makefile.in +++ b/scripts/python/met/Makefile.in @@ -314,6 +314,7 @@ pythonmetscripts_DATA = \ logger.py \ dataplane.py \ mprbase.py \ + point_nc.py \ point.py EXTRA_DIST = ${pythonmetscripts_DATA} diff --git a/scripts/python/met/dataplane.py b/scripts/python/met/dataplane.py index 57c9ac367b..a28661b365 100644 --- a/scripts/python/met/dataplane.py +++ b/scripts/python/met/dataplane.py @@ -1,35 +1,33 @@ import os import sys +import json import numpy as np -import netCDF4 as nc import xarray as xr from importlib import util as import_util -from met.logger import logger +from met.logger import met_base, met_base_tools ########################################### -class dataplane(logger): +class dataplane(met_base): KEEP_XARRAY = True class_name = "dataplane" - MET_FILL_VALUE = -9999. 
ATTR_USER_FILL_VALUE = 'user_fill_value' @staticmethod def call_python(argv): - logger.log_msg(f"Module:\t{repr(argv[0])}") + # argv[0] is the python wrapper script (caller) + met_base.log_message(f"Module:\t{repr(argv[0])}") if 1 == len(argv): - logger.quit(f"User command is missing") + met_base.quit_msg(f"User python command is missing") + sys.exit(1) + + met_base.log_message(f"User python command:\t{repr(' '.join(argv[1:]))}") - logger.log_msg("User Command:\t" + repr(' '.join(argv[1:]))) - # argv[0] is the python wrapper script (caller) # argv[1] contains the user defined python script pyembed_module_name = argv[1] - sys.argv = argv[1:] - logger.log_msg(f" sys.argv:\t{sys.argv}") - # append user script dir to system path pyembed_dir, pyembed_name = os.path.split(pyembed_module_name) if pyembed_dir: @@ -40,11 +38,19 @@ def call_python(argv): user_base = pyembed_name.replace('.py','') + argv_org = sys.argv # save sys.argv + sys.argv = argv[1:] spec = import_util.spec_from_file_location(user_base, pyembed_module_name) met_in = import_util.module_from_spec(spec) spec.loader.exec_module(met_in) + sys.argv = argv_org # restore sys.argv return met_in + #@staticmethod + #def get_numpy_filename(tmp_filename): + # return met_base_tools.replace_extension(tmp_filename, "json", "npy") if tmp_filename.endswith(".json") else \ + # met_base_tools.replace_extension(tmp_filename, "nc", "npy") if tmp_filename.endswith(".nc") else f'{tmp_filename}.npy' + @staticmethod def is_integer(a_data): return isinstance(a_data, int) @@ -100,7 +106,34 @@ def read_2d_text_input(input_file): return met_data @staticmethod - def read_dataplane(netcdf_filename): + def read_dataplane(tmp_filename): + # Default is JSON for attributes and NUMPY serialization for 2D array + return dataplane.read_dataplane_nc(tmp_filename) if met_base_tools.use_netcdf_format() \ + else dataplane.read_dataplane_json_numpy(tmp_filename) + + @staticmethod + def read_dataplane_json_numpy(tmp_filename): + if 
met_base_tools.is_debug_enabled("dataplane"): + met_base.log_message(f"Read from a temporary JSON file and a temporary numpy output (dataplane)") + + met_info = {} + with open(tmp_filename) as json_fh: + met_info['attrs'] = json.load(json_fh) + # read 2D numeric data + numpy_dump_name = met_base_tools.get_numpy_filename(tmp_filename) + met_dp_data = np.load(numpy_dump_name) + met_info['met_data'] = met_dp_data + if numpy_dump_name != tmp_filename: + met_base_tools.remove_temp_file(numpy_dump_name) + return met_info + + @staticmethod + def read_dataplane_nc(netcdf_filename): + import netCDF4 as nc + + if met_base_tools.is_debug_enabled("dataplane"): + met_base.log_message(f"Read from a temporary NetCDF file (dataplane)") + # read NetCDF file ds = nc.Dataset(netcdf_filename, 'r') @@ -135,7 +168,73 @@ def read_dataplane(netcdf_filename): return met_info @staticmethod - def write_dataplane(met_in, netcdf_filename): + def validate_met_data(met_data, fill_value=None): + method_name = f"{dataplane.class_name}.validate()" + #met_base.log_msg(f"{method_name} type(met_data)= {type(met_data)}") + attrs = None + from_xarray = False + from_ndarray = False + if met_data is None: + met_base.quit(f"{method_name} The met_data is None") + sys.exit(1) + + nx, ny = met_data.shape + + met_fill_value = met_base.MET_FILL_VALUE + if dataplane.is_xarray_dataarray(met_data): + from_xarray = True + attrs = met_data.attrs + met_data = met_data.data + modified_met_data = True + if isinstance(met_data, np.ndarray): + from_ndarray = True + met_data = np.ma.array(met_data) + + if isinstance(met_data, np.ma.MaskedArray): + is_int_data = dataplane.is_integer(met_data[0,0]) or dataplane.is_integer(met_data[int(nx/2),int(ny/2)]) + met_data = np.ma.masked_equal(met_data, float('nan')) + met_data = np.ma.masked_equal(met_data, float('inf')) + if fill_value is not None: + met_data = np.ma.masked_equal(met_data, fill_value) + met_data = met_data.filled(int(met_fill_value) if is_int_data else 
met_fill_value) + else: + met_base.log_message(f"{method_name} unknown datatype {type(met_data)}") + + if dataplane.KEEP_XARRAY: + return xr.DataArray(met_data,attrs=attrs) if from_xarray else met_data + else: + return met_data + + @staticmethod + def write_dataplane(met_in, tmp_filename): + # Default is JSON for attributes and NUMPY serialization for 2D array + if met_base_tools.use_netcdf_format(): + dataplane.write_dataplane_nc(met_in, tmp_filename) + else: + dataplane.write_dataplane_json_numpy(met_in, tmp_filename) + + @staticmethod + def write_dataplane_json_numpy(met_in, tmp_filename): + if met_base_tools.is_debug_enabled("dataplane"): + met_base.log_message(f"Save to a temporary JSON file and a temporary numpy output (dataplane)") + if hasattr(met_in.met_data, 'attrs') and met_in.met_data.attrs: + attrs = met_in.met_data.attrs + else: + attrs = met_in.attrs + with open(tmp_filename,'w') as json_fh: + json.dump(attrs, json_fh) + + met_dp_data = met_base_tools.convert_to_ndarray(met_in.met_data) + numpy_dump_name = met_base_tools.get_numpy_filename(tmp_filename) + np.save(numpy_dump_name, met_dp_data) + + @staticmethod + def write_dataplane_nc(met_in, netcdf_filename): + import netCDF4 as nc + + if met_base_tools.is_debug_enabled("dataplane"): + met_base.log_message(f"Save to a temporary NetCDF file (dataplane)") + met_info = {'met_data': met_in.met_data} if hasattr(met_in.met_data, 'attrs') and met_in.met_data.attrs: attrs = met_in.met_data.attrs @@ -171,42 +270,6 @@ def write_dataplane(met_in, netcdf_filename): ds.close() - @staticmethod - def validate_met_data(met_data, fill_value=None): - method_name = f"{dataplane.class_name}.validate()" - #logger.log_msg(f"{method_name} type(met_data)= {type(met_data)}") - attrs = None - from_xarray = False - from_ndarray = False - if met_data is None: - logger.quit(f"{method_name} The met_data is None") - else: - nx, ny = met_data.shape - - met_fill_value = dataplane.MET_FILL_VALUE - if 
dataplane.is_xarray_dataarray(met_data): - from_xarray = True - attrs = met_data.attrs - met_data = met_data.data - modified_met_data = True - if isinstance(met_data, np.ndarray): - from_ndarray = True - met_data = np.ma.array(met_data) - - if isinstance(met_data, np.ma.MaskedArray): - is_int_data = dataplane.is_integer(met_data[0,0]) or dataplane.is_integer(met_data[int(nx/2),int(ny/2)]) - met_data = np.ma.masked_equal(met_data, float('nan')) - met_data = np.ma.masked_equal(met_data, float('inf')) - if fill_value is not None: - met_data = np.ma.masked_equal(met_data, fill_value) - met_data = met_data.filled(int(met_fill_value) if is_int_data else met_fill_value) - else: - logger.log_msg(f"{method_name} unknown datatype {type(met_data)}") - - if dataplane.KEEP_XARRAY: - return xr.DataArray(met_data,attrs=attrs) if from_xarray else met_data - else: - return met_data def main(argv): @@ -231,14 +294,14 @@ def main(argv): fill_value = met_in.user_fill_value fill_value = attrs.get('fill_value', None) - dataplane.log_msg('validating the dataplane array...') + met_base.log_message('validating the dataplane array...') met_data = dataplane.validate_met_data(init_met_data, fill_value) met_info['met_data'] = met_data if os.environ.get('MET_PYTHON_DEBUG', None) is not None: - dataplane.log_msg('--- met_data after validating ---') - dataplane.log_msg(met_data) + met_base.log_message('--- met_data after validating ---') + met_base.log_message(met_data) if __name__ == '__main__' or __name__ == sys.argv[0]: main(sys.argv) - dataplane.log_msg(f'{__name__} complete') + met_base.log_message(f'{__name__} complete') diff --git a/scripts/python/met/logger.py b/scripts/python/met/logger.py index a7296124a6..db58286e07 100644 --- a/scripts/python/met/logger.py +++ b/scripts/python/met/logger.py @@ -1,39 +1,173 @@ ########################################### +import os import sys +import re -class logger(): +import numpy as np - PROMPT= " PYTHON:" - ERROR_PROMPT= "ERROR" +class logger(): - 
## - ## create the metadata dictionary - ## + PROMPT = " PYTHON:" + ERROR_P = " ==PYTHON_ERROR==" + INFO_P = " ==PYTHON_INFO==" @staticmethod def append_error_prompt(msg): - return f'{logger.ERROR_PROMPT}: {msg}' + return f'{logger.ERROR_P}: {msg}' @staticmethod - def error_msg(msg): + def error_message(msg): msgs = msg if isinstance(msg, list) else [msg] msgs.insert(0, '') msgs.append('') for a_msg in msgs: - logger.log_msg(logger.append_error_prompt(a_msg)) + logger.log_message(logger.append_error_prompt(a_msg)) #@staticmethod #def get_met_fill_value(): # return logger.MET_FILL_VALUE @staticmethod - def log_msg(msg): + def info_message(msg): + print(f'{logger.PROMPT} {logger.INFO_P} {msg}') + + @staticmethod + def log_message(msg): print(f'{logger.PROMPT} {msg}') @staticmethod - def quit(msg): - logger.error_msg([msg, "Quit..."]) - sys.exit(1) + def quit(msg, do_quit=True): + logger.quit_msg(msg) + if do_quit: + sys.exit(1) + + @staticmethod + def quit_msg(msg): + logger.error_message([msg, "Quit..."]) + + +class met_base(logger): + + MET_FILL_VALUE = -9999. 
+ + def convert_to_array(self, ndarray_data): + return met_base_tools.convert_to_array(ndarray_data) + + def convert_to_ndarray(self, array_data): + return met_base_tools.convert_to_ndarray(array_data) + + def get_met_fill_value(self): + return met_base.MET_FILL_VALUE + + def error_msg(self, msg): + logger.error_message(msg) + + def get_prompt(self): + return met_base_tools.get_prompt() + + def info_msg(self, msg): + logger.info_message(msg) + + def is_numpy_array(self, var_data): + return isinstance(var_data, np.ndarray) + + def log_msg(self, msg): + logger.log_message(msg) + + @staticmethod + def get_numpy_filename(tmp_filename): + return met_base_tools.replace_extension(tmp_filename, "json", "npy") if tmp_filename.endswith(".json") else \ + met_base_tools.replace_extension(tmp_filename, "nc", "npy") if tmp_filename.endswith(".nc") else f'{tmp_filename}.npy' + + def is_debug_enabled(self, component_name=""): + return met_base_tools.is_debug_enabled(component_name) + + def replace_extension(self, file_name, from_ext, to_ext): + return met_base_tools.replace_extension(file_name, from_ext, to_ext) + + def remove_file(self, file_name): + os.remove(file_name) + + def use_netcdf_format(self): + return met_base_tools.use_netcdf_format() + +class met_base_tools(object): + + ENV_MET_KEEP_TEMP_FILE = "MET_KEEP_TEMP_FILE" + ENV_MET_PYTHON_DEBUG = "MET_PYTHON_DEBUG" + ENV_MET_PYTHON_TMP_FORMAT = "MET_PYTHON_TMP_FORMAT" + + @staticmethod + def convert_to_array(ndarray_data): + is_byte_type = False + if 0 < len(ndarray_data): + is_byte_type = isinstance(ndarray_data[0], (bytes, np.bytes_)) + if isinstance(ndarray_data[0], np.ndarray): + if 0 < len(ndarray_data[0]): + is_byte_type = isinstance(ndarray_data[0][0], (bytes, np.bytes_)) + if is_byte_type: + array_data = [] + if isinstance(ndarray_data[0], (np.ma.MaskedArray, np.ma.core.MaskedArray)): + for byte_data in ndarray_data: + array_data.append(byte_data.tobytes(fill_value=' ').decode('utf-8').rstrip()) + else: + for byte_data in 
ndarray_data: + array_data.append(byte_data.decode("utf-8").rstrip()) + elif isinstance(ndarray_data, (np.ma.MaskedArray, np.ma.core.MaskedArray)): + array_data = np.ma.getdata(ndarray_data, subok=False).tolist() + elif isinstance(ndarray_data, np.ndarray): + array_data = ndarray_data.tolist() + else: + array_data = ndarray_data + return array_data + + @staticmethod + def convert_to_ndarray(array_data): + if isinstance(array_data, (np.ma.MaskedArray, np.ma.core.MaskedArray)): + ndarray_data = np.ma.getdata(array_data, subok=False) + elif isinstance(array_data, np.ndarray): + ndarray_data = array_data + else: + ndarray_data = np.array(array_data) + return ndarray_data + + @staticmethod + def get_numpy_filename(tmp_filename): + return met_base_tools.replace_extension(tmp_filename, "json", "npy") if tmp_filename.endswith(".json") else \ + met_base_tools.replace_extension(tmp_filename, "txt", "npy") if tmp_filename.endswith(".txt") else \ + met_base_tools.replace_extension(tmp_filename, "nc", "npy") if tmp_filename.endswith(".nc") else f'{tmp_filename}.npy' + + @staticmethod + def get_prompt(): + return logger.PROMPT + + @staticmethod + def is_debug_enabled(component_name=""): + env_value = os.getenv(met_base_tools.ENV_MET_PYTHON_DEBUG, "").lower() + return env_value == "all" or env_value == component_name.lower() + + @staticmethod + def keep_temp_file(): + env_value = os.getenv(met_base_tools.ENV_MET_KEEP_TEMP_FILE, "") + return env_value.lower() == "true" or env_value.lower() == "yes" + + @staticmethod + def replace_extension(file_name, from_ext, to_ext): + return re.sub(f".{from_ext}$", f".{to_ext}", file_name) + + @staticmethod + def remove_file(file_name): + if os.path.exists(file_name): + os.remove(file_name) + + @staticmethod + def remove_temp_file(file_name): + if not met_base_tools.keep_temp_file(): + met_base_tools.remove_file(file_name) + + @staticmethod + def use_netcdf_format(): + env_value = os.getenv(met_base_tools.ENV_MET_PYTHON_TMP_FORMAT, "") + return env_value.lower() == 
"netcdf" diff --git a/scripts/python/met/point.py b/scripts/python/met/point.py index fbfb112f51..624173c2a4 100644 --- a/scripts/python/met/point.py +++ b/scripts/python/met/point.py @@ -46,24 +46,24 @@ def read_data(data_filename): ''' import os +import json from abc import ABC, abstractmethod import numpy as np -import netCDF4 as nc import pandas as pd +from met.logger import met_base, met_base_tools + + COUNT_SHOW = 30 -class base_met_point_obs(object): + +class met_base_point(met_base): ''' classdocs ''' - ERROR_P = " ==PYTHON_ERROR==" - INFO_P = " ==PYTHON_INFO==" - python_prefix = 'PYTHON_POINT_USER' - - FILL_VALUE = -9999. + COMPONENT_NAME = "met_point" def __init__(self, use_var_id=True): ''' @@ -73,7 +73,7 @@ def __init__(self, use_var_id=True): self.input_name = None self.ignore_input_file = False self.use_var_id = use_var_id # True if variable index, False if GRIB code - self.error_msg = "" + self.error_msgs = "" self.has_error = False # Header @@ -114,11 +114,11 @@ def __init__(self, use_var_id=True): def add_error_msg(self, error_msg): self.has_error = True - self.log_error_msg(error_msg) - if 0 == len(self.error_msg): - self.error_msg = error_msg + self.error_msg(error_msg) + if 0 == len(self.error_msgs): + self.error_msgs = error_msg else: - self.error_msg = "{m1}\n{m2}".format(m1=self.error_msg, m2=error_msg) + self.error_msgs = "{m1}\n{m2}".format(m1=self.error_msgs, m2=error_msg) def add_error_msgs(self, error_msgs): self.has_error = True @@ -126,43 +126,40 @@ def add_error_msgs(self, error_msgs): self.add_error_msg(error_msg) def check_data_member_float(self, local_var, var_name): + method_name = f"{self.__class__.__name__}.check_data_member_float()" if 0 == len(local_var): - self.add_error_msg("{v} is empty (float)".format(v=var_name)) + self.add_error_msg(f"{method_name} {var_name} is empty") elif isinstance(local_var, list): if isinstance(local_var[0], str) and not self.is_number(local_var[0]): - self.add_error_msg("Not supported data type: 
{n}[0]={v}, string type, not a number (int or float only)".format( - n=var_name, v=local_var[0])) + self.add_error_msg(f"{method_name} Not supported data type: {type(local_var[0])} for {var_name}[0], string type, not a number (int or float only)") elif 0 > str(type(local_var[0])).find('numpy') and not isinstance(local_var[0], (int, float)): - self.add_error_msg("Not supported data type ({t}) for {v}[0] (int or float only)".format( - v=var_name, t=type(local_var[0]))) + self.add_error_msg(f"{method_name} Not supported data type: {type(local_var[0])} for {var_name}[0] (int or float only)") elif not self.is_numpy_array(local_var): - self.add_error_msg("Not supported data type ({t}) for {v} (list and numpy.ndarray)".format( - v=var_name, t=type(local_var))) + self.add_error_msg(f"{method_name} Not supported data type ({type(local_var)}) for {var_name} (list and numpy.ndarray)") def check_data_member_int(self, local_var, var_name): + method_name = f"{self.__class__.__name__}.check_data_member_int()" if 0 == len(local_var): - self.add_error_msg("{v} is empty (int)".format(v=var_name)) + self.add_error_msg(f"{method_name} {var_name} is empty") elif isinstance(local_var, list): if isinstance(local_var[0], str) and not self.is_number(local_var[0]): - self.add_error_msg("Not supported data type: {n}[0]={v}, string type, not a number (int only)".format( - n=var_name, v=local_var[0])) + self.add_error_msg(f"{method_name} Not supported data type: {type(local_var[0])} for {var_name}[0], string type, not a number (int only)") elif 0 > str(type(local_var[0])).find('numpy') and not isinstance(local_var[0], int): - self.add_error_msg("Not supported data type ({t}) for {v}[0] (int only)".format( - v=var_name, t=type(local_var[0]))) + self.add_error_msg(f"{method_name} Not supported data type: {type(local_var[0])} for {var_name}[0] (int only)") elif not self.is_numpy_array(local_var): - self.add_error_msg("Not supported data type ({t}) for {v} (list and numpy.ndarray)".format( - 
v=var_name, t=type(local_var))) + self.add_error_msg(f"{method_name} Not supported data type ({type(local_var)}) for {var_name} (list and numpy.ndarray)") def check_data_member_string(self, local_var, var_name): + method_name = f"{self.__class__.__name__}.check_data_member_string()" if 0 == len(local_var): - self.add_error_msg("{v} is empty (string)".format(v=var_name)) + self.add_error_msg(f"{method_name} {var_name} is empty") elif not isinstance(local_var, (list)): - self.add_error_msg("Not supported data type ({t}) for {v} (list)".format( - v=var_name, t=type(local_var))) + self.add_error_msg(f"{method_name} Not supported data type ({type(local_var)}) for {var_name} (list)") def check_point_data(self): + method_name = f"{self.__class__.__name__}.check_point_data()" if not self.ignore_input_file and self.input_name is not None and not os.path.exists(self.input_name): - self.add_error_msg('The netcdf input {f} does not exist'.format(f=self.input_name)) + self.add_error_msg(f'{method_name} The input {self.input_name} does not exist') else: self.check_data_member_int(self.hdr_typ,'hdr_typ') self.check_data_member_int(self.hdr_sid,'hdr_sid') @@ -184,11 +181,11 @@ def check_point_data(self): if self.use_var_id: self.check_data_member_string(self.obs_var_table,'obs_var_table') - def convert_to_numpy(self, value_list): - return np.array(value_list) + #def convert_to_numpy(self, value_list): + # return met_point_tools.convert_to_ndarray(value_list) def dump(self): - base_met_point_obs.print_point_data(self.get_point_data()) + met_base_point.print_point_data(self.get_point_data()) def get_count_string(self): return f' nobs={self.nobs} nhdr={self.nhdr} ntyp={self.nhdr_typ} nsid={self.nhdr_sid} nvld={self.nhdr_vld} nqty={self.nobs_qty} nvar={self.nobs_var}' @@ -213,53 +210,41 @@ def get_point_data(self): self.check_point_data() if not self.is_numpy_array(self.hdr_typ): - self.hdr_typ = self.convert_to_numpy(self.hdr_typ) + self.hdr_typ = 
self.convert_to_ndarray(self.hdr_typ) if not self.is_numpy_array(self.hdr_sid): - self.hdr_sid = self.convert_to_numpy(self.hdr_sid) + self.hdr_sid = self.convert_to_ndarray(self.hdr_sid) if not self.is_numpy_array(self.hdr_vld): - self.hdr_vld = self.convert_to_numpy(self.hdr_vld) + self.hdr_vld = self.convert_to_ndarray(self.hdr_vld) if not self.is_numpy_array(self.hdr_lat): - self.hdr_lat = self.convert_to_numpy(self.hdr_lat) + self.hdr_lat = self.convert_to_ndarray(self.hdr_lat) if not self.is_numpy_array(self.hdr_lon): - self.hdr_lon = self.convert_to_numpy(self.hdr_lon) + self.hdr_lon = self.convert_to_ndarray(self.hdr_lon) if not self.is_numpy_array(self.hdr_elv): - self.hdr_elv = self.convert_to_numpy(self.hdr_elv) + self.hdr_elv = self.convert_to_ndarray(self.hdr_elv) if not self.is_numpy_array(self.obs_qty): - self.obs_qty = self.convert_to_numpy(self.obs_qty) + self.obs_qty = self.convert_to_ndarray(self.obs_qty) if not self.is_numpy_array(self.obs_hid): - self.obs_hid = self.convert_to_numpy(self.obs_hid) + self.obs_hid = self.convert_to_ndarray(self.obs_hid) if not self.is_numpy_array(self.obs_vid): - self.obs_vid = self.convert_to_numpy(self.obs_vid) + self.obs_vid = self.convert_to_ndarray(self.obs_vid) if not self.is_numpy_array(self.obs_lvl): - self.obs_lvl = self.convert_to_numpy(self.obs_lvl) + self.obs_lvl = self.convert_to_ndarray(self.obs_lvl) if not self.is_numpy_array(self.obs_hgt): - self.obs_hgt = self.convert_to_numpy(self.obs_hgt) + self.obs_hgt = self.convert_to_ndarray(self.obs_hgt) if not self.is_numpy_array(self.obs_val): - self.obs_val = self.convert_to_numpy(self.obs_val) + self.obs_val = self.convert_to_ndarray(self.obs_val) self.count_info = self.get_count_string() self.met_point_data = self return self.__dict__ +# def get_prompt(self): +# return met_point_tools.get_prompt() + def is_number(self, num_str): return num_str.replace('-','1').replace('+','2').replace('.','3').isdigit() - def is_numpy_array(self, var): - return 
isinstance(var, np.ndarray) - - def log_error_msg(self, err_msg): - base_met_point_obs.error_msg(err_msg) - - def log_error(self, err_msgs): - print(self.ERROR_P) - for err_line in err_msgs.split('\n'): - self.log_error_msg(err_line) - print(self.ERROR_P) - - def log_info(self, info_msg): - base_met_point_obs.info_msg(info_msg) - def put_data(self, point_obs_dict): self.use_var_id = point_obs_dict['use_var_id'] self.hdr_typ = point_obs_dict['hdr_typ'] @@ -298,92 +283,122 @@ def put_data(self, point_obs_dict): if po_array is not None: self.hdr_inst_typ = po_array - @staticmethod - def get_prompt(): - return " python:" - - @staticmethod - def error_msg(msg): - print(f'{base_met_point_obs.get_prompt()} {base_met_point_obs.ERROR_P} {msg}') - - @staticmethod - def info_msg(msg): - print(f'{base_met_point_obs.get_prompt()} {base_met_point_obs.INFO_P} {msg}') - - @staticmethod - def get_python_script(arg_value): - return arg_value[len(met_point_obs.python_prefix)+1:] - - @staticmethod - def is_python_script(arg_value): - return arg_value.startswith(met_point_obs.python_prefix) - - @staticmethod - def print_data(key, data_array, show_count=COUNT_SHOW): - if isinstance(data_array, list): - data_len = len(data_array) - if show_count >= data_len: - print(" {k:10s}: {v}".format(k=key, v= data_array)) - else: - end_offset = int(show_count/2) - print(" {k:10s}: count={v}".format(k=key, v=data_len)) - print(" {k:10s}[0:{o}] {v}".format(k=key, v=data_array[:end_offset], o=end_offset)) - print(" {k:10s}[{s}:{e}]: {v}".format(k=key, v='...', s=end_offset+1, e=data_len-end_offset-1)) - print(" {k:10s}[{s}:{e}]: {v}".format(k=key, v= data_array[-end_offset:], s=(data_len-end_offset), e=(data_len-1))) - else: - print(" {k:10s}: {v}".format(k=key, v= data_array)) + def read_point_data(self, tmp_filename): + method_name = f"{self.__class__.__name__}.read_point_data()" + if met_base_tools.use_netcdf_format(): + from met.point_nc import nc_point_obs - @staticmethod - def 
print_point_data(met_point_data, print_subset=True): - print(' === MET point data by python embedding ===') - if print_subset: - met_point_obs.print_data('nhdr',met_point_data['nhdr']) - met_point_obs.print_data('nobs',met_point_data['nobs']) - met_point_obs.print_data('use_var_id',met_point_data['use_var_id']) - met_point_obs.print_data('hdr_typ',met_point_data['hdr_typ']) - met_point_obs.print_data('hdr_typ_table',met_point_data['hdr_typ_table']) - met_point_obs.print_data('hdr_sid',met_point_data['hdr_sid']) - met_point_obs.print_data('hdr_sid_table',met_point_data['hdr_sid_table']) - met_point_obs.print_data('hdr_vld',met_point_data['hdr_vld']) - met_point_obs.print_data('hdr_vld_table',met_point_data['hdr_vld_table']) - met_point_obs.print_data('hdr_lat',met_point_data['hdr_lat']) - met_point_obs.print_data('hdr_lon',met_point_data['hdr_lon']) - met_point_obs.print_data('hdr_elv',met_point_data['hdr_elv']) - met_point_obs.print_data('obs_hid',met_point_data['obs_hid']) - met_point_obs.print_data('obs_vid',met_point_data['obs_vid']) - met_point_obs.print_data('obs_var_table',met_point_data['obs_var_table']) - met_point_obs.print_data('obs_qty',met_point_data['obs_qty']) - met_point_obs.print_data('obs_qty_table',met_point_data['obs_qty_table']) - met_point_obs.print_data('obs_lvl',met_point_data['obs_lvl']) - met_point_obs.print_data('obs_hgt',met_point_data['obs_hgt']) - met_point_obs.print_data('obs_val',met_point_data['obs_val']) + met_point_data = nc_point_obs() + met_point_data.read_data(tmp_filename) + self.put_data(met_point_data.get_point_data()) + if met_base_tools.is_debug_enabled("point"): + met_base.log_message(f"{method_name} Read from a temporary NetCDF file (point)") else: - print('All',met_point_data) - print(" nhdr: ",met_point_data['nhdr']) - print(" nobs: ",met_point_data['nobs']) - print(' use_var_id: ',met_point_data['use_var_id']) - print(' hdr_typ: ',met_point_data['hdr_typ']) - print('hdr_typ_table: ',met_point_data['hdr_typ_table']) - 
print(' hdr_sid: ',met_point_data['hdr_sid']) - print('hdr_sid_table: ',met_point_data['hdr_sid_table']) - print(' hdr_vld: ',met_point_data['hdr_vld']) - print('hdr_vld_table: ',met_point_data['hdr_vld_table']) - print(' hdr_lat: ',met_point_data['hdr_lat']) - print(' hdr_lon: ',met_point_data['hdr_lon']) - print(' hdr_elv: ',met_point_data['hdr_elv']) - print(' obs_hid: ',met_point_data['obs_hid']) - print(' obs_vid: ',met_point_data['obs_vid']) - print('obs_var_table: ',met_point_data['obs_var_table']) - print(' obs_qty: ',met_point_data['obs_qty']) - print('obs_qty_table: ',met_point_data['obs_qty_table']) - print(' obs_lvl: ',met_point_data['obs_lvl']) - print(' obs_hgt: ',met_point_data['obs_hgt']) - print(' obs_val: ',met_point_data['obs_val']) - - print(' === MET point data by python embedding ===') + self.read_point_data_json_numpy(tmp_filename) + + def read_point_data_json_numpy(self, tmp_filename): + method_name = f"{self.__class__.__name__}.read_point_data_json_numpy()" + if met_base_tools.is_debug_enabled("point"): + met_base.log_message(f"{method_name} Read from a temporary JSON file and a temporary numpy output (point)") + + with open(tmp_filename) as json_fh: + json_dict = json.load(json_fh) + + self.use_var_id = json_dict['use_var_id'] + self.nhdr = json_dict['nhdr'] + self.nobs = json_dict['nobs'] + self.hdr_typ_table = json_dict['hdr_typ_table'] + self.hdr_sid_table = json_dict['hdr_sid_table'] + self.hdr_vld_table = json_dict['hdr_vld_table'] + self.obs_var_table = json_dict['obs_var_table'] + self.obs_qty_table = json_dict['obs_qty_table'] + self.obs_var_unit = json_dict['obs_var_unit'] + self.obs_var_desc = json_dict['obs_var_desc'] + self.hdr_prpt_typ = json_dict['hdr_prpt_typ'] + self.hdr_irpt_typ = json_dict['hdr_irpt_typ'] + self.hdr_inst_typ = json_dict['hdr_inst_typ'] + + # read 2D numeric data + numpy_dump_name = met_base_tools.get_numpy_filename(tmp_filename) + point_array_list = np.load(numpy_dump_name) + + # Header data + 
self.hdr_typ = point_array_list[0,:self.nhdr] + self.hdr_sid = point_array_list[1,:self.nhdr] + self.hdr_vld = point_array_list[2,:self.nhdr] + self.hdr_lat = point_array_list[3,:self.nhdr] + self.hdr_lon = point_array_list[4,:self.nhdr] + self.hdr_elv = point_array_list[5,:self.nhdr] + # Observation data + self.obs_hid = point_array_list[6] + self.obs_lvl = point_array_list[7] + self.obs_hgt = point_array_list[8] + self.obs_val = point_array_list[9] + self.obs_vid = point_array_list[10] + self.obs_qty = point_array_list[11] + + if numpy_dump_name != tmp_filename: + met_base_tools.remove_temp_file(numpy_dump_name) + def write_point_data(self, tmp_filename): + if met_base_tools.use_netcdf_format(): + from met.point_nc import nc_point_obs -class csv_point_obs(base_met_point_obs): + nc_point_obs.write_nc_file(tmp_filename, self) + if met_base_tools.is_debug_enabled("point"): + met_base.log_message("Save to a temporary NetCDF file (point)") + else: + self.write_point_data_json_numpy(tmp_filename) + + def write_point_data_json_numpy(self, tmp_filename): + method_name = f"{self.__class__.__name__}.write_point_data_json_numpy()" + if met_base_tools.is_debug_enabled("point"): + met_base.log_message(f"{method_name} Save to a temporary JSON file and a temporary numpy output (point)") + + self.nhdr = len(self.hdr_sid) + self.nobs = len(self.obs_hid) + + json_dict = {} + json_dict['use_var_id'] = self.use_var_id + json_dict['nhdr'] = self.nhdr + json_dict['nobs'] = self.nobs + json_dict['hdr_typ_table'] = self.convert_to_array(self.hdr_typ_table) + json_dict['hdr_sid_table'] = self.convert_to_array(self.hdr_sid_table) + json_dict['hdr_vld_table'] = self.convert_to_array(self.hdr_vld_table) + json_dict['obs_var_table'] = self.convert_to_array(self.obs_var_table) + json_dict['obs_qty_table'] = self.convert_to_array(self.obs_qty_table) + json_dict['obs_var_unit'] = self.convert_to_array(self.obs_var_unit) + json_dict['obs_var_desc'] = self.convert_to_array(self.obs_var_desc) 
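The `write_point_data_json_numpy`/`read_point_data_json_numpy` pair above persists point data as a JSON file of attributes plus a sibling `.npy` dump of the numeric arrays. A minimal, self-contained sketch of that round trip — the file names and the attribute subset here are illustrative, not MET's:

```python
import json
import os
import tempfile

import numpy as np

# Write side: attributes to JSON, the 2-D numeric block to a sibling .npy file.
attrs = {"use_var_id": True, "nhdr": 2, "nobs": 3}
point_array = np.array([[1.0, 2.0, 3.0],
                        [10.5, 11.5, 12.5]])

tmp_dir = tempfile.mkdtemp()
json_name = os.path.join(tmp_dir, "met_point_tmp.json")
npy_name = json_name.replace(".json", ".npy")  # mirrors get_numpy_filename()

with open(json_name, "w") as fh:
    json.dump(attrs, fh)
np.save(npy_name, point_array)

# Read side: load both files and reassemble.
with open(json_name) as fh:
    attrs_back = json.load(fh)
data_back = np.load(npy_name)

assert attrs_back["nobs"] == 3
assert np.array_equal(point_array, data_back)

# Clean up, as remove_temp_file() would when MET_KEEP_TEMP_FILE is unset.
os.remove(json_name)
os.remove(npy_name)
```

Serializing with JSON plus `.npy` avoids opening the temporary file with the netCDF4 module, which is why it is the default format: it sidesteps NetCDF library conflicts between MET and Python.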
+ json_dict['hdr_prpt_typ'] = self.convert_to_array(self.hdr_prpt_typ) + json_dict['hdr_irpt_typ'] = self.convert_to_array(self.hdr_irpt_typ) + json_dict['hdr_inst_typ'] = self.convert_to_array(self.hdr_inst_typ) + + point_array_list = np.empty([12, self.nobs]) + + # Header data + point_array_list[0,:self.nhdr] = self.convert_to_ndarray(self.hdr_typ) + point_array_list[1,:self.nhdr] = self.convert_to_ndarray(self.hdr_sid) + point_array_list[2,:self.nhdr] = self.convert_to_ndarray(self.hdr_vld) + point_array_list[3,:self.nhdr] = self.convert_to_ndarray(self.hdr_lat) + point_array_list[4,:self.nhdr] = self.convert_to_ndarray(self.hdr_lon) + point_array_list[5,:self.nhdr] = self.convert_to_ndarray(self.hdr_elv) + # Observation data + point_array_list[6] = self.convert_to_ndarray(self.obs_hid) + point_array_list[7] = self.convert_to_ndarray(self.obs_lvl) + point_array_list[8] = self.convert_to_ndarray(self.obs_hgt) + point_array_list[9] = self.convert_to_ndarray(self.obs_val) + point_array_list[10] = self.convert_to_ndarray(self.obs_vid) + point_array_list[11] = self.convert_to_ndarray(self.obs_qty) + + with open(tmp_filename,'w') as json_fh: + json.dump(json_dict, json_fh) + + numpy_dump_name = met_base_tools.get_numpy_filename(tmp_filename) + #np.save(numpy_dump_name, self.convert_to_ndarray(point_array_list)) + np.save(numpy_dump_name, point_array_list) + + +class csv_point_obs(met_base_point): def __init__(self, point_data): self.point_data = point_data @@ -400,50 +415,50 @@ def __init__(self, point_data): self.convert_point_data() def check_csv_record(self, csv_point_data, index): + method_name = f"{self.__class__.__name__}.check_csv_record()" error_msgs = [] # names=['typ', 'sid', 'vld', 'lat', 'lon', 'elv', 'var', 'lvl', 'hgt', 'qc', 'obs'] # dtype={'typ':'str', 'sid':'str', 'vld':'str', 'var':'str', 'qc':'str'} if 11 > len(csv_point_data): - error_msgs.append("{i}-th data: missing columns. 
should be 11 columns, not {c} columns".format( - i=index, c=len(csv_point_data))) + error_msgs.append(f"{method_name} {index}-th data: missing columns. should be 11 columns, not {len(csv_point_data)} columns") elif 11 < len(csv_point_data): - print("{i}-th data: ignore after 11-th columns out of {c} columns".format( - i=index, c=len(csv_point_data))) + print(f"{method_name} {index}-th data: ignore after 11-th columns out of {len(csv_point_data)} columns") if not isinstance(csv_point_data[0], str): - error_msgs.append("{i}-th data: message_type is not string".format(i=index)) + error_msgs.append(f"{method_name} {index}-th data: message_type is not string") if not isinstance(csv_point_data[1], str): - error_msgs.append("{i}-th data: station_id is not string".format(i=index)) + error_msgs.append(f"{method_name} {index}-th data: station_id is not string") if not isinstance(csv_point_data[2], str): - error_msgs.append("{i}-th data: valid_time is not string".format(i=index)) + error_msgs.append(f"{method_name} {index}-th data: valid_time is not string") if isinstance(csv_point_data[3], str): - error_msgs.append("{i}-th data: latitude can not be a string".format(i=index)) + error_msgs.append(f"{method_name} {index}-th data: latitude can not be a string") elif csv_point_data[3] < -90.0 or csv_point_data[3] > 90.0: - error_msgs.append("{i}-th data: latitude ({l}) is out of range".format(i=index, l=csv_point_data[3])) + error_msgs.append(f"{method_name} {index}-th data: latitude ({csv_point_data[3]}) is out of range") if isinstance(csv_point_data[4], str): - error_msgs.append("{i}-th data: longitude can not be a string".format(i=index)) + error_msgs.append(f"{method_name} {index}-th data: longitude can not be a string") elif csv_point_data[4] < -180.0 or csv_point_data[4] > 360.0: - error_msgs.append("{i}-th data: longitude ({l}) is out of range".format(i=index, l=csv_point_data[4])) + error_msgs.append(f"{method_name} {index}-th data: longitude ({csv_point_data[4]}) is out of range") if not 
isinstance(csv_point_data[6], str): - error_msgs.append("{i}-th data: grib_code/var_name is not string".format(i=index)) + error_msgs.append(f"{method_name} {index}-th data: grib_code/var_name is not string") if not isinstance(csv_point_data[9], str): - error_msgs.append("{i}-th data: quality_mark is not string".format(i=index)) + error_msgs.append(f"{method_name} {index}-th data: quality_mark is not string") is_string, is_num = self.is_num_string(csv_point_data[5]) if is_string and not is_num: - error_msgs.append("{i}-th data: elevation: only NA is accepted as string".format(i=index)) + error_msgs.append(f"{method_name} {index}-th data: elevation: only NA is accepted as string") is_string, is_num = self.is_num_string(csv_point_data[7]) if is_string and not is_num: - error_msgs.append("{i}-th data: obs_level: only NA is accepted as string".format(i=index)) + error_msgs.append(f"{method_name} {index}-th data: obs_level: only NA is accepted as string") is_string, is_num = self.is_num_string(csv_point_data[8]) if is_string and not is_num: - error_msgs.append("{i}-th data: obs_height: only NA is accepted as string".format(i=index)) + error_msgs.append(f"{method_name} {index}-th data: obs_height: only NA is accepted as string") is_string, is_num = self.is_num_string(csv_point_data[10]) if is_string and not is_num: - error_msgs.append("{i}-th data: obs_value: only NA is accepted as string".format(i=index)) + error_msgs.append(f"{method_name} {index}-th data: obs_value: only NA is accepted as string") return error_msgs def check_csv_point_data(self, all_records=False): + method_name = f"{self.__class__.__name__}.check_csv_point_data()" if 0 == len(self.point_data): - self.add_error_msg("No data!") + self.add_error_msg(f"{method_name} No data!") elif all_records: data_idx = 0 for csv_point_data in self.point_data: @@ -573,6 +588,7 @@ def convert_point_data(self): self.obs_var_table[idx] = key def get_num_value(self, column_value): + method_name = 
f"{self.__class__.__name__}.get_num_value()" num_value = column_value if isinstance(column_value, str): if self.is_number(column_value): @@ -580,7 +596,7 @@ def get_num_value(self, column_value): else: num_value = self.FILL_VALUE if column_value.lower() != 'na' and column_value.lower() != 'n/a': - self.log_info(f'{column_value} is not a number, converted to the missing value') + self.info_msg(f'{method_name} {column_value} is not a number, converted to the missing value') return num_value def is_grib_code(self): @@ -602,9 +618,7 @@ def is_num_string(self, column_value): return is_string, is_num -class met_point_obs(ABC, base_met_point_obs): - - MET_ENV_RUN = 'MET_FORCE_TO_RUN' +class met_point_obs(ABC, met_base_point): @abstractmethod def read_data(self, args): @@ -625,7 +639,15 @@ def read_data(self, args): pass -class met_point_tools(): +class dummy_point_obs(met_point_obs): + + def read_data(self, args): + pass + + +class met_point_tools(met_base_tools): + + python_prefix = 'PYTHON_POINT_USER' @staticmethod def convert_point_data(point_data, check_all_records=False, input_type='csv'): @@ -635,24 +657,85 @@ def convert_point_data(point_data, check_all_records=False, input_type='csv'): csv_point_data.check_csv_point_data(check_all_records) tmp_point_data = csv_point_data.get_point_data() else: - base_met_point_obs.error_msg('Not supported input type: {input_type}') + met_base_point.error_message(f'met_point_tools.convert_point_data() Not supported input type: {input_type}') return tmp_point_data @staticmethod - def get_prompt(): - return " python:" + def get_sample_point_obs(): + return sample_met_point_obs() @staticmethod - def get_nc_point_obs(): - return nc_point_obs() + def get_python_script(arg_value): + return arg_value[len(met_point_tools.python_prefix)+1:] @staticmethod - def get_sample_point_obs(): - return sample_met_point_obs() + def is_python_prefix(user_cmd): + return user_cmd.startswith(met_point_tools.python_prefix) @staticmethod - def 
is_python_prefix(user_cmd): - return user_cmd.startswith(base_met_point_obs.python_prefix) + def print_data(key, data_array, show_count=COUNT_SHOW): + if isinstance(data_array, list): + data_len = len(data_array) + if show_count >= data_len: + print(" {k:10s}: {v}".format(k=key, v= data_array)) + else: + end_offset = int(show_count/2) + print(" {k:10s}: count={v}".format(k=key, v=data_len)) + print(" {k:10s}[0:{o}] {v}".format(k=key, v=data_array[:end_offset], o=end_offset)) + print(" {k:10s}[{s}:{e}]: {v}".format(k=key, v='...', s=end_offset+1, e=data_len-end_offset-1)) + print(" {k:10s}[{s}:{e}]: {v}".format(k=key, v= data_array[-end_offset:], s=(data_len-end_offset), e=(data_len-1))) + else: + print(" {k:10s}: {v}".format(k=key, v= data_array)) + + @staticmethod + def print_point_data(met_point_data, print_subset=True): + method_name = f"met_point_tools.print_point_data()" + print(' === MET point data by python embedding ===') + if print_subset: + met_point_tools.print_data('nhdr',met_point_data['nhdr']) + met_point_tools.print_data('nobs',met_point_data['nobs']) + met_point_tools.print_data('use_var_id',met_point_data['use_var_id']) + met_point_tools.print_data('hdr_typ',met_point_data['hdr_typ']) + met_point_tools.print_data('hdr_typ_table',met_point_data['hdr_typ_table']) + met_point_tools.print_data('hdr_sid',met_point_data['hdr_sid']) + met_point_tools.print_data('hdr_sid_table',met_point_data['hdr_sid_table']) + met_point_tools.print_data('hdr_vld',met_point_data['hdr_vld']) + met_point_tools.print_data('hdr_vld_table',met_point_data['hdr_vld_table']) + met_point_tools.print_data('hdr_lat',met_point_data['hdr_lat']) + met_point_tools.print_data('hdr_lon',met_point_data['hdr_lon']) + met_point_tools.print_data('hdr_elv',met_point_data['hdr_elv']) + met_point_tools.print_data('obs_hid',met_point_data['obs_hid']) + met_point_tools.print_data('obs_vid',met_point_data['obs_vid']) + met_point_tools.print_data('obs_var_table',met_point_data['obs_var_table']) + 
met_point_tools.print_data('obs_qty',met_point_data['obs_qty']) + met_point_tools.print_data('obs_qty_table',met_point_data['obs_qty_table']) + met_point_tools.print_data('obs_lvl',met_point_data['obs_lvl']) + met_point_tools.print_data('obs_hgt',met_point_data['obs_hgt']) + met_point_tools.print_data('obs_val',met_point_data['obs_val']) + else: + print(f'{method_name} All',met_point_data) + print(f" nhdr: {met_point_data['nhdr']}") + print(f" nobs: {met_point_data['nobs']}") + print(f" use_var_id: {met_point_data['use_var_id']}") + print(f" hdr_typ: {met_point_data['hdr_typ']}") + print(f"hdr_typ_table: {met_point_data['hdr_typ_table']}") + print(f" hdr_sid: {met_point_data['hdr_sid']}") + print(f"hdr_sid_table: {met_point_data['hdr_sid_table']}") + print(f" hdr_vld: {met_point_data['hdr_vld']}") + print(f"hdr_vld_table: {met_point_data['hdr_vld_table']}") + print(f" hdr_lat: {met_point_data['hdr_lat']}") + print(f" hdr_lon: {met_point_data['hdr_lon']}") + print(f" hdr_elv: {met_point_data['hdr_elv']}") + print(f" obs_hid: {met_point_data['obs_hid']}") + print(f" obs_vid: {met_point_data['obs_vid']}") + print(f"obs_var_table: {met_point_data['obs_var_table']}") + print(f" obs_qty: {met_point_data['obs_qty']}") + print(f"obs_qty_table: {met_point_data['obs_qty_table']}") + print(f" obs_lvl: {met_point_data['obs_lvl']}") + print(f" obs_hgt: {met_point_data['obs_hgt']}") + print(f" obs_val: {met_point_data['obs_val']}") + + print(' === MET point data by python embedding ===') @staticmethod # Read the input file which is 11 column text file as the first argument @@ -670,254 +753,14 @@ def read_text_point_obs(input_file, header=None, # (9) numeric: Height(msl or agl) # (10) string: QC_String # (11) numeric: Observation_Value - ascii_point_data = pd.read_csv(input_file, header=header, - delim_whitespace=delim_whitespace, - keep_default_na=keep_default_na, - names=['typ', 'sid', 'vld', 'lat', 'lon', 'elv', 'var', 'lvl', 'hgt', 'qc', 'obs'], - dtype={'typ':'string', 'sid':'string', 'vld':'string', 
'var':'string', 'qc':'string'}).values.tolist() + ascii_point_data = pd.read_csv( + input_file, header=header, + delim_whitespace=delim_whitespace, + keep_default_na=keep_default_na, + names=['typ', 'sid', 'vld', 'lat', 'lon', 'elv', 'var', 'lvl', 'hgt', 'qc', 'obs'], + dtype={'typ':'string', 'sid':'string', 'vld':'string', 'var':'string', 'qc':'string'}).values.tolist() return ascii_point_data -# Note: caller should import netCDF4 -# The argements nc_group(dataset) and nc_var should not be None -class nc_tools(): - - met_missing = -99999999. - - @staticmethod - def get_num_array(nc_group, var_name): - nc_var = nc_group.variables.get(var_name, None) - return [] if nc_var is None else nc_var[:] - - @staticmethod - def get_ncbyte_array_to_str(nc_var): - nc_str_data = nc_var[:] - if nc_var.datatype.name == 'bytes8': - nc_str_data = [ str(s.compressed(),"utf-8") for s in nc_var[:] ] - return nc_str_data - - @staticmethod - def get_string_array(nc_group, var_name): - nc_var = nc_group.variables.get(var_name, None) - return [] if nc_var is None else nc_tools.get_ncbyte_array_to_str(nc_var) - - -class nc_point_obs(met_point_obs): - - # args should be string, list, or dictionary - def get_nc_filename(self, args): - nc_filename = None - if isinstance(args, dict): - nc_filename = args.get('nc_name',None) - elif isinstance(args, list): - nc_filename = args[0] - elif args != ARG_PRINT_DATA: - nc_filename = args - - return nc_filename - - def read_data(self, nc_filename): - if nc_filename is None: - self.log_error_msg("The input NetCDF filename is missing") - elif not os.path.exists(nc_filename): - self.log_error_msg(f"input NetCDF file ({nc_filename}) does not exist") - else: - dataset = nc.Dataset(nc_filename, 'r') - - attr_name = 'use_var_id' - use_var_id_str = dataset.getncattr(attr_name) if attr_name in dataset.ncattrs() else "false" - self.use_var_id = use_var_id_str.lower() == 'true' - - # Header - self.hdr_typ = dataset['hdr_typ'][:] - self.hdr_sid = 
dataset['hdr_sid'][:] - self.hdr_vld = dataset['hdr_vld'][:] - self.hdr_lat = dataset['hdr_lat'][:] - self.hdr_lon = dataset['hdr_lon'][:] - self.hdr_elv = dataset['hdr_elv'][:] - self.hdr_typ_table = nc_tools.get_string_array(dataset, 'hdr_typ_table') - self.hdr_sid_table = nc_tools.get_string_array(dataset, 'hdr_sid_table') - self.hdr_vld_table = nc_tools.get_string_array(dataset, 'hdr_vld_table') - - nc_var = dataset.variables.get('obs_unit', None) - if nc_var: - self.obs_var_unit = nc_var[:] - nc_var = dataset.variables.get('obs_desc', None) - if nc_var: - self.obs_var_desc = nc_var[:] - - nc_var = dataset.variables.get('hdr_prpt_typ', None) - if nc_var: - self.hdr_prpt_typ = nc_var[:] - nc_var = dataset.variables.get('hdr_irpt_typ', None) - if nc_var: - self.hdr_irpt_typ = nc_var[:] - nc_var = dataset.variables.get('hdr_inst_typ', None) - if nc_var: - self.hdr_inst_typ =nc_var[:] - - #Observation data - self.hdr_sid = dataset['hdr_sid'][:] - self.obs_qty = np.array(dataset['obs_qty'][:]) - self.obs_hid = np.array(dataset['obs_hid'][:]) - self.obs_lvl = np.array(dataset['obs_lvl'][:]) - self.obs_hgt = np.array(dataset['obs_hgt'][:]) - self.obs_val = np.array(dataset['obs_val'][:]) - nc_var = dataset.variables.get('obs_vid', None) - if nc_var is None: - self.use_var_id = False - nc_var = dataset.variables.get('obs_gc', None) - else: - self.obs_var_table = nc_tools.get_string_array(dataset, 'obs_var') - if nc_var: - self.obs_vid = np.array(nc_var[:]) - - self.obs_qty_table = nc_tools.get_string_array(dataset, 'obs_qty_table') - - def save_ncfile(self, nc_filename): - met_data = self.get_point_data() - with nc.Dataset(nc_filename, 'w') as nc_dataset: - self.set_nc_data(nc_dataset) - return met_data - - def set_nc_data(self, nc_dataset): - return nc_point_obs.write_nc_data(nc_dataset, self) - - @staticmethod - def write_nc_file(nc_filename, point_obs): - with nc.Dataset(nc_filename, 'w') as nc_dataset: - nc_point_obs.set_nc_data(nc_dataset, point_obs) - - 
@staticmethod - def write_nc_data(nc_dataset, point_obs): - do_nothing = False - if 0 == point_obs.nhdr: - do_nothing = True - base_met_point_obs.info_msg("the header is empty") - if 0 == point_obs.nobs: - do_nothing = True - base_met_point_obs.info_msg("the observation data is empty") - if do_nothing: - print() - return - - # Set global attributes - nc_dataset.MET_Obs_version = "1.02" ; - nc_dataset.use_var_id = "true" if point_obs.use_var_id else "false" - - # Create dimensions - nc_dataset.createDimension('mxstr', 16) - nc_dataset.createDimension('mxstr2', 40) - nc_dataset.createDimension('mxstr3', 80) - nc_dataset.createDimension('nhdr', point_obs.nhdr) - nc_dataset.createDimension('nobs', point_obs.nobs) - #npbhdr = len(point_obs.hdr_prpt_typ) - if 0 < point_obs.npbhdr: - nc_dataset.createDimension('npbhdr', point_obs.npbhdr) - nc_dataset.createDimension('nhdr_typ', point_obs.nhdr_typ) - nc_dataset.createDimension('nhdr_sid', point_obs.nhdr_sid) - nc_dataset.createDimension('nhdr_vld', point_obs.nhdr_vld) - nc_dataset.createDimension('nobs_qty', point_obs.nobs_qty) - nc_dataset.createDimension('obs_var_num', point_obs.nobs_var) - - type_for_string = 'S1' # np.byte - dims_hdr = ('nhdr',) - dims_obs = ('nobs',) - - # Create header and observation variables - var_hdr_typ = nc_dataset.createVariable('hdr_typ', np.int32, dims_hdr, fill_value=-9999) - var_hdr_sid = nc_dataset.createVariable('hdr_sid', np.int32, dims_hdr, fill_value=-9999) - var_hdr_vld = nc_dataset.createVariable('hdr_vld', np.int32, dims_hdr, fill_value=-9999) - var_hdr_lat = nc_dataset.createVariable('hdr_lat', np.float32, dims_hdr, fill_value=-9999.) - var_hdr_lon = nc_dataset.createVariable('hdr_lon', np.float32, dims_hdr, fill_value=-9999.) - var_hdr_elv = nc_dataset.createVariable('hdr_elv', np.float32, dims_hdr, fill_value=-9999.) 
- - var_obs_qty = nc_dataset.createVariable('obs_qty', np.int32, dims_obs, fill_value=-9999) - var_obs_hid = nc_dataset.createVariable('obs_hid', np.int32, dims_obs, fill_value=-9999) - var_obs_vid = nc_dataset.createVariable('obs_vid', np.int32, dims_obs, fill_value=-9999) - var_obs_lvl = nc_dataset.createVariable('obs_lvl', np.float32, dims_obs, fill_value=-9999.) - var_obs_hgt = nc_dataset.createVariable('obs_hgt', np.float32, dims_obs, fill_value=-9999.) - var_obs_val = nc_dataset.createVariable('obs_val', np.float32, dims_obs, fill_value=-9999.) - - if 0 == point_obs.npbhdr: - var_hdr_prpt_typ = None - var_hdr_irpt_typ = None - var_hdr_inst_typ = None - else: - dims_npbhdr = ('npbhdr',) - var_hdr_prpt_typ = nc_dataset.createVariable('hdr_prpt_typ', np.int32, dims_npbhdr, fill_value=-9999.) - var_hdr_irpt_typ = nc_dataset.createVariable('hdr_irpt_typ', np.int32, dims_npbhdr, fill_value=-9999.) - var_hdr_inst_typ = nc_dataset.createVariable('hdr_inst_typ', np.int32, dims_npbhdr, fill_value=-9999.) 
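The removed NetCDF writer (relocated to `point_nc.py`) stores every string table as a fixed-width array of single characters (`type_for_string = 'S1'`, with widths taken from the `mxstr`=16, `mxstr2`=40, and `mxstr3`=80 dimensions above), assigning one character per cell. That packing can be sketched with plain NumPy; `pack_string_table` is an illustrative helper, not part of MET:

```python
import numpy as np

def pack_string_table(strings, width):
    # One row per string, one 'S1' cell per character, padded with empty
    # bytes, matching how hdr_typ_table and friends are written to NetCDF.
    table = np.full((len(strings), width), b"", dtype="S1")
    for i, s in enumerate(strings):
        for j, ch in enumerate(s.encode()[:width]):
            table[i, j] = bytes([ch])
    return table

hdr_typ_table = pack_string_table(["ADPUPA", "ADPSFC"], 40)  # mxstr2 = 40
# Joining a row's cells recovers the original string (empty cells vanish).
first = b"".join(hdr_typ_table[0]).decode()
print(first)  # ADPUPA
```

Assigning element by element, as the nested loops above do, is equivalent to this packing; a char-array variable created with `createVariable(..., 'S1', (rows, width))` accepts such a table directly.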
- - var_hdr_typ_table = nc_dataset.createVariable('hdr_typ_table', type_for_string, ('nhdr_typ','mxstr2')) - var_hdr_sid_table = nc_dataset.createVariable('hdr_sid_table', type_for_string, ('nhdr_sid','mxstr2')) - var_hdr_vld_table = nc_dataset.createVariable('hdr_vld_table', type_for_string, ('nhdr_vld','mxstr')) - var_obs_qty_table = nc_dataset.createVariable('obs_qty_table', type_for_string, ('nobs_qty','mxstr')) - var_obs_var_table = nc_dataset.createVariable('obs_var', type_for_string, ('obs_var_num','mxstr2')) - var_obs_var_unit = nc_dataset.createVariable('obs_unit', type_for_string, ('obs_var_num','mxstr2')) - var_obs_var_desc = nc_dataset.createVariable('obs_desc', type_for_string, ('obs_var_num','mxstr3')) - - # Set variables - var_hdr_typ[:] = point_obs.hdr_typ[:] - var_hdr_sid[:] = point_obs.hdr_sid[:] - var_hdr_vld[:] = point_obs.hdr_vld[:] - var_hdr_lat[:] = point_obs.hdr_lat[:] - var_hdr_lon[:] = point_obs.hdr_lon[:] - var_hdr_elv[:] = point_obs.hdr_elv[:] - for i in range(0, point_obs.nhdr_typ): - for j in range(0, len(point_obs.hdr_typ_table[i])): - var_hdr_typ_table[i,j] = point_obs.hdr_typ_table[i][j] - for i in range(0, point_obs.nhdr_sid): - for j in range(0, len(point_obs.hdr_sid_table[i])): - var_hdr_sid_table[i,j] = point_obs.hdr_sid_table[i][j] - for i in range(0, point_obs.nhdr_vld): - for j in range(0, len(point_obs.hdr_vld_table[i])): - var_hdr_vld_table[i,j] = point_obs.hdr_vld_table[i][j] - if 0 < point_obs.npbhdr: - var_hdr_prpt_typ[:] = point_obs.hdr_prpt_typ[:] - var_hdr_irpt_typ[:] = point_obs.hdr_irpt_typ[:] - var_hdr_inst_typ[:] = point_obs.hdr_inst_typ[:] - - var_obs_qty[:] = point_obs.obs_qty[:] - var_obs_hid[:] = point_obs.obs_hid[:] - var_obs_vid[:] = point_obs.obs_vid[:] - var_obs_lvl[:] = point_obs.obs_lvl[:] - var_obs_hgt[:] = point_obs.obs_hgt[:] - var_obs_val[:] = point_obs.obs_val[:] - for i in range(0, point_obs.nobs_var): - for j in range(0, len(point_obs.obs_var_table[i])): - var_obs_var_table[i,j] = 
point_obs.obs_var_table[i][j] - var_obs_var_unit[i] = "" if i >= len(point_obs.obs_var_unit) else point_obs.obs_var_unit[i] - var_obs_var_desc[i] = "" if i >= len(point_obs.obs_var_desc) else point_obs.obs_var_desc[i] - for i in range(0, point_obs.nobs_qty): - for j in range(0, len(point_obs.obs_qty_table[i])): - var_obs_qty_table[i,j] = point_obs.obs_qty_table[i][j] - - # Set variable attributes - var_hdr_typ.long_name = "index of message type" - var_hdr_sid.long_name = "index of station identification" - var_hdr_vld.long_name = "index of valid time" - var_hdr_lat.long_name = "latitude" - var_hdr_lat.units = "degrees_north" - var_hdr_lon.long_name = "longitude" - var_hdr_lon.units = "degrees_east" - var_hdr_elv.long_name = "elevation" - var_hdr_elv.units = "meters above sea level (msl)" - - var_obs_qty.long_name = "index of quality flag" - var_obs_hid.long_name = "index of matching header data" - var_obs_vid.long_name = "index of BUFR variable corresponding to the observation type" - var_obs_lvl.long_name = "pressure level (hPa) or accumulation interval (sec)" - var_obs_hgt.long_name = "height in meters above sea level (msl)" - var_obs_val.long_name = "observation value" - var_hdr_typ_table.long_name = "message type" - var_hdr_sid_table.long_name = "station identification" - var_hdr_vld_table.long_name = "valid time" - var_hdr_vld_table.units = "YYYYMMDD_HHMMSS UTC" - var_obs_qty_table.long_name = "quality flag" - var_obs_var_table.long_name = "variable names" - var_obs_var_unit.long_name = "variable units" - var_obs_var_desc.long_name = "variable descriptions" - # This is a sample drived class class sample_met_point_obs(met_point_obs): @@ -946,6 +789,7 @@ def read_data(self, arg_map={}): self.obs_var_table = [ "TMP", "RH" ] self.obs_qty_table = [ "NA" ] + def convert_point_data(point_data, check_all_records=False, input_type='csv'): tmp_point_data = {} if 'csv' == input_type: @@ -953,9 +797,12 @@ def convert_point_data(point_data, check_all_records=False, 
input_type='csv'):
      csv_point_data.check_csv_point_data(check_all_records)
      tmp_point_data = csv_point_data.get_point_data()
   else:
-      base_met_point_obs.error_msg('Not supported input type: {input_type}')
+      met_base.error_message(f'convert_point_data() Unsupported input type: {input_type}')
 
   return tmp_point_data
 
+def get_empty_point_obs():
+    return dummy_point_obs()
+
 def main():
    args = {}   # or args = []
    point_obs_data = sample_met_point_obs()
@@ -964,20 +811,6 @@ def main():
 
    point_obs_data.print_point_data(met_point_data, print_subset=False)
 
-def main_nc(argv):
-    if len(argv) != 1 and argv[1] != ARG_PRINT_DATA:
-        netcdf_filename = argv[1]
-        tmp_nc_name = 'tmp_met_point.nc'
-        point_obs_data = nc_point_obs()
-        point_obs_data.read_data(point_obs_data.get_nc_filename(netcdf_filename))
-        met_point_data = point_obs_data.save_ncfile(tmp_nc_name)
-        print(f'{base_met_point_obs.get_prompt()} saved met_point_data to {tmp_nc_name}')
-        met_point_data['met_point_data'] = point_obs_data
-
-        if DO_PRINT_DATA or ARG_PRINT_DATA == argv[-1]:
-            met_point_obs.print_point_data(met_point_data)
-
-
 if __name__ == '__main__':
     main()
     print('Done python script')
diff --git a/scripts/python/met/point_nc.py b/scripts/python/met/point_nc.py
new file mode 100644
index 0000000000..37063bdb0d
--- /dev/null
+++ b/scripts/python/met/point_nc.py
@@ -0,0 +1,293 @@
+'''
+Created on Jan 10, 2024
+
+@author: hsoh
+
+- This is a derived class to support NetCDF format data. Separated from point.py.
+- A potential risk with the netCDF python package is NetCDF library conflicts between MET and python3.
+
+'''
+
+import os
+
+import numpy as np
+import netCDF4 as nc
+
+from met.point import met_point_obs, met_point_tools
+
+DO_PRINT_DATA = False
+ARG_PRINT_DATA = "print_data"
+
+
+def get_nc_point_obs():
+    return nc_point_obs()
+
+
+# Note: caller should import netCDF4
+# The arguments nc_group (dataset) and nc_var should not be None
+class met_point_nc_tools(met_point_tools):
+
+    #met_missing = -99999999.
+
+    @staticmethod
+    def get_num_array(nc_group, var_name):
+        nc_var = nc_group.variables.get(var_name, None)
+        return [] if nc_var is None else nc_var[:]
+
+    @staticmethod
+    def get_nc_point_obs():
+        return nc_point_obs()
+
+    @staticmethod
+    def get_ncbyte_array_to_str(nc_var):
+        nc_str_data = nc_var[:]
+        if nc_var.datatype.name == 'bytes8':
+            nc_str_data = [ str(s.compressed(), "utf-8") for s in nc_var[:] ]
+        return nc_str_data
+
+    @staticmethod
+    def get_string_array(nc_group, var_name):
+        nc_var = nc_group.variables.get(var_name, None)
+        return [] if nc_var is None else met_point_nc_tools.get_ncbyte_array_to_str(nc_var)
+
+
+class nc_point_obs(met_point_obs):
+
+    # args should be string, list, or dictionary
+    def get_nc_filename(self, args):
+        nc_filename = None
+        if isinstance(args, dict):
+            nc_filename = args.get('nc_name', None)
+        elif isinstance(args, list):
+            nc_filename = args[0]
+        elif args != ARG_PRINT_DATA:
+            nc_filename = args
+
+        return nc_filename
+
+    def read_data(self, nc_filename):
+        method_name = f"{self.__class__.__name__}.read_data()"
+        if nc_filename is None:
+            self.log_error_msg(f"{method_name} The input NetCDF filename is missing")
+        elif not os.path.exists(nc_filename):
+            self.log_error_msg(f"{method_name} input NetCDF file ({nc_filename}) does not exist")
+        else:
+            dataset = nc.Dataset(nc_filename, 'r')
+
+            attr_name = 'use_var_id'
+            use_var_id_str = dataset.getncattr(attr_name) if attr_name in dataset.ncattrs() else "false"
+            self.use_var_id = use_var_id_str.lower() == 'true'
+
+            # Header
+            self.hdr_typ = dataset['hdr_typ'][:]
+            self.hdr_sid = dataset['hdr_sid'][:]
+            self.hdr_vld = dataset['hdr_vld'][:]
+            self.hdr_lat = dataset['hdr_lat'][:]
+            self.hdr_lon = dataset['hdr_lon'][:]
+            self.hdr_elv = dataset['hdr_elv'][:]
+            self.hdr_typ_table = met_point_nc_tools.get_string_array(dataset, 'hdr_typ_table')
+            self.hdr_sid_table = met_point_nc_tools.get_string_array(dataset, 'hdr_sid_table')
+            self.hdr_vld_table = met_point_nc_tools.get_string_array(dataset, 'hdr_vld_table')
+
+            nc_var = dataset.variables.get('obs_unit', None)
+            if nc_var:
+                self.obs_var_unit = nc_var[:]
+            nc_var = dataset.variables.get('obs_desc', None)
+            if nc_var:
+                self.obs_var_desc = nc_var[:]
+
+            nc_var = dataset.variables.get('hdr_prpt_typ', None)
+            if nc_var:
+                self.hdr_prpt_typ = nc_var[:]
+            nc_var = dataset.variables.get('hdr_irpt_typ', None)
+            if nc_var:
+                self.hdr_irpt_typ = nc_var[:]
+            nc_var = dataset.variables.get('hdr_inst_typ', None)
+            if nc_var:
+                self.hdr_inst_typ = nc_var[:]
+
+            # Observation data
+            self.hdr_sid = dataset['hdr_sid'][:]
+            self.obs_qty = np.array(dataset['obs_qty'][:])
+            self.obs_hid = np.array(dataset['obs_hid'][:])
+            self.obs_lvl = np.array(dataset['obs_lvl'][:])
+            self.obs_hgt = np.array(dataset['obs_hgt'][:])
+            self.obs_val = np.array(dataset['obs_val'][:])
+            nc_var = dataset.variables.get('obs_vid', None)
+            if nc_var is None:
+                self.use_var_id = False
+                nc_var = dataset.variables.get('obs_gc', None)
+            else:
+                self.obs_var_table = met_point_nc_tools.get_string_array(dataset, 'obs_var')
+            if nc_var:
+                self.obs_vid = np.array(nc_var[:])
+
+            self.obs_qty_table = met_point_nc_tools.get_string_array(dataset, 'obs_qty_table')
+
+    def save_ncfile(self, nc_filename):
+        met_data = self.get_point_data()
+        with nc.Dataset(nc_filename, 'w') as nc_dataset:
+            self.set_nc_data(nc_dataset)
+        return met_data
+
+    def set_nc_data(self, nc_dataset):
+        return nc_point_obs.write_nc_data(nc_dataset, self)
+
+    @staticmethod
+    def write_nc_file(nc_filename, point_obs):
+        with nc.Dataset(nc_filename, 'w') as
nc_dataset: + nc_point_obs.write_nc_data(nc_dataset, point_obs) + + @staticmethod + def write_nc_data(nc_dataset, point_obs): + method_name = "point_nc.write_nc_data()" + try: + do_nothing = False + if 0 == point_obs.nhdr: + do_nothing = True + met_point_obs.info_message(f"{method_name} the header is empty") + if 0 == point_obs.nobs: + do_nothing = True + met_point_obs.info_message(f"{method_name} the observation data is empty") + if do_nothing: + print() + return + + # Set global attributes + nc_dataset.MET_Obs_version = "1.02" ; + nc_dataset.use_var_id = "true" if point_obs.use_var_id else "false" + + # Create dimensions + nc_dataset.createDimension('mxstr', 16) + nc_dataset.createDimension('mxstr2', 40) + nc_dataset.createDimension('mxstr3', 80) + nc_dataset.createDimension('nhdr', point_obs.nhdr) + nc_dataset.createDimension('nobs', point_obs.nobs) + #npbhdr = len(point_obs.hdr_prpt_typ) + if 0 < point_obs.npbhdr: + nc_dataset.createDimension('npbhdr', point_obs.npbhdr) + nc_dataset.createDimension('nhdr_typ', point_obs.nhdr_typ) + nc_dataset.createDimension('nhdr_sid', point_obs.nhdr_sid) + nc_dataset.createDimension('nhdr_vld', point_obs.nhdr_vld) + nc_dataset.createDimension('nobs_qty', point_obs.nobs_qty) + nc_dataset.createDimension('obs_var_num', point_obs.nobs_var) + + type_for_string = 'S1' # np.byte + dims_hdr = ('nhdr',) + dims_obs = ('nobs',) + + # Create header and observation variables + var_hdr_typ = nc_dataset.createVariable('hdr_typ', np.int32, dims_hdr, fill_value=-9999) + var_hdr_sid = nc_dataset.createVariable('hdr_sid', np.int32, dims_hdr, fill_value=-9999) + var_hdr_vld = nc_dataset.createVariable('hdr_vld', np.int32, dims_hdr, fill_value=-9999) + var_hdr_lat = nc_dataset.createVariable('hdr_lat', np.float32, dims_hdr, fill_value=-9999.) + var_hdr_lon = nc_dataset.createVariable('hdr_lon', np.float32, dims_hdr, fill_value=-9999.) + var_hdr_elv = nc_dataset.createVariable('hdr_elv', np.float32, dims_hdr, fill_value=-9999.) 
+ + var_obs_qty = nc_dataset.createVariable('obs_qty', np.int32, dims_obs, fill_value=-9999) + var_obs_hid = nc_dataset.createVariable('obs_hid', np.int32, dims_obs, fill_value=-9999) + var_obs_vid = nc_dataset.createVariable('obs_vid', np.int32, dims_obs, fill_value=-9999) + var_obs_lvl = nc_dataset.createVariable('obs_lvl', np.float32, dims_obs, fill_value=-9999.) + var_obs_hgt = nc_dataset.createVariable('obs_hgt', np.float32, dims_obs, fill_value=-9999.) + var_obs_val = nc_dataset.createVariable('obs_val', np.float32, dims_obs, fill_value=-9999.) + + if 0 == point_obs.npbhdr: + var_hdr_prpt_typ = None + var_hdr_irpt_typ = None + var_hdr_inst_typ = None + else: + dims_npbhdr = ('npbhdr',) + var_hdr_prpt_typ = nc_dataset.createVariable('hdr_prpt_typ', np.int32, dims_npbhdr, fill_value=-9999.) + var_hdr_irpt_typ = nc_dataset.createVariable('hdr_irpt_typ', np.int32, dims_npbhdr, fill_value=-9999.) + var_hdr_inst_typ = nc_dataset.createVariable('hdr_inst_typ', np.int32, dims_npbhdr, fill_value=-9999.) 
+ + var_hdr_typ_table = nc_dataset.createVariable('hdr_typ_table', type_for_string, ('nhdr_typ','mxstr2')) + var_hdr_sid_table = nc_dataset.createVariable('hdr_sid_table', type_for_string, ('nhdr_sid','mxstr2')) + var_hdr_vld_table = nc_dataset.createVariable('hdr_vld_table', type_for_string, ('nhdr_vld','mxstr')) + var_obs_qty_table = nc_dataset.createVariable('obs_qty_table', type_for_string, ('nobs_qty','mxstr')) + var_obs_var_table = nc_dataset.createVariable('obs_var', type_for_string, ('obs_var_num','mxstr2')) + var_obs_var_unit = nc_dataset.createVariable('obs_unit', type_for_string, ('obs_var_num','mxstr2')) + var_obs_var_desc = nc_dataset.createVariable('obs_desc', type_for_string, ('obs_var_num','mxstr3')) + + # Set variables + var_hdr_typ[:] = point_obs.hdr_typ[:] + var_hdr_sid[:] = point_obs.hdr_sid[:] + var_hdr_vld[:] = point_obs.hdr_vld[:] + var_hdr_lat[:] = point_obs.hdr_lat[:] + var_hdr_lon[:] = point_obs.hdr_lon[:] + var_hdr_elv[:] = point_obs.hdr_elv[:] + for i in range(0, point_obs.nhdr_typ): + for j in range(0, len(point_obs.hdr_typ_table[i])): + var_hdr_typ_table[i,j] = point_obs.hdr_typ_table[i][j] + for i in range(0, point_obs.nhdr_sid): + for j in range(0, len(point_obs.hdr_sid_table[i])): + var_hdr_sid_table[i,j] = point_obs.hdr_sid_table[i][j] + for i in range(0, point_obs.nhdr_vld): + for j in range(0, len(point_obs.hdr_vld_table[i])): + var_hdr_vld_table[i,j] = point_obs.hdr_vld_table[i][j] + if 0 < point_obs.npbhdr: + var_hdr_prpt_typ[:] = point_obs.hdr_prpt_typ[:] + var_hdr_irpt_typ[:] = point_obs.hdr_irpt_typ[:] + var_hdr_inst_typ[:] = point_obs.hdr_inst_typ[:] + + var_obs_qty[:] = point_obs.obs_qty[:] + var_obs_hid[:] = point_obs.obs_hid[:] + var_obs_vid[:] = point_obs.obs_vid[:] + var_obs_lvl[:] = point_obs.obs_lvl[:] + var_obs_hgt[:] = point_obs.obs_hgt[:] + var_obs_val[:] = point_obs.obs_val[:] + for i in range(0, point_obs.nobs_var): + for j in range(0, len(point_obs.obs_var_table[i])): + var_obs_var_table[i,j] = 
point_obs.obs_var_table[i][j] + var_obs_var_unit[i] = "" if i >= len(point_obs.obs_var_unit) else point_obs.obs_var_unit[i] + var_obs_var_desc[i] = "" if i >= len(point_obs.obs_var_desc) else point_obs.obs_var_desc[i] + for i in range(0, point_obs.nobs_qty): + for j in range(0, len(point_obs.obs_qty_table[i])): + var_obs_qty_table[i,j] = point_obs.obs_qty_table[i][j] + + # Set variable attributes + var_hdr_typ.long_name = "index of message type" + var_hdr_sid.long_name = "index of station identification" + var_hdr_vld.long_name = "index of valid time" + var_hdr_lat.long_name = "latitude" + var_hdr_lat.units = "degrees_north" + var_hdr_lon.long_name = "longitude" + var_hdr_lon.units = "degrees_east" + var_hdr_elv.long_name = "elevation" + var_hdr_elv.units = "meters above sea level (msl)" + + var_obs_qty.long_name = "index of quality flag" + var_obs_hid.long_name = "index of matching header data" + var_obs_vid.long_name = "index of BUFR variable corresponding to the observation type" + var_obs_lvl.long_name = "pressure level (hPa) or accumulation interval (sec)" + var_obs_hgt.long_name = "height in meters above sea level (msl)" + var_obs_val.long_name = "observation value" + var_hdr_typ_table.long_name = "message type" + var_hdr_sid_table.long_name = "station identification" + var_hdr_vld_table.long_name = "valid time" + var_hdr_vld_table.units = "YYYYMMDD_HHMMSS UTC" + var_obs_qty_table.long_name = "quality flag" + var_obs_var_table.long_name = "variable names" + var_obs_var_unit.long_name = "variable units" + var_obs_var_desc.long_name = "variable descriptions" + except: + print(f' === ERROR at {method_name} type(nc_dataset)={type(nc_dataset)} type(point_obs)={type(point_obs)}') + raise + + +def main(argv): + if len(argv) != 1 and argv[1] != ARG_PRINT_DATA: + netcdf_filename = argv[1] + tmp_nc_name = 'tmp_met_point.nc' + point_obs_data = nc_point_obs() + point_obs_data.read_data(point_obs_data.get_nc_filename(netcdf_filename)) + met_point_data = 
point_obs_data.save_ncfile(tmp_nc_name)
+        print(f'{met_point_tools.get_prompt()} saved met_point_data to {tmp_nc_name}')
+        met_point_data['met_point_data'] = point_obs_data
+
+        if DO_PRINT_DATA or ARG_PRINT_DATA == argv[-1]:
+            point_obs_data.print_point_data(met_point_data)
+
+if __name__ == '__main__':
+    import sys
+    main(sys.argv)
+    print('Done python script')
diff --git a/scripts/python/pyembed/read_tmp_point_nc.py b/scripts/python/pyembed/read_tmp_point_nc.py
index 622405c520..d3e0f13cb7 100644
--- a/scripts/python/pyembed/read_tmp_point_nc.py
+++ b/scripts/python/pyembed/read_tmp_point_nc.py
@@ -8,7 +8,7 @@
 
 import sys
 
-from met.point import met_point_tools
+from met.point import get_empty_point_obs, met_point_tools
 try:
     from python_embedding import pyembed_tools
 except:
@@ -19,8 +19,8 @@
     # read NetCDF file
     print('{p} reading {f}'.format(p=met_point_tools.get_prompt(), f=input_filename))
     try:
-        point_obs_data = met_point_tools.get_nc_point_obs()
-        point_obs_data.read_data(input_filename)
+        point_obs_data = get_empty_point_obs()
+        point_obs_data.read_point_data(input_filename)
         met_point_data = point_obs_data.get_point_data()
         met_point_data['met_point_data'] = point_obs_data
diff --git a/scripts/python/pyembed/write_tmp_point_nc.py b/scripts/python/pyembed/write_tmp_point_nc.py
index 41f380de77..fc828e3237 100644
--- a/scripts/python/pyembed/write_tmp_point_nc.py
+++ b/scripts/python/pyembed/write_tmp_point_nc.py
@@ -17,20 +17,24 @@
 except:
     from pyembed.python_embedding import pyembed_tools
 
-from met.point import met_point_tools
+try:
+    from point import get_empty_point_obs
+except:
+    from met.point import get_empty_point_obs
+
 if __name__ == '__main__':
     tmp_filename = sys.argv[1]
     met_in = pyembed_tools.call_python(sys.argv)
 
-    if hasattr(met_in, 'point_data'):
+    if hasattr(met_in, 'point_obs_data'):
+        met_in.point_obs_data.write_point_data(tmp_filename)
+    elif hasattr(met_in, 'point_data'):
         pyembed_tools.write_tmp_ascii(tmp_filename, met_in.point_data)
-    elif hasattr(met_in,
'point_obs_data'): - met_in.point_obs_data.save_ncfile(tmp_filename) else: if hasattr(met_in.met_point_data, 'point_obs_data'): - met_in.met_point_data['point_obs_data'].save_ncfile(tmp_filename) + met_in.met_point_data['point_obs_data'].write_point_data(tmp_filename) else: - tmp_point_obs = met_point_tools.get_nc_point_obs() + tmp_point_obs = get_empty_point_obs() tmp_point_obs.put_data(met_in.met_point_data) tmp_point_obs.save_ncfile(tmp_filename) diff --git a/src/basic/vx_config/temp_file.cc b/src/basic/vx_config/temp_file.cc index 98683a1c5b..85d5528a49 100644 --- a/src/basic/vx_config/temp_file.cc +++ b/src/basic/vx_config/temp_file.cc @@ -70,6 +70,14 @@ void remove_temp_file(const ConcatString file_name) { // // Attempt to remove the file and print out any error message // + const char *keep_temp = getenv("MET_KEEP_TEMP_FILE"); + if (nullptr != keep_temp + && (0 == strcmp(keep_temp, "true") || 0 == strcmp(keep_temp, "yes"))) { + mlog << Debug(2) << "The temporary file (" + << file_name << ") was not deleted. 
Please remove it manually\n\n";
+      return;
+   }
+
   if((errno = remove(file_name.c_str())) != 0) {
      mlog << Error << "\nremove_temp_file() -> "
           << "can't delete temporary file: \""
diff --git a/src/libcode/vx_analysis_util/stat_job.cc b/src/libcode/vx_analysis_util/stat_job.cc
index 515200d984..ea781a5fc0 100644
--- a/src/libcode/vx_analysis_util/stat_job.cc
+++ b/src/libcode/vx_analysis_util/stat_job.cc
@@ -1450,14 +1450,6 @@ void STATAnalysisJob::parse_job_command(const char *jobstring) {
          i+=2;
       }
       else if(jc_array[i] == "-set_hdr") {
-         n = METHdrTable.header(met_version, "STAT", na_str)->col_offset(to_upper(jc_array[i+1]).c_str());
-         if(is_bad_data(n)) {
-            mlog << Error << "\nSTATAnalysisJob::parse_job_command() -> "
-                 << "no match found for header column named: \""
-                 << to_upper((string)jc_array[i+1]) << "\"\n\n";
-            if(line) { delete [] line; line = (char *) 0; }
-            throw(1);
-         }
          hdr_name.add_css(to_upper(jc_array[i+1]));
          hdr_value.add_css(jc_array[i+2]);
          i+=2;
@@ -1650,6 +1642,28 @@ void STATAnalysisJob::parse_job_command(const char *jobstring) {
   } // end for
 
+   // Validate set_hdr column names
+   if(hdr_name.n() > 0) {
+
+      string lt_str = (line_type.n() == 1 ?
+                       line_type[0] : na_str);
+
+      for(i=0; i<hdr_name.n(); i++) {
+
+         n = METHdrTable.header(met_version, "STAT", lt_str.c_str())->
+             col_offset(hdr_name[i].c_str());
+
+         if(is_bad_data(n)) {
+            mlog << Error << "\nSTATAnalysisJob::parse_job_command() -> "
+                 << "no match found for "
+                 << (line_type.n() == 1 ?
line_type[0] : "header") + << " column named \"" << hdr_name[i] << "\"\n\n"; + if(line) { delete [] line; line = (char *) 0; } + throw(1); + } + } // end for + } + // Expand out_eclv_points if(out_eclv_points.n() == 1) { for(i=2; i*out_eclv_points[0] < 1.0; i++) out_eclv_points.add(i*out_eclv_points[0]); @@ -2130,7 +2144,8 @@ void STATAnalysisJob::close_stat_file() { //////////////////////////////////////////////////////////////////////// -void STATAnalysisJob::dump_stat_line(const STATLine &line) { +void STATAnalysisJob::dump_stat_line(const STATLine &line, + bool do_set_hdr) { int i; // @@ -2268,8 +2283,24 @@ void STATAnalysisJob::dump_stat_line(const STATLine &line) { // Store the data line // for(i=0; i
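The stat_job.cc hunks above move the `-set_hdr` column-name check out of option parsing and defer it until all job command options, including `-line_type`, are known. The following Python sketch illustrates that deferred-validation idea only: the column table, function name, and the intersection-based fallback for multiple line types are illustrative inventions, not MET's actual header table (MET falls back to the generic STAT header instead).

```python
# Illustrative sketch of deferred "-set_hdr" validation: collect the
# requested header overrides while parsing, then validate them once the
# requested line type(s) are known. STAT_COLUMNS is a made-up subset,
# not MET's real header table.

STAT_COLUMNS = {
    "CTC": ["VERSION", "MODEL", "DESC", "FCST_LEAD", "VX_MASK", "LINE_TYPE"],
    "SL1L2": ["VERSION", "MODEL", "DESC", "FCST_VALID_BEG", "LINE_TYPE"],
}

# Columns common to every line type, used when no single line type is requested
COMMON_COLUMNS = set.intersection(*(set(c) for c in STAT_COLUMNS.values()))

def validate_set_hdr(hdr_names, line_types):
    """Return the -set_hdr names that match no known column (empty = valid)."""
    # Mirror the C++ logic: use a specific line type's table only when
    # exactly one line type was requested; otherwise use a common set.
    if len(line_types) == 1:
        table = set(STAT_COLUMNS[line_types[0]])
    else:
        table = COMMON_COLUMNS
    return [n.upper() for n in hdr_names if n.upper() not in table]

print(validate_set_hdr(["model", "DESC"], ["CTC"]))   # []
print(validate_set_hdr(["FCST_VALID_BEG"], ["CTC"]))  # ['FCST_VALID_BEG']
```

As in the C++ change, overrides are uppercased before matching, so a job command can pass column names in any case.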