Rz logging dedup #450

mgovorcin · 2024-10-21T20:40:35Z

No description provided.

pep8speaks · 2024-10-21T20:40:43Z

Hello @mgovorcin! Thanks for opening this PR. We checked the lines you've touched for PEP 8 issues, and found:

In the file tools/ARIAtools/extractProduct.py:

Line 347:33: E712 comparison to False should be 'if cond is False:' or 'if not cond:'
Line 374:80: E501 line too long (82 > 79 characters)
Line 375:45: E226 missing whitespace around arithmetic operator
Line 379:80: E501 line too long (89 > 79 characters)
Line 388:80: E501 line too long (80 > 79 characters)
Line 1062:56: E226 missing whitespace around arithmetic operator
Line 1066:63: E226 missing whitespace around arithmetic operator
Line 1071:32: E226 missing whitespace around arithmetic operator
Line 1071:49: E226 missing whitespace around arithmetic operator
Line 1072:43: E226 missing whitespace around arithmetic operator
Line 1078:37: E226 missing whitespace around arithmetic operator
Line 1084:80: E501 line too long (82 > 79 characters)
Line 1092:80: E501 line too long (81 > 79 characters)
Line 1103:80: E501 line too long (81 > 79 characters)
Line 1451:5: E303 too many blank lines (2)
Line 1465:80: E501 line too long (80 > 79 characters)
Line 1472:80: E501 line too long (80 > 79 characters)
Line 1476:80: E501 line too long (80 > 79 characters)
Line 1490:13: E127 continuation line over-indented for visual indent

In the file tools/ARIAtools/product.py:

Line 255:17: E127 continuation line over-indented for visual indent
Line 1281:72: E502 the backslash is redundant between brackets

In the file tools/ARIAtools/util/run_logging.py:

Line 11:1: E302 expected 2 blank lines, found 1
Line 48:5: E303 too many blank lines (2)

sssangha · 2024-10-22T20:48:22Z

I've found a couple of issues @rzinke

In the first run with ariaExtract.py I encounter this error, it looks like a log is expected to exist with all runs for ariaExtract.py, but the logic is worked out with ariaTSsetup.py. It seems that the logic you used for ariaTSsetup.py just needs to be extended to ariaExtract.py:

/u/trappist-r0/ssangha/conda_installation/miniforge/miniforge/ARIA-tools_dev/tools/ARIAtools/__init__.py:9: RuntimeWarning: ARIA-tools is not installed! Install in editable/develop mode via (from the top of this repo): python -m pip install -e .
  warnings.warn(
Traceback (most recent call last):
  File "/u/trappist-r0/ssangha/conda_installation/miniforge/miniforge/ARIA-tools_dev/tools/bin/ariaExtract.py", line 316, in <module>
	main()
  File "/u/trappist-r0/ssangha/conda_installation/miniforge/miniforge/ARIA-tools_dev/tools/bin/ariaExtract.py", line 210, in main
	proj, is_nisar_file) = ARIAtools.extractProduct.merged_productbbox(
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/u/trappist-r0/ssangha/conda_installation/miniforge/miniforge/ARIA-tools_dev/tools/ARIAtools/extractProduct.py", line 350, in merged_productbbox
	if run_log.get_current()['rerun'] == True:
   	^^^^^^^^^^^^^^^^^^^^^
  File "/u/trappist-r0/ssangha/conda_installation/miniforge/miniforge/ARIA-tools_dev/tools/ARIAtools/util/run_logging.py", line 109, in get_current
	current_run = log_names[0]
              	~~~~~~~~~^^^

Finally for this section, I don’t understand the different between product and files. How about we just simplify and consolidate these variables if possible, unless I'm missing some nuance?

When re-running with the same inputs, I see the ** Cropping only operation duplicated for multiple layers. This is happening because the current_bbox doesn’t seem to be the same size as the prev_bbox, as it’s not getting caught in this conditional. But upon closer inspection, it looks like this is happening because a ‘bbox’ isn’t being passed above. Why aren’t we tracking this with the run_log? E.g. run_log.update('bbox_file', bbox_file)

To replicate this, you can run:

ariaDownload.py -t 71 -b '37.5 39.5 -118.5 -114.5' -s 20200515 -e 20200601
ariaExtract.py -f 'products/*.nc' -l "unwrappedPhase,coherence" -w test_extract
ariaTSsetup.py -f 'products/*.nc' -w test_tssetup

mgovorcin · 2024-10-22T20:50:08Z

@rzinke added line by line suggestions

mgovorcin · 2024-10-22T20:16:07Z

tools/ARIAtools/product.py

@@ -1260,9 +1268,24 @@ def __continuous_time__(self):
        return sorted_products

    def __run__(self):
+        # Grab list of already read GUNWs
+        log_data = self.run_log.load()
+        past_files = log_data['files'] if 'files' in log_data.keys() else []


past_files contains list of full_path to *.nc files stored in config/PICKLE. If args.imgs gets changed from producs-early/nc to products/.nc, it does not get recognized as already been done

Suggestions:
Extract basename for dedup check:
pst_files = [os.path.basename(f) for f in past_files]

existing_files_flag =[os.path.basename(f) in pst_files for f in self.files]
and add logging printout
LOGGER.info(f'ARIA-product: Skip reading {np.sum(existing_files_flag)} / {len(self.files)} products')

mgovorcin · 2024-10-22T20:17:28Z

tools/ARIAtools/product.py

-        for f in self.files:
-            self.products += self.__readproduct__(f)
+        for file in self.files:
+            if file not in past_files:


use basename to compare with to avoid mixed-up in filepaths
if os.path.basename(file) not in pst_files:

Would add also:

if os.path.basename(file) not in pst_files: LOGGER.debug(f'ARIA-product: f'reading {file}') self.products += self.__readproduct__(file) else: LOGGER.debug(f'ARIA-product: f' skipping {file}')```

maybe include into you run_logging as check
when it loads PICKLE to run through files and check if they exist: os.path.exists if the path changed report warning

mgovorcin · 2024-10-22T20:21:03Z

tools/ARIAtools/util/dem.py

-        bounds=extent, dem_name=dem_name,
-        localize_tiles_to_gtiff=localize_tiles_to_gtiff,
-        tile_dir=f'{dem_name}_tiles')
+    # Check if DEM has already been downloaded


@rzinke are these changes already implemented in PR: #436
if so, complete that PR first

as discussed add check of that prods_TOTbbox_metadatalyr have not changed between re-runs

mgovorcin · 2024-10-22T20:24:39Z

tools/bin/ariaTSsetup.py

@@ -427,6 +428,25 @@ def main():
        LOGGER.debug("Using standard layers: %s" % ARIA_STANDARD_LAYERS)
        args.layers = ','.join(ARIA_STANDARD_LAYERS)

+    # Establish log file


I noticed here, that with each run run_log content and related config jsons get overwritten, we should discuss how to make the booking what has changed between different runs, e.g. boundging box, skippped extraction products as exist or update with which update-mode etc..

mgovorcin · 2024-10-22T20:27:48Z

tools/ARIAtools/extractProduct.py

@@ -332,8 +339,62 @@ def merged_productbbox(
    if track_fileext.endswith('.h5'):
        is_nisar_file = True

+    # Establish log file if it does not exist and load any data


Lines 342 to 396 seems not to do the intended work as they compare productBoundingBox.json with specified bbox, whereas final productBoundingBox.json gets stored later after checking common intersection with scene boundingboxs, the process that can alter the output

noticed this when did multiple runs with the same args, where code keep asking me to update bbox even though I expected to stay the same

mgovorcin · 2024-10-22T20:33:14Z

tools/ARIAtools/extractProduct.py

@@ -323,6 +324,12 @@ def merged_productbbox(
    report common track union to accurately interpolate metadata fields,
    and expected shape for DEM.
    """
+    # Define total bounding box
+    prods_TOTbbox = os.path.join(workdir, 'productBoundingBox.json')


@rzinke
Suggestions:
Check if 'productBoundingBox.json' exists already (meaning there has been previous runs), copy it to the same dir with different name and compare at the end with re-calculated bounding box,

if os.path.exists(prods_TOTbbox): previous_rundate = logs_data['all_runtimes'][-1] # this needs to be added to run_log prods_TOTbbox_old = os.path.join(os.path.dirname(prods_TOTbbox), f'productBoundingBox_{previous_rundate}.json') LOGGER.debug(f' Previous run detected on {previous_rundate}, check if {prods_TOTbbox} changed') shutil.copyfile(prods_TOTbbox, prods_TOTbbox_old)

repeat the same for 'productBoundingBox_croptounion_formetadatalyr.json' that is used to clip DEM and water mask

mgovorcin · 2024-10-22T20:40:49Z

tools/ARIAtools/extractProduct.py

@@ -526,6 +585,25 @@ def merged_productbbox(
        proj = ds.GetProjection()
        ds = None



Add here check if the bounding box changed;

if os.path.exists(prods_TOTbbox_old): prev_bbox = ARIAtools.util.shp.open_shp(prods_TOTbbox_old) prev_area = ARIAtools.util.shp.shp_area(prev_bbox, lyr_proj) new_bbox = ARIAtools.util.shp.open_shp(prods_TOTbbox) new_area = ARIAtools.util.shp.shp_area(new_bbox, lyr_proj) overlap_ratio = new_area / prev_area if overlap_ratio != 1: LOGGER.debug(f'Product bbox changed in size from previous run {new_area} vs {prev_area}') if shapely.equals(new_poly, existing_poly): # check if overlap_ratio == 1 and shapely.equals give same thing, is so, just merge it update_mode = 'skip' elif overlap_ratio < 1.0: # For smaller bbox # MG: prompt and use small bbox: eg.if difference is area bigger than 60% exit and report warning with new bbox, suggest deleting LOGGER.debug(f'Warning: Current bbox is {overlap_ratio:.2f} previous bbox.') use_larger = input('Use previous (larger) bbox? [y/n] ') # we need to discuss this on the next meeting update_mode = 'crop_only' else: update_mode = 'full_extract' run_log.update('update_mode', update_mode)```

we should discuss how to handle prompting

If the user bbox is larger than the initial bbox, that means the user knows what (s)he is doing, and extract from scratch.

We should warn the user if the new bbox is much smaller than the old one.

Don't like prompting because it reduces automation.

One option: If the new bbox is way smaller than the previous one, kick out the products smaller than the previous bbox

If the user passes an explicit AOI, trust the user and use the new (smaller) bbox

mgovorcin · 2024-10-22T20:43:21Z

tools/ARIAtools/extractProduct.py

+    run_log.update('arrres', arrres)
+    run_log.update('lyr_proj', lyr_proj)
+
+    # run_log.update('metadata_dict', metadata_dict)


remove obsolete comments

mgovorcin · 2024-10-22T20:44:15Z

tools/ARIAtools/extractProduct.py

-                osgeo.gdal.Warp(
-                    outname, outname + '.vrt', options=warp_options)
+    if update_mode == 'skip' and os.path.exists(outname+'.vrt'):
+        print('** SKIP')


keep it as LOGGER.debug ? also add filename next to SKIP: {outname}

Wanna include somewhere logger printout:
How many existing product code detected, does it need to do update mode on them and how many products need to be extracted:

Something like this... please modify it accordingly :

Update mode for existing products: skip ################################################## Number of existing unwrappedPhase products 9/9 Number of existing coherence products 9/9 Number of existing incidenceAngle products 0/9 Number of existing azimuthAngle products 0/9

I got the above like this:

run_log = RunLog(workdir=args.workdir, verbose=True) print(f'Update mode for existing products: \033[1m{run_log.load()['update_mode']}\033[0m') print(50*'#') for lyr in args.layers.split(','): existing_prod = np.count_nonzero([os.path.exists(os.path.join(f'/u/trappist-r0/govorcin/test/ARIA_tests/tests/test-same_aoi/{lyr}/', f['pair_name'][0] + '.vrt')) for f in standardproduct_info.products[1]]) print(f'Number of existing {lyr} products {existing_prod}/{len(standardproduct_info.products[1])}')

mgovorcin · 2024-10-28T17:03:21Z

tools/ARIAtools/util/run_logging.py

+        """
+        """
+        with open(self.log_name, 'rb') as log_file:
+            log_data = pickle.load(log_file)


for f in log_data['files']:
if ~os.path.exists(f):
print('WARNING, {f} path changed')

mgovorcin · 2024-11-11T19:16:48Z

Closing this PR, as it moved to #454

rzinke added 5 commits October 3, 2024 09:25

Try out logging routines.

f050575

prelim dedup measures implemented

3f10688

Simpler deduplication check

219314f

More compact dedup

91929b0

Moved bbox check to merged_productBBox function

619e180

mgovorcin requested a review from rzinke October 28, 2024 16:30

mgovorcin marked this pull request as ready for review October 28, 2024 16:33

mgovorcin commented Oct 28, 2024

View reviewed changes

mgovorcin closed this Nov 11, 2024

rzinke deleted the rz_logging_dedup branch November 14, 2024 15:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rz logging dedup #450

Rz logging dedup #450

mgovorcin commented Oct 21, 2024

pep8speaks commented Oct 21, 2024

sssangha commented Oct 22, 2024 •

edited

Loading

mgovorcin commented Oct 22, 2024

mgovorcin Oct 22, 2024

mgovorcin Oct 22, 2024

mgovorcin Oct 22, 2024

mgovorcin Oct 22, 2024

mgovorcin Oct 22, 2024

mgovorcin Oct 22, 2024

mgovorcin Oct 22, 2024

mgovorcin Oct 22, 2024

mgovorcin Oct 22, 2024

mgovorcin Oct 22, 2024

mgovorcin Oct 22, 2024

mgovorcin Oct 22, 2024

rzinke Oct 29, 2024

rzinke Oct 29, 2024

mgovorcin Oct 22, 2024

mgovorcin Oct 22, 2024

mgovorcin Oct 22, 2024

mgovorcin Oct 28, 2024

mgovorcin commented Nov 11, 2024

		@@ -526,6 +585,25 @@ def merged_productbbox(
		proj = ds.GetProjection()
		ds = None

Rz logging dedup #450

Rz logging dedup #450

Conversation

mgovorcin commented Oct 21, 2024

pep8speaks commented Oct 21, 2024

sssangha commented Oct 22, 2024 • edited Loading

mgovorcin commented Oct 22, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mgovorcin commented Nov 11, 2024

sssangha commented Oct 22, 2024 •

edited

Loading