Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rz logging dedup #450

Closed
wants to merge 5 commits into from
Closed

Rz logging dedup #450

wants to merge 5 commits into from

Conversation

mgovorcin
Copy link
Collaborator

No description provided.

@pep8speaks
Copy link

Hello @mgovorcin! Thanks for opening this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 347:33: E712 comparison to False should be 'if cond is False:' or 'if not cond:'
Line 374:80: E501 line too long (82 > 79 characters)
Line 375:45: E226 missing whitespace around arithmetic operator
Line 379:80: E501 line too long (89 > 79 characters)
Line 388:80: E501 line too long (80 > 79 characters)
Line 1062:56: E226 missing whitespace around arithmetic operator
Line 1066:63: E226 missing whitespace around arithmetic operator
Line 1071:32: E226 missing whitespace around arithmetic operator
Line 1071:49: E226 missing whitespace around arithmetic operator
Line 1072:43: E226 missing whitespace around arithmetic operator
Line 1078:37: E226 missing whitespace around arithmetic operator
Line 1084:80: E501 line too long (82 > 79 characters)
Line 1092:80: E501 line too long (81 > 79 characters)
Line 1103:80: E501 line too long (81 > 79 characters)
Line 1451:5: E303 too many blank lines (2)
Line 1465:80: E501 line too long (80 > 79 characters)
Line 1472:80: E501 line too long (80 > 79 characters)
Line 1476:80: E501 line too long (80 > 79 characters)
Line 1490:13: E127 continuation line over-indented for visual indent

Line 255:17: E127 continuation line over-indented for visual indent
Line 1281:72: E502 the backslash is redundant between brackets

Line 11:1: E302 expected 2 blank lines, found 1
Line 48:5: E303 too many blank lines (2)

@sssangha
Copy link
Collaborator

sssangha commented Oct 22, 2024

I've found a couple of issues @rzinke

In the first run with ariaExtract.py I encounter this error, it looks like a log is expected to exist with all runs for ariaExtract.py, but the logic is worked out with ariaTSsetup.py. It seems that the logic you used for ariaTSsetup.py just needs to be extended to ariaExtract.py:

/u/trappist-r0/ssangha/conda_installation/miniforge/miniforge/ARIA-tools_dev/tools/ARIAtools/__init__.py:9: RuntimeWarning: ARIA-tools is not installed! Install in editable/develop mode via (from the top of this repo): python -m pip install -e .
  warnings.warn(
Traceback (most recent call last):
  File "/u/trappist-r0/ssangha/conda_installation/miniforge/miniforge/ARIA-tools_dev/tools/bin/ariaExtract.py", line 316, in <module>
	main()
  File "/u/trappist-r0/ssangha/conda_installation/miniforge/miniforge/ARIA-tools_dev/tools/bin/ariaExtract.py", line 210, in main
	proj, is_nisar_file) = ARIAtools.extractProduct.merged_productbbox(
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/u/trappist-r0/ssangha/conda_installation/miniforge/miniforge/ARIA-tools_dev/tools/ARIAtools/extractProduct.py", line 350, in merged_productbbox
	if run_log.get_current()['rerun'] == True:
   	^^^^^^^^^^^^^^^^^^^^^
  File "/u/trappist-r0/ssangha/conda_installation/miniforge/miniforge/ARIA-tools_dev/tools/ARIAtools/util/run_logging.py", line 109, in get_current
	current_run = log_names[0]
              	~~~~~~~~~^^^

Finally for this section, I don’t understand the different between product and files. How about we just simplify and consolidate these variables if possible, unless I'm missing some nuance?

When re-running with the same inputs, I see the ** Cropping only operation duplicated for multiple layers. This is happening because the current_bbox doesn’t seem to be the same size as the prev_bbox, as it’s not getting caught in this conditional. But upon closer inspection, it looks like this is happening because a ‘bbox’ isn’t being passed above. Why aren’t we tracking this with the run_log? E.g. run_log.update('bbox_file', bbox_file)

To replicate this, you can run:

ariaDownload.py -t 71 -b '37.5 39.5 -118.5 -114.5' -s 20200515 -e 20200601
ariaExtract.py -f 'products/*.nc' -l "unwrappedPhase,coherence" -w test_extract
ariaTSsetup.py -f 'products/*.nc' -w test_tssetup

@mgovorcin
Copy link
Collaborator Author

@rzinke added line by line suggestions

@mgovorcin mgovorcin requested a review from rzinke October 28, 2024 16:30
@mgovorcin mgovorcin marked this pull request as ready for review October 28, 2024 16:33
@@ -1260,9 +1268,24 @@ def __continuous_time__(self):
return sorted_products

def __run__(self):
# Grab list of already read GUNWs
log_data = self.run_log.load()
past_files = log_data['files'] if 'files' in log_data.keys() else []
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

past_files contains list of full_path to *.nc files stored in config/PICKLE. If args.imgs gets changed from producs-early/nc to products/.nc, it does not get recognized as already been done

Suggestions:
Extract basename for dedup check:
pst_files = [os.path.basename(f) for f in past_files]

existing_files_flag =[os.path.basename(f) in pst_files for f in self.files]
and add logging printout
LOGGER.info(f'ARIA-product: Skip reading {np.sum(existing_files_flag)} / {len(self.files)} products')

for f in self.files:
self.products += self.__readproduct__(f)
for file in self.files:
if file not in past_files:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use basename to compare with to avoid mixed-up in filepaths
if os.path.basename(file) not in pst_files:

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would add also:

if os.path.basename(file) not in pst_files:
    LOGGER.debug(f'ARIA-product: f'reading {file}')
    self.products += self.__readproduct__(file)
else:
    LOGGER.debug(f'ARIA-product: f' skipping {file}')```

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe include into you run_logging as check
when it loads PICKLE to run through files and check if they exist: os.path.exists if the path changed report warning

bounds=extent, dem_name=dem_name,
localize_tiles_to_gtiff=localize_tiles_to_gtiff,
tile_dir=f'{dem_name}_tiles')
# Check if DEM has already been downloaded
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rzinke are these changes already implemented in PR: #436
if so, complete that PR first

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as discussed add check of that prods_TOTbbox_metadatalyr have not changed between re-runs

@@ -427,6 +428,25 @@ def main():
LOGGER.debug("Using standard layers: %s" % ARIA_STANDARD_LAYERS)
args.layers = ','.join(ARIA_STANDARD_LAYERS)

# Establish log file
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed here, that with each run run_log content and related config jsons get overwritten, we should discuss how to make the booking what has changed between different runs, e.g. boundging box, skippped extraction products as exist or update with which update-mode etc..

@@ -332,8 +339,62 @@ def merged_productbbox(
if track_fileext.endswith('.h5'):
is_nisar_file = True

# Establish log file if it does not exist and load any data
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lines 342 to 396 seems not to do the intended work as they compare productBoundingBox.json with specified bbox, whereas final productBoundingBox.json gets stored later after checking common intersection with scene boundingboxs, the process that can alter the output

noticed this when did multiple runs with the same args, where code keep asking me to update bbox even though I expected to stay the same

@@ -323,6 +324,12 @@ def merged_productbbox(
report common track union to accurately interpolate metadata fields,
and expected shape for DEM.
"""
# Define total bounding box
prods_TOTbbox = os.path.join(workdir, 'productBoundingBox.json')
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rzinke
Suggestions:
Check if 'productBoundingBox.json' exists already (meaning there has been previous runs), copy it to the same dir with different name and compare at the end with re-calculated bounding box,

if os.path.exists(prods_TOTbbox):
    previous_rundate = logs_data['all_runtimes'][-1] # this needs to be added to run_log
    prods_TOTbbox_old = os.path.join(os.path.dirname(prods_TOTbbox), f'productBoundingBox_{previous_rundate}.json')
   LOGGER.debug(f' Previous run detected on {previous_rundate}, check if {prods_TOTbbox} changed')
   shutil.copyfile(prods_TOTbbox, prods_TOTbbox_old)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

repeat the same for 'productBoundingBox_croptounion_formetadatalyr.json' that is used to clip DEM and water mask

@@ -526,6 +585,25 @@ def merged_productbbox(
proj = ds.GetProjection()
ds = None

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add here check if the bounding box changed;

if os.path.exists(prods_TOTbbox_old):
    prev_bbox = ARIAtools.util.shp.open_shp(prods_TOTbbox_old)
    prev_area = ARIAtools.util.shp.shp_area(prev_bbox, lyr_proj)
    new_bbox = ARIAtools.util.shp.open_shp(prods_TOTbbox)
    new_area = ARIAtools.util.shp.shp_area(new_bbox, lyr_proj)
    overlap_ratio = new_area / prev_area
    if overlap_ratio != 1:
        LOGGER.debug(f'Product bbox changed in size from previous run {new_area} vs {prev_area}')
   if shapely.equals(new_poly, existing_poly):  # check if overlap_ratio == 1 and shapely.equals give same thing, is so, just merge it
            update_mode = 'skip'
            
   elif overlap_ratio < 1.0:
         # For smaller bbox
         # MG: prompt and use small bbox: eg.if difference is area bigger than 60% exit and report warning with new bbox, suggest deleting
          LOGGER.debug(f'Warning: Current bbox is {overlap_ratio:.2f} previous bbox.')
          use_larger = input('Use previous (larger) bbox? [y/n] ') # we need to discuss this on the next meeting
          update_mode = 'crop_only'
   else:
        update_mode = 'full_extract'
    run_log.update('update_mode', update_mode)```

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should discuss how to handle prompting

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the user bbox is larger than the initial bbox, that means the user knows what (s)he is doing, and extract from scratch.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should warn the user if the new bbox is much smaller than the old one.

  • Don't like prompting because it reduces automation.
  • One option: If the new bbox is way smaller than the previous one, kick out the products smaller than the previous bbox
  • If the user passes an explicit AOI, trust the user and use the new (smaller) bbox

run_log.update('arrres', arrres)
run_log.update('lyr_proj', lyr_proj)

# run_log.update('metadata_dict', metadata_dict)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove obsolete comments

osgeo.gdal.Warp(
outname, outname + '.vrt', options=warp_options)
if update_mode == 'skip' and os.path.exists(outname+'.vrt'):
print('** SKIP')
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep it as LOGGER.debug ? also add filename next to SKIP: {outname}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wanna include somewhere logger printout:
How many existing product code detected, does it need to do update mode on them and how many products need to be extracted:

Something like this... please modify it accordingly :

Update mode for existing products: skip
##################################################
Number of existing unwrappedPhase products 9/9
Number of existing coherence products 9/9
Number of existing incidenceAngle products 0/9
Number of existing azimuthAngle products 0/9

I got the above like this:

run_log = RunLog(workdir=args.workdir, verbose=True)
print(f'Update mode for existing products: \033[1m{run_log.load()['update_mode']}\033[0m')
print(50*'#')
for lyr in args.layers.split(','):
    existing_prod = np.count_nonzero([os.path.exists(os.path.join(f'/u/trappist-r0/govorcin/test/ARIA_tests/tests/test-same_aoi/{lyr}/', f['pair_name'][0] + '.vrt'))
                    for f in standardproduct_info.products[1]])

    print(f'Number of existing {lyr} products {existing_prod}/{len(standardproduct_info.products[1])}')

"""
"""
with open(self.log_name, 'rb') as log_file:
log_data = pickle.load(log_file)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for f in log_data['files']:
if ~os.path.exists(f):
print('WARNING, {f} path changed')

@mgovorcin mgovorcin closed this Nov 11, 2024
@mgovorcin
Copy link
Collaborator Author

Closing this PR, as it moved to #454

@rzinke rzinke deleted the rz_logging_dedup branch November 14, 2024 15:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants