Rz logging dedup #450
Conversation
Hello @mgovorcin! Thanks for opening this PR. We checked the lines you've touched for PEP 8 issues, and found:
I've found a couple of issues @rzinke In the first run with
Finally for this section, I don’t understand the difference between When re-running with the same inputs, I see the To replicate this, you can run:
@rzinke added line-by-line suggestions
@@ -1260,9 +1268,24 @@ def __continuous_time__(self):
        return sorted_products

    def __run__(self):
        # Grab list of already read GUNWs
        log_data = self.run_log.load()
        past_files = log_data['files'] if 'files' in log_data.keys() else []
past_files contains a list of full paths to the *.nc files, stored in config/PICKLE. If args.imgs is changed from producs-early/nc to products/.nc, the files are not recognized as already processed.
Suggestions:
Extract the basename for the dedup check:
    pst_files = [os.path.basename(f) for f in past_files]
    existing_files_flag = [os.path.basename(f) in pst_files for f in self.files]
and add a logging printout:
    LOGGER.info(f'ARIA-product: Skip reading {np.sum(existing_files_flag)} / {len(self.files)} products')
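The basename comparison suggested above can be sketched in isolation. This is a minimal illustration, not the ARIA-tools implementation: `split_new_and_seen` is a hypothetical helper name and the file names are made up for the example.

```python
import os


def split_new_and_seen(files, past_files):
    """Split files into (new, seen) by comparing basenames only."""
    # A set gives fast membership tests and ignores directory changes.
    seen_names = {os.path.basename(f) for f in past_files}
    new, seen = [], []
    for f in files:
        if os.path.basename(f) in seen_names:
            seen.append(f)
        else:
            new.append(f)
    return new, seen


# Hypothetical inputs: the same products moved to a different directory.
past = ['products-early/a.nc', 'products-early/b.nc']
current = ['products/a.nc', 'products/b.nc', 'products/c.nc']
new, seen = split_new_and_seen(current, past)
print(f'Skip reading {len(seen)} / {len(current)} products')
```

Comparing basenames assumes product file names are unique across directories; if two different products could share a name, a content hash would be a safer key.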
for f in self.files:
    self.products += self.__readproduct__(f)
for file in self.files:
    if file not in past_files:
Use the basename for the comparison to avoid mix-ups in file paths:
    if os.path.basename(file) not in pst_files:
I would also add:
    if os.path.basename(file) not in pst_files:
        LOGGER.debug(f'ARIA-product: reading {file}')
        self.products += self.__readproduct__(file)
    else:
        LOGGER.debug(f'ARIA-product: skipping {file}')
Maybe include this in your run_logging as a check: when it loads the PICKLE, run through the files and check whether they exist with os.path.exists. If a path has changed, report a warning.
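A minimal sketch of such a check, assuming the log stores a plain list of file paths. `find_missing_paths` is a hypothetical helper, not part of the existing run log class.

```python
import logging
import os

LOGGER = logging.getLogger(__name__)


def find_missing_paths(logged_files):
    """Return the logged paths that no longer exist on disk, warning for each."""
    missing = [f for f in logged_files if not os.path.exists(f)]
    for f in missing:
        LOGGER.warning(
            'Logged product %s not found; its path may have changed', f)
    return missing
```

Returning the missing paths, rather than only printing, lets the caller decide whether to fall back to a basename match or to re-read the product.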
bounds=extent, dem_name=dem_name,
localize_tiles_to_gtiff=localize_tiles_to_gtiff,
tile_dir=f'{dem_name}_tiles')
# Check if DEM has already been downloaded
As discussed, add a check that prods_TOTbbox_metadatalyr has not changed between re-runs.
@@ -427,6 +428,25 @@ def main():
LOGGER.debug("Using standard layers: %s" % ARIA_STANDARD_LAYERS)
args.layers = ','.join(ARIA_STANDARD_LAYERS)

# Establish log file
I noticed here that with each run the run_log content and the related config JSONs get overwritten. We should discuss how to keep a record of what has changed between runs, e.g. the bounding box, products skipped during extraction because they already exist, products updated and with which update mode, etc.
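One possible way to keep such a record is to append an entry per run instead of overwriting. The sketch below is only an illustration under stated assumptions: the JSON history file, the `append_run_record` helper, and the record keys are all hypothetical and not part of the current run log.

```python
import json
import os
from datetime import datetime, timezone


def append_run_record(history_path, record):
    """Append one run record to a JSON history file; return the full history."""
    history = []
    if os.path.exists(history_path):
        with open(history_path) as f:
            history = json.load(f)
    # Stamp each entry so consecutive runs can be compared later.
    entry = dict(record, runtime=datetime.now(timezone.utc).isoformat())
    history.append(entry)
    with open(history_path, 'w') as f:
        json.dump(history, f, indent=2)
    return history
```

With a history like this, the last entry's runtime could also supply the previous run date that is needed when backing up productBoundingBox.json.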
@@ -332,8 +339,62 @@ def merged_productbbox(
if track_fileext.endswith('.h5'):
    is_nisar_file = True

# Establish log file if it does not exist and load any data
Lines 342 to 396 do not seem to do the intended work: they compare productBoundingBox.json with the specified bbox, whereas the final productBoundingBox.json is stored later, after checking the common intersection with the scene bounding boxes, a process that can alter the output.
I noticed this when doing multiple runs with the same args: the code kept asking me to update the bbox even though I expected it to stay the same.
@@ -323,6 +324,12 @@ def merged_productbbox(
report common track union to accurately interpolate metadata fields,
and expected shape for DEM.
"""
# Define total bounding box
prods_TOTbbox = os.path.join(workdir, 'productBoundingBox.json')
@rzinke
Suggestions:
Check whether 'productBoundingBox.json' already exists (meaning there have been previous runs), copy it to the same directory under a different name, and compare it at the end with the recalculated bounding box:
    if os.path.exists(prods_TOTbbox):
        previous_rundate = logs_data['all_runtimes'][-1]  # this needs to be added to run_log
        prods_TOTbbox_old = os.path.join(os.path.dirname(prods_TOTbbox), f'productBoundingBox_{previous_rundate}.json')
        LOGGER.debug(f'Previous run detected on {previous_rundate}, check if {prods_TOTbbox} changed')
        shutil.copyfile(prods_TOTbbox, prods_TOTbbox_old)
Repeat the same for 'productBoundingBox_croptounion_formetadatalyr.json', which is used to clip the DEM and water mask.
@@ -526,6 +585,25 @@ def merged_productbbox(
proj = ds.GetProjection()
ds = None
Add a check here for whether the bounding box changed:
    if os.path.exists(prods_TOTbbox_old):
        prev_bbox = ARIAtools.util.shp.open_shp(prods_TOTbbox_old)
        prev_area = ARIAtools.util.shp.shp_area(prev_bbox, lyr_proj)
        new_bbox = ARIAtools.util.shp.open_shp(prods_TOTbbox)
        new_area = ARIAtools.util.shp.shp_area(new_bbox, lyr_proj)
        overlap_ratio = new_area / prev_area
        if overlap_ratio != 1:
            LOGGER.debug(f'Product bbox changed in size from previous run: {new_area} vs {prev_area}')
        # Check whether overlap_ratio == 1 and shapely.equals give the same result; if so, merge the two checks
        if shapely.equals(new_poly, existing_poly):
            update_mode = 'skip'
        elif overlap_ratio < 1.0:
            # Smaller bbox
            # MG: prompt and use the small bbox, e.g. if the difference in area is bigger than 60%, exit and report a warning with the new bbox, suggest deleting
            LOGGER.debug(f'Warning: Current bbox is {overlap_ratio:.2f} of previous bbox.')
            use_larger = input('Use previous (larger) bbox? [y/n] ')  # we need to discuss this at the next meeting
            update_mode = 'crop_only'
        else:
            update_mode = 'full_extract'
        run_log.update('update_mode', update_mode)
we should discuss how to handle prompting
If the user's bbox is larger than the initial bbox, assume the user knows what they are doing and extract from scratch.
We should warn the user if the new bbox is much smaller than the old one.
- Don't like prompting because it reduces automation.
- One option: If the new bbox is way smaller than the previous one, kick out the products smaller than the previous bbox
- If the user passes an explicit AOI, trust the user and use the new (smaller) bbox
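The points above could be combined into a non-interactive policy along these lines. This is only a sketch to support the discussion, not the agreed behaviour: `choose_update_mode`, the `user_supplied_aoi` flag, and the 0.4 threshold (new area under 40% of the previous one, i.e. a difference larger than 60%) are assumptions, and bounding boxes are simplified to axis-aligned `(west, south, east, north)` tuples instead of shapely geometries.

```python
def bbox_area(bbox):
    """Area of an axis-aligned bbox given as (west, south, east, north)."""
    west, south, east, north = bbox
    return max(east - west, 0.0) * max(north - south, 0.0)


def choose_update_mode(prev_bbox, new_bbox, user_supplied_aoi=False,
                       warn_ratio=0.4):
    """Return (update_mode, warning) without prompting the user."""
    if tuple(prev_bbox) == tuple(new_bbox):
        return 'skip', None
    prev_area = bbox_area(prev_bbox)
    if prev_area == 0:
        # Nothing usable from the previous run; start over.
        return 'full_extract', None
    ratio = bbox_area(new_bbox) / prev_area
    if ratio >= 1.0:
        # Same-size-but-different or larger bbox: extract from scratch.
        return 'full_extract', None
    warning = None
    if ratio < warn_ratio and not user_supplied_aoi:
        # Warn instead of prompting so automated runs are not blocked.
        warning = (f'New bbox covers only {ratio:.2f} of the previous bbox '
                   f'area; pass an explicit AOI to confirm')
    return 'crop_only', warning
```

An area ratio alone does not detect a bbox that has shifted while shrinking, so a real implementation would probably also need a containment check on the geometries.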
run_log.update('arrres', arrres)
run_log.update('lyr_proj', lyr_proj)

# run_log.update('metadata_dict', metadata_dict)
remove obsolete comments
osgeo.gdal.Warp(
    outname, outname + '.vrt', options=warp_options)
if update_mode == 'skip' and os.path.exists(outname+'.vrt'):
    print('** SKIP')
Keep it as LOGGER.debug? Also add the filename next to SKIP: {outname}
It would be good to include a logger printout somewhere reporting how many existing products the code detected, which update mode applies to them, and how many products still need to be extracted.
Something like this (please modify it accordingly):
    Update mode for existing products: skip
    ##################################################
    Number of existing unwrappedPhase products 9/9
    Number of existing coherence products 9/9
    Number of existing incidenceAngle products 0/9
    Number of existing azimuthAngle products 0/9
I got the above like this:
    run_log = RunLog(workdir=args.workdir, verbose=True)
    # Double quotes outside so the nested ['update_mode'] key parses on Python < 3.12
    print(f"Update mode for existing products: \033[1m{run_log.load()['update_mode']}\033[0m")
    print(50*'#')
    for lyr in args.layers.split(','):
        existing_prod = np.count_nonzero(
            [os.path.exists(os.path.join(f'/u/trappist-r0/govorcin/test/ARIA_tests/tests/test-same_aoi/{lyr}/', f['pair_name'][0] + '.vrt'))
             for f in standardproduct_info.products[1]])
        print(f'Number of existing {lyr} products {existing_prod}/{len(standardproduct_info.products[1])}')
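The same summary can be produced without a hard-coded path. The following is a minimal sketch, assuming each layer's products live in `<workdir>/<layer>/<pair_name>.vrt` as in the snippet above; `count_existing_products` is a hypothetical helper and its arguments are illustrative.

```python
import os


def count_existing_products(workdir, layers, pair_names, ext='.vrt'):
    """Return {layer: (n_existing, n_total)} for products under workdir/layer."""
    summary = {}
    for lyr in layers:
        n_existing = sum(
            os.path.exists(os.path.join(workdir, lyr, name + ext))
            for name in pair_names)
        summary[lyr] = (n_existing, len(pair_names))
    return summary
```

The returned dictionary can then be passed to LOGGER.info line by line, which keeps the counting logic separate from the formatting.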
"""
"""
with open(self.log_name, 'rb') as log_file:
    log_data = pickle.load(log_file)
for f in log_data['files']:
    # Use `not`: `~` is a bitwise NOT and is always truthy on a bool
    if not os.path.exists(f):
        print(f'WARNING, {f} path changed')
Closing this PR, as it moved to #454