Stability report #188

Open · benjimin opened this issue Jul 5, 2024 · 0 comments

Want to add automated testing/reporting that specifically assesses how consistent the output products are with past releases.

  • Use gdalcompare to compare pixels of sample WO (and other) datasets in the existing DEA collection against regenerated versions freshly derived from the ARD in the same collection (see the sketch below).
  • Have CI that summarises the metrics (count of non-identical pixels, and maximum value change) in a comment on each open PR.
  • Block merges pending completion of that CI.
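A minimal sketch of the comparison step, using GDAL's Python bindings directly (rather than the gdalcompare script) so that the two metrics above can be extracted explicitly. The file paths and the per-band loop are illustrative assumptions, not part of any existing alchemist code:

```python
# Sketch: per-pixel stability metrics between a reference ("golden") dataset
# and a freshly regenerated one. Paths are hypothetical placeholders.
import numpy as np
from osgeo import gdal

gdal.UseExceptions()

def stability_metrics(golden_path: str, new_path: str) -> dict:
    """Count non-identical pixels and the maximum value change, per band."""
    golden = gdal.Open(golden_path)
    new = gdal.Open(new_path)
    metrics = {}
    for i in range(1, golden.RasterCount + 1):
        a = golden.GetRasterBand(i).ReadAsArray().astype(np.float64)
        b = new.GetRasterBand(i).ReadAsArray().astype(np.float64)
        diff = np.abs(a - b)
        metrics[f"band_{i}"] = {
            "changed_pixels": int(np.count_nonzero(diff)),
            "max_value_change": float(diff.max()),
        }
    return metrics

if __name__ == "__main__":
    print(stability_metrics("golden.tif", "regenerated.tif"))
```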

The motivation: to maintain a plurality of operationalised alchemist pipelines efficiently, every pipeline should be kept on the most current stable release of alchemist. (This facilitates refactoring deployments to be modular, sharing rather than duplicating infracode, so as to prevent unintended divergence that leads to redundant debugging labour. It also allows upstream patches and API compatibility changes, particularly ODC schema changes, to be deployed in a timely manner; it is impracticable to backport hotfixes and provide long-term support for multiple release branches.) This requires a high level of assurance that continuously-integrated changes will not unintentionally affect scientific quality.

A pixel-wise comparison gives a high level of confidence that changes to orchestration have not altered scientific qualities. It is more useful as a report than as a test: it only assesses stability of the output, and cannot say which version is better. Sometimes output changes are expected (due to ARD reprocessing, or performance optimisations such as order-of-operations changes that produce scientifically insignificant floating-point discrepancies), so an arbitrary hard threshold (a blocking test) might promote habitual circumvention, whereas a score attached to each PR seems more likely to encourage consideration of its value before merging. A pixel-wise comparison also deliberately excludes noise associated with changes to the file header, compression, etc. (e.g., when the format driver version is incremented).
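A rough sketch of how CI could attach such a score to each open PR as a comment, using the GitHub REST API via `requests`. The environment variable names and report format here are assumptions for illustration, not an existing convention:

```python
# Sketch: post the stability metrics as a PR comment. GITHUB_TOKEN, REPO and
# PR_NUMBER are assumed to be supplied by the CI environment (hypothetical
# variable names, e.g. REPO="org/alchemist").
import json
import os

import requests

def post_stability_report(metrics: dict) -> None:
    repo = os.environ["REPO"]
    pr_number = os.environ["PR_NUMBER"]
    url = f"https://api.github.com/repos/{repo}/issues/{pr_number}/comments"
    body = "Stability report\n```json\n" + json.dumps(metrics, indent=2) + "\n```"
    response = requests.post(
        url,
        headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
        json={"body": body},
    )
    response.raise_for_status()
```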

(Note that since WO is a multidimensional categorical bitfield, the maximum value change is a poor metric, although it may be better suited to FC. In future, a more nodata-aware treatment could be considered, if a product such as WO has multiple values that represent nodata; see the sketch below.)
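For illustration, a nodata-aware pixel count might look like the following sketch. The specific values treated as nodata here are hypothetical placeholders; the real set would come from the WO product definition:

```python
# Sketch: count changed pixels while ignoring any pixel that is nodata in
# either version. The NODATA_VALUES set is a made-up placeholder.
import numpy as np

NODATA_VALUES = {1, 255}  # hypothetical bitfield values that all mean "no data"

def changed_valid_pixels(a: np.ndarray, b: np.ndarray) -> int:
    """Count pixels that differ, excluding pixels that are nodata in either array."""
    valid = ~np.isin(a, list(NODATA_VALUES)) & ~np.isin(b, list(NODATA_VALUES))
    return int(np.count_nonzero((a != b) & valid))
```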
