Stability report #188

Open · benjimin opened this issue Jul 5, 2024 · 0 comments

Want to add automated testing/reporting that specifically assesses how consistent the output products are with past releases.

  • Use gdalcompare to compare pixels of sample WO (and other) datasets in the existing DEA collection against regenerated versions freshly derived from the ARD in the same collection (see the sketch below).
  • Have CI that summarises the metrics (count of non-identical pixels, and maximum value change) in a comment on each open PR.
  • Block merges pending completion of that CI.
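A minimal sketch of the comparison step, using GDAL's Python bindings directly (rather than the gdalcompare script) so that the two metrics above can be extracted explicitly. The file paths and the per-band loop are illustrative assumptions, not part of any existing alchemist code:

```python
# Sketch: per-pixel stability metrics between a reference ("golden") dataset
# and a freshly regenerated one. Paths are hypothetical placeholders.
import numpy as np
from osgeo import gdal

gdal.UseExceptions()

def stability_metrics(golden_path: str, new_path: str) -> dict:
    """Count non-identical pixels and the maximum value change, per band."""
    golden = gdal.Open(golden_path)
    new = gdal.Open(new_path)
    metrics = {}
    for i in range(1, golden.RasterCount + 1):
        a = golden.GetRasterBand(i).ReadAsArray().astype(np.float64)
        b = new.GetRasterBand(i).ReadAsArray().astype(np.float64)
        diff = np.abs(a - b)
        metrics[f"band_{i}"] = {
            "changed_pixels": int(np.count_nonzero(diff)),
            "max_value_change": float(diff.max()),
        }
    return metrics

if __name__ == "__main__":
    print(stability_metrics("golden.tif", "regenerated.tif"))
```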

The motivation: to maintain a plurality of operationalised alchemist pipelines efficiently, every pipeline should be kept on the most current stable release of alchemist. (This facilitates refactoring deployments to be modular, sharing rather than duplicating infracode, so as to prevent unintended divergence that leads to redundant debugging labour. It also allows upstream patches and API compatibility changes, particularly ODC schema changes, to be deployed in a timely manner; it is impracticable to backport hotfixes and provide long-term support for multiple release branches.) This requires a high level of assurance that continuously-integrated changes will not unintentionally affect scientific quality.

A pixel-wise comparison gives a high level of confidence that changes to orchestration have not altered scientific qualities. It is more useful as a report than as a test: it only assesses stability of the output, and cannot say which version is better. Sometimes output changes are expected (due to ARD reprocessing, or performance optimisations such as order-of-operations changes that produce scientifically insignificant floating-point discrepancies), so an arbitrary hard threshold (a blocking test) might promote habitual circumvention, whereas a score attached to each PR seems more likely to encourage consideration of its value before merging. A pixel-wise comparison also deliberately excludes noise associated with changes to the file header, compression, etc. (e.g., when the format driver version is incremented).
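A rough sketch of how CI could attach such a score to each open PR as a comment, using the GitHub REST API via `requests`. The environment variable names and report format here are assumptions for illustration, not an existing convention:

```python
# Sketch: post the stability metrics as a PR comment. GITHUB_TOKEN, REPO and
# PR_NUMBER are assumed to be supplied by the CI environment (hypothetical
# variable names, e.g. REPO="org/alchemist").
import json
import os

import requests

def post_stability_report(metrics: dict) -> None:
    repo = os.environ["REPO"]
    pr_number = os.environ["PR_NUMBER"]
    url = f"https://api.github.com/repos/{repo}/issues/{pr_number}/comments"
    body = "Stability report\n```json\n" + json.dumps(metrics, indent=2) + "\n```"
    response = requests.post(
        url,
        headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
        json={"body": body},
    )
    response.raise_for_status()
```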

(Note that since WO is a multidimensional categorical bitfield, the maximum value change is a poor metric, although it may be better suited to FC. In future, a more nodata-aware treatment could be considered, if a product such as WO has multiple values that represent nodata; see the sketch below.)
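For illustration, a nodata-aware pixel count might look like the following sketch. The specific values treated as nodata here are hypothetical placeholders; the real set would come from the WO product definition:

```python
# Sketch: count changed pixels while ignoring any pixel that is nodata in
# either version. The NODATA_VALUES set is a made-up placeholder.
import numpy as np

NODATA_VALUES = {1, 255}  # hypothetical bitfield values that all mean "no data"

def changed_valid_pixels(a: np.ndarray, b: np.ndarray) -> int:
    """Count pixels that differ, excluding pixels that are nodata in either array."""
    valid = ~np.isin(a, list(NODATA_VALUES)) & ~np.isin(b, list(NODATA_VALUES))
    return int(np.count_nonzero((a != b) & valid))
```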
