Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CI workflow #1

Closed
aidanheerdegen opened this issue Nov 21, 2023 · 11 comments · Fixed by #2
Closed

Add CI workflow #1

aidanheerdegen opened this issue Nov 21, 2023 · 11 comments · Fixed by #2

Comments

@aidanheerdegen
Copy link
Member

aidanheerdegen commented Nov 21, 2023

We need a continuous integration workflow to run model tests on HPC infrastructure. This should be a reusable workflow/action as we need to add the same CI tests to other repositories.

This workflow would be triggered from pull requests that update configurations.

There would be levels of testing possible: the fastest and easiest is bit wise reproducibility testing where the model is run for a very short time and checksums in the outputs compared to checksums from a "ground truth" run.

The more time intensive testing would be running the model for longer and testing for performance degradation. Again this would involve extracting timing information from the logs and comparing against a "ground truth" run.

The mechanism for doing this was what this repo and issue is about: ACCESS-NRI/model-config-tests#83.

We also need a mechanism to update the "ground truth" when there is a known breaking change to the model.

In some cases those pull requests would be generated from new pre-release deployments.

These CI checks can't be run automatically from PRs due to possible security issues. So we either make the CI fire only on PRs to protected branches that limit who can make PRs, or we don't let the fire without an approval process.

@aidanheerdegen
Copy link
Member Author

aidanheerdegen commented Nov 23, 2023

This is my go at encapsulating this as a flow diagram

flowchart TD
subgraph "Repo: ACCESS-OM2"
pre-release-om2[Pre-Release ACCESS-OM2]
end
subgraph "Repo: ACCESS-OM2-configs"
PR["PR: Update model exe paths"]
CI["Run config on gadi"]
QReproduce{"Run reproduces?"}
QBumpMajorVersion{"Bump major version?"}
BumpMajorVersion["Increment major version
               and save new truth"]
QBumpMinorVersion{"Bump minor version?"}
BumpMinorVersion["Increment minor version"]
MergeConfig["Merge PR to main"]
QNewVersion{"New version?"}
ReleaseAndDeploy["Release and deploy to gadi"]
end
subgraph "Repo: ACCESS-OM2"
release-om2[Tag and Release ACCESS-OM2]
end
pre-release-om2 --> PR
PR --> CI --> QReproduce
QReproduce -- no --> QBumpMajorVersion
QBumpMajorVersion -- yes --> BumpMajorVersion
QBumpMajorVersion -- no --> PR
QReproduce -- yes --> QBumpMinorVersion
QBumpMinorVersion -- yes --> BumpMinorVersion
QBumpMinorVersion -- no --> MergeConfig
BumpMinorVersion --> MergeConfig
BumpMajorVersion --> MergeConfig --> QNewVersion
QNewVersion -- yes --> ReleaseAndDeploy
ReleaseAndDeploy --> release-om2
Loading

@aidanheerdegen
Copy link
Member Author

aidanheerdegen commented Nov 23, 2023

Code for above diagram is available if hit the "copy" button if you want to hack about with it (it is mermaid.js so just put mermaid after the 3 ticks to get it to render)

@CodeGat
Copy link
Contributor

CodeGat commented Nov 23, 2023

Under what circumstance would the run fail to reproduce, but we also don't want to bump to a new major version? What does the flow diagram mean when we are going back to update model exe paths?

@aidanheerdegen
Copy link
Member Author

Under what circumstance would the run fail to reproduce, but we also don't want to bump to a new major version?

If we wanted to change the model code so that it did reproduce. So the non-reproduction was accidental.

What does the flow diagram mean when we are going back to update model exe paths?

This shows the limitation of what I've done, because the PR to access-om2-configs should be a CI step, that we can make pass with updates to the code.

I'm not even sure how likely that is to be honest. Part of the reason for making this issue/diagram.

@aidanheerdegen
Copy link
Member Author

The decision was made to prepend release- to any released configuration to make it easier to protect those branches, and also to clearly signal that these are released versions of configurations.

There was some discussion if development versions of the same configuration should have dev- prepended to make branch protection rules cleaner, and CI logic easier to write. I don't know if this is a good idea or not. We're making branches with very long names to make CI easier, does it make life more difficult for users?

The design choice is to have the version of a configuration stored in a file (VERSION) in the repo itself. This would be a normal semantic version, with MAJOR version changes when configurations are no longer bitwise reproducible. GitHub tags would then be created which maps to this VERSION. Because tags are global across a repository, these tags would have to have the configuration name and the VERSION, e.g. release-1deg-jra55-iaf_1.1. CI would be responsible for generating these tags when PRs are merged.

It might be desirable to use / separators to create tag hierarchies, to make it more easily understandable, e.g.

release/1deg-jra55-iaf/1.1
release/1deg-jra55-iaf/2.0
release/1deg-jra55-ryf/1.0
release/1deg-jra55-ryf/1.1

The current approach uses branch names with an underscore (_) separator, because / might not play nicely with payu clone which uses branch-name when creating laboratory directories. It is possible to substitute _ for /, but sounds a bit hokey, and it is potentially confusing to use a different convention for branch names and tags.

There was the possibility of using / separators for branch names too, and alter payu so that it ignores anything "pathlike" in the branch name. Hrm.

@CodeGat
Copy link
Contributor

CodeGat commented Jan 17, 2024

I agree with the / separator between aspects of a tag. Although just to correct, we are currently using - as a separator, not _. For example, we are currently doing release-1deg_jra55_iaf-1.0.0.

@CodeGat
Copy link
Contributor

CodeGat commented Jan 25, 2024

So for now, will we continue with - for separators in tags? Or move to / separators? @jo-basevi will this affect anything in payu?

@jo-basevi
Copy link
Collaborator

So for now, will we continue with - for separators in tags? Or move to / separators? @jo-basevi will this affect anything in payu?

Currently with the latest version of payu, the model output directories use $control-$branchname-$uuid for naming sub-directories (where control is the name of the directory from which payu run was run in). I've checked and adding / will create directory trees.. Could add in option to payu to substitute / for -or _ in the branch names.

@aidanheerdegen
Copy link
Member Author

Important to make sure we're not confusing branch names with tag names. I know I was getting myself confused. / should be fine in tags I think, not great in branch names because of the aforementioned directory tree creation when payu uses a branch name in an experiment name.

@CodeGat
Copy link
Contributor

CodeGat commented Jan 28, 2024

Adding a comment regarding the latest update regarding checksums.

  • We won't be storing just the result of the CHECKSUM, but potentially the result of other tests, as well. So we will update the repo structure (for release-* and dev-* branches) to:
./testing
  |- checksum
  |  |- <potentially multiple checksums>
  |  `- test_report.xml
  |- performance
  |  |- <performance testing results>
  |  `- <maybe other stuff>
  |- <other tests>
  `- <even more tests>

Instead of just ./CHECKSUM.

Tagging @jo-basevi and @aidanheerdegen

@aidanheerdegen
Copy link
Member Author

So this doesn't get lost in resolved conversations

#2 (comment)

I did wonder if an alternate approach could be to simply tag the releases we want to do scheduled checks on? I have decided this is a bad idea: there is no audit trail for adding or deleting tags, or a way of checking and approving them. Also it would mean deleting tags. Also tags have to be unique, so we would end up with stuff like check-month-3 which is nasty.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants