Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FATES restart fix for long runs #2199

Merged
merged 11 commits into from
Nov 12, 2023
Merged

Conversation

rgknox
Copy link
Collaborator

@rgknox rgknox commented Oct 16, 2023

Description of changes

FATES had been calling a routine (canopy_structure()) during the restart read process. The purpose was to facilitate rebuilding the canopy. However, this had an inadvertent effect of changing results from base, because this procedure would not just rebuild, but modify the cohort compositions due to termination and promotion/demotion in the canopy. We realized that we could simply bypass this call, and add two variables to the restart file to achieve b4b restarts in tests.

This needs to be synchronized with: NGEET/fates#1098

Specific notes

Contributors other than yourself, if any: @mvdebolskiy identified the locus of the problem which enabled its fix, with team contributions and troubleshooting from @mvertens , @glemieux , @rosiealice and @ckoven

CTSM Issues Fixed (include github issue #):

NGEET/fates#1051

Are answers expected to change (and if so in what way)?

No, this is just a bug fix.

Any User Interface Changes (namelist or namelist defaults changes)? No

Testing performed, if any:

We constructed a new test that was simply long enough to enable more canopy development (at least that is the hypothesis). The key is to have a FATES ERS test that is an LM5 (5 month). However, the need for a long test would likely be circumvented if we had an initial condition file that had been generated after long simulation (ie at least a few months of cold start spinup, but longer would probably be better (ie years)).

@rgknox rgknox mentioned this pull request Oct 16, 2023
3 tasks
@ekluzek
Copy link
Collaborator

ekluzek commented Oct 16, 2023

@rgknox @glemieux @adrifoster @ckoven do we want to expedite bringing this in as it's a critical fix?

Externals_CLM.cfg Outdated Show resolved Hide resolved
Copy link
Collaborator

@ekluzek ekluzek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I request a small change for readability that won't be hard to bring in.

I'm really glad you were able to figure this out! Thanks to everyone who worked on this! @mvdebolskiy @glemieux @mvertens I know for sure...

With testing there are tests that will start passing that can be removed from the expected fails. It also sounds like you setup a special test to find this -- should that come in as well?

So the things I see:

  • Update FATES tag
  • Add keyword syntax to the .false./.true. calls.
  • Add a comment to the one that's different the .true. one.
  • Remove expected fails that now work (requires FATES tests to be run and this to be checked)
  • Add a new test for this

src/utils/clmfates_interfaceMod.F90 Outdated Show resolved Hide resolved
src/utils/clmfates_interfaceMod.F90 Outdated Show resolved Hide resolved
@rgknox
Copy link
Collaborator Author

rgknox commented Oct 17, 2023

Those all sound like great suggestions @ekluzek . I'm working through designing a new test right now. I had originally wanted to go all-out and create a finidat file to be used in the new test's initialization (which would allow us to encounter the salient model mechanics in a shorter run), but I'm going to circumvent that for now because that is a larger topic/issue. I'll check back in when I have a new test created. In short I'm hoping I can trigger the original error with an f10 grid and a few months. Ideally, I'd like to see a test similar to an ERS f10 Lm13 in there, if it does not take much longer than the other tests.

src/utils/clmfates_interfaceMod.F90 Outdated Show resolved Hide resolved
Copy link
Contributor

@mvdebolskiy mvdebolskiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a reminder to update externals, and I think it is good to go. Will just need to make an issue for adding a test that starts with really spun-up restart file.

@rgknox
Copy link
Collaborator Author

rgknox commented Nov 9, 2023

I tested a bunch of configurations to get a 25 month test that completes in a reasonable amount of time, I found that using the FatesColdNoComp testdef is fairly effective at doing this. I tested different pe layouts and found that with:
72 cores = 38 minutes
144 cores = 29 minutes
252 cores = 27 minutes
My take is that 144 core setup is the sweet spot and will move forward with this

@rgknox
Copy link
Collaborator Author

rgknox commented Nov 9, 2023

all aux_clm pass, izumi and cheyenne, preparing changelogs

@ekluzek ekluzek self-assigned this Nov 10, 2023
@ekluzek ekluzek added the FATES API update Changes to the FATES version that also REQUIRE an API change in CTSM label Nov 10, 2023
@ekluzek
Copy link
Collaborator

ekluzek commented Nov 10, 2023

@rgknox I realized this isn't pointing to a FATES tag. And it looks like the FATES tag hasn't been made yet. Could you do that and update the branch? Thanks...

@ekluzek
Copy link
Collaborator

ekluzek commented Nov 12, 2023

@rgknox I'm making the tag now. But, we will need you to rename the test directory to under the baselines for both Cheyenne and izumi to ctsm5.1.dev151. There were several directories and I wasn't sure which was the right one, and there seemed to be permission issues when I tried to copy them over. So if you could do that Monday that would be great.

@ekluzek
Copy link
Collaborator

ekluzek commented Nov 12, 2023

@rgknox also was the FATES test list run for this? I expect that some of the FATES expected fails, might now work? Or is that not the case? Monday would be good to look into this and it probably would be good to run the FATES test suite then as well. If more of the FATES test pass, we can mark them as working in the next tag. For now I'll check that thing off the list with a note about it.

@ekluzek ekluzek merged commit 2ff9c12 into ESCOMP:master Nov 12, 2023
2 checks passed
@glemieux glemieux linked an issue Nov 16, 2023 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FATES API update Changes to the FATES version that also REQUIRE an API change in CTSM
Projects
Status: Done (non release/external)
Development

Successfully merging this pull request may close these issues.

Exact restart problem with Fates
4 participants