FATES restart fix for long runs #2199

rgknox · 2023-10-16T14:43:41Z

Description of changes

FATES had been calling a routine (canopy_structure()) during the restart read process to facilitate rebuilding the canopy. This had the inadvertent effect of changing results relative to the base run, because the procedure does not just rebuild the canopy: it also modifies the cohort composition through termination and promotion/demotion in the canopy. We realized that we could simply bypass this call and add two variables to the restart file to achieve bit-for-bit (b4b) restarts in tests.
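The mechanism described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not FATES code: the `site` dictionary, the growth rates, the 0.5 termination threshold, and the "demote at most one cohort per call" rule are all hypothetical, chosen so that an extra call to the restructuring routine changes the state. In this toy the restart "file" already stores the layer assignment, loosely standing in for the extra restart variables mentioned above.

```python
import copy

def canopy_structure(site, max_canopy=2):
    """Toy stand-in for a restructuring routine (hypothetical rules).

    It terminates tiny cohorts and demotes at most one cohort per call,
    so an extra call can change the state.
    """
    keep = [i for i, size in enumerate(site["size"]) if size >= 0.5]
    site["size"] = [site["size"][i] for i in keep]
    site["layer"] = [site["layer"][i] for i in keep]
    top = [i for i, layer in enumerate(site["layer"]) if layer == 1]
    if len(top) > max_canopy:
        smallest = min(top, key=lambda i: site["size"][i])
        site["layer"][smallest] = 2

def daily_step(site):
    """Grow canopy cohorts, shrink understory cohorts, then restructure."""
    site["size"] = [
        size * (1.1 if layer == 1 else 0.9)
        for size, layer in zip(site["size"], site["layer"])
    ]
    canopy_structure(site)

def run(site, ndays):
    for _ in range(ndays):
        daily_step(site)
    return site

def restart(saved, call_canopy_structure):
    """Read a 'restart file' (a deep copy); optionally restructure on read."""
    site = copy.deepcopy(saved)
    if call_canopy_structure:
        canopy_structure(site)
    return site

initial = {"size": [4.0, 3.0, 2.0, 1.0, 0.6], "layer": [1, 1, 1, 1, 1]}

baseline = run(copy.deepcopy(initial), 4)   # uninterrupted 4-day run
halfway = run(copy.deepcopy(initial), 2)    # state written at day 2
old_behavior = run(restart(halfway, call_canopy_structure=True), 2)
new_behavior = run(restart(halfway, call_canopy_structure=False), 2)

print(new_behavior == baseline)  # True: restart is bit-for-bit
print(old_behavior == baseline)  # False: the extra call demoted a cohort early
```

In the toy, the restart only diverges while the canopy is still reorganizing, which is consistent with the idea that a test needs enough canopy development to expose the problem.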

This needs to be synchronized with: NGEET/fates#1098

Specific notes

Contributors other than yourself, if any: @mvdebolskiy identified the locus of the problem, which enabled its fix, with team contributions and troubleshooting from @mvertens, @glemieux, @rosiealice and @ckoven

CTSM Issues Fixed (include github issue #):

NGEET/fates#1051

Are answers expected to change (and if so in what way)?

No, this is just a bug fix.

Any User Interface Changes (namelist or namelist defaults changes)? No

Testing performed, if any:

We constructed a new test that is simply long enough to allow more canopy development (at least that is the hypothesis). The key is to have a FATES ERS test that is an Lm5 (5-month) run. However, the need for a long test would likely be circumvented if we had an initial condition file generated after a long simulation, i.e. at least a few months of cold-start spinup, though longer (years) would probably be better.

ekluzek · 2023-10-16T20:14:12Z

@rgknox @glemieux @adrifoster @ckoven do we want to expedite bringing this in as it's a critical fix?

ekluzek

I request a small change for readability that won't be hard to bring in.

I'm really glad you were able to figure this out! Thanks to everyone who worked on this! @mvdebolskiy @glemieux @mvertens I know for sure...

With testing, some tests will start passing and can be removed from the expected fails. It also sounds like you set up a special test to find this; should that come in as well?

So the things I see:

Update FATES tag
Add keyword syntax to the .false./.true. calls.
Add a comment to the one that's different (the .true. one).
Remove expected fails that now work (requires FATES tests to be run and this to be checked)
Add a new test for this

rgknox · 2023-10-17T15:58:39Z

Those all sound like great suggestions @ekluzek . I'm working through designing a new test right now. I had originally wanted to go all-out and create a finidat file to be used in the new test's initialization (which would allow us to encounter the salient model mechanics in a shorter run), but I'm going to circumvent that for now because that is a larger topic/issue. I'll check back in when I have a new test created. In short I'm hoping I can trigger the original error with an f10 grid and a few months. Ideally, I'd like to see a test similar to an ERS f10 Lm13 in there, if it does not take much longer than the other tests.

mvdebolskiy

Just a reminder to update externals, and I think it is good to go. Will just need to make an issue for adding a test that starts with a really spun-up restart file.

rgknox · 2023-11-09T02:24:22Z

I tested a bunch of configurations to get a 25-month test that completes in a reasonable amount of time, and found that using the FatesColdNoComp testdef is fairly effective at doing this. I tested different PE layouts and measured these run times:
72 cores = 38 minutes
144 cores = 29 minutes
252 cores = 27 minutes
My take is that the 144-core setup is the sweet spot, and I will move forward with it.
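One way to look at the trade-off behind the "sweet spot" is to convert the reported timings into core-hours and marginal time savings. This is a small sketch; the core-hours metric is an assumption added here for illustration and is not something the comment itself computes.

```python
# Cores -> wall-clock minutes, taken from the timings reported above.
timings = {72: 38, 144: 29, 252: 27}

def core_hours(cores, minutes):
    """Total compute cost of one run, in core-hours."""
    return cores * minutes / 60.0

costs = {cores: core_hours(cores, minutes) for cores, minutes in timings.items()}

for cores, minutes in timings.items():
    print(f"{cores:>3} cores: {minutes} min, {costs[cores]:.1f} core-hours")

# Wall-clock minutes saved by each step up in core count.
layouts = sorted(timings)
savings = {
    (small, large): timings[small] - timings[large]
    for small, large in zip(layouts, layouts[1:])
}
print(savings)
```

Going from 72 to 144 cores saves 9 minutes of wall time, while going from 144 to 252 cores saves only 2 more minutes at a noticeably higher core-hour cost.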

rgknox · 2023-11-09T14:29:46Z

all aux_clm pass, izumi and cheyenne, preparing changelogs

ekluzek · 2023-11-10T19:56:39Z

@rgknox I realized this isn't pointing to a FATES tag. And it looks like the FATES tag hasn't been made yet. Could you do that and update the branch? Thanks...

ekluzek · 2023-11-12T00:04:23Z

@rgknox I'm making the tag now. But we will need you to rename the test directory to ctsm5.1.dev151 under the baselines for both Cheyenne and izumi. There were several directories and I wasn't sure which was the right one, and there seemed to be permission issues when I tried to copy them over. So if you could do that Monday, that would be great.

ekluzek · 2023-11-12T00:15:43Z

@rgknox also, was the FATES test list run for this? I expect that some of the FATES expected fails might now work. Or is that not the case? Monday would be a good time to look into this, and it would probably be good to run the FATES test suite then as well. If more of the FATES tests pass, we can mark them as working in the next tag. For now I'll check that item off the list with a note about it.

rgknox added 3 commits October 13, 2023 13:07

update to fates api to pass restart flag to ed_site_update()

8dd877c

Merge branch 'master' into fates-nocomp-fix

f6a9af4

Setting fates external to ryans nocomp fix branch

eee3bf5

rgknox mentioned this pull request Oct 16, 2023

Long run restart fix NGEET/fates#1098

Merged

3 tasks

ekluzek reviewed Oct 16, 2023

ekluzek reviewed Oct 16, 2023

ekluzek requested changes Oct 16, 2023

rgknox and others added 2 commits October 30, 2023 12:05

added longish fates f10 test, added is_restarting arguments to ed_site_type call

f802509

update duration to 25 months and increase wall time to avoid hitting limit

1d5297e

mvdebolskiy approved these changes Nov 3, 2023

mvdebolskiy reviewed Nov 3, 2023

merged up to dev150, resolve fates tag

2ddb0a7

Changed fates long test to be FatesColdNoComp, increased wall time too

e421959

ekluzek self-assigned this Nov 10, 2023

ekluzek added the "FATES API update" label (Changes to the FATES version that also REQUIRE an API change in CTSM) Nov 10, 2023

ekluzek approved these changes Nov 10, 2023

rgknox and others added 4 commits November 10, 2023 15:31

Updating fates external to api30

901656a

Added changelogs

1e190bb

Add ESCOMP#2236 to expected fails

808c947

Update changelog

2b186d2

ekluzek merged commit 2ff9c12 into ESCOMP:master Nov 12, 2023
2 checks passed

glemieux linked an issue Nov 16, 2023 that may be closed by this pull request

Exact restart problem with Fates #667

Closed

glemieux mentioned this pull request Nov 16, 2023

Exact restart problem with Fates #667

Closed
