Going to a CISM active test for a processor change test (PEM), causes answers to change... #2542

Open
ekluzek opened this issue May 13, 2024 · 6 comments
Labels: bug (something is working incorrectly), testing (additions or changes to tests)


@ekluzek
Collaborator

ekluzek commented May 13, 2024

Brief summary of bug

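With ctsm5.2.0 we discovered that we didn't have enough testing corresponding to CESM or CAM testing. CESM testing is always done with CISM active, so in #2501 I changed some tests from I1850Clm60BgcCrop to I1850Clm60BgcCropG. However, the CISM-active version of the processor-change (PEM) test changes answers when the processor layout changes, as detailed below.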

General bug information

CTSM version you are using: ctsm5.2.004-31-ga09d22376

Does this bug cause significantly incorrect results in the model's science? No

Configurations affected: those with CISM active

Details of bug

PEM_D_Ld9.ne30pg3_t232.I1850Clm60BgcCrop.derecho_intel.clm-clm60cam6LndTuningMode passes; however,
PEM_D_Ld9.ne30pg3_t232.I1850Clm60BgcCropG.derecho_intel.clm-clm60cam6LndTuningMode fails when comparing the runs on different processor counts:

FAIL PEM_D_Ld9.ne30pg3_t232.I1850Clm60BgcCropG.derecho_intel.clm-clm60cam6LndTuningMode COMPARE_base_modpes
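The passing and failing tests quoted above differ only in their compset. As a hedged illustration, the sketch below splits a CIME-style test name into its dot-separated fields, assuming the usual TESTTYPE[_MODIFIERS].GRID.COMPSET[.MACHINE_COMPILER][.TESTMODS] layout (here D is a debug build and Ld9 a 9-day run). The class and function names are hypothetical helpers, not part of CIME.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class CimeTestName:
    test_type: str                   # e.g. "PEM" (processor-count change test)
    modifiers: List[str]             # e.g. ["D", "Ld9"]
    grid: str                        # e.g. "ne30pg3_t232"
    compset: str                     # e.g. "I1850Clm60BgcCropG"
    machine_compiler: Optional[str]  # e.g. "derecho_intel"
    testmods: Optional[str]          # e.g. "clm-clm60cam6LndTuningMode"


def parse_test_name(name: str) -> CimeTestName:
    """Split a test name on '.' (assumed layout; hypothetical helper)."""
    parts = name.split(".")
    if len(parts) < 3:
        raise ValueError(f"not a CIME-style test name: {name}")
    # The first field holds the test type plus underscore-separated modifiers.
    head = parts[0].split("_")
    return CimeTestName(
        test_type=head[0],
        modifiers=head[1:],
        grid=parts[1],
        compset=parts[2],
        machine_compiler=parts[3] if len(parts) > 3 else None,
        testmods=parts[4] if len(parts) > 4 else None,
    )


def differing_fields(a: str, b: str) -> List[str]:
    """Names of the fields in which two test names differ."""
    ta, tb = parse_test_name(a), parse_test_name(b)
    return [f.name for f in fields(ta) if getattr(ta, f.name) != getattr(tb, f.name)]
```

Applied to the two PEM_D_Ld9 names above, `differing_fields` returns `["compset"]`, which isolates the switch to the CISM-active compset as the only change between the passing and failing tests.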

Important details of your setup / configuration so we can reproduce the bug

The test list contains PEM and ERP tests for the glc* testmods with the following comment:

cism is not answer preserving across processor changes, but short test length should be ok

Those tests run for 5 to 10 days, but many are at f10 resolution, and the highest resolution among them, f19, runs for only 5 days.

ekluzek added the bug and testing labels May 13, 2024
ekluzek changed the title from "Going to a CISM active test for a processor change PEM, causes answers to change..." to "Going to a CISM active test for a processor change test (PEM), causes answers to change..." May 13, 2024
@ekluzek
Collaborator Author

ekluzek commented May 13, 2024

It still fails with a 3-day run, which is about the shortest length I think we should try.

ekluzek added the next label (this should get some attention in the next week or two; normally each Thursday SE meeting) May 13, 2024
@ekluzek
Collaborator Author

ekluzek commented May 15, 2024

I talked to @Katetc about this after the CSEG meeting. She said that this is a classic MPI global-sum reproducibility issue, which has been solved elsewhere, so it should be relatively easy to fix.

She then confirmed the timeline by email: they will work on this relatively soon.
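To illustrate why a global sum can break reproducibility across processor counts: floating-point addition is not associative, so combining per-rank partial sums gives results that depend on how the domain is decomposed. The toy sketch below simulates ranks in plain Python (no MPI); the contiguous block decomposition and rank-ordered reduction are simplifying assumptions, and the exact-rational reduction is just one order-independent technique, not necessarily the fix CISM will adopt.

```python
from fractions import Fraction
from typing import List, Sequence


def decompose(values: Sequence[float], nranks: int) -> List[Sequence[float]]:
    """Split values into nranks contiguous blocks (hypothetical decomposition)."""
    base, extra = divmod(len(values), nranks)
    chunks, start = [], 0
    for rank in range(nranks):
        stop = start + base + (1 if rank < extra else 0)
        chunks.append(values[start:stop])
        start = stop
    return chunks


def naive_sum(values: Sequence[float]) -> float:
    # Plain left-to-right float accumulation; Python's built-in sum() may use
    # compensated summation for floats, so it is avoided here.
    total = 0.0
    for v in values:
        total += v
    return total


def naive_global_sum(values: Sequence[float], nranks: int) -> float:
    # Each "rank" sums its block, then the partial sums are combined in rank
    # order, loosely mimicking a floating-point MPI sum reduction.
    partials = [naive_sum(chunk) for chunk in decompose(values, nranks)]
    return naive_sum(partials)


def exact_global_sum(values: Sequence[float], nranks: int) -> float:
    # Each rank computes its partial sum exactly as a rational number. Exact
    # addition is associative, so the result is independent of nranks.
    partials = [sum((Fraction(v) for v in chunk), Fraction(0))
                for chunk in decompose(values, nranks)]
    return float(sum(partials, Fraction(0)))
```

With `data = [1e17, 1.0, -1e17, 1.0]`, `naive_global_sum` gives 1.0 on one rank and 0.0 on two ranks, while `exact_global_sum` returns 2.0 for any rank count. Exact rational arithmetic is slow; cheaper order-independent schemes, such as accumulating scaled values as integers, exist.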

ekluzek removed the next label (this should get some attention in the next week or two; normally each Thursday SE meeting) May 15, 2024
@samsrabin
Collaborator

samsrabin commented May 28, 2024

On ctsm5.2.005, I'm getting a failure in the same step for

PEM_D_Ld9.ne30pg3_t232.I1850Clm60BgcCropG.derecho_intel.clm-clm60cam6LndTuningMode

Should this be marked as an expected fail? I see that a slightly different test (3 days instead of 9) named

PEM_D_Ld3.ne30pg3_t232.I1850Clm60BgcCropG.derecho_intel.clm-clm60cam6LndTuningMode

is in the expected-fail list (pointing to this issue), but that test is not actually in the test list.
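A mismatch like this, where an expected-fail entry names a test that is not in the test list, can be caught mechanically. The sketch below assumes both lists have already been reduced to collections of full test names (reading them from CTSM's XML files is left out), and it treats tests as "near matches" when they differ only in a run-length option such as Ld3 vs Ld9. Both function names and the regular expression are hypothetical.

```python
import re
from typing import Dict, Iterable, List

# Assumed run-length modifier pattern, e.g. "_Ld3" or "_Ld9" before '.' or '_'.
_RUN_LENGTH = re.compile(r"_L[a-z]\d+(?=[._])")


def _without_run_length(name: str) -> str:
    return _RUN_LENGTH.sub("", name, count=1)


def stale_expected_fails(expected_fails: Iterable[str],
                         test_list: Iterable[str]) -> List[str]:
    """Expected-fail entries that match no test in the test list."""
    tests = set(test_list)
    return sorted(name for name in set(expected_fails) if name not in tests)


def suggest_replacements(stale: Iterable[str],
                         test_list: Iterable[str]) -> Dict[str, List[str]]:
    """For each stale entry, list tests that differ only in run length."""
    by_key: Dict[str, List[str]] = {}
    for test in test_list:
        by_key.setdefault(_without_run_length(test), []).append(test)
    return {s: sorted(by_key.get(_without_run_length(s), [])) for s in stale}
```

For the case above, the Ld3 entry is reported as stale and the Ld9 test in the test list is suggested as its likely replacement.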

@ekluzek
Collaborator Author

ekluzek commented May 30, 2024

Yes, we should correct the expected-fail entry to match the test list. I think @slevis-lmwg did this in ctsm5.2.006, though.

@ekluzek
Collaborator Author

ekluzek commented Jul 9, 2024

I ran into this again while working on ctsm5.2.009, because of a change in the testmod used.

However, I verified that the following test fails in ctsm5.2.008:

PEM_D_Ld9.ne30pg3_t232.I1850Clm60BgcCropG.derecho_intel

ekluzek mentioned this issue Jul 9, 2024
@ekluzek
Collaborator Author

ekluzek commented Jul 9, 2024

See this comment: #2632 (comment)
