
update RDASApp submodules to match mpasjedi-v3.0.1 #194

Merged: 35 commits into NOAA-EMC:develop on Oct 22, 2024

Conversation

@Junjun-NOAA (Collaborator) commented Oct 11, 2024:

List of submodule changes (issue #193):

ioda             c7b8760f -> d49ed17e
ufo              92ccfb2a -> 94d50d64
oops             35820130 -> d77217323
vader            e3457cba -> 6d56a1eb5
mpas             3ecd59e2 -> 41e9a3fb8   #  repo URL also changed
mpas-jedi        a1c60997 -> b9d596d7c
#fv3-jedi         d3c800b8 -> c99519638
#fv3-jedi-lm      a6e97d76 -> 30ef7a390
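
For reference, a minimal sketch of how such submodule bumps are typically recorded, assuming the submodules live under sorc/ (as sorc/crtm does, per the discussion below) and using the ioda hash above as the example:

```bash
# Pin one submodule to its new commit and stage the updated gitlink.
cd RDASApp
git -C sorc/ioda fetch origin
git -C sorc/ioda checkout d49ed17e       # new ioda hash from the list above
git add sorc/ioda
# If a submodule's URL changed (as for mpas), edit .gitmodules first, then:
git submodule sync -- sorc/mpas
# Repeat for ufo, oops, vader, mpas, mpas-jedi, then commit:
git commit -m "update submodules to match mpasjedi-v3.0.1"
```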

@Junjun-NOAA marked this pull request as draft October 11, 2024 23:39
guoqing-noaa and others added 4 commits October 11, 2024 17:44
restart -> mpasout
static.nc -> invariant.nc
zero padding: 2 -> 3
update ioda-data, merge the latest develop and then match them to mpasjedi.v3.0.1
@Junjun-NOAA (Collaborator, Author):

mpas-jedi test:

90% tests passed, 6 tests failed out of 59

Label Time Summary:
executable = 92.81 sec*proc (13 tests)
mpasjedi   = 712.40 sec*proc (59 tests)
mpi        = 705.89 sec*proc (58 tests)
script     = 619.59 sec*proc (46 tests)

Total Test time (real) = 112.89 sec

The following tests FAILED:
37 - test_mpasjedi_3denvar_amsua_allsky (Failed)
38 - test_mpasjedi_3denvar_amsua_bc (Failed)
43 - test_mpasjedi_4denvar_VarBC (Failed)
44 - test_mpasjedi_4denvar_VarBC_nonpar (Failed)
48 - test_mpasjedi_4dfgat_append_obs (Failed)
54 - test_mpasjedi_lgetkf_height_vloc (Failed)
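
For anyone reproducing this, failed ctests can be rerun individually from the build tree; a minimal sketch (the build path is illustrative):

```bash
cd build/mpas-jedi
# Rerun only the tests that failed in the last run, printing output on failure:
ctest --rerun-failed --output-on-failure
# Or rerun a single test by name, with verbose output:
ctest -R test_mpasjedi_3denvar_amsua_allsky -VV
```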

@Junjun-NOAA (Collaborator, Author):

fv3-jedi test:

92% tests passed, 10 tests failed out of 127

Label Time Summary:
fv3-jedi = 1025.78 sec*proc (126 tests)
fv3jedi  = 1032.39 sec*proc (127 tests)
mpi      = 1020.63 sec*proc (115 tests)
script   = 1032.39 sec*proc (127 tests)

Total Test time (real) = 155.42 sec

The following tests FAILED:
70 - fv3jedi_test_tier1_hofx_nomodel_abi_radii (Failed)
88 - fv3jedi_test_tier1_hyb-3dvar (Failed)
91 - fv3jedi_test_tier1_3dvar_lam_cmaq (Failed)
96 - fv3jedi_test_tier1_hyb-fgat_fv3lm (Failed)
98 - fv3jedi_test_tier1_4denvar (Failed)
99 - fv3jedi_test_tier1_4denvar_seq (Failed)
109 - fv3jedi_test_tier1_diffstates_gfs (Failed)
111 - fv3jedi_test_tier1_diffstates_lam_cmaq (Failed)
112 - fv3jedi_test_tier1_addincrement_gfs (Failed)
125 - fv3jedi_test_tier1_eda_3dvar_control_pert (Failed)

@guoqing-noaa (Collaborator) commented Oct 12, 2024:

NOTE:
The following fix files were added for this PR, and the corresponding links under fix/ were updated:

fv3-jedi-data_2085be5_20241008
ioda-data_20241011
mpas-jedi-data/testinput_tier_1/obs
VEGPARM.TBL.20241011
NoahmpTable.TBL

Fix file changes were synced on Jet/Hera/Orion/Hercules and archived to HPSS.
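
A hypothetical sketch of that sync/archive step; the destination path, archive path, and commands below are placeholders, not RDASApp's actual procedure:

```bash
# Copy the new fix files to the platform fix directory (placeholder path).
FIX_DEST=/path/to/platform/fix
rsync -av fv3-jedi-data_2085be5_20241008 ioda-data_20241011 \
          VEGPARM.TBL.20241011 NoahmpTable.TBL "$FIX_DEST/"
# (plus mpas-jedi-data/testinput_tier_1/obs)
# Archive to HPSS with htar, the standard HPSS tape archiver (placeholder path).
htar -cvf /hpss/placeholder/path/rdas_fix_20241011.tar "$FIX_DEST"
```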

@guoqing-noaa (Collaborator):

> mpas-jedi test:
> 90% tests passed, 6 tests failed out of 59 [...]

@Junjun-NOAA For your failed mpas-jedi tests, I suspect they may be related to the CRTM source code version. Could you try updating sorc/crtm to mpasjedi.v3.0.1 in a separate local copy and rerunning the mpas-jedi tests? If they pass, post your results and run directory here. Thanks!
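
A sketch of the suggested experiment, assuming sorc/crtm is a git submodule; the local copy name and the exact CRTM ref used by mpasjedi.v3.0.1 are placeholders:

```bash
# In a separate local copy of RDASApp:
cd RDASApp_crtm_test/sorc/crtm
git fetch origin
git checkout <crtm-ref-for-mpasjedi.v3.0.1>   # placeholder ref
cd ../..
# Rebuild, then rerun just the mpasjedi-labeled tests:
cd build && make -j8 && ctest -L mpasjedi
```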

@guoqing-noaa (Collaborator) commented Oct 12, 2024:

> fv3-jedi test:
> 92% tests passed, 10 tests failed out of 127 [...]

@Junjun-NOAA For the failed fv3-jedi cases, you can try the following steps (a sketch is shown after the list):

  1. Update all the remaining submodules to match jedi-bundle.
  2. Rerun the ctests using my latest commit, which updated fv3-jedi-data.
  3. If step 2 does not resolve all failures, update the sorc/crtm module and see whether it helps.

Thanks!
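
A sketch of those steps under the same assumptions as above; the branch name and build layout are illustrative:

```bash
# 1. Point the remaining submodules at the jedi-bundle versions, then re-init.
git submodule update --init --recursive
# 2. Pull the latest commit (with the updated fv3-jedi-data links) and rerun.
git pull origin <branch>                 # placeholder branch name
cd build && ctest -L fv3jedi --output-on-failure
# 3. If failures remain, bump sorc/crtm as sketched earlier and rerun.
```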

@guoqing-noaa (Collaborator):

rrfs-test all passed on Hera

Test project /scratch1/BMC/wrfruc/gge/tmp/rdas_build_test/RDASApp_junjun-noaa_mpasjedi3.0.1/build/rrfs-test
    Start 1: rrfs_fv3jedi_hyb_2022052619
    Start 2: rrfs_fv3jedi_letkf_2022052619
    Start 3: rrfs_mpasjedi_2024052700_Ens3Dvar
    Start 4: rrfs_mpasjedi_2024052700_getkf_observer
    Start 6: rrfs_mpasjedi_2024052700_bumploc
1/6 Test #2: rrfs_fv3jedi_letkf_2022052619 .............   Passed   43.19 sec
2/6 Test #1: rrfs_fv3jedi_hyb_2022052619 ...............   Passed  103.69 sec
3/6 Test #6: rrfs_mpasjedi_2024052700_bumploc ..........   Passed  122.99 sec
4/6 Test #4: rrfs_mpasjedi_2024052700_getkf_observer ...   Passed  344.80 sec
    Start 5: rrfs_mpasjedi_2024052700_getkf_solver
5/6 Test #3: rrfs_mpasjedi_2024052700_Ens3Dvar .........   Passed  416.29 sec
6/6 Test #5: rrfs_mpasjedi_2024052700_getkf_solver .....   Passed  1289.32 sec

100% tests passed, 0 tests failed out of 6

Label Time Summary:
mpi            = 2320.29 sec*proc (6 tests)
rdas-bundle    = 2320.29 sec*proc (6 tests)
script         = 2320.29 sec*proc (6 tests)

@Junjun-NOAA (Collaborator, Author):

> @Junjun-NOAA Do you have the ctest results from the mpas-bundle itself (NOT RDASApp/mpasjedi-test) on Hera?

Not yet. Hera is very slow today.

@guoqing-noaa (Collaborator) commented Oct 16, 2024:

Thanks @Junjun-NOAA for running both the RDASApp mpasjedi tests and the mpas-bundle ctests.
3dfgat passed in mpas-bundle but failed in RDASApp.

The test results are at the following two locations respectively:
/scratch1/BMC/wrfruc/jjhu/rrfsv2/RDASApp_PRs/PR194/RDASApp/build/mpas-jedi/Testing/Temporary/3dfgat.log
and
/scratch1/BMC/wrfruc/jjhu/rrfsv2/mpas-bundle-v3.0.1/build/mpas-jedi/Testing/Temporary/3dfgat.log

We compared all submodules and data directories under RDASApp and mpas-bundle-v3.0.1:
ioda/ ioda-data/ MPAS/ mpas-jedi/ mpas-jedi-data/ oops/ saber/ ufo/ ufo-data/ vader/
All are exactly the same.
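
One way to make that comparison, assuming the RDASApp components sit under sorc/ and the bundle components at its top level (the directory layout is illustrative):

```bash
# Print the checked-out commit of each shared component in both trees.
for d in ioda ioda-data MPAS mpas-jedi mpas-jedi-data oops saber ufo ufo-data vader; do
  printf '%-16s %s  %s\n' "$d" \
    "$(git -C RDASApp/sorc/$d rev-parse HEAD)" \
    "$(git -C mpas-bundle-v3.0.1/$d rev-parse HEAD)"
done
```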

@Junjun-NOAA (Collaborator, Author) commented Oct 16, 2024:

By comparing the log files, we found that RDASApp rejects one more radiosonde wind observation than mpas-bundle; see the log comparison below:

[screenshot: side-by-side log comparison; white is RDASApp, cyan is mpas-bundle]
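
The comparison can be reproduced by diffing the two 3dfgat logs listed in the previous comment, e.g.:

```bash
# Side-by-side diff of the RDASApp and mpas-bundle 3dfgat logs on Hera.
vimdiff \
  /scratch1/BMC/wrfruc/jjhu/rrfsv2/RDASApp_PRs/PR194/RDASApp/build/mpas-jedi/Testing/Temporary/3dfgat.log \
  /scratch1/BMC/wrfruc/jjhu/rrfsv2/mpas-bundle-v3.0.1/build/mpas-jedi/Testing/Temporary/3dfgat.log
```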

@SamuelDegelia-NOAA (Contributor) commented Oct 16, 2024:

Thanks @Junjun-NOAA and @guoqing-noaa for the additional testing. I think your analysis shows that this one small difference is pretty minor and not worth worrying about at the moment. But at least we have a record if we ever want to go back and figure out what is going on.

@rrfsbot (Collaborator) commented Oct 17, 2024:

PASSED on hera

started build_and_test on hera at UTC time: Wed Oct 16 17:54:54 UTC 2024
finished at UTC time: Thu Oct 17 04:30:41 UTC 2024

Test project /scratch1/NCEPDEV/fv3-cam/rrfsbot/PRs_RDASApp/194/build/rrfs-test
    Start 4: rrfs_mpasjedi_2024052700_getkf_observer
    Start 1: rrfs_fv3jedi_hyb_2022052619
    Start 2: rrfs_fv3jedi_letkf_2022052619
    Start 3: rrfs_mpasjedi_2024052700_Ens3Dvar
    Start 6: rrfs_mpasjedi_2024052700_bumploc
1/6 Test #2: rrfs_fv3jedi_letkf_2022052619 .............   Passed   34.65 sec
2/6 Test #1: rrfs_fv3jedi_hyb_2022052619 ...............   Passed  216.56 sec
3/6 Test #6: rrfs_mpasjedi_2024052700_bumploc ..........   Passed  19790.00 sec
4/6 Test #4: rrfs_mpasjedi_2024052700_getkf_observer ...   Passed  20032.23 sec
    Start 5: rrfs_mpasjedi_2024052700_getkf_solver
5/6 Test #3: rrfs_mpasjedi_2024052700_Ens3Dvar .........   Passed  35029.60 sec
6/6 Test #5: rrfs_mpasjedi_2024052700_getkf_solver .....   Passed  15886.36 sec

100% tests passed, 0 tests failed out of 6

Label Time Summary:
mpi            = 90989.39 sec*proc (6 tests)
rdas-bundle    = 90989.39 sec*proc (6 tests)
script         = 90989.39 sec*proc (6 tests)

Total Test time (real) = 35919.88 sec

workdir: /scratch1/NCEPDEV/fv3-cam/rrfsbot/PRs_RDASApp/194

@ShunLiu-NOAA:

The ctests take 35919.88 sec in total. Is this an issue related to the HPC?

@SamuelDegelia-NOAA (Contributor):

Those ctest times include waiting time in the queue. Hera has been quite slow for me recently, so I am guessing that is the problem. Typically the longest ctest is the GETKF solver, which takes ~20 minutes of actual runtime (it will be reduced to ~10 min once we can use reduce obs space).

@SamuelDegelia-NOAA (Contributor):

Part of the increased wait time is probably also due to requesting 30 min of wall time for the mpas-jedi ctests. The longer wall time is really only needed for the GETKF solver, so I plan to tune those requests whenever I update the OOPS hash (after this PR is merged).
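
A hypothetical sketch of that tuning, assuming the ctests submit Slurm jobs through a wrapper script; the variable names, script name, and times are illustrative:

```bash
# Request a long wall time only for the expensive GETKF solver test.
case "$TEST_NAME" in
  *getkf_solver*) WALLTIME=00:30:00 ;;   # the solver genuinely needs the longer limit
  *)              WALLTIME=00:10:00 ;;   # shorter requests queue faster
esac
sbatch --time="$WALLTIME" run_ctest_job.sh   # run_ctest_job.sh is a placeholder
```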

@guoqing-noaa (Collaborator):

> The ctests take 35919.88 sec in total. Is this an issue related to the HPC?

@ShunLiu-NOAA Yes, as @SamuelDegelia-NOAA mentioned, this is because Hera has had very long job wait times recently. The ctests themselves don't have any issues.

@guoqing-noaa (Collaborator):

> Part of the increased wait time is probably also due to requesting 30 min of wall time for the mpas-jedi ctests. [...]

Thanks for the thought. I don't think 10 minutes vs. 30 minutes will make any noticeable difference on RDHPCS. :)

@SamuelDegelia-NOAA (Contributor) commented Oct 17, 2024:

> > Part of the increased wait time is probably also due to requesting 30 min of wall time for the mpas-jedi ctests. [...]
>
> Thanks for the thought. I don't think 10 minutes vs. 30 minutes will make any noticeable difference on RDHPCS. :)

I wouldn't think so, but 5 min vs. 30 min is currently making a big difference. The fv3-jedi tests that request 5 min are not waiting at all and run within a few minutes, while the mpas-jedi tests that request 30 min are waiting much longer.

@guoqing-noaa (Collaborator):

> I wouldn't think so, but 5 min vs. 30 min is currently making a big difference. [...]

Good to know. If it helps, please go ahead and make the changes. Thanks!

@SamuelDegelia-NOAA (Contributor):

> Good to know. If it helps, please go ahead and make the changes. Thanks!

I plan to wait until we update the OOPS hash further. We'll need to do that first to get the lower run times for GETKF (via reduce obs space). Hopefully I can start on that as soon as this PR gets merged.

@ShunLiu-NOAA:

@hu5970 Did you get a chance to review this PR?

@guoqing-noaa (Collaborator):

@ShunLiu-NOAA @hu5970 Do we still have any lingering issues on this PR? Otherwise, we should consider merging it. Thanks!

@hu5970 (Contributor) commented Oct 21, 2024:

@guoqing-noaa I will ask the team about this PR tomorrow and try to arrange an order for merging the PRs.

@hu5970 dismissed ShunLiu-NOAA's stale review October 22, 2024 21:35:

Shun and I talked about this review and the problem is solved.

@hu5970 merged commit 6967dd5 into NOAA-EMC:develop on Oct 22, 2024. 1 check passed.