Update WRF_Hydro MPI calls. #552

donaldwj · 2021-04-29T16:36:00Z

TYPE: enhancement

KEYWORDS: MPI, Communication, Enhancement

SOURCE: Donald W Johnson, NWC (National Water Center)

DESCRIPTION OF CHANGES: This pull request replaces MPI code in mpp_land.F and other mpp files where mpi_send and mpi_recv are used for tasks better suited to higher-level MPI constructs (mpi_gather[v], mpi_scatter[v], mpi_reduce, mpi_allreduce). The intent is not to change the current behavior of any function but to improve the efficiency of the underlying communication. In some cases internal type conversions (for example, unneeded changes from real*8 to real) will be eliminated, and this may slightly change final answers.
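To illustrate why dropping a real*8 to real conversion can shift final answers slightly, here is a minimal Python sketch. It is not the Fortran code in mpp_land.F; the helper `to_real4` and the input values are assumptions made up for illustration. It accumulates the same numbers once in double precision and once with every intermediate result rounded to single precision.

```python
import struct


def to_real4(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float (like real*8 -> real)."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Illustrative values only; not taken from any model run.
values = [0.1] * 10

# Accumulate entirely in double precision (real*8).
sum_real8 = 0.0
for v in values:
    sum_real8 += v

# Accumulate with every input and intermediate result forced through
# single precision (real), mimicking an unneeded down-conversion.
sum_real4 = 0.0
for v in values:
    sum_real4 = to_real4(sum_real4 + to_real4(v))

difference = abs(sum_real8 - sum_real4)
print("double precision sum:", repr(sum_real8))
print("single precision sum:", repr(sum_real4))
print("absolute difference :", difference)
```

The two sums agree to roughly single-precision accuracy but are not bitwise identical, which is the kind of difference a strict regression comparison could flag.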

ISSUE:

Fixes #551

TESTS CONDUCTED:

Initial testing was conducted with the Croton domain on Cheyenne. Although all tests passed on this domain, I suspect regression may fail on CONUS because the updated functions had unnecessary data type changes removed.

####################################

TESTING: --- nwm_ana ---

####################################

echo; echo "Using the following modules for testing:" ; module list; echo;pytest -vv --tb=no --ignore=local -p no:cacheprovider --html=/glade/scratch/donaldj/take_test_croton_NY_mpi_updates/wrfhydro_testing-ifort-nwm_ana.html --self-contained-html --config=nwm_ana --compiler=ifort --domain_dir=/glade/work/jamesmcc/domains/public/croton_NY --candidate_dir=/glade/scratch/donaldj/take_test_croton_NY_mpi_updates/dwj_wrf_hydro_nwm_can_pytest/trunk/NDHMS --reference_dir=/glade/scratch/donaldj/take_test_croton_NY_mpi_updates/reference_ref_pytest/trunk/NDHMS --output_dir=/glade/scratch/donaldj/take_test_croton_NY_mpi_updates --exe_cmd='mpirun -np 6 ./wrf_hydro.exe' --ncores=6 --xrcmp_n_cores=0 --scheduler --nnodes=1 --account=NRAL0017 --walltime=01:00:00 --queue=share

Using the following modules for testing:

Currently Loaded Modules:

1) intel/18.0.5 2) impi/2018.4.274 3) ncarcompilers/0.5.0 4) netcdf/4.7.4 5) ncarenv/1.3 6) nccmp/1.8.2.1

=========================================================== test session starts ===========================================================
platform linux -- Python 3.6.8, pytest-5.1.2, py-1.8.0, pluggy-0.12.0 -- /glade/p/cisl/nwc/model_testing_env/wrf_hydro_nwm_test/bin/python
metadata: {'Python': '3.6.8', 'Platform': 'Linux-4.12.14-95.51-default-x86_64-with-SuSE-12-x86_64', 'Packages': {'pytest': '5.1.2', 'py': '1.8.0', 'pluggy': '0.12.0'}, 'Plugins': {'html': '1.19.0', 'datadir-ng': '1.1.0', 'metadata': '1.8.0'}, 'JAVA_HOME': '/usr/lib64/jvm/java'}
rootdir: /glade/scratch/donaldj/take_test_croton_NY_mpi_updates/dwj_wrf_hydro_nwm_can_pytest
plugins: html-1.19.0, datadir-ng-1.1.0, metadata-1.8.0
collected 16 items

tests/test_1_fundamental.py::test_compile_candidate PASSED [ 6%]
tests/test_1_fundamental.py::test_compile_reference PASSED [ 12%]
tests/test_1_fundamental.py::test_run_candidate PASSED [ 18%]
tests/test_1_fundamental.py::test_run_reference PASSED [ 25%]
tests/test_1_fundamental.py::test_ncores_candidate PASSED [ 31%]
tests/test_1_fundamental.py::test_perfrestart_candidate PASSED [ 37%]
tests/test_2_regression.py::test_regression_data PASSED [ 43%]
tests/test_2_regression.py::test_regression_metadata PASSED [ 50%]
tests/test_3_outputs.py::test_output_has_nans PASSED [ 56%]
tests/test_supp_1_channel_only.py::test_run_candidate_channel_only PASSED [ 62%]
tests/test_supp_1_channel_only.py::test_channel_only_matches_full PASSED [ 68%]
tests/test_supp_1_channel_only.py::test_ncores_candidate_channel_only PASSED [ 75%]
tests/test_supp_1_channel_only.py::test_perfrestart_candidate_channel_only PASSED [ 81%]
tests/test_supp_2_nwm_output.py::test_run_reference_nwm_output_sim PASSED [ 87%]
tests/test_supp_2_nwm_output.py::test_run_candidate_nwm_output_sim PASSED [ 93%]
tests/test_supp_2_nwm_output.py::test_regression_metadata_nwm_output PASSED [100%]

NOTES:

This pull request will be updated periodically. I intend to change only one or two functions per commit. I am currently keeping the original function definitions in comments below the changes. Should this be maintained?

Checklist

Merging the PR depends on the following checklist being completed. Add X between each of the square
brackets if they are completed in the PR itself. If a bullet is not relevant to you, please comment
on why below the bullet.

Closes issue MPI code using sub-optimal communication patterns #551
Tests added (unit tests and/or regression/integration tests)
Backwards compatible
Requires new files? If so, how to generate them.
Fully documented
Short description in the Development section of NEWS.md

donaldwj · 2021-04-29T16:41:44Z

Note: more commits with other function updates will follow.


scrasmussen · 2021-08-14T00:05:29Z

This looks good and works well, thanks @donaldwj!!
I compared the results of this PR with main using Croton and got identical results.

@rcabell, I believe you had mentioned a preference for simply deleting the old code instead of commenting out sections. Do we want that done with lines like 408-454 and 647-663 of this PR?

donaldwj · 2021-08-16T16:57:59Z

I will delete the old code in a separate commit



scrasmussen · 2021-08-20T18:18:42Z

I've looked over, built, and run this on the Croton test domain. Everything built and ran successfully, so I'm postulating that everything is working correctly. I've added to my to-do list to make some nice unit tests for these changes, but that also involves creating the structure for unit tests, so I recommend accepting this PR in the meantime. All looking good : )

scrasmussen

Everything looks good and builds correctly. I tested on the Croton domain and am getting the same results pre and post PR. All recommendations have been followed and everything seems good! These changes will help performance 👍

donaldwj · 2021-08-23T15:40:07Z

Is there a simple way to get the testing framework to do CONUS runs? In particular, I want to see whether there is a noticeable speed change.

rcabell · 2021-08-23T15:42:00Z

> Is there a simple way to get the testing framework to do CONUS runs. In particular I want to see if there is noticeable speed change.

We have a mirror system of the GitHub CI tests on Cheyenne that does 48 hour Full and Long-Range Physics tests. I just started it and can have timing data for you soon.

rcabell · 2021-09-14T21:51:59Z

Something strange is going on with this update when run over CONUS, probably a race condition or something. If I re-run the tests several times, usually they hang on the first run_candidate stage. But other times (1 in 3 or 4) the testing will continue to another subsequent run of the model, where it will hang. I managed to get Long-Range to go to completion, but not full-physics. Croton always appears to succeed, regardless of compiler or MPI library. We may need to find an intermediate-complexity domain to figure out what's going on.

donaldwj · 2021-09-14T22:46:12Z

If I remember correctly, the original code had MPI_Barrier() calls at the end of the replaced functions. It may be worth re-adding those calls to the new functions. They should not be needed, but it would be a simple test. Donald Johnson


Update definitions of sum_real8 and sumreal1d.

ce62b08

rcabell requested review from rcabell and scrasmussen June 30, 2021 19:20

rcabell force-pushed the master branch 3 times, most recently from 93ccb4d to 385b797 Compare July 8, 2021 01:10

rcabell and others added 5 commits July 7, 2021 19:48

Merge branch 'master' into mpi_updates

f87485c

update code in calculate_start_p()

c319f7b

Merge branch 'mpi_updates' of https://github.com/donaldwj/wrf_hydro_nwm_public into mpi_updates

c1a9a26

bugfix for new start_p() code.

8bcced0

Update get_local_size() to use MPI_AllGather().

54582c0

donaldwj changed the title ~~Update definitions of sum_real8 and sumreal1d.~~ Update WRF_Hydro MPI calls. Aug 9, 2021

Added subroutine calculate_offset_vectors(). This function calculate the size and displacement vectors needed by scatterv and gatherv.

be02e6c
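As a rough illustration of what size and displacement vectors mean for scatterv and gatherv, the following pure-Python sketch computes them from per-rank local sizes and then simulates the scatter and gather with list slicing. It does not call MPI and is not the Fortran subroutine from this commit; the function names `offset_vectors_sketch`, `scatterv_sketch`, and `gatherv_sketch`, and the sizes used, are hypothetical.

```python
from typing import List, Sequence, Tuple


def offset_vectors_sketch(local_sizes: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Return (counts, displacements) in the layout scatterv/gatherv expect.

    counts[i] is the number of elements owned by rank i;
    displacements[i] is where rank i's block starts in the global buffer.
    """
    counts = list(local_sizes)
    displacements = []
    offset = 0
    for count in counts:
        displacements.append(offset)  # block i starts where block i-1 ended
        offset += count
    return counts, displacements


def scatterv_sketch(global_data, counts, displacements):
    """Simulate scatterv: slice the root's buffer into one chunk per rank."""
    return [list(global_data[d:d + c]) for c, d in zip(counts, displacements)]


def gatherv_sketch(chunks, counts, displacements):
    """Simulate gatherv: place each rank's chunk at its displacement."""
    gathered = [None] * sum(counts)
    for chunk, c, d in zip(chunks, counts, displacements):
        gathered[d:d + c] = chunk
    return gathered


# Hypothetical local sizes for three ranks.
local_sizes = [3, 2, 4]
counts, displs = offset_vectors_sketch(local_sizes)
data = list(range(sum(local_sizes)))
chunks = scatterv_sketch(data, counts, displs)
roundtrip = gatherv_sketch(chunks, counts, displs)
print(counts, displs)
print(chunks)
```

Because the displacements are a running sum of the counts, the blocks are contiguous and non-overlapping, so a scatter followed by a gather reproduces the original buffer.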

donaldwj added 6 commits August 17, 2021 11:01

Inital implementation of decompose_data_real with scatterv.

2992959

Remove old code that had been commented out.

34110f9

change decompose_data_int to use scatterv.

3e14cf7

Change write_IO_int to use mpi_gatherv. Update decompose_data_int to always use MPI_INTGER.

4c3323c

Update write_IO_real to use scatterv.

505eb3d

Remove commented out code.

e02ac23

scrasmussen previously approved these changes Aug 20, 2021


Merge branch 'master' into mpi_updates

f692cff

rcabell marked this pull request as draft April 13, 2022 20:21

rcabell dismissed scrasmussen’s stale review via f692cff June 7, 2023 19:42

Merge remote-tracking branch 'upstream/main' into mpi_updates

d7b5c11
