-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update WRF_Hydro MPI calls. #552
base: main
Are you sure you want to change the base?
Conversation
Note: more commits with other function updates will follow. |
93ccb4d
to
385b797
Compare
…the size and displacement vectors needed by scatterv and gatherv.
This looks good and works well, thanks @donaldwj!! @rcabell, I believe you had mentioned that it is a preference to simply delete the old code instead of commenting out sections, do we want that done with lines like 408-454, 647-663 of this PR? |
I will delete the old code in a separate commit
…On Fri, Aug 13, 2021 at 7:05 PM Soren Rasmussen ***@***.***> wrote:
This looks good and works well, thanks @donaldwj
<https://github.com/donaldwj>!!
I compared the results of this PR with main using Croton and got identical
results.
@rcabell <https://github.com/rcabell>, I believe you had mentioned that
it is a preference to simply delete the old code instead of commenting out
sections, do we want that done with lines like 408-454, 647-663 of this PR?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#552 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AF6KABDDVI4JJBSYKSZ3JQTT4WXNHANCNFSM432IMGWQ>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&utm_campaign=notification-email>
.
--
Donald W Johnson
205-347-1467
National Water Center
Tuscaloosa AL
|
…always use MPI_INTGER.
I've looked over, built, and ran on the Croton test domain. Everything built and ran successfully so I'm postulating that everything is working correctly. I've added to my todo list to make some nice unit tests for these changes but that also involves creating the structure for unit tests so I recommend accepting this PR in the meantime. All looking good : ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything looks good and builds correctly. I tested on the Croton domain and am getting the same results pre and post PR. All recommendations have been followed and everything seems good! These changes will help performance 👍
Is there a simple way to get the testing framework to do CONUS runs. In particular I want to see if there is noticeable speed change. |
We have a mirror system of the GitHub CI tests on Cheyenne that does 48 hour Full and Long-Range Physics tests. I just started it and can have timing data for you soon. |
Something strange is going on with this update when run over CONUS, probably a race condition or something. If I re-run the tests several times, usually they hang on the first run_candidate stage. But other times (1 in 3 or 4) the testing will continue to another subsequent run of the model, where it will hang. I managed to get Long-Range to go to completion, but not full-physics. Croton always appears to succeed, regardless of compiler or MPI library. We may need to find an intermediate-complexity domain to figure out what's going on. |
If I remember the original code MPI_Barrier() calls at the end of the
replaced functions. It may be worth re-adding those calls to the new
functions. They should not be needed by it would be a simple test.
Donald Johnson
…On Tue, Sep 14, 2021 at 4:52 PM Ryan Cabell ***@***.***> wrote:
Something strange is going on with this update when run over CONUS,
probably a race condition or something. If I re-run the tests several
times, *usually* they hang on the first run_candidate stage. But other
times (1 in 3 or 4) the testing will continue to another subsequent run of
the model, where it will hang. I managed to get Long-Range to go to
completion, but not full-physics. Croton always appears to succeed,
regardless of compiler or MPI library. We may need to find an
intermediate-complexity domain to figure out what's going on.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#552 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AF6KABAHAOSMFMULMBIKJI3UB67YTANCNFSM432IMGWQ>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
--
Donald W Johnson
205-347-1467
National Water Center
Tuscaloosa AL
|
TYPE: enhancement
KEYWORDS: MPI, Communication, Enhancement
SOURCE: Donald W Johnson, NWC (National Water Center)
DESCRIPTION OF CHANGES: This pull request is intended to replace mpi code in mpp_land.F and other mpp files where mpi_send and mpi_recv are being used for tasks better suited to higher level mpi constructs (mpi_gather[v], mpi_scatter[v], mpi_reduce, mpi_allreduce). The intent is to not change the current behavior of any functions but to improve the efficiency of the underling communication. In some cases this means that internal unit conversions (for example unneeded changes form real*8 to real) will be eliminated and this may slightly change final answers.
ISSUE:
TESTS CONDUCTED:
Initial Testing was conducted with the croton domain on Cheyenne. Although all test passed on this domain I suspect regression may fail on conus because the updated functions had unnecessary data type changes removed.
####################################
TESTING: --- nwm_ana ---
####################################
echo; echo "Using the following modules for testing:" ; module list; echo;pytest -vv --tb=no --ignore=local -p no:cacheprovider --html=/glade/scratch/donaldj/take_test_croton_NY_mpi_updates/wrfhydro_testing-ifort-nwm_ana.html --self-contained-html --config=nwm_ana --compiler=ifort --domain_dir=/glade/work/jamesmcc/domains/public/croton_NY --candidate_dir=/glade/scratch/donaldj/take_test_croton_NY_mpi_updates/dwj_wrf_hydro_nwm_can_pytest/trunk/NDHMS --reference_dir=/glade/scratch/donaldj/take_test_croton_NY_mpi_updates/reference_ref_pytest/trunk/NDHMS --output_dir=/glade/scratch/donaldj/take_test_croton_NY_mpi_updates --exe_cmd='mpirun -np 6 ./wrf_hydro.exe' --ncores=6 --xrcmp_n_cores=0 --scheduler --nnodes=1 --account=NRAL0017 --walltime=01:00:00 --queue=share
Using the following modules for testing:
Currently Loaded Modules:
=========================================================== test session starts ===========================================================
platform linux -- Python 3.6.8, pytest-5.1.2, py-1.8.0, pluggy-0.12.0 -- /glade/p/cisl/nwc/model_testing_env/wrf_hydro_nwm_test/bin/python
metadata: {'Python': '3.6.8', 'Platform': 'Linux-4.12.14-95.51-default-x86_64-with-SuSE-12-x86_64', 'Packages': {'pytest': '5.1.2', 'py': '1.8.0', 'pluggy': '0.12.0'}, 'Plugins': {'html': '1.19.0', 'datadir-ng': '1.1.0', 'metadata': '1.8.0'}, 'JAVA_HOME': '/usr/lib64/jvm/java'}
rootdir: /glade/scratch/donaldj/take_test_croton_NY_mpi_updates/dwj_wrf_hydro_nwm_can_pytest
plugins: html-1.19.0, datadir-ng-1.1.0, metadata-1.8.0
collected 16 items
tests/test_1_fundamental.py::test_compile_candidate PASSED [ 6%]
tests/test_1_fundamental.py::test_compile_reference PASSED [ 12%]
tests/test_1_fundamental.py::test_run_candidate PASSED [ 18%]
tests/test_1_fundamental.py::test_run_reference PASSED [ 25%]
tests/test_1_fundamental.py::test_ncores_candidate PASSED [ 31%]
tests/test_1_fundamental.py::test_perfrestart_candidate PASSED [ 37%]
tests/test_2_regression.py::test_regression_data PASSED [ 43%]
tests/test_2_regression.py::test_regression_metadata PASSED [ 50%]
tests/test_3_outputs.py::test_output_has_nans PASSED [ 56%]
tests/test_supp_1_channel_only.py::test_run_candidate_channel_only PASSED [ 62%]
tests/test_supp_1_channel_only.py::test_channel_only_matches_full PASSED [ 68%]
tests/test_supp_1_channel_only.py::test_ncores_candidate_channel_only PASSED [ 75%]
tests/test_supp_1_channel_only.py::test_perfrestart_candidate_channel_only PASSED [ 81%]
tests/test_supp_2_nwm_output.py::test_run_reference_nwm_output_sim PASSED [ 87%]
tests/test_supp_2_nwm_output.py::test_run_candidate_nwm_output_sim PASSED [ 93%]
tests/test_supp_2_nwm_output.py::test_regression_metadata_nwm_output PASSED [100%]
NOTES:
This pull request will be updated periodically. I intend to only change one or two functions per commit. I am currently keeping the original function definitions in comments below changes, should this be maintained?
Checklist
Merging the PR depends on following checklist being completed. Add
X
between each of the squarebrackets if they are completed in the PR itself. If a bullet is not relevant to you, please comment
on why below the bullet.
NEWS.md