-
-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support MPItrampoline as MPI provider #513
Support MPItrampoline as MPI provider #513
Conversation
Closes #502 |
There are tests for various architectures (in Dockerfiles) and tests on a few concrete HPC systems (as shell scripts).
… eschnett/mpitrampoline # Conflicts: # deps/build.jl # src/MPI.jl # src/onesided.jl # test/test_sendrecv.jl # test/test_shared_win.jl # test/test_test.jl
This is awesome, let me know if there is anything I can do to help. |
@simonbyrne At the moment I'm fighting with the unit tests. Locally and on HPC systems the tests work fine. I'm looking for general feedback, in particular for the larger changes I'm making outside the code that is MPItrampoline-only. |
Would it make sense to split those changes into two? Right now it looks like the two issues are that
|
diff --git a/deps/consts_openmpi.jl b/deps/consts_openmpi.jl
index 0face3e..d84c202 100644
--- a/deps/consts_openmpi.jl
+++ b/deps/consts_openmpi.jl
@@ -132,3 +132,20 @@ const MPI_IN_PLACE = reinterpret(SentinelPtr, 1)
const MPI_STATUS_IGNORE = reinterpret(SentinelPtr, 0)
const MPI_STATUSES_IGNORE = reinterpret(SentinelPtr, 0)
+function _type_null_copy_fn(oldtype::MPI_Datatype, type_keyval::Cint,
+ extra_state::Ptr{Cvoid},
+ attribute_val_in::Ptr{Cvoid}, attribute_val_out::Ptr{Cvoid},
+ flag::Ptr{Cint})
+ unsafe_store!(flag, Cint(0))
+ return MPI_SUCCESS
+end
+function _type_null_delete_fn(oldtype::MPI_Datatype, type_keyval::Cint,
+ attribute_val_in::Ptr{Cvoid}, extra_state::Ptr{Cvoid})
+ return MPI_SUCCESS
+end
+
+const MPI_TYPE_NULL_COPY_FN = @cfunction(_type_null_copy_fn,Cint,(MPI_Datatype, Cint, Ptr{Cvoid},
+ Ptr{Cvoid}, Ptr{Cvoid}, Ptr{Cint}))
+
+const MPI_TYPE_NULL_DELETE_FN = @cfunction(_type_null_delete_fn,Cint,(MPI_Datatype, Cint, Ptr{Cvoid},
+ Ptr{Cvoid}))
\ No newline at end of file
diff --git a/src/datatypes.jl b/src/datatypes.jl
index acbba29..0ecce2b 100644
--- a/src/datatypes.jl
+++ b/src/datatypes.jl
@@ -29,17 +29,6 @@ function free(dt::Datatype)
end
# attributes
-# function _type_null_copy_fn(oldtype::MPI_Datatype, type_keyval::Cint,
-# extra_state::Ptr{Cvoid},
-# attribute_val_in::Ptr{Cvoid}, attribute_val_out::Ptr{Cvoid},
-# flag::Ptr{Cint})
-# unsafe_store!(flag, Cint(0))
-# return MPI_SUCCESS
-# end
-# function _type_null_delete_fn(oldtype::MPI_Datatype, type_keyval::Cint,
-# attribute_val_in::Ptr{Cvoid}, extra_state::Ptr{Cvoid})
-# return MPI_SUCCESS
-# end
function create_keyval(::Type{Datatype})
ref = Ref(Cint(0))
@@ -48,10 +37,6 @@ function create_keyval(::Type{Datatype})
Ptr{Cvoid},
Ptr{Cint},
Ptr{Cvoid}),
- # @cfunction(_type_null_copy_fn,Cint,(MPI_Datatype, Cint, Ptr{Cvoid},
- # Ptr{Cvoid}, Ptr{Cvoid}, Ptr{Cint})),
- # @cfunction(_type_null_delete_fn,Cint,(MPI_Datatype, Cint, Ptr{Cvoid},
- # Ptr{Cvoid})),
MPI_TYPE_NULL_COPY_FN,
MPI_TYPE_NULL_DELETE_FN,
ref, C_NULL) made me at least get through OpenMPI until I hit the new errhandler changes. |
Regarding splitting the changes into two: Yes, that would be possible. I'd like to wait at least until things work so that the changes I need to make until then don't break when MPItrampoline is used. The changes that add support for MPItrampoline are inactive by default. You need to set |
… eschnett/mpitrampoline # Conflicts: # src/implementations.jl
The PR is ready. There is 1 approval. It's a PR, though. Should I expect a second approval? And: When do we increase the version number? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @eschnett, this is going to be a huge improvement. Did you want to clean up the history, or should I just squash-merge it?
Don't worry about the version for now: we need to bump it anyway, and there are a couple more changes I wanted to make before the next release.
This is awesome! Thanks for all of the hard work @eschnett. Are there any examples of building external packages for Julia via Yggdrasil that run with MPItrampoline? I would love to add a p4est binary in Yggdrasil that does this. |
Yes! I jumped the gun and built LAMMPS against MPItrampoline a while back https://github.com/JuliaPackaging/Yggdrasil/blob/16bf87457e7b22578244738e6146b55e81acd37a/L/LAMMPS/build_tarballs.jl#L64 |
A squash merge is fine. |
I can only repeat what others have already written and said: Great work, I am looking forward to having this crucial piece for a Julian approach to HPC usability in MPI.jl!
@lcw We should maybe coordinate on this. Based on the discussion with @eschnett I had last month, we (@lchristm is doing most of the work) are working on building p4est in P4est_jll against MicrosoftMPI on Win* and against MPICH elsewhere (the respective default MPI versions on each platform). I think this still makes sense as long as MPItrampoline is not the default MPI version on *nix-like systems, since otherwise users would always have to switch out the used MPI library and cannot just use P4est_jll out of the box. Or am I missing something here? (we should probably continue this discussion elsewhere, since its not pertinent to this PR - sorry for the spam!) |
I am wondering if we can use MPITrampoline already since otherwise you will get: JuliaParallel/Elemental.jl#75 (comment) and that's almost a worse experience.. Let me experiment with LAMMPS_jll |
@sloede In principle we can switch Yggdrasil builds MPItrampoline together with a wrapped MPICH for convenience, and configures MPItrampoline to use that MPICH as fallback if no other MPI implementation is selected. That is, if you use I didn't suggest doing so right away because I wanted to give people an opportunity to test the |
I agree - the best test would be to just use it. However, I assume that you have already run a lot of tests yourself, and we have CI here, so more academic tests would probably add not much to it. Thus if we switch to However, considering the likely decrease in developer availability over the upcoming holiday season (and thus longer reaction times in case something does break), I suggest to wait with merging such a MPItrampoline-is-the-new-default PR until the second week of January. |
I suggest we first make the change on the master branch to give you and a few others a quick chance to check that I didn't overlook an important mode of operation (or a lack of documentation that confuses others), and then release a new version. I'll create a PR to change the default, to be applied either now or after the winter break. |
Sounds good to me 👍 |
This changes
MPI.jl
to accept MPItrampoline as MPI provider.Since MPItrampoline is initialized later than other MPI implementations (at load time, not at build time), MPI.jl's initialization of various constants also needs to be delayed when MPItrampoline is used. I currently convert
const
toglobal
objects for this. It would be possible to useconst Ref
instead as well.This PR also corrects a few places where MPI.jl deviates from the MPI standard, notably when handling
MPI_Status
.