Support for Elemental built against system MPI #37

Open
bluehope opened this issue Aug 8, 2016 · 22 comments

Comments

@bluehope

bluehope commented Aug 8, 2016

Is there any plan to support MVAPICH2 & intel MPI?

Intel MPI (which is binary compatible with MVAPICH2) is also one of the most widely used MPI implementations.
It would be nice for Elemental to support MVAPICH2!

@poulson

poulson commented Aug 8, 2016

Elemental absolutely supports every modern MPI implementation. I assume that you mean Elemental.jl?

@bluehope
Author

bluehope commented Aug 9, 2016

@poulson Oh, Yes. I meant "Elemental.jl". Thank you for the correction.

@andreasnoack
Member

andreasnoack commented Sep 15, 2016

For some reason, I'd unwatched my package here so I've only seen this issue now. Soon, we'll change MPI.jl to use the C-API and also hard code various MPI implementations. When that PR is merged, I'll probably delete the MPI functions here and add MPI.jl as a dependency. Adding support for MVAPICH would then be a matter of adding support in MPI.jl and I guess it is basically a matter of copying the definitions from MPICH. Feel free to open a PR to speed up the process.

@ViralBShah
Member

ViralBShah commented May 28, 2020

We no longer build the sources as part of the package installation like we used to.

So the only options are to use what ships with BinaryBuilder or to provide your own custom build. Note that while MPI.jl allows a system MPI to be used, Elemental.jl needs an update to allow a system build (one that lets it opt out of the BB-provided binaries).

@ViralBShah ViralBShah changed the title Support MVAPICH2 (& intel MPI) Support system MPI May 28, 2020
@ViralBShah ViralBShah changed the title Support system MPI Support for Elemental built against system MPI May 28, 2020
@JBlaschke

Hi, is there a way we can speed up this update? I volunteer my time. At NERSC we need to build against the system MPI, so I am happy to help out if this means we can deploy Elemental.jl on our systems sooner.

I am still a bit new to BB, so can someone give me some guidance on how I can "opt out of" the BB-provided binaries?

@JBlaschke

Btw @ViralBShah in Julia 1.6.0 we get

Warning: Error requiring `MPICH_jll` from `MPI`
  exception =
   MPICH_jll cannot be loaded: MPI.jl is configured to use the system MPI library

As a workaround I tried:

] dev --local Elemental
] dev --local Elemental_jll
] dev --local MPICH_jll

but that doesn't fix it. TBH, I can only see one place where MPICH_jll is needed -- here: https://github.com/JuliaParallel/Elemental.jl/blob/83089155659739fea1aae476c6fd492b1ee20850/test/runtests.jl

Then again, I don't know much about BinaryBuilder but it looks like Requires goes through the package specs, and throws this warning. What is not clear to me is how I can tell Requires that not using the system MPI is not an option. What do you think?

@JBlaschke

Follow-up: is there a way to drop the MPICH_jll requirement in https://github.com/JuliaBinaryWrappers/Elemental_jll.jl ?

@andreasnoack
Member

MPI.jl has a mechanism that allows for using a system MPI instead of the BB provided MPI. However, it's really not clear to me how that can work here unless we reintroduce the code for building Elemental as part of this package. We could try to mimic the MPI.jl code and link against a system provided libelemental but, historically, it was important to keep a tight connection between the version of the wrappers here and the version of libelemental since the API was evolving.

If you already have a build of libelemental that links against your MPI then you can try to remove Elemental_jll and just point the libEl variable to that libEl.so (or what it's called, I don't recall).
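A minimal sketch of how one might locate such a system build before pointing `libEl` at it. It assumes the library is named `libEl` (the exact name is uncertain, as noted above), and `find_system_libel` is a hypothetical helper, not something Elemental.jl provides.

```julia
using Libdl

# Hypothetical helper, not part of Elemental.jl.
# Tries to dlopen each candidate name in each directory and returns the
# first one that loads, or `nothing` if none does. Note that
# `Libdl.find_library` also falls back to the default system search paths.
function find_system_libel(dirs::Vector{String}; names = ["libEl"])
    found = Libdl.find_library(names, dirs)
    return isempty(found) ? nothing : found
end
```

The result could then be used in place of the value imported from `Elemental_jll`. Because `find_library` opens the library to check it, the MPI that libEl links against has to be resolvable at that point (for example through `LD_LIBRARY_PATH`).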

@JBlaschke

@andreasnoack if you can share with me the BB code that was used to generate Elemental_jll (the build_tarballs.jl? As I said, I'm new to this) then I could take a stab at a locally built one. I think we can't get around an Elemental_jll because Elemental.jl references it in its source: using Elemental_jll: libEl. My strategy at NERSC would then be to provide our own Elemental_jll in our admin repo.

@andreasnoack
Member

It's here, https://github.com/JuliaPackaging/Yggdrasil/blob/master/E/Elemental/build_tarballs.jl, but I don't see why it would be easier to modify Elemental_jll instead of just Elemental.jl.

@JBlaschke

Thanks @andreasnoack I'll take a stab at this and let you know. This is easier because I don't need elemental. Several of our users do. So I could explain to each of our users how to modify Elemental.jl. Or I could build our own NERSC-specific Elemental_jll which would live in our global admin depot (alongside of our MPI.jl). If I do things right, this should be automatically picked up whenever a user installs Elemental.jl, without them needing to fiddle with the source code (or build their own libEl).

@JBlaschke

Another question @andreasnoack -- why do you use the deprecated elemental repo instead of https://github.com/LLNL/Elemental ? The LLNL version comes with CUDA support.

@ViralBShah
Member

ViralBShah commented Jun 17, 2021

I think the last time I checked, I could not get the build to work in the new repo. We are not yet ready to enable CUDA support in BinaryBuilder, because we don't have infrastructure to distribute CUDA-built binaries. @maleadt may be able to say when we can expect that.

In the meantime, we can certainly switch to the new upstream repo for building Elemental_jll. Would you be able to submit a PR?

@andreasnoack
Member

Or I could build our own NERSC-specific Elemental_jll which would live in our global admin depot (alongside of our MPI.jl)

What I don't understand is why it would be easier to put Elemental_jll there instead of just putting a modified version of Elemental.jl in your global admin repo.

@JBlaschke

JBlaschke commented Jun 17, 2021

@andreasnoack a user might want a specific version/make their own changes. This seems to be more maintainable: unless the libEl build instructions change -- if I understand this correctly -- I can just re-build libEl locally by re-running the same build script. On the other hand, if I maintain a patched Elemental.jl, then I need to re-apply the patch (and possibly re-build libEl anyway) every time I update Elemental.jl.

Also: we don't have an Elemental module, so I would have to write a build script anyway.

@ViralBShah
Member

What we should really do is update Elemental_jll to be from the new repo. Then update Elemental.jl to use those new binaries, and whatever features are needed to use a system Elemental - put them behind an environment variable. We are happy to update the upstream package to allow whatever local configuration is necessary.
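As a rough illustration of the environment-variable idea, the selection logic could look something like the sketch below. The variable name `JULIA_ELEMENTAL_LIBRARY` and the function `choose_libel` are assumptions made up for this example; nothing like this exists in Elemental.jl today.

```julia
# Hypothetical variable name, chosen only for this sketch.
const SYSTEM_LIBEL_VAR = "JULIA_ELEMENTAL_LIBRARY"

# Returns the library to load plus a tag saying where it came from.
# `env` is any dictionary-like object (normally `ENV`); `jll_libel` is the
# path that would otherwise come from `Elemental_jll`.
function choose_libel(env::AbstractDict, jll_libel::AbstractString)
    system_libel = get(env, SYSTEM_LIBEL_VAR, "")
    if isempty(system_libel)
        return (jll_libel, :jll)        # default: BB-provided binary
    else
        return (system_libel, :system)  # user opted into a system build
    end
end
```

Passing the environment as an argument keeps the decision easy to test. In the package it would presumably be called once at load time with `ENV`, before any `ccall` into libEl.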

@ViralBShah
Member

Here are some of the problems I hit trying to build Elemental from the LLNL repo:

https://dev.azure.com/JuliaPackaging/Yggdrasil/_build/results?buildId=11741&view=logs&j=bdc19914-4824-529b-e606-c39779d9c0ef&t=ca989bc1-9e4f-55b3-32df-8eaed39b717f&l=2625

@Sideboard

Are there any new developments concerning this issue?

I wanted to test leastsquares() for distributed systems but ran into MPI vs MPICH errors. At least one cluster I'm working on uses Intel MPI so it seems like a serious constraint having to use MPICH for Elemental.jl.

@bernstei

To make the BinaryBuilder process more flexible, are there few enough ABIs (e.g. MPICH, which IntelMPI apparently also uses, and OpenMPI) that just having a couple (or a few, but still not too many) of versions of libEl, one for each ABI, is enough? The actual runtime selection of libEl can be handled by an argument or env var, and the underlying MPI library by LD_LIBRARY_PATH.
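Purely as a sketch of that per-ABI idea: the table below maps an ABI name to a libEl build. The library names (`libEl_mpich`, `libEl_openmpi`) and the function `libel_for_abi` are hypothetical, and treating Intel MPI and MVAPICH2 as MPICH-ABI implementations follows what was said earlier in this thread.

```julia
# Hypothetical library names, one build of libEl per MPI ABI.
const LIBEL_BY_ABI = Dict(
    "mpich"    => "libEl_mpich",
    "intelmpi" => "libEl_mpich",   # assumed to share the MPICH ABI
    "mvapich2" => "libEl_mpich",   # assumed to share the MPICH ABI
    "openmpi"  => "libEl_openmpi",
)

# Look up the libEl build for a given ABI name (case-insensitive).
function libel_for_abi(abi::AbstractString)
    key = lowercase(strip(abi))
    haskey(LIBEL_BY_ABI, key) ||
        throw(ArgumentError("no libEl build registered for MPI ABI \"$abi\""))
    return LIBEL_BY_ABI[key]
end
```

The ABI string itself would come from an argument or environment variable, with `LD_LIBRARY_PATH` still deciding which concrete MPI library gets loaded underneath.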

With #64 extended to a few more strings that might be sufficient for a reasonably wide range of uses.

@vchuravy
Member

If someone wants to take on JuliaPackaging/Yggdrasil#4776 that would be great. Then we can provide binaries for OpenMPI & MPICH as well as our portability layer MPItrampoline.

@wcwitt

wcwitt commented Oct 6, 2022

I see there has been some progress here (JuliaPackaging/Yggdrasil#5130), but I can't tell what, if anything, still needs to happen. Any advice?

@vchuravy
Member

@wcwitt if you want to take a look, you could start with #80. I won't have time to finish this any time soon, but that's my idea of how we could start rewriting Elemental.jl to make use of the multi-MPI support we now have in MPI 0.20.
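For context, on the MPI.jl side (0.20 and later) switching to a system MPI is done through the MPIPreferences package. This is a sketch of that step only. It assumes MPIPreferences and MPI are installed in the active environment, and it does not make Elemental.jl itself use the system MPI.

```julia
using MPIPreferences

# Records the choice in LocalPreferences.toml for the active project.
# Julia needs to be restarted afterwards for it to take effect.
MPIPreferences.use_system_binary()

# In a fresh session, check which MPI library MPI.jl actually picked up:
#   using MPI
#   MPI.versioninfo()
```

The MPI.jl configuration docs are the authority on the available options. An Elemental build would still need to be linked against the same MPI (or a compatible ABI) for the two packages to work together.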
