MPI build appears successful but actually excludes MPI #973

Closed
rileychall opened this issue Dec 7, 2023 · 5 comments
Labels: bug (Something isn't working)

Comments

@rileychall

Description

I am building an MPI Fortran program with a fairly basic configuration. I have tried this build on three systems, and on two of them this problem occurs (system info and fpm.toml below). By all indications, the build succeeds. However, when the executable is run under MPI, mpi_init silently does nothing and the MPI environment is never initialized.
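
For reference, a minimal standalone check along these lines (hypothetical; not part of the emu codebase) makes the symptom easy to see: if MPI is genuinely linked and initialized, each launched process reports its own rank out of the total, whereas if MPI never starts, the processes behave as independent serial copies.

! Minimal sketch, assuming a working mpi module; not project code.
program mpi_check
   use mpi
   implicit none
   integer :: ierr, rank, nprocs
   logical :: flag
   call MPI_Init(ierr)
   call MPI_Initialized(flag, ierr)   ! should be .true. after MPI_Init
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
   call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
   print *, 'initialized =', flag, ' rank', rank, 'of', nprocs
   call MPI_Finalize(ierr)
end program mpi_check

Under mpiexec -n 4, a correctly linked build prints four distinct ranks; the same kind of check embedded in the executable described here would show that nothing ever gets initialized.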

$ fpm build --flag '-ffree-line-length-none'
 + which mpiexec
/usr/bin/mpiexec
emuinit.F                              done.
[...]
core_loop.F                            done.
libemu.a                               done.
emu                                    done.
[100%] Project compiled successfully.

Looking at the library dependencies of the executable produced by fpm shows that the MPI libraries are missing:

$ ldd build/gfortran_735EAD39D1DF8EB7/app/emu
        linux-vdso.so.1 (0x00007ffcfed4e000)
        libgfortran.so.5 => /lib/x86_64-linux-gnu/libgfortran.so.5 (0x00007f7e6cd09000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f7e6cc22000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f7e6cc02000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7e6c9d8000)
        libquadmath.so.0 => /lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f7e6c990000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f7e6dbf9000)

Building the same code in the same environment with make produces an executable with the MPI libraries linked:

$ ldd bin/emu
        linux-vdso.so.1 (0x00007ffeed9ad000)
        libmpi_mpifh.so.40 => /lib/x86_64-linux-gnu/libmpi_mpifh.so.40 (0x00007f55001ea000)
        libgfortran.so.5 => /lib/x86_64-linux-gnu/libgfortran.so.5 (0x00007f54fff0f000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f54ffe28000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f54ffe08000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f54ffbde000)
        libmpi.so.40 => /lib/x86_64-linux-gnu/libmpi.so.40 (0x00007f54ffaa7000)
        libopen-pal.so.40 => /lib/x86_64-linux-gnu/libopen-pal.so.40 (0x00007f54ff9f4000)
        libquadmath.so.0 => /lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f54ff9ac000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f5500628000)
        libopen-rte.so.40 => /lib/x86_64-linux-gnu/libopen-rte.so.40 (0x00007f54ff8ef000)
        libhwloc.so.15 => /lib/x86_64-linux-gnu/libhwloc.so.15 (0x00007f54ff891000)
        libevent_core-2.1.so.7 => /lib/x86_64-linux-gnu/libevent_core-2.1.so.7 (0x00007f54ff85c000)
        libevent_pthreads-2.1.so.7 => /lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7 (0x00007f54ff857000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f54ff83b000)
        libudev.so.1 => /lib/x86_64-linux-gnu/libudev.so.1 (0x00007f54ff811000)

Expected Behaviour

The build should not appear successful in this scenario, where the MPI libraries are never linked (or whatever the true underlying problem turns out to be). Ideally, the MPI libraries should always be linked correctly, but if something prevents that, it should be reported as an error.

Version of fpm

0.9.0

Platform and Architecture

Ubuntu 22.04

Additional Information

Good system:

  • gfortran 9.3.1 and 4.8.5
  • Intel MPI 2021.2.0
  • CentOS 7.9.2009

Bad system 1:

  • gfortran 9.5.0
  • OpenMPI 4.1.2
  • Ubuntu 22.04

Bad system 2:

  • gfortran 9.4.0
  • Intel MPI 2021.2.0
  • Linux Mint 17.1

Bad system 2 is really old, so I was ready to blame that until the same thing happened on a separate, up-to-date system.

Here is the fpm.toml used in all cases:

name = "emu"

[[executable]]
name = "emu"
source-dir = "src"
main = "emu.F"

[dependencies]
mpi = "*"

[fortran]
implicit-typing = true
implicit-external = true

[preprocess]
[preprocess.cpp]
suffixes = ["F"]
macros = ["EMU_MPI"]

Just in case, I also tested without the preprocessing, which had no effect.

@rileychall added the bug label Dec 7, 2023
@perazz (Contributor) commented Dec 8, 2023

I believe the executable cannot link successfully if any calls to MPI routines are made while the code is not linked against any library that contains them.

Looking at Bad system 1, you have gfortran + OpenMPI, which is a supported configuration, so please help us understand whether there is a bug:

  • What is the output of the build process with --verbose?
  • It seems that this version of OpenMPI (4.1.2) has had issues. Please share the output of mpif90 --showme and mpif90 --showme:link; that may help us better understand what is going on.

@rileychall (Author)

Sure thing. Here's mpif90 --showme:

gfortran -I/usr/lib/x86_64-linux-gnu/openmpi/lib/../../fortran/gfortran-mod-15/openmpi -I/usr/lib/x86_64-linux-gnu/openmpi/lib -L/usr/lib/x86_64-linux-gnu/openmpi/lib/fortran/gfortran -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

And mpif90 --showme:link:

-I/usr/lib/x86_64-linux-gnu/openmpi/lib -L/usr/lib/x86_64-linux-gnu/openmpi/lib/fortran/gfortran -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

Here's the verbose build output, in a file because of length:
fpm_build_verbose.txt

@perazz (Contributor) commented Dec 9, 2023

Thank you!
From your output, it seems that the MPI libraries are passed correctly at the final link step, so the only reason I can see for them not ending up in the executable is that no functions from them are ever called. Could the calls to MPI routines, e.g. MPI_Init, be guarded by pre-processor macros?
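
For illustration (hypothetical routine name and code, not taken from the emu sources), this is the pattern the question is about: if the MPI calls only exist when EMU_MPI is defined, an object compiled without that macro never references an MPI symbol, so the linker has nothing to resolve from the MPI libraries even though they appear on the link line.

C     Hypothetical sketch: MPI usage guarded by the EMU_MPI cpp macro.
C     Without -DEMU_MPI this object contains no reference to mpi_init_,
C     so the linker pulls nothing from libmpi_mpifh or libmpi.
      subroutine emu_startup(ierr)
      integer ierr
#ifdef EMU_MPI
      call mpi_init(ierr)
#else
      ierr = 0
#endif
      end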

@rileychall (Author)

I think I figured it out. There's a file that defines dummy versions of all the MPI subroutines that are used, I think to accommodate running without MPI after building with MPI. (I inherited this codebase, so I'm not entirely sure of the intent here.) Deleting that file causes the linking to succeed on Bad system 1, and Good system continues to succeed.
It seems there was a name collision between the dummy subroutines and the real MPI ones: the good system resolved the calls to the MPI library, while the bad system resolved them to the dummies. Does that sound plausible to you?
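
For illustration, a dummy file like the one described might look roughly like this (hypothetical code; the actual file and names belong to the inherited codebase): no-op routines whose external names match the Fortran MPI bindings, so whichever object the linker picks first wins.

C     Hypothetical no-op stubs of the kind described above.  If the
C     linker resolves mpi_init_ etc. from this object rather than from
C     libmpi_mpifh, the executable links cleanly but MPI never starts.
      subroutine mpi_init(ierr)
      integer ierr
      ierr = 0
      end

      subroutine mpi_comm_rank(comm, rank, ierr)
      integer comm, rank, ierr
      rank = 0
      ierr = 0
      end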

However, Bad system 2 now shows undefined references to all the MPI subroutines at the final linking step:

[...]
pen_surface.F                          done.
pen_setup.F                            done.
emu_main.F                             done.
libemu.a                               done.
emu                                    failed.
[100%] Compiling...
build/gfortran_D85B94C7BF1DD83C/emu/libemu.a(src_emu_main.F.o): In function `emumain_':
emu_main.F:(.text+0x737): undefined reference to `mpi_finalize_'
emu_main.F:(.text+0x2391): undefined reference to `mpi_finalize_'
build/gfortran_D85B94C7BF1DD83C/emu/libemu.a(src_modules_md_util.F.o): In function `__md_util_MOD_wrapup':
md_util.F:(.text+0xc3f): undefined reference to `mpi_finalize_'
build/gfortran_D85B94C7BF1DD83C/emu/libemu.a(src_exchange_exchange_init.F.o): In function `exchange_init_':
exchange_init.F:(.text+0x4b): undefined reference to `mpi_initialized_'
[...]
exchange_tearing.F:(.text+0xc44): undefined reference to `mpi_allreduce_'
collect2: error: ld returned 1 exit status
<ERROR> Compilation failed for object " emu "
<ERROR> stopping due to failed compilation
STOP 1

@perazz (Contributor) commented Dec 13, 2023

This traces back to #974, so I will close this issue.

@perazz closed this as completed Dec 13, 2023