
Segmentation fault filling halo regions with Partition(y=2) #3878

Open
glwagner opened this issue Oct 29, 2024 · 14 comments
Labels
distributed 🕸️ Our plan for total cluster domination

Comments

@glwagner
Member

glwagner commented Oct 29, 2024

Not sure how this is possible, but the following code throws a segfault:

using Oceananigans
using Oceananigans.BoundaryConditions: fill_halo_regions!

partition = Partition(y=2)
arch = Distributed(GPU(); partition)
x = y = z = (0, 1)
grid = RectilinearGrid(arch; size=(16, 16, 16), x, y, z, topology=(Periodic, Periodic, Bounded))
c = CenterField(grid)
fill_halo_regions!(c)

I'm running with

$ mpiexecjl -n 2 julia --project test_interpolate.jl

(I originally found this error when trying to interpolate a field, but it seems to boil down to a halo-filling issue.)

This is the error I get:

[ Info: Oceananigans will use 32 threads
[ Info: MPI has not been initialized, so we are calling MPI.Init().
[ Info: Oceananigans will use 32 threads
[ Info: MPI has not been initialized, so we are calling MPI.Init().

[116989] signal (11.2): Segmentation fault
in expression starting at /orcd/data/raffaele/001/glwagner/OceananigansPaper/listings/test_interpolate.jl:10
__memcpy_ssse3 at /lib64/libc.so.6 (unknown line)
MPIDI_CH3_iSendv at /orcd/data/raffaele/001/glwagner/.julia/artifacts/e85c0a68e07fee0ee7b19c2abc210b1af2f4771a/lib/libmpi.so (unknown line)
MPIDI_CH3_EagerContigIsend at /orcd/data/raffaele/001/glwagner/.julia/artifacts/e85c0a68e07fee0ee7b19c2abc210b1af2f4771a/lib/libmpi.so (unknown line)
MPID_Isend at /orcd/data/raffaele/001/glwagner/.julia/artifacts/e85c0a68e07fee0ee7b19c2abc210b1af2f4771a/lib/libmpi.so (unknown line)
MPI_Isend at /orcd/data/raffaele/001/glwagner/.julia/artifacts/e85c0a68e07fee0ee7b19c2abc210b1af2f4771a/lib/libmpi.so (unknown line)
MPI_Isend at /orcd/data/raffaele/001/glwagner/.julia/packages/MPI/TKXAj/src/api/generated_api.jl:2151 [inlined]
Isend at /orcd/data/raffaele/001/glwagner/.julia/packages/MPI/TKXAj/src/pointtopoint.jl:66
Isend at /orcd/data/raffaele/001/glwagner/.julia/packages/MPI/TKXAj/src/pointtopoint.jl:70 [inlined]
Isend at /orcd/data/raffaele/001/glwagner/.julia/packages/MPI/TKXAj/src/pointtopoint.jl:70 [inlined]
send_south_halo at /orcd/data/raffaele/001/glwagner/Oceananigans.jl/src/DistributedComputations/halo_communication.jl:317
#fill_south_and_north_halo!#50 at /orcd/data/raffaele/001/glwagner/Oceananigans.jl/src/DistributedComputations/halo_communication.jl:263
fill_south_and_north_halo! at /orcd/data/raffaele/001/glwagner/Oceananigans.jl/src/DistributedComputations/halo_communication.jl:250
unknown function (ip: 0x2aaac8afa8b6)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
#fill_halo_event!#40 at /orcd/data/raffaele/001/glwagner/Oceananigans.jl/src/DistributedComputations/halo_communication.jl:208
fill_halo_event! at /orcd/data/raffaele/001/glwagner/Oceananigans.jl/src/DistributedComputations/halo_communication.jl:193
unknown function (ip: 0x2aaac8aefb2e)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
#fill_halo_regions!#38 at /orcd/data/raffaele/001/glwagner/Oceananigans.jl/src/DistributedComputations/halo_communication.jl:114
fill_halo_regions! at /orcd/data/raffaele/001/glwagner/Oceananigans.jl/src/DistributedComputations/halo_communication.jl:101 [inlined]
#fill_halo_regions!#37 at /orcd/data/raffaele/001/glwagner/Oceananigans.jl/src/DistributedComputations/halo_communication.jl:90 [inlined]
fill_halo_regions! at /orcd/data/raffaele/001/glwagner/Oceananigans.jl/src/DistributedComputations/halo_communication.jl:87
unknown function (ip: 0x2aaac8ad0ee5)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
do_call at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/interpreter.c:126
eval_value at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/interpreter.c:223
eval_stmt_value at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/interpreter.c:174 [inlined]
eval_body at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/interpreter.c:617
jl_interpret_toplevel_thunk at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/interpreter.c:775
jl_toplevel_eval_flex at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/toplevel.c:934
jl_toplevel_eval_flex at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/toplevel.c:877
ijl_toplevel_eval_in at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/toplevel.c:985
eval at ./boot.jl:385 [inlined]
include_string at ./loading.jl:2076
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
_include at ./loading.jl:2136
include at ./Base.jl:495
jfptr_include_46447.1 at /orcd/data/raffaele/001/glwagner/Software/julia-1.10.5/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
exec_options at ./client.jl:318
_start at ./client.jl:552
jfptr__start_82798.1 at /orcd/data/raffaele/001/glwagner/Software/julia-1.10.5/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
true_main at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/jlapi.c:582
jl_repl_entrypoint at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/jlapi.c:731
main at julia (unknown line)
__libc_start_main at /lib64/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 26236174 (Pool: 26209699; Big: 26475); GC: 35

I'll test on the CPU, then try to see whether this situation is covered by our tests.

glwagner added the distributed 🕸️ Our plan for total cluster domination label on Oct 29, 2024
@glwagner
Member Author

Why don't we test the distributed NonhydrostaticModel here?

if CPU() ∈ archs
    for partition in [Partition(1, 4), Partition(2, 2), Partition(4, 1)]
        @info "Time-stepping a distributed NonhydrostaticModel with partition $partition..."
        arch = Distributed(; partition)
        grid = RectilinearGrid(arch, topology=(Periodic, Periodic, Periodic), size=(8, 8, 8), extent=(1, 2, 3))
        model = NonhydrostaticModel(; grid)

or are there tests elsewhere?

@glwagner
Member Author

The test architectures are specified here:

test_child_arch() = CUDA.has_cuda() ? GPU() : CPU()

function test_architectures()
    child_arch = test_child_arch()

    # If MPI is initialized with MPI.Comm_size > 0, we are running in parallel.
    # We test several different configurations: `Partition(x = 4)`, `Partition(y = 4)`,
    # `Partition(x = 2, y = 2)`, and different fractional subdivisions in x, y and xy
    if MPI.Initialized() && MPI.Comm_size(MPI.COMM_WORLD) == 4
        return (Distributed(child_arch; partition = Partition(4)),
                Distributed(child_arch; partition = Partition(1, 4)),
                Distributed(child_arch; partition = Partition(2, 2)),
                Distributed(child_arch; partition = Partition(x = Fractional(1, 2, 3, 4))),
                Distributed(child_arch; partition = Partition(y = Fractional(1, 2, 3, 4))),
                Distributed(child_arch; partition = Partition(x = Fractional(1, 2), y = Equal())))
    else
        return tuple(child_arch)
    end
end

This was hard to find at first
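For illustration, a 2-rank branch along these lines would cover the Partition(y=2) case from the MWE at the top (a hypothetical sketch only, not something that exists in the test suite today):

using MPI
using Oceananigans: Distributed, Partition

# Hypothetical 2-rank analogue of test_architectures(), so that the
# Partition(y=2) configuration from the MWE is exercised when the tests
# run with two ranks.
function two_rank_test_architectures(child_arch)
    if MPI.Initialized() && MPI.Comm_size(MPI.COMM_WORLD) == 2
        return (Distributed(child_arch; partition = Partition(y = 2)),
                Distributed(child_arch; partition = Partition(x = 2)))
    else
        return tuple(child_arch)
    end
end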

@glwagner
Member Author

Are the distributed GPU tests actually running?

I see this:

https://buildkite.com/clima/oceananigans-distributed/builds/4081#0192d4e4-191f-48e1-a943-d82377d8a125/189-1099

And then subsequently it looks like the architecture is Distributed{CPU}.

Do we need a better way to specify the test architectures?

@glwagner
Member Author

@simone-silvestri

@simone-silvestri
Collaborator

Damn, it looks like the tests on the GPU are not working because CUDA is not loaded properly. I am trying to address this in #3880.

A segmentation fault probably means the MPI is not CUDA-aware. Typically, the MPI that ships with MPI_jll is not CUDA-aware. A good way to check is:

julia> using MPI

julia> MPI.has_cuda()
true

@glwagner
Member Author

Thanks @simone-silvestri, it turns out that I wasn't using CUDA-aware MPI.

#3883 addresses this by adding an error if CUDA-aware MPI is not available, so that we are not confronted with a mysterious segmentation fault (which could be caused by any number of issues, not just CUDA-aware MPI).
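For context, here is a minimal sketch of the kind of guard this could be, assuming a check at architecture construction time; the function name validate_cuda_aware_mpi is hypothetical and #3883 may implement this differently:

using MPI
using Oceananigans: GPU

# Hypothetical guard: fail early with an informative error instead of
# segfaulting deep inside MPI_Isend when the MPI library cannot handle
# device (GPU) buffers.
function validate_cuda_aware_mpi(child_architecture)
    if child_architecture isa GPU && !MPI.has_cuda()
        error("Distributed(GPU(); ...) requires a CUDA-aware MPI build, ",
              "but MPI.has_cuda() is false. Configure MPI.jl to use a ",
              "CUDA-aware system MPI (see the MPI.jl configuration docs).")
    end
    return nothing
end

(Note that MPI.has_cuda() can only reliably report CUDA support for Open MPI, so a real check may need to be smarter.)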

Since we don't have GPU tests right now I will also check to make sure that this runs with a proper CUDA-aware MPI.

@liuchihl
Contributor

liuchihl commented Jan 8, 2025

I have reproduced the segmentation fault using the same MWE as above, but with srun -n 2 julia --project test_multiGPU.jl.

I was actually expecting openmpi/4.1.5+cuda (a module that can be loaded) to be CUDA-aware, but it is not, as shown below:

julia> using MPI

julia> MPI.has_cuda()
false

Is there something I am missing here?

@glwagner
Member Author

glwagner commented Jan 8, 2025

Not sure, what cluster are you using?

@liuchihl
Contributor

liuchihl commented Jan 8, 2025

I am using Delta

@liuchihl
Contributor

liuchihl commented Jan 8, 2025

According to this Open MPI doc, the checks below seem to show that the MPI I am using was built with CUDA support:

$ ompi_info | grep "MPI extensions"
[dt-login02.delta.ncsa.illinois.edu:95561] mca: base: components_register: registering framework ras components
[dt-login02.delta.ncsa.illinois.edu:95561] mca: base: components_register: found loaded component simulator
[dt-login02.delta.ncsa.illinois.edu:95561] mca: base: components_register: component simulator register function successful
[dt-login02.delta.ncsa.illinois.edu:95561] mca: base: components_register: found loaded component slurm
[dt-login02.delta.ncsa.illinois.edu:95561] mca: base: components_register: component slurm register function successful
          MPI extensions: affinity, cuda, pcollreq
[dt-login02.delta.ncsa.illinois.edu:95561] mca: base: close: unloading component simulator
[dt-login02.delta.ncsa.illinois.edu:95561] mca: base: close: unloading component slurm

And

$ ompi_info --parsable --all | grep mpi_built_with_cuda_support:value
[dt-login02.delta.ncsa.illinois.edu:208020] mca: base: components_register: registering framework ras components
[dt-login02.delta.ncsa.illinois.edu:208020] mca: base: components_register: found loaded component simulator
[dt-login02.delta.ncsa.illinois.edu:208020] mca: base: components_register: component simulator register function successful
[dt-login02.delta.ncsa.illinois.edu:208020] mca: base: components_register: found loaded component slurm
[dt-login02.delta.ncsa.illinois.edu:208020] mca: base: components_register: component slurm register function successful
mca:mpi:base:param:mpi_built_with_cuda_support:value:true
[dt-login02.delta.ncsa.illinois.edu:208020] mca: base: close: unloading component simulator
[dt-login02.delta.ncsa.illinois.edu:208020] mca: base: close: unloading component slurm

Aside from requiring CUDA-aware MPI, could there be other factors causing the segmentation fault?

@glwagner Did you manage to solve the segfault when running the MWE on your cluster? I am curious how that went.
Thanks.

@glwagner
Member Author

glwagner commented Jan 9, 2025

Are you sure that this command:

$ ompi_info --parsable --all | grep mpi_built_with_cuda_support:value

tests the same MPI implementation that you are using to launch julia for your test?

A related question is: what steps have you taken to ensure that the cluster openmpi (which is loaded as a module) is used to build MPI.jl? This can often be a little tricky. Here is the documentation: https://juliaparallel.org/MPI.jl/stable/configuration/
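For reference, the basic pattern from those docs is: load the cluster MPI module (the openmpi/4.1.5+cuda one you mentioned), add MPIPreferences to your project, and then, in that project, run something like:

julia> using MPIPreferences

julia> MPIPreferences.use_system_binary()    # writes LocalPreferences.toml so MPI.jl binds to the loaded system MPI instead of MPI_jll

After restarting Julia, MPI.has_cuda() should then return true if that Open MPI build really was compiled with CUDA support.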

I wrote up my experience with NCAR's Derecho because I was amazed at how intricate and fragile the process of getting CUDA-aware MPI to work was: #3669

@glwagner
Member Author

glwagner commented Jan 9, 2025

I'll test it myself, but note that this is also tested in CI, for example: https://buildkite.com/clima/oceananigans-distributed/builds/5371#01944227-aca9-4485-a7ba-cac6571bf9ff/247-1301

So probably I should close this issue...

@glwagner
Member Author

glwagner commented Jan 9, 2025

Okay, I followed the instructions in #3669, but applied them to the MWE script from the top post. The job is currently in the queue, so I will report whether or not there are errors.

@liuchihl
Contributor

liuchihl commented Jan 9, 2025

Thanks for all this helpful information; I will spend some time understanding CUDA-aware MPI better!
