
How to go from files to distributed matrix. #74

Open
dh-ilight opened this issue Oct 11, 2021 · 3 comments

I have files, each holding one column of an array, and I would like to create an Elemental.DistMatrix from these files, loading it in parallel. An earlier question was answered by pointing to Elemental/test/lav.jl, so I made the following program by extracting from lav.jl. It works on a single node but hangs with 2 nodes when launched via mpiexecjl. I am using Julia 1.5 on a 4-core machine running CentOS 7.5. Please let me know what is wrong with the program and how to load my column files in parallel; I intend to eventually run a program using DistMatrix on a computer with hundreds of cores.

# to import MPIManager
using MPIClusterManagers, Distributed

# Manage MPIManager manually -- all MPI ranks do the same work
# Start MPIManager
manager = MPIClusterManagers.start_main_loop(MPI_TRANSPORT_ALL)

# Init an Elemental.DistMatrix
@everywhere function spread(n0, n1)
    println("start spread")
    height = n0 * n1
    width = n0 * n1
    h = El.Dist(n0)
    w = El.Dist(n1)
    A = El.DistMatrix(Float64)
    El.gaussian!(A, n0, n1)  # how to init size?
    localHeight = El.localHeight(A)
    println("localHeight ", localHeight)
    El.reserve(A, 6 * localHeight)  # number of queue entries
    println("after reserve")
    for sLoc in 1:localHeight
        s = El.globalRow(A, sLoc)
        x0 = ((s - 1) % n0) + 1
        x1 = div(s - 1, n0) + 1
        El.queueUpdate(A, s, s, 11.0)
        println("sLoc $sLoc, x0 $x0")
        if x0 > 1
            El.queueUpdate(A, s, s - 1, -10.0)
            println("after q")
        end
        if x0 < n0
            El.queueUpdate(A, s, s + 1, 20.0)
        end
        if x1 > 1
            El.queueUpdate(A, s, s - n0, -30.0)
        end
        if x1 < n1
            El.queueUpdate(A, s, s + n0, 40.0)
        end
        # The dense last column
        # El.queueUpdate(A, s, width, floor(-10/height))
    end # for
    println("before processQueues")
    El.processQueues(A)
    println("after processQueues")  # with 2 nodes never gets here
    return A
end

@mpi_do manager begin
    using MPI, LinearAlgebra, Elemental
    const El = Elemental
    res = spread(4, 4)
    println("res = ", res)

    # Manage MPIManager manually:
    # Elemental needs to be finalized before shutting down MPIManager
    # println("[rank $(MPI.Comm_rank(comm))]: Finalizing Elemental")
    Elemental.Finalize()
    # println("[rank $(MPI.Comm_rank(comm))]: Done finalizing Elemental")
end # mpi_do

# Shut down MPIManager
MPIClusterManagers.stop_main_loop(manager)
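
For reference, here is an untested sketch of the column-file loader I am ultimately aiming for, meant to be called from inside the @mpi_do block above (so `using MPI` and `const El = Elemental` are already in effect). The "col_j.txt" file naming and the El.zeros! initializer are placeholders/assumptions on my part, and the round-robin split of columns across ranks is just one possible choice:

# Untested sketch: file "col_j.txt" (placeholder name) holds the `height` values of
# column j, one value per line. Each rank reads a round-robin subset of the columns
# and queues those entries; processQueues then routes them to the owning ranks.
using DelimitedFiles

function load_columns(dir, height, width)
    myrank = MPI.Comm_rank(MPI.COMM_WORLD)
    nranks = MPI.Comm_size(MPI.COMM_WORLD)
    A = El.DistMatrix(Float64)
    El.zeros!(A, height, width)             # assumed initializer: size A and zero it,
                                            # since queueUpdate adds to existing entries
    mycols = (myrank + 1):nranks:width      # this rank's share of the column files
    El.reserve(A, height * length(mycols))  # number of entries this rank will queue
    for j in mycols
        col = vec(readdlm(joinpath(dir, "col_$j.txt")))
        for i in 1:height
            El.queueUpdate(A, i, j, col[i]) # global (i, j) entry taken from file j
        end
    end
    El.processQueues(A)                     # collective: every rank must call this
    return A
end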

Thank you

@JBlaschke commented Oct 11, 2021

This is based on a NERSC user ticket, which also inspired #73. cc @andreasnoack

@dhiepler can you put the code snippet in a code block (put ```julia at the beginning and ``` at the end)?

@andreasnoack (Member) commented:

The program looks right to me. To debug this, I'd try to remove the MPIClusterManagers/Distributed parts and then run the script directly with mpiexec, like we do in the test suite:

run(`$exec -np $nprocs $(Base.julia_cmd()) $(joinpath(@__DIR__, f))`)
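
Concretely, the stripped-down script could look something like this sketch (untested; it keeps only the Elemental calls from the program above and relies on Elemental setting up MPI when it is loaded, as the original does). Launch it with something like mpiexec -np 2 julia spread_mpi.jl, where the file name is a placeholder:

# Untested sketch of the suggested simplification: no MPIClusterManagers/Distributed,
# every MPI rank simply runs this file.
using MPI, LinearAlgebra, Elemental
const El = Elemental

A = El.DistMatrix(Float64)
El.gaussian!(A, 4, 4)
localHeight = El.localHeight(A)
El.reserve(A, 6 * localHeight)
for sLoc in 1:localHeight
    s = El.globalRow(A, sLoc)
    El.queueUpdate(A, s, s, 11.0)   # same queueing pattern as in the issue, trimmed
end
El.processQueues(A)
println("rank ", MPI.Comm_rank(MPI.COMM_WORLD), ": processQueues finished")
# (the explicit Elemental.Finalize() in the original was only needed because the
#  MPIManager had to be shut down afterwards)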

@JBlaschke commented:
FTR @dhiepler, on Cori that would be:

srun -n $NUM_RANKS julia path/to/test.jl
