Running Flux on Argonne machines #3538
-
@jameshcorbett: try
And see what this version reports. It could be that If this is the case, we need to turn off binding.
-
We also need to debug this. Can you set
-
After talking with @dongahn and @SteVwonder yesterday, it turns out that installing Flux on Argonne's machines is not that important, at least for me. The collaborations I was hoping to do on Argonne machines I can do on LC machines instead, so I am going to set these issues aside for now.
-
For the record, there are two things that should be resolved properly before we can support that platform. Listing them here lets us focus on these issues when we circle back to this.
-
I am trying to get a Flux instance up and running on Theta, an ALCF machine that uses Cobalt as its resource manager. After spack-installing `flux-sched@master`, I tried to start a multi-node Flux instance:
(The `-N` option determines the number of tasks per node, not the number of allocated nodes. The `-d 64 -j 4 -cc depth` options say to give Flux access to all of the hardware threads on the node. The KNL nodes have 256 hardware threads per node.)
It seems that Flux isn't picking up the fact that it's been launched under MPI: it looks like I get two independent Flux instances. It also seems that Flux isn't registering all of the resources available to it:
Any idea what might be going on here? How could I help you find the source of the issue?
In the meantime I should be able to make good progress even with Flux instances that can only see part of a single node's resources.
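One way to help narrow down the missing-resources symptom is to check how many CPUs a launched task is actually allowed to run on. The following is a minimal sketch, not part of the original report: the helper names `visible_cpus` and `is_restricted` are hypothetical, and it assumes (without confirmation from this thread) that the launcher's CPU binding is what limits the resources Flux discovers.

```python
import os


def visible_cpus():
    """Return (bound, total): CPUs this process may run on vs. CPUs the OS reports."""
    total = os.cpu_count() or 0
    try:
        # Linux-only: the set of CPUs in this process's affinity mask.
        bound = len(os.sched_getaffinity(0))
    except AttributeError:
        # Platforms without sched_getaffinity: fall back to the OS-wide count.
        bound = total
    return bound, total


def is_restricted(bound, total):
    """True when the affinity mask covers fewer CPUs than the node exposes."""
    return 0 < bound < total


if __name__ == "__main__":
    bound, total = visible_cpus()
    print(f"bound to {bound} of {total} CPUs")
    if is_restricted(bound, total):
        # Assumption: a restricted mask here would be consistent with
        # the launcher's binding hiding part of the node from the task.
        print("affinity mask is narrower than the node; binding may be the cause")
```

Running this script with the same launcher options in place of the Flux command would show whether each task is confined to a subset of the node's hardware threads before Flux is involved at all.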