We've had on-and-off discussions about this for a while. One of the core decisions made early on in Flux is that we give users what they asked for: if they ask for 2 cores per task, they get 2 cores per task (Slurm does not do this). The trouble comes from deciding what to do when the arguments don't fully specify the user's request. Some examples:
-N 1: Is this one whole node, or one task with one core on one node?
Slurm: 1 node, or 1 core, or 1 toaster; it's unclear in general, but on our systems it consistently means one node
Flux: one whole node, unless a number of tasks or cores is also specified, in which case it only expresses "fit everything else into one node"
-N 1 -n 4:
Slurm: Carve one node up into four pieces (on our systems, can also mean 4 cores in some configurations)
Flux: Four tasks, each with one core, on one node
-n 8:
Slurm: carve up some number of nodes for 8 tasks, though the exact behavior is unclear even here
Flux: run 8 tasks, somewhere, one core each
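The Flux side of the examples above can be summarized as a small toy model. This is only a sketch of the defaults as described in this post, not Flux's actual implementation; the function name `flux_default_request` and the dictionary keys are made up for illustration.

```python
from typing import Optional


def flux_default_request(nodes: Optional[int] = None,
                         tasks: Optional[int] = None) -> dict:
    """Toy model of the Flux defaults described in this post.

    Hypothetical helper for illustration only; not Flux's real code.
    `nodes` stands for -N and `tasks` stands for -n.
    """
    if nodes is None and tasks is None:
        raise ValueError("specify nodes (-N), tasks (-n), or both")
    if tasks is None:
        # -N alone: the request is for that many whole nodes.
        return {"whole_nodes": nodes, "tasks": None,
                "cores_per_task": None, "fit_into_nodes": None}
    # -n given: each task gets one core. If -N is also present, it only
    # expresses "fit everything into this many nodes".
    return {"whole_nodes": None, "tasks": tasks,
            "cores_per_task": 1, "fit_into_nodes": nodes}
```

Under this model, `flux_default_request(nodes=1, tasks=4)` describes four one-core tasks confined to one node, while `flux_default_request(tasks=8)` describes eight one-core tasks with no node constraint at all.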
The main point of confusion, specifically for LLNL users, is that all of our systems configure Slurm consistently (except the non-exclusive ones), so users have been trained to expect that -N x -n y will carve up all of the resources in x nodes into enough pieces to give one to each of y tasks. Flux, by default, treats it as a request to run y tasks with one core each.
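To make the gap concrete, here is a rough arithmetic sketch of the two readings of -N x -n y. It assumes the "carve up" behavior amounts to an even integer split of all cores across tasks, which is a simplification; the function names and the 36-core node size are hypothetical and chosen only for illustration.

```python
def carved_up_cores_per_task(nodes: int, tasks: int, cores_per_node: int) -> int:
    """Cores per task if all cores in `nodes` nodes are split evenly among tasks.

    Simplified model of what LLNL users have come to expect from -N x -n y.
    """
    return (nodes * cores_per_node) // tasks


def flux_default_cores_per_task() -> int:
    """Cores per task under the Flux default described here: always one."""
    return 1


CORES_PER_NODE = 36  # hypothetical node size, not taken from any real system

# -N 2 -n 8 on the hypothetical 36-core nodes
expected_by_user = carved_up_cores_per_task(2, 8, CORES_PER_NODE)  # 72 // 8 = 9
given_by_flux = flux_default_cores_per_task()                      # 1
```

With these assumed numbers, a user expecting 9 cores per task would get 1 core per task from the same flags, which is where the surprise comes from.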