
Support for sbatch #55

Closed
dr-br opened this issue Aug 4, 2021 · 9 comments

Comments

@dr-br

dr-br commented Aug 4, 2021

We intend to use enroot to provide containerized Jupyter environments.
batchspawner for JupyterHub relies on sbatch.

Would it be possible to add the pyxis functionality to sbatch as well? Or is there a known workaround?

Thanks and best regards from Karlsruhe! ;)

@flx42
Member

flx42 commented Aug 4, 2021

Hello @dr-br,

Do you intend to use srun within the containerized sbatch?
If so, it is a little tricky to enable proper Slurm support inside the pyxis container, see #31.

If not, it should be easier to enable, but I suppose it would only make sense for single-node jobs.

@dr-br
Author

dr-br commented Aug 5, 2021

Dear Felix,
sbatch and srun run natively on the host. We are currently investigating how to use srun with batchspawner in conjunction with sbatch, so far without success.

Primary focus is single-node at the moment.

edit: clarified use case

@flx42
Member

flx42 commented Aug 6, 2021

@dr-br could you try the following branch? https://github.com/NVIDIA/pyxis/tree/2021-08-06/sbatch-salloc-support
It enables all the pyxis arguments for salloc and sbatch.

sbatch

You might need to edit the pyxis plugstack config if you are not using the default value for SlurmdSpoolDir in slurm.conf. The default is the following:

$ scontrol show config | grep Spool
SlurmdSpoolDir          = /var/spool/slurmd

If you use a different path, you will need to use the new plugstack option to override the path, for example:

$ cat /etc/slurm/plugstack.conf.d/pyxis.conf 
required /usr/local/lib/slurm/spank_pyxis.so slurmd_spool_dir=/var/run/slurmd

This is required because we can't query the Slurm configuration from a SPANK plugin (I will open an RFE against Slurm for this).

salloc

Support for salloc comes for "free", but the behavior depends on the Slurm configuration you use. With LaunchParameters=use_interactive_step, you will immediately land inside the container:

$ salloc --container-image=ubuntu:18.04 --no-container-mount-home
salloc: Granted job allocation 292
pyxis: importing docker image ...

root@node-1:/# grep PRETTY /etc/os-release 
PRETTY_NAME="Ubuntu 18.04.5 LTS"

Without use_interactive_step, the container arguments will be applied to the first srun:

$ salloc --container-image=ubuntu:18.04 --no-container-mount-home
salloc: Granted job allocation 293

$ grep PRETTY /etc/os-release 
PRETTY_NAME="Ubuntu 21.04"

$ srun --pty bash
pyxis: importing docker image ...

root@ioctl:/# grep PRETTY /etc/os-release 
PRETTY_NAME="Ubuntu 18.04.5 LTS"

multi-node

As I mentioned above, it's tricky to use a containerized multi-node sbatch (you have to go to great lengths to enable srun inside your container), but the current patch does not try to prevent it.

@flx42
Member

flx42 commented Aug 6, 2021

And here is a run.sub example:

#!/bin/bash -eux
#SBATCH --container-image ubuntu:18.04

grep PRETTY /etc/os-release

Usage:

$ sbatch run.sub 
Submitted batch job 294

$ cat slurm-294.out 
pyxis: importing docker image ...
+ grep PRETTY /etc/os-release
PRETTY_NAME="Ubuntu 18.04.5 LTS"
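Presumably the other pyxis flags can be set the same way once they are exposed to sbatch. For instance, a job that mounts a host directory into the container might look like the sketch below; the paths are placeholders, and it assumes --container-mounts and --no-container-mount-home are available in your pyxis build:

```shell
#!/bin/bash -eux
# Hypothetical run.sub variant (placeholder paths, flags assumed enabled):
#SBATCH --container-image ubuntu:18.04
#SBATCH --container-mounts /scratch/data:/workspace
#SBATCH --no-container-mount-home

# The job body runs inside the container, with the host directory visible.
ls /workspace
grep PRETTY /etc/os-release
```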

@flx42
Member

flx42 commented Aug 7, 2021

Actually, please try v2 of the patch instead, on this branch: https://github.com/NVIDIA/pyxis/tree/2021-08-06/sbatch-salloc-support-v2

It avoids the need to specify the spool dir, as it only bind-mounts the sbatch script rather than the whole directory.

@dr-br
Author

dr-br commented Aug 9, 2021

We will try it and give you feedback ASAP.
Thanks a lot!

@dr-br
Author

dr-br commented Aug 11, 2021

Feedback:
Works as expected :)
We got JupyterHub + batchspawner to start containerized JupyterLab images.

Will this feature be mainlined?

Thanks a lot
Samuel

@flx42
Member

flx42 commented Aug 11, 2021

Yes, it will be mainlined; I'll do a bit more testing and then push to the main branch.

@flx42
Member

flx42 commented Aug 11, 2021

Pushed in this commit: 6833333
The only difference from 2021-08-06/sbatch-salloc-support-v2 is that admins will be able to disable sbatch/salloc support in the plugstack configuration file, in case there are concerns about users running multi-node containerized sbatch jobs and getting confused. It is enabled by default, so it should work for your use case.
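For reference, opting out would then presumably be a matter of adding the corresponding arguments to the plugstack line, along these lines (the option names here are my guess from the commit description; check the pyxis README for the exact spelling):

```
required /usr/local/lib/slurm/spank_pyxis.so sbatch_support=0 salloc_support=0
```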

This is not part of a pyxis release yet.
