
Request Slurm RuntimeManager #268

Open
tsjackson-noaa opened this issue Aug 19, 2021 · 0 comments
Comments

@tsjackson-noaa (Contributor) commented Aug 19, 2021

What problem will this feature solve?
The framework currently runs PODs in separate subprocesses on a single processor, relying on the OS's scheduling to share compute and memory. This is inadequate for analysis of larger volumes of data, e.g. as generated during current high-resolution runs of GFDL CM4/MOM6. Functionality is needed to scale multi-POD execution beyond the limitations of a single node.

This request is not especially urgent, but does reflect a real-world use case. The workaround currently suggested for GFDL is to submit a batch job for each POD as a separate run of the framework, and to collect and reorganize the output manually.

Describe the solution you'd like
The proposal made in this issue is to implement a RuntimeManager that submits each POD to a Slurm scheduler as a separate batch job. As with the current SubprocessRuntimeManager, the framework's execution would block until all jobs complete or return errors, logging each job's final status; the output from each POD would be written to its own directory within the overall MDTF_* output directory for the run.
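A minimal sketch of what the submit-and-block logic might look like, assuming `sbatch` and `sacct` are available on the submitting host (the function names and the polling approach are illustrative only, not a committed design):

```python
import subprocess
import time

def submit_pod_job(script_path, directives):
    """Submit one POD's batch script via sbatch; return the Slurm job ID."""
    cmd = ["sbatch", "--parsable"] + directives + [script_path]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    # --parsable makes sbatch print just the job ID (possibly "jobid;cluster")
    return result.stdout.strip().split(";")[0]

def wait_for_jobs(job_ids, poll_interval=30):
    """Block until all submitted jobs leave the queue; return final states."""
    states = {}
    pending = set(job_ids)
    while pending:
        time.sleep(poll_interval)
        for job_id in list(pending):
            out = subprocess.run(
                ["sacct", "-j", job_id, "--format=State", "--noheader", "-X"],
                capture_output=True, text=True
            ).stdout.strip()
            if out and out not in ("PENDING", "RUNNING"):
                states[job_id] = out  # e.g. COMPLETED, FAILED, TIMEOUT
                pending.discard(job_id)
    return states
```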

The user would select between this RuntimeManager and the SubprocessRuntimeManager via the --runtime_manager setting, using the existing plug-in mechanism. CLI options specific to this RuntimeManager should allow the user to pass arbitrary directives (e.g. requested run time or partition) through to sbatch, although certain directives (working directory, paths for stdout/stderr) would be set by the framework rather than the user.
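Directive handling might look something like the following, where the user's pass-through flags come from a hypothetical --slurm_options CLI setting (the exact option name is a placeholder):

```python
def build_sbatch_directives(pod_name, work_dir, user_directives):
    """Combine framework-set directives with user-supplied pass-through flags.

    The framework controls the job name, working directory, and stdout/stderr
    paths; everything else (e.g. --time, --partition) is passed through
    verbatim from the user's settings.
    """
    framework = [
        f"--job-name=MDTF_{pod_name}",
        f"--chdir={work_dir}",
        f"--output={work_dir}/{pod_name}.out",
        f"--error={work_dir}/{pod_name}.err",
    ]
    return framework + list(user_directives)
```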

Aspects of the implementation would necessarily be site-specific, e.g. to allow for use of different file transfer protocols between nodes of the cluster, and to make use of shared filesystems, if any.

Another site-specific detail would be whether the pre-processed model input data could be placed on a filesystem that's mounted on all the nodes, or whether the input data for each POD would need to be transferred to the node responsible for running it. This latter scenario is more general, but more complicated, as it requires communication between the POD batch job, the scheduler, and the framework's process.
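For the transfer scenario, the framework could generate a per-POD batch script that stages input data onto node-local storage before invoking the POD. A sketch under the assumption of rsync-accessible input; the DATADIR environment variable, the scratch location, and the transfer command are placeholders for whatever the site actually uses:

```python
import textwrap

def write_pod_job_script(pod_name, pod_command, input_url, script_path):
    """Write a batch script that stages input data to node-local scratch
    before running the POD command."""
    script = textwrap.dedent(f"""\
        #!/bin/bash
        set -e
        STAGE_DIR="${{TMPDIR:-/tmp}}/{pod_name}_input"
        mkdir -p "$STAGE_DIR"
        # site-specific transfer; could be gcp, scp, or a no-op on a shared filesystem
        rsync -a "{input_url}/" "$STAGE_DIR/"
        DATADIR="$STAGE_DIR" {pod_command}
        """)
    with open(script_path, "w") as f:
        f.write(script)
```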

Describe alternatives you've considered
N/A

Additional context

@tsjackson-noaa tsjackson-noaa added feature-request New feature or request framework Issue pertains to the framework code labels Aug 19, 2021
@wrongkindofdoctor wrongkindofdoctor self-assigned this Oct 1, 2021