Fix slurm configuration in prod2313 #17
Closed
`openhpc_slurmd_spool_dir` has to be specified instead of `openhpc_config_extra.SlurmdSpoolDir`, so that the role can actually create the spool dir.

The `openhpc_config_extra` dict is defined both in `environments/nrel/inventory/group_vars/openhpc/overrides` and in `environments/{prod,vtest}/inventory/group_vars/all/openhpc-generic-slurm.yml`. So in both environments the desired `nrel` config is not actually getting applied.

`environments/nrel/inventory/group_vars/openhpc/overrides.yml` defines `openhpc_config_extra.StateSaveLocation: /var/spool/slurm/slurmctld`. Looking at the terraform for `prod` and `vtest`, neither defines volumes, and there's no state share in `environments/nrel/inventory/group_vars/os_manila/overrides.yml`. Is state on a persistent disk at all in the cluster? If not, we should fix this. `appliances_state_dir` should be set; then the defaults will do the right thing (once that override is removed). NB: the slurmctld state should be retrieved from the current directory BEFORE reimaging the cluster!

`openhpc_packages_extra_nrel` -> `openhpc_packages_extra`, which won't be applied when using generic slurm. Also, this list contains a lot of openhpc-specific packages, and it needs to be combined with the example `openhpc_generic_packages` provided.

`openhpc_*_dir` is defined differently in `nrel` and `vtest`. Is this required?
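A minimal sketch of the spool-dir and state-dir suggestions, assuming both variables end up in the NREL environment's group_vars (the exact file depends on the environment layout). The spool directory is set through the top-level role variable rather than inside `openhpc_config_extra`, the `StateSaveLocation` override is dropped, and `appliances_state_dir` points at persistent storage. Both paths are hypothetical placeholders, not values taken from this PR. Note that Ansible replaces rather than merges dict variables by default (`hash_behaviour = replace`), so any remaining `openhpc_config_extra` keys would need to live in a single definition per environment.

```yaml
# Sketch only: both paths are hypothetical placeholders.

# Top-level role variable, so the role itself creates the directory.
# Replaces openhpc_config_extra.SlurmdSpoolDir.
openhpc_slurmd_spool_dir: /var/spool/slurm/slurmd  # hypothetical path

# openhpc_config_extra.StateSaveLocation is removed; with this set, the
# appliance defaults are expected to place slurmctld state beneath it.
# Must be backed by a persistent volume or share.
appliances_state_dir: /var/lib/state  # hypothetical path
```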