Releases: stackhpc/ansible-role-openhpc

v0.16.0

09 Aug 15:08
3e44e8c

What's Changed

  • Fix slurm.conf inventory filtering for nodenames with numeric padding by @sjpb in #143

Full Changelog: v0.15.0...v0.16.0

Fix openhpc_job_maxtime default

16 May 13:27
a06cf7f

What's Changed

In previous releases the default maximum job lifetime set by openhpc_job_maxtime was intended to be 24 hours and was documented as such. However, due to Ansible/Jinja type conversion, it actually became 60 days on the running system. This release explicitly sets the openhpc_job_maxtime default to 60 days for backward compatibility and updates the documentation to match.

See #136 for full details of the bug.
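These notes do not spell out the conversion. One plausible mechanism consistent with the numbers, assuming the old default was written unquoted as 24:00:00, is YAML 1.1's base-60 ("sexagesimal") integer rule, which PyYAML (the YAML library Ansible uses) implements, combined with Slurm reading a bare integer MaxTime as minutes. The Python sketch below reproduces that arithmetic; the helper names are hypothetical and this is an illustration, not a description of the role's code.

```python
import re

# YAML 1.1 base-60 integer form, mirroring the sexagesimal pattern in PyYAML's int resolver.
SEXAGESIMAL = re.compile(r"[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+$")


def yaml11_sexagesimal(text):
    """Return the int YAML 1.1 would load for `text`, or None if it is not base-60."""
    if not SEXAGESIMAL.match(text):
        return None
    sign = -1 if text.startswith("-") else 1
    value = 0
    for part in text.lstrip("+-").replace("_", "").split(":"):
        value = value * 60 + int(part)  # each colon-separated field is one base-60 digit
    return sign * value


def slurm_minutes_to_days(minutes):
    """Slurm reads a bare integer MaxTime as minutes; convert that to days."""
    return minutes / (60 * 24)


as_int = yaml11_sexagesimal("24:00:00")
print(as_int)                         # 86400
print(slurm_minutes_to_days(as_int))  # 60.0
```

Under that assumption, 24:00:00 loads as 24 × 3600 = 86400, and 86400 minutes ÷ 1440 minutes per day = 60 days. Quoting the value, or using Slurm's days-hours form such as 1-0, would keep it a string under YAML 1.1 and avoid the ambiguity.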

Full Changelog: v0.13.0...v0.15.0

v0.13.0

29 Apr 14:20
7d77a1c

What's Changed

  • Allow no partition definitions by @sjpb in #132
  • Improve openhpc_state_save_location templating by @sjpb in #135

Full Changelog: v0.12.0...v0.13.0

v0.12.0

24 Mar 10:18
065c6b0

What's Changed

  • Empty partitions now display as "n/a" (#131)

Full Changelog: v0.11.0...v0.12.0

Slurmdbd startup improvements and support for any partition configuration options

24 Mar 10:16
f51ef58

What's Changed

  • Slurm startup should be more reliable: slurmdbd restarts now wait for the relevant port to become available, because the systemd unit returns as soon as the binary starts rather than when the daemon is ready. (#129)
  • Any desired partition parameters can now be set; see openhpc_slurm_partitions.partition_params (#130).
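Waiting on the port helps because systemd reports success once the process has started, not once slurmdbd is accepting connections. A minimal Python sketch of that kind of readiness check, polling until a TCP connection succeeds or a timeout expires; wait_for_port is a hypothetical helper, not part of the role. In a playbook, the ansible.builtin.wait_for module with its port option serves the same purpose.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass.

    Returns True as soon as the port accepts a connection, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A completed TCP handshake means the daemon is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: give up once the deadline has passed, else retry.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("localhost", 6819, timeout=30) would wait up to 30 seconds for slurmdbd, assuming it listens on Slurm's default DbdPort of 6819.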

Full Changelog: v0.10.0...v0.11.0
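As a hypothetical illustration of what openhpc_slurm_partitions.partition_params enables, the Python sketch below appends arbitrary key/value pairs to a slurm.conf PartitionName line. The function name and input shape are assumptions, not the role's template; OverSubscribe and PriorityTier are standard Slurm partition options used here only as examples.

```python
def partition_line(name, nodes, params=None):
    """Render a slurm.conf PartitionName line (illustrative sketch).

    `params` maps extra Slurm partition options to values; each pair is
    emitted verbatim as Key=Value after the standard fields, in dict order.
    """
    fields = [f"PartitionName={name}", f"Nodes={nodes}"]
    for key, value in (params or {}).items():
        fields.append(f"{key}={value}")
    return " ".join(fields)


print(partition_line("gpu", "gpu-[0-3]", {"OverSubscribe": "EXCLUSIVE", "PriorityTier": 10}))
# PartitionName=gpu Nodes=gpu-[0-3] OverSubscribe=EXCLUSIVE PriorityTier=10
```

Passing options through unvalidated keeps the role agnostic to Slurm versions, at the cost of leaving typos to be caught by slurmctld at startup.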

Support autoscale

24 Jan 15:36
a0d6f53

This release enables the definition of nodes not controlled by this role, using a new attribute extra_nodes on role variable openhpc_slurm_partitions. This supports, for example, defining autoscaling nodes (which requires additional logic outside this role).

It also:

  • Improves the templating of slurm.conf so that nodes are specified by pattern rather than individually. This will result in much shorter configuration files for large clusters and improve slurmctld startup time.
  • Clarifies role variable documentation in the main README.md.
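The notes do not show the template logic. As a rough illustration only, the Python sketch below compresses a list of hostnames into Slurm-style hostlist patterns such as node-[01-03,05]; the function names are hypothetical and this is not the role's implementation. It groups names by prefix and digit width, so zero-padded names (the case addressed by the v0.16.0 fix in #143) keep their padding and are never merged with unpadded ones.

```python
import re
from collections import defaultdict

# Split a hostname into a prefix and a trailing run of digits.
NAME = re.compile(r"^(.*?)(\d+)$")


def _runs(numbers):
    """Collapse sorted ints into [start, end] runs of consecutive values."""
    runs = []
    for n in numbers:
        if runs and n == runs[-1][1] + 1:
            runs[-1][1] = n
        else:
            runs.append([n, n])
    return runs


def compress(hostnames):
    """Compress hostnames into hostlist patterns, e.g. node-[01-03,05]."""
    groups = defaultdict(set)
    literals = []
    for host in hostnames:
        m = NAME.match(host)
        if not m:
            literals.append(host)  # no numeric suffix: keep the name as-is
            continue
        prefix, digits = m.groups()
        # Key on digit width so "node-01" and "node-1" stay distinct.
        groups[(prefix, len(digits))].add(int(digits))
    patterns = []
    for (prefix, width), nums in sorted(groups.items()):
        parts = []
        for start, end in _runs(sorted(nums)):
            lo, hi = f"{start:0{width}d}", f"{end:0{width}d}"  # restore padding
            parts.append(lo if start == end else f"{lo}-{hi}")
        patterns.append(f"{prefix}[{','.join(parts)}]")
    return patterns + sorted(literals)


print(",".join(compress(["c-01", "c-02", "c-03", "c-07", "login"])))
# c-[01-03,07],login
```

Keying on width means unpadded names of different lengths, such as node-9 and node-10, land in separate patterns (node-[9],node-[10]); that is still a valid hostlist, just less compact than node-[9-10].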

Support Rocky Linux / slurm.conf parameters / slurm state location

19 Jan 09:23
42db2fa

New features:

  • Add support for Rocky Linux 8.5 (as well as CentOS 7.9). CentOS 8.x is no longer tested.
  • Supply additional parameters to slurm.conf; see role variable openhpc_config.
  • Specify the directory in which to save Slurm state (e.g. to put it on a persistent volume); see role variable openhpc_state_save_location.

Major fixes:

  • Fix errors in logfile due to login node config: #115

Support OpenHPC v2.1

13 Apr 10:11
54247a5

Support for OpenHPC v2.1 (released 6th April 2021).

As packages for OpenHPC v2.0 and v2.1 are provided by the same OpenHPC repos, this role now creates an OpenHPC v2.1 node when using CentOS 8.x (see docs for role var openhpc_release_repo). However, the Slurm versions provided by OpenHPC v2.0 and v2.1 (v20.02.5 and v20.11.3 respectively) are not compatible. Therefore all new builds of CentOS 8.x nodes should use this release of the role, and entire clusters will need to be upgraded at once.

Changes:

  • Optional role variable openhpc_munge_key_path specifying a path to a munge key has been replaced by openhpc_munge_key which specifies its content instead.
  • Accounting storage is now disabled by default, as Slurm 20.11 does not support the previous default storage type accounting_storage/filetxt (see docs for role var openhpc_slurm_accounting_storage_type). This means sacct returns no information. Either set up the Slurm database daemon or configure job accounting. The latter is simpler to enable but only captures limited information about job completion (viewable via sacct -c).
  • Ensures default job completion logfile is writable.
  • Fixes owner/group on slurmdbd configuration file (only an issue for Slurm v20.11.3).
  • Adds molecule tests for job accounting and node deletion.

Add support for configless mode and slurmdbd

26 Feb 10:49
50961a1

All changes should be backwards-compatible with v0.6.0. Major enhancements are:

  • Role itself now installs appropriate OpenHPC release repo depending on OS version, rather than this being a prerequisite.
  • When using OpenHPC v2, adds support for Slurm's "configless" mode, where slurm.conf is present only on the control node (see parameter openhpc_slurm_configless).
  • Can optionally configure slurmdbd and accounting to provide enhanced accounting/sacct functionality (see "Accounting" in README).
  • Flexibility added to support image-based approaches to deployment, e.g. can now configure only a control node or only a compute node, or configure but not start services.
  • Cluster munge key can optionally be user-supplied.
  • The node RealMemory parameter in slurm.conf now defaults to an Ansible-derived value instead of 1 MB (see parameters ram_mb and openhpc_ram_multiplier).
  • Testing in CI massively expanded (see directory molecule/).
  • Adds the slurm-libpmi-ohpc package by default to support use of srun with Intel MPI (see Slurm docs).
  • Can skip installing the module system (see parameter openhpc_module_system_install).
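The exact template is not given here. A minimal Python sketch of the calculation the RealMemory bullet describes, assuming RealMemory is the Ansible-reported total memory in MB scaled by openhpc_ram_multiplier and truncated to an integer, with a per-partition ram_mb overriding it. The function name and the 0.95 example multiplier are illustrative assumptions.

```python
def real_memory_mb(total_mb, multiplier, ram_mb=None):
    """Return a RealMemory value in MB for slurm.conf (hypothetical sketch).

    An explicit ram_mb wins; otherwise scale the reported total memory by
    the multiplier and truncate to a whole number of MB.
    """
    if ram_mb is not None:
        return int(ram_mb)
    return int(total_mb * multiplier)


print(real_memory_mb(16000, 0.95))                # 15200
print(real_memory_mb(16000, 0.95, ram_mb=12000))  # 12000
```

A multiplier below 1 leaves headroom for the OS, since Slurm drains a node whose configured RealMemory exceeds what slurmd actually detects.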

OpenHPC 2 support and more flexible host naming

14 Jul 10:07
070f637

Now has support for:

  • CentOS 8 and OpenHPC 2
  • CentOS 7 and OpenHPC 1.x

Lists nodes and partitions in slurm.conf using Ansible inventory group contents, rather than assuming they follow a stricter naming scheme.