Use a hook from a parent environment

Hooks from parent environments don't get run by default, as site.yml relies on APPLIANCES_ENVIRONMENT_ROOT to find them. However, you can run them explicitly using:

# environments/child/hooks/pre.yml
- name: Import parent hook
  import_playbook: "{{ lookup('env', 'APPLIANCES_ENVIRONMENT_ROOT') }}/../parent/hooks/pre.yml"

where child and parent are the environment names.

Create a cluster with a combined compute/control/login node

  • Create a group control containing just the first compute node, and add that group into hpctests, e.g. in environments/<myenv>/inventory/groups:

    [control]
    combined-compute-0
    
    [hpctests:children]
    control
  • Do NOT create a group login (that is for login-only nodes)

  • Create a post-hook like this:

    - hosts: control
      become: true
      tasks:
        - name: Prevent ansible_user's processes being killed on compute nodes at job completion
          replace:
            path: /etc/slurm/slurm.epilog.clean
            regexp: 'if \[ \$SLURM_UID -lt 100 \] ; then'
            replace: "if [[ $SLURM_UID -lt 100 || $SLURM_JOB_USER -eq {{ ansible_user }} ]] ; then"

Rerun slurm configuration fast(ish)

You can rerun the slurm setup (e.g. for partition/node changes, slurm template debugging) faster using:

$ ansible-playbook ansible/slurm.yml --tags openhpc --skip-tags install

Run CI and merge a PR from a fork

GitHub won't (by default) make repository secrets available to workflows triggered from forked repos, so the OpenStack-based CI won't run: the runner won't have the credentials needed to access our OpenStack. In addition, the repo is configured to require approval for workflows from forked repos (which should therefore be denied, as they can't do anything useful).

The proposed approach is therefore as follows:

  • Review the PR for correctness.
  • Review the PR for safety, i.e. no changes which could leak the repository secrets, or provide access to or leak information about our infrastructure.
  • Get changes made until happy.
  • Create a new branch and change the merge target for that PR to that new branch (see the sketch after this list).
  • Merge the PR into the new branch - this will run CI.
  • Make tweaks as necessary.
  • Go through normal (internal) review to merge new branch into main.
  • Merge new branch into main.
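
If you use the GitHub CLI, retargeting the PR might look something like the sketch below; the PR number and branch name are illustrative:

git checkout main
git checkout -b forked-pr-ci             # new internal branch to receive the forked PR
git push origin forked-pr-ci
gh pr edit 123 --base forked-pr-ci       # change the PR's merge target to the new branch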

Setup an autoscaling cluster (DRAFT)

Creating an autoscaling cluster requires a compute node image to be built and defined (along with other things) in the partition definition. If you want to create a directly-configured cluster to check the configuration before building images, this creates a potential circular dependency. The ordering below is a suggested way to avoid this when setting up a cluster from scratch. Some notes on alternatives are given after the steps.

  1. Create an application credential (by default read from ~/.config/openstack/clouds.yaml; alternatively set both autoscale_clouds and openhpc_rebuild_clouds to point at an alternative path).
  2. Create and configure an initial cluster with e.g. 2x nodes and check desired functionality (e.g. check fileshares, IAM, run hpctests, check Open Ondemand, monitoring, etc.).
  3. Run packer build (for compute and login). Note image IDs are shown in packer/packer-manifest.json.
  4. Optionally, reimage login node.
  5. Optionally, reimage compute nodes and check the cluster is still up, to test the image is OK.
  6. If no non-autoscaling nodes are desired, delete the compute nodes and remove them from the group definition(s) <openhpc_cluster_name>_<partition_name>.
  7. Add cloud_ definitions to partition definitions, e.g.:
    openhpc_slurm_partitions:
    - name: small
      cloud_nodes: autoscale-small-[0-1]
      cloud_instances:
        flavor: general.v1.tiny
        image: 34e88d94-9b36-4d73-abfb-df98acea5513
        keypair: slurm-app-ci
        network: stackhpc-ci-geneve
    
  8. Set any autoscaling parameters: see stackhpc.slurm_openstack_tools.autoscale/README.md#role-variables and the subsequent stackhpc.openhpc role variables section.
  9. Rerun ansible-playbook ansible/slurm.yml --tags openhpc --skip-tags install to push this info into the Slurm configuration (the tags/skip-tags arguments are just for speed).
  10. Optionally, log in to the cluster and check sinfo shows the above cloud node names in the powered-down (~) state.
  11. Optionally, run hpctests again to check nodes respond.

The CI implements a similar approach except that the cluster has 2x non-autoscaling and 2x autoscaling nodes. For speed, it also builds images in parallel with the initial direct configuration of the 2x non-autoscaling nodes using pre- and post- hooks.

Note: to recover failed autoscaling nodes, check whether the state is shown as powering-up (#) in sinfo. If it is, wait for it to change to DOWN (i.e. the node did not "resume" within ResumeTimeout), then run scontrol update state=resume nodename=.... It should change back to the idle~ state.
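
For example, using one of the cloud node names from the partition definition above (a sketch; substitute your own node name):

sinfo --Node --long                                        # '#' suffix = powering up, '~' = powered down
scontrol update state=resume nodename=autoscale-small-0    # clear a node left DOWN after ResumeTimeout
sinfo --Node --long                                        # the node should return to idle~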

Configure for Open Ondemand

The CI uses basic auth with a predefined user (which is not rocky, as $HOME for rocky is not on the shared NFS).

Create a suitable image for openstack

If using volume-backed instances you almost certainly want a raw format image, e.g.:

qemu-img convert -f qcow2 -O raw openhpc-220808-1510.qcow2 openhpc-220808-1510.raw
openstack image create --disk-format raw --min-disk 20 --file openhpc-220808-1510.raw openhpc-220808-1510.raw

Then set properties, potentially:

openstack image set \
--property hw_rng_model=virtio \
--property hw_scsi_model=virtio-scsi \
--property hw_disk_bus=scsi \
--property hw_vif_model=virtio \
<image_name>
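
The resulting metadata can then be checked with, e.g.:

openstack image show -c properties <image_name>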

Define nodes as being in multiple partitions

The openhpc role needs something like this:

openhpc_slurm_partitions:
    - name: small
    - name: duplicate
      partition_params: 
        Nodes: demo-compute-[0-1] 

In this case there's a group demo_small containing nodes demo-compute-{0,1} but no group named demo_duplicate. Result (jobs weren't actually run, as the cluster was down for another reason):

[root@demo-control rocky]# sinfo
PARTITION  AVAIL  TIMELIMIT  NODES  STATE NODELIST
small         up 60-00:00:0      2  down* demo-compute-[0-1]
duplicate*    up 60-00:00:0      2  down* demo-compute-[0-1]
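
For reference, the demo_small group mentioned above would be defined in the inventory along these lines (a sketch, following the <openhpc_cluster_name>_<partition_name> group naming convention used elsewhere on this page):

[demo_small]
demo-compute-0
demo-compute-1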

Use a DIB-built image from the rocky-container element

Problem: TASK [podman : Reset podman database] from bootstrap.yml will fail with

newuidmap: write to uid_map failed: Operation not permitted
Error: cannot setup namespace using "/bin/newuidmap": should have setuid or have filecaps setuid: exit status 1

Solution: Add something similar to this to environments/$ENVIRONMENT/hooks/pre.yml:

- hosts: podman
  become: yes
  gather_facts: no
  tasks:
    - name: Let podman control namespaces
      # RL GenericCloud images already have these file capabilities set
      community.general.capabilities: "{{ item }}"
      loop:
        - path: /bin/newuidmap
          capability: cap_setuid+ep
          state: present
        - path: /bin/newgidmap
          capability: cap_setgid+ep
          state: present

Discussion: These capabilities are already set in a RockyLinux GenericCloud image during the shadow-utils package install. Presumably the Dockerfile can't replicate this.
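
You can check whether the capabilities are already present on a node with getcap (the exact output format depends on the libcap version):

getcap /bin/newuidmap /bin/newgidmap
# expect something like:
# /bin/newuidmap cap_setuid=ep
# /bin/newgidmap cap_setgid=ep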

Use a user-specific venv for a Jupyter notebook launched through Open Ondemand

e.g. on the login node:

cd
python3 -m venv venv
. venv/bin/activate
pip install -U pip
pip install ipykernel
python -m ipykernel install --user --name py376 --display-name "Python 3.7.6"

Then restart your notebook, and select Kernel > Change Kernel > Python 3.7.6.

You can now pip install things in that venv.
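
To check the kernel was registered, list the kernelspecs from the activated venv (jupyter_client, pulled in as an ipykernel dependency, should provide the jupyter command):

jupyter kernelspec list
# 'py376' should be listed under ~/.local/share/jupyter/kernels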