# Development notes
## What information is required as input to the cluster/nodes?
Groups:

- login
- compute
- control
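
As a minimal sketch, a YAML-format Ansible inventory defining these groups might look like the following (hostnames are hypothetical placeholders, not the appliance's required naming):

```yaml
# Hypothetical environment inventory (Ansible YAML inventory format).
# Hostnames are placeholders for illustration only.
all:
  children:
    control:
      hosts:
        mycluster-control-0:
    login:
      hosts:
        mycluster-login-0:
    compute:
      hosts:
        mycluster-compute-0:
        mycluster-compute-1:
```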
Group/host vars:

Key things:

- All slurmd nodes (login, compute) need to know the control node's address.
- The control node needs to know the login node's address when using Open OnDemand.
Full list (a hedged group-vars sketch follows the list):

- Cluster name. Var `openhpc_cluster_name`. REQUIRED in the environment inventory.
- Slurmctld address. Var `openhpc_slurm_control_host`. Default in `common:all:openhpc` is `{{ groups['control'] | first }}`. NB: maybe this should use `.internal_address`?
- Partition definitions. Var `openhpc_slurm_partitions`. Default in `common:all:openhpc` is a single 'compute' partition. NB: this requires a group `"{{ openhpc_cluster_name }}_compute"` in the environment inventory. Could check groups during validation??
- If using `nfs`: var `nfs_server`. Default in `common:all:nfs` is `nfs_server_default` -> `"{{ hostvars[groups['control'] | first].internal_address }}"`.
- All the "service endpoints" in `common:all:defaults`:

  ```yaml
  elasticsearch_address: "{{ hostvars[groups['opendistro'].0].api_address }}"
  prometheus_address: "{{ hostvars[groups['prometheus'].0].api_address }}"
  openondemand_address: "{{ hostvars[groups['openondemand'].0].api_address if groups['openondemand'] | count > 0 else '' }}"
  ```

- All the secrets in `environment:all:secrets` - see the secret role's defaults:
  - grafana, elasticsearch and mysql (x2) passwords (all potentially depending on group placement)
  - munge key (for all openhpc nodes)
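
To tie the above together, a hedged sketch of what an environment's group vars might set (the file path, cluster name and partition layout are assumptions for illustration, not the appliance's actual layout):

```yaml
# Hypothetical environments/mycluster/inventory/group_vars/all.yml
openhpc_cluster_name: mycluster

# Default shown for reference; .internal_address may be preferable (see above):
openhpc_slurm_control_host: "{{ groups['control'] | first }}"

# NB: relies on a "mycluster_compute" group existing in the inventory,
# i.e. "{{ openhpc_cluster_name }}_compute" as noted above:
openhpc_slurm_partitions:
  - name: compute
```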
## Which roles can we ONLY run the install tasks from, to build a cluster-independent(*)/no-config image?
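
One way to probe this is a play that imports only each role's install-type task file via `tasks_from`. The sketch below shows the general shape (the `builder` host group is hypothetical, and whether each role's install file actually runs cleanly in isolation is exactly what the lists below audit):

```yaml
# Hypothetical image-build play: run only install tasks, no cluster config.
- hosts: builder
  become: true
  tasks:
    - name: Install OpenHPC packages only
      import_role:
        name: stackhpc.openhpc
        tasks_from: install.yml
      vars:
        # Caveat from the list below: install.yml normally relies on the
        # openhpc_slurm_service fact set by main.yml; this value is illustrative.
        openhpc_slurm_service: slurmd

    - name: Install node_exporter only
      import_role:
        name: cloudalchemy.node_exporter
        tasks_from: install.yml
```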
In-appliance roles:
- basic_users: n/a
- block_devices: n/a
- filebeat: n/a (but downloads a Docker container at service start)
- grafana-dashboards: downloads Grafana dashboards
- grafana-datasources: n/a
- hpctests: n/a, but the required packages are installed as part of `openhpc_default_packages`.
- opendistro: n/a, but downloads a Docker container at service start.
- openondemand:
  - `main.yml` unnamed task does rpm installs using osc.ood:install-rpm.yml
  - `main.yml` unnamed task does rpm installs using pam_auth.yml
  - `main.yml` unnamed task does git downloads using osc.ood:install-apps.yml
  - `jupyter_compute.yml`: does package installs
  - `vnc_compute.yml`: does package installs
- passwords: n/a
- podman: `prereqs.yml` does package installs
Out-of-appliance roles:
- stackhpc.nfs: [main.yml](https://github.com/stackhpc/ansible-role-cluster-nfs/blob/master/tasks/main.yml) installs packages.
- stackhpc.openhpc: Required packages and `openhpc_packages` (see above) are installed in `install.yml`, but this requires the `openhpc_slurm_service` fact set from `main.yml`.
- cloudalchemy.node_exporter:
  - `install.yml` does a binary download from GitHub but also propagation. Could pre-download it and use `node_exporter_binary_local_dir` (see the sketch at the end of this section), but `install.yml` still needs running as it does user creation too.
  - `selinux.yml` also does package installations.
- cloudalchemy.blackbox-exporter: Currently unused.
- cloudalchemy.prometheus: `install.yml`. Same comments as for `cloudalchemy.node_exporter` above.
- cloudalchemy.alertmanager: Currently unused.
- cloudalchemy.grafana: `install.yml` does package updates.
- geerlingguy.mysql: `setup-RedHat.yml` does package updates BUT needs `variables.yml` running to load appropriate variables.
- jriguera.configdrive: Unused, should be deleted.
- osc.ood: See `openondemand` above.
(*) It's not really cluster-independent, as which features are turned on where may vary.
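
For the `cloudalchemy.node_exporter` point above, a hedged sketch of the pre-download idea (the directory path and version are assumptions; `node_exporter_binary_local_dir` is the role variable named in that item):

```yaml
# Hypothetical group vars: point the role at a pre-staged binary so that
# install.yml propagates it instead of fetching from GitHub. install.yml
# must still run, as it also creates the node_exporter user.
node_exporter_version: 1.1.2
node_exporter_binary_local_dir: "/opt/prestaged/node_exporter-1.1.2.linux-amd64"
```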