Decision Engine integration test
On FermiCloud, set up the VM and make sure to include the "gwms-ports" and "extra_http_portsp" Security Groups in the setup. This allows GlideinWMS to work properly with the VM. If an existing VM is repurposed for DE, the Security Groups can still be added to it. For good measure, make sure the VM is up to date:
yum update -y
The general installation guide for Decision Engine is available here.
For convenience, the instructions below install and configure Decision Engine to run integration tests.
Decision Engine uses a PostgreSQL database back-end and Redis as the message broker and cache.
PostgreSQL and Redis need to be installed first, then the Decision Engine framework (decisionengine), and finally the standard channels (decisionengine_modules).
This document assumes a system installation performed as root, while the decisionengine service will run as the decisionengine user.
All commands are supposed to be executed as root, unless stated otherwise.
The default PostgreSQL installed on RH7 is 9.2, which is outdated. It is suggested to remove it and install version 12 instead:
- Remove old postgresql
yum erase -y postgresql*
- Install postgresql 12
yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
yum install -y postgresql12 postgresql12-server
# optional, also: postgresql12-devel
- Enable postgresql
systemctl enable postgresql-12
- Init the database
/usr/pgsql-12/bin/postgresql-12-setup initdb
- Edit /var/lib/pgsql/12/data/pg_hba.conf like the following:
[root@fermicloud371 ~]# diff /var/lib/pgsql/12/data/pg_hba.conf~ /var/lib/pgsql/12/data/pg_hba.conf
80c80
< local all all peer
---
> local all all trust
82c82
< host all all 127.0.0.1/32 ident
---
> host all all 127.0.0.1/32 trust
84c84
< host all all ::1/128 ident
---
> host all all ::1/128 trust
This sets the authentication method to trust for local (Unix socket) and loopback (127.0.0.1 and ::1) connections.
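The same change can be scripted instead of edited by hand. The following is a minimal sketch, assuming the stock pg_hba.conf layout shown in the diff above; the function name `set_trust_auth` is hypothetical and not part of PostgreSQL or Decision Engine. Only the "all all" entries are touched, as in the diff, and replication entries are left unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption for illustration).
# Switches the "all all" local/host entries from peer/ident to trust
# and keeps a backup of the original file as <file>.bak.
set_trust_auth() {
    local hba_file="$1"
    sed -E -i.bak \
        's/^(local|host)([[:space:]]+all[[:space:]]+all[[:space:]].*)(peer|ident)[[:space:]]*$/\1\2trust/' \
        "$hba_file"
}

# Example (run as root, before starting the database):
#   set_trust_auth /var/lib/pgsql/12/data/pg_hba.conf
```

Comparing the result with the `.bak` copy using `diff` should show the same three changed lines as above.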
- Start the database
systemctl start postgresql-12
- Create the decisionengine database
createdb -U postgres decisionengine
The schema and the connection will be created and configured during the Decision Engine framework installation.
To use the database, add the PostgreSQL binaries to the environment:
export PG_VERSION=12
export PATH="/usr/pgsql-${PG_VERSION}/bin:~/.local/bin:$PATH"
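After exporting the variables, it can help to confirm that the PostgreSQL client tools are actually reachable through PATH. The sketch below is one way to do that; `require_commands` is a hypothetical helper name, and the list of commands to check is an assumption based on the commands used in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every command that cannot be found via PATH
# and return non-zero if at least one is missing.
require_commands() {
    local missing=0 cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing from PATH: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (after exporting PG_VERSION and PATH as above):
#   require_commands psql createdb dropdb
```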
- Prerequisites setup. Make sure that required yum repositories and some required packages (python3, gcc, ...) are installed and up to date.
yum install -y http://ftp.scientificlinux.org/linux/scientific/7x/repos/x86_64/yum-conf-softwarecollections-2.0-1.el7.noarch.rpm
# epel-release-latest-7 could already be installed
yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
# gcc, swig and make are needed for dependencies (jsonnet)
yum -y install python3 python3-pip python3-setuptools python3-wheel \
gcc gcc-c++ make \
python3-devel swig openssl-devel git rpm-build
python3 -m pip install --upgrade --user pip
python3 -m pip install --upgrade --user setuptools wheel setuptools-scm[toml]
# To install the modules it also needs GlideinWMS Frontend, which is in the OSG repository.
# Assuming the use of OSG 3.6 that supports both GSI and tokens, here is a brief summary of the setup:
yum install -y yum-priorities
yum install -y https://repo.opensciencegrid.org/osg/3.6/osg-3.6-el7-release-latest.rpm
# HTCondor 9.0.x, required by GlideinWMS, is in the osg repository.
# GlideinWMS 3.10.x is in osg repository.
# The package "rrdtool" required by GlideinWMS exists in both the slf-primary and osg repositories;
# the version we need is in the osg repository, so slf-primary must be disabled to install GlideinWMS.
# Change the EPEL repository priority so that it comes after the OSG repositories, which are 98.
# Make sure that epel has: priority=99
vi /etc/yum.repos.d/epel.repo
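The priority change can also be made without an interactive editor. This is a minimal sketch, assuming the repo file uses the usual INI layout with an `[epel]` section; `set_repo_priority` is a hypothetical helper, not a yum command. It rewrites an existing `priority=` line in the named section, or appends one at the end of that section if none is present.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set "priority=<value>" inside one section of a
# yum .repo file, leaving every other section untouched.
set_repo_priority() {
    local repo_file="$1" section="$2" priority="$3" tmp
    tmp="$(mktemp)" || return 1
    if awk -v section="[$section]" -v prio="$priority" '
        # Append the priority when leaving a section that had none.
        function flush_pending() { if (in_section && !done) print "priority=" prio }
        /^\[/ { flush_pending(); in_section = ($0 == section); done = 0 }
        in_section && /^priority[ \t]*=/ { print "priority=" prio; done = 1; next }
        { print }
        END { flush_pending() }
    ' "$repo_file" > "$tmp"; then
        cat "$tmp" > "$repo_file"
    fi
    rm -f "$tmp"
}

# Example (run as root):
#   set_repo_priority /etc/yum.repos.d/epel.repo epel 99
```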
- Install GlideinWMS components. The complete version of the GlideinWMS installation instructions is available here.
A minimal GlideinWMS installation for Decision Engine is the following:
yum --disablerepo=slf-primary -y install glideinwms-vofrontend-libs \
glideinwms-vofrontend-glidein voms-clients-cpp osg-ca-certs \
fetch-crl vo-client glideinwms-minimal-condor glideinwms-userschedd \
glideinwms-usercollector glideinwms-vofrontend-core \
glideinwms-vofrontend-httpd httpd globus-proxy-utils
- Install other utilities
yum -y install fermilab-util_kx509
To run Decision Engine, some services need to be started:
systemctl start httpd
systemctl start condor
systemctl start fetch-crl-cron
systemctl enable fetch-crl-boot
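Before moving on, it may be worth checking that the services started above are really active. The following sketch loops over a list of units with `systemctl is-active --quiet`; the helper name `report_inactive_units` is hypothetical, and the unit list in the example simply mirrors the services started above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every listed systemd unit that is not active
# and return non-zero if at least one is found.
report_inactive_units() {
    local failed=0 unit
    for unit in "$@"; do
        if ! systemctl is-active --quiet "$unit"; then
            echo "not active: $unit"
            failed=1
        fi
    done
    return "$failed"
}

# Example:
#   report_inactive_units httpd condor fetch-crl-cron
```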
Decision Engine RPMs are made available as release assets in GitHub.
- Decision Engine release assets.
- Decision Engine modules release assets
RPMs for Decision Engine 2.0.0:
Install Decision Engine RPMs
yum -y install decisionengine-* decisionengine_modules-*
To complete the setup, it is necessary to install some Python dependencies.
To avoid polluting the system Python, install them for the decisionengine user, which runs the service.
ksu decisionengine -e /bin/bash
python3 -m pip install --upgrade pip setuptools wheel --user
python3 -m pip install --user jsonnet==0.17.0
python3 -m pip install --user tabulate toposort structlog
python3 -m pip install --user "bill-calculator-hep>=0.1.4" "boto3>=1.17.10"
python3 -m pip install --user "gcs-oauth2-boto-plugin>=2.7" "google-api-python-client>=1.12.8"
python3 -m pip install --user "google_auth<2dev,>=1.16.0" "urllib3>=1.26.2"
python3 -m pip install --user wheel DBUtils sqlalchemy
python3 -m pip install --user pandas==1.1.5 numpy==1.19.5
python3 -m pip install --user "psycopg2-binary >= 2.8.6; platform_python_implementation == 'CPython'"
python3 -m pip install --user boto3 google_auth google-api-python-client
python3 -m pip install --user gcs-oauth2-boto-plugin
python3 -m pip install --user "cherrypy>=18.6.0" "kombu[redis]>=5.2.0rc1" "prometheus-client>=0.10.0"
python3 -m pip install --user "psutil>=5.8.0" "typing_extensions==4.1.1"
Now the decisionengine session can be closed to get back to the root session.
- Create the frontend proxy:
mkdir /var/de/
pushd /etc/grid-security/
grid-proxy-init -cert hostcert.pem -key hostkey.pem -valid 999:0 -out /var/de/fe_proxy
popd
- Create the user proxy:
export X509_USER_PROXY=/var/de/vo_proxy
kinit <user_principal> #this is needed if the kerberos ticket is not forwarded
kx509
voms-proxy-init -rfc -dont-verify-ac -noregen -voms fermilab -valid 120:0
- Set proper permission and ownership for the proxies:
chmod 600 /var/de/vo_proxy /var/de/fe_proxy
chown decisionengine: /var/de/vo_proxy /var/de/fe_proxy
Configuration templates are available in the contrib repository.
Files from the folder decisionengine go in the directory /etc/decisionengine.
There are two files to edit, job_classification.libsonnet and glideinwms.libsonnet, in which the placeholder @FERMICLOUDNODE@ needs to be replaced with the fermicloud host name.
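The placeholder substitution can be done with `sed`. This is a sketch under stated assumptions: `fill_fermicloud_node` is a hypothetical helper, the host name is taken from `hostname -f` in the example, and the command is run from the directory that holds the two files (adjust the paths to wherever the templates were copied).

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace @FERMICLOUDNODE@ in the given files with a
# host name, keeping a .bak copy of each original file.
fill_fermicloud_node() {
    local host="$1"
    shift
    sed -i.bak "s/@FERMICLOUDNODE@/${host}/g" "$@"
}

# Example (from the directory containing the two files):
#   fill_fermicloud_node "$(hostname -f)" \
#       job_classification.libsonnet glideinwms.libsonnet
```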
The condor_mapfile template from the folder condor goes in /etc/condor/certs/condor_mapfile.
This template also has placeholders that need to be replaced with the fermicloud host DN and the user DN.
To make sure the right DNs are used, they can be retrieved with:
- the host DN
openssl x509 -noout -subject -in /var/de/fe_proxy | cut -c 10- | sed -re 's#/CN=[0-9]{8,10}##g'
- the user DN
openssl x509 -noout -subject -in /var/de/vo_proxy | cut -c 10- | sed -re 's#/CN=[0-9]{8,10}##g'
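To see what the `cut` and `sed` stages of the two pipelines above do, they can be tried on a sample string. The subject line below is made up for illustration, and the sketch assumes the older OpenSSL output style `subject= /DC=...` (newer OpenSSL releases print the subject in a different format, in which case the `cut` offset would not apply). `strip_proxy_suffix` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Mirrors the pipeline above: drop the leading "subject= " (9 characters)
# and remove the numeric /CN=<digits> components appended to proxy subjects.
strip_proxy_suffix() {
    cut -c 10- | sed -E 's#/CN=[0-9]{8,10}##g'
}

# Made-up subject line, for illustration only.
printf 'subject= /DC=org/DC=example/CN=host.example.org/CN=1234567890\n' \
    | strip_proxy_suffix
# prints: /DC=org/DC=example/CN=host.example.org
```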
Install and start the message broker (Redis) as a Podman container.
yum install -y podman
podman run --name decisionengine-redis -p 127.0.0.1:6379:6379 -d redis:6 --loglevel warning
More details about Redis are available in this redis document.
For this step, run:
chown -R decisionengine: /var/lib/gwms-frontend
systemctl start decisionengine
ksu decisionengine -e /usr/bin/python3 /usr/lib/python3.6/site-packages/decisionengine_modules/glideinwms/configure_gwms_frontend.py
This command creates the file /var/lib/gwms-frontend/vofrontend/de_frontend_config.
At this point, stop the decisionengine service and remove the Redis container:
systemctl stop decisionengine
podman stop decisionengine-redis
podman rm decisionengine-redis
Now everything should be ready to run Decision Engine.
The procedure to run Decision Engine is as follows:
- Reset decisionengine DB:
dropdb -U postgres decisionengine
createdb -U postgres decisionengine
- Run Redis container:
podman run --name decisionengine-redis -p 127.0.0.1:6379:6379 -d redis:6 --loglevel warning
- Start decisionengine service and check its status:
systemctl start decisionengine
sleep 5
systemctl status decisionengine
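A fixed `sleep 5` may be too short or longer than needed, depending on the machine. As an alternative, a small polling loop can wait until a check succeeds. This is a sketch only: `wait_until` is a hypothetical helper, and the attempt count and interval in the example are arbitrary choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command every <interval> seconds until it
# succeeds or <attempts> tries have been used.
wait_until() {
    local attempts="$1" interval="$2" i
    shift 2
    for ((i = 1; i <= attempts; i++)); do
        if "$@"; then
            return 0
        fi
        # Do not sleep after the last failed attempt.
        if ((i < attempts)); then
            sleep "$interval"
        fi
    done
    return 1
}

# Example: poll for up to one minute, then show the status either way.
#   wait_until 12 5 systemctl is-active --quiet decisionengine
#   systemctl status decisionengine
```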
If the decisionengine service is running, test jobs can be submitted.
- Make sure the channels and sources are STEADY:
ksu decisionengine -e /bin/bash
de-client --status
- Prepare a Condor submission file, mytest.submit, with the following content:
# A test Condor submission file - mytest.submit
executable = /bin/hostname
universe = vanilla
+DESIRED_Sites = "ITB_FC_CE2"
log = test.log
output = test.out.$(Cluster).$(Process)
error = test.err.$(Cluster).$(Process)
queue 1
- submit the test job
condor_submit mytest.submit
- check jobs in the queue
condor_q
- check for available glideins
condor_status
After the test jobs are submitted, it takes a few minutes (usually no more than 10) to get some glideins and then for the job to start running.
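When testing against more than one site, the submit file from the steps above can be generated rather than typed. The sketch below writes the same content with the site name as a parameter; `write_test_submit` is a hypothetical helper, and any site name other than ITB_FC_CE2 is an assumption to be checked against the local configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the test submit file for a given site name.
# Usage: write_test_submit <site> [output file, default mytest.submit]
write_test_submit() {
    local site="$1" out_file="${2:-mytest.submit}"
    # \$ keeps $(Cluster) and $(Process) literal for HTCondor to expand.
    cat > "$out_file" <<EOF
# A test Condor submission file
executable = /bin/hostname
universe = vanilla
+DESIRED_Sites = "${site}"
log = test.log
output = test.out.\$(Cluster).\$(Process)
error = test.err.\$(Cluster).\$(Process)
queue 1
EOF
}

# Example:
#   write_test_submit ITB_FC_CE2 mytest.submit
#   condor_submit mytest.submit
```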
Now the decisionengine session can be closed to get back to the root session.
Finally, to stop the Decision Engine service and remove the Redis container:
systemctl stop decisionengine.service
podman stop decisionengine-redis
podman rm decisionengine-redis