
Decision Engine integration test

Vito Di Benedetto edited this page Feb 23, 2023 · 25 revisions

Decision Engine installation

VM setup (FermiCloud)

On FermiCloud, set up the VM and make sure to include the "gwms-ports" and "extra_http_portsp" Security Groups in the setup. This allows GlideinWMS to work properly with the VM. If an existing VM is repurposed for DE, the Security Groups can still be added to it. For good measure, make sure the VM is up to date:

yum update -y

Decision Engine installation

The general installation guide for Decision Engine is available here.

For convenience, the instructions below install and configure Decision Engine to run the integration tests.

Install and configure Decision Engine to run the integration test.

Decision Engine uses a PostgreSQL database back-end and Redis as a message broker and cache.

PostgreSQL and Redis must be installed first, then the Decision Engine framework (decisionengine), and then the standard channels (decisionengine_modules).

This document assumes a system installation performed as root, while the decisionengine service runs as the decisionengine user. All commands are meant to be executed as root unless stated otherwise.

Install PostgreSQL

The default PostgreSQL installed on RH7 is 9.2, which is outdated. It is recommended to remove it and install version 12 instead:

  1. Remove old postgresql
yum erase -y 'postgresql*'
  1. Install postgresql 12
yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
yum install -y postgresql12 postgresql12-server
# optional, also: postgresql12-devel
  1. Enable postgresql
systemctl enable postgresql-12
  1. Init the database
/usr/pgsql-12/bin/postgresql-12-setup initdb
  1. Edit /var/lib/pgsql/12/data/pg_hba.conf as shown in the following diff:
[root@fermicloud371 ~]# diff  /var/lib/pgsql/12/data/pg_hba.conf~ /var/lib/pgsql/12/data/pg_hba.conf
80c80
< local   all             all                                     peer
---
> local   all             all                                     trust
82c82
< host    all             all             127.0.0.1/32            ident
---
> host    all             all             127.0.0.1/32            trust
84c84
< host    all             all             ::1/128                 ident
---
> host    all             all             ::1/128                 trust

This sets the authentication method to trust for local socket and loopback connections.

  1. Start the database
systemctl start postgresql-12
  1. Create the decisionengine database
createdb -U postgres decisionengine

The schema and the connection will be created and configured during the Decision Engine framework installation.
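The pg_hba.conf edit described earlier in this section can also be scripted. The following is a minimal sketch, assuming the stock PostgreSQL 12 file layout shown in the diff; the helper name set_pg_hba_trust is hypothetical. The edit has to be made before the database is started, or be followed by a restart of postgresql-12.

```shell
# Hypothetical helper: switch the "all all" local/host entries from
# peer/ident to trust, leaving other lines (e.g. replication) untouched.
# A backup copy is kept next to the file with a .bak suffix.
set_pg_hba_trust() {
    sed -E -i.bak \
        '/^(local|host)[[:space:]]+all[[:space:]]+all[[:space:]]/ s/(peer|ident)[[:space:]]*$/trust/' \
        "$1"
}

# Only touch the real file when it exists and is writable.
PG_HBA=/var/lib/pgsql/12/data/pg_hba.conf
if [ -w "$PG_HBA" ]; then
    set_pg_hba_trust "$PG_HBA"
fi
```

Restricting the match to the "all all" entries mirrors the three lines changed in the diff and keeps the replication entries on their default method.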

To use the PostgreSQL 12 tools, add them to the environment:

export PG_VERSION=12
export PATH="/usr/pgsql-${PG_VERSION}/bin:$HOME/.local/bin:$PATH"
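To keep these settings across sessions, the exports can be placed in a shell startup file. The following is a minimal sketch, assuming a Bourne-compatible shell; the helper path_prepend is hypothetical and only avoids duplicate entries when the file is sourced more than once. It uses $HOME rather than ~, since a tilde inside double quotes is not expanded by the shell.

```shell
# Prepend a directory to PATH unless it is already listed.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
}

PG_VERSION=12
path_prepend "$HOME/.local/bin"
path_prepend "/usr/pgsql-${PG_VERSION}/bin"
export PG_VERSION PATH
```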

Install Decision Engine prerequisites

  1. Prerequisites setup. Make sure that required yum repositories and some required packages (python3, gcc, ...) are installed and up to date.
yum install -y http://ftp.scientificlinux.org/linux/scientific/7x/repos/x86_64/yum-conf-softwarecollections-2.0-1.el7.noarch.rpm
# epel-release-latest-7 could already be installed
yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
# gcc, swig and make are needed for dependencies (jsonnet)
yum -y install python3 python3-pip python3-setuptools python3-wheel \
    gcc gcc-c++ make \
    python3-devel swig openssl-devel git rpm-build
python3 -m pip install --upgrade --user pip
python3 -m pip install --upgrade --user setuptools wheel setuptools-scm[toml]

# Installing the modules also requires the GlideinWMS Frontend, which is in the OSG repository.
# Assuming OSG 3.6, which supports both GSI and tokens, here is a brief summary of the setup:
yum install -y yum-priorities
yum install -y https://repo.opensciencegrid.org/osg/3.6/osg-3.6-el7-release-latest.rpm
# HTCondor 9.0.x, required by GlideinWMS, is in the osg repository.
# GlideinWMS 3.10.x is in the osg repository.
# The "rrdtool" package required by GlideinWMS exists in both the slf-primary and osg repositories.
# The version we need is in the osg repository, so slf-primary must be disabled when installing GlideinWMS.
# Change the EPEL repository priority so that it comes after the OSG repositories, which have priority 98.
# Edit /etc/yum.repos.d/epel.repo and make sure that the epel section has:
#   priority=99
vi /etc/yum.repos.d/epel.repo
  1. Install GlideinWMS components. The complete version of the GlideinWMS installation instructions is available here.

A minimal GlideinWMS installation for Decision Engine is the following:

yum --disablerepo=slf-primary -y install glideinwms-vofrontend-libs \
    glideinwms-vofrontend-glidein voms-clients-cpp osg-ca-certs \
    fetch-crl vo-client glideinwms-minimal-condor glideinwms-userschedd \
    glideinwms-usercollector glideinwms-vofrontend-core \
    glideinwms-vofrontend-httpd httpd globus-proxy-utils
  1. Install other utilities
yum -y install fermilab-util_kx509
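The EPEL priority change from the prerequisites step can be made without an interactive editor. The following is a minimal sketch, assuming the usual INI layout of yum .repo files; the helper name set_repo_priority is hypothetical.

```shell
# Hypothetical helper: set "priority=VALUE" inside [SECTION] of a yum .repo
# file, replacing an existing priority line or adding one if none is present.
# Usage: set_repo_priority FILE SECTION VALUE
set_repo_priority() {
    repo_file=$1 section=$2 priority=$3
    repo_tmp=$(mktemp) || return 1
    awk -v sec="[$section]" -v prio="$priority" '
        # A new section starts: close the target section if it lacked a priority.
        /^\[/ { if (in_sec && !done) print "priority=" prio
                in_sec = ($0 == sec); done = 0 }
        # Replace an existing priority line inside the target section.
        in_sec && /^priority[ \t]*=/ { print "priority=" prio; done = 1; next }
        { print }
        END { if (in_sec && !done) print "priority=" prio }
    ' "$repo_file" > "$repo_tmp" && cat "$repo_tmp" > "$repo_file"
    rm -f "$repo_tmp"
}

# Apply it to the EPEL repository file only where that file exists.
if [ -w /etc/yum.repos.d/epel.repo ]; then
    set_repo_priority /etc/yum.repos.d/epel.repo epel 99
fi
```

Other sections of the file, such as the debuginfo and source repositories, are left unchanged.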

Start required services

To run Decision Engine, the following services need to be started:

systemctl start httpd
systemctl start condor
systemctl start fetch-crl-cron
systemctl enable fetch-crl-boot
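Whether the services actually came up can be checked with systemctl is-active. The following is a minimal sketch; the helper name inactive_services is hypothetical, and it assumes that all three units report "active" once started.

```shell
# Print the name of every given unit that systemd does not report as active.
inactive_services() {
    for unit in "$@"; do
        systemctl is-active --quiet "$unit" || echo "$unit"
    done
}

# Run the check only where systemctl is available.
if command -v systemctl >/dev/null 2>&1; then
    not_running=$(inactive_services httpd condor fetch-crl-cron)
    if [ -n "$not_running" ]; then
        echo "Not running: $not_running" >&2
    fi
fi
```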

Install Decision Engine and the standard modules

Decision Engine RPMs are made available as release assets in GitHub.

RPMs for Decision Engine 2.0.0:

  • Decision Engine RPM is available here
  • Decision Engine modules RPM is available here

Install the Decision Engine RPMs (downloaded to the current directory):

yum -y install decisionengine-* decisionengine_modules-*

decisionengine user setup

To complete the setup, some Python dependencies must be installed.
To avoid polluting the system Python, install them for the decisionengine user, which runs the service.

ksu decisionengine -e /bin/bash
python3 -m pip install --upgrade pip setuptools wheel --user
python3 -m pip install --user jsonnet==0.17.0
python3 -m pip install --user tabulate toposort structlog
python3 -m pip install --user "bill-calculator-hep>=0.1.4" "boto3>=1.17.10"
python3 -m pip install --user "gcs-oauth2-boto-plugin>=2.7" "google-api-python-client>=1.12.8"
python3 -m pip install --user "google_auth<2dev,>=1.16.0" "urllib3>=1.26.2"
python3 -m pip install --user wheel DBUtils sqlalchemy
python3 -m pip install --user pandas==1.1.5 numpy==1.19.5
python3 -m pip install --user "psycopg2-binary >= 2.8.6; platform_python_implementation == 'CPython'"
python3 -m pip install --user boto3 google_auth google-api-python-client
python3 -m pip install --user gcs-oauth2-boto-plugin
python3 -m pip install --user "cherrypy>=18.6.0" "kombu[redis]>=5.2.0rc1" "prometheus-client>=0.10.0"
python3 -m pip install --user "psutil>=5.8.0" "typing_extensions==4.1.1"
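Before leaving the decisionengine session, it can be useful to confirm that the packages import cleanly. The following is a minimal sketch; the helper name missing_modules is hypothetical, and the list covers only a subset of the packages, using import names that are assumed to match what was installed above.

```shell
# Print every module from the argument list that python3 cannot import.
missing_modules() {
    for mod in "$@"; do
        python3 -c "import $mod" 2>/dev/null || echo "$mod"
    done
}

# Import names below are an assumption; an empty output means all were found.
missing_modules tabulate toposort structlog sqlalchemy pandas numpy \
    psycopg2 boto3 cherrypy kombu prometheus_client psutil
```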

Now the decisionengine session can be closed to get back to the root session.

Setup proxies

  • Create the frontend proxy:
mkdir /var/de/
pushd /etc/grid-security/
grid-proxy-init -cert hostcert.pem -key hostkey.pem -valid 999:0 -out /var/de/fe_proxy
popd
  • Create the user proxy:
export X509_USER_PROXY=/var/de/vo_proxy
kinit <user_principal> #this is needed if the kerberos ticket is not forwarded
kx509
voms-proxy-init -rfc -dont-verify-ac -noregen -voms fermilab -valid 120:0
  • Set proper permission and ownership for the proxies:
chmod 600 /var/de/vo_proxy /var/de/fe_proxy
chown decisionengine: /var/de/vo_proxy /var/de/fe_proxy
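Since the proxies expire (the user proxy above is requested for 120 hours), a quick validity check before running the tests can save some debugging. The following is a minimal sketch using openssl x509 -checkend; the helper name proxy_valid_for and the one-hour threshold are assumptions.

```shell
# Succeed if the first certificate in FILE stays valid for at least
# SECONDS more seconds.
# Usage: proxy_valid_for FILE SECONDS
proxy_valid_for() {
    openssl x509 -noout -checkend "$2" -in "$1" >/dev/null 2>&1
}

# Warn about proxies that are missing or expire within one hour.
for proxy in /var/de/fe_proxy /var/de/vo_proxy; do
    if ! proxy_valid_for "$proxy" 3600; then
        echo "Proxy $proxy is missing or expires within an hour" >&2
    fi
done
```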

Decision Engine configuration

Configuration templates are available in the contrib repository.

Files from the folder decisionengine go in the directory /etc/decisionengine.
There are two files to edit, job_classification.libsonnet and glideinwms.libsonnet; they contain a placeholder, @FERMICLOUDNODE@, that needs to be replaced with the FermiCloud host name.
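The placeholder substitution can be scripted with sed. The following is a minimal sketch; the helper name replace_placeholder is hypothetical, and it assumes that hostname -f returns the fully qualified FermiCloud host name. Files are located with grep, so no particular layout under /etc/decisionengine is assumed.

```shell
# Replace every occurrence of PLACEHOLDER with VALUE in FILE, in place.
# "|" is the sed delimiter so that values containing "/" work;
# VALUE must not contain "|" or "&".
# Usage: replace_placeholder FILE PLACEHOLDER VALUE
replace_placeholder() {
    ph_tmp=$(mktemp) || return 1
    sed "s|$2|$3|g" "$1" > "$ph_tmp" && cat "$ph_tmp" > "$1"
    rm -f "$ph_tmp"
}

# Patch every configuration file that still contains the placeholder.
if [ -d /etc/decisionengine ]; then
    grep -rl '@FERMICLOUDNODE@' /etc/decisionengine | while IFS= read -r conf_file; do
        replace_placeholder "$conf_file" '@FERMICLOUDNODE@' "$(hostname -f)"
    done
fi
```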

The condor_mapfile template from the folder condor goes in /etc/condor/certs/condor_mapfile.
This template also has placeholders that need to be replaced with the FermiCloud host DN and the user DN.
The correct DNs can be retrieved with:

  • the host DN
openssl x509 -noout -subject -in  /var/de/fe_proxy | cut -c 10- | sed -re 's#/CN=[0-9]{8,10}##g'
  • the user DN
openssl x509 -noout -subject -in  /var/de/vo_proxy | cut -c 10- | sed -re 's#/CN=[0-9]{8,10}##g'
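In both commands, cut removes the leading "subject= " label and sed drops the numeric CN components that proxy certificates append to the subject. The following sketch shows the sed step on its own; the DN is a made-up example and the helper name strip_proxy_cn is hypothetical.

```shell
# Drop the numeric /CN=<digits> components that proxy certificates append
# to the subject, leaving the DN of the underlying certificate.
strip_proxy_cn() {
    sed -E 's#/CN=[0-9]{8,10}##g'
}

# Hypothetical proxy subject, in the /-separated form that
# "openssl x509 -noout -subject | cut -c 10-" yields on EL7.
echo "/DC=org/DC=example/OU=Services/CN=host.example.org/CN=1234567890" | strip_proxy_cn
# prints: /DC=org/DC=example/OU=Services/CN=host.example.org
```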

Setup Redis

Install and start the message broker (Redis) as a Podman container.

yum install -y podman
podman run --name decisionengine-redis -p 127.0.0.1:6379:6379 -d redis:6 --loglevel warning

More details about Redis are available in this redis document.
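The container can take a moment to start accepting connections. The following is a minimal sketch of a readiness check; the helper name wait_until and the 10-attempt limit are assumptions, and it relies on the redis-cli binary that ships in the redis image.

```shell
# Retry a command up to TRIES times, sleeping DELAY seconds between attempts.
# Usage: wait_until TRIES DELAY command [args...]
wait_until() {
    tries=$1 delay=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        "$@" >/dev/null 2>&1 && return 0
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}

# "redis-cli ping" succeeds once Redis is up inside the container.
if command -v podman >/dev/null 2>&1; then
    wait_until 10 1 podman exec decisionengine-redis redis-cli ping ||
        echo "Redis container is not answering" >&2
fi
```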

Create GWMS frontend configuration

For this step, run:

chown -R decisionengine: /var/lib/gwms-frontend
systemctl start decisionengine
ksu decisionengine -e /usr/bin/python3 /usr/lib/python3.6/site-packages/decisionengine_modules/glideinwms/configure_gwms_frontend.py

This command creates the file /var/lib/gwms-frontend/vofrontend/de_frontend_config.

At this point, stop the decisionengine service and remove the Redis container:

systemctl stop decisionengine
podman stop decisionengine-redis
podman rm decisionengine-redis

Everything should now be ready to run Decision Engine.

Run Decision Engine integration test

The procedure to run Decision Engine is as follows:

  • Reset decisionengine DB:
dropdb -U postgres decisionengine
createdb -U postgres decisionengine
  • Run Redis container:
podman run --name decisionengine-redis -p 127.0.0.1:6379:6379 -d redis:6 --loglevel warning
  • Start decisionengine service and check its status:
systemctl start decisionengine
sleep 5
systemctl status decisionengine

Submit a test job

If the decisionengine service is running, test jobs can be submitted.

  • Make sure channel and sources are STEADY
ksu decisionengine -e /bin/bash
de-client --status
  • Prepare a Condor submission file mytest.submit with the following content:
#  A test Condor submission file - mytest.submit
executable = /bin/hostname
universe = vanilla
+DESIRED_Sites = "ITB_FC_CE2"
log = test.log
output = test.out.$(Cluster).$(Process)
error = test.err.$(Cluster).$(Process)
queue 1
  • Submit the test job
condor_submit mytest.submit
  • Check jobs in the queue
condor_q
  • Check for available glideins
condor_status

After the test jobs are submitted, it takes a few minutes (usually no more than 10) to get some glideins and for the job to start running.
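The submission steps above can be combined into one script. The following is a minimal sketch, assuming it runs as the decisionengine user; the scratch directory and the 15-minute timeout are assumptions. The heredoc delimiter is quoted so that the shell does not try to expand $(Cluster) and $(Process).

```shell
# Write the submit file from the steps above into a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/mytest.submit" <<'EOF'
#  A test Condor submission file - mytest.submit
executable = /bin/hostname
universe = vanilla
+DESIRED_Sites = "ITB_FC_CE2"
log = test.log
output = test.out.$(Cluster).$(Process)
error = test.err.$(Cluster).$(Process)
queue 1
EOF

# Submit and wait for completion only where HTCondor is installed.
if command -v condor_submit >/dev/null 2>&1; then
    (cd "$workdir" && condor_submit mytest.submit &&
        condor_wait -wait 900 test.log) ||
        echo "Job not submitted, or not finished after 15 minutes" >&2
fi
```

condor_wait follows the job log and returns once the job completes, which avoids polling condor_q by hand.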

Now the decisionengine session can be closed to get back to the root session.

Stop Decision Engine service

Finally, to stop the Decision Engine service and remove the Redis container:

systemctl stop decisionengine.service
podman stop decisionengine-redis
podman rm decisionengine-redis