BF: CS-617 reformat sge_execd, sge_qmaster, sge_shadowd, sge_shepherd
ernst-bablick committed Sep 25, 2024
1 parent ea6f458 commit 9233d77
Showing 4 changed files with 143 additions and 253 deletions.
114 changes: 39 additions & 75 deletions doc/markdown/man/man8/sge_execd.md
Expand Up @@ -8,59 +8,49 @@ date: __DATE__

# NAME

xxqs_name_sxx_execd - xxQS_NAMExx job execution agent
`xxqs_name_sxx_execd` - xxQS_NAMExx job execution agent

# SYNOPSIS

**xxqs_name_sxx_execd** \[ **-help** \]
`xxqs_name_sxx_execd` \[ `-help` \]

# DESCRIPTION

*xxqs_name_sxx_execd* controls the xxQS_NAMExx queues local to the
machine on which *xxqs_name_sxx_execd* is running and executes/controls
the jobs sent from *xxqs_name_sxx_qmaster*(8) to be run on these
queues.
`xxqs_name_sxx_execd` controls the xxQS_NAMExx queues local to the machine on which `xxqs_name_sxx_execd` is running
and executes/controls the jobs sent from xxqs_name_sxx_qmaster(8) to be run on these queues.

# OPTIONS

## **-help**
## -help

Prints a listing of all options.

# LOAD SENSORS

If a **load sensor** is configured for *xxqs_name_sxx_execd* via either
the global host configuration or the execution-host-specific cluster
configuration (See *xxqs_name_sxx_conf*(5).), the executable path of
the load sensor is invoked by *xxqs_name_sxx_execd* on a regular basis
and delivers one or multiple load figures for the execution host (e.g.
users currently logged in) or the complete cluster (e.g. free disk space
on a network wide scratch file system). The load sensor may be a script
or a binary executable. In either case its handling of the STDIN and
If a *load sensor* is configured for `xxqs_name_sxx_execd` via either the global host configuration or the
execution-host-specific cluster configuration (see xxqs_name_sxx_conf(5)), the executable path of the load sensor
is invoked by `xxqs_name_sxx_execd` on a regular basis and delivers one or multiple load figures for the execution
host (e.g. users currently logged in) or the complete cluster (e.g. free disk space on a network wide scratch
file system). The load sensor may be a script or a binary executable. In either case its handling of the STDIN and
STDOUT streams and its control flow must comply with the following rules:

The load sensor must be written as an infinite loop waiting at a certain
point for input from STDIN. If the string "quit" is read from STDIN, the
load sensor should exit. When an end-of-line is read from STDIN, a load
data retrieval cycle should start. The load sensor then performs
whatever operation is necessary to compute the desired load figures. At
the end of the cycle the load sensor writes the result to stdout. The
The load sensor must be written as an infinite loop waiting at a certain point for input from STDIN. If the string
"quit" is read from STDIN, the load sensor should exit. When an end-of-line is read from STDIN, a load
data retrieval cycle should start. The load sensor then performs whatever operation is necessary to compute the
desired load figures. At the end of the cycle the load sensor writes the result to stdout. The
format is as follows:

- A load value report starts with a line containing only either the
word "start" or the word "begin".
- A load value report starts with a line containing only either the word "start" or the word "begin".

- Individual load values are separated by newlines.

- Each load value report consists of three parts separated by colons
(":") and containing no blanks.
- Each load value consists of three parts separated by colons (":") and contains no blanks.

- The first part of a load value information is either the name of the
host for which load is reported or the special name "global".
- The first part of a load value information is either the name of the host for which load is reported or the
special name "global".

- The second part is the symbolic name of the load value as defined in
the host or global complex list (see *complex*(5) for details). If
a load value is reported for which no entry in the host or global
- The second part is the symbolic name of the load value as defined in the host or global complex list
(see xxqs_name_sxx_complex(5) for details). If a load value is reported for which no entry in the host or global
complex list exists, the reported load value is not used.

- The third part is the measured load value.
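
The rules above can be illustrated with a minimal sketch of a load sensor. The function names, the host name `node1`, and the complex name `logins` are hypothetical. The closing `end` line is also an assumption: the part of the format that terminates a report is not shown in this excerpt.

```python
import io


def format_report(host, values):
    """Build one load report: a start line, one host:name:value line per value."""
    lines = ["begin"]
    for name, value in values.items():
        # Three colon-separated parts, no blanks.
        lines.append(f"{host}:{name}:{value}")
    lines.append("end")  # assumed terminator, not visible in this excerpt
    return "\n".join(lines) + "\n"


def run_load_sensor(stdin, stdout, host, measure):
    """Exit when "quit" is read; otherwise emit one report per line received."""
    for line in stdin:
        if line.strip() == "quit":
            break
        stdout.write(format_report(host, measure()))
        stdout.flush()  # the reader waits for the report, so do not buffer it


# Usage sketch with in-memory streams standing in for STDIN and STDOUT.
demo_out = io.StringIO()
run_load_sensor(io.StringIO("\nquit\n"), demo_out, "node1", lambda: {"logins": 3})
print(demo_out.getvalue(), end="")
```

A real sensor would pass `sys.stdin` and `sys.stdout` and replace the lambda with an actual measurement. Using the host part `global` instead of a host name would report a cluster-wide value.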
Expand All @@ -69,65 +59,39 @@ format is as follows:

# ENVIRONMENTAL VARIABLES

xxQS_NAME_Sxx_ROOT
Specifies the location of the xxQS_NAMExx standard configuration files.

xxQS_NAME_Sxx_CELL
If set, specifies the default xxQS_NAMExx cell. To address a xxQS_NAMExx
cell *xxqs_name_sxx_execd* uses (in the order of precedence):

> The name of the cell specified in the environment variable
> xxQS_NAME_Sxx_CELL, if it is set.
>
> The name of the default cell, i.e. **default**.
xxQS_NAME_Sxx_DEBUG_LEVEL
If set, specifies that debug information should be written to stderr. In
addition the level of detail in which debug information is generated is
defined.

xxQS_NAME_Sxx_QMASTER_PORT
If set, specifies the tcp port on which *xxqs_name_sxx_qmaster*(8) is
expected to listen for communication requests. Most installations will
use a services map entry for the service "sge_qmaster" instead to define
that port.

xxQS_NAME_Sxx_EXECD_PORT
If set, specifies the tcp port on which *xxqs_name_sxx_execd*(8) is
expected to listen for communication requests. Most installations will
use a services map entry for the service "sge_execd" instead to define
that port.
For a complete list of common environment variables used by all xxQS_NAMExx commands, see xxqs_name_sxx_intro(1).

# RESTRICTIONS

*xxqs_name_sxx_execd* usually is started from root on each machine in
the xxQS_NAMExx pool. If started by a normal user, a spool directory
must be used to which the user has read/write access. In this case only
jobs being submitted by that same user are handled correctly by the
system.
`xxqs_name_sxx_execd` usually is started from root on each machine in the xxQS_NAMExx pool. If started by a
normal user, a spool directory must be used to which the user has read/write access. In this case only jobs being
submitted by that same user are handled correctly by the system.

# FILES

For a complete list of files used by all xxQS_NAMExx commands, see xxqs_name_sxx_intro(1).

**sgepasswd contains a list of user names and their** corresponding
encrypted passwords. If available, the password file will be used by
**sge_execd. To change the contents ** of this file please use the
**sgepasswd command. It is not advised to change ** that file manually.

<xxqs_name_sxx_root>/<cell>/common/configuration
xxQS_NAMExx global configuration
<xxqs_name_sxx_root>/<cell>/common/local_conf/<host>
xxQS_NAMExx host specific configuration
<xxqs_name_sxx_root>/<cell>/spool/<host>
Default execution host spool directory
<xxqs_name_sxx_root>/<cell>/common/act_qmaster
xxQS_NAMExx master host file
## <xxqs_name_sxx_root>/<cell>/common/configuration
xxQS_NAMExx global configuration

## <xxqs_name_sxx_root>/<cell>/common/local_conf/<host>
xxQS_NAMExx host specific configuration

## <xxqs_name_sxx_root>/<cell>/spool/<host>
Default execution host spool directory

## <xxqs_name_sxx_root>/<cell>/common/act_qmaster
xxQS_NAMExx master host file

# SEE ALSO

*xxqs_name_sxx_intro*(1), *xxqs_name_sxx_conf*(5), *complex*(5),
*xxqs_name_sxx_qmaster*(8).
xxqs_name_sxx_intro(1), xxqs_name_sxx_conf(5), xxqs_name_sxx_complex(5), xxqs_name_sxx_qmaster(8).

# COPYRIGHT

See *xxqs_name_sxx_intro*(1) for a full statement of rights and
permissions.
See xxqs_name_sxx_intro(1) for a full statement of rights and permissions.
72 changes: 21 additions & 51 deletions doc/markdown/man/man8/sge_qmaster.md
Expand Up @@ -8,81 +8,51 @@ date: __DATE__

# NAME

xxqs_name_sxx_qmaster - xxQS_NAMExx master control daemon
`xxqs_name_sxx_qmaster` - xxQS_NAMExx master control daemon

# SYNOPSIS

**xxqs_name_sxx_qmaster** \[ **-help** \]
`xxqs_name_sxx_qmaster` \[ `-help` \]

# DESCRIPTION

*xxqs_name_sxx_qmaster* controls the overall xxQS_NAMExx behavior in a
cluster.
`xxqs_name_sxx_qmaster` controls the overall xxQS_NAMExx behavior in a cluster.

# OPTIONS

-help
## -help
Prints a listing of all options.

# ENVIRONMENTAL VARIABLES

xxQS_NAME_Sxx_ROOT
Specifies the location of the xxQS_NAMExx standard configuration files.
For a complete list of common environment variables used by all xxQS_NAMExx commands, see xxqs_name_sxx_intro(1).

xxQS_NAME_Sxx_CELL
If set, specifies the default xxQS_NAMExx cell. To address a xxQS_NAMExx
cell *xxqs_name_sxx_qmaster* uses (in the order of precedence):
# RESTRICTIONS

> The name of the cell specified in the environment variable
> xxQS_NAME_Sxx_CELL, if it is set.
>
> The name of the default cell, i.e. **default**.
`xxqs_name_sxx_qmaster` is usually started from root on the master or shadow master machines of the cluster.
If started by a normal user, a master spool directory must be used to which the user has read/write
access. In this case only jobs being submitted by that same user are handled correctly by the system.

xxQS_NAME_Sxx_DEBUG_LEVEL
If set, specifies that debug information should be written to stderr. In
addition the level of detail in which debug information is generated is
defined.
# FILES

xxQS_NAME_Sxx_QMASTER_PORT
If set, specifies the tcp port on which *xxqs_name_sxx_qmaster*(8) is
expected to listen for communication requests. Most installations will
use a services map entry for the service "sge_qmaster" instead to define
that port.
For a complete list of files used by all xxQS_NAMExx commands, see xxqs_name_sxx_intro(1).

xxQS_NAME_Sxx_EXECD_PORT
If set, specifies the tcp port on which *xxqs_name_sxx_execd*(8) is
expected to listen for communication requests. Most installations will
use a services map entry for the service "sge_execd" instead to define
that port.
## <xxqs_name_sxx_root>/<cell>/common/configuration
xxQS_NAMExx global configuration

# RESTRICTIONS
## <xxqs_name_sxx_root>/<cell>/common/local_conf/<host>
xxQS_NAMExx host specific configuration

*xxqs_name_sxx_qmaster* is usually started from root on the master or
shadow master machines of the cluster (refer to the *xxQS_NAMExx
Installation and Administration Guide* for more information about the
configuration of shadow master hosts). If started by a normal user, a
master spool directory must be used to which the user has read/write
access. In this case only jobs being submitted by that same user are
handled correctly by the system.

# FILES
## <xxqs_name_sxx_root>/<cell>/common/qmaster_args
xxqs_name_sxx_qmaster argument file

<xxqs_name_sxx_root>/<cell>/common/configuration
xxQS_NAMExx global configuration
<xxqs_name_sxx_root>/<cell>/common/local_conf/<host>
xxQS_NAMExx host specific configuration
<xxqs_name_sxx_root>/<cell>/common/qmaster_args
xxqs_name_sxx_qmaster argument file
<xxqs_name_sxx_root>/<cell>/spool
Default master spool directory
## <xxqs_name_sxx_root>/<cell>/spool
Default master spool directory

# SEE ALSO

*xxqs_name_sxx_intro*(1), *xxqs_name_sxx_conf*(5),
*xxqs_name_sxx_execd*(8), *xxqs_name_sxx_shadowd*(8), *xxQS_NAMExx
Installation and Administration Guide*
xxqs_name_sxx_intro(1), xxqs_name_sxx_conf(5), xxqs_name_sxx_execd(8), xxqs_name_sxx_shadowd(8)

# COPYRIGHT

See *xxqs_name_sxx_intro*(1) for a full statement of rights and
permissions.
See xxqs_name_sxx_intro(1) for a full statement of rights and permissions.
94 changes: 34 additions & 60 deletions doc/markdown/man/man8/sge_shadowd.md
Expand Up @@ -8,92 +8,66 @@ date: __DATE__

# NAME

xxqs_name_sxx_shadowd - xxQS_NAMExx shadow master daemon
`xxqs_name_sxx_shadowd` - xxQS_NAMExx shadow master daemon

# SYNOPSIS

**xxqs_name_sxx_shadowd**
`xxqs_name_sxx_shadowd`

# DESCRIPTION

*xxqs_name_sxx_shadowd* is a "light weight" process which can be run on
so-called shadow master hosts in a xxQS_NAMExx cluster to detect failure
of the current xxQS_NAMExx master daemon, *xxqs_name_sxx_qmaster*(8),
and to start-up a new *xxqs_name_sxx_qmaster*(8) on the host on which
the *xxqs_name_sxx_shadowd* runs. If multiple shadow daemons are active
in a cluster, they run a protocol which ensures that only one of them
`xxqs_name_sxx_shadowd` is a "light weight" process which can be run on so-called shadow master hosts in a
xxQS_NAMExx cluster to detect failure of the current xxQS_NAMExx master daemon, xxqs_name_sxx_qmaster(8),
and to start-up a new xxqs_name_sxx_qmaster(8) on the host on which the `xxqs_name_sxx_shadowd` runs.
If multiple shadow daemons are active in a cluster, they run a protocol which ensures that only one of them
will start-up a new master daemon.

The hosts suitable for being used as shadow master hosts must have
shared root read/write access to the directory
$xxQS_NAME_Sxx_ROOT/$xxQS_NAME_Sxx_CELL/common as well as to the master
daemon spool directory (by default
$xxQS_NAME_Sxx_ROOT/$xxQS_NAME_Sxx_CELL/spool/qmaster). The names of the
shadow master hosts need to be contained in the file
$xxQS_NAME_Sxx_ROOT/$xQS_NAME_Sxx_CELL/common/shadow_masters.
The hosts suitable for being used as shadow master hosts must have shared root read/write access to the directory
\$xxQS_NAME_Sxx_ROOT/\$xxQS_NAME_Sxx_CELL/common as well as to the master daemon spool directory (by default
\$xxQS_NAME_Sxx_ROOT/\$xxQS_NAME_Sxx_CELL/spool/qmaster). The names of the shadow master hosts need to be contained
in the file \$xxQS_NAME_Sxx_ROOT/\$xxQS_NAME_Sxx_CELL/common/shadow_masters.

# RESTRICTIONS

*xxqs_name_sxx_shadowd* may only be started by root.
`xxqs_name_sxx_shadowd` may only be started by root.

# ENVIRONMENT VARIABLES

xxQS_NAME_Sxx_ROOT
Specifies the location of the xxQS_NAMExx standard configuration files.

xxQS_NAME_Sxx_CELL
If set, specifies the default xxQS_NAMExx cell. To address a xxQS_NAMExx
cell *xxqs_name_sxx_shadowd* uses (in the order of precedence):

> The name of the cell specified in the environment variable
> xxQS_NAME_Sxx_CELL, if it is set.
>
> The name of the default cell, i.e. **default**.
xxQS_NAME_Sxx_DEBUG_LEVEL
If set, specifies that debug information should be written to stderr. In
addition the level of detail in which debug information is generated is
defined.

xxQS_NAME_Sxx_QMASTER_PORT
If set, specifies the tcp port on which *xxqs_name_sxx_qmaster*(8) is
expected to listen for communication requests. Most installations will
use a services map entry for the service "sge_qmaster" instead to define
that port.
For a complete list of common environment variables used by all xxQS_NAMExx commands, see xxqs_name_sxx_intro(1).

xxQS_NAME_Sxx_DELAY_TIME
This variable controls the interval in which *xxqs_name_sxx_shadowd*
pauses if a takeover bid fails. This value is used only when there are
multiple *xxqs_name_sxx_shadowd* instances and they are contending to be
the master. The default is 600 seconds.
This variable controls the interval in which `xxqs_name_sxx_shadowd` pauses if a takeover bid fails. This value is
used only when there are multiple `xxqs_name_sxx_shadowd` instances, and they are contending to be the master.
The default is 600 seconds.

xxQS_NAME_Sxx_CHECK_INTERVAL
This variable controls the interval in which the *xxqs_name_sxx_shadowd*
checks the heartbeat file (60 seconds by default).
This variable controls the interval in which the `xxqs_name_sxx_shadowd` checks the heartbeat file (60 seconds
by default).

xxQS_NAME_Sxx_GET_ACTIVE_INTERVAL
This variable controls the interval when a *xxqs_name_sxx_shadowd*
instance tries to take over when the heartbeat file has not changed. The
default is 240 seconds.
This variable controls the interval when a `xxqs_name_sxx_shadowd` instance tries to take over when the heartbeat
file has not changed. The default is 240 seconds.
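
As a rough illustration of how the three intervals and their defaults fit together, the sketch below resolves them from an environment mapping. It assumes the `xxQS_NAME_Sxx_` placeholder expands to the prefix `SGE_`; the function name and the lookup logic are hypothetical and do not describe how `xxqs_name_sxx_shadowd` is implemented.

```python
# Defaults in seconds, as stated in the descriptions above.
DEFAULTS = {"DELAY_TIME": 600, "CHECK_INTERVAL": 60, "GET_ACTIVE_INTERVAL": 240}


def shadowd_intervals(environ, prefix="SGE_"):
    """Return the three intervals, falling back to the documented defaults."""
    result = {}
    for name, default in DEFAULTS.items():
        raw = environ.get(prefix + name)  # prefix "SGE_" is an assumption
        result[name] = int(raw) if raw is not None else default
    return result


# Only the heartbeat check interval is overridden here.
print(shadowd_intervals({"SGE_CHECK_INTERVAL": "30"}))
```

Passing `os.environ` instead of the literal dictionary would read the values from the process environment.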

# FILES

<xxqs_name_sxx_root>/<cell>/common
Default configuration directory
<xxqs_name_sxx_root>/<cell>/common/shadow_masters
Shadow master hostname file.
<xxqs_name_sxx_root>/<cell>/spool/qmaster
Default master daemon spool directory
<xxqs_name_sxx_root>/<cell>/spool/qmaster/heartbeat
The heartbeat file.
For a complete list of files used by all xxQS_NAMExx commands, see xxqs_name_sxx_intro(1).

## <xxqs_name_sxx_root>/<cell>/common
Default configuration directory

## <xxqs_name_sxx_root>/<cell>/common/shadow_masters
Shadow master hostname file.

## <xxqs_name_sxx_root>/<cell>/spool/qmaster
Default master daemon spool directory

## <xxqs_name_sxx_root>/<cell>/spool/qmaster/heartbeat
The heartbeat file.

# SEE ALSO

*xxqs_name_sxx_intro*(1), *xxqs_name_sxx_conf*(5),
*xxqs_name_sxx_qmaster*(8), *xxQS_NAMExx Installation and
Administration Guide.*
xxqs_name_sxx_intro(1), xxqs_name_sxx_conf(5), xxqs_name_sxx_qmaster(8).

# COPYRIGHT

See *xxqs_name_sxx_intro*(1) for a full statement of rights and
permissions.
See xxqs_name_sxx_intro(1) for a full statement of rights and permissions.