Merge pull request #133 from ARGOeu/devel
Version 0.7.0
themiszamani authored Sep 2, 2024
2 parents bebd0b6 + cd820ac commit 2c04eae
Showing 11 changed files with 2,209 additions and 3,385 deletions.
15 changes: 5 additions & 10 deletions README.md
@@ -44,7 +44,6 @@ topology_groups_filter = type=NGI&tags=certification:Certified
topology_endpoints_filter = tags=monitored:1
attributes = /etc/argo-scg/attributes/attributes-tenant2.conf
secrets = /path/to/secrets
- subscription = servicetype
agents_configuration = /path/to/config-file
skipped_metrics = eudat.b2safe.irods-crud, argo.connectors.check
namespace = tenant2_namespace
@@ -61,18 +60,13 @@ namespace = tenant2_namespace
* `topology_endpoints_filter ` - query parameter(s) used when fetching topology endpoints from Web-API (optional);
* `attributes` - path to the file containing the attributes for the given tenant (optional);
* `secrets` - path to file containing sensitive attributes (e.g. passwords, tokens) (optional);
- * `subscription` - type of subscription to use (optional; if not set, it uses the default value). There are three possible values:
- * `entity` - entity name is used as subscription,
- * `hostname` - hostname is used as a subscription (this is a default value),
- * `hostname_with_id` - hostname with id is used as subscription,
- * `servicetype` - service types are used as subscription,
* `agents_configuration` - path to configuration file for custom agents' subscriptions (optional);
* `skipped_metrics` - list of metrics that should not be run on Sensu agent (optional). These metrics would then be skipped when doing the configuration for the Sensu agent, even if they do exist in the metric profile;
* `namespace` - Sensu namespace to which the tenant is going to be associated (optional). If not set, tenant is associated to the namespace with the same name as tenant.

#### Agents configuration

- If `agents_configuration` setting exists, `scg-reload.py` tool will use only subscription set in the configuration file for the agents listed in the file. The configuration file must have the following form:
+ If `agents_configuration` setting exists, `scg-reload.py` tool will configure checks with given service types to be run on the agents listed in the file. The configuration file must have the following form:

```
[AGENTS]
@@ -247,7 +241,7 @@ All the arguments used for filtering can also be combined.

### Namespaces

- Multi-tenancy in Sensu is achieved by using namespaces - each tenant has its own namespace with isolated definitions of checks (metrics), entities (endpoints), events, handlers, filters, and pipelines. For each tenant defined in the configuration file, the `scg-reload.py` tool creates a namespace (with the same name as tenant) if it does not exist. Also, if a namespace exists for which there is no tenant definition in the configuration file, that namespace is deleted.
+ Multi-tenancy in Sensu is achieved by using namespaces - each tenant has its own namespace with isolated definitions of checks (metrics), entities (endpoints), events, handlers, filters, and pipelines. For each tenant defined in the configuration file, the `scg-reload.py` tool creates a namespace (with the same name as tenant, unless specified explicitly in the configuration file) if it does not exist. Also, if a namespace exists for which there is no tenant definition in the configuration file, that namespace is deleted.

### Entities

@@ -257,7 +251,7 @@ Proxy entities, on the other hand, allow Sensu to monitor external resources on

#### Agent entity

- For each tenant, we create a single agent entity which runs the checks for the given tenant. Sensu is scheduling checks based on subscriptions: the subscriptions specified in the Sensu agent definition control which checks the agent will execute. In our system, such subscriptions are actually hostnames of proxy entities configured (and one additional, `internal`, for the internal checks which are executed directly on the agent). The list of subscriptions for agent are handled by `scg-reload.py` tool.
+ For each tenant, we create a single agent entity which runs the checks for the given tenant. Sensu is scheduling checks based on subscriptions: the subscriptions specified in the Sensu agent definition control which checks the agent will execute. The list of subscriptions for agent are handled by `scg-reload.py` tool.

#### Proxy entity

@@ -306,7 +300,7 @@ If we take [generic.tcp.connect](https://poem.argo.grnet.gr/ui/public_metrictemp

Checks can fetch information from entities' labels buckets, and they are generated accordingly (case of attributes, special values from topology and/or overrides). The parameters are simply mapped to command if there are no overrides.

- Tool also creates a list of hostnames which are going to run the check, and adds them to subscriptions list. It calculates the check interval (in POEM it is defined in minutes, in check definition it needs to be defined in seconds). A fixed timeout of 900 s (15 min) is created for each check - this is in case the probe is left hanging, so that it does not clutter the system.
+ Tool calculates the check interval (in POEM it is defined in minutes, in check definition it needs to be defined in seconds). A fixed timeout of 900 s (15 min) is created for each check - this is in case the probe is left hanging, so that it does not clutter the system.

In metadata bucket the tool stores the metric name, namespace in which it is defined, and in annotations we define attempts, which is the number defined as `maxCheckAttempts` in POEM. This number is used by `hard-state` filter.

@@ -326,6 +320,7 @@ This metrics are considered internal, so the alerts are only raised to ARGO Slac
* `NOHOSTNAME` - `{{ .labels.hostname }}` is left out from the check command.
* `NOTIMEOUT` - `-t <TIMEOUT>` parameter is left out from the check command.
* `NOPUBLISH` - pipeline defined for this check is `reduce_alerts`. Its results are sent to Slack channel instead of AMS Publisher.
+ * `SILENCED` - metrics with this flag are configured not to raise alerts. Used only with specific internal metrics, which are in turn handled differently.
* `PASSIVE` - marks passive metric. These are handled slightly differently. They are not actively running, but generated by results written to fifo file by their active parents. When a metric has this flag, the generated check looks as follows:

```json
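The README text in the hunks above describes how a POEM metric becomes a Sensu check: the interval is converted from minutes to seconds, a fixed 900 s timeout is set, `maxCheckAttempts` is stored as the `attempts` annotation for the `hard-state` filter, and the `NOHOSTNAME`/`NOTIMEOUT` flags leave parts out of the command. A minimal sketch of those rules, assuming a simplified metric dictionary; `build_check`, its input keys, and the `-H` hostname option are hypothetical and are neither the tool's code nor POEM's actual schema:

```python
from typing import Any

# Fixed check timeout from the README: 900 s (15 min), so hung probes are stopped.
CHECK_TIMEOUT_SECONDS = 900


def build_check(metric: dict[str, Any], namespace: str) -> dict[str, Any]:
    """Hypothetical sketch: turn a simplified POEM metric into a Sensu-style check.

    The keys "name", "probe", "interval" (minutes), "timeout" (probe -t value),
    "maxCheckAttempts" and "flags" are assumptions made for illustration.
    """
    flags = set(metric.get("flags", []))

    command = [metric["probe"]]
    if "NOHOSTNAME" not in flags:
        # "-H" is an assumed option name; the hostname comes from entity labels.
        command += ["-H", "{{ .labels.hostname }}"]
    if "NOTIMEOUT" not in flags:
        command += ["-t", str(metric["timeout"])]

    return {
        "command": " ".join(command),
        # POEM defines the interval in minutes; Sensu expects seconds.
        "interval": metric["interval"] * 60,
        "timeout": CHECK_TIMEOUT_SECONDS,
        "metadata": {
            "name": metric["name"],
            "namespace": namespace,
            # Read later by the hard-state filter; Sensu annotations are strings.
            "annotations": {"attempts": str(metric["maxCheckAttempts"])},
        },
    }


# Illustrative values only.
check = build_check(
    {"name": "example.metric", "probe": "check_example", "interval": 5,
     "timeout": 120, "maxCheckAttempts": 3, "flags": []},
    namespace="tenant2_namespace",
)
print(check["command"])   # check_example -H {{ .labels.hostname }} -t 120
print(check["interval"])  # 300
```

The real generator also fills in attributes, overrides and pipelines, which this sketch leaves out.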
5 changes: 4 additions & 1 deletion argo-scg.spec
@@ -4,7 +4,7 @@

Summary: ARGO Sensu configuration manager.
Name: argo-scg
- Version: 0.6.3
+ Version: 0.7.0
Release: 1%{?dist}
Source0: %{name}-%{version}.tar.gz
License: ASL 2.0
@@ -48,6 +48,9 @@ rm -rf $RPM_BUILD_ROOT


%changelog
+ * Mon Sep 2 2024 Katarina Zailac <[email protected]> - 0.7.0-1%{?dist}
+ - ARGO-4824 Improve checks subscriptions
+ - ARGO-4748 Remove silenced entries associated to deleted entities
* Thu Aug 8 2024 Katarina Zailac <[email protected]> - 0.6.3-1%{?dist}
- ARGO-4791 Take into account possibility of not having site_bdii defined for a site
* Thu Jul 25 2024 Katarina Zailac <[email protected]> - 0.6.2-1%{?dist}
1 change: 0 additions & 1 deletion config/scg.conf
@@ -20,7 +20,6 @@ metricprofiles = ARGO_MON_TENANT1
topology = /etc/argo-scg/topology_tenant1.json
secrets = /etc/sensu/secrets2
publish = false
- subscription = hostname_with_id

[tenant2]
poem_url = https://tenant2.poem.devel.argo.grnet.gr
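With the per-tenant `subscription` option removed, the optional `namespace` option described in the README is what ties a tenant section to Sensu: when it is absent, the tenant uses a namespace named after itself, and namespaces with no tenant behind them are deleted. A small sketch of that rule with the standard `configparser` module; the function names are hypothetical, and the tenant list is passed in explicitly because the way `scg.conf` enumerates tenants is not shown in this diff:

```python
import configparser


def tenant_namespaces(conf: configparser.ConfigParser,
                      tenants: list[str]) -> dict[str, str]:
    """Hypothetical helper: map each tenant to its Sensu namespace."""
    # `fallback` covers a missing option, so the tenant name is used by default.
    return {tenant: conf.get(tenant, "namespace", fallback=tenant)
            for tenant in tenants}


def namespaces_to_delete(existing: set[str], wanted: set[str]) -> set[str]:
    """Namespaces that exist in Sensu but belong to no configured tenant."""
    return existing - wanted


conf = configparser.ConfigParser()
conf.read_string("""
[tenant1]
publish = false

[tenant2]
namespace = tenant2_namespace
""")

namespaces = tenant_namespaces(conf, ["tenant1", "tenant2"])
print(namespaces)  # {'tenant1': 'tenant1', 'tenant2': 'tenant2_namespace'}
print(namespaces_to_delete({"tenant1", "old_tenant"}, set(namespaces.values())))
# {'old_tenant'}
```

The sketch does not account for any namespaces the real tool may choose to preserve.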
24 changes: 8 additions & 16 deletions exec/scg-reload.py
@@ -47,7 +47,6 @@ def main():
local_topology = config.get_topology()
secrets = config.get_secrets()
publish_bool = config.publish()
- subscriptions = config.get_subscriptions()
skipped_metrics = config.get_skipped_metrics()
agents_configurations = config.get_agents_configurations()

@@ -75,7 +74,6 @@ def main():
tenants_checks = dict()
tenants_entities = dict()
tenants_internal_services = dict()
- tenants_subscriptions = dict()
tenants_metric_overrides = dict()
tenants_attribute_overrides = dict()
for tenant in tenants:
@@ -119,22 +117,26 @@ def main():
agent_config = AgentConfig(
file=agents_configurations[tenant]
)
- custom_subs = agent_config.get_custom_subs()
+ custom_agent_config = agent_config.get_custom_subs()

else:
- custom_subs = None
+ custom_agent_config = None

generator = ConfigurationGenerator(
metrics=poem.get_metrics_configurations(),
- profiles=metricprofiles[tenant],
metric_profiles=webapi.get_metric_profiles(),
topology=topology,
+ profiles=metricprofiles[tenant],
attributes=poem.get_metric_overrides(),
secrets_file=secrets[tenant],
default_ports=poem.get_default_ports(),
tenant=tenant,
+ default_agent=[
+ item["metadata"]["name"] for item in
+ sensu.get_agents(namespace=namespace)
+ ],
skipped_metrics=skipped_metrics[tenant],
- subscription=subscriptions[tenant]
+ agents_config=custom_agent_config
)

tenants_checks.update({
@@ -152,12 +154,6 @@ def main():
tenant: generator.generate_internal_services()
})

- tenants_subscriptions.update({
- tenant: generator.generate_subscriptions(
- custom_subs=custom_subs
- )
- })
-
tenants_metric_overrides.update({
tenant: generator.get_metric_parameter_overrides()
})
@@ -171,7 +167,6 @@ def main():
checks=tenants_checks,
entities=tenants_entities,
internal_services=tenants_internal_services,
- subscriptions=tenants_subscriptions,
metricoverrides4agents=tenants_metric_overrides,
attributeoverrides4agents=tenants_attribute_overrides
)
@@ -183,7 +178,6 @@ def main():
host_attribute_overrides = \
merger.merge_attribute_overrides()
internal_services = merger.merge_internal_services()
- subs = merger.merge_subscriptions()

else:
checks = tenants_checks[tenants[0]]
@@ -195,7 +189,6 @@ def main():
tenants[0]
]
internal_services = tenants_internal_services[tenants[0]]
- subs = tenants_subscriptions[tenants[0]]

sensu.add_daily_filter(namespace=namespace)
sensu.handle_slack_handler(
@@ -221,7 +214,6 @@ def main():
metric_parameters_overrides=metric_parameter_overrides,
host_attributes_overrides=host_attribute_overrides,
services=internal_services,
- subscriptions=subs,
namespace=namespace
)

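In this version `scg-reload.py` no longer reads a `subscription` setting or calls `generate_subscriptions()`; `ConfigurationGenerator` instead receives the names of the agents in the namespace (`default_agent`, built from `sensu.get_agents()`) and the parsed agents configuration (`agents_config`), and the README now says the agents configuration decides which service types' checks run on which agents. One plausible reading of that rule, sketched in Python; `check_subscriptions` and the `{agent: [service types]}` shape of the agents configuration are assumptions, not the generator's actual logic:

```python
from typing import Optional


def check_subscriptions(
    service_types: list[str],
    default_agents: list[str],
    agents_config: Optional[dict[str, list[str]]] = None,
) -> list[str]:
    """Hypothetical: pick the agent subscriptions a check should be sent to.

    Agents listed in agents_config get the checks for their service types;
    a check not claimed by any configured agent goes to the default agents.
    """
    if agents_config:
        matched = sorted(
            agent for agent, types in agents_config.items()
            if set(types) & set(service_types)
        )
        if matched:
            return matched
    return sorted(default_agents)


agents_config = {"agent-b": ["webdav"]}  # illustrative configuration
print(check_subscriptions(["webdav"], ["agent-a"], agents_config))  # ['agent-b']
print(check_subscriptions(["ARC-CE"], ["agent-a"], agents_config))  # ['agent-a']
print(check_subscriptions(["ARC-CE"], ["agent-a"]))                 # ['agent-a']
```

Sorting keeps the subscription lists deterministic, so repeated reloads do not register spurious changes.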
21 changes: 0 additions & 21 deletions modules/config.py
@@ -223,27 +223,6 @@ def get_publisher_queue(self):

return queue

- def get_subscriptions(self):
- subscriptions = dict()
- for tenant in self.tenants:
- try:
- value = self.conf.get(tenant, "subscription")
-
- if value not in [
- "entity", "hostname", "hostname_with_id", "servicetype"
- ]:
- raise ConfigException(
- f"Unacceptable value '{value}' for option: "
- f"'subscription' in section: '{tenant}'"
- )
-
- except configparser.NoOptionError:
- value = "hostname"
-
- subscriptions.update({tenant: value})
-
- return subscriptions
-
def get_agents_configurations(self):
configurations = dict()

20 changes: 12 additions & 8 deletions modules/exceptions.py
@@ -1,36 +1,40 @@
- class MyException(Exception):
+ class SCGException(Exception):
def __init__(self, msg):
self.msg = msg

def __str__(self):
- return f"Error: {str(self.msg)}"
+ return str(self.msg)


+ class SCGWarnException(SCGException):
+ pass
+
+
- class SensuException(MyException):
+ class SensuException(SCGException):
def __str__(self):
return f"Sensu error: {str(self.msg)}"


- class PoemException(MyException):
+ class PoemException(SCGException):
def __str__(self):
return f"Poem error: {str(self.msg)}"


- class WebApiException(MyException):
+ class WebApiException(SCGException):
def __str__(self):
return f"WebApi error: {str(self.msg)}"


- class ConfigException(MyException):
+ class ConfigException(SCGException):
def __str__(self):
return f"Configuration file error: {str(self.msg)}"


- class AgentConfigException(MyException):
+ class AgentConfigException(SCGException):
def __str__(self):
return f"Agent configuration file error: {str(self.msg)}"


- class GeneratorException(MyException):
+ class GeneratorException(SCGException):
def __str__(self):
return str(self.msg)
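The reworked `exceptions.py` makes `SCGException` the common base, drops the generic `Error: ` prefix from its message, and adds `SCGWarnException` as a subclass. Because `SCGWarnException` is itself an `SCGException`, a caller that wants to treat warnings differently has to catch it first. A minimal sketch that reuses three of the classes as they appear in the new version; the `run` wrapper and its `WARNING:`/`ERROR:` labels are hypothetical and not taken from `scg-reload.py`:

```python
class SCGException(Exception):
    def __init__(self, msg):
        self.msg = msg

    def __str__(self):
        return str(self.msg)


class SCGWarnException(SCGException):
    pass


class ConfigException(SCGException):
    def __str__(self):
        return f"Configuration file error: {str(self.msg)}"


def run(step) -> str:
    """Hypothetical caller: report warnings and errors with different labels."""
    try:
        step()
    except SCGWarnException as e:  # subclass first, or the base would catch it
        return f"WARNING: {e}"
    except SCGException as e:
        return f"ERROR: {e}"
    return "OK"


def bad_config():
    raise ConfigException("missing option")


def skipped():
    raise SCGWarnException("metric skipped")


print(run(bad_config))    # ERROR: Configuration file error: missing option
print(run(skipped))       # WARNING: metric skipped
print(run(lambda: None))  # OK
```

With the `Error: ` prefix gone from the base class, the severity label is presumably left to the caller, which is what lets a wrapper like this distinguish warnings from errors.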