Add AWS CloudWatch to FJ [WIP] #209

Open: wants to merge 4 commits into base `develop` (showing changes from 1 commit)
57 changes: 57 additions & 0 deletions deploy/deploy-cluster.yml
@@ -3,3 +3,60 @@
```yaml
  gather_facts: false
  roles:
    - role: caktus.k8s-web-cluster
  tasks:
    - name: Create Amazon CloudWatch Metrics namespace
      tags: cloudwatch
      community.kubernetes.k8s:
        context: "{{ k8s_context|mandatory }}"
        kubeconfig: "{{ k8s_kubeconfig }}"
        name: "{{ k8s_aws_cloudwatch_metrics_namespace }}"
        api_version: v1
        kind: Namespace
        state: present
    - name: Add AWS CloudWatch Metrics helm chart (monitoring)
      tags: cloudwatch
      community.kubernetes.helm:
        context: "{{ k8s_context|mandatory }}"
        kubeconfig: "{{ k8s_kubeconfig }}"
        chart_repo_url: "https://aws.github.io/eks-charts"
        chart_ref: aws-cloudwatch-metrics
        # https://artifacthub.io/packages/helm/aws/aws-cloudwatch-metrics
        chart_version: "{{ k8s_aws_cloudwatch_metrics_chart_version }}"
        release_name: aws-cloudwatch-metrics
        release_namespace: "{{ k8s_aws_cloudwatch_metrics_namespace }}"
        release_values:
          clusterName: trafficstops-stack-cluster
        wait: yes
    - name: Create alarms
      tags: cloudwatch
      amazon.aws.cloudwatch_metric_alarm:
        state: present
        region: us-east-2
        name: "{{ item.name }}"
        description: "{{ item.description }}"
        metric: "{{ item.metric }}"
        namespace: "ContainerInsights"
        dimensions:
          ClusterName: trafficstops-stack-cluster
        statistic: Average
        comparison: "{{ item.comparison }}"
        threshold: "{{ item.threshold }}"
        period: "{{ item.period }}"
        evaluation_periods: "{{ item.evaluation_periods }}"
        alarm_actions:
          - arn:aws:sns:us-east-2:606178775542:FJ_Errors_CloudWatch_Alarms_Topic
      loop:
        - name: node-cpu-high
          description: This will alarm when the cluster's average node CPU utilization is 50% or higher for 15 minutes.
          metric: node_cpu_utilization
          comparison: GreaterThanOrEqualToThreshold
          threshold: 50
          period: 300
          evaluation_periods: 3
        - name: node-count-low
          description: This will alarm when the cluster's node count drops below 2 for 15 minutes.
          metric: cluster_node_count
          comparison: LessThanThreshold
          threshold: 2
          period: 300
          evaluation_periods: 3
```
6 changes: 6 additions & 0 deletions deploy/group_vars/all.yml
@@ -91,6 +91,12 @@ k8s_ci_vault_password_arn: arn:aws:secretsmanager:us-east-2:606178775542:secret:
```yaml
k8s_letsencrypt_email: [email protected]
k8s_iam_users: [copelco]

# aws-cloudwatch-metrics:
# - https://github.com/aws/eks-charts/tree/master/stable/aws-cloudwatch-metrics
# - https://artifacthub.io/packages/helm/aws/aws-cloudwatch-metrics
k8s_aws_cloudwatch_metrics_chart_version: "0.0.9"
k8s_aws_cloudwatch_metrics_namespace: amazon-cloudwatch

# Pin ingress-nginx and cert-manager to current versions so future upgrades of this
# role will not upgrade these charts without your intervention:
# https://github.com/kubernetes/ingress-nginx/releases
```
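The helm and "Create alarms" tasks in `deploy/deploy-cluster.yml` hard-code the cluster name, region, and SNS topic. One possible follow-up, sketched here under the assumption that these values belong next to the other `k8s_aws_cloudwatch_*` settings, is to define them once in `group_vars/all.yml`. The variable names below are hypothetical and not part of this PR:

```yaml
# Hypothetical variables (not defined in this PR). The tasks in
# deploy/deploy-cluster.yml would then reference them, e.g.:
#   clusterName: "{{ k8s_aws_cloudwatch_cluster_name }}"   # helm release_values
#   ClusterName: "{{ k8s_aws_cloudwatch_cluster_name }}"   # alarm dimensions
#   region: "{{ k8s_aws_cloudwatch_region }}"
#   alarm_actions: "{{ k8s_aws_cloudwatch_alarm_actions }}"
k8s_aws_cloudwatch_cluster_name: trafficstops-stack-cluster
k8s_aws_cloudwatch_region: us-east-2
k8s_aws_cloudwatch_alarm_actions:
  - arn:aws:sns:us-east-2:606178775542:FJ_Errors_CloudWatch_Alarms_Topic
```

Defining the cluster name once keeps the helm release's `clusterName` and the alarms' `ClusterName` dimension from drifting apart; if they differed, the alarms would watch a metric that receives no data.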
11 changes: 11 additions & 0 deletions docs/hosting-services.md
@@ -7,6 +7,17 @@ The services configured for this project are:
* Papertrail logging (to Caktus account)
* New Relic Infrastructure monitoring (Account: `[email protected]`)

## Monitoring

Amazon CloudWatch Metrics receives data via the [aws-cloudwatch-metrics](https://github.com/aws/eks-charts/tree/master/stable/aws-cloudwatch-metrics)
Helm chart. To view metrics, log in to the AWS account (via the Caktus AssumeRole, above), then:

- Go to CloudWatch
- Click "All Metrics"
- Click "ContainerInsights"
- Drill down as needed

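CloudWatch Alarms are managed with Ansible (e.g., to alert on high CPU utilization). To add an alarm, append an item to the `loop` list of the "Create alarms" task in `deploy/deploy-cluster.yml`. Each item sets the alarm's `name`, `description`, `metric`, `comparison`, `threshold`, `period` (in seconds), and `evaluation_periods`.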
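For example, a memory alarm could be added as one more `loop` item. This is a sketch, not part of this PR: the alarm name `node-memory-high` and the 80% threshold are illustrative assumptions, and it assumes `node_memory_utilization` is published in the `ContainerInsights` namespace with a `ClusterName` dimension, as Container Insights does for EKS clusters.

```yaml
# Hypothetical extra item for the `loop:` list of the "Create alarms" task.
# The name and the 80% threshold are illustrative, not tuned values.
- name: node-memory-high
  description: This will alarm when the cluster's average node memory utilization is 80% or higher for 15 minutes.
  metric: node_memory_utilization
  comparison: GreaterThanOrEqualToThreshold
  threshold: 80
  period: 300            # seconds per evaluation period
  evaluation_periods: 3  # 3 x 300 s = 15 minutes
```

Because the task already fixes `namespace`, `dimensions`, `statistic`, and `alarm_actions`, a new item only needs the per-alarm fields; alarms that need a different statistic or SNS topic would require adding those keys to the task as well.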

## Production database disaster recovery
