Monitoring and observability strategy

Monitoring and observability allow teams to watch, debug, and understand the state of their systems.

Clone this repo and document your monitoring strategy here:

Content

Tips and hints

Monitoring is tooling or a technical solution that allows teams to watch and understand the state of their systems. Monitoring is based on gathering predefined sets of metrics or logs.

Observability is tooling or a technical solution that allows teams to actively debug their system. Observability is based on exploring properties and patterns not defined in advance.

Monitoring and observability solutions are designed to do the following:

Provide leading indicators of an outage or service degradation.
Detect outages, service degradations, bugs, and unauthorized activity.
Help debug outages, service degradations, bugs, and unauthorized activity.
Identify long-term trends for capacity planning and business purposes.
Expose unexpected side effects of changes or added functionality.

From Google - How to implement monitoring and observability

Monitoring is used in combination with a working optimization setup and an incident management procedure.

Tips and hints

Add tracing to your systems
Add logging to your systems
Monitor the four golden signals (latency, error rate, traffic, saturation)
Keep it as simple as possible, but no simpler
Create useful dashboards, not impressive dashboards
Avoid false positives at all costs; all alerts must be actionable
Formalize your optimization strategy
Formalize your incident management procedures
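
As an illustration of the four golden signals tip above, here is a minimal Python sketch that summarizes one time window of request data and checks it against alert thresholds. The names `Request`, `golden_signals`, and `breached`, the choice of the 95th percentile for latency, and the use of CPU utilization as the saturation measure are all assumptions made for this example; they do not come from any particular monitoring tool.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One served request (hypothetical record shape)."""
    duration_ms: float  # time taken to serve the request
    ok: bool            # False if the request failed


def golden_signals(requests, window_seconds, cpu_utilization):
    """Summarize latency, error rate, traffic, and saturation for one window."""
    if not requests:
        raise ValueError("need at least one request in the window")
    durations = sorted(r.duration_ms for r in requests)
    n = len(durations)
    # Nearest-rank 95th percentile, computed with integer arithmetic.
    p95_index = (95 * n + 99) // 100 - 1
    failed = sum(1 for r in requests if not r.ok)
    return {
        "latency_p95_ms": durations[p95_index],
        "error_rate": failed / n,
        "traffic_rps": n / window_seconds,
        # Assumption: saturation is approximated by CPU utilization (0.0 to 1.0).
        "saturation": cpu_utilization,
    }


def breached(signals, thresholds):
    """Return the names of signals above their threshold, sorted by name."""
    return sorted(
        name for name, limit in thresholds.items() if signals[name] > limit
    )
```

To keep alerts actionable, set thresholds only for signals that someone can respond to, for example `breached(signals, {"error_rate": 0.05, "saturation": 0.8})`, where both limits are placeholder values to be replaced with ones that fit your service.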
