This repository has been archived by the owner on Jul 14, 2024. It is now read-only.

Monitoring Metrics Alerts


Victor Voisin edited this page Apr 25, 2014 · 1 revision

Here are some metrics to watch in order to avoid a critical situation leading to a loss of performance.

Page Faults

Normal : 30 - 45% Req
Worrying : 50 - 60% Req
Critical : > 75% Req
- Add Shard.
- Add RAM.

-> Working set does not fit in RAM.
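
As a rough illustration of the bands above, here is a minimal Python sketch. It assumes that "Req" means requests, so the metric is page faults as a percentage of requests over a sampling interval. The function names are hypothetical, and because the bands leave gaps (45 - 50% and 60 - 75%), the sketch assumes a value in a gap belongs to the lower band.

```python
def page_fault_percent(faults_delta: int, requests_delta: int) -> float:
    """Page faults as a percentage of requests over one sampling interval.

    Both arguments are counter differences between two samples
    (assumption: the source does not say how the ratio is measured).
    """
    if requests_delta <= 0:
        return 0.0  # no traffic in the interval, nothing to report
    return 100.0 * faults_delta / requests_delta


def classify_page_faults(percent: float) -> str:
    """Map a page fault percentage to normal / worrying / critical."""
    if percent > 75:
        return "critical"
    if percent >= 50:
        # Assumption: the unspecified 60 - 75% range stays "worrying".
        return "worrying"
    # Assumption: the unspecified 45 - 50% range stays "normal".
    return "normal"
```

For example, 550 faults over 1000 requests gives 55%, which falls in the worrying band.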

Replication Lag

Normal : Ideal 0s
Worrying : > 60s
Critical : > 240s
- Insufficient bandwidth (among other causes).
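
The lag thresholds above could be checked with something like the following Python sketch. It assumes lag is measured as the difference between the last operation time on the primary and on a secondary. The function names are hypothetical, and how the two timestamps are obtained is left out.

```python
from datetime import datetime


def replication_lag_seconds(primary_optime: datetime,
                            secondary_optime: datetime) -> float:
    """Seconds by which the secondary trails the primary (never negative)."""
    lag = (primary_optime - secondary_optime).total_seconds()
    return max(0.0, lag)


def classify_replication_lag(lag_seconds: float) -> str:
    """Map a lag in seconds to normal / worrying / critical."""
    if lag_seconds > 240:
        return "critical"
    if lag_seconds > 60:
        return "worrying"
    # Ideal is 0s; anything up to 60s is treated as normal here.
    return "normal"
```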

Disk Space

Normal : < 80%
Worrying : > 85%
Critical : > 94%
- Add storage.
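
A minimal Python sketch of this check, using the standard library's `shutil.disk_usage`. The function names are hypothetical, and since the list leaves 80 - 85% unspecified, the sketch assumes that range still counts as normal.

```python
import shutil


def disk_used_percent(path: str) -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.used / usage.total


def classify_disk_usage(percent: float) -> str:
    """Map a disk usage percentage to normal / worrying / critical."""
    if percent > 94:
        return "critical"
    if percent > 85:
        return "worrying"
    # Assumption: the unspecified 80 - 85% range is still "normal".
    return "normal"
```

Calling `classify_disk_usage(disk_used_percent("/data/db"))` would check the volume holding the data directory, with the path being whatever applies on the host.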

Slow Query

Normal : < 250ms
Worrying : +50%
Critical : +70%
- Improve Indexes & Queries.

-> Bad Queries, Bad Indexes.
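
Several metrics on this page use relative thresholds (+50%, +70%, *2). One possible reading, sketched below in Python, is that they are increases over the value observed in normal operation. That interpretation, the function name, and the parameter names are assumptions, not something stated here.

```python
def classify_against_baseline(value: float,
                              baseline: float,
                              worrying_increase: float = 0.5,
                              critical_increase: float = 0.7) -> str:
    """Compare a measurement with its normal baseline.

    `worrying_increase=0.5` means "+50% over baseline"; the defaults are
    the slow query thresholds. For a "*2" threshold, pass 1.0.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    increase = (value - baseline) / baseline
    if increase >= critical_increase:
        return "critical"
    if increase >= worrying_increase:
        return "worrying"
    return "normal"
```

With a 200 ms baseline, a 310 ms query time is worrying and 350 ms is critical under this reading.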

CPU

Normal : ...
Worrying : +50%
Critical : *2
- Add Java/Mongo node.
- Add CPU.

NetOut

Normal : < 80% Bandwidth
Worrying : > 85% Bandwidth
Critical : > 95% Bandwidth
- Add Bandwidth.

-> Not enough Bandwidth.

? Number of connections ?

Normal : ...
Worrying : +50%
Critical : *2
- Add Shard.

-> More connections to Mongo.

DB Lock %

This indicator can produce many false positives.

Normal : Depends on usage: > 60% for write-heavy workloads, < 10% for read-mostly workloads.
Worrying : *2
Critical : > 80%
- Add Shard.
- Improve Indexes.