This repository has been archived by the owner on Jul 14, 2024. It is now read-only.

Monitoring Metrics Alerts

Victor Voisin edited this page Apr 25, 2014 · 1 revision

Below are the metrics to watch in order to avoid a critical situation that leads to a loss of performance. Each metric has three levels: Normal (N), Worrying (W) and Critical (C).

Page Faults

  • Normal: 30-45% of requests

  • Worrying: 50-60% of requests

  • Critical: > 75% of requests

    • Add Shard.
    • Add RAM.

-> Working set does not fit in RAM.
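Here is a minimal sketch of the page-fault check. It assumes the percentage means page faults divided by requests over the same sampling interval, for example the deltas between two counter samples. The function names are hypothetical. Values below 30% are treated as Normal, which is an assumption. Values in the gaps the table leaves open (45-50% and 60-75%) are reported as "Unspecified" rather than guessed.

```python
def page_fault_ratio(faults_delta: int, requests_delta: int) -> float:
    """Page faults as a percentage of requests over one sampling interval."""
    if requests_delta <= 0:
        return 0.0  # no requests in the interval: nothing to report
    return 100.0 * faults_delta / requests_delta


def classify_page_faults(ratio_pct: float) -> str:
    """Map a page-fault ratio (%) to the levels listed above."""
    if ratio_pct > 75:
        return "Critical"
    if ratio_pct <= 45:
        # Assumption: fewer faults than the 30-45% band is still Normal.
        return "Normal"
    if 50 <= ratio_pct <= 60:
        return "Worrying"
    return "Unspecified"  # 45-50% or 60-75%: not covered by the table


print(classify_page_faults(page_fault_ratio(55, 100)))  # prints Worrying
```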

Replication Lag

  • Normal: 0 s (ideal)

  • Worrying: > 60 s

  • Critical: > 240 s

    • Insufficient bandwidth (among other causes).
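The lag can be derived from member optimes. This sketch assumes member documents shaped like the `members` array returned by MongoDB's `replSetGetStatus` (`rs.status()`), with `name`, `stateStr` and `optimeDate` fields. Plain dicts stand in for real server output here, and the host names are illustrative. Lag is taken as the primary's optime minus the secondary's optime. Treating anything up to 60 s as Normal is an assumption, because the table only names 0 s as ideal.

```python
from datetime import datetime, timedelta


def replication_lags(members: list[dict]) -> dict[str, float]:
    """Seconds each secondary is behind the primary."""
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


def classify_lag(lag_s: float) -> str:
    if lag_s > 240:
        return "Critical"
    if lag_s > 60:
        return "Worrying"
    return "Normal"  # assumption: 0 s is ideal, up to 60 s is not flagged


now = datetime(2014, 4, 25, 12, 0, 0)
members = [
    {"name": "db1:27017", "stateStr": "PRIMARY", "optimeDate": now},
    {"name": "db2:27017", "stateStr": "SECONDARY", "optimeDate": now - timedelta(seconds=90)},
    {"name": "db3:27017", "stateStr": "SECONDARY", "optimeDate": now - timedelta(seconds=300)},
]
for name, lag in replication_lags(members).items():
    print(name, lag, classify_lag(lag))
# db2:27017 90.0 Worrying
# db3:27017 300.0 Critical
```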

Disk Space

  • N: < 80% used

  • W: > 85% used

  • C: > 94% used

    • Add storage.
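Disk usage can be read with Python's standard `shutil.disk_usage`. In this sketch, `/` is only a placeholder: point it at the MongoDB data directory (`dbpath`). The 80-85% range is not covered by the thresholds, so it is reported as "Unspecified".

```python
import shutil


def disk_used_pct(path: str = "/") -> float:
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    # total includes blocks reserved for root, so this can read slightly
    # lower than the Use% column of `df`.
    return 100.0 * usage.used / usage.total


def classify_disk(used_pct: float) -> str:
    if used_pct > 94:
        return "Critical"
    if used_pct > 85:
        return "Worrying"
    if used_pct < 80:
        return "Normal"
    return "Unspecified"  # 80-85%: between the listed bands


print(classify_disk(disk_used_pct("/")))
```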

Slow Queries

  • N: < 250 ms

  • W: +50% over normal

  • C: +70% over normal

    • Improve indexes and queries.

-> Bad queries, bad indexes.
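One reading of the slow-query levels, assumed here rather than stated, is that the percentages are relative to the 250 ms normal ceiling. That gives absolute limits of 375 ms (+50%) and 425 ms (+70%). The constant and function names are hypothetical. Durations between 250 ms and 375 ms fall between the listed bands.

```python
NORMAL_MS = 250.0   # upper bound of the normal range
WORRYING_PCT = 50   # +50% over normal
CRITICAL_PCT = 70   # +70% over normal


def slow_query_limits(normal_ms: float = NORMAL_MS) -> tuple[float, float]:
    """Return the (worrying, critical) limits in milliseconds."""
    return (normal_ms * (100 + WORRYING_PCT) / 100,
            normal_ms * (100 + CRITICAL_PCT) / 100)


def classify_query(duration_ms: float) -> str:
    worrying_ms, critical_ms = slow_query_limits()
    if duration_ms >= critical_ms:
        return "Critical"
    if duration_ms >= worrying_ms:
        return "Worrying"
    if duration_ms < NORMAL_MS:
        return "Normal"
    return "Unspecified"  # 250-375 ms: above normal but below +50%


print(slow_query_limits())   # (375.0, 425.0)
print(classify_query(400))   # Worrying
```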

CPU

  • N: ...

  • W: +50% over normal

  • C: ×2 (twice the normal value)

    • Add a Java/Mongo node.
    • Add CPU.
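The CPU levels are relative (+50%, ×2), and the normal value is left open. This sketch therefore assumes a baseline taken as the median of recent CPU samples from your own monitoring. The sample values and the function name are illustrative only. The same +50% / ×2 rule is listed for the number of connections.

```python
from statistics import median


def classify_relative(current: float, history: list[float]) -> str:
    """Compare `current` to the median of `history`: +50% worrying, x2 critical."""
    baseline = median(history)
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    ratio = current / baseline
    if ratio >= 2.0:
        return "Critical"
    if ratio >= 1.5:
        return "Worrying"
    return "Normal"


cpu_history = [22.0, 25.0, 24.0, 30.0, 26.0]  # illustrative % CPU samples, median 25
print(classify_relative(38.0, cpu_history))  # Worrying (38 / 25 = 1.52)
```

A median is used rather than a mean so that a single spike in the history does not inflate the baseline.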

NetOut

  • N: < 80% of bandwidth

  • W: > 85% of bandwidth

  • C: > 95% of bandwidth

    • Add bandwidth.

-> Not enough bandwidth.
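Outbound utilization can be computed from two readings of a byte counter. This sketch assumes the readings are taken `interval_s` seconds apart; MongoDB's `serverStatus` reports such a counter as `network.bytesOut`. It also assumes you supply the link capacity in bits per second. The function names are hypothetical.

```python
def netout_pct(bytes_out_t0: int, bytes_out_t1: int,
               interval_s: float, link_bps: float) -> float:
    """Outbound bandwidth use as a percentage of link capacity."""
    bits_per_s = (bytes_out_t1 - bytes_out_t0) * 8 / interval_s
    return 100.0 * bits_per_s / link_bps


def classify_netout(pct: float) -> str:
    if pct > 95:
        return "Critical"
    if pct > 85:
        return "Worrying"
    if pct < 80:
        return "Normal"
    return "Unspecified"  # 80-85%: between the listed bands


# 1.125 GB sent in 10 s on a 1 Gbit/s link -> 900 Mbit/s -> 90%
pct = netout_pct(0, 1_125_000_000, 10.0, 1_000_000_000)
print(pct, classify_netout(pct))  # 90.0 Worrying
```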

Number of Connections (?)

  • N: ...

  • W: +50% over normal

  • C: ×2 (twice the normal value)

    • Add Shard.

-> More connections to Mongo.
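This sketch reads the current connection count from a document shaped like MongoDB's `serverStatus` output (`connections.current`). It then compares the count with a baseline you choose. The baseline of 200 and the sample document are illustrative, not recommended values.

```python
def classify_connections(server_status: dict, baseline: int) -> str:
    """+50% over baseline is Worrying, twice the baseline is Critical."""
    current = server_status["connections"]["current"]
    if current >= 2 * baseline:
        return "Critical"
    if current >= 1.5 * baseline:
        return "Worrying"
    return "Normal"


status = {"connections": {"current": 320, "available": 51000}}  # illustrative
print(classify_connections(status, baseline=200))  # Worrying
```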

DB Lock %

This indicator can produce many false positives.

  • N: depends on usage: > 60% for workloads that write regularly, < 10% for read-mostly workloads.

  • W: ×2 (twice the normal value)

  • C: > 80%

    • Add Shard.
    • Improve Indexes.
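Here is a rough, global approximation of the lock percentage computed from two samples. It assumes the `globalLock.totalTime` and `globalLock.lockTime` fields (microseconds) that MongoDB 2.x `serverStatus` exposes; check that your server version still reports them. The workload flag is a hypothetical input. The choice to use 60% or 10% as the "normal" value for the ×2 rule is an assumption. With a write-heavy normal of 60%, ×2 is unreachable, so only the > 80% critical limit applies.

```python
def lock_pct(sample0: dict, sample1: dict) -> float:
    """Share of elapsed time the global lock was held between two samples."""
    g0, g1 = sample0["globalLock"], sample1["globalLock"]
    elapsed = g1["totalTime"] - g0["totalTime"]
    locked = g1["lockTime"] - g0["lockTime"]
    return 100.0 * locked / elapsed if elapsed > 0 else 0.0


def classify_lock(pct: float, write_heavy: bool) -> str:
    # Assumption: normal is ~60% for write-heavy, ~10% for read-mostly usage.
    normal = 60.0 if write_heavy else 10.0
    if pct > 80:
        return "Critical"
    if pct >= 2 * normal:
        return "Worrying"
    return "Normal"


s0 = {"globalLock": {"totalTime": 0, "lockTime": 0}}
s1 = {"globalLock": {"totalTime": 10_000_000, "lockTime": 2_500_000}}
pct = lock_pct(s0, s1)  # 25.0
print(classify_lock(pct, write_heavy=False))  # Worrying
print(classify_lock(pct, write_heavy=True))   # Normal
```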