
Scaling Analysis Plugins

Monolithic

Pros

  • conceptually easy, i.e. fire up a single plugin to monitor all inputs

Cons

  • usually becomes a bottleneck for things like ingestion monitors, as it has to consume all messages
  • cfgs and data structures become more complicated, as they have to be set up as nested maps (see the sketch after this list)
  • code becomes more complicated, as it needs to prune expired entries
  • alerting requires an ever-increasing number of possible inject_message calls, leading to the limit being set very high or unrestricted. By default such plugins will not be deployable through the Hindsight Admin UI.
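As a rough illustration of the nested-map problem, a monolithic monitor's cfg has to carry per-input settings in nested tables that the plugin must walk and prune. The input names and threshold keys below are hypothetical, not part of Hindsight itself:

```lua
-- Hypothetical monolithic monitor cfg (input names and thresholds are illustrative only)
filename        = "ingestion_monitor.lua"
message_matcher = "Type == 'ingestion'"   -- has to match every input's traffic
ticker_interval = 60

-- every monitored input needs its own nested entry, and the plugin code has
-- to walk and prune these tables as inputs come and go
inputs = {
  input_a = { max_lag_seconds = 300, min_messages_per_minute = 100 },
  input_b = { max_lag_seconds = 600, min_messages_per_minute = 10  },
  -- one entry per input
}
```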

Individual

Pros

  • easy to reason about
    • simplifies the design of the plugin
      • easier configuration for alerting and thresholds (avoids nested look-ups and modifying monolithic cfgs)
      • reduces the need for pruning code
  • more flexible
    • easy to spin up/down as necessary (can be automated with dynamic loading)
    • easier to create a cfg template for a single instance than a monolithic cfg (see the sketch after this list)
  • scales better
    • each plugin only processes a subset of the data
    • load is spread out over different threads
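By contrast, a per-input instance only needs a flat cfg, which makes templating straightforward. A minimal sketch, assuming a hypothetical Fields[input] attribute and illustrative threshold names:

```lua
-- Hypothetical per-input monitor cfg stamped out from a template
-- ('input_a' and the threshold keys are illustrative only)
filename        = "ingestion_monitor.lua"
message_matcher = "Type == 'ingestion' && Fields[input] == 'input_a'"
ticker_interval = 60

-- flat, single-instance settings; no nested maps and no pruning code needed
max_lag_seconds         = 300
min_messages_per_minute = 100
```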

Cons

  • pollutes the plugin list, e.g. 100 inputs == 100 monitors (this could be addressed in the UI presentation)
  • cost of the additional message matchers, as each one will run and reject 99% of the messages
    • In the current message matcher design this will quickly become an issue, so improvements are needed:
      1. Map-based router (create a map keyed on a common attribute, e.g. Type == 'x', and perform a lookup instead of evaluating every matcher; sketched after this list)
      2. Tree-based router (group related matchers together, failing entire branches in a single evaluation)
      3. More cache-friendly matching. The current benchmarks show there is room for up to a 10x improvement.
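A conceptual sketch of the map-based router idea (the real message matcher lives in Hindsight's C core, so this only illustrates the lookup): when many matchers share a simple Type == 'x' predicate, a single table lookup on Type replaces evaluating every matcher. Plugin names and types are hypothetical:

```lua
-- Conceptual Lua sketch of a map-based router; illustrative only.
-- Plugins whose matcher is exactly "Type == '<x>'" are bucketed by that value.
local routes_by_type = {
  ingestion = { "monitor_input_a", "monitor_input_b" },
  error     = { "error_alerter" },
}

local function route(msg)
  -- one hash lookup instead of evaluating every plugin's matcher expression
  return routes_by_type[msg.Type] or {}
end

-- e.g. route({Type = "ingestion"}) returns the two ingestion monitors
```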

Scaling When Individual Plugins Are Still Too Slow

Partitioning

  • Random (uses UUID)
  • Consistent (uses some in-message identifier, e.g. Fields[sampleId]; see the sketch below)
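A minimal sketch of how an individual analysis plugin could do consistent partitioning on Fields[sampleId]. The `partition`/`partitions` cfg keys, the `monitor.partial` message Type, and the byte-sum hash are all illustrative assumptions, not Hindsight conventions:

```lua
-- Hypothetical partitioned analysis plugin: each instance's cfg sets
-- partition = 0..partitions-1 and this plugin only processes its share.
require "string"

local partitions = read_config("partitions") or 4
local partition  = read_config("partition") or 0

local count = 0

function process_message()
    local id = read_message("Fields[sampleId]")
    if type(id) ~= "string" then return 0 end

    -- illustrative consistent hash: sum of byte values mod partition count
    local h = 0
    for i = 1, #id do h = h + string.byte(id, i) end
    if h % partitions ~= partition then return 0 end -- not this instance's share

    count = count + 1  -- the heavier analysis would go here
    return 0
end

function timer_event(ns)
    -- emit a partial result for downstream aggregation
    inject_message({Type = "monitor.partial", Fields = {count = count, partition = partition}})
    count = 0
end
```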

Pros

  • Can allow heavier analysis to be run without back-pressuring the system

Cons

  • Manual process to configure the partitioning and balance the work between threads
  • Requires the extra step of downstream aggregation (see the sketch below)
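A hedged sketch of that aggregation step: a separate analysis plugin whose cfg matches the hypothetical `monitor.partial` messages injected above and combines the per-partition counts before reporting. The Types and field names follow the assumptions of the previous sketch:

```lua
-- Hypothetical aggregator; its cfg would use something like
--   message_matcher = "Type == 'monitor.partial'"
local totals = {}

function process_message()
    local partition = read_message("Fields[partition]")
    local count     = read_message("Fields[count]")
    if partition == nil or count == nil then return -1, "missing fields" end
    totals[partition] = count
    return 0
end

function timer_event(ns)
    local sum = 0
    for _, c in pairs(totals) do sum = sum + c end
    -- a real plugin would alert/report on the combined total here
    inject_message({Type = "monitor.total", Fields = {count = sum}})
end
```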