Example Kafka lag monitoring of consumer groups using Kafka AdminClient, Kamon and InfluxDB.
NOTE: It does not require access to Zookeeper.
This application is dockerized and the latest image can be found on Docker Hub mwizner/kafka-kamon-lag-monitor.
The application can be configured using the following environment variables:
KAFKA_BOOTSTRAP_SERVERS
- List of Kafka servers used to bootstrap connections to Kafka.KAFKA_SECURITY_PROTOCOL
- Protocol used to communicate with brokers.PLAINTEXT
is the default value, useSSL
to communicate with brokers via SSL.MONITOR_CONSUMER_GROUP
- Consumer group the monitor belongs to.MONITOR_CLIENT_ID
- Client id of the monitor.CONSUMER_GROUPS
- Consumer groups which offset will be monitored, e.g.group1
orgroup1,group2,group3
.POLL_INTERVAL
- (optional) Time interval for polling Kafka, defaults to5 seconds
.GROUP_LAG_HISTOGRAM_NAME
- (optional) Name of the consumer group lag histogram, defaults tokafka-consumer-groups-lag
.GROUP_STATE_HISTOGRAM_NAME
- (optional) Name of the consumer group state histogram, defaults tokafka-consumer-groups-state
.TICK_INTERVAL
- (optional) Time interval for collecting the metrics, defaults to10 seconds
.INFLUXDB_HOSTNAME
- Hostname on which InfluxDB is running.INFLUXDB_PORT
- Port on which InfluxDB is listening.EXIT_ON_ERROR
- (optional) Exit with code 1 on any error thrown in the app, defaults tofalse
. If you're running this in docker, you can use the--restart unless-stopped
flag to restart the app when it exits.
The following settings are optional and apply only if KAFKA_SECURITY_PROTOCOL
is set to SSL
.
KAFKA_SSL_PROTOCOL
- The SSL protocol used to generate the SSLContext, e.g.TLS
.KAFKA_SSL_KEY_PASSWORD
- The password of the private key in the key store file.KAFKA_SSL_KEYSTORE_TYPE
- The file format of the key store file, e.g.PKCS12
.KAFKA_SSL_KEYSTORE_PATH
- The location of the key store file.KAFKA_SSL_KEYSTORE_PASSWORD
- The password for the key store file.KAFKA_SSL_TRUSTSTORE_TYPE
- The file format of the trust store file, e.g.JKS
.KAFKA_SSL_TRUSTSTORE_PATH
- The location of the trust store file.KAFKA_SSL_TRUSTSTORE_PASSWORD
- The password for the trust store file.
An example config:
KAFKA_BOOTSTRAP_SERVERS=localhost:9092
KAFKA_SECURITY_PROTOCOL=SSL
KAFKA_SSL_PROTOCOL=TLS
KAFKA_SSL_KEY_PASSWORD=password
KAFKA_SSL_KEYSTORE_PATH=/etc/opt/kafka-kamon-lag-monitor/keystore.p12
KAFKA_SSL_KEYSTORE_PASSWORD=password
KAFKA_SSL_KEYSTORE_TYPE=PKCS12
KAFKA_SSL_TRUSTSTORE_PATH=/etc/opt/kafka-kamon-lag-monitor/truststore.jks
KAFKA_SSL_TRUSTSTORE_PASSWORD=password
KAFKA_SSL_TRUSTSTORE_TYPE=JKS
MONITOR_CONSUMER_GROUP=lag-monitor
MONITOR_CLIENT_ID=lag-monitor
CONSUMER_GROUPS=service1,service2
INFLUXDB_HOSTNAME=localhost
INFLUXDB_PORT=8189
EXIT_ON_ERROR=true
The app records Kafka metrics using Kamon histogram instruments - the measurements are exposed in InfluxDB as kafka-lag-monitor-timers
and tagged with kafka-consumer-groups-lag
and kafka-consumer-groups-state
entities. Kafka consumer group states get translated into the following values in the histogram:
- 0 -
Dead
orEmpty
- 1 -
Stable
- 2 -
PreparingRebalance
orCompletingRebalance
- 3 - any other states
- 1.3.0 - Introduced a new config option to exit the application on error.
- 1.2.0 - Fixed InfluxDB parsing issue & changed the consumer group state metric value to 3 for any unrecognised consumer group states.
- 1.1.0 - Added metrics for consumer group state.
- 1.0.1 - Made
KAFKA_SECURITY_PROTOCOL
environment variable optional. - 1.0.0 - Forked from bfil/kafka-kamon-lag-monitor and dockerized.