
Kafka

  • AKA Apache Kafka.
  • Open-source distributed event streaming platform.
  • Ensures a continuous flow and interpretation of data, so that the right information is at the right place, at the right time.
  • Battle-tested, distributed, highly scalable, elastic, fault-tolerant, and secure solution.
  • Can be deployed on bare-metal hardware, virtual machines, and containers.
  • Supports on-premise servers and cloud environments.

Event streaming

  • Digital equivalent of the human body's Central Nervous System (CNS).
  • Technological foundation for the 'always-on' world, where the user of software is other software.
  • It means:
    • Capturing data in real-time.
    • From different event sources: e.g. databases, sensors, mobile devices, cloud services, and software applications.
    • In the form of streams of events.
    • Storing these event streams durably for later retrieval.
    • Manipulating, processing, and reacting to the event streams in real time as well as retrospectively.
    • Routing the event streams to different destinations as needed.
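The four ideas above (capture, durable storage, processing, routing) can be sketched in a few lines of plain Python; this is a toy illustration of the concepts, not the Kafka API:

```python
# Toy sketch: capture events, store them in an append-only log,
# process them, and route the results to a destination system.
log = []  # durable, append-only event store (a list stands in for disk)

def capture(event):
    log.append(event)  # capture from some source, in arrival order

def process(events):
    # react to the stream: here, keep only payment events
    return [e for e in events if e["type"] == "payment"]

def route(events, destination):
    destination.extend(events)  # forward to another system

capture({"type": "payment", "amount": 200})
capture({"type": "click"})

warehouse = []                  # a stand-in destination system
route(process(log), warehouse)  # retrospective processing: replay the log
print(warehouse)                # → [{'type': 'payment', 'amount': 200}]
```

Because the log is append-only and kept around, the same stream can be processed again later with different logic, which is what makes retrospective processing possible.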

Event streaming use cases

  • Processing payments and financial transactions in real time (e.g. stock exchanges, banks, and insurance companies).
  • Tracking and monitoring cars, trucks, fleets, and shipments in real time (e.g. logistics and the automotive industry).
  • Capturing and analyzing sensor data from IoT devices or other equipment (e.g. inspections with robots).
  • Collecting and immediately reacting to customer interactions and orders.
  • Monitoring patients in hospital care and predicting changes in condition.
  • Serving as the foundation for data platforms, event-driven architectures, and microservices.

Key capabilities

  1. Publishing (writing) and subscribing to (reading) streams of events (the pub/sub pattern).

  2. Storing streams of events durably and reliably.

    [!NOTE]

    Kafka's performance is effectively constant with respect to data size, so storing data for a long time is perfectly fine.

  3. Processing streams of events live (as they occur) or retrospectively.

How it works

  • A distributed system consisting of servers and clients.
    • Clients: applications and SDKs that read, write, and process streams of events.
  • Servers and clients communicate via a high-performance TCP network protocol.

[Infographic: how Kafka works]

  • Producers and consumers are fully decoupled and agnostic of each other, which makes Kafka highly scalable: producers never need to wait for consumers.
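This decoupling can be made concrete with a minimal in-memory sketch (plain Python, not the Kafka API): the topic stores events in an append-only log, and each consumer tracks its own read position (offset) independently, so a slow consumer never blocks a producer:

```python
# Minimal in-memory sketch of producer/consumer decoupling.
class Topic:
    def __init__(self):
        self.log = []  # append-only event log

    def produce(self, event):
        self.log.append(event)  # producers only append, never wait

    def consume(self, offset):
        # consumers read from their own offset; the log is never modified
        return self.log[offset:]

topic = Topic()
topic.produce({"key": "user1", "value": "click"})
topic.produce({"key": "user2", "value": "purchase"})

fast_consumer_offset = 2  # already caught up, nothing new to read
slow_consumer_offset = 0  # lagging, but can still replay everything
print(topic.consume(slow_consumer_offset))  # both events, re-readable
```

Because the log is shared but offsets are per-consumer, any number of consumers can read the same events at their own pace.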

Glossary

# Topic:
A channel for categorizing events.
A topic is similar to a folder in a filesystem.
Multi-producer and multi-subscriber.
Every topic can be replicated, even across geo-regions or datacenters, so that there are always multiple brokers that have a copy of the data. A common production setting is a replication factor of 3, i.e., there will always be three copies of your data.
# Event:
AKA record or message.
Usually has a key, a value, a timestamp, and optional metadata headers.
Similar to the files in a folder (topic).
Can be read as often as needed (but can also guarantee to process events exactly-once).
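A hypothetical event with the shape described above, as a plain Python dict (field names are illustrative, not a Kafka client API):

```python
import time

# Illustrative event: key, value, timestamp, optional metadata headers.
event = {
    "key": "alice",                        # e.g. a customer id
    "value": '{"action": "payment", "amount": 200}',
    "timestamp": int(time.time() * 1000),  # epoch milliseconds
    "headers": {"source": "web-checkout"}, # optional metadata
}
```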
# Partitioning:
Topics are partitioned.
A topic is spread over a number of "buckets" located on different Kafka brokers.
Important for scalability, because it allows client apps to read/write data from/to many brokers at the same time.
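How a key maps to a bucket can be sketched as hashing the key modulo the partition count; events with the same key always land in the same partition, which is how per-key ordering is preserved. (Real Kafka clients use murmur2 hashing by default; this sketch uses MD5 just to illustrate the idea.)

```python
import hashlib

# Sketch of key-based partitioning: deterministic hash, then modulo.
def partition_for(key: str, num_partitions: int) -> int:
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

p1 = partition_for("user-42", 3)
p2 = partition_for("user-42", 3)
assert p1 == p2  # same key -> same partition, every time
```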
# Producer:
Client apps that publish (write) events to Kafka topics.
# Consumer:
Client apps that subscribe to (read and process) events from Kafka topics.
# Servers:
Kafka runs as a cluster of one or more servers that can span multiple datacenters or cloud regions.
Some of these servers form the storage layer, called the brokers.
Other servers manage data distribution, e.g. running Kafka Connect to continuously import and export data as event streams.
  • Version format mirrors the Kafka format: <scala version>-<kafka version>.
  • Any Kafka parameter can be customized by adding it as an environment variable.