What are the key components of Kafka?

Apache Kafka is a distributed event-streaming platform with several key components, each playing a critical role in its architecture. Here’s an overview of Kafka’s components:


1. Topics

  • A topic is a category or feed name to which records are published.
  • Topics are:
    • Partitioned: Each topic is divided into partitions for parallelism and scalability.
    • Log-structured: Messages are stored in an append-only commit log, ensuring durability.

2. Partitions

  • A partition is a subset of a topic, representing an append-only sequence of records.
  • Features:
    • Each partition is stored and processed independently.
    • Partitions provide scalability by allowing parallelism across consumers.
    • Messages within a partition are ordered but not across partitions.
    • Each record in a partition is assigned a unique offset, which identifies its position.
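
The key-to-partition mapping can be made concrete with a small, dependency-free Python sketch. Note that Kafka’s default partitioner actually hashes keys with murmur2 and uses a sticky strategy for keyless records; `crc32` and plain round-robin are stand-ins here to keep the example deterministic.

```python
import zlib
from typing import Optional

def choose_partition(key: Optional[bytes], num_partitions: int, next_rr: int = 0) -> int:
    """Simplified stand-in for Kafka's default partitioner.

    Kafka hashes keys with murmur2; crc32 is used here only to keep the
    sketch dependency-free and deterministic across runs."""
    if key is None:
        # Keyless records are spread across partitions (Kafka uses a
        # sticky strategy); plain round-robin is shown here.
        return next_rr % num_partitions
    return zlib.crc32(key) % num_partitions

# The same key always maps to the same partition, preserving per-key order.
p1 = choose_partition(b"user-42", 6)
p2 = choose_partition(b"user-42", 6)
```

Because the mapping depends only on the key and the partition count, all records with the same key land in the same partition and therefore keep their relative order.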

3. Producers

  • Producers are clients that send records (messages) to Kafka topics.
  • Features:
    • Producers can specify a partition for the message or let Kafka decide.
    • They ensure message delivery with configurable acknowledgment settings:
      • acks=0: Fire-and-forget (no acknowledgment).
      • acks=1: Leader-only acknowledgment.
      • acks=all: Leader and replica acknowledgment (stronger durability).
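
The three acknowledgment settings can be summarized as a small decision function. This is an illustrative simplification: `min_insync_replicas` mirrors the broker setting of the same name, and real produce responses also carry error codes and retry semantics.

```python
def is_acknowledged(acks: str, leader_wrote: bool, replicas_acked: int,
                    min_insync_replicas: int = 2) -> bool:
    """Sketch of when a produce request is considered successful.

    `replicas_acked` counts the leader plus in-sync followers that have
    persisted the record."""
    if acks == "0":
        return True                      # fire-and-forget: never waits
    if acks == "1":
        return leader_wrote              # leader's log write is enough
    if acks == "all":
        # leader write plus enough in-sync replicas for durability
        return leader_wrote and replicas_acked >= min_insync_replicas
    raise ValueError(f"unknown acks setting: {acks!r}")
```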

4. Consumers

  • Consumers are clients that read messages from Kafka topics.
  • Features:
    • Consumers subscribe to topics and process messages.
    • They track the offset of the last message they’ve read so they can resume where they left off without reprocessing.
    • Consumers can operate in groups to enable load balancing.
      • A partition is consumed by only one consumer in the same group.

5. Consumer Groups

  • A consumer group is a set of consumers sharing the load of processing messages.
  • Features:
    • Kafka ensures that each partition in a topic is processed by only one consumer in the group.
    • Multiple consumer groups can subscribe to the same topic independently.
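
The one-partition-per-consumer invariant can be illustrated with a minimal assignment sketch. Kafka actually ships several assignors (range, round-robin, cooperative-sticky); this simplified round-robin version only shows the invariant itself.

```python
def assign_round_robin(partitions, consumers):
    """Round-robin assignment of partitions to a consumer group.

    Each partition goes to exactly one consumer in the group; a single
    consumer may own several partitions."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment
```

With 6 partitions and 2 consumers, each consumer receives 3 partitions; adding a third consumer would trigger a rebalance and redistribute them.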

6. Brokers

  • A broker is a Kafka server that stores data and serves client requests.
  • Features:
    • Each broker is identified by a unique ID.
    • Brokers handle:
      • Receiving data from producers.
      • Storing data on disk.
      • Serving data to consumers.
    • In a cluster, brokers share the load for scalability and fault tolerance.

7. Cluster

  • A Kafka cluster is a group of brokers working together.
  • Features:
    • Provides high availability and fault tolerance.
    • Topics and partitions are distributed across brokers.
    • A partition has:
      • Leader: Handles all read and write requests.
      • Followers: Replicate the leader’s data for redundancy.

8. ZooKeeper (Deprecated in Favor of KRaft)

  • ZooKeeper was used for:
    • Managing cluster metadata and configurations.
    • Electing partition leaders.
    • Tracking broker availability.
  • Kafka has moved to KRaft (Kafka Raft) for metadata management; KRaft is production-ready since Kafka 3.3, and ZooKeeper support is removed in Kafka 4.0.

9. KRaft (Kafka Raft)

  • A replacement for ZooKeeper that provides:
    • Native Kafka-based metadata management.
    • Simplified architecture with better fault tolerance and scalability.

10. Replication

  • Kafka replicates partitions across brokers for fault tolerance.
  • Features:
    • Each partition has one leader and multiple followers.
    • Leader handles all read and write requests.
    • Followers replicate the leader’s data.
    • If a leader fails, a follower is promoted to leader.
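
The failover step can be sketched as picking a new leader from the in-sync replicas (ISR). This is a simplification: the real controller consults cluster metadata and honors settings such as unclean leader election.

```python
def elect_leader(replicas, in_sync, failed_leader):
    """Promote the first in-sync follower when the leader fails.

    `replicas` is the partition's replica list (broker IDs), `in_sync`
    the current ISR. If no in-sync follower exists, the partition goes
    offline rather than risk data loss."""
    candidates = [b for b in replicas if b != failed_leader and b in in_sync]
    if not candidates:
        raise RuntimeError("no in-sync replica available (offline partition)")
    return candidates[0]
```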

11. Logs

  • Kafka stores messages in log files on disk.
  • Features:
    • Each partition corresponds to a set of segment files on disk.
    • Logs are segmented, and retention is configurable:
      • Retained by size or time.
      • Old data can be deleted to save disk space.
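
Segment-based retention can be sketched in a few lines. The key point mirrored here is that Kafka deletes whole segments, never individual records; the size and age thresholds stand in for `retention.bytes` and `retention.ms`.

```python
import time

def apply_retention(segments, max_bytes, max_age_s, now=None):
    """Drop whole log segments that exceed size or time retention.

    `segments` is an oldest-first list of (size_bytes, created_at)
    tuples. Time-based expiry runs first, then oldest segments are
    removed until the total size fits."""
    now = time.time() if now is None else now
    kept = [s for s in segments if now - s[1] <= max_age_s]
    while kept and sum(s[0] for s in kept) > max_bytes:
        kept.pop(0)   # oldest segment is deleted first
    return kept
```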

12. Offset

  • An offset is a unique identifier for each message within a partition.
  • Features:
    • Consumers use offsets to track their position in a partition.
    • Kafka retains messages according to the topic’s retention policy regardless of whether they have been consumed, so consumers can rewind and replay by seeking to an earlier offset.
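
A toy consumer over an in-memory partition log shows why rewind and replay are cheap: an offset is just a position in an ordered log, not a deletion marker. This is a pure-Python illustration, not the client API.

```python
class SimpleConsumer:
    """Toy consumer over an in-memory partition log, illustrating how an
    offset is a position that can be advanced, rewound, and replayed."""

    def __init__(self, log):
        self.log = log          # the partition's ordered record list
        self.position = 0       # next offset to read

    def poll(self, max_records=1):
        records = self.log[self.position:self.position + max_records]
        self.position += len(records)
        return records

    def seek(self, offset):
        self.position = offset  # rewind (or skip ahead) to any retained offset
```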

13. Kafka Client APIs

  • Kafka provides APIs for interacting with the system:
    • Producer API: Publish messages to topics.
    • Consumer API: Subscribe to and process messages.
    • Admin API: Create, inspect, and manage topics, brokers, and configurations.
    • Streams API: Perform stream processing on data in Kafka topics.
    • Connect API: Integrate external systems (e.g., databases, file systems) with Kafka.

14. Kafka Connect

  • A framework for integrating Kafka with external systems (e.g., databases, files, message queues).
  • Features:
    • Source Connectors: Bring data into Kafka.
    • Sink Connectors: Push data from Kafka to external systems.
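
Connectors are configured declaratively rather than coded. The sketch below shows the shape of a source-connector configuration, using the FileStreamSource demo connector that ships with Kafka; the connector name, file path, and topic are placeholder values.

```python
# Shape of a Kafka Connect source-connector configuration (normally
# submitted as JSON to the Connect REST API). All values below are
# illustrative placeholders.
file_source_config = {
    "name": "demo-file-source",                  # placeholder connector name
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",                            # parallelism for this connector
    "file": "/tmp/input.txt",                    # placeholder source file
    "topic": "demo-topic",                       # placeholder destination topic
}
```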

15. Kafka Streams

  • A library for processing streams of data stored in Kafka topics.
  • Features:
    • Allows stateful and stateless stream transformations.
    • Integrates seamlessly with Kafka topics.
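
Kafka Streams itself is a Java library, but its classic word-count topology can be mimicked in plain Python to show the idea: a stateless transformation (lowercasing and splitting) feeding a stateful aggregation (a running count). The dictionary below stands in for a Streams state store.

```python
from collections import defaultdict

def word_count(stream):
    """Python analogue of the Kafka Streams word-count topology:
    stateless flatMap/lowercase steps feeding a stateful count."""
    counts = defaultdict(int)           # stands in for a Streams state store
    for line in stream:                 # each line plays the role of a record value
        for word in line.lower().split():
            counts[word] += 1
    return dict(counts)
```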

16. Message Format

  • Kafka messages contain:
    • Key: Optional; primarily used to determine the record’s partition, which keeps related messages together.
    • Value: The main content of the message.
    • Headers: Metadata for additional context.
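
The user-visible parts of a record can be modeled as a small data class. This is a simplification of the wire format, which also carries a timestamp, offset, and other protocol-level fields; note too that real Kafka headers are an ordered list of key/value pairs that may repeat keys, which a dict flattens away.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Record:
    """Sketch of a Kafka record's user-visible parts."""
    value: bytes                                   # the main payload
    key: Optional[bytes] = None                    # drives partition selection
    headers: Dict[str, bytes] = field(default_factory=dict)  # app metadata
```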

17. Retention Policies

  • Kafka supports configurable retention policies:
    • Time-based: Messages are retained for a specific duration.
    • Size-based: Messages are retained until the topic reaches a specified size.

18. Security

Kafka offers robust security features, including:

  • Authentication: SASL (Simple Authentication and Security Layer), SSL.
  • Authorization: Access control via ACLs (Access Control Lists).
  • Encryption: TLS for data in transit.
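
A typical client configuration combining these features looks like the following. The property names are standard Kafka client settings, but the mechanism choice, paths, and passwords are placeholders, not a working setup.

```python
# Client properties for an authenticated, encrypted connection
# (SASL over TLS). Values are illustrative placeholders.
secure_client_config = {
    "security.protocol": "SASL_SSL",            # TLS transport + SASL auth
    "sasl.mechanism": "SCRAM-SHA-512",          # one of several SASL options
    "ssl.truststore.location": "/path/to/truststore.jks",  # placeholder path
    "ssl.truststore.password": "changeit",      # placeholder password
}
```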

Kafka Architecture Summary Diagram

Here’s a high-level representation of Kafka’s components and their interactions:

+----------------------------------------------------+
|                    Kafka Cluster                   |
|                                                    |
|  +----------+    +----------+    +----------+      |
|  | Broker 1 |    | Broker 2 |    | Broker 3 |      |
|  +----+-----+    +----+-----+    +----+-----+      |
|       |               |               |            |
|       +---------------+---------------+            |
|                       |                            |
|          (leader and replica management)           |
+----------------------------------------------------+
        |                     ^
        v                     |
  +----------+          +----------+        +-------------------+
  | Producer |          | Consumer |        | ZooKeeper / KRaft |
  +----------+          +----------+        +-------------------+

Kafka’s components work together to provide a reliable, scalable, and efficient platform for event-streaming and real-time data processing.
