What are the key components of Kafka?

Apache Kafka is a distributed event-streaming platform with several key components, each playing a critical role in its architecture. Here’s an overview of Kafka’s components:


1. Topics

  • A topic is a category or feed name to which records are published.
  • Topics are:
    • Partitioned: Each topic is divided into partitions for parallelism and scalability.
    • Log-structured: Messages are stored in an append-only commit log, ensuring durability.

2. Partitions

  • A partition is a subset of a topic, representing an append-only sequence of records.
  • Features:
    • Each partition is stored and processed independently.
    • Partitions provide scalability by allowing parallelism across consumers.
    • Messages within a partition are ordered but not across partitions.
    • Each record in a partition is assigned a unique offset, which identifies its position.
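
The key-to-partition mapping can be made concrete with a small, dependency-free Python sketch. Note that Kafka’s default partitioner actually hashes keys with murmur2 and uses a sticky strategy for keyless records; `crc32` and plain round-robin are stand-ins here to keep the example deterministic.

```python
import zlib
from typing import Optional

def choose_partition(key: Optional[bytes], num_partitions: int, next_rr: int = 0) -> int:
    """Simplified stand-in for Kafka's default partitioner.

    Kafka hashes keys with murmur2; crc32 is used here only to keep the
    sketch dependency-free and deterministic across runs."""
    if key is None:
        # Keyless records are spread across partitions (Kafka uses a
        # sticky strategy); plain round-robin is shown here.
        return next_rr % num_partitions
    return zlib.crc32(key) % num_partitions

# The same key always maps to the same partition, preserving per-key order.
p1 = choose_partition(b"user-42", 6)
p2 = choose_partition(b"user-42", 6)
```

Because the mapping depends only on the key and the partition count, all records with the same key land in the same partition and therefore keep their relative order.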

3. Producers

  • Producers are clients that send records (messages) to Kafka topics.
  • Features:
    • Producers can specify a partition for the message or let Kafka decide.
    • They ensure message delivery with configurable acknowledgment settings:
      • acks=0: Fire-and-forget (no acknowledgment).
      • acks=1: Leader-only acknowledgment.
      • acks=all: Leader and replica acknowledgment (stronger durability).
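
The three acknowledgment settings can be summarized as a small decision function. This is an illustrative simplification: `min_insync_replicas` mirrors the broker setting of the same name, and real produce responses also carry error codes and retry semantics.

```python
def is_acknowledged(acks: str, leader_wrote: bool, replicas_acked: int,
                    min_insync_replicas: int = 2) -> bool:
    """Sketch of when a produce request is considered successful.

    `replicas_acked` counts the leader plus in-sync followers that have
    persisted the record."""
    if acks == "0":
        return True                      # fire-and-forget: never waits
    if acks == "1":
        return leader_wrote              # leader's log write is enough
    if acks == "all":
        # leader write plus enough in-sync replicas for durability
        return leader_wrote and replicas_acked >= min_insync_replicas
    raise ValueError(f"unknown acks setting: {acks!r}")
```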

4. Consumers

  • Consumers are clients that read messages from Kafka topics.
  • Features:
    • Consumers subscribe to topics and process messages.
    • They track the offset of the last message they’ve read so they can resume where they left off without reprocessing.
    • Consumers can operate in groups to enable load balancing.
      • A partition is consumed by only one consumer in the same group.

5. Consumer Groups

  • A consumer group is a set of consumers sharing the load of processing messages.
  • Features:
    • Kafka ensures that each partition in a topic is processed by only one consumer in the group.
    • Multiple consumer groups can subscribe to the same topic independently.
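
The one-partition-per-consumer invariant can be illustrated with a minimal assignment sketch. Kafka actually ships several assignors (range, round-robin, cooperative-sticky); this simplified round-robin version only shows the invariant itself.

```python
def assign_round_robin(partitions, consumers):
    """Round-robin assignment of partitions to a consumer group.

    Each partition goes to exactly one consumer in the group; a single
    consumer may own several partitions."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment
```

With 6 partitions and 2 consumers, each consumer receives 3 partitions; adding a third consumer would trigger a rebalance and redistribute them.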

6. Brokers

  • A broker is a Kafka server that stores data and serves client requests.
  • Features:
    • Each broker is identified by a unique ID.
    • Brokers handle:
      • Receiving data from producers.
      • Storing data on disk.
      • Serving data to consumers.
    • In a cluster, brokers share the load for scalability and fault tolerance.

7. Cluster

  • A Kafka cluster is a group of brokers working together.
  • Features:
    • Provides high availability and fault tolerance.
    • Topics and partitions are distributed across brokers.
    • A partition has:
      • Leader: Handles all read and write requests.
      • Followers: Replicate the leader’s data for redundancy.

8. ZooKeeper (Deprecated in Favor of KRaft)

  • ZooKeeper was used for:
    • Managing cluster metadata and configurations.
    • Electing partition leaders.
    • Tracking broker availability.
  • Kafka has moved to KRaft (Kafka Raft) for metadata management; KRaft is production-ready since Kafka 3.3, and ZooKeeper support is removed in Kafka 4.0.

9. KRaft (Kafka Raft)

  • A replacement for ZooKeeper that provides:
    • Native Kafka-based metadata management.
    • Simplified architecture with better fault tolerance and scalability.

10. Replication

  • Kafka replicates partitions across brokers for fault tolerance.
  • Features:
    • Each partition has one leader and multiple followers.
    • Leader handles all read and write requests.
    • Followers replicate the leader’s data.
    • If a leader fails, a follower is promoted to leader.
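
The failover step can be sketched as picking a new leader from the in-sync replicas (ISR). This is a simplification: the real controller consults cluster metadata and honors settings such as unclean leader election.

```python
def elect_leader(replicas, in_sync, failed_leader):
    """Promote the first in-sync follower when the leader fails.

    `replicas` is the partition's replica list (broker IDs), `in_sync`
    the current ISR. If no in-sync follower exists, the partition goes
    offline rather than risk data loss."""
    candidates = [b for b in replicas if b != failed_leader and b in in_sync]
    if not candidates:
        raise RuntimeError("no in-sync replica available (offline partition)")
    return candidates[0]
```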

11. Logs

  • Kafka stores messages in log files on disk.
  • Features:
    • Each partition corresponds to a set of segment files on disk.
    • Logs are segmented, and retention is configurable:
      • Retained by size or time.
      • Old data can be deleted to save disk space.
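
Segment-based retention can be sketched in a few lines. The key point mirrored here is that Kafka deletes whole segments, never individual records; the size and age thresholds stand in for `retention.bytes` and `retention.ms`.

```python
import time

def apply_retention(segments, max_bytes, max_age_s, now=None):
    """Drop whole log segments that exceed size or time retention.

    `segments` is an oldest-first list of (size_bytes, created_at)
    tuples. Time-based expiry runs first, then oldest segments are
    removed until the total size fits."""
    now = time.time() if now is None else now
    kept = [s for s in segments if now - s[1] <= max_age_s]
    while kept and sum(s[0] for s in kept) > max_bytes:
        kept.pop(0)   # oldest segment is deleted first
    return kept
```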

12. Offset

  • An offset is a unique identifier for each message within a partition.
  • Features:
    • Consumers use offsets to track their position in a partition.
    • Kafka retains messages according to the topic’s retention policy regardless of whether they have been consumed, so consumers can rewind and replay by seeking to an earlier offset.
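
A toy consumer over an in-memory partition log shows why rewind and replay are cheap: an offset is just a position in an ordered log, not a deletion marker. This is a pure-Python illustration, not the client API.

```python
class SimpleConsumer:
    """Toy consumer over an in-memory partition log, illustrating how an
    offset is a position that can be advanced, rewound, and replayed."""

    def __init__(self, log):
        self.log = log          # the partition's ordered record list
        self.position = 0       # next offset to read

    def poll(self, max_records=1):
        records = self.log[self.position:self.position + max_records]
        self.position += len(records)
        return records

    def seek(self, offset):
        self.position = offset  # rewind (or skip ahead) to any retained offset
```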

13. Kafka Client APIs

  • Kafka provides APIs for interacting with the system:
    • Producer API: Publish messages to topics.
    • Consumer API: Subscribe to and process messages.
    • Admin API: Create, inspect, and manage topics, brokers, and configurations.
    • Streams API: Perform stream processing on data in Kafka topics.
    • Connect API: Integrate external systems (e.g., databases, file systems) with Kafka.

14. Kafka Connect

  • A framework for integrating Kafka with external systems (e.g., databases, files, message queues).
  • Features:
    • Source Connectors: Bring data into Kafka.
    • Sink Connectors: Push data from Kafka to external systems.
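
Connectors are configured declaratively rather than coded. The sketch below shows the shape of a source-connector configuration, using the FileStreamSource demo connector that ships with Kafka; the connector name, file path, and topic are placeholder values.

```python
# Shape of a Kafka Connect source-connector configuration (normally
# submitted as JSON to the Connect REST API). All values below are
# illustrative placeholders.
file_source_config = {
    "name": "demo-file-source",                  # placeholder connector name
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",                            # parallelism for this connector
    "file": "/tmp/input.txt",                    # placeholder source file
    "topic": "demo-topic",                       # placeholder destination topic
}
```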

15. Kafka Streams

  • A library for processing streams of data stored in Kafka topics.
  • Features:
    • Allows stateful and stateless stream transformations.
    • Integrates seamlessly with Kafka topics.
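
Kafka Streams itself is a Java library, but its classic word-count topology can be mimicked in plain Python to show the idea: a stateless transformation (lowercasing and splitting) feeding a stateful aggregation (a running count). The dictionary below stands in for a Streams state store.

```python
from collections import defaultdict

def word_count(stream):
    """Python analogue of the Kafka Streams word-count topology:
    stateless flatMap/lowercase steps feeding a stateful count."""
    counts = defaultdict(int)           # stands in for a Streams state store
    for line in stream:                 # each line plays the role of a record value
        for word in line.lower().split():
            counts[word] += 1
    return dict(counts)
```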

16. Message Format

  • Kafka messages contain:
    • Key: Optional; primarily used to determine the record’s partition, which keeps related messages together.
    • Value: The main content of the message.
    • Headers: Metadata for additional context.
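
The user-visible parts of a record can be modeled as a small data class. This is a simplification of the wire format, which also carries a timestamp, offset, and other protocol-level fields; note too that real Kafka headers are an ordered list of key/value pairs that may repeat keys, which a dict flattens away.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Record:
    """Sketch of a Kafka record's user-visible parts."""
    value: bytes                                   # the main payload
    key: Optional[bytes] = None                    # drives partition selection
    headers: Dict[str, bytes] = field(default_factory=dict)  # app metadata
```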

17. Retention Policies

  • Kafka supports configurable retention policies:
    • Time-based: Messages are retained for a specific duration.
    • Size-based: Messages are retained until the topic reaches a specified size.

18. Security

Kafka offers robust security features, including:

  • Authentication: SASL (Simple Authentication and Security Layer), SSL.
  • Authorization: Access control via ACLs (Access Control Lists).
  • Encryption: TLS for data in transit.
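
A typical client configuration combining these features looks like the following. The property names are standard Kafka client settings, but the mechanism choice, paths, and passwords are placeholders, not a working setup.

```python
# Client properties for an authenticated, encrypted connection
# (SASL over TLS). Values are illustrative placeholders.
secure_client_config = {
    "security.protocol": "SASL_SSL",            # TLS transport + SASL auth
    "sasl.mechanism": "SCRAM-SHA-512",          # one of several SASL options
    "ssl.truststore.location": "/path/to/truststore.jks",  # placeholder path
    "ssl.truststore.password": "changeit",      # placeholder password
}
```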

Kafka Architecture Summary Diagram

Here’s a high-level representation of Kafka’s components and their interactions:

+----------------------------------------------------+
|                    Kafka Cluster                   |
|                                                    |
|  +----------+    +----------+    +----------+      |
|  | Broker 1 |    | Broker 2 |    | Broker 3 |      |
|  +----+-----+    +----+-----+    +----+-----+      |
|       |               |               |            |
|       +---------------+---------------+            |
|                       |                            |
|          (leader and replica management)           |
+----------------------------------------------------+
        |                     ^
        v                     |
  +----------+          +----------+        +-------------------+
  | Producer |          | Consumer |        | ZooKeeper / KRaft |
  +----------+          +----------+        +-------------------+

Kafka’s components work together to provide a reliable, scalable, and efficient platform for event-streaming and real-time data processing.
