Apache Kafka is a distributed event-streaming platform with several key components, each playing a critical role in its architecture. Here’s an overview of Kafka’s components:
1. Topics
- A topic is a category or feed name to which records are published.
- Topics are:
- Partitioned: Each topic is divided into partitions for parallelism and scalability.
- Log-structured: Messages are stored in an append-only commit log on disk, ensuring durability.
2. Partitions
- A partition is a subset of a topic, representing an append-only sequence of records.
- Features:
- Each partition is stored and processed independently.
- Partitions provide scalability by allowing parallelism across consumers.
- Messages within a partition are ordered but not across partitions.
- Each record in a partition is assigned a unique offset, which identifies its position.
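To make partitioning concrete, here is a minimal sketch using Kafka's Java AdminClient to create a topic. The broker address (localhost:9092) and topic name (orders) are placeholders, and the partition/replication counts are illustrative:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import java.util.List;
import java.util.Properties;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions for parallelism; replication factor 3 requires at least 3 brokers
            NewTopic topic = new NewTopic("orders", 6, (short) 3);
            admin.createTopics(List.of(topic)).all().get(); // blocks until the topic is created
        }
    }
}
```

More partitions allow more consumers in a group to read in parallel, but the partition count is hard to shrink later, so it is usually chosen with headroom up front.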
3. Producers
- Producers are clients that send records (messages) to Kafka topics.
- Features:
- Producers can specify a partition for the message or let Kafka decide.
- They offer configurable delivery guarantees via acknowledgment settings (illustrated in the sketch after this list):
- acks=0: Fire-and-forget (no acknowledgment).
- acks=1: The leader acknowledges the write without waiting for followers.
- acks=all: The leader and all in-sync replicas acknowledge (strongest durability).
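The sketch below shows a Java producer configured with acks=all; the broker address, topic, key, and value are all illustrative placeholders:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

public class ProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ACKS_CONFIG, "all"); // leader + all in-sync replicas must acknowledge

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Records with the same key ("user-42") always land on the same partition.
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("orders", "user-42", "order-created");
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace(); // delivery failed after retries
                } else {
                    System.out.printf("Written to partition %d at offset %d%n",
                            metadata.partition(), metadata.offset());
                }
            });
        } // close() flushes any buffered records
    }
}
```

With acks=all, the callback fires only after the leader and all in-sync replicas have persisted the record, trading a little latency for durability.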
4. Consumers
- Consumers are clients that read messages from Kafka topics.
- Features:
- Consumers subscribe to topics and process messages.
- They commit the offsets of messages they have processed so they can resume without reprocessing.
- Consumers can operate in groups to enable load balancing.
- A partition is consumed by only one consumer in the same group.
5. Consumer Groups
- A consumer group is a set of consumers sharing the load of processing messages.
- Features:
- Kafka ensures that each partition in a topic is processed by only one consumer in the group.
- Multiple consumer groups can subscribe to the same topic independently.
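A minimal Java consumer illustrating both points: it joins a group via group.id (the group name, broker address, and topic below are placeholders), and Kafka assigns it a share of the partitions. Running a second copy with the same group.id splits the partitions between the two instances:

```java
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class ConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "order-processors"); // consumers sharing this ID split the partitions
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"); // start from the beginning if no committed offset
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");   // commit offsets explicitly below

        try (Consumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> r : records) {
                    System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                            r.partition(), r.offset(), r.key(), r.value());
                }
                consumer.commitSync(); // record our position so the group does not reprocess on restart
            }
        }
    }
}
```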
6. Brokers
- A broker is a Kafka server that stores data and serves client requests.
- Features:
- Each broker is identified by a unique ID.
- Brokers handle:
- Receiving data from producers.
- Storing data on disk.
- Serving data to consumers.
- In a cluster, brokers share the load for scalability and fault tolerance.
7. Cluster
- A Kafka cluster is a group of brokers working together.
- Features:
- Provides high availability and fault tolerance.
- Topics and partitions are distributed across brokers.
- A partition has:
- Leader: Handles all read and write requests.
- Followers: Replicate the leader’s data for redundancy.
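As a sketch, the same AdminClient can list the brokers that make up a cluster; the bootstrap address is again a placeholder:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.Node;
import java.util.Properties;

public class ClusterInfoExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder address

        try (AdminClient admin = AdminClient.create(props)) {
            // Each broker appears as a Node with its unique ID
            for (Node node : admin.describeCluster().nodes().get()) {
                System.out.printf("broker id=%d host=%s:%d%n", node.id(), node.host(), node.port());
            }
        }
    }
}
```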
8. ZooKeeper (Deprecated in Favor of KRaft)
- ZooKeeper was used for:
- Managing cluster metadata and configurations.
- Electing partition leaders.
- Tracking broker availability.
- Kafka has transitioned to KRaft (Kafka Raft) for metadata management; ZooKeeper mode was deprecated in the 3.x releases and removed in Kafka 4.0.
9. KRaft (Kafka Raft)
- A replacement for ZooKeeper that provides:
- Native Kafka-based metadata management.
- Simplified architecture with better fault tolerance and scalability.
10. Replication
- Kafka replicates partitions across brokers for fault tolerance.
- Features:
- Each partition has one leader and multiple followers.
- Leader handles all read and write requests.
- Followers replicate the leader’s data.
- If a leader fails, a follower is promoted to leader.
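The leader/follower layout can be inspected programmatically. A sketch, reusing an AdminClient configured as above and the placeholder topic orders:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartitionInfo;
import java.util.List;

public class ReplicaInfoExample {
    // `admin` is an AdminClient configured as in the cluster sketch above
    static void printReplicas(AdminClient admin) throws Exception {
        TopicDescription desc =
                admin.describeTopics(List.of("orders")).all().get().get("orders");
        for (TopicPartitionInfo p : desc.partitions()) {
            // leader() serves reads/writes; isr() lists the in-sync replicas eligible for promotion
            System.out.printf("partition=%d leader=%s isr=%s%n",
                    p.partition(), p.leader(), p.isr());
        }
    }
}
```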
11. Logs
- Kafka stores messages in log files on disk.
- Features:
- Each partition corresponds to a log on disk, split into segment files.
- Segments are subject to retention:
- Retained by size or time.
- Old segments are deleted to reclaim disk space.
12. Offset
- An offset is a unique identifier for each message within a partition.
- Features:
- Consumers use offsets to track their position in a partition.
- Committed offsets are stored in Kafka itself (the internal __consumer_offsets topic), and consumers can seek to an earlier offset to rewind or replay messages (see the sketch below).
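A sketch of rewinding: assuming a consumer configured as in the earlier example, seekToBeginning() (or seek() with an explicit offset) repositions it within a partition:

```java
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.common.TopicPartition;
import java.util.List;

public class ReplayExample {
    // `consumer` is configured as in the consumer-group sketch above
    static void replayFromStart(Consumer<String, String> consumer) {
        TopicPartition tp = new TopicPartition("orders", 0); // placeholder topic, partition 0
        consumer.assign(List.of(tp));          // take direct control of this partition (instead of subscribe)
        consumer.seekToBeginning(List.of(tp)); // rewind to the earliest retained offset
        // or jump to an exact position: consumer.seek(tp, 1234L);
    }
}
```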
13. Client APIs
- Kafka provides APIs for interacting with the system:
- Producer API: Publish messages to topics.
- Consumer API: Subscribe to and process messages.
- Streams API: Perform stream processing on data in Kafka topics.
- Connect API: Integrate external systems (e.g., databases, file systems) with Kafka.
- Admin API: Create, inspect, and configure topics and other cluster resources.
14. Kafka Connect
- A framework for integrating Kafka with external systems (e.g., databases, files, message queues).
- Features:
- Source Connectors: Bring data into Kafka.
- Sink Connectors: Push data from Kafka to external systems.
15. Kafka Streams
- A library for processing streams of data stored in Kafka topics.
- Features:
- Allows stateful and stateless stream transformations.
- Integrates seamlessly with Kafka topics.
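A minimal Kafka Streams sketch: it reads an input topic, applies a stateless mapValues transformation, and writes to an output topic. The application ID and topic names are illustrative:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import java.util.Properties;

public class StreamsExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app"); // placeholder app ID
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> source = builder.stream("orders"); // placeholder input topic
        source.mapValues(v -> v.toUpperCase()) // stateless transformation
              .to("orders-uppercased");        // write results back to another topic

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```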
16. Message Format
- Kafka messages contain:
- Key: Optional; records with the same key are routed to the same partition (also used by log compaction).
- Value: The main content of the message.
- Headers: Metadata for additional context.
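A sketch of all three parts on one record, assuming a producer configured as in the earlier example; the topic, key, value, and header name are placeholders:

```java
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.nio.charset.StandardCharsets;

public class HeaderExample {
    // `producer` is configured as in the producer sketch above
    static void sendWithHeader(Producer<String, String> producer) {
        ProducerRecord<String, String> record =
                new ProducerRecord<>("orders", "user-42", "order-created"); // key + value
        record.headers().add("trace-id", "abc-123".getBytes(StandardCharsets.UTF_8)); // header metadata
        producer.send(record);
    }
}
```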
17. Retention Policies
- Kafka supports configurable retention policies:
- Time-based: Messages are retained for a specific duration.
- Size-based: Messages are retained until the topic reaches a specified size.
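Both policies are topic-level configs (retention.ms and retention.bytes). A sketch, reusing an AdminClient as above with illustrative values:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import java.util.List;
import java.util.Map;

public class RetentionExample {
    // `admin` is an AdminClient configured as in the topic-creation sketch above
    static void createWithRetention(AdminClient admin) throws Exception {
        NewTopic topic = new NewTopic("orders", 6, (short) 3).configs(Map.of(
                "retention.ms", String.valueOf(7L * 24 * 60 * 60 * 1000), // time-based: keep 7 days
                "retention.bytes", String.valueOf(1_073_741_824L)));      // size-based: ~1 GiB per partition
        admin.createTopics(List.of(topic)).all().get();
    }
}
```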
18. Security
- Kafka offers robust security features, including:
- Authentication: SASL (Simple Authentication and Security Layer), SSL.
- Authorization: Access control via ACLs (Access Control Lists).
- Encryption: TLS for data in transit.
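A sketch of client-side settings combining these features (SASL/PLAIN authentication over TLS); the host, credentials, and truststore path are placeholders to adapt to your environment:

```java
import java.util.Properties;

public class SecureClientConfig {
    // Placeholder host, credentials, and truststore; adapt to your environment.
    static Properties secureProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker.example.com:9093");
        props.put("security.protocol", "SASL_SSL"); // SASL authentication over an encrypted TLS channel
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.plain.PlainLoginModule required "
                + "username=\"alice\" password=\"alice-secret\";");
        props.put("ssl.truststore.location", "/etc/kafka/client.truststore.jks");
        props.put("ssl.truststore.password", "changeit");
        return props;
    }
}
```

These properties can be merged into the producer, consumer, or admin configurations shown earlier.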
Kafka Architecture Summary Diagram
Here’s a high-level representation of Kafka’s components and their interactions:
+------------------------------------------------------+
|                    Kafka Cluster                     |
|                                                      |
|   +----------+     +----------+     +----------+     |
|   | Broker 1 |     | Broker 2 |     | Broker 3 |     |
|   +----+-----+     +----+-----+     +----+-----+     |
|        |                |                |           |
|        +----------------+----------------+           |
|                         |                            |
|           Leader and replica management              |
|          (coordinated via KRaft or ZooKeeper)        |
+------------------------------------------------------+
      ^                                   |
      |                                   v
 +----------+                       +----------+
 | Producer |                       | Consumer |
 +----------+                       +----------+
Kafka’s components work together to provide a reliable, scalable, and efficient platform for event-streaming and real-time data processing.