A Beginner's Guide to Apache Kafka

Shiv Shet

Mar. 23, 16 · Tutorial

A conventional messaging queue running on a single server struggles to keep up with big-data volumes, which is where a distributed messaging queue comes to the rescue.

Features of a Distributed Messaging System

It should be scalable: it should be able to grow to thousands of nodes without difficulty.
It should be fault tolerant: it should keep working even if some nodes in the cluster go down.
It should support replication.
It should have no single point of failure: losing any one node should not bring the system down.
It should have high throughput: it should handle millions of messages per second.

This is where Apache Kafka fits in the world of distributed messaging.

It can scale out to thousands of nodes.
It is durable. Messages are persisted to the file system and replicated across the nodes of a cluster.
It is fault tolerant.
It has no single point of failure.
It supports replication: each message is copied to more than one node in the cluster.
It has high throughput.
Its brokers are peers rather than a fixed master and its slaves, although leadership roles are assigned dynamically among the brokers.
It was originally developed at LinkedIn and open sourced through the Apache Software Foundation.
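
To make the replication and fault-tolerance points more concrete, here is a minimal Python sketch of a log that is copied to several nodes. It is a toy model under stated assumptions, not Kafka's actual replication protocol: the class name ToyReplicatedLog, its methods, and the node names are all hypothetical, and every replica receives every message synchronously. Real Kafka replicates per partition with a leader and followers, which this sketch does not attempt to model.

```python
class ToyReplicatedLog:
    """Hypothetical toy model: every message is copied to every replica."""

    def __init__(self, replica_ids):
        # One independent copy of the log per node.
        self.replicas = {replica_id: [] for replica_id in replica_ids}

    def publish(self, message):
        # Write the message to all live replicas.
        for log in self.replicas.values():
            log.append(message)

    def fail(self, replica_id):
        # Simulate a node crash: that node's copy is no longer available.
        self.replicas.pop(replica_id)

    def read(self):
        # Any surviving replica can serve the full log.
        if not self.replicas:
            raise RuntimeError("all replicas are down")
        return list(next(iter(self.replicas.values())))


replicated = ToyReplicatedLog(["node-1", "node-2", "node-3"])
replicated.publish("m1")
replicated.publish("m2")

replicated.fail("node-1")          # one node goes down
after_one_failure = replicated.read()

replicated.fail("node-2")          # a second node goes down
after_two_failures = replicated.read()

print(after_one_failure)
print(after_two_failures)
```

In this model the data stays readable as long as at least one replica survives, which is the intuition behind "no single point of failure".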

The architecture of Apache Kafka is shown in the following diagram:

[Figure: Apache Kafka architecture]

Apache Kafka consists of the following components:

The producer sends messages to the broker using a push mechanism.
The consumer reads data from the broker using a pull mechanism.
The broker is a lightweight component that handles TCP connections and writes data to an append-only log file.
ZooKeeper acts as a coordinator between the brokers and the consumers.
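
The push and pull roles described above can be sketched with a small in-memory model in Python. This is only an illustration: ToyBroker, ToyConsumer, and the message strings are hypothetical names, not part of any Kafka client API, and there is no networking, persistence, or ZooKeeper coordination here. The sketch assumes the consumer keeps track of its own read position (offset) in the log.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBroker:
    """Hypothetical in-memory stand-in for a broker; not the Kafka API."""
    log: list = field(default_factory=list)  # append-only: entries are never edited

    def append(self, message: str) -> int:
        """Producer side (push): store the message and return its offset."""
        self.log.append(message)
        return len(self.log) - 1

    def fetch(self, offset: int, max_records: int = 10) -> list:
        """Consumer side (pull): return records starting at the given offset."""
        return self.log[offset:offset + max_records]


@dataclass
class ToyConsumer:
    """Tracks its own read position and asks the broker for more data."""
    broker: ToyBroker
    offset: int = 0

    def poll(self, max_records: int = 10) -> list:
        records = self.broker.fetch(self.offset, max_records)
        self.offset += len(records)  # advance only past what was actually read
        return records


broker = ToyBroker()
for text in ["order-created", "order-paid", "order-shipped"]:
    broker.append(text)  # the producer pushes each message to the broker

consumer = ToyConsumer(broker)
first_batch = consumer.poll(max_records=2)   # the consumer pulls at its own pace
second_batch = consumer.poll(max_records=2)
print(first_batch)
print(second_batch)
```

Because the broker only appends and the consumer decides when and how much to read, a slow consumer does not block the producer in this model; it simply falls behind in the log and catches up later.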
