The Technology Journal

Decoding the Popularity of Kafka: Its rise and application

Apache Kafka is an open-source, distributed streaming platform that originated at LinkedIn and is now developed under the Apache Software Foundation. It is written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. In other words, it is a scalable, fault-tolerant, publish-subscribe messaging system that enables the creation of distributed applications and powers web-scale Internet companies. Prominent examples include LinkedIn, Twitter and Airbnb. Kafka has gained strong momentum among unicorns as well as traditional enterprises. As a Java web development company, we often work on projects that require distributed streaming in real time, and this is where Kafka excels.

Kafka is used largely in the big data realm as a dependable means to ingest and move massive amounts of data very quickly. One example is Netflix, which moved from writing its own ingestion framework to using Kafka as the primary backbone for ingestion via Java APIs or REST APIs. It did so to meet the demand for real-time (sub-minute) analytics.

Kafka is often used instead of traditional message brokers, such as those built on JMS or AMQP, because of its higher throughput, reliability and built-in replication.

How does it work?

Kafka operates like a publish-subscribe system that delivers ordered, persistent and scalable messaging. It allows you to process streams of records in real time. Its system design is akin to that of a distributed commit log, where incoming data is written sequentially to disk. The log is a time-ordered, append-only sequence of records, and a record's payload can be anything. The four main components involved in moving data in and out of Kafka are:

  1. Topics
  2. Producers
  3. Consumers
  4. Brokers

A topic is a category or feed name to which records are published. Topics in Kafka are always multi-subscriber: a topic can have zero, one or many consumers that subscribe to the data written to it.
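To make the commit-log and multi-subscriber ideas concrete, here is a minimal in-memory sketch. It is not the Kafka client API: the class `TopicLog`, its methods and the subscriber names are hypothetical, chosen only to show that records are appended in order and that each subscriber advances its own read position without affecting the others.

```python
class TopicLog:
    """Toy stand-in for a single Kafka-style log (hypothetical, not the Kafka API)."""

    def __init__(self):
        self._records = []   # append-only list; a record's index is its offset
        self._offsets = {}   # subscriber name -> next offset to read

    def append(self, value):
        """Append a record to the end of the log and return its offset."""
        self._records.append(value)
        return len(self._records) - 1

    def poll(self, subscriber, max_records=10):
        """Return unread records for this subscriber, advancing only its own offset."""
        start = self._offsets.get(subscriber, 0)
        batch = self._records[start:start + max_records]
        self._offsets[subscriber] = start + len(batch)
        return batch


log = TopicLog()
for event in ["signup", "login", "purchase"]:
    log.append(event)

# Two independent subscribers read the same records at their own pace.
analytics_batch = log.poll("analytics")
billing_batch = log.poll("billing", max_records=2)
billing_rest = log.poll("billing")
```

Because reading never removes anything from the log, adding another subscriber costs little more than tracking one extra offset, which is roughly why multi-subscriber topics come naturally in a log-based design.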

Producers publish data to one or more topics of their choice. The producer also decides which partition within the topic each record is assigned to. Consumers subscribe to topics and process the published messages. Lastly, a Kafka cluster consists of one or more servers, called brokers. Brokers manage the persistence and replication of message data.
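The way a producer assigns records to partitions can be sketched as follows. This is an illustration under stated assumptions, not Kafka's implementation: `PartitionedTopic`, `choose_partition` and `send` are hypothetical names, and `zlib.crc32` is used here only as a convenient stable hash in place of whatever hash a real client uses.

```python
import zlib


class PartitionedTopic:
    """Toy topic made of several append-only partitions (hypothetical model)."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]


def choose_partition(key, num_partitions):
    """Map a key to a partition with a stable hash (crc32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def send(topic, key, value):
    """Append (key, value) to the partition chosen for the key.

    Returns the partition index and the record's offset within that partition.
    """
    partition = choose_partition(key, len(topic.partitions))
    topic.partitions[partition].append((key, value))
    return partition, len(topic.partitions[partition]) - 1


topic = PartitionedTopic("page-views", num_partitions=3)
events = [("alice", "/home"), ("bob", "/cart"), ("alice", "/checkout")]
placements = [send(topic, user, page) for user, page in events]
```

Hashing on the key means every record for the same key lands in the same partition, so the relative order of that key's records is preserved even though the topic as a whole is spread across partitions.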

Use cases

  • Messaging
  • Website Activity Tracking
  • Metrics
  • Log Aggregation
  • Stream Processing
  • Event Sourcing
  • Commit Log

Key Features

Several features explain Kafka's widespread adoption and steadily rising popularity:

  • Scalability - As a distributed system, Kafka can scale out without downtime.
  • Durability - Preserves messages on disk and provides intra-cluster replication.
  • Reliability - Keeps high volumes of data available in real time. It replicates data, supports multiple subscribers, and rebalances consumers in case of failure.
  • Performance - Unified, high-throughput, low-latency platform for publishing and subscribing in real time, with disk structures that provide consistent performance even with very large volumes of stored messages.
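The "balances consumers in case of failure" point can be illustrated with a small sketch. It assumes a simplified round-robin scheme; the function `assign` and the consumer names are hypothetical, and Kafka's actual assignment strategies are configurable and more involved than this.

```python
def assign(partitions, consumers):
    """Spread partitions over the live consumers in round-robin order.

    Simplified illustration only; not Kafka's group coordination protocol.
    """
    if not consumers:
        return {}
    members = sorted(consumers)
    assignment = {member: [] for member in members}
    for index, partition in enumerate(sorted(partitions)):
        assignment[members[index % len(members)]].append(partition)
    return assignment


partitions = [0, 1, 2, 3, 4, 5]

# Three healthy consumers share the six partitions.
before = assign(partitions, ["c1", "c2", "c3"])

# Consumer c2 fails; its partitions are redistributed to the survivors.
after = assign(partitions, ["c1", "c3"])
```

After the reassignment every partition still has exactly one owner, so processing continues with the remaining consumers picking up the failed member's share.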

Kafka and Big data services

Owing to its performance and scalability, Kafka has seen a meteoric rise in adoption in the big data space. It is widely regarded as a dependable way to ingest and move large amounts of information, as seen in the Netflix example earlier. LinkedIn, where Kafka originated, has reported ingestion rates of 1 trillion messages a day.

Internet of Things (IoT)

While Kafka is extremely useful for big data ingestion, its "log" data structure also has interesting benefits for applications built around the Internet of Things, microservices, and cloud-native architectures.

Should you consider using Kafka? That really depends on your use case. If your solution can benefit from a single platform that covers both publish/subscribe and queueing needs, then Kafka is certainly worth considering. What are your views on this? Please share in the comments section.
