What is Apache Kafka?

What is Apache Kafka all about? What is this software that virtually every well-known car manufacturer uses, and that turns up in so many industries, from banks, insurance companies, logistics service providers, internet start-ups and retail chains to law enforcement agencies? Why are so many companies using Apache Kafka?

This is a blog post from our Community Stream: by developers, for developers. Be sure to stop by our community to find similar articles or join the conversation.

We divide the use cases of Apache Kafka into two categories. The first group is the one Kafka was originally designed for. Apache Kafka was initially developed at LinkedIn to move all events that occur on the LinkedIn website into a central data warehouse. LinkedIn was looking for a scalable messaging system that would perform well even under very high load, and ultimately built Kafka for this purpose. Today, many companies use Kafka to move large amounts of data from A to B. The focus is often on performance and scalability, but also on the reliability Kafka offers for delivering and persisting messages.
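Much of Kafka's scalability comes from splitting each topic into partitions, which can be spread across brokers and consumed in parallel. A record's key determines its partition, so all records with the same key land in the same partition and keep their relative order. The Python sketch below only illustrates that idea. It uses zlib.crc32 as a stand-in for the murmur2 hash that Kafka's Java client applies to keys by default, and the partition count and car IDs are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical topic with three partitions


def choose_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stand-in for Kafka's default partitioner (murmur2 in the Java client):
    # any stable hash shows the principle that the same key maps to the same partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_events(events):
    """Group (key, value) events by partition, preserving arrival order within each."""
    partitions = defaultdict(list)
    for key, value in events:
        partitions[choose_partition(key)].append((key, value))
    return dict(partitions)


# Hypothetical telemetry events, keyed by car ID.
events = [
    ("car-17", "speed=88"),
    ("car-42", "speed=51"),
    ("car-17", "speed=90"),
    ("car-99", "speed=30"),
    ("car-42", "speed=49"),
]

for partition, records in sorted(partition_events(events).items()):
    print(partition, records)
```

Because each partition can live on a different broker and be read by a different consumer, adding partitions spreads the load. The price is that ordering is only guaranteed within a partition, not across the whole topic.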

One of the core ideas that distinguishes Kafka from classical messaging systems, however, is that Kafka persists data to disk. This means we can keep data in Kafka and read it not only immediately after it has been written, but also hours, days or even months later.
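To make the difference from a classical queue concrete, here is a toy Python model of a single partition. It is an append-only list in which every record has an offset, and reading removes nothing. The class and event names are hypothetical, and this is not Kafka's API. In real Kafka, how long data is kept is governed by topic retention settings such as retention.ms, whereas this sketch simply keeps everything.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy model of a single Kafka partition: an append-only list addressed by offset."""

    records: list = field(default_factory=list)

    def append(self, value: str) -> int:
        # Writing always appends to the end; the returned offset identifies the record.
        self.records.append(value)
        return len(self.records) - 1

    def read(self, offset: int, max_records: int = 10) -> list:
        # Reading is non-destructive: nothing is removed, so a reader can come back
        # later and start again from any offset that is still retained.
        return self.records[offset:offset + max_records]


log = PartitionLog()
for event in ["order-created", "order-paid", "order-shipped"]:
    log.append(event)

print(log.read(0))  # ['order-created', 'order-paid', 'order-shipped']
log.append("order-delivered")
print(log.read(0))  # a reader arriving much later still sees the full history
print(log.read(3))  # ['order-delivered']
```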

This enables the second category of Apache Kafka use cases: more and more companies use Kafka not only as a central tool for exchanging data between a wide variety of services and applications, but also as a central nervous system for working with data across the company.

But what do we mean when we say that Kafka can be the central nervous system for data? The vision is that every event that takes place in a company is stored in Kafka. Any other service (provided it has the necessary authorization) can then react to this event asynchronously and process it further:

Kafka as the central nervous system for data in the company. Every event that takes place in the company is stored in Kafka. Other services can react to these events asynchronously and process them further.
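The pattern in the figure can be sketched as a toy Python model. A producer appends events without knowing who will read them, and every consuming service (a consumer group, in Kafka terms) keeps its own committed offset, so each one reacts at its own pace. All names here are hypothetical and this is not the Kafka client API. A new group in this sketch starts from the beginning, which corresponds to a Kafka consumer configured with auto.offset.reset=earliest (the client default is latest).

```python
from collections import defaultdict


class Topic:
    """Toy topic: an append-only event list plus one committed offset per consumer group."""

    def __init__(self):
        self.events = []
        self.committed = defaultdict(int)  # consumer group -> next offset to read

    def publish(self, event):
        # The producer appends the event without knowing who will consume it.
        self.events.append(event)

    def poll(self, group):
        # Each group reads from its own committed offset, then commits its new position.
        start = self.committed[group]
        batch = self.events[start:]
        self.committed[group] = len(self.events)
        return batch


topic = Topic()
topic.publish("customer-registered")
topic.publish("order-placed")
print(topic.poll("billing"))    # ['customer-registered', 'order-placed']
topic.publish("order-cancelled")
print(topic.poll("billing"))    # ['order-cancelled']
print(topic.poll("analytics"))  # analytics starts late and still receives all three events
```

Adding another service means adding another group. The producer and the existing consumers do not change.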

For example, in many companies we see a split between legacy systems, which are essential for existing business processes and business models, and the so-called new world, where agile methods are used to develop new services. Companies often use Kafka not only as an interface between old and new systems, but also to let the new services exchange messages with each other in real time.

This is because legacy systems typically cannot keep up with our customers' new demands. Batch systems cannot meet the demand for information that is always available “immediately”. For example, who wants to wait a day or even several weeks for their account balance to update after a credit card transaction? We now expect to be able to track our parcels in real time. Modern cars produce vast amounts of data that must be sent to corporate headquarters and analyzed, especially in preparation for autonomous driving. Kafka can help all these companies move from batch-oriented processing to (near) real-time data processing.

But the way we write software is also changing. Instead of putting more and more functionality into monolithic services and then connecting these few monoliths through integration layers, we break our services into microservices, among other things to reduce dependencies between teams. To accomplish this, however, we need a way of exchanging data that is as asynchronous as possible. That way, the other services can keep working even while one service is down for maintenance. We also need communication methods that let each service evolve its data formats independently of other services. Apache Kafka can support us here as well.
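One common way to let data formats evolve independently is the "tolerant reader" approach. New readers supply defaults for fields that older events lack, and older readers ignore fields they do not know. Kafka itself stores record values as plain bytes and does not enforce a schema, so this is a convention between teams. In practice it is often backed by a schema format such as Avro or Protobuf with compatibility rules. The Python sketch below uses plain JSON instead, and the event shape and the "EUR" default are hypothetical.

```python
import json

# Version 1 of a hypothetical "order placed" event, as an older producer writes it.
v1_event = json.dumps({"order_id": "A-1", "amount": 42.0})

# Version 2 adds an optional field; the producing team evolved its format on its own.
v2_event = json.dumps({"order_id": "A-2", "amount": 10.0, "currency": "USD"})


def read_order(raw: str) -> dict:
    """Newer consumer: understands 'currency' but falls back to a default for old events."""
    data = json.loads(raw)
    return {
        "order_id": data["order_id"],             # required in every version
        "amount": data["amount"],
        "currency": data.get("currency", "EUR"),  # optional field with a hypothetical default
    }


def read_order_v1(raw: str) -> dict:
    """Older consumer that predates 'currency': it simply ignores unknown fields."""
    data = json.loads(raw)
    return {"order_id": data["order_id"], "amount": data["amount"]}


for raw in (v1_event, v2_event):
    print(read_order(raw), read_order_v1(raw))
```

Neither consumer has to be redeployed in lockstep with the producer, which fits the asynchronous, loosely coupled style described above.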

Another trend, driven primarily by virtualization and the increasingly widespread use of cloud architectures, is the decline of specialized hardware. Unlike some other messaging systems, Kafka does not come as a hardware appliance. It runs on commodity hardware and does not require especially fail-safe machines. Instead, Kafka itself is designed to cope with the failure of individual components, for example by replicating data across several brokers. This keeps message delivery reliable even if chaos breaks out in our data center.
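Kafka's fault tolerance rests largely on replication. Each partition is copied to several brokers (the replication factor). When a producer uses acks=all, a write is acknowledged only once enough in-sync replicas hold it, as set by min.insync.replicas. If too few replicas are available, the write is rejected rather than silently accepted. The toy Python model below only mimics that rule. It treats every live broker as in sync and skips leader election and replica catch-up entirely, and all class names are hypothetical.

```python
class Broker:
    """A hypothetical broker holding one replica of the partition."""

    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.alive = True
        self.log = []


class ReplicatedPartition:
    """Toy partition replicated across brokers, loosely mirroring acks=all + min.insync.replicas."""

    def __init__(self, brokers, min_insync):
        self.brokers = brokers
        self.min_insync = min_insync

    def write(self, value):
        live = [b for b in self.brokers if b.alive]  # simplification: live == in sync
        if len(live) < self.min_insync:
            # Refuse the write instead of accepting data that too few replicas would hold.
            raise RuntimeError("not enough in-sync replicas")
        for broker in live:
            broker.log.append(value)

    def read(self):
        # Any surviving replica can serve the data (real Kafka reads from the elected leader).
        for broker in self.brokers:
            if broker.alive:
                return list(broker.log)
        raise RuntimeError("no replica available")


brokers = [Broker(i) for i in range(3)]  # replication factor 3
partition = ReplicatedPartition(brokers, min_insync=2)

partition.write("payment-1")
brokers[0].alive = False                 # one broker fails
partition.write("payment-2")             # still accepted: two replicas remain
print(partition.read())                  # ['payment-1', 'payment-2']

brokers[1].alive = False                 # a second broker fails
try:
    partition.write("payment-3")
except RuntimeError as err:
    print("write rejected:", err)        # the producer learns the write was not stored safely
print(partition.read())                  # earlier messages are still readable from broker 2
```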

About Anatoly

As an IT trainer and frequent blogger for Xeotek, Anatoly Zelenin teaches Apache Kafka to hundreds of participants in interactive training sessions. For more than a decade, his clients from the DAX environment and German medium-sized businesses have appreciated his expertise and his inspiring manner. He also delivers training for Xeotek clients. His book is available directly from him, from Hanser-Verlag and from Amazon. You can reach Anatoly via his e-mail. Besides being an IT consultant, trainer and blogger, he also explores our planet as an adventurer.