A Distributed Streaming Platform

What is Apache Kafka™?

Apache Kafka is a distributed streaming platform capable of handling trillions of events a day. Initially conceived as a messaging queue, Kafka is based on an abstraction of a distributed commit log. Since being created and open sourced by LinkedIn in 2011, Kafka has quickly evolved from a messaging queue to a full-fledged streaming platform.

As a streaming platform, Apache Kafka provides low-latency, high-throughput, fault-tolerant publish and subscribe pipelines and is able to process streams of events. Kafka delivers reliable, millisecond-latency responses, supporting both customer-facing applications and the delivery of real-time data to downstream systems.

Where Apache Kafka Fits In

Thousands of Companies are Built on Apache Kafka

What Can Kafka Do?

Publish + Subscribe

At its heart lies the humble, immutable commit log; you can subscribe to it and publish data from it to any number of systems or real-time applications. Unlike traditional messaging queues, Kafka is a highly scalable, fault-tolerant distributed system, which allows it to be deployed for applications such as matching passengers and drivers at Uber, providing real-time analytics and predictive maintenance for British Gas’ smart home, and powering numerous real-time services across LinkedIn. That combination of scale and resilience lets Kafka grow from a single application to company-wide use.
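As a minimal sketch of the publish/subscribe pattern using the standard Java clients (the broker address, the page-views topic, the consumer group, and the string serializers are illustrative assumptions, not part of the original text):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class PubSubSketch {
    public static void main(String[] args) {
        // Producer side: publish an event to the (hypothetical) "page-views" topic.
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("page-views", "user-42", "/pricing"));
        }

        // Consumer side: subscribe to the same topic and read events as they arrive.
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "page-view-readers");       // hypothetical consumer group
        consumerProps.put("auto.offset.reset", "earliest");       // read from the start of the log on first run
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(Collections.singletonList("page-views"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("%s -> %s%n", record.key(), record.value());
            }
        }
    }
}

In practice the producer and consumer would live in separate services; the commit log in between is what decouples them.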

Store

Built on the abstraction of a distributed commit log commonly found in distributed databases, Apache Kafka provides durable storage. Kafka can act as a ‘source of truth’, distributing data across multiple nodes for a highly available deployment, whether within a single data center or spanning multiple availability zones.
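A hedged sketch of how that replicated, durable storage is typically configured, using Kafka's Java AdminClient (the orders topic name, partition count, and broker address are assumptions for illustration):

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import java.util.Collections;
import java.util.Properties;

public class DurableTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions, replication factor 3: every record is stored on three brokers,
            // so the topic remains available if a node (or an availability zone) is lost.
            NewTopic orders = new NewTopic("orders", 6, (short) 3);
            admin.createTopics(Collections.singletonList(orders)).all().get();
        }
    }
}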

Process

A streaming platform would not be complete without the ability to manipulate data as it arrives. The Streams API within Apache Kafka is a powerful, lightweight library that allows for on-the-fly processing, letting you aggregate records, define windows, join data within a stream, and more. Perhaps best of all, it is a plain Java library that runs inside your application on top of Kafka, keeping your workflow intact with no extra clusters to maintain.
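A minimal Kafka Streams sketch of a windowed aggregation (the topic names and configuration are illustrative, and TimeWindows.ofSizeWithNoGrace assumes a recent Kafka Streams release):

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.TimeWindows;
import java.time.Duration;
import java.util.Properties;

public class StreamsSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "page-view-counter");  // assumed application id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> views = builder.stream("page-views"); // hypothetical input topic

        // Aggregate on the fly: count views per page over one-minute windows,
        // then write the running counts to another topic.
        views.groupByKey()
             .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1)))
             .count()
             .toStream()
             .map((window, count) -> KeyValue.pair(window.key(), count.toString()))
             .to("page-view-counts");

        new KafkaStreams(builder.build(), props).start();
    }
}

Because this is just a Java application, you scale it by running more instances; Kafka rebalances the partitions among them with no separate processing cluster to operate.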

Kafka: The Definitive Guide

Learn how to take full advantage of Apache Kafka, the distributed, publish-subscribe queue for handling real-time data feeds. With this comprehensive book, you’ll understand how Kafka works and how it’s designed.

A Toolset for Streaming

Apache Kafka is a popular tool for developers because it is easy to pick up and provides a powerful streaming platform complete with 4 APIs: Producer, Consumer, Streams, and Connect.

Often, developers will begin with a single use case. This could be using Apache Kafka as a message buffer to protect a legacy database that can’t keep up with today’s workloads, using the Connect API to keep that database in sync with an accompanying search index, or using the Streams API to process data as it arrives and surface aggregations right back to your application.
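As a hedged sketch of the message-buffer pattern mentioned above (the orders topic, the consumer group, and the saveBatch helper are hypothetical; a real implementation would issue bulk writes to the database):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

public class BufferedDbWriter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");          // assumed broker address
        props.put("group.id", "legacy-db-writer");                 // hypothetical consumer group
        props.put("enable.auto.commit", "false");                  // commit only after a batch is written
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders")); // hypothetical topic
            while (true) {
                // Kafka absorbs traffic spikes; the consumer drains them at the pace the database can handle.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                List<String> batch = new ArrayList<>();
                for (ConsumerRecord<String, String> record : records) {
                    batch.add(record.value());
                }
                if (!batch.isEmpty()) {
                    saveBatch(batch);        // hypothetical helper that writes one batch to the legacy database
                    consumer.commitSync();   // mark the batch as processed only once it is safely stored
                }
            }
        }
    }

    static void saveBatch(List<String> batch) {
        // Placeholder: a real implementation would perform a single bulk insert per batch.
        System.out.println("Writing " + batch.size() + " records to the legacy database");
    }
}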

In short, Apache Kafka and its APIs make building data-driven apps and managing complex back-end systems simple. Kafka gives you peace of mind knowing your data is always fault-tolerant, replayable, and real-time. It helps you build quickly by providing a single streaming platform to process and store your data and to connect your apps and systems with real-time data.
