What is Apache Kafka®?

Apache Kafka is an open-source distributed streaming system used for stream processing, real-time data pipelines, and data integration at scale. Originally created to handle real-time data feeds at LinkedIn in 2011, Kafka quickly evolved from a messaging queue to a full-fledged event streaming platform, capable of handling over one million messages per second, or trillions of messages per day.

Founded by the original creators of Apache Kafka, Confluent provides the most comprehensive Kafka tutorials, training, services, and support. Confluent also offers fully managed, cloud-native data streaming services built for any cloud environment, ensuring scalability and reliability for modern data infrastructure needs.

Why Kafka?

Kafka offers numerous advantages. Today, it is used by more than 80% of the Fortune 100 across virtually every industry, for use cases both large and small, and it has become the de facto technology for developers and architects looking to build the newest generation of scalable, real-time data streaming applications. While other technologies on the market can achieve similar goals, Kafka's enormous popularity stems mainly from the reasons below.

High Throughput

Kafka is capable of handling high-velocity and high-volume data, processing millions of messages per second. This makes it ideal for applications requiring real-time data processing and integration across multiple servers.

High Scalability

Kafka clusters can be scaled up to a thousand brokers, handling trillions of messages per day and petabytes of data. Kafka's partitioned log model allows for elastic expansion and contraction of storage and processing capacities. This scalability ensures that Kafka can support a vast array of data sources and streams.
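To make the partitioned log model concrete, here is a toy sketch (not Kafka's actual implementation; the `PartitionedLog` class and its methods are invented for illustration). Records with the same key hash to the same partition, and each partition is an append-only sequence addressed by a monotonically growing offset:

```python
import hashlib

class PartitionedLog:
    """Toy sketch of the partitioned log model: a topic is split into
    partitions, each an append-only sequence addressed by offset."""

    def __init__(self, num_partitions=3):
        self.partitions = [[] for _ in range(num_partitions)]

    def _partition_for(self, key):
        # Keyed records hash to a fixed partition, preserving per-key order.
        digest = hashlib.md5(key.encode()).digest()
        return int.from_bytes(digest, "big") % len(self.partitions)

    def append(self, key, value):
        p = self._partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def read(self, partition, offset):
        return self.partitions[partition][offset]

log = PartitionedLog()
p1, o1 = log.append("user-42", "login")
p2, o2 = log.append("user-42", "click")
assert p1 == p2      # same key -> same partition, so per-key order holds
assert o2 == o1 + 1  # offsets grow monotonically within a partition
```

Because partitions are independent of one another, adding brokers lets a cluster spread partitions across more machines, which is what makes the elastic expansion described above possible.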

Low Latency

Kafka can deliver a high volume of messages using a cluster of machines with latencies as low as 2ms. This low latency is crucial for applications that require real-time data processing and immediate responses to data streams.

Permanent Storage

Kafka safely and securely stores streams of data in a distributed, durable, and fault-tolerant cluster. This ensures that data records are reliably stored and can be accessed even in the event of server failure. The partitioned log model further enhances Kafka's ability to manage data streams and provide exactly-once processing guarantees.

High Availability

Kafka can extend clusters efficiently over availability zones, or connect clusters across geographic regions. This high availability makes Kafka fault-tolerant with no risk of data loss. Kafka’s design allows it to manage multiple subscribers and external stream processing systems seamlessly.

How Does Apache Kafka Work?

Apache Kafka consists of a storage layer and a compute layer, which enable efficient, real-time data ingestion, streaming data pipelines, and storage across distributed systems. Its design facilitates simplified data streaming between Kafka and external systems, so you can easily manage real-time data and scale within any type of infrastructure.

Real-Time Processing at Scale

A data streaming platform would not be complete without the ability to process and analyze data as soon as it's generated. The Kafka Streams API is a powerful, lightweight library that allows for on-the-fly processing, letting you aggregate, define windows, join data within a stream, and more. It is built as a Java application on top of Kafka, which maintains workflow continuity without requiring extra clusters to manage.

Durable, Persistent Storage

An abstraction of the distributed commit log commonly found in distributed databases gives Apache Kafka durable storage. Kafka can act as a "source of truth," distributing data across multiple nodes for a highly available deployment, whether within a single data center or across multiple availability zones.

Publish + Subscribe

At its heart lies a humble, immutable commit log to which systems and real-time applications can publish and subscribe. Unlike messaging queues, Kafka is a highly scalable, fault-tolerant distributed system, allowing it to be deployed for applications like managing passenger and driver matching at Uber, providing real-time analytics and predictive maintenance for British Gas' smart homes, and performing numerous real-time services across all of LinkedIn. This unique performance makes it a fit at any scale, from a single app to company-wide deployment.
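The key difference from a messaging queue is that the log is immutable and every subscriber tracks its own read position, so consuming a message never removes it. A minimal sketch (the `Topic` class and its methods are invented for this example, not a Kafka API):

```python
class Topic:
    """Toy publish/subscribe over an append-only log: each subscriber
    keeps its own offset, so consumers never interfere with one another."""

    def __init__(self):
        self.log = []      # immutable history of published events
        self.offsets = {}  # subscriber name -> next offset to read

    def publish(self, event):
        self.log.append(event)

    def subscribe(self, name):
        self.offsets[name] = 0  # new subscribers can replay from the start

    def poll(self, name):
        start = self.offsets[name]
        events = self.log[start:]
        self.offsets[name] = len(self.log)
        return events

topic = Topic()
topic.subscribe("billing")
topic.publish({"ride": 1, "fare": 12.5})
topic.subscribe("analytics")  # joins later, still sees the full history
topic.publish({"ride": 2, "fare": 8.0})

assert len(topic.poll("billing")) == 2
assert len(topic.poll("analytics")) == 2  # the log is replayable
```

Note how the late "analytics" subscriber still receives every event: in a traditional queue, consumed messages would already be gone.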

What is Kafka Used For?

Commonly used to build real-time streaming data pipelines and real-time streaming applications, Kafka supports a vast array of use cases. Any company that relies on, or works with data, can find numerous benefits in utilizing Kafka.

Data Pipelines

In the context of Apache Kafka, a streaming data pipeline means ingesting the data from sources into Kafka as it’s created, and then streaming that data from Kafka to one or more targets. This allows for seamless data integration and efficient data flow across different systems.

Stream Processing

Stream processing includes operations like filters, joins, maps, aggregations, and other transformations that enterprises leverage to power many use cases. Kafka Streams, a stream processing library built for Apache Kafka, enables enterprises to process data in real-time, making it ideal for applications requiring immediate data processing and analysis.
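The filter/map/aggregate pattern named above can be sketched record-by-record in plain Python (this illustrates the pattern only, and is not the Kafka Streams API, which is a Java library):

```python
from collections import defaultdict

def process(stream):
    """Sketch of a filter -> map -> aggregate pipeline applied one
    record at a time, the way a stream processor works."""
    totals = defaultdict(float)
    for event in stream:
        if event["amount"] <= 0:       # filter: drop refunds and zero rows
            continue
        key = event["region"].upper()  # map: normalize the grouping key
        totals[key] += event["amount"] # aggregate: running sum per key
    return dict(totals)

orders = [
    {"region": "eu", "amount": 10.0},
    {"region": "EU", "amount": 5.0},
    {"region": "us", "amount": -3.0},  # filtered out
    {"region": "us", "amount": 7.5},
]
assert process(orders) == {"EU": 15.0, "US": 7.5}
```

In a real deployment, the input would be an unbounded Kafka topic rather than a list, and the running totals would be maintained continuously as events arrive.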

Streaming Analytics

Kafka provides high-throughput event delivery. When combined with open-source technologies such as Druid, it can form the backbone of a streaming analytics stack. Druid consumes streaming data from Kafka to enable analytical queries: events are first loaded into Kafka, where they are buffered in Kafka brokers, and are then consumed by Druid real-time workers. This allows for real-time analytics and decision-making.

Streaming ETL

Real-time ETL with Kafka combines different components and features such as Kafka Connect source and sink connectors, used to consume and produce data from/to any other database, application, or API; Single Message Transforms (SMT)—an optional Kafka Connect feature; and Kafka Streams for continuous data processing in real-time at scale. Altogether they ensure efficient data transformation and integration.
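The source → transform → sink flow described above can be sketched as follows (a toy illustration: `mask_field` and `run_pipeline` are invented names, and real Kafka Connect SMTs are configured declaratively rather than written as Python functions):

```python
def mask_field(field):
    """SMT-style single-record transform: rewrites one record at a
    time, the way Kafka Connect transforms operate."""
    def transform(record):
        redacted = dict(record)
        redacted[field] = "***"
        return redacted
    return transform

def run_pipeline(source_rows, transforms, sink):
    # Source connector -> chain of SMTs -> sink, one record at a time.
    for record in source_rows:
        for t in transforms:
            record = t(record)
        sink.append(record)

rows = [{"id": 1, "email": "a@example.com"},
        {"id": 2, "email": "b@example.com"}]
sink = []
run_pipeline(rows, [mask_field("email")], sink)
assert sink[0] == {"id": 1, "email": "***"}
```

Because each transform touches exactly one record, transforms compose cleanly and the pipeline scales by running more workers in parallel, which is the design idea behind Connect's SMT feature.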

Event-Driven Microservices

Apache Kafka is the most popular tool for microservices, because it solves many issues related to microservices orchestration, while enabling attributes that microservices aim to achieve, such as scalability, efficiency, and speed. Kafka also facilitates inter-service communication, preserving ultra-low latency and fault tolerance. This makes it essential for building robust and scalable microservices architectures.

By using Kafka's capabilities, organizations can build highly efficient data pipelines, process streams of data in real time, perform advanced analytics, and develop scalable microservices—all ensuring they can meet the demands of modern data-driven applications.

Apache Kafka in Action

Who Uses Kafka?

Some of the world’s biggest brands use Kafka:

Airbnb
Netflix
Goldman Sachs
LinkedIn
Microsoft
New York Times
Intuit

To Maximize Kafka, You Need Confluent

Founded by the original developers of Kafka, Confluent delivers the most complete distribution of Kafka, improving Kafka with additional community and commercial features designed to enhance the streaming experience of both operators and developers in production, at massive scale.

You love Apache Kafka®, but not managing it. Confluent's cloud-native, complete, and fully managed service goes above & beyond Kafka, so that your best people can focus on delivering value to your business.

Cloud-Native

We’ve re-engineered Kafka to provide a best-in-class cloud experience, for any scale, without the operational overhead of infrastructure management. Confluent offers the only truly cloud-native experience for Kafka—delivering the serverless, elastic, cost-effective, highly available, and self-serve experience that developers expect.

Complete

Creating and maintaining real-time applications requires more than just open-source software and access to scalable cloud infrastructure. Confluent makes Kafka enterprise-ready and provides customers with the complete set of tools they need to build apps quickly, reliably, and securely. Our fully managed features come ready out of the box, for every use case from proof of concept (POC) to production.

Everywhere

Distributed, complex data architectures can deliver the scale, reliability, and performance to unlock previously unthinkable use cases, but they're incredibly complex to run. Confluent's complete, multi-cloud data streaming platform makes it easy to get data in and out of Kafka with Connect, manage the structure of data using Confluent Schema Registry, and process it in real time using ksqlDB. Confluent meets customers wherever they need to be — powering and uniting real-time data across regions, clouds, and on-premises environments.

Get Started in Minutes

By unifying historical and real-time data into a single source of truth, Confluent makes it easy to build an entirely new category of modern, event-driven applications, gain a universal data pipeline, and unlock powerful use cases with full scalability, security, and performance.

Try it free with $400 in free credits to use during your first four months.

Apache Kafka is the most popular data streaming system among developers and architects alike. It provides a powerful event streaming platform, complete with four APIs: Producer, Consumer, Streams, and Connect.

Often, developers begin with a single use case. For example, Apache Kafka can serve as a message buffer to protect a legacy database that can't keep up with today's workloads; the Kafka Connect API can keep that database in sync with an accompanying search index engine; and the Streams API can process data as it arrives, surfacing aggregations directly to your application.
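The message-buffer idea can be sketched in a few lines (a toy illustration; `BufferedWriter` and its methods are invented for this example, and in practice Kafka itself plays the buffer role between producers and a sink connector):

```python
class BufferedWriter:
    """Sketch of the message-buffer pattern: absorb a bursty write
    stream and flush to a slow legacy store in fixed-size batches."""

    def __init__(self, store, batch_size=100):
        self.store = store          # stands in for a legacy database table
        self.batch_size = batch_size
        self.pending = []

    def write(self, record):
        self.pending.append(record)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            # One bulk write instead of N small ones spares the slow store.
            self.store.extend(self.pending)
            self.pending = []

db = []
writer = BufferedWriter(db, batch_size=3)
for i in range(7):
    writer.write({"id": i})
writer.flush()
assert len(db) == 7
```

The producer side sees low-latency writes while the legacy store receives load it can handle; Kafka's durable log additionally means buffered events survive a crash, which this in-memory sketch does not capture.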

In other words, Apache Kafka enables powerful, event-driven programming for any infrastructure, simplifying both the construction of data-driven apps and the management of complex back-end systems. With Kafka, you can rest assured that your data is always fault-tolerant, replayable, and real-time, and you can build quickly with a single event streaming platform that processes and stores data while connecting your apps and systems to real-time data.