Presentation

Kafka Tiered Storage

« Kafka Summit APAC 2021

Kafka is a vital part of the data infrastructure in many organizations. As a Kafka cluster grows and retains more data for longer, issues of scalability, efficiency, and operations become important to address. Kafka cluster storage is typically scaled by adding more broker nodes. But this also adds memory and CPUs that the cluster does not need, making overall storage less cost-efficient than keeping older data in external storage.

Tiered storage extends Kafka's storage beyond the local storage available on the Kafka cluster by retaining older data in cheaper stores, such as HDFS, S3, Azure, or GCS, with minimal impact on the internals of Kafka.
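
As a rough illustration of the scaling argument above, the following sketch estimates how many brokers are needed when storage alone drives cluster size. Every number in it (ingest rate, disk size, retention periods) and the function name `brokers_for_storage` are hypothetical, chosen only for illustration and not taken from the talk. The model ignores throughput, headroom, and the cost of the remote store itself.

```python
import math

TB = 10**12  # bytes in a terabyte (decimal)


def brokers_for_storage(ingest_bytes_per_day, local_retention_days,
                        replication_factor, disk_bytes_per_broker):
    """Estimate brokers needed to hold the locally retained data.

    Hypothetical model: only local disk capacity is considered, and a
    cluster needs at least `replication_factor` brokers to place replicas.
    """
    local_bytes = ingest_bytes_per_day * local_retention_days * replication_factor
    by_capacity = math.ceil(local_bytes / disk_bytes_per_broker)
    return max(replication_factor, by_capacity)


# Hypothetical workload: 5 TB/day ingest, replication factor 3, 20 TB per broker.
ingest = 5 * TB
rf = 3
disk = 20 * TB

# All 30 days of data kept on broker disks.
local_only = brokers_for_storage(ingest, 30, rf, disk)

# Only the most recent 2 days kept locally; older data lives in a remote store.
tiered = brokers_for_storage(ingest, 2, rf, disk)

print(local_only, tiered)
```

Under these assumed numbers, shortening local retention is what reduces the broker count; total retention is unchanged because older data is served from the remote tier.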

We will talk about:

How tiered storage addresses the above problems and brings several other advantages
The high-level architecture of tiered storage
Future work planned as part of tiered storage

Presenter

Satish Duggana

Uber

Satish Duggana is a tech lead for Data and Streaming Infrastructure at Uber. He is an Apache Kafka committer, an Apache Storm committer and PMC member, and has contributed to several other open source projects.

Presenter

Sriharsha Chintalapani

Uber
