
Presentation

Forecasting Kafka Lag Issues with Machine Learning

Current 2023

A key operational challenge of running Kafka in production is managing partition lag, the gap between the latest offset written to a partition and the offset its consumers have committed. Lag follows a variety of normal trends, shaped by how consumers read data from each partition. It also shows abnormal patterns caused by issues in the Kafka cluster or its consumers. Administrators need to monitor lag, distinguish normal from abnormal trends, and act when application outcomes are affected, because lag degrades the latency and accuracy of the data and insights a Big Data pipeline produces. How can we monitor Kafka lag continuously and automatically, identify normal and abnormal trends, and forecast lag issues ahead of time?
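As a rough illustration of the quantity being monitored, the sketch below computes per-partition lag as the log end offset minus the consumer's committed offset. The function name `partition_lag` and the sample offsets are hypothetical, and treating a partition with no committed offset as fully unread is an assumption made for this example, not something stated in the session.

```python
from typing import Dict


def partition_lag(end_offsets: Dict[int, int], committed: Dict[int, int]) -> Dict[int, int]:
    """Return lag per partition: log end offset minus committed consumer offset."""
    lag = {}
    for partition, end in end_offsets.items():
        # Assumption: a partition with no committed offset counts as fully unread.
        consumed = committed.get(partition, 0)
        # Clamp at zero so a stale end-offset reading never yields negative lag.
        lag[partition] = max(end - consumed, 0)
    return lag


# Hypothetical offsets for a three-partition topic.
end_offsets = {0: 1500, 1: 980, 2: 2100}
committed = {0: 1490, 1: 980, 2: 1200}
lag = partition_lag(end_offsets, committed)
print(lag)  # {0: 10, 1: 0, 2: 900}
```

Sampling this value at a fixed interval gives the per-partition time series on which trend detection and forecasting would operate.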

In this session, we discuss our work on this problem using machine learning. We cover common lag patterns and how our ensemble forecasting system learns from past behavior to predict future trends. We also present case studies and the benefits of including such a system in a Kafka observability platform.
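The abstract does not describe the models inside the ensemble, so the following is only a minimal sketch of the general idea of ensemble forecasting, not the presenters' system. It averages three deliberately simple, hypothetical member forecasters (last value, moving average, and a least-squares linear trend) and checks whether the forecast crosses an assumed lag threshold.

```python
from statistics import mean
from typing import List, Sequence


def naive_forecast(history: Sequence[float], horizon: int) -> List[float]:
    # Repeat the last observed lag value.
    return [history[-1]] * horizon


def moving_average_forecast(history: Sequence[float], horizon: int, window: int = 3) -> List[float]:
    # Repeat the mean of the most recent `window` observations.
    return [mean(history[-window:])] * horizon


def linear_trend_forecast(history: Sequence[float], horizon: int) -> List[float]:
    # Fit y = intercept + slope * x by ordinary least squares and extrapolate.
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    xs = range(n)
    x_mean, y_mean = mean(xs), mean(history)
    denom = sum((x - x_mean) ** 2 for x in xs)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, history)) / denom
    intercept = y_mean - slope * x_mean
    return [intercept + slope * (n + h) for h in range(horizon)]


def ensemble_forecast(history: Sequence[float], horizon: int) -> List[float]:
    # Equal-weight average of the member forecasts at each future step.
    members = [
        naive_forecast(history, horizon),
        moving_average_forecast(history, horizon),
        linear_trend_forecast(history, horizon),
    ]
    return [mean(step) for step in zip(*members)]


# Hypothetical lag samples for one partition, steadily growing.
history = [100, 200, 300, 400, 500]
forecast = ensemble_forecast(history, horizon=2)
LAG_THRESHOLD = 520  # assumed alerting threshold, in messages
breach_expected = any(value > LAG_THRESHOLD for value in forecast)
print(forecast, breach_expected)
```

A production ensemble would typically weight members by their recent accuracy and include seasonal models; the equal weighting here is chosen only to keep the example short.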
