

Don’t Let Degradation Bring You Down: Automatically Detect & Remediate Degraded Storage in Kafka

Current 2023

Every Kafka admin’s worst nightmare is being woken up by client application teams reporting increased latencies. One of the main culprits? Degraded infrastructure.

Degraded infrastructure refers to the partial or full unavailability of broker components, such as storage volumes or the network. Storage degradation in particular can slow reads and writes on the broker, hurt performance, and quickly devolve into unavailability.

In this talk, we will discuss how we tackled this problem head-on with a fully automated system for detecting and remediating degraded storage. We’ll highlight the importance of monitoring storage performance and take a deep dive into how we formulated the detection algorithm, created and fine-tuned our monitors, and tested the pipeline end to end. We will also discuss the tools and processes we developed to mitigate storage degradation. Finally, we’ll share insights on how this streamlined detection and mitigation system improved the performance and availability of clusters in Confluent Cloud.
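The abstract does not describe the detection algorithm itself. Purely as a hypothetical illustration, and not Confluent's method, here is a minimal sketch of one common way to flag degraded storage: compare each broker's per-window storage latency against the cluster-wide median and flag a broker only after it stays far above that baseline for several consecutive windows. The function name `find_degraded_brokers`, the choice of latency metric, the 3x ratio threshold, and the three-window requirement are all assumptions made for this example.

```python
from statistics import median
from typing import Dict, List, Set


def find_degraded_brokers(
    latencies_ms: Dict[int, List[float]],
    ratio_threshold: float = 3.0,   # hypothetical: "degraded" means > 3x the cluster median
    min_consecutive: int = 3,       # hypothetical: bad windows required before flagging
) -> Set[int]:
    """Hypothetical detector: flag brokers whose storage latency stays well
    above the cluster median for several consecutive windows.

    latencies_ms maps a broker id to per-window latency samples (for example,
    p99 disk write latency). Only windows present for every broker are checked.
    """
    if not latencies_ms:
        return set()
    num_windows = min(len(samples) for samples in latencies_ms.values())
    streak = {broker: 0 for broker in latencies_ms}
    degraded: Set[int] = set()
    for w in range(num_windows):
        # Baseline for this window is the median across brokers, so one slow
        # volume cannot drag the baseline up along with it.
        baseline = median(samples[w] for samples in latencies_ms.values())
        for broker, samples in latencies_ms.items():
            # Windows with a zero baseline never count toward a streak.
            if baseline > 0 and samples[w] > ratio_threshold * baseline:
                streak[broker] += 1
            else:
                streak[broker] = 0
            # Requiring a streak avoids alerting on one-off latency spikes.
            if streak[broker] >= min_consecutive:
                degraded.add(broker)
    return degraded


if __name__ == "__main__":
    cluster = {
        1: [4.0, 5.0, 4.5, 5.0, 4.8],
        2: [4.2, 4.9, 4.4, 5.1, 4.7],
        3: [4.1, 30.0, 40.0, 35.0, 38.0],  # broker 3's volume slows down after window 0
    }
    print(find_degraded_brokers(cluster))  # prints {3}
```

Using the median rather than the mean keeps the baseline stable when a single broker degrades, and the consecutive-window requirement trades a little detection delay for fewer false alarms. A real system would also need to decide what remediation to trigger once a broker is flagged, which this sketch does not attempt.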
