

What’s up With Availability in Kafka?

« Current 2022

How do we define and measure availability in a distributed system? A great thing about distributed systems is that they are built to tolerate failures in a way that limits downtime for users. However, this means that availability is more complicated than "the system is up" or "the system is down."

Even if the system is built to tolerate failures, we may see individual components lose availability due to:

* cloud provider outages
* high latencies
* load balancer and/or routing issues
* storage failures
* hardware issues

Using Apache Kafka and Confluent Cloud as a case study, we will dig deeper into how to define good SLOs and SLAs for distributed systems. From there we will discuss ways to improve availability and the changes we made to Confluent Cloud to improve on Kafka's availability story.
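
As a rough illustration of why availability is more than "up" or "down", one common approach is to measure it as the fraction of successful requests and compare that against an SLO target. The sketch below is an assumption for illustration only, not how Confluent Cloud measures availability; the names `Window`, `availability`, and `error_budget_remaining` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Window:
    """Request counts observed in one measurement window (hypothetical shape)."""
    total_requests: int
    failed_requests: int


def availability(windows: Iterable[Window]) -> float:
    """Request-based availability: successful requests / total requests."""
    ws: List[Window] = list(windows)
    total = sum(w.total_requests for w in ws)
    failed = sum(w.failed_requests for w in ws)
    if total == 0:
        # No traffic: treat as fully available (an assumption, not a rule).
        return 1.0
    return (total - failed) / total


def error_budget_remaining(windows: Iterable[Window], slo_target: float) -> float:
    """Failed requests still allowed before the SLO is missed (negative = missed)."""
    ws: List[Window] = list(windows)
    total = sum(w.total_requests for w in ws)
    failed = sum(w.failed_requests for w in ws)
    allowed = (1.0 - slo_target) * total
    return allowed - failed


if __name__ == "__main__":
    # The service never went fully "down", yet a partial failure in the
    # second window still consumes the error budget.
    observed = [Window(1000, 0), Window(1000, 5)]
    print(availability(observed))
    print(error_budget_remaining(observed, slo_target=0.999))
```

Under this definition a partial failure, such as a small share of requests failing because of a single bad component, lowers availability even though the system as a whole stayed up.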
