Presentation

6 Nines: How Stripe keeps Kafka highly-available across the globe

« Kafka Summit London 2022

Availability is a key metric for any Kafka deployment, but when every event is critical, the system must be designed to keep publishers and consumers highly available even when a Kafka cluster goes down. At Stripe, our core business relies on Kafka, and as we outgrew a single Kafka cluster we had to build a multi-cluster system that would fit our needs while supporting a target of 99.9999% availability for our most critical use cases.
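To put the 99.9999% target in perspective, the arithmetic can be sketched as below. This is an illustrative calculation only, not something taken from the talk: the function names are hypothetical, and the combined-availability formula assumes cluster failures are fully independent and failover is instantaneous, which real deployments only approximate.

```python
# Illustrative arithmetic only; names and assumptions are not from the talk.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds, ignoring leap years


def downtime_budget_seconds(availability: float,
                            period_seconds: float = SECONDS_PER_YEAR) -> float:
    """Maximum downtime allowed over the period at the given availability."""
    return (1.0 - availability) * period_seconds


def combined_availability(per_cluster: float, clusters: int) -> float:
    """Availability when at least one of `clusters` clusters must be up.

    Assumes failures are independent and failover is instantaneous
    (an idealization).
    """
    return 1.0 - (1.0 - per_cluster) ** clusters


if __name__ == "__main__":
    for label, availability in [("3 nines", 0.999), ("6 nines", 0.999999)]:
        budget = downtime_budget_seconds(availability)
        print(f"{label}: {budget:.1f} seconds of downtime per year")
    print(f"two 99.9% clusters: {combined_availability(0.999, 2):.6f}")
```

Six nines leaves a budget of about 31.5 seconds of downtime per year, compared with roughly 8.76 hours at three nines. Under the idealized independence assumption, two clusters at 99.9% each would reach 99.9999% combined, which gives some intuition for why a single cluster is unlikely to meet such a target alone.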

In this talk we’ll discuss our solution to this problem: an in-house proxy layer and multi-cluster topology which we’ve built and operated over the past 3 years. Our proxy layer enables multiple Kafka clusters to work in coordination across the globe, while hitting our ambitious availability targets and providing clean client abstractions.

We’ll cover how our Kafka deployment provides availability for both publishers and consumers in the face of cluster outages, increased security and observability, simplified cluster maintenance, and global routing for constraints such as data locality. We’ll highlight the benefits and tradeoffs of our approach, the design of our proxy layer, Kafka configuration decisions, and where we’re planning to go from here.

Presenter

Donny Nadolny

Stripe

Donny Nadolny is a software engineer on the Stream Infrastructure team at Stripe. He works on building and maintaining Stripe’s multi-cluster Kafka setup, as well as the automated control plane that manages it.
