Presentation

How to Design a Kafka Architecture Resilient to Cloud Outages

Current 2022

In the world of disaster recovery, many things can ruin your day. Software bugs, human error, sharks... if you can think of it, you should prepare for it. When it comes to the cloud, we face additional constraints: region outages, latency between regions, and less control over the full stack. Understanding how to deploy Apache Kafka® in the cloud in a way that is resilient to these challenges is important, particularly because Kafka often sits at the heart of a business's data infrastructure.

Yet, designing a resilient, multi-region cloud architecture for Kafka can be daunting. What starts as “How do I fail over Kafka?” quickly turns into “How do I fail my entire system over?” as the knock-on effects of losing each component are discovered. From my experience designing disaster-resilient architectures, I’ve grown familiar with patterns and antipatterns that can be applied universally to Kafka multi-region cloud architectures, including:

  • When to choose Active-Active or Active-Passive given RPO/RTO requirements
  • When to fail forward vs fail back
  • Topic best practices for easy client failover

In this talk, we’ll discuss these in depth, along with questions you should ask yourself to guide you to the architecture that solves your business needs.
