

Oops! I Started a Broker

Kafka Summit Americas 2021

What happened when our biggest, most important Kafka cluster suddenly went rogue, and a single, crucial misconfiguration made during recovery made things even worse?

At a company like Taboola, where service availability and latency are our top priorities, this was a disaster.

With 300K messages/sec and 250TB of messages produced each day to our on-premise Kafka clusters and mirrored to our central Kafka cluster, we always try to ensure that Kafka behaves well under heavy traffic and unexpected cluster failures. So when our main Kafka cluster went haywire, we had a serious issue on our hands.

This session is the story of how we learned the hard way about mitigating cluster failures with the proper configurations in place.
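The abstract doesn't say which configuration was at fault, but a handful of replication and durability settings usually decide how a Kafka cluster behaves during broker failures. The sketch below shows, in Java, the kind of topic and producer configuration typically used to survive the loss of a broker without data loss; the topic name, partition count, broker address, and specific values are illustrative assumptions, not Taboola's actual setup.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class DurableTopicSetup {
    public static void main(String[] args) throws Exception {
        Properties adminProps = new Properties();
        // Hypothetical broker address for illustration only.
        adminProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");

        try (AdminClient admin = AdminClient.create(adminProps)) {
            // A topic that can lose one broker and still accept writes:
            // 3 replicas, at least 2 of which must acknowledge each write.
            NewTopic topic = new NewTopic("events", 12, (short) 3)
                    .configs(Map.of(
                            "min.insync.replicas", "2",
                            // Never elect an out-of-sync replica as leader;
                            // prefer temporary unavailability over silent data loss.
                            "unclean.leader.election.enable", "false"));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }

        Properties producerProps = new Properties();
        producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Wait for all in-sync replicas, and let the idempotent producer
        // retry safely through broker failovers without duplicating records.
        producerProps.put(ProducerConfig.ACKS_CONFIG, "all");
        producerProps.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("events", "key", "value")).get();
        }
    }
}
```

The interplay of these settings is the usual trade-off during a failure: with unclean leader election disabled, partitions whose in-sync replicas are all down stay unavailable rather than losing committed records, while enabling it restores availability at the risk of data loss.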
