

The Dark and Dirty Side of Fixing Uneven Partitions

Kafka Summit London 2023

You might already use every known strategy to choose the right number of partitions for your newly created Apache Kafka topic. You follow the best recommendations to distribute data evenly across the topic's partitions. You even have metrics in place to observe the distribution and alert you when it drifts. You do everything right.
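As a rough illustration of the kind of metric that can flag skew, here is a minimal sketch. The `partition_skew` helper and the sample counts are hypothetical, for illustration only, not a Kafka API:

```python
from statistics import mean

def partition_skew(counts):
    """Ratio of the busiest partition's load to the mean load.

    1.0 means perfectly even; values well above 1.0 signal a hot
    partition. (Hypothetical helper, for illustration only.)
    """
    avg = mean(counts)
    return max(counts) / avg if avg else 1.0

# Per-partition message counts, e.g. gathered from broker metrics.
counts = [1000, 980, 1020, 9500]  # partition 3 is running hot
print(partition_skew(counts))     # → 3.04
```

In practice you would feed this from the broker's per-partition size or message-rate metrics rather than hard-coded counts, and alert when the ratio crosses a threshold you choose.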

But then reality happens. Despite your best efforts, data is published unevenly, making the topic slow, expensive, and difficult to consume. The future is full of unexpected, impossible-to-predict events, and it doesn't care about your rules or normal distributions.

This doesn't mean we can simply disregard good practices. But we do need a plan for when things don't go according to anyone's calculations.

Come to this talk to learn what to do when the data distribution across topic partitions is so badly broken that it significantly hurts the performance of consuming applications, increasing lag and slowing data processing.

We'll cover existing strategies, including how to replace a struggling topic with a new one and rebalance the data across its partitions using new rules. What dangers lurk, and what do you do when per-key guarantees are lost? Why is scaling the number of partitions considered a dangerous operation? We'll also look at the problem from the consumers' point of view: how to scale them out to more partitions, and what to keep in mind when working with stateful systems.
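Part of why scaling partitions is dangerous for keyed data is that the default partitioner maps each key to `hash(key) % num_partitions`, so changing the partition count silently re-routes most keys, breaking per-key ordering and state locality. A minimal sketch of the effect, using `crc32` as a deterministic stand-in for Kafka's murmur2 hash (the key names and partition counts are made up for illustration):

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    # Stand-in for Kafka's default (murmur2-based) partitioner;
    # crc32 is used here only to keep the sketch deterministic.
    return zlib.crc32(key) % num_partitions

keys = [f"user-{i}".encode() for i in range(10)]

before = {k: partition_for(k, 6) for k in keys}  # old topic: 6 partitions
after = {k: partition_for(k, 8) for k in keys}   # new topic: 8 partitions

moved = [k.decode() for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys changed partition")
```

With random hashes, only keys whose hash happens to agree modulo both counts stay put, so the majority land on a different partition after the change; any consumer or state store that assumed "same key, same partition" is now wrong.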

This talk is for those who already have solid experience with Apache Kafka and want to take their knowledge to the next level. However, we'll use simple language and accessible explanations, so even if you're a Kafka beginner, join this session to understand the challenges of uneven data distribution and the strategies to fix it.
