Presentation

Balance Kafka Cluster with Zero Data Movement

Kafka Summit London 2023

Load balancing is a key factor in achieving high performance and cost efficiency for Kafka clusters. It reduces the over-provisioning caused by skewed brokers, whether the skew is in CPU, memory, or disk storage. However, keeping a cluster balanced is not easy, because topic traffic patterns vary and capacity is expanded over time. Data ingestion use cases have several unique characteristics: 1) there is no ordering requirement for data events; 2) all partitions of a topic are produced to and consumed from evenly; 3) an increase in partition count is transparent to Kafka producers and consumers. Therefore, every partition of a given topic places the same demand on Kafka broker resources.
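
The consequence of that last point can be made explicit with a small worked bound. The symbols are introduced here for illustration and do not come from the talk: $w_t$ is the resource demand of one partition of topic $t$, $n_{t,b}$ is the number of partitions of topic $t$ on broker $b$, and $L_b$ is the resulting load on broker $b$.

```latex
L_b = \sum_{t} w_t \, n_{t,b}
\qquad\text{and, if } |n_{t,b} - n_{t,b'}| \le 1 \text{ for every topic } t,\qquad
|L_b - L_{b'}| \le \sum_{t} w_t .
```

Under these assumptions, spreading each topic's partitions evenly bounds the load gap between any two brokers by one partition per topic, and the gap is zero when each topic's partition count is a multiple of the broker count.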

Based on these observations, we proposed a new partition placement strategy that achieves optimal cluster load balancing with zero data movement. The strategy places the partitions and leaders of each topic evenly across all brokers during cluster operations such as new topic creation, topic parallelism changes, and Kafka cluster expansion. It does not require costly movement of partition data among brokers at runtime. Using this approach, all of our Kafka clusters are well balanced, which has resulted in measurable storage cost savings without any performance degradation.
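
The abstract does not give the algorithm, so the following is only a minimal sketch of one way such a placement could work, assuming a greedy "least-loaded broker first" rule applied per topic. The function names `place_new_partitions` and `per_broker_counts` are hypothetical, the sketch tracks a single broker per partition (for example the leader), and it ignores replication factor and rack awareness. Existing partitions are never reassigned; balance after a cluster expansion is restored only by adding partitions, which relies on the partition count increase being transparent to clients.

```python
def per_broker_counts(assignment, brokers):
    """Count how many partitions of one topic each broker hosts."""
    counts = {b: 0 for b in brokers}
    for broker in assignment.values():
        counts[broker] = counts.get(broker, 0) + 1
    return counts


def place_new_partitions(assignment, brokers, new_count):
    """Add `new_count` partitions to one topic without moving existing ones.

    assignment: dict mapping partition id -> broker id (hypothetical model
                of the topic's current placement; may be empty for a new topic)
    brokers:    list of broker ids currently in the cluster
    Returns a new dict; every existing entry is left unchanged.
    """
    load = per_broker_counts(assignment, brokers)
    result = dict(assignment)
    next_id = max(assignment, default=-1) + 1
    for pid in range(next_id, next_id + new_count):
        # Greedy rule (an assumption): pick the broker with the fewest
        # partitions of this topic, breaking ties by lowest broker id.
        target = min(brokers, key=lambda b: (load[b], b))
        result[pid] = target
        load[target] += 1
    return result


# New topic creation: 6 partitions on a 3-broker cluster.
created = place_new_partitions({}, [1, 2, 3], 6)

# Cluster expansion to 6 brokers: raise the partition count instead of
# moving data, so the new partitions land on the empty brokers.
expanded = place_new_partitions(created, [1, 2, 3, 4, 5, 6], 6)
```

In this example each of the six brokers ends up with two partitions of the topic, and the six original partitions stay where they were. Applying such an assignment to a real cluster would go through Kafka's admin tooling, which is not shown here.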
