Since its initial release, the Kafka group membership protocol has offered Connect, Streams and Consumer applications an ingenious and robust way to balance resources among distributed processes. The process of rebalancing, as it’s widely known, allows Kafka APIs to define an embedded protocol for load balancing within the group membership protocol itself.
Until now, rebalancing has been working under the simple assumption that every time a new group generation is created, the members join after first releasing all of their resources, getting a whole new load assignment by the time the new group is formed. This allows Kafka APIs to provide task fault-tolerance and elasticity on top of the group membership protocol.
However, due to its side-effects on multi-tenancy and scalability this simple approach in rebalancing, also known as stop-the-world effect, is limiting larger scale deployments. Because of stop-the-world, application tasks get interrupted only for most of them to receive the same resources after rebalancing. In this technical deep dive, we’ll discuss the proposition of Incremental Cooperative Rebalancing as a way to alleviate stop-the-world and optimize rebalancing in Kafka APIs.
This talk will cover:
Konstantine Karantasis, Software Engineer, Confluent
Konstantine Karantasis is a software engineer at Confluent. He’s a main contributor to Apache Kafka and its Connect API, and he’s an author of widely used software, such as Confluent’s S3 and Replicator connectors, class loading isolation in Kafka Connect, Incremental Cooperative Rebalancing in Kafka, Confluent CLI and more. Previously, he built scalable open source web services at Yahoo! and researched high-performance computing at the University of Illinois at Urbana-Champaign. Konstantine holds a Ph.D. from the University of Patras.