Register for Demo | Confluent Terraform Provider, Independent Network Lifecycle Management and more within our Q3’22 launch!

Using Modular Topologies in Kafka Streams to scale ksqlDB’s persistent queries

ksqlDB is a streaming database that uses Kafka Streams to execute queries against data in Apache Kafka®. Historically, each query was compiled into its own Kafka Streams program to be executed inside the ksqlDB servers. As ksqlDB moved to support broader and more complex use cases, this query execution strategy became the bottleneck for scaling up the number of persistent queries. This talk will examine the problems faced and how we addressed them.

Using too many Kafka Streams instances requires too many resources in both threads and consumers. One way to avoid this is using Modular Topologies, which are coming to Kafka Streams in KIP-809. Modular Topologies allow us to dynamically change the workload of a Kafka Streams application while it’s running and share resources such as consumer/producer clients and processing threads. This makes it possible to use a single Kafka Streams runtime for multiple topologies that share consumers and threads across them. We will see in detail how this makes it possible for ksqlDB to consolidate queries into a shared Kafka Streams runtime.

Kafka Streams developers will take away from this talk an understanding of how to utilize ModularTopologies, and dynamically upgrade their Kafka Streams workload effectively.

Presenters

A. Sophie Blee-Goldman

Sophie Blee-Goldman is an Apache Kafka committer and software engineer at Confluent on the ksqlDB team. She has focused on the availability and scaling behavior of Kafka Streams & ksqlDB, as well as improvements to the consumer group rebalancing protocol.

Walker Carlson

Walker Carlson has been contributing to Apache since 2019. Mainly focusing on improving the handling of threads in Kafka Streams. He is a software engineer at Confluent and had also been focusing on the scalability and recovery of ksqlDB.