ksqlDB is a streaming database that uses Kafka Streams to execute queries against data in Apache Kafka®. Historically, each query was compiled into its own Kafka Streams program to be executed inside the ksqlDB servers. As ksqlDB moved to support broader and more complex use cases, this query execution strategy became the bottleneck for scaling up the number of persistent queries. This talk will examine the problems faced and how we addressed them.
Using too many Kafka Streams instances requires too many resources in both threads and consumers. One way to avoid this is using Modular Topologies, which are coming to Kafka Streams in KIP-809. Modular Topologies allow us to dynamically change the workload of a Kafka Streams application while it’s running and share resources such as consumer/producer clients and processing threads. This makes it possible to use a single Kafka Streams runtime for multiple topologies that share consumers and threads across them. We will see in detail how this makes it possible for ksqlDB to consolidate queries into a shared Kafka Streams runtime.
Kafka Streams developers will take away from this talk an understanding of how to utilize ModularTopologies, and dynamically upgrade their Kafka Streams workload effectively.