As the adoption of real-time data processing accelerates, the ability to scale stream processing applications to handle high-volume traffic is paramount. Apache Kafka®, the de facto standard for distributed event streaming, provides Kafka Streams, a powerful and scalable library for building such applications.
Scaling a Kafka Streams application effectively involves a multi-faceted approach that encompasses architectural design, configuration tuning, and diligent monitoring. This guide will walk you through the essential strategies and best practices to ensure your Kafka Streams applications can gracefully handle massive throughput.
Put these best practices into action and build a serverless Kafka Streams application—get started for free on Confluent Cloud.
The fundamental concept underpinning the scalability of Kafka Streams is the direct relationship between the number of partitions in your input Kafka topics and the parallelism of your application.
The Direct Relationship Between Kafka Topics and Kafka Streams Tasks
The unit of parallelism in Kafka Streams is a task. Each task is responsible for processing a subset of the data from the input topics.
Partition-to-Task Assignment: Kafka Streams creates a number of tasks equal to the maximum number of partitions across all input topics. Each task is assigned one or more partitions from the input topics. For instance, if you have two input topics, one with 10 partitions and another with 20, Kafka Streams will create 20 tasks.
Scaling Limit: The number of partitions in your input topic dictates the maximum parallelism you can achieve. If you have 20 partitions, you can run up to 20 instances or threads of your application, each processing a single partition concurrently. Any additional instances will remain idle, acting as hot standbys that can take over if an active instance fails.
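The partition-to-task rule above can be sketched in a few lines. This is a minimal illustration of the arithmetic, not Kafka Streams' internal task-assignment code:

```java
import java.util.List;

public class TaskCount {
    // Kafka Streams creates one task per partition, so the maximum
    // partition count across the input topics caps the parallelism.
    static int maxParallelism(List<Integer> partitionCounts) {
        return partitionCounts.stream().mapToInt(Integer::intValue).max().orElse(0);
    }

    public static void main(String[] args) {
        // Two input topics with 10 and 20 partitions -> 20 tasks,
        // so at most 20 threads or instances can do useful work.
        System.out.println(maxParallelism(List.of(10, 20))); // 20
    }
}
```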
That’s why the first and most critical step in designing a scalable Kafka Streams application is to choose an appropriate number of partitions for your input topics based on your expected workload and future growth. Choosing the right number of partitions—or using a fully managed Kafka service like Confluent Cloud that autoscales—allows you to balance maximizing throughput, avoiding overprovisioning infrastructure, and minimizing manual right-sizing work.
Once your topic partitioning strategy is in place, you can scale your Kafka Streams application in two primary ways:
Scaling Out (Horizontal Scaling)
This is the most common and effective method for scaling Kafka Streams: running multiple instances of your application on different machines. Kafka Streams' built-in consumer group management automatically distributes tasks (and their associated partitions) across all running instances that share the same application.id.
When to Scale Out:
Network-bound applications: If your application is limited by network bandwidth (e.g., sending and receiving large volumes of data)
Memory-bound applications: When your application requires significant memory, especially for stateful operations like windowing and aggregations that maintain large state stores
Disk-bound applications: If the local state stores (by default, RocksDB) are causing I/O bottlenecks
How to Scale Out: Simply launch a new instance of your application with the same application.id. Kafka Streams will trigger a rebalance, and the new instance will be assigned a share of the partitions to process.
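A minimal sketch of the configuration every instance would share. The application name and broker addresses here are placeholders, and in real code you would typically set these keys via the StreamsConfig constants rather than raw strings:

```java
import java.util.Properties;

public class ScaleOutConfig {
    // Every instance launched with this configuration joins the same
    // consumer group, so Kafka Streams rebalances tasks across all of them.
    static Properties baseConfig() {
        Properties props = new Properties();
        // Same application.id on every instance -> shared task assignment.
        props.put("application.id", "orders-enrichment"); // hypothetical app name
        // Several brokers listed for high availability on first connect.
        props.put("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(baseConfig().getProperty("application.id"));
    }
}
```

Launching a second machine with this same configuration is all that scaling out requires; the rebalance happens automatically.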
Scaling Up (Vertical Scaling)
This strategy involves increasing the resources (CPU, memory) of the machines running your Kafka Streams application. You can also increase the number of stream threads within a single instance to take advantage of multiple CPU cores.
When to Scale Up:
CPU-bound applications: If your processing logic is computationally intensive (e.g., complex transformations, regular expression matching)
How to Scale Up:
Increase Resources: Provision more powerful machines with more CPU cores and memory.
Increase Stream Threads: Adjust the num.stream.threads configuration parameter in your Kafka Streams application. This allows a single instance to process multiple tasks in parallel, with each thread handling one or more tasks. The default is one thread.
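One common pattern, sketched below under the assumption that the workload is CPU-bound, is to size the thread count from the cores actually available to the instance:

```java
import java.util.Properties;

public class ScaleUpConfig {
    static Properties threadedConfig() {
        Properties props = new Properties();
        // Cap stream threads at the number of available cores; for
        // CPU-bound work, extra threads only add context switching.
        int cores = Runtime.getRuntime().availableProcessors();
        props.put("num.stream.threads", Integer.toString(cores));
        return props;
    }

    public static void main(String[] args) {
        System.out.println(threadedConfig().getProperty("num.stream.threads"));
    }
}
```

Remember that threads beyond the task count sit idle, so there is no benefit to exceeding the number of partitions assigned to the instance.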
Beyond the fundamental scaling strategies, optimizing your Kafka Streams application's configuration is crucial for handling high-volume traffic. Here are some of the most impactful settings:
Configuration Parameter | Description | Best Practices for High Throughput |
--- | --- | --- |
application.id | A unique identifier for your Streams application. It's also used as the consumer group ID. | Ensure all instances of the same application share the same application.id for proper task distribution. |
bootstrap.servers | A list of Kafka broker addresses for the initial connection. | Provide a list of several brokers to ensure high availability. |
num.stream.threads | The number of stream threads to run within a single application instance. | For CPU-bound tasks, set this to a value less than or equal to the number of available CPU cores. |
state.dir | The directory for storing local state. | Use a fast storage medium (like SSDs) for this directory, especially for stateful applications. |
cache.max.bytes.buffering | The maximum memory used for buffering records across all threads. | Increasing this can improve throughput by reducing the frequency of writes to state stores and Kafka, but at the cost of higher memory consumption. |
commit.interval.ms | The frequency with which to save the processing progress. | A larger interval can improve throughput but increases the amount of data reprocessed in case of a failure. The default is 30,000ms. For lower latency, this can be reduced. |
producer.* and consumer.* configs | Underlying producer and consumer configurations. | Tune settings like producer.batch.size, producer.linger.ms, and producer.compression.type to optimize writes to Kafka. Adjust consumer.fetch.min.bytes and consumer.max.poll.records to control how much data a consumer fetches in each poll. |
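The tuning knobs from the table can be combined in one configuration. The values below are illustrative starting points only, not recommendations; the right numbers depend on your message sizes and latency budget:

```java
import java.util.Properties;

public class ThroughputTuning {
    static Properties highThroughputConfig() {
        Properties props = new Properties();
        // Larger record cache -> fewer writes to state stores and Kafka.
        props.put("cache.max.bytes.buffering", "52428800"); // 50 MB (illustrative)
        props.put("commit.interval.ms", "30000");           // the default; lower for latency
        // "producer."/"consumer." prefixed keys are passed through to the
        // embedded clients.
        props.put("producer.batch.size", "65536");
        props.put("producer.linger.ms", "20");
        props.put("producer.compression.type", "lz4");
        props.put("consumer.fetch.min.bytes", "65536");
        props.put("consumer.max.poll.records", "1000");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(highThroughputConfig().size()); // 7
    }
}
```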
The nature of your application's processing logic—whether it's stateless or stateful—influences scaling decisions.
Stateless Applications: These are generally easier to scale as they don't maintain any state. Scaling is primarily a matter of adding more instances to handle the increased load.
Stateful Applications: Applications that perform operations like joins, aggregations, and windowing maintain local state stores. The size and performance of these state stores become critical scaling factors. For stateful operations, especially joins, it's crucial to ensure that the input topics are co-partitioned, meaning they have the same number of partitions and that records with the same key are written to the same partition number in both topics. This allows Kafka Streams to perform the join locally without needing to repartition the data over the network, which is a costly operation.
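Why co-partitioning requires identical partition counts can be seen with a toy model of key-based partitioning. This is a deliberate simplification: Kafka's default partitioner actually hashes the serialized key with murmur2, not hashCode, but the dependency on the partition count is the same:

```java
public class CoPartitioning {
    // Simplified stand-in for Kafka's default partitioner: the partition
    // is a function of the key and the total partition count.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int partitions = 12; // both join topics must use the same count
        // Same key + same partition count + same partitioner
        // -> same partition number in both topics, so the join is local.
        System.out.println(partitionFor("user-42", partitions)
                == partitionFor("user-42", partitions)); // true
    }
}
```

If the two topics had different partition counts, the same key could land on different partition numbers, forcing Kafka Streams to repartition one side over the network before joining.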
Kafka Streams applications can perform joins, aggregations, windowing, and other operations while maintaining local state stores.
To effectively scale your Kafka Streams application, you need to monitor its performance to identify bottlenecks and make informed decisions. Key metrics to watch include:
Consumer Lag: The number of messages in a partition that have not yet been consumed. High and increasing lag is a clear indicator that your application cannot keep up with the incoming data rate.
CPU Utilization: High CPU utilization might indicate that your application is CPU-bound and could benefit from scaling up or adding more stream threads.
Thread Utilization: Monitor the utilization of your stream threads to see if they are being used effectively.
End-to-End Latency: The time it takes for a message to be processed from the source topic to the sink topic.
State Store Size and I/O: For stateful applications, monitor the size of your local state stores and the I/O operations on the underlying disk.
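The arithmetic behind consumer lag is simple: per partition, it is the log-end offset minus the last committed offset. In practice you would read these offsets from the AdminClient API or the kafka-consumer-groups.sh tool; the sketch below only shows the calculation itself:

```java
import java.util.Map;

public class ConsumerLag {
    // Total lag = sum over partitions of (log-end offset - committed offset).
    static long totalLag(Map<Integer, Long> endOffsets, Map<Integer, Long> committed) {
        long lag = 0;
        for (Map.Entry<Integer, Long> e : endOffsets.entrySet()) {
            lag += e.getValue() - committed.getOrDefault(e.getKey(), 0L);
        }
        return lag;
    }

    public static void main(String[] args) {
        // Partition 0 is caught up; partition 1 is 500 records behind.
        System.out.println(totalLag(
                Map.of(0, 1000L, 1, 2000L),
                Map.of(0, 1000L, 1, 1500L))); // 500
    }
}
```

A lag value that grows steadily over time, rather than a momentarily high one, is the signal that the application needs more parallelism.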
By understanding and applying these scaling principles, from topic partitioning and deployment strategies to fine-grained configuration and diligent monitoring, you can build robust and highly scalable Kafka Streams applications capable of handling the most demanding real-time data workloads. Ready to start building with serverless Kafka on Confluent Cloud?
Key Takeaways
Kafka Streams scales by assigning one task per partition; the number of partitions in your input topics sets the maximum parallelism across threads or application instances.
Adding instances (scale out) distributes tasks across machines, while adding threads (scale up) uses more CPU cores in one instance. Choose based on whether you are CPU-, memory-, or network-bound.
Key configs include num.stream.threads (parallelism), cache.max.bytes.buffering (throughput optimization), and commit.interval.ms (latency vs. durability trade-off).
Track consumer lag, CPU utilization, thread utilization, and state store I/O. High lag or rising end-to-end latency indicates that your application is falling behind and needs scaling.
Apache®, Apache Kafka®, and Kafka® are registered trademarks of the Apache Software Foundation. No endorsement by the Apache Software Foundation is implied by the use of these marks.