Imagine easily enriching data streams and building stream processing applications in the cloud, without worrying about capacity planning, infrastructure and runtime upgrades, or performance monitoring. That's where our serverless Apache Flink® service comes in, as announced at this year’s Current | The Next Generation of Kafka Summit.
By abstracting away the underlying infrastructure and providing a fully managed environment, Confluent Cloud for Apache Flink empowers you to focus on building robust, scalable stream processing applications without any operational overhead. In this blog post, we'll take a deep dive into the serverless architecture of Confluent Cloud for Apache Flink and explore its many benefits, including reduced infrastructure costs, increased reliability, and more seamless adoption.
So, let's start by digging into what makes our Flink service a serverless offering in the first place.
Deploying, managing, and scaling Apache Flink workloads can be challenging. One major challenge is the upfront cost of setting up and configuring the cluster, which requires significant technical expertise. Additionally, maintaining clusters and keeping both the Flink service and its applications up-to-date with Flink upgrades can take time and effort, potentially causing disruptions. Managing Flink applications independently can also cause problems, as a developer needs to size, scale, and parallelize each individual workload. Although these are just some of the challenges, they alone can make it difficult to adopt stream processing at scale, effectively and efficiently.
With Confluent Cloud's serverless Flink offering, developers can benefit from three primary serverless dimensions: elastic autoscaling with scale-to-zero, evergreen runtime and APIs, and usage-based billing. The autoscaler manages scale-out, parallelism, and load balancing, eliminating the need for pre-sizing workloads and capacity planning. We provide fully automated and transparent upgrades to keep the Flink runtime up-to-date with the latest security patches, along with strong backward compatibility guarantees to ensure uninterrupted operations for your clients. We also provide declarative APIs that enable developers to focus on building business logic, not managing infrastructure. And with usage-based billing developers only pay for what they use, with automatic downscaling for unused resources (more on the cost model later).
Our fully managed serverless architecture is based on compute pools.
Compute pools expand and shrink based on the resources required by the Flink SQL statements using them. A compute pool will never exceed its configured maximum number of CFUs (logical units of processing power) and therefore can act as a cap on maximum budget spend. When a compute pool reaches its maximum capacity, all statements using it compete for resources. As a result, compute pools also serve to isolate statements from each other, ensuring that statements using different pools will never compete for resources.
When we approached the implementation of an elastic, reliable, and cost-efficient runtime for Apache Flink, we had to address several key questions:
How do we determine the resources required by each SQL statement to keep up with the rate of all its input tables (topics)?
Once we've determined the required resources, how do we make the actual statement rescaling operation as non-disruptive as possible?
How do we ensure that statement submission is fast, even if a pool is scaled to zero?
How do we prevent resource fragmentation in the presence of frequent runtime upgrades and a high number of small compute pools?
In the remainder of this section, I will cover the first two of these questions while leaving a comprehensive discussion of all of them to a future whitepaper on our serverless architecture for Apache Flink.
A Flink SQL statement in Confluent Cloud is translated into a logical dataflow graph, called Flink Jobgraph. Each vertex of this graph can run at its own parallelism. The parallel instances of each vertex are called tasks, and they are distributed to the Taskmanagers of a Flink Cluster for execution. In the example below, "Source B" runs at a parallelism of four, while "Source A" and "Join-Sink" (an operator chain consisting of a join operator and a subsequent sink) run at a parallelism of two. In this example, the tasks are distributed across two Taskmanagers.
Now, the challenge is to determine the optimal parallelism for each of these vertices (horizontal scaling) as well as the physical resources assigned to each of these parallel instances (vertical scaling). To accomplish this, we use a proprietary algorithm based on the DS2 algorithm. The algorithm first derives the desired data rate for each source from metrics of the consumed topic(s). Then, it traverses through the dataflow graph and determines the desired number and size of tasks for each vertex based on the ratio between the data rates of adjacent vertices.
Once we've determined the desired parallelism for each vertex, the focus shifts to ensuring that the rescaling operation is as non-disruptive as possible, with a goal of providing sub-second rescaling. Since a rescaling operation in Apache Flink always involves stopping the job and then restarting it with updated resources and vertex parallelisms, we aim to trigger the rescaling operation sparingly.
To achieve this, Confluent Cloud uses Flink's Adaptive Scheduler (dynamically adjusts the parallelism of Flink jobs) and Declarative Resource Management (specify resource requirements and constraints declaratively) to request the rescaling operation. This allows us to:
Only trigger the rescaling operation once we provision the necessary resources
Minimize reprocessing by only triggering a rescaling operation immediately after a state snapshot
Ensure that all jobs make as much progress as possible even if the compute pool is exhausted
This makes rescaling very fast (<1s) for SQL statements with small state (MBs). We are working on a series of improvements to achieve the same for statements with any amount of state. Downtime duration is driven by how much time it takes to repartition the state into the new degree of parallelization, before restarting the Flink job. Some of these improvements are exclusive to Confluent Cloud, while others are happening in Open Source Apache Flink. These include:
FLINK-32326 (released in Flink 1.18) - Disable WAL (Write-Ahead Log) for restore operations by eliminating the overhead of writing all changes to the log which can impact performance
FLINK-32345 (released in Flink 1.18) - Modifications to support parallelization across multiple state types and handles for faster downloads of incremental checkpoints
FLINK-33341 (merged for Flink 1.19) - Use of available local state in rescaling scenarios to reduce the amount of data to download from remote storage
FLINK-31238 (planned for Flink 1.19) - Improvements to RocksDB which allow for faster merge and split operations of multiple state handles, and a new way of restoring the state after rescaling
We will share more information about the Confluent-specific improvements in a future whitepaper.
Reduced infrastructure costs are one of the most cited benefits of elastic autoscaling, as you no longer need to over-provision resources to account for workload variability. But you can only achieve these cost savings if your service’s billing model supports it. A usage-based billing model is crucial for data streaming because it provides accurate billing based on fluctuating usage.
Consider a live sporting event. Data streaming peaks during gameplay due to the high volume of generated content, such as player statistics, scores, and commentary. However, there may be a lull in activity during halftime and commercial breaks, resulting in far less data streaming activity. Accurate usage-based billing ensures that you are charged only for the resources you use without paying for unused capacity during periods of low demand.
Our Flink serverless architecture is specifically designed to pass these dynamic billing features and benefits on to you.
Compute Pools have a base price of $0. Creating a compute pool is free, and there are no additional charges for state management or networking (Kafka data ingress/egress applies based on cluster type). You can also use SQL Workspaces and the Data Portal at no cost, and add as many users as the service quota allows.
Compute Pools are billed per minute of usage, matching the rapid response of our autoscaler to changing demands. The autoscaler rapidly scales resource assignments up and down as needed, minimizing both latency and cost in the presence of load spikes and fluctuations.
Compute pools and dynamic billing make mixed workloads, including variable and short-lived explorative workloads, very cost-effective.
Now let’s look at two example workload mixes to illustrate the pricing model.
Many SQL statements are short-lived - even in stream processing! Engineers use interactive queries to explore their streaming data and test their streaming SQL code for correctness. They are essential in the iterative development of long-lived apps and pipelines.
In the following example, a user executes five different SQL queries. Unlike other Flink offerings, Confluent Cloud for Apache Flink's serverless architecture charges only for the five minutes when these queries are executing.
Additionally, all users can share the resources of a single compute pool, resulting in cost savings and a more efficient use of resources. It doesn't matter if the queries are executed by the same person, five different people at the same time, or at different points in the hour. This approach can be particularly beneficial for organizations with multiple users accessing the same data streams.
Total CFU-minutes consumed = 6
Total charge: 6 CFU-minutes x $0.0035/CFU-minutes = $0.021
The compute pool resources scale to zero when not in use, so you only pay for each minute of work these statements perform. Having resources declared for a given pool does not cost you anything.
Data streaming architectures are typically composed of multiple applications, each with their own workload requirements. They often include a mix of interactive, terminating statements and continuous, streaming statements.
For example, let's say four streaming statements are running in a single compute pool. The volume of data in the data streams is oscillating and there are spikes of utilization for short periods within the hour. Each statement uses a minimum price of 1 CFU-minute ($0.0035 in this example) and is automatically scaled up and down as needed on a per-minute basis. The autoscaler ensures that the compute pool has the necessary resources to handle the workload while avoiding overprovisioning, which can result in wasted resources and higher costs.
Total CFU-minutes consumed = 309
Total charge: 309 CFU-minutes x $0.0035/CFU-minute = $1.08125
As demonstrated with the examples above, Confluent Cloud for Apache Flink offers a cost-effective serverless architecture that provides accurate usage-based billing and efficient resource allocation, resulting in cost savings for users.
The serverless architecture of Confluent Cloud for Apache Flink offers a fully managed environment for stream processing applications that abstracts away the complexity of managing Flink, enabling users to focus on app development. However, building the runtime for such an architecture is a challenging task. This blog post delved into the technical details of how the Flink service is implemented, including some of the challenges of optimizing parallelism and achieving efficient resource utilization and fast rescaling. We also highlighted the cost-effectiveness of the serverless Flink service and provided examples of how the usage-based billing model can result in cost savings for users.
Interested in learning more? If you are new to Flink, be sure to watch the lightboard video to learn the basics. And if you haven't already, sign up for a free trial of Confluent Cloud and create your first Flink SQL application within a matter of minutes using the Flink quick start.
Stay tuned for our future whitepaper where we will provide a comprehensive discussion of our serverless architecture and the technical innovations that make it possible!
This course is an introduction to Apache Flink, focusing on its core concepts and architecture. Learn what makes Flink tick, and how it handles some common use cases.
Learn the basics of building Apache Flink applications in Java in our course on Confluent Developer.