
Making Flink Serverless, With Queries for Less Than a Penny


Imagine easily enriching data streams and building stream processing applications in the cloud, without worrying about capacity planning, infrastructure and runtime upgrades, or performance monitoring. That's where our serverless Apache Flink® service comes in, as announced at this year’s Current | The Next Generation of Kafka Summit.

By abstracting away the underlying infrastructure and providing a fully managed environment, Confluent Cloud for Apache Flink empowers you to focus on building robust, scalable stream processing applications without any operational overhead. In this blog post, we'll take a deep dive into the serverless architecture of Confluent Cloud for Apache Flink and explore its many benefits, including reduced infrastructure costs, increased reliability, and more seamless adoption.

If you're interested in trying out the public preview of our cloud-native, serverless Flink service, sign up for a free trial of Confluent Cloud and watch the lightboard video to learn more.

So, let's start by digging into what makes our Flink service a serverless offering in the first place.

Why is Confluent Cloud for Apache Flink a serverless offering?

Deploying, managing, and scaling Apache Flink workloads can be challenging. One major challenge is the upfront cost of setting up and configuring the cluster, which requires significant technical expertise. Additionally, maintaining clusters and keeping both the Flink service and its applications up-to-date with Flink upgrades can take time and effort, potentially causing disruptions. Managing Flink applications independently can also cause problems, as a developer needs to size, scale, and parallelize each individual workload. Although these are just some of the challenges, they alone can make it difficult to adopt stream processing at scale, effectively and efficiently.

With Confluent Cloud's serverless Flink offering, developers can benefit from three primary serverless dimensions: elastic autoscaling with scale-to-zero, an evergreen runtime and APIs, and usage-based billing. The autoscaler manages scale-out, parallelism, and load balancing, eliminating the need for pre-sizing workloads and capacity planning. We provide fully automated and transparent upgrades to keep the Flink runtime up-to-date with the latest security patches, along with strong backward compatibility guarantees to ensure uninterrupted operations for your clients. We also provide declarative APIs that enable developers to focus on building business logic, not managing infrastructure. And with usage-based billing, developers pay only for what they use, with automatic downscaling of unused resources (more on the cost model later).

Our fully managed serverless architecture is based on compute pools.

Flink compute pools provide elastic compute resources

Compute pools expand and shrink based on the resources required by the Flink SQL statements using them. A compute pool will never exceed its configured maximum number of CFUs (logical units of processing power) and therefore can act as a cap on maximum budget spend. When a compute pool reaches its maximum capacity, all statements using it compete for resources. As a result, compute pools also serve to isolate statements from each other, ensuring that statements using different pools will never compete for resources.
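To make the capacity model concrete, here is a minimal Python sketch of this behavior. It is a hypothetical model, not Confluent's implementation: a pool expands with the demand of its statements, never exceeds its configured CFU cap, and scales to zero when idle.

```python
# Hypothetical model of a compute pool (for illustration only):
# the pool tracks per-statement demand, grows and shrinks with it,
# and max_cfu acts as both an isolation boundary and a budget cap.

class ComputePool:
    def __init__(self, max_cfu: int):
        self.max_cfu = max_cfu
        self.demand: dict[str, int] = {}  # statement id -> requested CFUs

    def update_demand(self, statement: str, cfu: int) -> None:
        if cfu <= 0:
            # Finished or idle statements release their capacity.
            self.demand.pop(statement, None)
        else:
            self.demand[statement] = cfu

    @property
    def used_cfu(self) -> int:
        # The pool expands with total demand but is capped at max_cfu;
        # with no demand at all, it scales to zero.
        return min(sum(self.demand.values()), self.max_cfu)

pool = ComputePool(max_cfu=10)
pool.update_demand("stmt-1", 4)
pool.update_demand("stmt-2", 8)
print(pool.used_cfu)  # total demand (12) exceeds the cap, so the pool stays at 10
pool.update_demand("stmt-2", 0)
print(pool.used_cfu)  # 4
```

When demand exceeds the cap, as in the first call above, the statements in the pool compete for the available CFUs; statements in a different pool are unaffected.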

Challenges of Cloud-Native Elasticity for Apache Flink

When we approached the implementation of an elastic, reliable, and cost-efficient runtime for Apache Flink, we had to address several key questions: 

  1. How do we determine the resources required by each SQL statement to keep up with the rate of all its input tables (topics)? 

  2. Once we've determined the required resources, how do we make the actual statement rescaling operation as non-disruptive as possible? 

  3. How do we ensure that statement submission is fast, even if a pool is scaled to zero? 

  4. How do we prevent resource fragmentation in the presence of frequent runtime upgrades and a high number of small compute pools? 

In the remainder of this section, I will cover the first two of these questions while leaving a comprehensive discussion of all of them to a future whitepaper on our serverless architecture for Apache Flink.

Fine-Grained Scaling

A Flink SQL statement in Confluent Cloud is translated into a logical dataflow graph, called a Flink JobGraph. Each vertex of this graph can run at its own parallelism. The parallel instances of each vertex are called tasks, and they are distributed to the TaskManagers of a Flink cluster for execution. In the example below, "Source B" runs at a parallelism of four, while "Source A" and "Join-Sink" (an operator chain consisting of a join operator and a subsequent sink) run at a parallelism of two. In this example, the tasks are distributed across two TaskManagers.
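The mapping from logical vertices to physical tasks can be sketched in a few lines of Python. The vertex names and parallelisms mirror the example above, but the round-robin placement is an illustrative assumption, not Flink's actual slot-assignment logic:

```python
# Illustrative sketch: each JobGraph vertex runs at its own parallelism,
# and its parallel instances (tasks) are spread across the TaskManagers
# of the cluster. Placement here is simple round-robin for clarity.
from itertools import cycle

parallelism = {"Source A": 2, "Source B": 4, "Join-Sink": 2}

def assign_tasks(parallelism, num_taskmanagers):
    slots = {f"TM-{i}": [] for i in range(1, num_taskmanagers + 1)}
    tms = cycle(slots)
    for vertex, p in parallelism.items():
        for subtask in range(p):
            slots[next(tms)].append(f"{vertex}[{subtask}]")
    return slots

assignment = assign_tasks(parallelism, num_taskmanagers=2)
for tm, tasks in assignment.items():
    print(tm, tasks)
# Eight tasks in total (2 + 4 + 2), four per TaskManager
```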

Mapping the logical dataflow graph to a physical dataflow graph

Now, the challenge is to determine the optimal parallelism for each of these vertices (horizontal scaling) as well as the physical resources assigned to each of these parallel instances (vertical scaling). To accomplish this, we use a proprietary algorithm based on the DS2 algorithm. The algorithm first derives the desired data rate for each source from metrics of the consumed topic(s). Then, it traverses through the dataflow graph and determines the desired number and size of tasks for each vertex based on the ratio between the data rates of adjacent vertices.
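The rate-ratio idea can be illustrated with a small sketch. This is a simplified stand-in for the proprietary algorithm, not its actual implementation; `capacity_per_task` and the example rates are assumptions:

```python
# Simplified DS2-style sketch (illustrative, not Confluent's algorithm):
# source parallelism is derived from the topic's data rate, and each
# downstream vertex's parallelism follows from the ratio of output to
# input rates measured along the dataflow graph.
import math

def desired_parallelism(source_rate, capacity_per_task, rate_ratios):
    """source_rate: records/s the source must keep up with.
    capacity_per_task: records/s a single task can process (assumed).
    rate_ratios: per-vertex output/input rate ratio, in topological order."""
    plan = {}
    rate = source_rate
    for vertex, ratio in rate_ratios.items():
        plan[vertex] = max(1, math.ceil(rate / capacity_per_task))
        rate *= ratio  # data rate flowing into the next vertex
    return plan

plan = desired_parallelism(
    source_rate=20_000,
    capacity_per_task=5_000,
    rate_ratios={"source": 1.0, "join": 0.5, "sink": 1.0},
)
print(plan)  # {'source': 4, 'join': 4, 'sink': 2}
```

Here the join halves the data rate, so the sink needs only half the parallelism of the vertices upstream of it.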

Fast Rescaling

Once we've determined the desired parallelism for each vertex, the focus shifts to ensuring that the rescaling operation is as non-disruptive as possible, with a goal of providing sub-second rescaling. Since a rescaling operation in Apache Flink always involves stopping the job and then restarting it with updated resources and vertex parallelisms, we aim to trigger the rescaling operation sparingly.

To achieve this, Confluent Cloud uses Flink's Adaptive Scheduler (dynamically adjusts the parallelism of Flink jobs) and Declarative Resource Management (specify resource requirements and constraints declaratively) to request the rescaling operation. This allows us to: 

  • Only trigger the rescaling operation once we provision the necessary resources

  • Minimize reprocessing by only triggering a rescaling operation immediately after a state snapshot

  • Ensure that all jobs make as much progress as possible even if the compute pool is exhausted

This makes rescaling very fast (<1s) for SQL statements with small state (MBs). We are working on a series of improvements to achieve the same for statements with any amount of state: downtime is dominated by the time it takes to repartition the state to the new degree of parallelism before restarting the Flink job. Some of these improvements are exclusive to Confluent Cloud, while others are happening in open source Apache Flink. These include:

  • FLINK-32326 (released in Flink 1.18) - Disables the RocksDB WAL (Write-Ahead Log) during restore operations, eliminating the overhead of writing every change to the log

  • FLINK-32345 (released in Flink 1.18) - Parallelizes downloads across multiple state types and handles for faster restores of incremental checkpoints

  • FLINK-33341 (merged for Flink 1.19) - Uses available local state in rescaling scenarios to reduce the amount of data downloaded from remote storage

  • FLINK-31238 (planned for Flink 1.19) - Improves RocksDB to allow faster merge and split operations on multiple state handles, and adds a new way of restoring state after rescaling

We will share more information about the Confluent-specific improvements in a future whitepaper.

The cost-effectiveness of serverless Flink

Reduced infrastructure costs are one of the most cited benefits of elastic autoscaling, as you no longer need to over-provision resources to account for workload variability. But you can only achieve these cost savings if your service’s billing model supports it. A usage-based billing model is crucial for data streaming because it provides accurate billing based on fluctuating usage.

Consider a live sporting event. Data streaming peaks during gameplay due to the high volume of generated content, such as player statistics, scores, and commentary. However, there may be a lull in activity during halftime and commercial breaks, resulting in far less data streaming activity. Accurate usage-based billing ensures that you are charged only for the resources you use without paying for unused capacity during periods of low demand. 

Our Flink serverless architecture is specifically designed to pass these dynamic billing features and benefits on to you. 

  • Compute Pools have a base price of $0. Creating a compute pool is free, and there are no additional charges for state management or networking (Kafka data ingress/egress applies based on cluster type). You can also use SQL Workspaces and the Data Portal at no cost, and add as many users as the service quota allows. 

  • Compute Pools are billed per minute of usage, matching the rapid response of our autoscaler to changing demands. The autoscaler rapidly scales resource assignments up and down as needed, minimizing both latency and cost in the presence of load spikes and fluctuations. 

Compute pools and dynamic billing make mixed workloads, including variable and short-lived explorative workloads, very cost-effective.
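As a sketch of what per-minute metering with scale-to-zero means in practice, consider an hour that is mostly idle with one short burst of activity. The usage profile below is invented for illustration; the price matches the examples in this post:

```python
# Hedged sketch of the billing model described above: usage is metered
# in CFU-minutes, so minutes where the pool is scaled to zero cost nothing.
PRICE_PER_CFU_MINUTE = 0.0035  # USD, example rate used in this post

def bill(cfu_per_minute):
    """cfu_per_minute: CFUs consumed in each minute of the billing window."""
    cfu_minutes = sum(cfu_per_minute)
    return cfu_minutes, cfu_minutes * PRICE_PER_CFU_MINUTE

# A spiky hour: 55 idle minutes, then a five-minute burst.
usage = [0] * 55 + [1, 2, 2, 1, 0]
cfu_minutes, charge = bill(usage)
print(cfu_minutes, round(charge, 4))  # 6 0.021
```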

Now let’s look at two example workload mixes to illustrate the pricing model.

Example #1: Data Exploration and Discovery 

Many SQL statements are short-lived - even in stream processing! Engineers use interactive queries to explore their streaming data and test their streaming SQL code for correctness. They are essential in the iterative development of long-lived apps and pipelines. 

In the following example, a user executes five different SQL queries. Unlike other Flink offerings, Confluent Cloud for Apache Flink's serverless architecture charges only for the five minutes when these queries are executing. 

Additionally, all users can share the resources of a single compute pool, resulting in cost savings and a more efficient use of resources. It doesn't matter if the queries are executed by the same person, five different people at the same time, or at different points in the hour. This approach can be particularly beneficial for organizations with multiple users accessing the same data streams. 



[Table: number of statements and total CFU-minutes consumed]


Pricing calculation

  • Total CFU-minutes consumed = 6

  • Total charge: 6 CFU-minutes x $0.0035/CFU-minute = $0.021

The compute pool resources scale to zero when not in use, so you only pay for each minute of work these statements perform. Having resources declared for a given pool does not cost you anything.

Example #2: Mix of Short-lived and Variable Statements 

Data streaming architectures are typically composed of multiple applications, each with their own workload requirements. They often include a mix of interactive, terminating statements and continuous, streaming statements. 

For example, let's say four streaming statements are running in a single compute pool. The volume of data in the data streams oscillates, with spikes of utilization for short periods within the hour. Each statement has a minimum price of 1 CFU-minute ($0.0035 in this example) and is automatically scaled up and down as needed on a per-minute basis. The autoscaler ensures that the compute pool has the necessary resources to handle the workload while avoiding overprovisioning, which would waste resources and drive up costs.



[Table: number of statements and total CFU-minutes consumed]


Pricing calculation

  • Total CFU-minutes consumed = 309

  • Total charge: 309 CFU-minutes x $0.0035/CFU-minute = $1.0815
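The arithmetic for both examples can be double-checked in a couple of lines:

```python
# Per-minute metering at the example rate of $0.0035 per CFU-minute.
price = 0.0035
example_1 = 6 * price    # five short-lived queries, 6 CFU-minutes total
example_2 = 309 * price  # four variable streaming statements, 309 CFU-minutes
print(f"${example_1:.3f}", f"${example_2:.4f}")  # $0.021 $1.0815
```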

As demonstrated with the examples above, Confluent Cloud for Apache Flink offers a cost-effective serverless architecture that provides accurate usage-based billing and efficient resource allocation, resulting in cost savings for users.

Getting Started with Serverless Flink

The serverless architecture of Confluent Cloud for Apache Flink offers a fully managed environment for stream processing applications that abstracts away the complexity of managing Flink, enabling users to focus on app development. However, building the runtime for such an architecture is a challenging task. This blog post delved into the technical details of how the Flink service is implemented, including some of the challenges of optimizing parallelism and achieving efficient resource utilization and fast rescaling. We also highlighted the cost-effectiveness of the serverless Flink service and provided examples of how the usage-based billing model can result in cost savings for users.

Interested in learning more? If you are new to Flink, be sure to watch the lightboard video to learn the basics. And if you haven't already, sign up for a free trial of Confluent Cloud and create your first Flink SQL application within a matter of minutes using the Flink quick start.

Stay tuned for our future whitepaper where we will provide a comprehensive discussion of our serverless architecture and the technical innovations that make it possible!

  • Konstantin is a member of the Apache Flink PMC, a long-term contributor to the project, and a group product manager at Confluent. He joined the company early this year as part of the acquisition of Immerok, which he had co-founded with a group of long-term community members earlier last year. Formerly, as Head of Product at Ververica, Konstantin supported multiple teams working on Apache Flink in both discovery and delivery. Before that, he led the pre-sales team at Ververica, helping their clients as well as the open source community get the most out of Apache Flink.
