Elevating Kafka: Driving operational excellence with Albertsons + Forrester | Watch Webinar
Apache Flink is an open-source data processing framework that offers unique capabilities in both stream processing and batch processing, making it a popular tool for high-performance, scalable, and event-driven applications and architectures.
Easily build high-quality, reusable data streams with the industry’s only cloud-native, serverless Apache Flink® service, fully integrated with Apache Kafka® on Confluent Cloud.
Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments, perform computations at in-memory speed, and at any scale. Developers build applications for Flink using APIs such as Java or SQL, which are executed on a Flink cluster by the framework.
While other popular data processing systems like Apache Spark and Kafka Streams are limited to data streaming or batch processing, Flink helps industries such as finance, e-commerce, and telecommunications empower both batch and streams in one.
While Kafka Streams is a library that runs as part of your application, Flink is a standalone stream processing engine that is deployed independently. Flink runs your application in a Flink cluster that you somehow deploy. Fink provides its own solutions to the hard problems faced by a distributed stream processing system, such as fault tolerance, exactly once delivery, high throughput, and low latency. Those solutions involve checkpoints, savepoints, state management, and time semantics.
The diagram below shows the Flink components as well as the Flink runtime flow. The program code or SQL query is composed into an operator graph which is then submitted by the client to a job manager. The job manager breaks the job into operators which execute as tasks on nodes that are running task managers.
Apache Flink is chosen due to its robust architecture and extensive features set. Because it can handle bounded and unbounded streams together, it can unify batch and stream processing under the same umbrella. Its features include sophisticated state management, savepoints and checkpoints, event time processing semantics and exactly once consistency guarantees for state.
Flink also features layered APIs for handling streams at different levels of abstraction, which gives developers the flexibility required to handle both very common as well as hyper specialized stream processing use cases.
Although it’s built as a generic data processor, Flink’s native support of unbounded streams contributed to its popularity as a stream processor. So Flink’s common use cases are very similar to Kafka use cases, although Flink and Kafka serve slightly different purposes. Kafka usually provides the event streaming while Flink is used to process data from that stream. Flink and Kafka are commonly used together for:
Flink’s complex architecture makes it difficult to learn and challenging for even seasoned practitioners to understand, operate and debug. Flink developers and operators often find themselves struggling with complexities around custom watermarks, serialization, type evolution, and so on.
Perhaps a bit more than most distributed systems, Flink poses difficulties with deployment and cluster operations such as performance tuning to account for hardware selection and job characteristics. The most common concerns involve reasoning about the root causes of performance issues such as backpressure, slow jobs, and savepoint restoration from unreasonably large state. Other common issues include fixing checkpoint failures, and debugging job failures such as out of memory errors.
Organizations using Flink tend to require teams of experts dedicated to developing stream processing jobs and keeping the stream processing framework operational. For this reason, Flink has only been economically feasible for large organizations with complex and advanced stream processing needs.
Kafka Streams is a popular client library used for stream processing. Particularly when the input and output data are stored in a Kafka cluster. Because it's part of Kafka, it leverages the benefits of Kafka natively.
ksqlDB layers the simplicity of SQL onto Kafka Streams, providing a nice starting point for stream processing and broadening the audience for it.
Kafka users and Confluent customers often turn to Apache Flink for stream processing needs for a bunch of reasons. For example, complex stream processing often means large intermediate state that does not always make sense to use Kafka for because when it gets sufficiently large, this state becomes its own thing which impacts the resources and planning needed to operate the Kafka cluster. Also, there can be a requirement to process streams from multiple Kafka clusters in different locations, streams in Kafka with streams outside of Kafka, and so on.
Because Flink is its own distributed system with its own complexities and operational nuances, it has been unclear when the benefits of working with Apache Flink outweigh the complexities incurred by it or the costs associated with it, especially when other stream processing technologies such as Kafka Streams or Apache Spark are available.
But the cloud changes that calculation and allows us to fully embrace Apache Flink. By offering Apache Flink as a fully managed cloud service, we get to bring the benefits that Confluent Cloud brings to Kafka to Flink as well. The operational complexities and nuances that make Apache Flink complex and costly, such as instance type or hardware profile selection, node configuration, state back end selection, managing snapshots, savepoints and so on, are handled for you, enabling developers to focus exclusively on their application logic rather than Flink specific nuances.
Not only does this bring the capabilities of Flink to Confluent Cloud, but it also changes the economics behind when Flink makes sense to use for stream processing. This means Flink can be used for more use cases earlier in an organization's streaming maturity, It also means the developer can pick and choose the stream processing layer that works for them, allowing the developer to stick with one platform as their needs change or their complexity increases.
Now that Flink has a mature and robust SQL interface, it makes sense to start there, not only for end users adopting stream processing, but also for Confluent adopting Apache Flink.