Last year, we announced our plan to build a cloud-native Apache Flink® service to meet the growing demand for scalable and efficient stream processing solutions in the cloud.
Today, we're thrilled to announce the general availability of Confluent Cloud for Apache Flink across all three major clouds. This means that you can now experience Apache Kafka® and Flink as a unified, enterprise-grade platform to connect and process your data in real time, wherever you need it. In this blog post, we'll explore what sets our fully managed service for Flink apart and the steps we've taken to ensure it's production-ready to handle the most mission-critical use cases at scale.
To get started, check out the Flink quick start and product video below to learn more. It's truly an exciting time to be part of the Kafka and Flink communities, and we hope everyone takes advantage of this opportunity to level up their streaming capabilities.
Let's begin by highlighting the importance of stream processing and exploring what makes Confluent Cloud for Apache Flink unique.
Stream processing plays a critical role in the infrastructure stack for data streaming. Developers can use it to filter, join, aggregate, and transform their data streams on the fly to support cutting-edge use cases like fraud detection, predictive maintenance, and real-time inventory and supply chain management.
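As a rough illustration of what "filter" and "aggregate" mean on a stream, here is a minimal Python sketch. It is not Flink code: the event shape, the function names, and the 500.0 threshold are assumptions made up for this example, and a real stream processor would handle unbounded input, state, and late data that this toy ignores.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event shape: (timestamp_seconds, account_id, amount)
Event = Tuple[int, str, float]


def large_payments(events: Iterable[Event], threshold: float) -> Iterator[Event]:
    """Filter: pass through only events whose amount is at or above the threshold."""
    for event in events:
        if event[2] >= threshold:
            yield event


def totals_per_window(
    events: Iterable[Event], window_seconds: int
) -> Dict[Tuple[int, str], float]:
    """Aggregate: sum amounts per (window start, account) using tumbling windows."""
    totals: Dict[Tuple[int, str], float] = defaultdict(float)
    for ts, account, amount in events:
        window_start = ts - (ts % window_seconds)  # align to the window boundary
        totals[(window_start, account)] += amount
    return dict(totals)


# Made-up sample events standing in for a payment stream.
events = [
    (0, "acct-1", 50.0),
    (10, "acct-1", 900.0),
    (45, "acct-2", 1200.0),
    (61, "acct-1", 1500.0),
    (70, "acct-2", 20.0),
]

# Chain the two steps: filter first, then aggregate per one-minute window.
result = totals_per_window(large_payments(events, threshold=500.0), window_seconds=60)
```

A fraud detection pipeline might chain steps like these, flagging accounts whose windowed total of large payments exceeds some limit.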
Among stream processing frameworks, Apache Flink has emerged as the de facto standard because of its performance and rich feature set. It's relied upon both by innovative digital native companies like LinkedIn, Uber, and Netflix and by leading enterprises like Apple, ING, and Goldman Sachs to support mission-critical streaming workloads. However, self-managing Flink (as with other open source tools like Kafka) can be challenging due to its operational complexity, steep learning curve, and high costs for in-house support.
That's why we re-imagined Flink as a truly cloud-native service with Confluent Cloud for Apache Flink, allowing users to focus on their business logic, and not on operations. Let’s break down the key benefits of our serverless Flink offering.
Flink supports the ANSI SQL standard, enabling anyone familiar with SQL to start a Flink SQL query in seconds! Confluent supports a variety of ways to leverage SQL to explore streaming data and build real-time applications:
CLI
Developers can use the SQL shell in the Confluent CLI to write and run Flink SQL queries. The shell includes features like autocompletion for efficient query development.
SQL Workspaces
For a more graphical approach, SQL Workspaces provides a browser-based SQL interface to interact with Kafka data. This interface makes it easy to explore and query Kafka topics as if they were database tables. SQL Workspaces also supports multiple, independent SQL cells, enabling users to run several queries concurrently and manage complex workflows. Because queries are saved automatically, users can seamlessly resume their work even after logging out.
Data Portal and Flink Actions
Data Portal, a self-service interface for discovering, exploring, and accessing data, provides contextual entry points in the topic view to open new SQL workspaces and query Kafka data in just a few clicks.
Recently introduced Flink Actions, also accessible directly from Data Portal, are a set of pre-built, turn-key stream processing transformations addressing common, domain-agnostic requirements. Users define an Action through a guided user interface, and a Flink job is automatically created and maintained. This lets users harness the power of Flink with minimal effort and without needing deep familiarity with Flink SQL.
User-defined functions (UDFs)
To extend the functionality of Flink SQL, user-defined functions (UDFs) in Java are now available for early access to a select set of customers. UDFs allow for customized data processing in Flink SQL, enabling complex calculations or custom business logic tailored to specific use cases and business requirements. UDFs can be invoked directly from the SQL editor, enhancing the expressiveness of queries. Beyond SQL and UDFs, we're also enhancing our support for additional Flink APIs, including the Flink Table API, later this year.
Deploying, managing, and scaling Apache Flink workloads on your own can be challenging due to the upfront costs of sizing and configuring clusters, ongoing maintenance efforts, and the complexities of managing Flink applications. Our cloud-native Flink offering addresses these challenges by offering simple, serverless stream processing with zero operational burden.
Truly serverless
Our Flink offering provides a serverless experience across three key dimensions: elastic autoscaling, an always-updated runtime, and usage-based billing. The autoscaler manages scale-out, parallelism, and load balancing, removing the need for predicting workload sizes or planning capacity. We provide automated upgrades to keep Flink current with security patches and the latest features. Additionally, with per-minute usage-based billing, users pay only for the resources they use, when they use them, and unused resources are automatically scaled down.
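To make the per-minute billing idea concrete, here is a minimal Python sketch comparing usage-based cost with the cost of keeping peak capacity provisioned for the whole hour. The price, the "resource unit" abstraction, and the workload shape are all assumptions invented for this example; they are not Confluent's actual pricing or metering model.

```python
from decimal import Decimal
from typing import Sequence


def usage_cost(units_per_minute: Sequence[int], price_per_unit_minute: Decimal) -> Decimal:
    """Price each minute by the units consumed in it; idle minutes cost nothing."""
    return sum(
        (Decimal(units) * price_per_unit_minute for units in units_per_minute),
        Decimal("0"),
    )


# Hypothetical one-hour workload: idle for 30 minutes, a 20-minute burst, then a trickle.
usage = [0] * 30 + [4] * 20 + [1] * 10

# Made-up price per resource unit per minute (an assumption, not a real rate).
price = Decimal("0.0035")

# Usage-based: pay only for what each minute actually consumed.
pay_per_use = usage_cost(usage, price)

# Fixed provisioning: pay for peak capacity during every minute of the hour.
peak_provisioned = usage_cost([max(usage)] * len(usage), price)
```

With these made-up numbers, the bursty workload consumes 90 unit-minutes, whereas provisioning for the peak would pay for 240.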
We've also implemented enhancements to ensure Flink is production-ready and battle-hardened, allowing organizations to trust Flink with their critical data processing tasks. This enables them to focus on core business objectives without worrying about data integrity or system downtime.
Cloud-native reliability
Apache Flink's architecture provides correctness guarantees through exactly-once processing, which is critical for reliable data processing. Its fault-tolerant design handles failures gracefully, using distributed snapshotting to take consistent checkpoints and restore quickly, which minimizes downtime while preserving state. Flink also uses a two-phase commit protocol for distributed transactions with Kafka, which guards against data loss and duplication under exactly-once semantics (EOS) and reflects the deep collaboration between the Apache Flink and Kafka communities on data integrity and reliability.
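The interplay between checkpoints and two-phase commit can be sketched with a toy sink in Python. This is a simplified illustration of the general pattern, not Flink's or Kafka's actual implementation: the class name, method names, and in-memory "transactions" are assumptions made for the example.

```python
from typing import Dict, List


class TwoPhaseCommitSink:
    """Toy sink: records are staged per checkpoint and become visible only on commit."""

    def __init__(self) -> None:
        self.committed: List[str] = []             # what downstream readers can see
        self._open: List[str] = []                 # written since the last checkpoint
        self._pending: Dict[int, List[str]] = {}   # pre-committed, awaiting completion

    def write(self, record: str) -> None:
        self._open.append(record)

    def pre_commit(self, checkpoint_id: int) -> None:
        """Phase 1: when a checkpoint starts, seal the open transaction."""
        self._pending[checkpoint_id] = self._open
        self._open = []

    def commit(self, checkpoint_id: int) -> None:
        """Phase 2: once the checkpoint has completed, publish the sealed records."""
        self.committed.extend(self._pending.pop(checkpoint_id, []))

    def recover(self, last_completed_checkpoint: int) -> None:
        """After a failure, finish commits for completed checkpoints and abort the rest."""
        for checkpoint_id in sorted(self._pending):
            if checkpoint_id <= last_completed_checkpoint:
                self.commit(checkpoint_id)
        self._open = []
        self._pending.clear()


sink = TwoPhaseCommitSink()
sink.write("a")
sink.write("b")
sink.pre_commit(1)
sink.commit(1)                              # checkpoint 1 completed: "a", "b" visible
sink.write("c")                             # written, but not yet checkpointed
sink.recover(last_completed_checkpoint=1)   # simulated failure: uncommitted "c" is discarded
sink.write("c")                             # the source replays from checkpoint 1
sink.pre_commit(2)
sink.commit(2)
```

Because the uncommitted write is discarded on recovery and then replayed from the last checkpoint, each record ends up visible downstream exactly once.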
With Confluent Cloud for Apache Flink, automatic updates reduce the risk of downtime or data loss due to outdated software or vulnerabilities. Additionally, our autoscaler ensures efficient resource allocation and reduces the risk of performance bottlenecks, throttling, or failures during peak usage periods.
SLA and expert support
Our Flink service is backed by a 99.99% uptime SLA at the minute level, supporting the most stringent stream processing workloads. We also offer committer-led support and services from leading Kafka and Flink experts.
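To put the 99.99% figure in perspective, the short Python calculation below converts an uptime percentage into a downtime budget. The 30-day month and 365-day year are assumptions chosen for the arithmetic; how uptime is actually measured and credited is defined by the SLA terms, not by this sketch.

```python
from decimal import Decimal


def allowed_downtime_minutes(sla_percent: Decimal, days: int) -> Decimal:
    """Minutes of downtime permitted over `days` at the given uptime percentage."""
    total_minutes = Decimal(days * 24 * 60)
    return total_minutes * (Decimal(100) - sla_percent) / Decimal(100)


# Assumed measurement periods: a 30-day month and a 365-day year.
monthly_budget = allowed_downtime_minutes(Decimal("99.99"), days=30)
yearly_budget = allowed_downtime_minutes(Decimal("99.99"), days=365)
```

Under these assumptions, 99.99% uptime leaves a budget of 4.32 minutes per month, or 52.56 minutes per year.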
Customers can also benefit from Confluent's global network of certified system integrators, including Deloitte, Ness Digital Engineering, Somerford Associates, Improving, Psyncopate, Platformatory, Synthesis Software Technologies (Pty) Ltd, and iLink Digital for building, deploying, and tuning Flink-based applications. These integrators provide on-site engineering assistance and staff augmentation, helping customers accelerate time-to-market, reduce costs, and achieve better outcomes for their stream processing use cases.
Production operations
Features such as programmatic deployments and infrastructure-as-code enable faster and more reliable deployment of Flink applications.
Confluent's REST API enables customers to manage the full end-to-end lifecycle of their applications, supporting customizable and flexible deployments. Terraform support for Flink statements further streamlines the deployment process, enabling fully automated deployments integrated with version control systems like Git for consistent deployments across different environments, including development, testing, and production. Developers can also integrate these programmatic deployments with existing tools such as GitHub, GitLab, and Jenkins to automate their pipelines.
Our integration with metrics and monitoring tools enhances operational visibility for Flink applications. Flink metrics are accessible through the Confluent Cloud Metrics API, enabling users to monitor all resources from one place and supporting popular tools like Prometheus, Grafana, and Datadog for setting alerts and proactive issue responses.
While Kafka and Flink are often paired for stream processing, challenges arise from mismatched data formats and schemas, complicating integration and compromising trust and accuracy. Confluent's Flink service seamlessly integrates with our Kafka offering, enhancing capabilities with a generalized layer of streaming compute over data movement and storage, powered by the Kora engine.
Data compatibility and governance
Confluent Cloud simplifies metadata management by eliminating the need for duplicating metadata, enhancing data discovery and exploration. Our Flink service's native integration with Kafka and Schema Registry ensures that Kafka topics are readily available for querying in Flink, and tables created in Flink are accessible as Kafka topics with schemas, making all topics immediately queryable via Flink SQL in Confluent Cloud.
Additionally, Confluent's integration of Stream Lineage with Flink provides visibility into and traceability of streaming data processing, allowing users to visually track how data is transformed and processed by Flink. This feature captures and displays the lineage of messages as they pass through Flink queries, enabling users to trace the origin, transformation, and destination of each data flow.
Security
Open source Apache Flink lacks a built-in security model, so organizations that deploy it must address security on their own, which can be quite risky. In contrast, Confluent Cloud offers a comprehensive suite of battle-tested security features to control, manage, and govern access to all of your critical data streams. Flink in Confluent Cloud inherits the same Identity and Access Management providers, including role-based access control (RBAC), allowing for secure, scalable management of permissions. Additionally, Flink uses managed service accounts for executing continuous statements, enhancing manageability.
Multi-cloud
Our fully managed Flink service is available across all three major cloud service providers, providing customers with the flexibility to seamlessly deploy stream processing workloads everywhere their data and applications reside. Please refer to our docs for the newest supported regions.
One of the popular use cases supported by Flink is streaming data pipelines for Gen AI and Large Language Models (LLMs). As building Gen AI apps becomes a top priority for many companies, vector databases are often paired with LLMs to provide additional context to improve their accuracy and relevance to specific use cases for a business.
For vector databases to be most useful and make the best recommendations, they must be updated in real time to accurately reflect the current state of the business, such as real-time inventory levels, supply chain status and logistics, and financial market rates. If the data in a vector database becomes stale or low fidelity because of pipelines that rely on periodic batch processing or public datasets, the reliability and relevance of the results from generative AI diminish.
By leveraging Kafka and Flink as a unified platform with Confluent, teams can connect to data sources across any environment, clean and enrich data streams on the fly, and deliver them as instantly usable inputs in real time to vector databases. Our fully managed Flink service enables users to process data in flight and create high-quality data streams to power the most impactful AI applications. With integrations into leading vector database vendors, such as Elastic, MongoDB, Pinecone, Rockset, SingleStore, Weaviate, and Zilliz, Confluent’s data streaming architecture simplifies and accelerates the development of real-time GenAI apps. This ensures your GenAI apps have the most up-to-date view of your business.
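The effect of keeping a vector database current can be sketched with a toy in-memory store in Python. Everything here is an assumption made for illustration: the class, the two-dimensional hand-written vectors (standing in for real embeddings), and the SKU names. A real pipeline would use an embedding model and one of the vector databases listed above.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def upsert(self, key: str, vector: Sequence[float]) -> None:
        """Insert a new vector or overwrite the existing one for this key."""
        self._vectors[key] = list(vector)

    def nearest(self, query: Sequence[float]) -> str:
        """Return the key whose vector is most similar to the query."""
        return max(self._vectors, key=lambda k: cosine(self._vectors[k], query))


store = InMemoryVectorStore()

# Toy vectors with made-up dimensions: [in_stock, on_sale].
store.upsert("sku-1", [1.0, 0.0])
store.upsert("sku-2", [0.0, 1.0])

query = [1.0, 0.2]  # "something in stock, ideally on sale"
before = store.nearest(query)

# Stream updates arrive: sku-1 sells out, sku-2 is restocked while still on sale.
store.upsert("sku-1", [0.0, 0.0])
store.upsert("sku-2", [1.0, 1.0])
after = store.nearest(query)
```

The same query returns a different answer once the updates land, which is the point: a store that is refreshed only by a nightly batch would keep recommending the sold-out item until the next run.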
The journey to the general availability of Confluent Cloud for Apache Flink has been one of continuous improvement and innovation. We've architected our serverless Flink offering to meet the demanding requirements of mission-critical applications, ensuring it's ready to handle the significant scale and complexity often required by streaming use cases.
We've reimagined Apache Flink as a truly cloud-native service, allowing users to focus on their business logic while we handle the operations. From streamlining deployment and management to providing seamless integration with Kafka, our serverless, fully managed Flink service offers best-in-class stream processing capabilities. We've also made significant strides in reliability and performance, delivering stream processing with a 99.99% uptime SLA, backed by the industry’s top Kafka and Flink experts.
With the general availability of Confluent Cloud for Apache Flink across all major clouds, we're excited for our customers to experience the power of Kafka and Flink as a unified, enterprise-grade data streaming platform. We invite you to explore the possibilities with our Flink service and stay tuned for future product announcements, including additional programmatic APIs, as we continue to innovate and deliver cutting-edge solutions for real-time stream processing. To learn more, be sure to register for the upcoming Flink webinar to get hands-on with a technical demo that showcases the full capabilities of Kafka and Flink unified with Confluent.