The Confluent Q3 ‘21 release is here and packed full of new features that enable the world’s most innovative businesses to continue building what keeps them on top: real-time, mission-critical services fueled by data in motion.
This is Confluent’s first quarterly release, a new cadence we’re introducing to give our customers a single resource for learning about the accelerating number of new features we’re launching. We’ll also point you toward the resources you need to start using the features right away.
In this release, you’ll also find a description of what Confluent sees as the key pillars of a best-in-class service for data in motion, and how we’re delivering on each of them to support our customers.
Here’s an overview of what you’ll find in this blog post:
Depended upon by over 70% of the Fortune 500 today, Apache Kafka® has become the industry standard for real-time event streaming.
However, to move beyond event streaming and capture the full value of data in motion, you need a solution that goes beyond open source tools to help you focus on what matters most: building, launching, and running the applications that differentiate your business.
That solution is what we’ve built here at Confluent—we’re delivering a fully managed solution for Apache Kafka that is cloud-native, complete, and available everywhere. These three pillars are paramount to a data-in-motion platform, representing the capabilities that you need to drive key business outcomes and achieve aggressive IT initiatives.
Cloud-native: We’ve re-engineered Kafka to provide a best-in-class cloud experience, for any scale, without the operational overhead of infrastructure management. Confluent offers the only truly cloud-native experience for Kafka—delivering the serverless, elastic, cost-effective, highly available, and self-serve experience that developers expect.
Complete: Creating and maintaining real-time applications requires more than just open source software and access to scalable cloud infrastructure. Confluent makes Kafka enterprise ready and provides customers with the complete set of tools they need to build apps quickly, reliably, and securely. Our fully managed features come ready out of the box, for every use case from POC to production.
Everywhere: Most organizations today run in multiple regions and environments. Confluent meets our customers everywhere they need to be—powering and uniting real-time events across regions, across clouds, and across on-premises environments.
Within each of our quarterly releases, you’ll learn about our development in these three pillars of Confluent. Let’s take a look at what’s in the Q3 ‘21 release starting with a major addition to our everywhere capabilities: the general availability of Cloud Cluster Linking.
Building modern data systems that span environments isn’t easy. Confluent is changing that with the general availability (GA) of fully managed Cluster Linking. We’ve simplified geo-replication and multi-cloud data movement, enabling you to increase the reliability of your global Kafka deployment and to unify your cloud environments. Cluster Linking equips you with an easy-to-use solution for global data replication, disaster recovery readiness, and simple workload migrations. Data in motion can power every cloud application—across public and private clouds—with linked Kafka clusters that sync in real time.
To deliver the real-time applications that today’s businesses need, you need access to a suite of tools that take you beyond just Kafka. That’s why Confluent has developed tools like ksqlDB, the database purpose-built for stream processing applications.
With ksqlDB, you can process and enrich streams of data flowing through Apache Kafka and serve continuous streaming queries against its derived tables and streams. These queries are known as push queries, which push out ongoing, incremental query results to clients in real time.
However, many applications cannot rely on push queries alone; they also require traditional point-in-time lookups of static information. Now generally available, ksqlDB’s pull queries enable point-in-time lookups on real-time materialized views. This allows customers to simplify their stream processing architecture and broaden the kinds of stream processing workloads they can support, ultimately expanding the possible stream processing applications that they can build.
Data is distributed across systems, and that can make it difficult to build applications. Confluent makes it easy to access data across your entire business through a library of fully managed source and sink connectors. As part of the Q3 ‘21 release, we’re excited to announce two new connectors to help businesses set more of their data in motion: Azure Cosmos DB Sink and Salesforce Platform Events Source connectors.
Provisioning Kafka clusters and retaining the right amount of data is operationally complex and expensive. With Infinite Storage, now generally available for Google Cloud (alongside AWS) for Standard and Dedicated clusters, Confluent solves this problem by enabling customers to elastically retain infinite volumes of data while only paying for what they use.
Having both real-time and historic data in Kafka allows for more advanced use cases based upon historical context, as well as compliance with regulatory requirements for data retention. It also helps Kafka serve as a central system of record across the entire business. With current and historical information in the same place, businesses are able to take faster, better informed action.
Now let’s take a deeper dive into the individual product launches within the release. Already seen enough and ready to learn how to put these new tools to use? Register for the Confluent Q3 ‘21 release instructional demo series.
To fuel their global architectures with real-time data, businesses need a fully managed, easy-to-use, and globally consistent solution for connecting independent clusters across regions and clouds.
With fully managed Cluster Linking, Confluent simplifies geo-replication and multi-cloud data movement, enabling customers to increase the reliability of their global Kafka deployments and unify cloud environments. Teams across the business are equipped with perfectly mirrored and globally consistent topic replication with no additional infrastructure. Data in motion can power every cloud application—across public and private clouds—with linked Kafka clusters that sync in real time.
“In order to meet new architectural requirements and reduce costs, we needed a solution for migrating data and existing workloads to a new Kafka cluster,” said Zen Yui, data engineering manager, Namely. “We completed this migration quickly and easily using Confluent’s Cluster Linking. With perfectly mirrored topic data/metadata replication, offset preservation, and support for non-Java consumers, the migration was even more simple than we expected.”
Historically, moving data between Kafka clusters required additional replication tools like MirrorMaker 2 and Replicator. These solutions can be costly to manage, make it hard to hit Recovery Time Objectives (RTOs), and most notably, hinder agile development and delay application deliveries. Based on Kafka Connect, these replication tools introduce a number of challenges to multi-region and multi-cloud architectures and cloud migrations, including:
Cluster Linking is the next generation of geo-replication technology, eliminating legacy middle-system approaches that require additional infrastructure. Two clusters can replicate data directly, bidirectionally, and consistently. With just a few simple commands, event data is readily available throughout an entire business. Consumer application offsets are mirrored with no extra plugins or tooling for Kafka clients. System architects can design global environments with improved data consistency guarantees while system admins can operate and monitor these distributed systems with ease, regardless of cloud provider. Teams are able to both improve and simplify critical operations for the business:
We’ve built tools specifically designed to assist system operators with preparation for and management of topic failovers in the event of a disaster recovery procedure:
When promoting clusters during migrations, lag monitoring through the Confluent Cloud Metrics API allows you to determine exactly how much longer a destination topic or cluster link has to go until it is fully synced with historical data from the source. With clear knowledge of when syncing is complete, system operators are able to facilitate smooth migrations with no data loss and minimal producer downtime. Additionally, Cluster Linking puts them in control of when each application is moved over, as promotions can be run consumer by consumer and topic by topic, or in batch. This means you can safely promote topics and applications to the new cluster over time, only when you’re ready, and avoid the hardships of attempting an all-at-once full cluster migration.
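The promotion decision described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Confluent Cloud Metrics API: the `PartitionOffsets` shape, the `mirror_lag` and `ready_to_promote` helpers, and the sample offsets are all hypothetical. In practice the lag values would come from whatever metrics the platform reports for the cluster link.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PartitionOffsets:
    """End offsets for one partition on the source and on its mirror (hypothetical shape)."""
    partition: int
    source_end_offset: int
    mirror_end_offset: int


def mirror_lag(partitions: list[PartitionOffsets]) -> dict[int, int]:
    """Return per-partition lag: messages on the source not yet present on the mirror."""
    return {
        p.partition: max(p.source_end_offset - p.mirror_end_offset, 0)
        for p in partitions
    }


def ready_to_promote(partitions: list[PartitionOffsets], max_lag: int = 0) -> bool:
    """Treat a mirror topic as ready when every partition's lag is within max_lag."""
    return all(lag <= max_lag for lag in mirror_lag(partitions).values())


# Hypothetical readings for two mirror topics.
orders = [PartitionOffsets(0, 1200, 1200), PartitionOffsets(1, 980, 980)]
payments = [PartitionOffsets(0, 500, 500), PartitionOffsets(1, 750, 710)]

for name, parts in {"orders": orders, "payments": payments}.items():
    total = sum(mirror_lag(parts).values())
    status = "ready to promote" if ready_to_promote(parts) else "still syncing"
    print(f"{name}: total lag {total} -> {status}")
```

Checking readiness per topic, rather than per cluster, mirrors the topic-by-topic promotion flow: here `orders` could be promoted immediately while `payments` waits for its remaining 40 messages.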
Ready to learn how to link clusters directly from Confluent expert Luke Knepper? Register for the Confluent Q3 ‘21 release demos. Once you’ve registered, be sure to check out the Cluster Linking quick start guide.
Here at Confluent, we believe building applications on top of data in motion should be as easy as building CRUD applications on top of a regular database. That’s why we built ksqlDB, the database purpose-built for stream processing applications.
We’ve previously only supported push queries on ksqlDB. However, many stream processing applications require traditional point-in-time lookups of static information. Consider a ride sharing app—the driver’s position and ETA need to be continuously updated in real time, but the driver’s name and the price of the ride only need to be determined once. Up until now, supporting this second query type required users to send derived tables and streams to a separate data store that could serve these point-in-time lookups, resulting in increased architectural complexity and operational burden and forcing teams to work across multiple systems to build a single app.
To solve this challenge, we’re excited to announce the general availability of ksqlDB pull queries on Confluent Cloud. Pull queries support point-in-time lookups directly on derived tables and streams, complementing the existing functionality of push queries. This lets you eliminate that second datastore, simplifying your architecture so that you can build a complete stream processing app with a single solution, all with simple and widely familiar SQL syntax.
Together, these complementary push and pull patterns provide flexibility and enable a broad class of end-to-end stream processing workloads and applications.
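To make the difference between the two patterns concrete, here is a toy Python model built around the ride sharing example. It is only a sketch of the semantics and does not use ksqlDB or its API. The `MaterializedView` class and its `apply`, `pull`, and `push` methods are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional


class MaterializedView:
    """Toy stand-in for a table derived from a stream: latest ETA per ride."""

    def __init__(self) -> None:
        self._state: Dict[str, int] = {}
        self._subscribers: Dict[str, List[Callable[[str, int], None]]] = defaultdict(list)

    def apply(self, ride_id: str, eta_minutes: int) -> None:
        """Fold a new event into the view, then notify push subscribers."""
        self._state[ride_id] = eta_minutes
        for callback in self._subscribers[ride_id]:
            callback(ride_id, eta_minutes)

    def pull(self, ride_id: str) -> Optional[int]:
        """Pull pattern: return the current value once, then finish."""
        return self._state.get(ride_id)

    def push(self, ride_id: str, callback: Callable[[str, int], None]) -> None:
        """Push pattern: register to receive every subsequent change."""
        self._subscribers[ride_id].append(callback)


view = MaterializedView()
updates: List[int] = []

view.apply("ride-1", 12)                                    # event before anyone subscribes
view.push("ride-1", lambda _id, eta: updates.append(eta))   # client subscribes
view.apply("ride-1", 9)
view.apply("ride-1", 5)

print(view.pull("ride-1"))  # 5
print(updates)              # [9, 5]
```

The pull call answers "what is the ETA right now?" with a single value, while the push subscription keeps receiving each change as it happens. Both read from the same derived state, which is why no second datastore is needed.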
“Our customers expect instant updates on their order status and what’s in stock, which makes processing inventory data in real time a must-have for our business,” said Chirag Dadia, director of engineering, Nuuly. “ksqlDB pull queries enable us to do point-in-time lookups to harness data that is critical for real-time analytics across our inventory management system. Now, we can pinpoint exactly where each article of clothing is in the customer experience.”
Our connector portfolio helps you modernize and future-proof your data infrastructure, giving you the flexibility to share data broadly and build across any environment. Ready to use straight out of the box, these fully managed connectors save you 3–6 months of engineering development, as well as ongoing operational and maintenance efforts, freeing you instead to focus on building value-add products and apps that drive the business forward.
These newly released connectors strengthen two primary use cases: integrating modern SaaS-based applications with your downstream cloud data warehouse and using Confluent as a data pipeline to modernize your databases.
While real-time data is powering a new frontier of competitive advantages, many business decisions and customer applications need historical context in order to provide more accurate insights and better experiences. Traditionally, businesses have needed to over-provision Kafka clusters to meet their needs for data storage and have had to overpay for more infrastructure and compute than necessary. Or conversely, they skimp on storage in order to cut costs and run the risk of cluster downtime, data loss, and a possible breach in data retention compliance.
With Infinite Storage, now generally available for Google Cloud (alongside AWS) for both Standard and Dedicated clusters, you never have to worry about data storage limitations again. You can offer business-wide access to all of your data while creating an immutable system of record that ensures events are stored as long as needed. Companies building event-driven applications with Confluent can easily scale their apps and use cases without having to worry about horizontal cluster scaling, disk capacity, or the high storage costs that traditionally come with retaining massive amounts of data. You only pay for storage used rather than storage provisioned.
Broadening your Kafka data set with Infinite Storage allows for the development of advanced use cases requiring deep historical context, for example, a lifetime customer transaction history log for financial institutions or clickstream records filtered by product for a retailer. Additionally, you can meet regulatory requirements for data retention while ultimately establishing Kafka as the central system of record tracking events across your entire business.
Ready to get started? Register for the Confluent Q3 ‘21 release demos for quick guidance on how to get up and running with these features. This four-part webinar series will provide you with once-a-day, bite-sized tutorials on how to get started with all the latest capabilities available on the platform.
If you’ve not done so already, make sure to sign up for a free trial of Confluent Cloud and pick up a little something extra with promo code Q3CLOUD200.
Learn more about Confluent Cloud updates within the release notes and keep your eyes open for the upcoming Q4 release!