The true power of data emerges when streaming, analytics, and artificial intelligence (AI) connect—transforming real-time streaming data into actionable intelligence. Yet bridging that gap has long been one of the most complex challenges in modern data architecture. Confluent makes it effortless to capture and process continuous streams of data, while Databricks empowers teams to analyze, govern, and apply AI through Unity Catalog. Bringing these worlds together has traditionally required complex, fragile pipelines—until now.
Confluent Tableflow eliminates that complexity by seamlessly transforming streaming data from Apache Kafka® into open, governed, and AI-ready Delta Lake tables managed through Databricks Unity Catalog. This makes real-time data instantly available for analytics and AI, without the need for custom ETL or batch jobs.
Tableflow support for Delta Lake and Unity Catalog is now generally available. It enables organizations to seamlessly connect real-time streaming data to analytics and AI in Databricks, unlocking a continuous flow of governed intelligence from event streams to insights.
Confluent’s data streaming platform enables organizations to connect and transform all their data in motion, making it real-time, contextual, trustworthy, and reusable across every system and team. It brings together the power of Kafka, governed data streaming, and Apache Flink® stream processing to create universal data products that can be easily shared and reused across the enterprise. With Confluent, data flows continuously and securely, fueling modern applications, analytics, and AI with fresh, governed intelligence.
Once streaming data is flowing continuously through Confluent, Databricks ensures that data is governed, discoverable, and ready for analytics and AI.
The Databricks Data Intelligence Platform connects data, analytics, and AI so organizations can understand and act on all their data in real time. Built on a lakehouse architecture, Databricks delivers reliable data management, fast SQL analytics, and scalable AI—all under a single governance model.
Together, Databricks and Confluent enable a continuous flow of intelligence, from operational events to analytical and AI outcomes.
Key benefits include:
Real-time analytics – Query and visualize streaming data instantly as it lands in Unity Catalog (see the query sketch after this list).
Unified data and AI platform – Run SQL, business intelligence (BI), machine learning (ML), and AI workloads in just one environment.
Open interoperability – Leverage Delta and Apache Iceberg™ for flexibility across tools and engines.
End-to-end governance – Maintain consistent policies and lineage through Unity Catalog.
AI-ready foundation – Feed trusted data directly into Mosaic AI models and agents.
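As a small, hedged illustration of the real-time analytics benefit above: once Tableflow has landed a topic as a Delta table in Unity Catalog, querying it from a Databricks notebook is ordinary SQL. The three-level name `main.tableflow.orders` and its columns are placeholder assumptions, not names from this announcement.

```python
# Querying a hypothetical Tableflow-created Delta table from a Databricks
# notebook, where the `spark` session is predefined. Catalog, schema, table,
# and column names are placeholders.
recent_orders = spark.sql("""
    SELECT order_id, amount, event_time
    FROM main.tableflow.orders
    WHERE event_time > current_timestamp() - INTERVAL 1 HOUR
""")
recent_orders.show()
```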
The heart of the Databricks Data Intelligence Platform is Unity Catalog—the industry’s first unified governance solution for all data and AI assets.
Unity Catalog delivers unified governance across all assets—Delta and Iceberg tables, unstructured data, and AI models—and unified capabilities that go beyond traditional access control and auditing to include discovery, search, lineage, quality monitoring, and business semantics.
Turning Kafka topics into analytical tables has traditionally been complex, costly, and error-prone. Conventional data pipelines typically read data from Kafka and dump raw records into object storage, such as Amazon S3, using sink connectors. From there, teams must build a chain of ETL jobs to parse events, convert them into Parquet and Delta Lake or Iceberg formats, manage schema evolution, compact small files, materialize change data capture (CDC) streams, and publish tables to a catalog such as Unity Catalog.
Each step, from configuring sink connectors to orchestrating ETL jobs, adds operational overhead. Connectors must be tuned for throughput, retries, and recovery; ETL pipelines must safely manage offsets, handle late or out-of-order events, and ensure atomic commits. Even then, ongoing maintenance tasks, such as compaction, snapshot expiration, schema management and evolution, and publishing to catalogs, demand constant engineering effort. The result is a complex, fragile, and expensive data preparation process that’s difficult to scale.
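To make that overhead concrete, here is a rough sketch of just one slice of such a hand-built pipeline, under the assumption that a sink connector has already landed Avro container files in a bucket. The bucket, prefix, and library choices (`s3fs`, `fastavro`, `pyarrow`, `deltalake`) are illustrative, not part of any Confluent or Databricks product.

```python
# One slice of a hand-built pipeline: decode Avro files a sink connector wrote
# to object storage, convert them to a columnar table, and append to Delta.
# Bucket names and prefixes are hypothetical.
import fastavro
import pyarrow as pa
import s3fs
from deltalake import write_deltalake

fs = s3fs.S3FileSystem()
records = []
for path in fs.ls("my-landing-bucket/raw/orders/"):
    with fs.open(path, "rb") as f:
        records.extend(fastavro.reader(f))        # decode Avro container files

table = pa.Table.from_pylist(records)             # infer a columnar schema
write_deltalake("s3://my-lake-bucket/delta/orders", table, mode="append")

# Even this still leaves schema evolution, compaction, CDC materialization,
# offset tracking, and catalog publishing as separate jobs to build and run.
```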
Tableflow fundamentally rethinks this process. Instead of managing connectors, ETL jobs, and data preparation pipelines, Tableflow streams operational data from Confluent directly into governed Delta Lake or Iceberg tables in object storage. It then seamlessly publishes those tables to catalogs such as Databricks Unity.
Tableflow automates type conversions, schematization, schema evolution, table maintenance, and catalog syncing, thereby eliminating tedious data preparation work. With Tableflow, every Kafka topic can become an AI-ready Delta table—instantly available for query in Databricks SQL, governed by Unity Catalog, and optimized for both streaming freshness and analytical performance.
When Confluent Tableflow registers Delta tables in Unity Catalog, those tables are immediately governed and discoverable. Customers can apply consistent governance and access controls, and downstream lineage and auditing are tracked automatically.
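For example (a sketch with placeholder names, run from a Databricks notebook where `spark` is predefined), granting access to and inspecting a Tableflow-registered table uses the same Unity Catalog SQL as any other table:

```python
# Standard Unity Catalog governance applied to a Tableflow-registered table.
# The table name and the `analysts` group are placeholders.
spark.sql("GRANT SELECT ON TABLE main.tableflow.orders TO `analysts`")
spark.sql("DESCRIBE EXTENDED main.tableflow.orders").show(truncate=False)
```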
Confluent Tableflow’s Delta Lake support with Databricks Unity Catalog integration is now generally available, bringing a simpler, governed path from Kafka topics to AI and analytics-ready Delta tables.
With this launch, we’re adding the following key features:
Schema evolution for Delta Lake tables – Support for adding optional fields with default values and for type widening (see the schema sketch after this list).
Upserts for Delta tables – Insert, update, and delete individual rows in Tableflow Delta Lake tables.
Multi-format control – Run Delta and Iceberg side by side with per-format enable/disable.
Multiple catalog integrations per cluster – Attach distinct catalogs (e.g., Unity Catalog, AWS Glue Data Catalog) to the same Confluent Cloud cluster.
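As a sketch of the first item in this list: schema evolution is driven from Confluent Schema Registry. The example below registers a new Avro schema version that adds an optional field with a default, using the `confluent-kafka` Python client; the subject name, fields, endpoint, and credentials are hypothetical.

```python
# Register an evolved Avro schema (new optional field with a default) so that
# the downstream Delta table can pick up the added column. Subject name,
# fields, and the Schema Registry endpoint/credentials are placeholders.
from confluent_kafka.schema_registry import Schema, SchemaRegistryClient

client = SchemaRegistryClient({
    "url": "https://psrc-xxxxx.us-east-1.aws.confluent.cloud",
    "basic.auth.user.info": "<sr-api-key>:<sr-api-secret>",
})

evolved_schema = """
{
  "type": "record", "name": "Order",
  "fields": [
    {"name": "order_id", "type": "string"},
    {"name": "amount",   "type": "double"},
    {"name": "channel",  "type": ["null", "string"], "default": null}
  ]
}
"""
client.register_schema("orders-value", Schema(evolved_schema, schema_type="AVRO"))
```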
Let’s take a closer look at how Tableflow’s support for Delta Lake tables is implemented under the hood.
Tableflow runs on Kora, Confluent Cloud’s cloud-native streaming engine for elastic scale, high reliability, and cost efficiency. Kora stores recent data on fast local disks (hot tier) and asynchronously offloads log segments to durable cloud object storage (cold tier). Tableflow leverages these tiers to turn streams into analytics-ready tables. The diagram below outlines the core components.
When a user enables Tableflow on a Kafka topic, the system automatically begins materializing that topic into an open table format (Iceberg or Delta).
Step 1: Metadata fetching and schema discovery – Tableflow fetches the schema associated with the topic from Confluent Schema Registry and uses it to generate the target table schema. This ensures that Avro, Protobuf, or JSON messages are correctly mapped into columnar tables. Tableflow queries Kafka’s metadata to determine log segments and offsets for the topic partitions. This metadata guides Tableflow’s materialization job for ingestion, defining which segment files need to be processed and where parallel reads can begin.
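The mapping in Step 1 can be pictured with a small sketch (conceptual only, not Tableflow’s actual internals): fetch the topic’s latest Avro schema from Schema Registry and derive a columnar schema from it. The subject name and endpoint are placeholders, and the type table covers only a few Avro primitives.

```python
# Conceptual sketch of schema discovery: read the latest Avro schema for a
# topic's value subject and derive an Arrow (columnar) schema from it.
import json
import pyarrow as pa
from confluent_kafka.schema_registry import SchemaRegistryClient

AVRO_TO_ARROW = {
    "string": pa.string(), "long": pa.int64(), "int": pa.int32(),
    "double": pa.float64(), "boolean": pa.bool_(),
}

client = SchemaRegistryClient({"url": "https://psrc-xxxxx.us-east-1.aws.confluent.cloud"})
avro_schema = json.loads(client.get_latest_version("orders-value").schema.schema_str)

arrow_schema = pa.schema(
    [pa.field(f["name"], AVRO_TO_ARROW[f["type"]]) for f in avro_schema["fields"]]
)
print(arrow_schema)  # the column layout the target table will use
```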
Step 2: Reading from tiered storage – Instead of reading events through Kafka consumer APIs, Tableflow pulls segment files, which are identified using metadata, directly from Kafka’s tiered cloud object storage. By bypassing brokers, this approach reduces overhead on the cluster and enables parallel segment processing, making table materialization far more efficient.
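The segment layout and format are internal to Kora, so the following is only a conceptual sketch of the idea behind Step 2: parallel reads of tiered log segments from object storage, with no broker in the path. The bucket and keys are made up; the calls are standard `boto3`.

```python
# Illustration only: fetch tiered log segments directly from object storage in
# parallel instead of consuming through brokers. Bucket and keys are invented.
from concurrent.futures import ThreadPoolExecutor
import boto3

s3 = boto3.client("s3")
segment_keys = [                     # in practice, derived from Kafka metadata
    "tier/orders-0/00000000000000000000.seg",
    "tier/orders-0/00000000000000125000.seg",
]

def fetch(key: str) -> bytes:
    return s3.get_object(Bucket="kora-tier-bucket", Key=key)["Body"].read()

with ThreadPoolExecutor(max_workers=8) as pool:
    raw_segments = list(pool.map(fetch, segment_keys))   # no broker involvement
```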
Step 3: Conversion and storage – Segment files are decoded and converted into Apache Parquet™️ files and are then written into the user’s configured object storage (e.g., Amazon S3, Google Cloud Storage, Microsoft Azure Data Lake Storage) as Iceberg or Delta table data files.
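The conversion itself amounts to columnar encoding; a minimal sketch follows, with placeholder records and a local output path (Tableflow does this internally and writes to the user’s bucket).

```python
# Minimal sketch of the Parquet conversion in Step 3. Records and the output
# file name are placeholders; real output lands in the user's object storage.
import pyarrow as pa
import pyarrow.parquet as pq

decoded_events = [
    {"order_id": "o-1", "amount": 19.99},
    {"order_id": "o-2", "amount": 5.00},
]
pq.write_table(pa.Table.from_pylist(decoded_events),
               "orders-part-00000.parquet", compression="zstd")
```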
Step 4: Metadata and catalog commits – Along with writing data, Tableflow generates Iceberg or Delta metadata (manifests, snapshots, commit logs) and commits Iceberg tables to its built-in Iceberg REST Catalog. For Delta tables, Tableflow leverages the Delta Kernel to create and manage Delta-compliant metadata and commit logs. Tableflow also automatically publishes table metadata pointers to external catalog services such as Databricks Unity Catalog, making the tables discoverable and queryable across multiple analytics and compute engines.
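Tableflow performs this publishing step automatically; for intuition, the manual equivalent in a Databricks notebook (assuming an external location is already configured in Unity Catalog, and using placeholder names and storage path) would look roughly like this:

```python
# Rough manual equivalent of catalog publishing: register an existing Delta
# table location as an external table in Unity Catalog. Names and the storage
# path are placeholders; Tableflow handles this for you.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.tableflow.orders
    USING DELTA
    LOCATION 's3://customer-bucket/tableflow/orders'
""")
```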
Step 5: Table maintenance and optimizations – Once tables are established, Tableflow actively optimizes them by compacting small files, expiring snapshots, rewriting manifests, and evolving partitions as query patterns change. If advanced features like upserts are enabled, additional logic such as deduplication and equality deletes is applied.
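For comparison, the same kind of maintenance done by hand with the open source `deltalake` (delta-rs) package looks like this (placeholder table path):

```python
# The maintenance Tableflow automates, sketched manually with delta-rs.
from deltalake import DeltaTable

dt = DeltaTable("s3://customer-bucket/tableflow/orders")
dt.optimize.compact()                            # merge small files into larger ones
dt.vacuum(retention_hours=168, dry_run=False)    # drop unreferenced files >7 days old
```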
Tableflow’s Delta Lake support is built on Delta Kernel, the open source library developed by Databricks to make Delta Lake universally accessible across any processing engine. Delta Kernel provides Java and Rust APIs that let developers read, write, and commit to Delta tables directly—without handling low-level Delta protocol details. This simplifies connector development and makes it easy for connectors to adopt the latest Delta innovations, such as catalog-managed commits, VARIANT data type, and type widening.
By leveraging Delta Kernel, Tableflow achieves high performance, strong compatibility, and open interoperability while simplifying the path from real-time streaming data to fully governed Delta tables in Databricks Unity Catalog.
Specifically, Delta Kernel enables Tableflow to do the following:
Maintain full atomicity, consistency, isolation, and durability (ACID) guarantees and Delta-compliant transaction logs for reliable table updates
Write at scale, efficiently materializing Kafka topics into optimized Parquet data and metadata commits
Stay future-proof with transparent adoption of new Delta Lake features and protocol versions
Tableflow on Confluent Cloud now fully empowers organizations to bridge the gap between fast-moving operational data and trusted, governed tables for analytics without compromising security, scale, or flexibility. With easy, automatic integration to Delta Lake and Databricks Unity Catalog, true CDC/upsert table materialization, enterprise security via Bring Your Own Key (BYOK), and resilient error handling, it's never been easier to connect real-time data to business insights.
Ready to unlock the full, transformative potential of your streaming data for cutting-edge AI and advanced analytics? Explore Tableflow today.
Learn more: Dive into the Tableflow product documentation.
See it in action: Watch our short introduction video or Tim Berglund's lightboard explanation.
Get started: If you're already using Confluent Cloud, navigate to the Tableflow section for your cluster. New users can get started with Confluent Cloud for free and explore Tableflow's capabilities. Try out the Delta Lake and Unity Catalog quick start guide.
Contact us today for a personalized demo on Tableflow and start unlocking the full potential of your streaming data on Confluent Cloud. We’re incredibly excited to see how you leverage Tableflow to turn your real-time data streams into tangible business value.
Apache®, Apache Kafka®, Kafka®, Apache Flink®, Flink®, Apache Iceberg™️, Iceberg™️, Apache Parquet™️, and Parquet™️ are either registered trademarks or trademarks of the Apache Software Foundation. No endorsement by the Apache Software Foundation is implied by the use of these marks.