The true power of data emerges when streaming, analytics, and artificial intelligence (AI) connect—transforming real-time streaming data into actionable intelligence. Yet bridging that gap has long been one of the most complex challenges in modern data architecture. Confluent makes it effortless to capture and process continuous streams of data, while Databricks empowers teams to analyze, govern, and apply AI through Unity Catalog. Bringing these worlds together has traditionally required complex, fragile pipelines—until now.
Confluent Tableflow eliminates that complexity by seamlessly transforming streaming data from Apache Kafka® into open, governed, and AI-ready Delta Lake tables managed through Databricks Unity Catalog. This makes real-time data instantly available for analytics and AI, without the need for custom ETL or batch jobs.
Tableflow support for Delta Lake and Unity Catalog is now generally available. It enables organizations to seamlessly connect real-time streaming data to analytics and AI in Databricks, unlocking a continuous flow of governed intelligence from event streams to insights.
Confluent’s data streaming platform enables organizations to connect and transform all their data in motion, making it real-time, contextual, trustworthy, and reusable across every system and team. It brings together the power of Kafka, governed data streaming, and Apache Flink® stream processing to create universal data products that can be easily shared and reused across the enterprise. With Confluent, data flows continuously and securely, fueling modern applications, analytics, and AI with fresh, governed intelligence.
Once streaming data is flowing continuously through Confluent, Databricks ensures that data is governed, discoverable, and ready for analytics and AI.
The Databricks Data Intelligence Platform connects data, analytics, and AI so organizations can understand and act on all their data in real time. Built on a lakehouse architecture, Databricks delivers reliable data management, fast SQL analytics, and scalable AI—all under a single governance model.
Together, Databricks and Confluent enable a continuous flow of intelligence, from operational events to analytical and AI outcomes.
Key benefits include:
Real-time analytics – Query and visualize streaming data instantly as it lands in Unity Catalog (see the query sketch after this list).
Unified data and AI platform – Run SQL, business intelligence (BI), machine learning (ML), and AI workloads in just one environment.
Open interoperability – Leverage Delta and Apache Iceberg™ for flexibility across tools and engines.
End-to-end governance – Maintain consistent policies and lineage through Unity Catalog.
AI-ready foundation – Feed trusted data directly into Mosaic AI models and agents.
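As a small, hedged illustration of the real-time analytics benefit above: once Tableflow has landed a topic as a Delta table in Unity Catalog, querying it from a Databricks notebook is ordinary SQL. The three-level name `main.tableflow.orders` and its columns are placeholder assumptions, not names from this announcement.

```python
# Querying a hypothetical Tableflow-created Delta table from a Databricks
# notebook, where the `spark` session is predefined. Catalog, schema, table,
# and column names are placeholders.
recent_orders = spark.sql("""
    SELECT order_id, amount, event_time
    FROM main.tableflow.orders
    WHERE event_time > current_timestamp() - INTERVAL 1 HOUR
""")
recent_orders.show()
```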
The heart of the Databricks Data Intelligence Platform is Unity Catalog—the industry’s first unified governance solution for all data and AI assets.
Unity Catalog delivers unified governance across all assets—Delta and Iceberg tables, unstructured data, and AI models—and unified capabilities that go beyond traditional access control and auditing to include discovery, search, lineage, quality monitoring, and business semantics.
Turning Kafka topics into analytical tables has traditionally been complex, costly, and error-prone. Conventional data pipelines typically read data from Kafka and dump raw records into object storage, such as Amazon S3, using sink connectors. From there, teams must build a chain of ETL jobs to parse events, convert them into Parquet and Delta Lake or Iceberg formats, manage schema evolution, compact small files, materialize change data capture (CDC) streams, and publish tables to a catalog such as Unity Catalog.
Each step, from configuring sink connectors to orchestrating ETL jobs, adds operational overhead. Connectors must be tuned for throughput, retries, and recovery; ETL pipelines must safely manage offsets, handle late or out-of-order events, and ensure atomic commits. Even then, ongoing maintenance tasks, such as compaction, snapshot expiration, schema management and evolution, and publishing to catalogs, demand constant engineering effort. The result is a complex, fragile, and expensive data preparation process that’s difficult to scale.
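To make that overhead concrete, here is a rough sketch of just one slice of such a hand-built pipeline, under the assumption that a sink connector has already landed Avro container files in a bucket. The bucket, prefix, and library choices (`s3fs`, `fastavro`, `pyarrow`, `deltalake`) are illustrative, not part of any Confluent or Databricks product.

```python
# One slice of a hand-built pipeline: decode Avro files a sink connector wrote
# to object storage, convert them to a columnar table, and append to Delta.
# Bucket names and prefixes are hypothetical.
import fastavro
import pyarrow as pa
import s3fs
from deltalake import write_deltalake

fs = s3fs.S3FileSystem()
records = []
for path in fs.ls("my-landing-bucket/raw/orders/"):
    with fs.open(path, "rb") as f:
        records.extend(fastavro.reader(f))        # decode Avro container files

table = pa.Table.from_pylist(records)             # infer a columnar schema
write_deltalake("s3://my-lake-bucket/delta/orders", table, mode="append")

# Even this still leaves schema evolution, compaction, CDC materialization,
# offset tracking, and catalog publishing as separate jobs to build and run.
```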
Tableflow fundamentally rethinks this process. Instead of managing connectors, ETL jobs, and data preparation pipelines, Tableflow streams operational data from Confluent directly into governed Delta Lake or Iceberg tables in object storage. It then seamlessly publishes those tables to catalogs such as Databricks Unity.
Tableflow automates type conversions, schematization, schema evolution, table maintenance, and catalog syncing, thereby eliminating tedious data preparation work. With Tableflow, every Kafka topic can become an AI-ready Delta table—instantly available for query in Databricks SQL, governed by Unity Catalog, and optimized for both streaming freshness and analytical performance.
When Confluent Tableflow registers Delta tables in Unity Catalog, those tables are immediately governed and discoverable. Customers can apply consistent governance and access controls, and downstream lineage and auditing are tracked automatically.
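For example (a sketch with placeholder names, run from a Databricks notebook where `spark` is predefined), granting access to and inspecting a Tableflow-registered table uses the same Unity Catalog SQL as any other table:

```python
# Standard Unity Catalog governance applied to a Tableflow-registered table.
# The table name and the `analysts` group are placeholders.
spark.sql("GRANT SELECT ON TABLE main.tableflow.orders TO `analysts`")
spark.sql("DESCRIBE EXTENDED main.tableflow.orders").show(truncate=False)
```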
Confluent Tableflow’s Delta Lake support with Databricks Unity Catalog integration is now generally available, bringing a simpler, governed path from Kafka topics to AI and analytics-ready Delta tables.
With this launch, we’re adding the following key features:
Schema evolution for Delta Lake tables – Support for adding optional fields with default values and for type widening (see the schema sketch after this list).
Upserts for Delta tables – Insert, update, and delete individual rows in Tableflow Delta Lake tables.
Multi-format control – Run Delta and Iceberg side by side with per-format enable/disable.
Multiple catalog integrations per cluster – Attach distinct catalogs (e.g., Unity Catalog, AWS Glue Data Catalog) to the same Confluent Cloud cluster.
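As a sketch of the first item in this list: schema evolution is driven from Confluent Schema Registry. The example below registers a new Avro schema version that adds an optional field with a default, using the `confluent-kafka` Python client; the subject name, fields, endpoint, and credentials are hypothetical.

```python
# Register an evolved Avro schema (new optional field with a default) so that
# the downstream Delta table can pick up the added column. Subject name,
# fields, and the Schema Registry endpoint/credentials are placeholders.
from confluent_kafka.schema_registry import Schema, SchemaRegistryClient

client = SchemaRegistryClient({
    "url": "https://psrc-xxxxx.us-east-1.aws.confluent.cloud",
    "basic.auth.user.info": "<sr-api-key>:<sr-api-secret>",
})

evolved_schema = """
{
  "type": "record", "name": "Order",
  "fields": [
    {"name": "order_id", "type": "string"},
    {"name": "amount",   "type": "double"},
    {"name": "channel",  "type": ["null", "string"], "default": null}
  ]
}
"""
client.register_schema("orders-value", Schema(evolved_schema, schema_type="AVRO"))
```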
Let’s take a closer look at how Tableflow’s support for Delta Lake tables is implemented under the hood.
Tableflow runs on Kora, Confluent Cloud’s cloud-native streaming engine for elastic scale, high reliability, and cost efficiency. Kora stores recent data on fast local disks (hot tier) and asynchronously offloads log segments to durable cloud object storage (cold tier). Tableflow leverages these tiers to turn streams into analytics-ready tables. The diagram below outlines the core components.
When a user enables Tableflow on a Kafka topic, the system automatically begins materializing that topic into an open table format (Iceberg or Delta).
Step 1: Metadata fetching and schema discovery – Tableflow fetches the schema associated with the topic from Confluent Schema Registry and uses it to generate the target table schema. This ensures that Avro, Protobuf, or JSON messages are correctly mapped into columnar tables. Tableflow queries Kafka’s metadata to determine log segments and offsets for the topic partitions. This metadata guides Tableflow’s materialization job for ingestion, defining which segment files need to be processed and where parallel reads can begin.
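The mapping in Step 1 can be pictured with a small sketch (conceptual only, not Tableflow’s actual internals): fetch the topic’s latest Avro schema from Schema Registry and derive a columnar schema from it. The subject name and endpoint are placeholders, and the type table covers only a few Avro primitives.

```python
# Conceptual sketch of schema discovery: read the latest Avro schema for a
# topic's value subject and derive an Arrow (columnar) schema from it.
import json
import pyarrow as pa
from confluent_kafka.schema_registry import SchemaRegistryClient

AVRO_TO_ARROW = {
    "string": pa.string(), "long": pa.int64(), "int": pa.int32(),
    "double": pa.float64(), "boolean": pa.bool_(),
}

client = SchemaRegistryClient({"url": "https://psrc-xxxxx.us-east-1.aws.confluent.cloud"})
avro_schema = json.loads(client.get_latest_version("orders-value").schema.schema_str)

arrow_schema = pa.schema(
    [pa.field(f["name"], AVRO_TO_ARROW[f["type"]]) for f in avro_schema["fields"]]
)
print(arrow_schema)  # the column layout the target table will use
```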
Step 2: Reading from tiered storage – Instead of reading events through Kafka consumer APIs, Tableflow pulls segment files, which are identified using metadata, directly from Kafka’s tiered cloud object storage. By bypassing brokers, this approach reduces overhead on the cluster and enables parallel segment processing, making table materialization far more efficient.
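The segment layout and format are internal to Kora, so the following is only a conceptual sketch of the idea behind Step 2: parallel reads of tiered log segments from object storage, with no broker in the path. The bucket and keys are made up; the calls are standard `boto3`.

```python
# Illustration only: fetch tiered log segments directly from object storage in
# parallel instead of consuming through brokers. Bucket and keys are invented.
from concurrent.futures import ThreadPoolExecutor
import boto3

s3 = boto3.client("s3")
segment_keys = [                     # in practice, derived from Kafka metadata
    "tier/orders-0/00000000000000000000.seg",
    "tier/orders-0/00000000000000125000.seg",
]

def fetch(key: str) -> bytes:
    return s3.get_object(Bucket="kora-tier-bucket", Key=key)["Body"].read()

with ThreadPoolExecutor(max_workers=8) as pool:
    raw_segments = list(pool.map(fetch, segment_keys))   # no broker involvement
```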
Step 3: Conversion and storage – Segment files are decoded and converted into Apache Parquet™️ files and are then written into the user’s configured object storage (e.g., Amazon S3, Google Cloud Storage, Microsoft Azure Data Lake Storage) as Iceberg or Delta table data files.
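The conversion itself amounts to columnar encoding; a minimal sketch follows, with placeholder records and a local output path (Tableflow does this internally and writes to the user’s bucket).

```python
# Minimal sketch of the Parquet conversion in Step 3. Records and the output
# file name are placeholders; real output lands in the user's object storage.
import pyarrow as pa
import pyarrow.parquet as pq

decoded_events = [
    {"order_id": "o-1", "amount": 19.99},
    {"order_id": "o-2", "amount": 5.00},
]
pq.write_table(pa.Table.from_pylist(decoded_events),
               "orders-part-00000.parquet", compression="zstd")
```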
Step 4: Metadata and catalog commits – Along with writing data, Tableflow generates Iceberg or Delta metadata (manifests, snapshots, commit logs) and commits Iceberg tables to its built-in Iceberg REST Catalog. For Delta tables, Tableflow leverages the Delta Kernel to create and manage Delta-compliant metadata and commit logs. Tableflow also automatically publishes table metadata pointers to external catalog services such as Databricks Unity Catalog, making the tables discoverable and queryable across multiple analytics and compute engines.
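Tableflow performs this publishing step automatically; for intuition, the manual equivalent in a Databricks notebook (assuming an external location is already configured in Unity Catalog, and using placeholder names and storage path) would look roughly like this:

```python
# Rough manual equivalent of catalog publishing: register an existing Delta
# table location as an external table in Unity Catalog. Names and the storage
# path are placeholders; Tableflow handles this for you.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.tableflow.orders
    USING DELTA
    LOCATION 's3://customer-bucket/tableflow/orders'
""")
```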
Step 5: Table maintenance and optimizations – Once tables are established, Tableflow actively optimizes them by compacting small files, expiring snapshots, rewriting manifests, and evolving partitions as query patterns change. If advanced features like upserts are enabled, additional logic such as deduplication and equality deletes is applied.
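For comparison, the same kind of maintenance done by hand with the open source `deltalake` (delta-rs) package looks like this (placeholder table path):

```python
# The maintenance Tableflow automates, sketched manually with delta-rs.
from deltalake import DeltaTable

dt = DeltaTable("s3://customer-bucket/tableflow/orders")
dt.optimize.compact()                            # merge small files into larger ones
dt.vacuum(retention_hours=168, dry_run=False)    # drop unreferenced files >7 days old
```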
Tableflow’s Delta Lake support is built on Delta Kernel, the open source library developed by Databricks to make Delta Lake universally accessible across any processing engine. Delta Kernel provides Java and Rust APIs that let developers read, write, and commit to Delta tables directly—without handling low-level Delta protocol details. This simplifies connector development and makes it easy for connectors to adopt the latest Delta innovations, such as catalog-managed commits, VARIANT data type, and type widening.
By leveraging Delta Kernel, Tableflow achieves high performance, strong compatibility, and open interoperability while simplifying the path from real-time streaming data to fully governed Delta tables in Databricks Unity Catalog.
Specifically, Delta Kernel enables Tableflow to do the following:
Maintain full atomicity, consistency, isolation, and durability (ACID) guarantees and Delta-compliant transaction logs for reliable table updates
Write at scale, efficiently materializing Kafka topics into optimized Parquet data and metadata commits
Stay future-proof with transparent adoption of new Delta Lake features and protocol versions
Tableflow on Confluent Cloud now fully empowers organizations to bridge the gap between fast-moving operational data and trusted, governed tables for analytics without compromising security, scale, or flexibility. With easy, automatic integration to Delta Lake and Databricks Unity Catalog, true CDC/upsert table materialization, enterprise security via Bring Your Own Key (BYOK), and resilient error handling, it's never been easier to connect real-time data to business insights.
Ready to unlock the full, transformative potential of your streaming data for cutting-edge AI and advanced analytics? Explore Tableflow today.
Learn more: Dive into the Tableflow product documentation.
See it in action: Watch our short introduction video or Tim Berglund's lightboard explanation.
Get started: If you're already using Confluent Cloud, navigate to the Tableflow section for your cluster. New users can get started with Confluent Cloud for free and explore Tableflow's capabilities. Try out the Delta Lake and Unity Catalog quick start guide.
Contact us today for a personalized demo on Tableflow and start unlocking the full potential of your streaming data on Confluent Cloud. We’re incredibly excited to see how you leverage Tableflow to turn your real-time data streams into tangible business value.
Apache®, Apache Kafka®, Kafka®, Apache Flink®, Flink®, Apache Iceberg™️, Iceberg™️, Apache Parquet™️, and Parquet™️ are either registered trademarks or trademarks of the Apache Software Foundation. No endorsement by the Apache Software Foundation is implied by the use of these marks.